This is what the TensorFlow docs suggest:
Input shape: 4+D tensor with shape: batch_shape + (channels, rows, cols) if data_format='channels_first' or 4+D tensor with shape: batch_shape + (rows, cols, channels) if data_format='channels_last'.
In other words, say you have an RGB image that is 50 pixels wide and 60 pixels high. A width of 50 means there are 50 columns, and a height of 60 means there are 60 rows. RGB means the image has 3 channels. Leaving out the batch dimension, the correct
input_shape would then be
(60, 50, 3) in channels last strategy, and
(3, 60, 50) in channels first strategy.
By default, TensorFlow/Keras use a channels last setting.
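The mapping from image size to input_shape can be sketched in a few lines of plain Python. This is only an illustration: conv2d_input_shape is a hypothetical helper written for this answer, not part of TensorFlow or Keras, and it assumes a single 2D image without the batch dimension.

```python
def conv2d_input_shape(width, height, channels, data_format="channels_last"):
    """Return the per-sample input shape (batch dimension excluded).

    Hypothetical helper for illustration; not a TensorFlow/Keras function.
    """
    # Height counts rows, width counts columns.
    rows, cols = height, width
    if data_format == "channels_last":
        return (rows, cols, channels)
    if data_format == "channels_first":
        return (channels, rows, cols)
    raise ValueError(f"unknown data_format: {data_format!r}")


# 50 pixels wide, 60 pixels high, RGB
print(conv2d_input_shape(50, 60, 3))                    # (60, 50, 3)
print(conv2d_input_shape(50, 60, 3, "channels_first"))  # (3, 60, 50)
```

The resulting tuple is what you would pass as the input shape of the first layer, for example through keras.Input(shape=...), with data_format on the Conv2D layer set to match.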