How to find the value for Keras input_shape/input_dim?

Developing a machine learning model with today’s tools is much easier than it was years ago. Keras is one of the frameworks that can be used for developing deep learning models – and it’s actually my lingua franca for doing so.

One of the aspects of building a deep learning model is specifying the shape of your input data, so that the model knows how to process it. In today’s blog, we’ll look at precisely this tiny part of building a machine learning model with Keras. We’ll answer these questions in particular:

  • What is the “shape” of any data?
  • What are the input_shape and input_dim properties in Keras?
  • Given an arbitrary dataset, how can we find the shape of the dataset as a whole?
  • How can we convert the shape we identified into a sample-level shape, so that Keras understands it?
  • How does all this come together – i.e., can we build an example of a Keras model that shows how it’s done?

Are you ready?

Let’s go! 🙂



The first layer in your Keras model: specifying input shape or input dim

Here’s a very simple neural network:

It has three layers. In yellow, you see the input layer. This layer is the entry point to the layers that process the information – it simply takes the data that you serve the network and feeds it to the hidden layers, in blue. These layers are primarily responsible for processing towards the expected end result (which could be a correct classification, for example). Then there is the output layer, which – depending on the problem, such as regression or classification – consists of just one neuron or a few of them. Depending on how you configure this layer, the output can be e.g. a probability distribution over the classes present in your classification problem.

Now, let’s go back and take a look at the input layer. Understand that we have a neural network – which is eager to process data – and a dataset. The dataset contains samples, often thousands or even hundreds of thousands of them. Each sample is fed to the network in sequential order. When all of them have been fed, we say that one epoch has been completed – or, in plainer English, one full pass over the dataset.

There is an obvious connection between the input layer and each individual sample: they must be of the same shape. Imagine a scenario where a kid has to put a square block into one of three possible holes: a square hole, a circular hole or a rectangular hole. You’ll immediately see what the kid has to do: match the shape of the object with the shape of the hole.

The same is true for input datasets. Each sample must match the shape of the input layer for the connection to be established. If both shapes aren’t equal, the network cannot process the data that you’re trying to feed it.
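
To make this tangible, here’s a minimal sketch of what such a mismatch looks like in practice – the exact error message depends on your Keras version, but feeding wrongly shaped samples will raise a ValueError:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
import numpy as np

model = Sequential()
model.add(Dense(4, input_shape=(10,)))  # this layer expects samples with 10 features
model.compile(loss='mse', optimizer='adam')

bad_samples = np.zeros((5, 8))   # five samples with 8 features each: wrong shape
targets = np.zeros((5, 4))
model.fit(bad_samples, targets)  # raises a ValueError about incompatible shapes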

With this understanding, let’s now take a look at the rank and the shape of Tensors (or arrays) in more detail, before we continue with how Keras input layers expect to receive information about such shapes by means of the input_shape and input_dim properties.

The rank and shape of a Tensor (or Array, if you wish)

Say that we have this Array:

[[1, 2, 3], [4, 5, 6]]

If fed to a framework that runs on top of TensorFlow, it is converted into Tensor format – TensorFlow’s representation for numeric data (TensorFlow, n.d.).

Now, we can distinguish between rank and shape (TensorFlow, n.d.). The distinction is simple:

  • The rank of a Tensor represents the number of dimensions for your Tensor.
  • The shape of a Tensor represents the number of elements within each dimension.

Tensors can be multidimensional. That is, they are representations in “some” mathematical space. Just like we can position ourselves at some (x, y, z) position in 3D space and compare our position with someone else’s, Tensors are representations in some space. From this, and TensorFlow (n.d.), it follows that:

  • A rank-0 Tensor is a scalar value; a number, that has magnitude, but no direction.
  • A rank-1 Tensor is a vector; it has magnitude and direction;
  • A rank-2 Tensor is a matrix; it is a table of numbers;
  • A rank-3 Tensor is a cube of numbers.

From the image above, the following holds with respect to shape:

  • There’s no shape for the rank-0 Tensor, because it has no dimensions. The shape would hence be an empty array, or [].
  • The rank-1 Tensor has a shape of [3].
  • The rank-2 Tensor has a shape of [3, 6]: three rows, six columns.
  • The rank-3 Tensor has a shape of [2, 2, 2]: two elements along each axis.
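
To make this concrete, we can inspect rank and shape with NumPy, which exposes them through the ndim and shape attributes – note that NumPy writes shapes as tuples rather than lists:

import numpy as np

scalar = np.array(42)                       # rank 0
vector = np.array([1, 2, 3])                # rank 1
matrix = np.array([[1, 2, 3], [4, 5, 6]])   # rank 2
cube = np.array([[[1, 2], [3, 4]],
                 [[5, 6], [7, 8]]])         # rank 3

for tensor in (scalar, vector, matrix, cube):
    print(f'rank: {tensor.ndim}, shape: {tensor.shape}')

# rank: 0, shape: ()
# rank: 1, shape: (3,)
# rank: 2, shape: (2, 3)
# rank: 3, shape: (2, 2, 2)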

Keras input layers: the input_shape and input_dim properties

Now that we know about the rank and shape of Tensors, and how they are related to neural networks, we can go back to Keras. More specifically, let’s take a look at how we can connect the shape of your dataset to the input layer through the input_shape and input_dim properties.

Let’s begin with input_shape:

model = Sequential()
model.add(Dense(4, input_shape=(10,)))

Here, the input layer would expect a one-dimensional array with 10 elements for input. It would produce 4 outputs in return.

Input shape

It’s actually really simple. The input shape parameter simply tells the input layer what the shape of one sample looks like (Keras, n.d.). Adding it to your input layer will ensure that a match is made.

Input dim

Sometimes, though, you just have one dimension – which is the case with one-dimensional / flattened arrays, for example. In this case, you can also simply use input_dim: specifying the number of elements within that first dimension only. For example:

model = Sequential()
model.add(Dense(32, input_dim=784))

This would make the input layer expect a one-dimensional array of 784 elements as each individual sample. It would produce 32 outputs. This is the kind of information bottleneck that we often want to see!
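
A useful detail here: input_dim is simply shorthand for a one-dimensional input_shape – Keras converts it into a tuple under the hood. The following two first layers therefore specify the same input:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model_a = Sequential()
model_a.add(Dense(32, input_dim=784))       # one-dimensional shorthand

model_b = Sequential()
model_b.add(Dense(32, input_shape=(784,)))  # the equivalent tuple form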


Using Numpy to find the shape of your dataset

Now, suppose that I’m loading an example dataset – such as the MNIST dataset from the Keras Datasets.

That would be something like this:

from tensorflow.keras.datasets import mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()

Now, how can we find the shape of the dataset?

Very simple – we can use NumPy, the package for numerical processing! The MNIST data is loaded as NumPy arrays, so each array exposes a shape attribute.

Let’s add this import to the top:

import numpy as np

And then we add this to the bottom:

training_set_shape = x_train.shape
print(training_set_shape)

Yielding this as a whole:

import numpy as np
from tensorflow.keras.datasets import mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
training_set_shape = x_train.shape
print(training_set_shape)

Let’s now run it and see what happens.

$ python datasettest.py
2020-04-05 19:22:27.146991: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_100.dll
(60000, 28, 28)

Et voilà: a shape of (60000, 28, 28). From this, we can derive that we have 60,000 samples of 28 x 28 pixels. As the number of image channels is not present, we can assume that it’s 1 – and that the images thus must be grayscale. There we go!


Altering the shape to sample level

Unfortunately, we’re not there yet. We cannot use this shape as our input_shape. The latter has to be the shape of one sample, remember? Not the shape of the dataset as a whole.

Now, from the (60000, 28, 28), which elements contribute to our knowledge about the shape at sample level?

Indeed, the 28 and 28 – while the 60,000 is not of interest (after all, at sample level, this would be 1).

Now, with images, we would often use Convolutional Neural Networks. In those models, we use Conv layers, which expect the input_shape in a very specific way – namely, with the default channels_last data format, as (x_shape, y_shape, channels). We already have x_shape and y_shape, which are both 28. We don’t have channels yet, but we do know its value: 1. By consequence, our value for input_shape will be (28, 28, 1)!

However, we can also automate this, for the case when we want to use a different image dataset. We simply add the following:

number_of_channels = 1
sample_shape = (training_set_shape[1], training_set_shape[2], number_of_channels)

We could even expand on our prints:

print(f'Dataset shape: {training_set_shape}')
print(f'Sample shape: {sample_shape}')

Indeed, running it again yields the extended output:

$ python datasettest.py
2020-04-05 19:28:28.235295: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_100.dll
Dataset shape: (60000, 28, 28)
Sample shape: (28, 28, 1)
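
Note that deriving the sample shape doesn’t change the data itself: the MNIST arrays still lack the channels axis. If you want to feed them to a Conv2D layer, you would typically add that axis explicitly – continuing the script above:

# Append the channels axis so each sample matches the (28, 28, 1) sample shape
x_train = np.expand_dims(x_train, axis=-1)
x_test = np.expand_dims(x_test, axis=-1)
print(x_train.shape)  # (60000, 28, 28, 1)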

A Keras example

Now that we know about Tensor shapes, their importance for neural network input layers, and how to derive the sample shape for a dataset, let’s now see if we can expand this to a real Keras model.

For this, we’ll be analyzing the simple two-dimensional ConvNet that we created in a different blog post.

Here is the code – you can find the analysis below it:

from tensorflow.keras.datasets import cifar10
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten, Conv2D
from tensorflow.keras.losses import sparse_categorical_crossentropy
from tensorflow.keras.optimizers import Adam

# Model configuration
batch_size = 50
img_width, img_height, img_num_channels = 32, 32, 3
loss_function = sparse_categorical_crossentropy
no_classes = 10
no_epochs = 25
optimizer = Adam()
validation_split = 0.2
verbosity = 1

# Load CIFAR-10 data
(input_train, target_train), (input_test, target_test) = cifar10.load_data()

# Determine shape of the data
input_shape = (img_width, img_height, img_num_channels)

# Parse numbers as floats
input_train = input_train.astype('float32')
input_test = input_test.astype('float32')

# Scale data
input_train = input_train / 255
input_test = input_test / 255

# Create the model
model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=input_shape))
model.add(Conv2D(64, kernel_size=(3, 3), activation='relu'))
model.add(Conv2D(128, kernel_size=(3, 3), activation='relu'))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dense(no_classes, activation='softmax'))

# Display a model summary
model.summary()

# Compile the model
model.compile(loss=loss_function,
              optimizer=optimizer,
              metrics=['accuracy'])

# Fit data to model
history = model.fit(input_train, target_train,
            batch_size=batch_size,
            epochs=no_epochs,
            verbose=verbosity,
            validation_split=validation_split)

# Generate generalization metrics
score = model.evaluate(input_test, target_test, verbose=0)
print(f'Test loss: {score[0]} / Test accuracy: {score[1]}')

Specifically, we can observe:

  • That the img_width and img_height are 32. This is correct, as we’re now using a different dataset – see cifar10.load_data() – where the images are 32 x 32 pixels.
  • The value for img_num_channels was set to 3. This is also correct, because the CIFAR10 dataset contains RGB images – which have three image channels. So no 1 anymore – and our final sample shape will be (32, 32, 3).
  • We subsequently set the computed input_shape as the input_shape of our first Conv2D layer – specifying the input layer implicitly (which is just how it’s done with Keras). A variation that derives this shape from the data itself is shown right below.
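
As a small variation, because the CIFAR-10 arrays already include the channels axis, we could also derive input_shape directly from the loaded data rather than hardcoding the values:

# input_train.shape is (50000, 32, 32, 3); dropping the sample axis yields the sample shape
input_shape = input_train.shape[1:]
print(input_shape)  # (32, 32, 3)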

There we go – we can now actually determine the input shape for our data and use it to create Keras models! 😎


Summary

In this blog post, we’ve looked at the Keras input_shape and input_dim properties. We did so in quite a step-by-step way, by first looking at the link between neural network input layers and the shape of your dataset – and specifically, the shape at sample level.

Additionally, we looked at the concepts of rank and shape in order to understand the foundations of a layer’s input shape / dim in the first place.

Then, we looked at how the Keras framework for deep learning implements specifying the input shape / dimension by means of the aforementioned properties. This included looking at how to determine the shape of your dataset at dataset level, converting it into a sample-level shape, and subsequently using it in an actual Keras model. We provided a simple example by means of a ConvNet implementation. I really hope that it helps you build your Keras models – as I know that it’s often these simple steps that get you stuck!


If you have any questions, remarks, or other comments – feel free to drop a message in the comments section below! 😎 I’ll happily answer and help you build your Keras model. Thank you for reading MachineCurve today and happy engineering!


References

TensorFlow. (n.d.). TensorFlow tensors. https://www.tensorflow.org/guide/tensor#top_of_page

Keras. (n.d.). Guide to the Sequential model. https://keras.io/getting-started/sequential-model-guide/


14 thoughts on “How to find the value for Keras input_shape/input_dim?”

  1. Young Jin Kim

    In a Keras model, [1] is used as below:

    model=tensorflow.keras.Sequential([Dense(units=1, input_shape = [1])])
    Could you tell me what [1] means in this case?

    1. Chris

      Hi Young Jin Kim,
      That would be a one-dimensional array with the first dimension having a length of 1.
      Best,
      Chris

  2. Denis

    Hi Chris,

    Thanks for your explanation.
    What I don’t get is why there’s an empty space when using input_shape then?
    e.g. input_shape=(10, )
    Since it’s not used, what does it exist for?

    Thanks in advance.

    1. Chris

      Hi Denis,

      While the space itself is not necessary (input_shape = (10,) would work too), I think your question is in fact why you need that comma when defining your input_shape.
      Input shape specifies the dimensionality of your feature space. For example, if you input three distinct features (different variables / columns), you have a three-dimensional feature space. Traditionally, dimensionality has been provided by means of tuples.
      Python, however, ignores the parentheses around a single number – (10) is just the integer 10, not a tuple; it’s the trailing comma that creates a one-element tuple (a.k.a. a one-dimensional feature space, as in the scenario you have a question about). You can find more here: https://note.nkmk.me/en/python-tuple-single-empty/
      That’s why you need to specify it as (10,) instead of (10), or you’ll get an error.
      If you have one-dimensional input shape, by the way, you could also write input_dim = 10.
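
      You can verify this in a plain Python shell:

      print(type((10)))   # <class 'int'> – the parentheses are simply ignored
      print(type((10,)))  # <class 'tuple'> – the trailing comma creates the tuple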

      Hope this helps.

      Best,
      Chris

  3. Cam

    Hi Chris,

    This post was very helpful. Thank you!

    I still am a bit confused, syntactically, about the one-dim arrays. For instance, let’s say I have a one-dim array as follows:

    x = np.array([-1.0, 0.0, 1.0, 2.0, 3.0, 4.0], dtype=float)

    To me, that’s a one-dim array with 6 elements.

    input_shape=[1] works just fine, as per your first answer above. But,

    input_shape=(6,) gives me an error in Keras.

    Doesn’t the second argument specify a one-dim array with 6 elements?

    Thanks!

    1. Chris

      Hi Cam,

      Thanks for your comment!

      The input shape specifies “the shape of one sample”.

      If your array is x = np.array([-1.0, 0.0, 1.0, 2.0, 3.0, 4.0], dtype=float), you specify one array with six elements. What is therefore the “shape of one sample”? It’s [1]: it has one dimension and one item per dimension (it’s a scalar value).

      We can see this here, if we construct a corresponding y array and specify the shape:


      import tensorflow
      from tensorflow.keras.models import Sequential
      from tensorflow.keras.layers import Dense
      import numpy as np

      # Model configuration
      batch_size = 250
      no_epochs = 1
      verbosity = 1

      # Array of 6 values
      x = np.array([-1.0, 0.0, 1.0, 2.0, 3.0, 4.0])
      y = np.array([0, 1, 0, 1, 0, 1])

      # Create the model
      model = Sequential()
      model.add(Dense(16, activation='relu', input_shape=[1]))
      model.add(Dense(8, activation='relu'))
      model.add(Dense(1, activation='sigmoid'))

      # Compile the model
      model.compile(loss=tensorflow.keras.losses.binary_crossentropy,
                    optimizer=tensorflow.keras.optimizers.Adam(),
                    metrics=['accuracy'])

      # Fit data to model
      model.fit(x, y,
                batch_size=batch_size,
                epochs=no_epochs,
                verbose=verbosity)

      It trains successfully:

      6/6 [==============================] - 1s 166ms/sample - loss: 0.7686 - accuracy: 0.3333

      If we instead had a two-dimensional array with (in this case) two samples of six values each, our input shape (“the shape of one sample”) would be (6,), because each sample holds six values:


      import tensorflow
      from tensorflow.keras.models import Sequential
      from tensorflow.keras.layers import Dense
      import numpy as np

      # Model configuration
      batch_size = 250
      no_epochs = 1
      verbosity = 1

      # Array of 2 values
      x = np.array([
          [-1.0, 0.0, 1.0, 2.0, 3.0, 4.0],
          [-1.0, 0.0, 1.0, 2.0, 3.0, 4.0]
      ])
      y = np.array([0, 1])

      # Create the model
      model = Sequential()
      model.add(Dense(16, activation='relu', input_shape=(6,)))
      model.add(Dense(8, activation='relu'))
      model.add(Dense(1, activation='sigmoid'))

      # Compile the model
      model.compile(loss=tensorflow.keras.losses.binary_crossentropy,
                    optimizer=tensorflow.keras.optimizers.Adam(),
                    metrics=['accuracy'])

      # Fit data to model
      model.fit(x, y,
                batch_size=batch_size,
                epochs=no_epochs,
                verbose=verbosity)

      It also successfully trains:

      2/2 [==============================] - 1s 304ms/sample - loss: 1.0497 - accuracy: 0.5000

      By comparing the definitions of x in the two examples, and asking yourself every time “what is the shape of ONE sample?”, you can find the answer to your input shape configuration 🙂

      I hope this helps! If not, please let me know.

      Best,
      Chris

  4. Mark

    Hi Chris,
    How to get a Keras model’s input values for every epoch?
    Thanks for every reply.

    1. Chris

      Hi Mark,

      Thanks for your comment. I’m not entirely sure what you mean here. Could you perhaps elaborate a bit?

      Thanks, best,
      Chris

      1. Mark

        Chris sure;
        For example:

        space = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
        expec = np.array([[0, 0], [1, 0], [0, 0], [0, 1]])
        model = Sequential([
            Dense(2, input_dim=2),
            Dense(2),
            Dense(1)
        ])

        model.fit(space, expec, batch_size=1, epochs=1, callbacks=[Callbacks()])

        I want to know the input values to the first layer (named ‘dense’ in the Keras backend, in this example) at every epoch. As you know, all input values are sent sequentially to the first layer. And sometimes we need to know these values for each epoch.

        I couldn’t find the answer anywhere on the internet. Or I’m missing something.

        1. Mark

          ADDING: I mean I want to know which ‘space’ value is being processed at every epoch.

          The get_input_at(index) method only gives Tensor features,
          and the get_output_at() method does too.
          get_weights() gives only the dense layers’ outputs.

          But I need values like the 1s and 0s that are the elements of ‘space’.

          1. Chris

            Hi Mark,

            If I understand you correctly, you want to figure out which value from the space array is fed forward through the network at every epoch.
            An epoch is one iteration of the cycle of feeding forward – computing loss – improving the model, and an epoch itself is composed of batches: in your case, batches of length = 1, meaning that each individual sample is fed forward through the network.
            However, unless you configure steps_per_epoch in model.fit, which limits the number of batches fed forward, all batches (and thus all samples) will be fed through the network once per epoch.
            In other words, each iteration will see all the samples just once unless configured otherwise.

            Now, having answered that, let’s move towards the second part of your question – printing the inputs to your Keras model on screen when every epoch starts. Unfortunately, this seems not to be built in to Keras at this point in time:

            At the time it seems to be impossible to actually access the data within the symbolic tensor. It also seems unlikely that such functionality will be added in the future since in the Tensorflow page it says.

            (Source: https://stackoverflow.com/questions/44587813/accessing-input-layer-data-in-tensorflow-keras)

            However, Keras does support creating your own layers (https://keras.io/guides/making_new_layers_and_models_via_subclassing/), so what we could do in theory is replicate the Dense layer and add print functionality for printing the inputs:


            import tensorflow
            from tensorflow.keras.models import Sequential
            from tensorflow.keras.layers import Dense
            from tensorflow.keras.losses import binary_crossentropy
            from tensorflow.keras.optimizers import Adam
            from tensorflow.keras.callbacks import LambdaCallback
            import numpy as np

            # Model configuration
            batch_size = 1
            no_epochs = 10
            verbosity = 2

            # Array of 6 values
            x = np.array([-1.0, 0.0, 1.0, 2.0, 3.0, 4.0])
            y = np.array([0, 1, 0, 1, 0, 1])

            # DenseAndPrint Layer
            # Source before adaptation: https://keras.io/guides/making_new_layers_and_models_via_subclassing/
            class DenseAndPrint(tensorflow.keras.layers.Layer):
                def __init__(self, units=32, input_dim=32, activation='linear'):
                    super(DenseAndPrint, self).__init__()
                    # Trainable weights, shaped (input_dim, units)
                    w_init = tensorflow.random_normal_initializer()
                    self.w = tensorflow.Variable(
                        initial_value=w_init(shape=(input_dim, units), dtype="float32"),
                        trainable=True,
                    )
                    # Trainable bias, initialized at zero
                    b_init = tensorflow.zeros_initializer()
                    self.b = tensorflow.Variable(
                        initial_value=b_init(shape=(units,), dtype="float32"), trainable=True
                    )
                    self.activation = activation

                def call(self, inputs):
                    # Print the raw inputs before applying the Dense transformation
                    tensorflow.print(inputs)
                    mult = tensorflow.matmul(inputs, self.w) + self.b
                    if self.activation == 'relu':
                        return tensorflow.math.maximum(mult, 0)
                    else:
                        return mult

            # Create the model
            model = Sequential()
            model.add(DenseAndPrint(16, activation='relu', input_dim=1))
            model.add(Dense(8, activation='relu'))
            model.add(Dense(1, activation='sigmoid'))

            # Compile the model
            model.compile(loss=binary_crossentropy,
                          optimizer=Adam(),
                          metrics=['accuracy'])

            # Fit data to model
            model.fit(x, y,
                      batch_size=batch_size,
                      epochs=no_epochs,
                      verbose=verbosity)

            The DenseAndPrint layer performs the same linear operation (inputs * weights + bias) and if configured also applies ReLU activation max(0, output). With batch_size = 1, this is what is output on my screen when running:


            Epoch 10/10
            [[3]]
            [[-1]]
            [[4]]
            [[0]]
            [[1]]
            [[2]]
            6/6 - 0s - loss: 0.6836 - accuracy: 0.6667

            Clearly, we see the batches of data moving forward through the model during training. You can also write some extra code which allows you to write those batches to a CSV file (https://realpython.com/python-csv/) or JSON (https://stackabuse.com/reading-and-writing-json-to-a-file-in-python/), but that’s up to you 😉

            Hope this helps.

            Best,
            Chris

          2. Chris

            P.S. Forgive the weird indents, this happens when it’s a comment-to-a-comment etc. I pasted the code here so that you don’t need to indent yourself: https://pastebin.com/UbuRzWAK

  5. Mark

    Dear Chris, thank you very much for your research and response. It was really useful.

    Also today I reviewed your website in detail and decided that you are a very good developer. I am sure that you will achieve serious success. I will recommend your website in many places.

    Best regards.

    1. Chris

      Hi Mark,

      Thanks a lot for your comment and your compliment!! Greatly appreciated 😊

      Good luck with your ML projects, best,
      Chris
