Last Updated on 28 April 2020

Developing a machine learning model with today’s tools is much easier than it was years ago. Keras is one of the deep learning frameworks that can be used for developing deep learning models – and it’s actually my lingua franca for doing so.

One of the aspects of building a deep learning model is specifying the shape of your input data, so that the model knows how to process it. In today’s blog, we’ll look at precisely this tiny part of building a machine learning model with Keras. We’ll answer these questions in particular:

- What is the “shape” of any data?
- What are the `input_shape` and `input_dim` properties in Keras?
- Given an arbitrary dataset, how can we find the shape of the dataset as a whole?
- How can we convert the shape we identified into sample size, so that Keras understands it?
- How does all this come together – i.e., can we build an example of a Keras model that shows how it’s done?

Are you ready?

Let’s go! 🙂

## The first layer in your Keras model: specifying input shape or input dim

Here’s a very simple neural network:

It has three layers. In yellow, you see the input layer. This layer is like the entry point to the layers which process the information – it often simply takes the data that you serve the network, feeding it to the hidden layers, in blue. These layers are primarily responsible for processing towards the expected end result (which could be a correct classification, for example). Then, there is the output layer, which – depending on the problem such as regression or classification – is simply one parameter or a few of them. Depending on how you configure this layer, the output can be e.g. a probability distribution over the classes that are present in your classification problem.

Now, let’s go back and take a look at the input layer. Understand that we have a neural network – which is eager to process data – and a dataset. The dataset contains samples, and often thousands or even hundreds of thousands of them. Each sample is fed to the network in sequential order. When all of them are fed, we say that *one epoch* was completed – or, in plainer English, one iteration.

There is an obvious connection between the input layer and each individual sample. They must be of the same shape. Imagine a scenario where a kid has to put a square block into one of three possible holes: a square hole, a circular hole or a rectangular hole. You’ll immediately see what action the kid has to take: match the shape of the hole with the shape of the object.

The same is true for input datasets. Each sample must match the shape of the input layer for the connection to be established. If both shapes aren’t equal, the network cannot process the data that you’re trying to feed it.

With this understanding, let’s now take a look at the *rank* and the *shape* of Tensors (or arrays) in more detail, before we continue with how Keras input layers expect to receive information about such shapes by means of the `input_shape` and `input_dim` properties.

### The rank and shape of a Tensor (or Array, if you wish)

Say that we have this Array:

`[[1, 2, 3], [4, 5, 6]]`

If fed to a framework that runs on top of TensorFlow, it is converted into Tensor format, which is TensorFlow’s representation for numeric data (TensorFlow, n.d.).

Now, we can distinguish between *rank* and *shape* (TensorFlow, n.d.). The distinction is simple:

- The **rank** of a Tensor represents the *number of dimensions* of your Tensor.
- The **shape** of a Tensor represents the *number of elements along each dimension*.

Tensors can be multidimensional. That is, they are representations in “some” mathematical space. Just like we can position ourselves at some (x, y, z) position in 3D space and compare our position with someone else’s, Tensors are representations in some space. From this, and TensorFlow (n.d.), it follows that:

- A **rank-0 Tensor** is a scalar value; a number that has magnitude, but no direction.
- A **rank-1 Tensor** is a vector; it has magnitude *and* direction.
- A **rank-2 Tensor** is a matrix; it is a table of numbers.
- A **rank-3 Tensor** is a cube of numbers.

With respect to shape, the following holds for the Tensors in the image above:

- There’s no shape for the rank-0 Tensor, because it has no dimensions. The shape would hence be an empty array, or `[]`.
- The rank-1 Tensor has a shape of `[3]`.
- The rank-2 Tensor has a shape of `[3, 6]`: three rows, six columns.
- The rank-3 Tensor has a shape of `[2, 2, 2]`: two elements along each of its three axes.
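
To make rank and shape a bit more tangible, here is a minimal sketch using NumPy arrays as stand-ins for Tensors. The example values are my own and chosen for illustration only; `ndim` gives the rank, `shape` the number of elements along each axis.

```python
import numpy as np

# Example values chosen for illustration only.
scalar = np.array(7)                       # rank 0
vector = np.array([1, 2, 3])               # rank 1
matrix = np.array([[1, 2, 3], [4, 5, 6]])  # rank 2 (the array shown earlier)
cube = np.zeros((2, 2, 2))                 # rank 3

for name, tensor in [("scalar", scalar), ("vector", vector),
                     ("matrix", matrix), ("cube", cube)]:
    # ndim is the rank; shape lists the number of elements along each axis.
    print(f"{name}: rank={tensor.ndim}, shape={tensor.shape}")
```

Note that the array `[[1, 2, 3], [4, 5, 6]]` from earlier comes out as rank 2 with shape `(2, 3)`: two rows, three columns.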

### Keras input layers: the `input_shape` and `input_dim` properties

Now that we know about the rank and shape of Tensors, and how they are related to neural networks, we can go back to Keras. More specifically, let’s take a look at how we can connect the *shape of your dataset* to the input layer through the `input_shape` and `input_dim` properties.

Let’s begin with `input_shape`:

`model = Sequential()`

`model.add(Dense(4, input_shape=(10,)))`

Here, the input layer would expect a one-dimensional array with 10 elements for input. It would produce 4 outputs in return.

#### Input shape

It’s actually really simple. The input shape parameter simply tells the input layer **what the shape of one sample looks like** (Keras, n.d.). Adding it to your input layer will ensure that a match is made.

#### Input dim

Sometimes, though, you just have one dimension – which is the case with one-dimensional / flattened arrays, for example. In this case, you can also simply use `input_dim`, specifying the number of elements within that first dimension only. For example:

`model = Sequential()`

`model.add(Dense(32, input_dim=784))`

This would make the input layer expect a one-dimensional array of 784 elements as each individual sample. It would produce 32 outputs. This is the kind of information bottleneck that we often want to see!
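
Where might a number like 784 come from? One possibility (an assumption on my part, the snippet above does not depend on it) is a flattened 28 x 28 image, since 28 × 28 = 784. The sketch below uses a placeholder NumPy array to show how flattening leaves one dimension per sample, which you could then describe with either property.

```python
import numpy as np

# Placeholder batch of 5 grayscale images of 28 x 28 pixels (all zeros);
# the sizes are assumptions picked for illustration.
images = np.zeros((5, 28, 28), dtype="float32")

# Keep the sample axis, merge all remaining axes into one.
flattened = images.reshape(images.shape[0], -1)

input_dim = flattened.shape[1]     # number of elements per sample
input_shape = flattened.shape[1:]  # the same information, as a tuple

print(flattened.shape, input_dim, input_shape)
```

Here, `input_dim` would be `784` and `input_shape` would be `(784,)`: two ways of expressing the same one-dimensional sample.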

## Using Numpy to find the shape of your dataset

Now, suppose that I’m loading an example dataset – such as the MNIST dataset from the Keras Datasets.

That would be something like this:

```
from tensorflow.keras.datasets import mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
```

Now, how can we find the *shape* of the dataset?

Very simple – the data is loaded as NumPy arrays, and NumPy, the package for numerical processing, exposes the shape of every array through its `shape` attribute!

Let’s add this import to the top:

`import numpy as np`

And then we add this to the bottom:

```
training_set_shape = x_train.shape
print(training_set_shape)
```

Yielding this as a whole:

```
import numpy as np
from tensorflow.keras.datasets import mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
training_set_shape = x_train.shape
print(training_set_shape)
```

Let’s now run it and see what happens.

```
$ python datasettest.py
2020-04-05 19:22:27.146991: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_100.dll
(60000, 28, 28)
```

Et voilà: a shape of `(60000, 28, 28)`. From this, we can derive that we have 60,000 samples of 28 x 28 pixels each. As the number of image channels is not present, we can assume that it’s 1 – and that the images thus must be grayscale. There we go!

## Altering the shape to sample level

Unfortunately, we’re not there yet. We cannot use this shape as our `input_shape`. The latter has to be the input shape of *one sample*, remember? Not the shape of the dataset as a whole.

Now, from the `(60000, 28, 28)`, which elements contribute to our knowledge about the shape at sample level?

Indeed, the 28 and 28 – while the 60,000 is not of interest (after all, at sample level, this would be 1).

Now, with images, we would often use Convolutional Neural Networks. In those models, we use Conv layers, which expect the `input_shape` in a very specific way. Specifically, they expect it as follows: `(x_shape, y_shape, channels)`. We already have `x_shape` and `y_shape`, which are both 28. We don’t have `channels` yet, but do know about its value: 1. By consequence, our value for `input_shape` will be `(28, 28, 1)`!

However, we can also automate this, for the case when we want to use a different image dataset. We simply add the following:

`number_of_channels = 1`

`sample_shape = (training_set_shape[1], training_set_shape[2], number_of_channels)`

We could even expand on our prints:

```
print(f'Dataset shape: {training_set_shape}')
print(f'Sample shape: {sample_shape}')
```

Running it now prints both shapes:

```
$ python datasettest.py
2020-04-05 19:28:28.235295: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_100.dll
Dataset shape: (60000, 28, 28)
Sample shape: (28, 28, 1)
```
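
One thing to keep in mind: a sample shape of `(28, 28, 1)` describes data that actually has a channel axis, whereas the loaded array is still `(60000, 28, 28)`. Below is a minimal sketch of how you could add that axis and read off the sample shape in one go. The helper `to_channels_last` is hypothetical (it is not part of Keras or NumPy), and it assumes that rank-3 input means grayscale `(samples, height, width)` data. A small placeholder array stands in for `x_train` so that nothing has to be downloaded.

```python
import numpy as np

def to_channels_last(images):
    """Return (images, sample_shape), adding a channel axis if it is missing.

    Hypothetical helper: assumes rank-3 input means (samples, height, width)
    grayscale data and rank-4 input already has a trailing channel axis.
    """
    if images.ndim == 3:
        images = np.expand_dims(images, axis=-1)
    elif images.ndim != 4:
        raise ValueError(f"Expected rank 3 or 4, got rank {images.ndim}")
    # Everything after the first (sample) axis is the shape of one sample.
    return images, images.shape[1:]

# Small stand-in with the same layout as x_train (100 samples, not 60,000).
x_demo = np.zeros((100, 28, 28), dtype="uint8")
x_demo, sample_shape = to_channels_last(x_demo)

print(f"Dataset shape: {x_demo.shape}")
print(f"Sample shape: {sample_shape}")
```

With this approach, the channel count is no longer hard-coded for datasets that already ship with a channel axis, such as RGB images.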

## A Keras example

Now that we know about Tensor shapes, their importance for neural network input layers, and how to derive the sample shape for a dataset, let’s now see if we can expand this to a real Keras model.

For this, we’ll be analyzing the simple two-dimensional ConvNet that we created in a different blog post.

Here is the code – you can find the analysis below it:

```
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten, Conv2D
from tensorflow.keras.losses import sparse_categorical_crossentropy
from tensorflow.keras.optimizers import Adam
# Model configuration
batch_size = 50
img_width, img_height, img_num_channels = 32, 32, 3
loss_function = sparse_categorical_crossentropy
no_classes = 10
no_epochs = 25
optimizer = Adam()
validation_split = 0.2
verbosity = 1
# Load CIFAR-10 data
(input_train, target_train), (input_test, target_test) = cifar10.load_data()
# Determine shape of the data
input_shape = (img_width, img_height, img_num_channels)
# Parse numbers as floats
input_train = input_train.astype('float32')
input_test = input_test.astype('float32')
# Scale data
input_train = input_train / 255
input_test = input_test / 255
# Create the model
model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=input_shape))
model.add(Conv2D(64, kernel_size=(3, 3), activation='relu'))
model.add(Conv2D(128, kernel_size=(3, 3), activation='relu'))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dense(no_classes, activation='softmax'))
# Display a model summary
model.summary()
# Compile the model
model.compile(loss=loss_function,
              optimizer=optimizer,
              metrics=['accuracy'])
# Fit data to model
history = model.fit(input_train, target_train,
                    batch_size=batch_size,
                    epochs=no_epochs,
                    verbose=verbosity,
                    validation_split=validation_split)
# Generate generalization metrics
score = model.evaluate(input_test, target_test, verbose=0)
print(f'Test loss: {score[0]} / Test accuracy: {score[1]}')
```

Specifically, we can observe:

- That the `img_width` and `img_height` are 32. This is correct, as we’re now using a different dataset – see `cifar10.load_data()` – where the images are 32 x 32 pixels.
- The value for `img_num_channels` was set to 3. This is also correct, because the CIFAR10 dataset contains RGB images – which have three image channels. So no 1 anymore – and our final sample shape will be `(32, 32, 3)`.
- We subsequently set the computed `input_shape` as the `input_shape` of our first Conv2D layer – specifying the input layer implicitly (which is just how it’s done with Keras).
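
If you’re curious how this input shape travels through the three Conv2D layers, here is a small pure-Python sketch. It assumes the Conv2D defaults of a stride of 1 and no padding, which the model code above does not override; the helper `conv2d_valid_output` is my own illustration and not a Keras function.

```python
def conv2d_valid_output(shape, filters, kernel_size):
    """Output shape of a stride-1 convolution without padding (channels last)."""
    rows, cols, _ = shape
    k_rows, k_cols = kernel_size
    # Each spatial axis shrinks by (kernel size - 1); channels become filters.
    return (rows - k_rows + 1, cols - k_cols + 1, filters)

shape = (32, 32, 3)  # the input_shape used in the model above
for filters in (32, 64, 128):
    shape = conv2d_valid_output(shape, filters, (3, 3))
    print(shape)

# Number of values that Flatten would hand to the first Dense layer.
flattened_size = shape[0] * shape[1] * shape[2]
print(flattened_size)
```

Under these assumptions, the shapes go from `(30, 30, 32)` to `(28, 28, 64)` to `(26, 26, 128)`, and flattening the last one gives 86528 values. You can compare this against what `model.summary()` reports.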

There we go – we can now actually determine the input shape for our data and use it to create Keras models! 😎

## Summary

In this blog post, we’ve looked at the Keras `input_shape` and `input_dim` properties. We did so in quite a chained way, by first looking at the link between neural network input layers and the shape of your dataset – and specifically, the shape at sample level.

Additionally, we looked at the concepts of rank and shape in order to understand the foundations of a layer’s input shape / dim in the first place.

Then, we looked at how the Keras framework for deep learning implements specifying the input shape / dimension by means of the aforementioned properties. This included looking at how to determine the input shape for your dataset at dataset level, converting it into sample level shape, and subsequently using it in an actual Keras model. We provided a simple example by means of a ConvNet implementation. I really hope that it helps you build your Keras models – as I know that it’s often these simple steps that get you stuck!

If you have any questions, remarks, or other comments – feel free to drop a message in the comments section below! 😎 I’ll happily answer and help you build your Keras model. Thank you for reading MachineCurve today and happy engineering!

## References

TensorFlow. (n.d.). *TensorFlow tensors*. https://www.tensorflow.org/guide/tensor#top_of_page

Keras. (n.d.). *Guide to the sequential model*. https://keras.io/getting-started/sequential-model-guide/

In a Keras model, [1] is adapted as below,

model=tensorflow.keras.Sequential([Dense(units=1, input_shape = [1])])

Could you tell me what does [1] means in this case?

Hi Young Jin Kim,

That would be a one-dimensional array with the first dimension having a length of 1.

Best,

Chris

Hi Chris,

Thanks for your explanation.

What I don’t get is why there’s an empty space when using input_shape then?

e.g. input_shape(10, )

Since it’s not used, why does it exist for?

Thanks in advance.

Hi Denis,

While the space itself is not necessary (`input_shape = (10,)` would work too), I think your question is in fact why you need that comma when defining your `input_shape`.

Input shape specifies the dimensionality of your feature space. For example, if you input three distinct features (different variables / columns), you have a three-dimensional feature space. Traditionally, dimensionality has been provided by means of tuples.

Python, however, treats parentheses around a single number as plain grouping, so `(10)` is just the integer `10` and not a one-element tuple (which is what you need for a one-dimensional feature space, as in the scenario you have a question about), as you can find here: https://note.nkmk.me/en/python-tuple-single-empty/

That’s why you need to specify it as `(10,)` instead of `(10)`, or you’ll get an error.

If you have a one-dimensional input shape, by the way, you could also write `input_dim = 10`.

Hope this helps.

Best,

Chris
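
The point about the trailing comma in the reply above can be checked directly in plain Python, without Keras. The variable names are only for illustration.

```python
not_a_tuple = (10)   # parentheses only group: this is the integer 10
one_tuple = (10,)    # the trailing comma makes a tuple with one element

print(type(not_a_tuple).__name__, type(one_tuple).__name__)
print(len(one_tuple))
```

The first line prints `int tuple`, the second prints `1`: only `(10,)` is a shape with one dimension.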

Hi Chris,

This post was very helpful. Thank you!

I still am a bit confused, syntactically, about the one-dim arrays. For instance, let’s say I have a one-dim array as follows:

x = np.array([-1.0, 0.0, 1.0, 2.0, 3.0, 4.0], dtype=float)

To me, that’s a one-dim array with 6 elements.

input_shape=[1] works just fine, as per your first answer above. But,

input_shape=(6,) gives me an error in Keras.

Doesn’t the second argument specify a one-dim array with 6 elements?

Thanks!

Hi Cam,

Thanks for your comment!

The input shape specifies “the shape of one sample”.

If your array is `x = np.array([-1.0, 0.0, 1.0, 2.0, 3.0, 4.0], dtype=float)`, you specify one array with six elements. What is therefore the “shape of one sample”? It’s `[1]`: each sample has one dimension holding a single value (effectively a scalar).

We can see this here, if we construct a corresponding `y` array and specify the shape:

import tensorflow

from tensorflow.keras.models import Sequential

from tensorflow.keras.layers import Dense

import numpy as np

`# Model configuration`

batch_size = 250

no_epochs = 1

verbosity = 1

`# Array of 6 values`

x = np.array([-1.0, 0.0, 1.0, 2.0, 3.0, 4.0])

y = np.array([0, 1, 0, 1, 0, 1])

`# Create the model`

model = Sequential()

model.add(Dense(16, activation='relu', input_shape=[1]))

model.add(Dense(8, activation='relu'))

model.add(Dense(1, activation='sigmoid'))

`# Compile the model`

model.compile(loss=tensorflow.keras.losses.binary_crossentropy,

optimizer=tensorflow.keras.optimizers.Adam(),

metrics=['accuracy'])

`# Fit data to model`

model.fit(x, y,

batch_size=batch_size,

epochs=no_epochs,

verbose=verbosity)

It trains successfully:

`6/6 [==============================] - 1s 166ms/sample - loss: 0.7686 - accuracy: 0.3333`

If we had an array with (in this case) two samples of six values each instead, our input shape (“the shape of one sample”) would be `(6,)`, because there are six values per sample:

import tensorflow

from tensorflow.keras.models import Sequential

from tensorflow.keras.layers import Dense

import numpy as np

`# Model configuration`

batch_size = 250

no_epochs = 1

verbosity = 1

`# Array of 2 values`

x = np.array([

[-1.0, 0.0, 1.0, 2.0, 3.0, 4.0],

[-1.0, 0.0, 1.0, 2.0, 3.0, 4.0]

])

y = np.array([0, 1])

`# Create the model`

model = Sequential()

model.add(Dense(16, activation='relu', input_shape=(6,)))

model.add(Dense(8, activation='relu'))

model.add(Dense(1, activation='sigmoid'))

`# Compile the model`

model.compile(loss=tensorflow.keras.losses.binary_crossentropy,

optimizer=tensorflow.keras.optimizers.Adam(),

metrics=['accuracy'])

`# Fit data to model`

model.fit(x, y,

batch_size=batch_size,

epochs=no_epochs,

verbose=verbosity)

It also successfully trains:

`2/2 [==============================] - 1s 304ms/sample - loss: 1.0497 - accuracy: 0.5000`

Especially by comparing the implementations for `x` in the two examples, and asking yourself all the time “what is the shape of ONE sample”, you can find the answer to your input shape configuration 🙂

I hope this helps! If not, please let me know.

Best,

Chris
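
The “shape of one sample” question from the reply above can also be answered with NumPy alone. This is a minimal sketch using the two `x` arrays from the examples; the explicit `reshape` is my own addition to make the single-value feature axis visible.

```python
import numpy as np

# Six samples with one value each, as in the first example.
x_scalar_samples = np.array([-1.0, 0.0, 1.0, 2.0, 3.0, 4.0])
# Two samples with six values each, as in the second example.
x_vector_samples = np.array([
    [-1.0, 0.0, 1.0, 2.0, 3.0, 4.0],
    [-1.0, 0.0, 1.0, 2.0, 3.0, 4.0],
])

# Making the feature axis explicit turns the rank-1 array into (6, 1).
x_scalar_samples = x_scalar_samples.reshape(-1, 1)

for x in (x_scalar_samples, x_vector_samples):
    # The first axis counts samples; the rest is the shape of one sample.
    n_samples, sample_shape = x.shape[0], x.shape[1:]
    print(f"{n_samples} samples, shape of one sample: {sample_shape}")
```

The first array yields 6 samples of shape `(1,)`, the second 2 samples of shape `(6,)`, matching the `input_shape` values used in the two models.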

Hi Chris,

How to get a Keras model’s input values for every epoch?

Thanks for every reply.

Hi Mark,

Thanks for your comment. I’m not entirely sure what you mean here. Could you perhaps elaborate a bit?

Thanks, best,

Chris

Chris sure;

For example:

…

space = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

expec = np.array([[0, 0], [1, 0], [0, 0], [0, 1]])

model = Sequential([

Dense(2, input_dim=2),

Dense(2),

Dense(1)

])

…

model.fit(space, expec, batch_size=1, epochs=1, callbacks=[Callbacks()])

I want to know input values to the first layer (in this example, name: ‘dense’ in Keras back) at every epoch. As you know, all input values are sent sequentially to the first layer. And sometimes we need to know these values for each epoch.

I couldn’t find the answer in the all huge internet. Or i’m missing something.

ADDING: I mean i want to know which ‘space’ value is processing at every epoch.

get_input_at(index) method only gives Tensor features

get_output_at() method is too.

get_weights() gives only dense layers outputs

But i need values like [1 and 0s] as elements of ‘space’.

Hi Mark,

If I understand you correctly, you want to figure out which value from the `space` array is fed forward through the network at every epoch.

An epoch is one iteration of the cycle of feeding forward – computing loss – improving the model, and an epoch itself is composed of batches: in your case, batches of length = 1, meaning that each individual sample is fed forward through the network.

However, unless you configure `steps_per_epoch` in `model.fit`, which limits the number of batches fed forward, all batches (and thus all samples) will be fed through the network once per epoch.

In other words, each iteration will see all the samples just once unless configured otherwise.
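
As a rough sketch of the relationship between samples, batch size and batches per epoch: the helper below is hypothetical (not a Keras function) and assumes that a final, smaller batch is still fed to the model rather than dropped.

```python
import math

def batches_per_epoch(n_samples, batch_size):
    """Number of batches needed to feed every sample once.

    Hypothetical helper; assumes a partial final batch is kept.
    """
    return math.ceil(n_samples / batch_size)

print(batches_per_epoch(4, 1))    # four samples, one per batch
print(batches_per_epoch(6, 250))  # batch size larger than the dataset
```

With your four samples and `batch_size=1`, that gives four batches per epoch; with six samples and a batch size of 250, everything fits into a single batch.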

Now, having answered that, let’s move towards the second part of your question – printing the inputs to your Keras model on screen when every epoch starts. Unfortunately, this seems not to be built in to Keras at this point in time:

(Source: https://stackoverflow.com/questions/44587813/accessing-input-layer-data-in-tensorflow-keras)

However, Keras does support creating your own layers (https://keras.io/guides/making_new_layers_and_models_via_subclassing/), so what we could do in theory is replicate the Dense layer and add print functionality for printing the inputs:

import tensorflow

from tensorflow.keras.models import Sequential

from tensorflow.keras.layers import Dense

from tensorflow.keras.losses import binary_crossentropy

from tensorflow.keras.optimizers import Adam

from tensorflow.keras.callbacks import LambdaCallback

import numpy as np

`# Model configuration`

batch_size = 1

no_epochs = 10

verbosity = 2

`# Array of 6 values`

x = np.array([-1.0, 0.0, 1.0, 2.0, 3.0, 4.0])

y = np.array([0, 1, 0, 1, 0, 1])

`# DenseAndPrint Layer`

# Source before adaptation: https://keras.io/guides/making_new_layers_and_models_via_subclassing/

class DenseAndPrint(tensorflow.keras.layers.Layer):

def __init__(self, units=32, input_dim=32, activation='linear'):

super(DenseAndPrint, self).__init__()

w_init = tensorflow.random_normal_initializer()

self.w = tensorflow.Variable(

initial_value=w_init(shape=(input_dim, units), dtype="float32"),

trainable=True,

)

b_init = tensorflow.zeros_initializer()

self.b = tensorflow.Variable(

initial_value=b_init(shape=(units,), dtype="float32"), trainable=True

)

self.activation = activation

`def call(self, inputs):`

tensorflow.print(inputs)

mult = tensorflow.matmul(inputs, self.w) + self.b

if self.activation == 'relu':

return tensorflow.math.maximum(mult, 0)

else:

return mult

`# Create the model`

model = Sequential()

model.add(DenseAndPrint(16, activation='relu', input_dim=1))

model.add(Dense(8, activation='relu'))

model.add(Dense(1, activation='sigmoid'))

`# Compile the model`

model.compile(loss=binary_crossentropy,

optimizer=Adam(),

metrics=['accuracy'])

`# Fit data to model`

model.fit(x, y,

batch_size=batch_size,

epochs=no_epochs,

verbose=verbosity)

The `DenseAndPrint` layer performs the same linear operation (`inputs * weights + bias`) and, if configured, also applies ReLU activation `max(0, output)`. With `batch_size = 1`, this is what is output on my screen when running:

Epoch 10/10

[[3]]

[[-1]]

[[4]]

[[0]]

[[1]]

[[2]]

6/6 - 0s - loss: 0.6836 - accuracy: 0.6667

Clearly, we see the batches of data moving forward through the model during training. You can also write some extra code which allows you to write those batches to CSV file (https://realpython.com/python-csv/) or JSON (https://stackabuse.com/reading-and-writing-json-to-a-file-in-python/), but that’s up to you 😉

Hope this helps.

Best,

Chris

P.S. Forgive the weird indents, this happens when it’s a comment-to-a-comment etc. I pasted the code here so that you don’t need to indent yourself: https://pastebin.com/UbuRzWAK

Dear Chris, thank you very much for your research and response. It was really useful.

Also today I reviewed your website in detail and decided that you are a very good developer. I am sure that you will achieve serious success. I will recommend your website in many places.

Best regards.

Hi Mark,

Thanks a lot for your comment and your compliment!! Greatly appreciated 😊

Good luck with your ML projects, best,

Chris

For keras use print(model._build_input_shape)

Hi Sha,

Thank you for the addition!

Best,

Chris