
How to use Conv2D with Keras?

March 30, 2020 by Chris

One of the most widely used layers within the Keras framework for deep learning is the Conv2D layer. However, especially for beginners, it can be difficult to understand what the layer is and what it does.

For this reason, we'll explore this layer in today's blog post. What is the Conv2D layer? How is it related to Convolutional Neural Networks? What does the "2D" mean - two dimensions? And how to actually implement it?

Those are questions that we'll answer today. Firstly, we'll take a look at ConvNets in general - discussing what they are and how they play an important role in today's deep learning world. Secondly, we move on to the Keras framework, and study how they are represented by means of Conv2D layers. Thirdly, we'll implement an actual model, guiding you through the code step by step. Finally, we'll run the model, and discuss our results.

Are you ready? Let's go!

Some theory about Conv2D: about convolutional neural networks

In my opinion, it's important to dive a bit into concepts first before we discuss code, as there's no point in giving you code examples if you don't understand why things are as they are.

Now, let's take a look at some theory related to the Keras Conv2D layer. Here, we'll discuss three things:

- What a neural network is;
- What convolutional neural networks (ConvNets) are;
- How ConvNets are used in practice.

Okay, let's begin.

What is a neural network?

For hundreds of years, philosophers and scientists have been interested in the human brain. The brain, as you may know, is extremely efficient at what it does - allowing humans to think with unprecedented complexity and sheer flexibility.

Some of those philosophers and scientists have also been interested in finding out how to make an artificial brain - that is, use a machine to mimic the functionality of the brain.

Obviously, this was not possible until the era of computing. That is, only since the 1950s, when computers emerged as a direct consequence of the Second World War, could scientists actually build artificially intelligent systems.

Initially, researchers like Frank Rosenblatt attempted to mimic the neural structure of the brain. As you likely know - assuming you have some background in neural networks, or at least know what they are - our brains consist of individual neurons and the "highways" between them: synapses.

Neurons fire based on inputs, and as a consequence synapses become stronger over time - allowing entire patterns of information processing to take shape within the brain, and giving humans the ability to think and act in very complex ways.

Whereas the so-called "Rosenblatt Perceptron" (click the link above if you want to know more) was just one artificial neuron, today's neural networks are complex networks of many neurons, like this:

A complex neural network. These and even more complex neural nets provide different layers of possibly non-linear functionality, and may thus be used in deep learning.

What you see above is known as a "fully connected" neural network: every neuron is connected to every neuron in the next layer (the input layer only sends connections, and the output layer only receives them). Growing complexity means that the number of connections grows rapidly. This isn't good news, as it means that (1) the time required to train the network increases significantly and (2) the network becomes more prone to "overfitting". Are there more efficient ways, perhaps?
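
To get a feel for how quickly those connections add up, here's a small back-of-the-envelope calculation in Python. It assumes a flattened 32x32 RGB image (the size of the CIFAR-10 images we'll use later) feeding into a layer of 128 fully connected neurons - the numbers are illustrative only:

# Back-of-the-envelope: connections (weights) in one fully connected layer
# that maps a flattened 32x32x3 image onto 128 hidden neurons.
inputs = 32 * 32 * 3          # 3,072 input values
hidden = 128                  # neurons in the next layer
connections = inputs * hidden
print(connections)            # 393216 weights for just one layer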

What are convolutional neural networks?

There are!

And convolutional neural networks, or ConvNets for short, are one of them. They are primarily used for computer vision tasks - although they have emerged in areas such as text processing as well. Not spoiling too much - we'll show some examples in the next section - let's now take a look at what makes ConvNets different.

The answer to this question is relatively simple: ConvNets also contain layers that are not fully connected and that are built in a different way - convolutional layers.

Let's schematically draw two such layers.

On the left, you see the first layer - the pixels of, say, your input image. The yellow part is the "convolutional layer", or more precisely, one of its filters (convolutional layers often contain many such filters, which are learned from the data). It slides over the input image and condenses a small box of pixels into just one value. Repeating this process across many layers, we generate a small, very abstract representation of the image that we can use for classification.
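
If you'd like to see that sliding behaviour in code, here is a minimal NumPy sketch (not part of the model we'll build below) that slides one 3x3 filter over a toy grayscale image, producing one value per position:

import numpy as np

# Toy 8x8 grayscale "image" and one 3x3 filter (random here; learned in practice).
image = np.random.rand(8, 8)
kernel = np.random.rand(3, 3)

# Sliding the filter yields an (8 - 3 + 1) x (8 - 3 + 1) = 6x6 output.
output = np.zeros((6, 6))
for i in range(6):
    for j in range(6):
        patch = image[i:i+3, j:j+3]            # the box the filter covers
        output[i, j] = np.sum(patch * kernel)  # condensed into one value

print(output.shape)  # (6, 6): a smaller, more abstract representation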

How are ConvNets used in practice?

As a result of this, we see many interesting applications of convolutional layers these days - especially in the field of computer vision. For example, object detectors use ConvNets to "detect" known objects within images or even videos, allowing you to draw bounding boxes and act based on the observation:

https://www.youtube.com/watch?v=yQwfDxBMtXg

The Keras framework: Conv2D layers

Such layers are also represented within the Keras deep learning framework. For two-dimensional inputs, such as images, they are represented by keras.layers.Conv2D: the Conv2D layer!

In more detail, this is its exact representation (Keras, n.d.):

keras.layers.Conv2D(filters, kernel_size, strides=(1, 1), padding='valid', data_format=None, dilation_rate=(1, 1), activation=None, use_bias=True, kernel_initializer='glorot_uniform', bias_initializer='zeros', kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, bias_constraint=None)

Now, what does each attribute mean?

- filters: the number of filters (and thus output channels) the layer learns.
- kernel_size: the width and height of the convolution window, e.g. (3, 3).
- strides: how many pixels the window moves per step in each direction.
- padding: 'valid' applies no padding, while 'same' pads the input so the output keeps the same spatial dimensions.
- data_format: whether the channels dimension comes last ('channels_last', the default) or first.
- dilation_rate: the spacing between kernel elements, used for dilated convolutions.
- activation: the activation function applied to the output, e.g. 'relu'.
- use_bias: whether a bias vector is added to the output.
- kernel_initializer / bias_initializer: how the weights and biases are initialized.
- kernel_regularizer, bias_regularizer and activity_regularizer: optional regularization applied to the weights, the biases and the layer output, respectively.
- kernel_constraint / bias_constraint: optional constraints applied to the weights and biases.
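
To get a feel for a few of these attributes in isolation, here's a small sketch - assuming TensorFlow 2.x is installed - that applies a single Conv2D layer to a dummy batch of one 32x32 RGB image and prints the resulting output shape:

import tensorflow as tf
from tensorflow.keras.layers import Conv2D

# 32 filters, a 3x3 kernel, ReLU activation, default strides of (1, 1)
# and the default 'valid' padding (no zero padding around the input).
layer = Conv2D(32, kernel_size=(3, 3), activation='relu')

# Apply the layer to a dummy batch of one 32x32 RGB image.
dummy = tf.random.normal((1, 32, 32, 3))
print(layer(dummy).shape)  # (1, 30, 30, 32): 'valid' padding shrinks 32 to 30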

Implementing a Keras model with Conv2D

Let's now see how we can implement a Keras model using Conv2D layers. It's important to remember that we need Keras for this to work, and more specifically a recent version. That means it's best to install TensorFlow 2.0+, which ships with Keras out of the box. I can't cover installing TensorFlow here, but if you search for "installing TensorFlow", you'll find plenty of step-by-step guides.

Obviously, you'll also need a recent version of Python - possibly installed through Anaconda.

Full model code

This is the model that we'll be coding today. Don't worry - I will walk you through every step, but here's the code as a whole for those who just wish to copy and play:

from tensorflow.keras.datasets import cifar10
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten, Conv2D
from tensorflow.keras.losses import sparse_categorical_crossentropy
from tensorflow.keras.optimizers import Adam

# Model configuration
batch_size = 50
img_width, img_height, img_num_channels = 32, 32, 3
loss_function = sparse_categorical_crossentropy
no_classes = 10
no_epochs = 25
optimizer = Adam()
validation_split = 0.2
verbosity = 1

# Load CIFAR-10 data
(input_train, target_train), (input_test, target_test) = cifar10.load_data()

# Determine shape of the data
input_shape = (img_width, img_height, img_num_channels)

# Parse numbers as floats
input_train = input_train.astype('float32')
input_test = input_test.astype('float32')

# Scale data
input_train = input_train / 255
input_test = input_test / 255

# Create the model
model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=input_shape))
model.add(Conv2D(64, kernel_size=(3, 3), activation='relu'))
model.add(Conv2D(128, kernel_size=(3, 3), activation='relu'))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dense(no_classes, activation='softmax'))

# Compile the model
model.compile(loss=loss_function,
              optimizer=optimizer,
              metrics=['accuracy'])

# Fit data to model
history = model.fit(input_train, target_train,
            batch_size=batch_size,
            epochs=no_epochs,
            verbose=verbosity,
            validation_split=validation_split)

# Generate generalization metrics
score = model.evaluate(input_test, target_test, verbose=0)
print(f'Test loss: {score[0]} / Test accuracy: {score[1]}')

Let's now study the model in more detail.

The imports

The first thing we'll need to do is import some things:

from tensorflow.keras.datasets import cifar10
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten, Conv2D
from tensorflow.keras.losses import sparse_categorical_crossentropy
from tensorflow.keras.optimizers import Adam

We'll be using the CIFAR-10 dataset today. Later, we'll see what it looks like. We also use the Sequential API, which allows us to stack layers nicely. Subsequently, we import the Dense (short for densely-connected), Flatten and Conv2D layers.

The principle here is as follows: the Conv2D layers learn increasingly abstract features from the input images, Flatten converts the resulting feature maps into a one-dimensional vector, and the Dense layers use that vector to generate the actual classification.

Next, we import the optimizer and the loss function. These will help us with improving the model: the optimizer adapts the weights, while the loss function computes the difference between the predictions and the ground truth of your training dataset. Loss functions are tailored to the problem you're trying to solve. For multiclass classification scenarios, which is what we're doing today, categorical crossentropy loss is a good choice. However, as our dataset targets are integers rather than vectors, we use the sparse equivalent.
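
To make the difference concrete, here's a small sketch of what the two target formats look like; the class indices are picked just for illustration:

import numpy as np
from tensorflow.keras.utils import to_categorical

# Sparse targets: one integer class index per sample (what CIFAR-10 provides).
sparse_targets = np.array([6, 9, 0])  # frog, truck, airplane

# Plain categorical crossentropy would instead expect one-hot vectors:
onehot_targets = to_categorical(sparse_targets, num_classes=10)
print(onehot_targets[0])  # [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]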

The model configuration

Next up, the model configuration.

# Model configuration
batch_size = 50
img_width, img_height, img_num_channels = 32, 32, 3
loss_function = sparse_categorical_crossentropy
no_classes = 10
no_epochs = 25
optimizer = Adam()
validation_split = 0.2
verbosity = 1

Loading and preparing our dataset

The third step is to load and prepare our dataset.

But wait, what dataset will we be using?

Let's take a look at the CIFAR-10 dataset. It consists of 60,000 32x32 pixel RGB images spread over ten everyday classes, such as truck, deer and automobile. We'll load and prepare it as follows:

# Load CIFAR-10 data
(input_train, target_train), (input_test, target_test) = cifar10.load_data()

# Determine shape of the data
input_shape = (img_width, img_height, img_num_channels)

# Parse numbers as floats
input_train = input_train.astype('float32')
input_test = input_test.astype('float32')

# Scale data
input_train = input_train / 255
input_test = input_test / 255

The first step, loading the data from the Keras datasets wrapper, should be clear. The same goes for determining the shape of our data - which is done based on the configuration settings that we discussed earlier.

Now, the other two steps are technicalities. Casting the data to float32 can make training faster, especially when running on a GPU. Scaling the pixel values to the [0, 1] range keeps the inputs small, which leads to more stable weight updates and benefits the final outcome.
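
If you want to verify what these two steps did, a quick sanity check - assuming the loading and scaling code above has already run - could look like this:

# Quick sanity check after casting and scaling.
print(input_train.dtype)                     # float32
print(input_train.min(), input_train.max())  # 0.0 1.0
print(input_train.shape)                     # (50000, 32, 32, 3)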

Specifying model architecture

Now that all the "work upfront" is complete, we can actually specify the model:

# Create the model
model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=input_shape))
model.add(Conv2D(64, kernel_size=(3, 3), activation='relu'))
model.add(Conv2D(128, kernel_size=(3, 3), activation='relu'))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dense(no_classes, activation='softmax'))

First, we instantiate the Sequential API - literally laying the foundation on top of which we can stack layers.

As you can see, we specify three Conv2D layers in sequential order, with 3x3 kernel sizes, ReLU activation and 32, 64 and 128 filters, respectively.

Next, we use Flatten, and have two Dense layers to generate the classification. The last layer doesn't activate with ReLU, but with Softmax instead. This allows us to generate a true multiclass probability distribution, which is what we need if we want to answer the question "which class is most likely?".
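
To make the Softmax part a bit more tangible, here's a tiny, framework-free sketch showing how Softmax turns raw scores into a probability distribution over classes:

import numpy as np

# Softmax turns arbitrary scores ("logits") into probabilities that sum to 1.
logits = np.array([2.0, 1.0, 0.1])
probabilities = np.exp(logits) / np.sum(np.exp(logits))

print(probabilities)           # approximately [0.659 0.242 0.099]
print(probabilities.sum())     # 1.0: a valid probability distribution
print(probabilities.argmax())  # 0: the most likely class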

Now that we have specified the architecture, or the framework, we can compile (or initialize) the model and fit the data (i.e., start training).

Model compilation and fitting the data

Keras allows you to do so quite easily: with model.compile and model.fit. The compile call allows you to specify the loss function, the optimizer and additional metrics, of which we use accuracy, as it's intuitive to humans.

Then, with fit, we can fit the input_train and target_train (i.e. the inputs and targets of our training set) to the model, actually starting the training process. We do so based on the options that we configured earlier, i.e. batch size, number of epochs, verbosity mode and validation split.

# Compile the model
model.compile(loss=loss_function,
              optimizer=optimizer,
              metrics=['accuracy'])

# Fit data to model
history = model.fit(input_train, target_train,
            batch_size=batch_size,
            epochs=no_epochs,
            verbose=verbosity,
            validation_split=validation_split)

Model evaluation

The final step is to evaluate our model after training has finished. Keras allows you to do so with model.evaluate. As you can see, instead of the training dataset, we're using testing data here: input_test and target_test. This way, we can be sure that we test the model with data it hasn't seen during training, evaluating its ability to generalize to new data (which is exactly what happens in real-world settings!). Evaluation is done in a non-verbose way, and the results are printed on screen.

# Generate generalization metrics
score = model.evaluate(input_test, target_test, verbose=0)
print(f'Test loss: {score[0]} / Test accuracy: {score[1]}')
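
Beyond these aggregate metrics, you may also want to inspect individual predictions. A small sketch - assuming the trained model and test data from above are still in memory - could look like this:

import numpy as np

# Predict class probabilities for the first five test images
# and pick the most likely class for each of them.
predictions = model.predict(input_test[:5])
predicted_classes = np.argmax(predictions, axis=1)

print(predicted_classes)        # five predicted class indices
print(target_test[:5].ravel())  # the corresponding ground-truth labels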

Running our model

Now, let's run the model. Say that you save your model as model.py. Open a terminal, cd to the folder where your file is located, and run python model.py. You should see the training process begin on screen.

2020-03-30 20:25:58.336324: W tensorflow/stream_executor/cuda/redzone_allocator.cc:312] Internal: Invoking ptxas not supported on Windows
Relying on driver to perform ptx compilation. This message will be only logged once.
40000/40000 [==============================] - 35s 887us/sample - loss: 1.4811 - accuracy: 0.4676 - val_loss: 1.1419 - val_accuracy: 0.5906
Epoch 2/25
40000/40000 [===================>

Once it finishes, you should get an evaluation that is close to this:

Test loss: 4.3477772548675535 / Test accuracy: 0.6200000047683716

62% accuracy is not great, but it's not terrible either. We could certainly improve a lot, but that wasn't the point of this blog post! 😉

Summary

In this blog post, we looked at how two-dimensional convolutional layers can be used with the Keras deep learning framework. Having studied a little bit of neural network theory up front, and diving into the concepts of convolutional layers, we quickly moved on to Keras and its Conv2D representation. With an example model, which we looked at step by step, we showed you how you can create a Keras ConvNet yourself.

I hope you've learnt something from today's blog post. If you did, please feel free to leave a comment in the comments section below! Please do the same if you have questions or remarks - I'll happily answer and improve my blog post if necessary.

For now, thank you for reading MachineCurve today and happy engineering! 😎

References

Keras. (n.d.). Convolutional layers: Conv2D. Keras Documentation. https://keras.io/layers/convolutional/#conv2d
