
Creating a Multilayer Perceptron with PyTorch and Lightning

January 26, 2021 by Chris

Multilayer Perceptrons, or MLPs, are one of the most basic types of neural networks that can be created. Still, they are very important, because they also lie at the basis of more advanced models. Once you know that the feedforward segments in those architectures are in fact Multilayer Perceptrons, you can easily see that they are heavily used in Transformer models as well as in Convolutional Neural Networks.

A basic MLP. License: public domain.

In other words: basic does not mean useless. Quite the contrary, for MLPs.

Today, there are two frameworks that are heavily used for creating neural networks with Python. The first is TensorFlow. This article, however, provides a tutorial for creating an MLP with PyTorch, the second framework that is very popular these days. It also shows how to create one with PyTorch Lightning. After reading this tutorial, you will...

Let's get to work! 🚀

Summary and code examples: MLP with PyTorch and Lightning

Multilayer Perceptrons are straightforward and simple neural networks that lie at the basis of all Deep Learning approaches that are so common today. Having emerged many years ago, they are an extension of the simple Rosenblatt Perceptron from the 1950s, made feasible by increases in computing power. Today, they are used in many neural networks, sometimes augmented with other layer types as well.

Being composed of layers of neurons that are stacked on top of each other, these networks - which are also called MLPs - can be used for a wide variety of purposes, such as regression and classification. In this article, we will show you how you can create MLPs with PyTorch and PyTorch Lightning, both of which are very prominent in today's machine learning and deep learning industry.

First, we'll show two full-fledged examples of an MLP - the first created with classic PyTorch, the second with Lightning.

Classic PyTorch

Defining a Multilayer Perceptron in classic PyTorch is not difficult; it just takes quite a few lines of code. We'll explain every aspect in detail in this tutorial, but here is already a complete code example for a Multilayer Perceptron created with PyTorch. If you want to understand everything in more detail, make sure to read the rest of the tutorial as well. Best of luck! :)

import os
import torch
from torch import nn
from torchvision.datasets import CIFAR10
from torch.utils.data import DataLoader
from torchvision import transforms

class MLP(nn.Module):
  '''
    Multilayer Perceptron.
  '''
  def __init__(self):
    super().__init__()
    self.layers = nn.Sequential(
      nn.Flatten(),
      nn.Linear(32 * 32 * 3, 64),
      nn.ReLU(),
      nn.Linear(64, 32),
      nn.ReLU(),
      nn.Linear(32, 10)
    )


  def forward(self, x):
    '''Forward pass'''
    return self.layers(x)


if __name__ == '__main__':

  # Set fixed random number seed
  torch.manual_seed(42)

  # Prepare CIFAR-10 dataset
  dataset = CIFAR10(os.getcwd(), download=True, transform=transforms.ToTensor())
  trainloader = torch.utils.data.DataLoader(dataset, batch_size=10, shuffle=True, num_workers=1)

  # Initialize the MLP
  mlp = MLP()

  # Define the loss function and optimizer
  loss_function = nn.CrossEntropyLoss()
  optimizer = torch.optim.Adam(mlp.parameters(), lr=1e-4)

  # Run the training loop
  for epoch in range(0, 5): # 5 epochs at maximum

    # Print epoch
    print(f'Starting epoch {epoch+1}')

    # Set current loss value
    current_loss = 0.0

    # Iterate over the DataLoader for training data
    for i, data in enumerate(trainloader, 0):

      # Get inputs
      inputs, targets = data

      # Zero the gradients
      optimizer.zero_grad()

      # Perform forward pass
      outputs = mlp(inputs)

      # Compute loss
      loss = loss_function(outputs, targets)

      # Perform backward pass
      loss.backward()

      # Perform optimization
      optimizer.step()

      # Print statistics
      current_loss += loss.item()
      if i % 500 == 499:
          print('Loss after mini-batch %5d: %.3f' %
                (i + 1, current_loss / 500))
          current_loss = 0.0

  # Process is complete.
  print('Training process has finished.')

PyTorch Lightning

You can also get started with PyTorch Lightning straight away. Here, we provided a full code example for an MLP created with Lightning. Once more: if you want to understand everything in more detail, make sure to read the rest of this tutorial as well! :D

import os
import torch
from torch import nn
from torchvision.datasets import CIFAR10
from torch.utils.data import DataLoader
from torchvision import transforms
import pytorch_lightning as pl

class MLP(pl.LightningModule):

  def __init__(self):
    super().__init__()
    self.layers = nn.Sequential(
      nn.Linear(32 * 32 * 3, 64),
      nn.ReLU(),
      nn.Linear(64, 32),
      nn.ReLU(),
      nn.Linear(32, 10)
    )
    self.ce = nn.CrossEntropyLoss()

  def forward(self, x):
    return self.layers(x)

  def training_step(self, batch, batch_idx):
    x, y = batch
    x = x.view(x.size(0), -1)
    y_hat = self.layers(x)
    loss = self.ce(y_hat, y)
    self.log('train_loss', loss)
    return loss

  def configure_optimizers(self):
    optimizer = torch.optim.Adam(self.parameters(), lr=1e-4)
    return optimizer


if __name__ == '__main__':
  dataset = CIFAR10(os.getcwd(), download=True, transform=transforms.ToTensor())
  pl.seed_everything(42)
  mlp = MLP()
  trainer = pl.Trainer(auto_scale_batch_size='power', gpus=0, deterministic=True, max_epochs=5)
  trainer.fit(mlp, DataLoader(dataset))

What is a Multilayer Perceptron?

A Multilayer Perceptron. Created by Wiso at Wikipedia. License: public domain.

I always tend to think that it is good practice to understand some concepts before you write code. That's why we'll take a look at the basics of Multilayer Perceptrons, abbreviated as MLPs, in this section. Once completed, we'll move on and start writing code with PyTorch and Lightning.

Back in the 1950s, in the era where people had just started using computing technology after discovering how useful it was, there was a psychologist named Frank Rosenblatt. He imagined what it would be like to add intelligence to machines - in other words, to make a machine that can think. The result was the Rosenblatt Perceptron - a mathematical operation where some input is passed through a neuron, where weights are stored, and where the prediction error is used to optimize those weights. While it can learn a binary classifier, it fell far short of learning massively complex functions like thinking.
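
To give you a feel for how simple this operation is, here is a minimal sketch of the Rosenblatt Perceptron in plain Python with NumPy. The toy data and variable names are made up purely for illustration:

import numpy as np

# Hypothetical toy data: 2 features per sample, binary targets (0 or 1)
X = np.array([[1.0, 1.0], [2.0, 1.0], [-1.0, -1.5], [-2.0, -1.0]])
y = np.array([1, 1, 0, 0])

w = np.zeros(2)  # weights
b = 0.0          # bias
lr = 0.1         # learning rate

for _ in range(10):  # a few passes over the data
  for xi, target in zip(X, y):
    prediction = 1 if np.dot(w, xi) + b > 0 else 0  # step activation
    error = target - prediction                     # 0 if correct, +1 or -1 if wrong
    w += lr * error * xi                            # Rosenblatt weight update
    b += lr * error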

Besides theoretical issues, the absence of sufficient computing power also meant that neural networks could not be used at scale. Decades later, technological progress made it possible to extend the perceptron into multilayer perceptrons, or MLPs. In these networks, more than one neuron is used for generating predictions. In addition, neurons are stacked in layers of increasing abstraction, where each layer learns increasingly abstract patterns. That is, while one layer can learn to detect lines, another can learn to detect noses.

In MLPs, the input data is fed to an input layer that shares the dimensionality of the input space. For example, if you feed input samples with 8 features per sample, you'll also have 8 neurons in the input layer. After being processed by the input layer, the results are passed to the next layer, which is called a hidden layer. The final layer is the output layer. Its neuron structure depends on the problem you are trying to solve (i.e. one neuron in the case of regression and binary classification problems; multiple neurons in a multiclass classification problem).

If you look closely, you can see that each neuron passes the input to all neurons in the subsequent (or downstream) layer. This is why such layers are also called densely-connected, or Dense. In TensorFlow and Keras they are available as tensorflow.keras.layers.Dense; PyTorch utilizes them as torch.nn.Linear.
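
To make this concrete, here is a minimal sketch of such a densely-connected stack in PyTorch for the hypothetical 8-feature example above: one hidden layer of 16 neurons and a single output neuron, as you would use for a regression problem. The layer sizes are illustrative, not prescriptive:

from torch import nn

# 8 input features -> 16 hidden neurons (ReLU) -> 1 output neuron (regression)
small_mlp = nn.Sequential(
  nn.Linear(8, 16),  # input layer to hidden layer
  nn.ReLU(),
  nn.Linear(16, 1)   # hidden layer to a single regression output
)

For a multiclass classification problem, you would simply end with nn.Linear(16, num_classes) instead.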

Creating an MLP with PyTorch

Now that we understand what an MLP looks like, it is time to build one with PyTorch. Below, we will show you how you can create your own PyTorch based MLP with step-by-step examples. In addition to that, we also show you how to build one with PyTorch Lightning. This is a library on top of PyTorch which allows you to build models with much less overhead (for example, by automating away explicitly stating the training loop).

First, we'll show you how to build an MLP with classic PyTorch, then how to build one with Lightning.

Classic PyTorch

Implementing an MLP with classic PyTorch involves five steps:

  1. Importing all dependencies, meaning os, torch and torchvision.
  2. Defining the MLP neural network class as a nn.Module.
  3. Adding the preparatory runtime code.
  4. Preparing the CIFAR-10 dataset and initializing the dependencies (loss function, optimizer).
  5. Defining the custom training loop, where all the magic happens.

Importing all dependencies

The first step here is to add all the dependencies. We need os for file input/output functionality, as we will save the CIFAR-10 dataset to local disk later in this tutorial. We'll also import torch, which imports PyTorch. From it we import nn, which allows us to define a neural network module. We also import the DataLoader (for feeding data into the MLP during training), the CIFAR10 dataset (for obvious purposes) and transforms, which allows us to perform transformations on the data prior to feeding it to the MLP.

import os
import torch
from torch import nn
from torchvision.datasets import CIFAR10
from torch.utils.data import DataLoader
from torchvision import transforms

Defining the MLP neural network class

Next up is defining the MLP class, which inherits from the nn.Module class. The Module class provides the structure for implementing our neural network and is therefore really useful when creating one. It has two definitions: __init__, or the constructor, and forward, which implements the forward pass.

In the constructor, we first invoke the superclass initialization and then define the layers of our neural network. We stack all layers (three densely-connected Linear layers with ReLU activation functions in between) using nn.Sequential. We also add nn.Flatten() at the start. Flatten converts the 3D image representations (width, height and channels) into 1D format, which is necessary for Linear layers. Note that with image data it is often best to use Convolutional Neural Networks. This is out of scope for this tutorial and will be covered in another one.

The forward pass allows us to respond to input data - for example, during the training process. In our case, it does nothing but feed the data through the neural network layers and return the output.

class MLP(nn.Module):
  '''
    Multilayer Perceptron.
  '''
  def __init__(self):
    super().__init__()
    self.layers = nn.Sequential(
      nn.Flatten(),
      nn.Linear(32 * 32 * 3, 64),
      nn.ReLU(),
      nn.Linear(64, 32),
      nn.ReLU(),
      nn.Linear(32, 10)
    )


  def forward(self, x):
    '''Forward pass'''
    return self.layers(x)
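
If you want to see what nn.Flatten is doing here, a quick sanity check with a dummy batch makes it visible. This snippet is just illustrative and assumes the MLP class defined above: a batch shaped (4, 3, 32, 32) is flattened to (4, 3072), which matches the 32 * 32 * 3 input size of the first Linear layer, and the network outputs one logit per CIFAR-10 class.

import torch
from torch import nn

mlp = MLP()                              # the class defined above
dummy_batch = torch.randn(4, 3, 32, 32)  # 4 fake CIFAR-10-sized images
print(nn.Flatten()(dummy_batch).shape)   # torch.Size([4, 3072])
print(mlp(dummy_batch).shape)            # torch.Size([4, 10]), one logit per class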

Adding runtime code

After defining the class, we can move on and write the runtime code. This code is actually executed at runtime, i.e. when you call the Python script from the terminal with e.g. python mlp.py. The class itself is then not yet used, but we will do so shortly.

The first thing we define in the runtime code is setting the seed of the random number generator. Using a fixed seed ensures that this generator is initialized with the same starting value. This benefits reproducibility of your ML findings.

if __name__ == '__main__':

  # Set fixed random number seed
  torch.manual_seed(42)

Preparing the CIFAR-10 dataset and initializing dependencies

The next code we add involves preparing the CIFAR-10 dataset. The dataset contains 60,000 images of 32 by 32 pixels across 10 classes, with 6,000 images per class.

Loading and preparing the CIFAR-10 data is a two-step process:

  1. Initializing the dataset itself, by means of CIFAR10. Here, in order, you specify the directory where the dataset has to be saved, that it must be downloaded, and that the images must be converted into Tensor format.
  2. Initializing the DataLoader, which takes the dataset, a batch size, a shuffle parameter (whether the data must be presented in random order) and the number of workers to load data with. In PyTorch, data loaders are used for feeding data to the model uniformly.
  # Prepare CIFAR-10 dataset
  dataset = CIFAR10(os.getcwd(), download=True, transform=transforms.ToTensor())
  trainloader = torch.utils.data.DataLoader(dataset, batch_size=10, shuffle=True, num_workers=1)

Now, it's time to initialize the MLP - and use the class that we defined earlier but had not used yet. We also specify the loss function (categorical crossentropy loss) and the Adam optimizer. The optimizer works on the parameters of the MLP and uses a learning rate of 1e-4 (i.e. 0.0001). We'll use them next.

  # Initialize the MLP
  mlp = MLP()

  # Define the loss function and optimizer
  loss_function = nn.CrossEntropyLoss()
  optimizer = torch.optim.Adam(mlp.parameters(), lr=1e-4)

Defining the training loop

The core part of our runtime code is the training loop. In this loop, we perform the epochs, or training iterations. For every iteration, we iterate over the training dataset, perform the entire forward and backward passes, and perform model optimization.

Step-by-step, these are the things that happen within the loop:

  # Run the training loop
  for epoch in range(0, 5): # 5 epochs at maximum

    # Print epoch
    print(f'Starting epoch {epoch+1}')

    # Set current loss value
    current_loss = 0.0

    # Iterate over the DataLoader for training data
    for i, data in enumerate(trainloader, 0):

      # Get inputs
      inputs, targets = data

      # Zero the gradients
      optimizer.zero_grad()

      # Perform forward pass
      outputs = mlp(inputs)

      # Compute loss
      loss = loss_function(outputs, targets)

      # Perform backward pass
      loss.backward()

      # Perform optimization
      optimizer.step()

      # Print statistics
      current_loss += loss.item()
      if i % 500 == 499:
          print('Loss after mini-batch %5d: %.3f' %
                (i + 1, current_loss / 500))
          current_loss = 0.0

  # Process is complete.
  print('Training process has finished.')

Full model code

For the full model code, see the full code example at the beginning of this tutorial.

Running the training process

Now, when you save the code e.g. to a file called mlp.py and run python mlp.py, you'll see output similar to the following, provided that PyTorch has been installed successfully.

Starting epoch 1
Loss after mini-batch   500: 2.232
Loss after mini-batch  1000: 2.087
Loss after mini-batch  1500: 2.004
Loss after mini-batch  2000: 1.963
Loss after mini-batch  2500: 1.943
Loss after mini-batch  3000: 1.926
Loss after mini-batch  3500: 1.904
Loss after mini-batch  4000: 1.878
Loss after mini-batch  4500: 1.872
Loss after mini-batch  5000: 1.874
Starting epoch 2
Loss after mini-batch   500: 1.843
Loss after mini-batch  1000: 1.828
Loss after mini-batch  1500: 1.830
Loss after mini-batch  2000: 1.819
...

Great! 😎

PyTorch Lightning

Another approach for creating your PyTorch based MLP is using PyTorch Lightning. It is a library that is available on top of classic PyTorch (and in fact, uses classic PyTorch) that makes creating PyTorch models easier.

The reason is simple: writing even a simple PyTorch model means writing a lot of code - and much of it does nothing more than implement the default training process (like our training loop above).

In Lightning, these elements are automated away as much as possible. In addition, running your code on a GPU does not require manually converting your code and tensors to CUDA (which we haven't even done above!). And there are other benefits. Since Lightning is nothing more than classic PyTorch structured differently, adoption has been significant. We'll therefore also show you how to create that MLP with Lightning - and you will see that it saves a lot of lines of code.
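
To illustrate what you avoid: in classic PyTorch, training on a GPU means moving the model and every batch of data to the device yourself, roughly like the sketch below (assuming a CUDA-capable GPU and the MLP and trainloader from the classic example above). Lightning handles these device transfers for you.

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
mlp = MLP().to(device)  # move the model to the GPU, if one is available

for i, data in enumerate(trainloader, 0):
  inputs, targets = data
  inputs, targets = inputs.to(device), targets.to(device)  # move each batch too
  outputs = mlp(inputs)
  # ... the rest of the training step proceeds as before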

Importing all dependencies

The first step is importing all dependencies. If you have also followed the classic PyTorch example above, you can see that it is not so different from classic PyTorch. In fact, we use the same imports - os for file I/O, torch and its sub imports for PyTorch functionality, but now also pytorch_lightning for Lightning functionality.

import os
import torch
from torch import nn
from torchvision.datasets import CIFAR10
from torch.utils.data import DataLoader
from torchvision import transforms
import pytorch_lightning as pl

Defining the MLP LightningModule

In PyTorch Lightning, all functionality is shared in a LightningModule - which is a structured version of the nn.Module that is used in classic PyTorch. Here, the __init__ and forward definitions capture the definition of the model. We specify a neural network with three MLP layers and ReLU activations in self.layers. We also specify the cross entropy loss in self.ce. In forward, we perform the forward pass.

What is different in Lightning is that it also requires you to provide the training_step and configure_optimizers definitions. This is mandatory because Lightning strips away the training loop. The training_step allows you to compute the loss (which is then used for optimization purposes under the hood), and for these optimization purposes you'll need an optimizer, which is specified in configure_optimizers.

That's it for the MLP!

class MLP(pl.LightningModule):

  def __init__(self):
    super().__init__()
    self.layers = nn.Sequential(
      nn.Linear(32 * 32 * 3, 64),
      nn.ReLU(),
      nn.Linear(64, 32),
      nn.ReLU(),
      nn.Linear(32, 10)
    )
    self.ce = nn.CrossEntropyLoss()

  def forward(self, x):
    return self.layers(x)

  def training_step(self, batch, batch_idx):
    x, y = batch
    x = x.view(x.size(0), -1)
    y_hat = self.layers(x)
    loss = self.ce(y_hat, y)
    self.log('train_loss', loss)
    return loss

  def configure_optimizers(self):
    optimizer = torch.optim.Adam(self.parameters(), lr=1e-4)
    return optimizer

Adding runtime code: dataset, seed, and the Trainer

Since Lightning hides much of the training loop, your runtime code becomes really small!

if __name__ == '__main__':
  dataset = CIFAR10(os.getcwd(), download=True, transform=transforms.ToTensor())
  pl.seed_everything(42)
  mlp = MLP()
  trainer = pl.Trainer(auto_scale_batch_size='power', gpus=1, deterministic=True, max_epochs=5)
  trainer.fit(mlp, DataLoader(dataset))

Please do note that automating away the training loop does not mean that you lose all control over it. You can still control it with your own code if you want. This is however out of scope for this tutorial.

Full model code

For the full model code, see the full code example at the beginning of this tutorial.

Running the training process

Now, when you save the code e.g. to a file called mlp-lightning.py and run python mlp-lightning.py, you'll see output similar to the following, provided that PyTorch and PyTorch Lightning have been installed successfully.

GPU available: True, used: True
TPU available: None, using: 0 TPU cores
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

  | Name | Type | Params
