# How to use K-fold Cross Validation with PyTorch?

Last Updated on 15 February 2021

Machine learning models must be evaluated with a test set after they have been trained. We do this to ensure that models have not overfit and to ensure that they work with real-life datasets, which may have slightly deviating distributions compared to the training set.

But in order to make your model really robust, simply evaluating with a train/test split may not be enough.

For example, take the situation where you have a dataset composed of samples from two classes. Most of the samples in the first 80% of your dataset belong to class A, whereas most of the samples in the other 20% belong to class B. If you took a simple 80/20 hold-out split, your training and test sets would have vastly different distributions, and evaluation might lead to wrong conclusions.

That’s something you want to avoid. In this article, you’ll therefore learn about another technique that can be applied: K-fold Cross Validation. By generating train/test splits across multiple folds, you can perform multiple training and testing sessions, each with a different split. You’ll also see how you can use K-fold Cross Validation with PyTorch, one of the leading libraries for neural networks these days.

After reading this tutorial, you will…

• Understand why K-fold Cross Validation can improve your confidence in model evaluation results.
• Have an idea about how K-fold Cross Validation works.
• Know how to implement K-fold Cross Validation with PyTorch.

Update 15/Feb/2021: fixed small textual error.

## Summary and code example: K-fold Cross Validation with PyTorch

Model evaluation is often performed with a hold-out split, where an 80/20 split is commonly made: 80% of your dataset is used for training the model and 20% for evaluating it. While this is a simple approach, it is also very naïve, since it assumes that the data is representative across the splits, that it’s not a time series dataset, and that there are no redundant samples within the datasets.

K-fold Cross Validation is a more robust evaluation technique. It splits the dataset into $$k-1$$ training batches and 1 testing batch across $$k$$ folds, or situations. Using the training batches, you can then train your model, and subsequently evaluate it with the testing batch. This allows you to train the model multiple times with different dataset configurations. Even better, it allows you to be more confident in your model evaluation results.

Below, you will see a full example of using K-fold Cross Validation with PyTorch, using Scikit-learn’s KFold functionality. You can use it straight away. If you want to understand things in more detail, however, it’s best to continue reading the rest of the tutorial as well! 🚀

```python
import os
import torch
from torch import nn
from torch.utils.data import DataLoader, ConcatDataset
from torchvision.datasets import MNIST
from torchvision import transforms
from sklearn.model_selection import KFold

class SimpleConvNet(nn.Module):
  '''
    Simple Convolutional Neural Network
  '''
  def __init__(self):
    super().__init__()
    self.layers = nn.Sequential(
      nn.Conv2d(1, 10, kernel_size=3),
      nn.ReLU(),
      nn.Flatten(),
      nn.Linear(26 * 26 * 10, 50),
      nn.ReLU(),
      nn.Linear(50, 20),
      nn.ReLU(),
      nn.Linear(20, 10)
    )

  def forward(self, x):
    '''Forward pass'''
    return self.layers(x)


if __name__ == '__main__':

  # Configuration options
  k_folds = 5
  num_epochs = 1
  loss_function = nn.CrossEntropyLoss()

  # For fold results
  results = {}

  # Set fixed random number seed
  torch.manual_seed(42)

  # Prepare MNIST dataset by concatenating Train/Test part; we split later.
  dataset_train_part = MNIST(os.getcwd(), download=True, transform=transforms.ToTensor(), train=True)
  dataset_test_part = MNIST(os.getcwd(), download=True, transform=transforms.ToTensor(), train=False)
  dataset = ConcatDataset([dataset_train_part, dataset_test_part])

  # Define the K-fold Cross Validator
  kfold = KFold(n_splits=k_folds, shuffle=True)

  # Start print
  print('--------------------------------')

  # K-fold Cross Validation model evaluation
  for fold, (train_ids, test_ids) in enumerate(kfold.split(dataset)):

    # Print
    print(f'FOLD {fold}')
    print('--------------------------------')

    # Sample elements randomly from a given list of ids, no replacement.
    train_subsampler = torch.utils.data.SubsetRandomSampler(train_ids)
    test_subsampler = torch.utils.data.SubsetRandomSampler(test_ids)

    # Define data loaders for training and testing data in this fold
    trainloader = DataLoader(
                      dataset,
                      batch_size=10, sampler=train_subsampler)
    testloader = DataLoader(
                      dataset,
                      batch_size=10, sampler=test_subsampler)

    # Init the neural network
    network = SimpleConvNet()

    # Initialize optimizer
    optimizer = torch.optim.Adam(network.parameters(), lr=1e-4)

    # Run the training loop for defined number of epochs
    for epoch in range(0, num_epochs):

      # Print epoch
      print(f'Starting epoch {epoch+1}')

      # Set current loss value
      current_loss = 0.0

      # Iterate over the DataLoader for training data
      for i, data in enumerate(trainloader, 0):

        # Get inputs
        inputs, targets = data

        # Zero the gradients
        optimizer.zero_grad()

        # Perform forward pass
        outputs = network(inputs)

        # Compute loss
        loss = loss_function(outputs, targets)

        # Perform backward pass
        loss.backward()

        # Perform optimization
        optimizer.step()

        # Print statistics
        current_loss += loss.item()
        if i % 500 == 499:
          print('Loss after mini-batch %5d: %.3f' %
                (i + 1, current_loss / 500))
          current_loss = 0.0

    # Process is complete.
    print('Training process has finished. Saving trained model.')

    print('Starting testing')

    # Saving the model
    save_path = f'./model-fold-{fold}.pth'
    torch.save(network.state_dict(), save_path)

    # Evaluation for this fold
    correct, total = 0, 0
    with torch.no_grad():

      # Iterate over the test data and generate predictions
      for i, data in enumerate(testloader, 0):

        # Get inputs
        inputs, targets = data

        # Generate outputs
        outputs = network(inputs)

        # Set total and correct
        _, predicted = torch.max(outputs.data, 1)
        total += targets.size(0)
        correct += (predicted == targets).sum().item()

    # Print accuracy
    print('Accuracy for fold %d: %d %%' % (fold, 100.0 * correct / total))
    print('--------------------------------')
    results[fold] = 100.0 * (correct / total)

  # Print fold results
  print(f'K-FOLD CROSS VALIDATION RESULTS FOR {k_folds} FOLDS')
  print('--------------------------------')
  sum = 0.0
  for key, value in results.items():
    print(f'Fold {key}: {value} %')
    sum += value
  print(f'Average: {sum/len(results.items())} %')
```

## What is K-fold Cross Validation?

Suppose that your goal is to build a classifier that correctly classifies input images of handwritten digits. You input an image that represents a handwritten 5, and the output is expected to be 5.

There are myriad ways to build such a classifier, in terms of which model type you choose. But which is best? You have to evaluate each model in order to find out how well it works.

### Why use train/test splits for model evaluation?

Model evaluation happens after a machine learning model has been trained. It ensures that the model also works with real-world data, by feeding it samples from a dataset called the test set, which contains samples that the model has not seen before.

By comparing the subsequent predictions with the ground truth labels that are also available for these samples, we can see how well the model performs on this dataset. And because the test set resembles real-world data, we also get an indication of how well the model will perform in the real world.

However, we have to be cautious when evaluating our model. We cannot simply use the data that we trained the model with, to avoid becoming a student who grades their own homework.

That is exactly what would happen if you evaluated with your training data: because the model has learned to capture patterns specific to that particular dataset, it might perform poorly if those patterns are spurious and therefore not present within real-world data. Especially with high-variance models, this can become a problem.

Instead, we evaluate models with a separate test set, which contains samples that are not present within the training set. But how to construct this test set is another question. There are multiple methods for doing so. Let’s take a look at a naïve strategy first. We’ll then see why we might apply K-fold Cross Validation instead.

### Simple hold-out splits: a naïve strategy

Here’s that naïve way, which is also called a simple hold-out split: you take a part of your original dataset, set it apart, and consider that to be testing data.

Traditionally, such splits are taken in an 80/20 fashion, where 80% of the data is used for training the model, and 20% is used for evaluating it.
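As an illustration (not part of the PyTorch example later in this tutorial), a minimal sketch of such an 80/20 hold-out split could look like this, using Scikit-learn’s train_test_split on a toy dataset:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy dataset: 100 samples with 4 features each, plus binary labels
X = np.random.rand(100, 4)
y = np.random.randint(0, 2, size=100)

# Simple 80/20 hold-out split: 80% of the samples for training, 20% for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=True, random_state=42)

print(len(X_train), len(X_test))  # 80 20
```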

There are a few reasons why this is a naïve approach, and you’ll have to keep these edge cases in mind all the time (Chollet, 2017):

1. Data representativeness: all datasets, which are essentially samples, must represent the patterns in the population as much as possible. This becomes especially important when you generate samples from a sample (i.e., from your full dataset). For example, if the first part of your dataset contains pictures of ice cream, while the latter part only contains espressos, trouble is guaranteed when you generate the split as described above. Random shuffling may help you solve these issues.
2. The arrow of time: if you have a time series dataset, your dataset is likely ordered chronologically. If you’d shuffle randomly, and then perform simple hold-out validation, you’d effectively “[predict] the future given the past” (Chollet, 2017). Such temporal leaks don’t benefit model performance.
3. Data redundancy: if some samples appear more than once, a simple hold-out split with random shuffling may introduce redundancy between training and testing datasets. That is, identical samples belong to both datasets. This is problematic too, as data used for training thus leaks into the dataset for testing implicitly.

That’s why it’s often a better idea to validate your model more robustly. Let’s take a look at K-fold Cross Validation for doing so.

### Introducing K-fold Cross Validation

What if we could try multiple variations of this train/test split?

We would then have a model that is evaluated much more robustly.

And precisely that is what K-fold Cross Validation is all about.

In K-fold Cross Validation, you set a number $$k$$ to any integer value $$> 1$$, and $$k$$ splits will be generated. Each split has $$1/k$$ samples that belong to a test dataset, while the rest of your data can be used for training purposes.

As in each split a different part of the training data will be used for validation purposes, you effectively train and evaluate your model multiple times, allowing you to tell whether it works with more confidence than with a simple hold-out split.
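To make this concrete, here is a minimal sketch (independent of the PyTorch example below) of how Scikit-learn’s KFold partitions ten samples into $$k = 5$$ folds, with each fold holding out $$1/k$$ of the samples for testing:

```python
import numpy as np
from sklearn.model_selection import KFold

# Ten samples, identified simply by their indices 0..9
samples = np.arange(10)

# k = 5 folds: each fold holds out 1/k of the samples (2 here) for testing
kfold = KFold(n_splits=5, shuffle=True, random_state=42)
for fold, (train_ids, test_ids) in enumerate(kfold.split(samples)):
    print(f'Fold {fold}: train={train_ids}, test={test_ids}')
```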

Let’s now take a look at how we can implement K-fold Cross Validation with PyTorch!

## Implementing K-fold Cross Validation with PyTorch

Now that you understand how K-fold Cross Validation works, let’s take a look at how you can apply it with PyTorch. Using K-fold CV with PyTorch involves the following steps:

1. Ensuring that your dependencies are up to date.
2. Defining the nn.Module class of your neural network.
3. Defining the K-fold Cross Validator, and generating folds.
4. Iterating over each fold, training and evaluating another model instance.
5. Averaging across all folds to get final performance.

### What you’ll need to run the code

Running this example requires that you have installed the following dependencies:

• Python, to run everything. Make sure to install 3.8+, although it’ll also run with slightly older versions.
• PyTorch, which is the deep learning library that you are training the models with.
• Scikit-learn, for generating the folds.

Let’s open up a code editor and create a file, e.g. called kfold.py. Obviously, you might also want to run everything inside a Jupyter Notebook. That’s up to you 🙂

### Model imports

The first thing we do is specify the imports. We import these Python modules:

• For file input/output, we use os.
• All PyTorch functionality is imported as torch. We also have some sub imports:
• Neural network functionality is imported as nn.
• The DataLoader that we import from torch.utils.data is used for passing data to the neural network.
• The ConcatDataset will be used for concatenating the train and test parts of the MNIST dataset, which we’ll use for training the model. K-fold CV means that you generate the splits yourself, so you don’t want PyTorch to do this for you – as you’d effectively lose data.
• We also import specific functionality related to Computer Vision, using torchvision. First, we import the MNIST dataset from torchvision.datasets. We also import transforms from torchvision, which allows us to convert the data into Tensor format later.
• Finally, we import KFold from sklearn.model_selection to allow us to perform K-fold Cross Validation.

```python
import os
import torch
from torch import nn
from torch.utils.data import DataLoader, ConcatDataset
from torchvision.datasets import MNIST
from torchvision import transforms
from sklearn.model_selection import KFold
```

### Model class

Let’s define a simple convolutional neural network, i.e. a SimpleConvNet, that utilizes the nn.Module base class – and thus effectively implements a PyTorch neural network.

We can implement it as follows, by specifying the __init__ constructor definition and the forward pass. In the __init__ definition, we specify the neural network as a Sequential stack of PyTorch layers. You can see that we use one convolutional layer (Conv2d) with ReLU activations and some Linear layers responsible for generating the predictions. Due to the simplicity of the MNIST dataset, this should suffice. We store the stack in self.layers, which we use in the forward pass, as defined in the forward definition. Here, we simply pass the data – available in x – to the layers.

```python
class SimpleConvNet(nn.Module):
  '''
    Simple Convolutional Neural Network
  '''
  def __init__(self):
    super().__init__()
    self.layers = nn.Sequential(
      nn.Conv2d(1, 10, kernel_size=3),
      nn.ReLU(),
      nn.Flatten(),
      nn.Linear(26 * 26 * 10, 50),
      nn.ReLU(),
      nn.Linear(50, 20),
      nn.ReLU(),
      nn.Linear(20, 10)
    )

  def forward(self, x):
    '''Forward pass'''
    return self.layers(x)
```

### Runtime code

Now that we have defined the model class, it’s time to write some runtime code. By runtime code, we mean code that actually runs when you run the Python file or the Jupyter Notebook. The class you defined before only specifies a blueprint; you’ll have to instantiate it first in order to have it run. We’ll do that next. More specifically, our runtime code covers the following aspects:

1. The preparatory steps, where we perform some (no surprise) preparation steps for running the model.
2. Defining the K-fold Cross Validator to generate the folds.
3. Then, generating the splits that we can actually use for training the model, which we also do – once for every fold.
4. After training for every fold, we evaluate the performance for that fold.
5. Finally, we perform performance evaluation for the model – across the folds.

Actually, it’s that easy! 🙂

#### Preparatory steps

Below, we define some preparation steps that are executed prior to starting the training process across the set of folds. You can see that we run everything under the if __name__ == '__main__' check, meaning that this code only runs when we execute the Python file. In this part, we do the following things:

1. We set the configuration options. We’ll generate 5 folds (by setting $$k = 5$$), we train for 1 epoch (normally, this value is much higher, but here we only want to illustrate how K-fold CV works), and we set nn.CrossEntropyLoss as our loss function.
2. We define a dictionary that will store the results for every fold.
3. We set a fixed random number seed, meaning that all our pseudo random number initializers will be initialized using the same initialization token.

```python
if __name__ == '__main__':

  # Configuration options
  k_folds = 5
  num_epochs = 1
  loss_function = nn.CrossEntropyLoss()

  # For fold results
  results = {}

  # Set fixed random number seed
  torch.manual_seed(42)
```

We then load the MNIST dataset. If you’re used to working with the PyTorch datasets, you may be familiar with this code already. However, the third line may still be a bit unclear – but it’s actually really simple to understand what is happening here.

We simply merge together the train=True and train=False parts of the MNIST dataset, which is already split in a simple hold-out split by PyTorch’s torchvision.

And we don’t want that – recall that K-fold Cross Validation generates the train/test splits across $$k$$ folds, where $$k-1$$ parts are used for training your model and 1 part for model evaluation.

To solve this, we simply load both parts, and then concatenate them in a ConcatDataset object. Don’t worry about shuffling the data – you’ll see that this is taken care of next.

```python
  # Prepare MNIST dataset by concatenating Train/Test part; we split later.
  dataset_train_part = MNIST(os.getcwd(), download=True, transform=transforms.ToTensor(), train=True)
  dataset_test_part = MNIST(os.getcwd(), download=True, transform=transforms.ToTensor(), train=False)
  dataset = ConcatDataset([dataset_train_part, dataset_test_part])
```

#### Defining the K-fold Cross Validator

Next, we define the shuffle when we initialize the K-fold Cross Validator. Here, we set shuffle=True, meaning that shuffling occurs before the data is split into folds. k_folds indicates the number of folds, as you would have expected.

```python
  # Define the K-fold Cross Validator
  kfold = KFold(n_splits=k_folds, shuffle=True)

  # Start print
  print('--------------------------------')
```

#### Generating the splits and training the model for a fold

We can now generate the splits and train our model. You can do so by defining a loop where you iterate over the splits, specifying the fold and the list of identifiers of the training and testing samples for that particular fold. These can be used for performing the actual training process.

Within the for loop, we first perform a print statement, indicating the current fold. You then perform the training process. This involves the following steps:

• Sampling the actual elements from the train_ids or test_ids with a SubsetRandomSampler. A sampler can be used within a DataLoader to use particular samples only; in this case based on identifiers, because the SubsetRandomSampler samples elements randomly from a list, without replacement. In other words, you create two subsamplers that adhere to the split as specified within the for loop.
• With the data loaders, you’ll actually sample these samples from the full dataset. You can use any batch size that fits in memory, but a batch size of 10 works well in most cases.
• After preparing the dataset for this particular fold, you initialize the neural network by instantiating the class, using SimpleConvNet().
• Then, when the neural network is initialized, you can initialize the optimizer for this particular training session; in this case, we use Adam, with a 1e-4 learning rate.
• In PyTorch, you’ll have to define your own training loop. It’s relatively simple: you iterate over the number of epochs; within an epoch, over the minibatches; per minibatch, you perform the forward pass, the backward pass and subsequent optimization. That’s what is happening here.

```python
  # K-fold Cross Validation model evaluation
  for fold, (train_ids, test_ids) in enumerate(kfold.split(dataset)):

    # Print
    print(f'FOLD {fold}')
    print('--------------------------------')

    # Sample elements randomly from a given list of ids, no replacement.
    train_subsampler = torch.utils.data.SubsetRandomSampler(train_ids)
    test_subsampler = torch.utils.data.SubsetRandomSampler(test_ids)

    # Define data loaders for training and testing data in this fold
    trainloader = DataLoader(
                      dataset,
                      batch_size=10, sampler=train_subsampler)
    testloader = DataLoader(
                      dataset,
                      batch_size=10, sampler=test_subsampler)

    # Init the neural network
    network = SimpleConvNet()

    # Initialize optimizer
    optimizer = torch.optim.Adam(network.parameters(), lr=1e-4)

    # Run the training loop for defined number of epochs
    for epoch in range(0, num_epochs):

      # Print epoch
      print(f'Starting epoch {epoch+1}')

      # Set current loss value
      current_loss = 0.0

      # Iterate over the DataLoader for training data
      for i, data in enumerate(trainloader, 0):

        # Get inputs
        inputs, targets = data

        # Zero the gradients
        optimizer.zero_grad()

        # Perform forward pass
        outputs = network(inputs)

        # Compute loss
        loss = loss_function(outputs, targets)

        # Perform backward pass
        loss.backward()

        # Perform optimization
        optimizer.step()

        # Print statistics
        current_loss += loss.item()
        if i % 500 == 499:
          print('Loss after mini-batch %5d: %.3f' %
                (i + 1, current_loss / 500))
          current_loss = 0.0

    # Process is complete.
    print('Training process has finished. Saving trained model.')
```

#### Fold evaluation

After training a model within a particular fold, you must evaluate it too. That’s what we’ll do next. First, we save the model, so that it will be usable for generating predictions later, should you want to re-use it. We then perform model evaluation activities: iterating over the testloader and generating predictions for all the samples in the test batch/test part of the fold split. We compute accuracy after evaluation, print it on screen, and add it to the results dictionary for that particular fold.


```python
    print('Starting testing')

    # Saving the model
    save_path = f'./model-fold-{fold}.pth'
    torch.save(network.state_dict(), save_path)

    # Evaluation for this fold
    correct, total = 0, 0
    with torch.no_grad():

      # Iterate over the test data and generate predictions
      for i, data in enumerate(testloader, 0):

        # Get inputs
        inputs, targets = data

        # Generate outputs
        outputs = network(inputs)

        # Set total and correct
        _, predicted = torch.max(outputs.data, 1)
        total += targets.size(0)
        correct += (predicted == targets).sum().item()

    # Print accuracy
    print('Accuracy for fold %d: %d %%' % (fold, 100.0 * correct / total))
    print('--------------------------------')
    results[fold] = 100.0 * (correct / total)
```

#### Model evaluation

Finally, once all folds have passed, we have the results for every fold. Now, it’s time to perform full model evaluation – and we can do so more robustly because we have information from across all the folds. Here’s how you can show the results for every fold, and then print the average on screen.

This allows you to do two things:

1. See whether your model performs well across all the folds; this is true if the accuracies for every fold don’t deviate too significantly.
2. If they do, you know in which fold, and can take a closer look at the data to see what is happening there.

```python
  # Print fold results
  print(f'K-FOLD CROSS VALIDATION RESULTS FOR {k_folds} FOLDS')
  print('--------------------------------')
  sum = 0.0
  for key, value in results.items():
    print(f'Fold {key}: {value} %')
    sum += value
  print(f'Average: {sum/len(results.items())} %')
```

#### Full code

Instead of reading the explanation above, you might also be interested in simply running the code. If so, here it is 😊

```python
import os
import torch
from torch import nn
from torch.utils.data import DataLoader, ConcatDataset
from torchvision.datasets import MNIST
from torchvision import transforms
from sklearn.model_selection import KFold

class SimpleConvNet(nn.Module):
  '''
    Simple Convolutional Neural Network
  '''
  def __init__(self):
    super().__init__()
    self.layers = nn.Sequential(
      nn.Conv2d(1, 10, kernel_size=3),
      nn.ReLU(),
      nn.Flatten(),
      nn.Linear(26 * 26 * 10, 50),
      nn.ReLU(),
      nn.Linear(50, 20),
      nn.ReLU(),
      nn.Linear(20, 10)
    )

  def forward(self, x):
    '''Forward pass'''
    return self.layers(x)


if __name__ == '__main__':

  # Configuration options
  k_folds = 5
  num_epochs = 1
  loss_function = nn.CrossEntropyLoss()

  # For fold results
  results = {}

  # Set fixed random number seed
  torch.manual_seed(42)

  # Prepare MNIST dataset by concatenating Train/Test part; we split later.
  dataset_train_part = MNIST(os.getcwd(), download=True, transform=transforms.ToTensor(), train=True)
  dataset_test_part = MNIST(os.getcwd(), download=True, transform=transforms.ToTensor(), train=False)
  dataset = ConcatDataset([dataset_train_part, dataset_test_part])

  # Define the K-fold Cross Validator
  kfold = KFold(n_splits=k_folds, shuffle=True)

  # Start print
  print('--------------------------------')

  # K-fold Cross Validation model evaluation
  for fold, (train_ids, test_ids) in enumerate(kfold.split(dataset)):

    # Print
    print(f'FOLD {fold}')
    print('--------------------------------')

    # Sample elements randomly from a given list of ids, no replacement.
    train_subsampler = torch.utils.data.SubsetRandomSampler(train_ids)
    test_subsampler = torch.utils.data.SubsetRandomSampler(test_ids)

    # Define data loaders for training and testing data in this fold
    trainloader = DataLoader(
                      dataset,
                      batch_size=10, sampler=train_subsampler)
    testloader = DataLoader(
                      dataset,
                      batch_size=10, sampler=test_subsampler)

    # Init the neural network
    network = SimpleConvNet()

    # Initialize optimizer
    optimizer = torch.optim.Adam(network.parameters(), lr=1e-4)

    # Run the training loop for defined number of epochs
    for epoch in range(0, num_epochs):

      # Print epoch
      print(f'Starting epoch {epoch+1}')

      # Set current loss value
      current_loss = 0.0

      # Iterate over the DataLoader for training data
      for i, data in enumerate(trainloader, 0):

        # Get inputs
        inputs, targets = data

        # Zero the gradients
        optimizer.zero_grad()

        # Perform forward pass
        outputs = network(inputs)

        # Compute loss
        loss = loss_function(outputs, targets)

        # Perform backward pass
        loss.backward()

        # Perform optimization
        optimizer.step()

        # Print statistics
        current_loss += loss.item()
        if i % 500 == 499:
          print('Loss after mini-batch %5d: %.3f' %
                (i + 1, current_loss / 500))
          current_loss = 0.0

    # Process is complete.
    print('Training process has finished. Saving trained model.')

    print('Starting testing')

    # Saving the model
    save_path = f'./model-fold-{fold}.pth'
    torch.save(network.state_dict(), save_path)

    # Evaluation for this fold
    correct, total = 0, 0
    with torch.no_grad():

      # Iterate over the test data and generate predictions
      for i, data in enumerate(testloader, 0):

        # Get inputs
        inputs, targets = data

        # Generate outputs
        outputs = network(inputs)

        # Set total and correct
        _, predicted = torch.max(outputs.data, 1)
        total += targets.size(0)
        correct += (predicted == targets).sum().item()

    # Print accuracy
    print('Accuracy for fold %d: %d %%' % (fold, 100.0 * correct / total))
    print('--------------------------------')
    results[fold] = 100.0 * (correct / total)

  # Print fold results
  print(f'K-FOLD CROSS VALIDATION RESULTS FOR {k_folds} FOLDS')
  print('--------------------------------')
  sum = 0.0
  for key, value in results.items():
    print(f'Fold {key}: {value} %')
    sum += value
  print(f'Average: {sum/len(results.items())} %')
```

## After evaluation, what’s next?

Running the code gives you the following result for 5 folds with one epoch per fold.

```
--------------------------------
FOLD 0
--------------------------------
Starting epoch 1
Loss after mini-batch   500: 1.875
Loss after mini-batch  1000: 0.810
Loss after mini-batch  1500: 0.545
Loss after mini-batch  2000: 0.450
Loss after mini-batch  2500: 0.415
Loss after mini-batch  3000: 0.363
Loss after mini-batch  3500: 0.342
Loss after mini-batch  4000: 0.373
Loss after mini-batch  4500: 0.331
Loss after mini-batch  5000: 0.295
Loss after mini-batch  5500: 0.298
Training process has finished. Saving trained model.
Starting testing
Accuracy for fold 0: 91 %
--------------------------------
FOLD 1
--------------------------------
Starting epoch 1
Loss after mini-batch   500: 1.782
Loss after mini-batch  1000: 0.727
Loss after mini-batch  1500: 0.494
Loss after mini-batch  2000: 0.419
Loss after mini-batch  2500: 0.386
Loss after mini-batch  3000: 0.367
Loss after mini-batch  3500: 0.352
Loss after mini-batch  4000: 0.329
Loss after mini-batch  4500: 0.307
Loss after mini-batch  5000: 0.297
Loss after mini-batch  5500: 0.289
Training process has finished. Saving trained model.
Starting testing
Accuracy for fold 1: 91 %
--------------------------------
FOLD 2
--------------------------------
Starting epoch 1
Loss after mini-batch   500: 1.735
Loss after mini-batch  1000: 0.723
Loss after mini-batch  1500: 0.501
Loss after mini-batch  2000: 0.412
Loss after mini-batch  2500: 0.364
Loss after mini-batch  3000: 0.366
Loss after mini-batch  3500: 0.332
Loss after mini-batch  4000: 0.319
Loss after mini-batch  4500: 0.322
Loss after mini-batch  5000: 0.292
Loss after mini-batch  5500: 0.293
Training process has finished. Saving trained model.
Starting testing
Accuracy for fold 2: 91 %
--------------------------------
FOLD 3
--------------------------------
Starting epoch 1
Loss after mini-batch   500: 1.931
Loss after mini-batch  1000: 1.048
Loss after mini-batch  1500: 0.638
Loss after mini-batch  2000: 0.475
Loss after mini-batch  2500: 0.431
Loss after mini-batch  3000: 0.394
Loss after mini-batch  3500: 0.390
Loss after mini-batch  4000: 0.373
Loss after mini-batch  4500: 0.383
Loss after mini-batch  5000: 0.349
Loss after mini-batch  5500: 0.350
Training process has finished. Saving trained model.
Starting testing
Accuracy for fold 3: 90 %
--------------------------------
FOLD 4
--------------------------------
Starting epoch 1
Loss after mini-batch   500: 2.003
Loss after mini-batch  1000: 0.969
Loss after mini-batch  1500: 0.556
Loss after mini-batch  2000: 0.456
Loss after mini-batch  2500: 0.423
Loss after mini-batch  3000: 0.372
Loss after mini-batch  3500: 0.362
Loss after mini-batch  4000: 0.332
Loss after mini-batch  4500: 0.316
Loss after mini-batch  5000: 0.327
Loss after mini-batch  5500: 0.304
Training process has finished. Saving trained model.
Starting testing
Accuracy for fold 4: 90 %
--------------------------------
K-FOLD CROSS VALIDATION RESULTS FOR 5 FOLDS
--------------------------------
Fold 0: 91.87857142857143 %
Fold 1: 91.75 %
Fold 2: 91.85 %
Fold 3: 90.35714285714286 %
Fold 4: 90.82142857142857 %
Average: 91.33142857142857 %
```

Indeed, this is the MNIST dataset, for which we get great results with only limited iterations – but that was something that we expected 🙂

However, what we also see is that performance is relatively equal across the folds – so we don’t see any weird outliers that skew our model evaluation efforts.

This suggests that the distribution of the data was relatively equal across the splits, and that the model will likely work on real-world data too, provided that it has a relatively similar distribution.

What I would now often do is retrain the model with the full dataset, without evaluation on a hold-out split (or with a really small one, e.g. 5%). We have already seen that it generalizes, and that it does so across folds. We can now use all the data at hand to perhaps boost performance slightly further.
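As a rough sketch of that strategy, assuming the SimpleConvNet, dataset, loss_function and num_epochs defined in the example above, retraining on all data could look like this:

```python
# Sketch: after K-fold CV has given you confidence in the setup,
# retrain one final model on the full dataset (no hold-out split).
# Assumes `dataset`, `SimpleConvNet`, `loss_function` and `num_epochs`
# are defined as in the example above.
final_loader = torch.utils.data.DataLoader(dataset, batch_size=10, shuffle=True)
final_model = SimpleConvNet()
optimizer = torch.optim.Adam(final_model.parameters(), lr=1e-4)

for epoch in range(num_epochs):
  for inputs, targets in final_loader:
    optimizer.zero_grad()
    loss = loss_function(final_model(inputs), targets)
    loss.backward()
    optimizer.step()

# Save the final model for later use
torch.save(final_model.state_dict(), './model-final.pth')
```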

I’d love to know what you think about this too, as this is a strategy that confused some people in my K-fold Cross Validation for TensorFlow tutorial.

## Recap

In this tutorial, we looked at applying K-fold Cross Validation with the PyTorch framework for deep learning. We saw that K-fold Cross Validation generates $$k$$ different situations, called folds, from your dataset, where the data is split into $$k-1$$ training batches and 1 test batch per fold. K-fold Cross Validation can be used for evaluating your PyTorch model more thoroughly, giving you more confidence that performance hasn’t been skewed by a weird outlier in your dataset.

Besides theoretical stuff, we also provided a PyTorch example that shows how you can apply K-fold Cross Validation with the framework combined with Scikit-learn’s KFold functionality. I hope that you have learned something from it. If you did, please feel free to leave a message in the comments section below 💬 I’d love to hear from you.

Thank you for reading MachineCurve today and happy engineering! 😎

## References

Chollet, F. (2017). Deep Learning with Python. New York, NY: Manning Publications.

PyTorch Lightning. (2021, January 12). https://www.pytorchlightning.ai/

PyTorch. (n.d.). https://pytorch.org

## 6 thoughts on “How to use K-fold Cross Validation with PyTorch?”

1. Davide

Which validation split do I need when performing K-fold Cross Validation here?

1. Chris

Hi Davide,

Let me take away some confusion here: while the technique is called K-fold Cross Validation, it essentially splits the data into K train/test splits. Especially with neural networks, which also use validation data, this can be confusing. To be clear:

* Each train/test split produced by K-fold CV must be used for either training your model, or evaluating it AFTER it was trained.
* A portion of the train data can be used for validation purposes in a neural network sense.
* Like the 80/20 train/test splits (i.e. K=5) that are common, it can also be a good idea to split your train data into true train/validation data in an 80/20 or 90/10 fashion.

Hope this helps!

Best,
Chris

2. Agnes V

Hi Chris, thank you for the in-depth article!

I have a question about the ConcatDataset. You write that you generate a split later. What does this mean, and why do you merge the testing data with the training data before performing K-fold CV?

Thank you and greetings
Agnes

1. Chris

Hi Agnes,

PyTorch provides the MNIST dataset already split into training and testing data, i.e. in a simple hold-out split fashion. That’s what we wanted to avoid in the first place, and only using the training data to generate the folds would effectively mean throwing away part of your data. That’s why we first used the ConcatDataset to join the split parts into a larger whole again, and then used K-fold CV for generating the K splits.

Hope this helps 🙂

Best,
Chris

3. Philipp

Thumbs up to this really nice post!! But, shouldn’t the first sentence read like this? “Machine learning models must be evaluated with a __ test __ set after they have been trained.”

1. Chris

Thank you, Philipp, for the kind words – and that’s indeed the case! My mistake.
Fixed 🙂

Best,
Chris