
In my previous blogs, we saw how to train models in PyTorch and how to plot the losses and error rates [1] [2].
We need to save our trained model so that we can use it for predictions whenever needed. We could save only the model we get after the final epoch, but because of overfitting, the best model may have been produced at an earlier epoch. So it is better to save the model periodically; later we can look at the loss and error graphs and pick the model that performs best for our predictions.
Let’s first define a function to save the models.
import os
import torch

def save_network(network, epoch_label):
    # Save only the learned parameters (the state dictionary), named after the epoch
    save_filename = 'net_%s.pth' % epoch_label
    save_path = os.path.join('./savedModels', save_filename)
    torch.save(network.state_dict(), save_path)
This function takes our model and the epoch number as inputs and saves the model's state dictionary. Instead of saving the state dictionary, we could save the entire model with torch.save(model, PATH), but this can introduce unexpected errors when we try to load the model on a machine other than the one it was trained on [3].
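For completeness, here is a minimal sketch of how a checkpoint saved this way could be loaded back for predictions. Net and the checkpoint name net_10.pth are hypothetical placeholders for your own architecture and saved epoch.

    # Sketch: loading a saved state dict back into a model
    # (Net and 'net_10.pth' are hypothetical placeholders)
    model = Net()
    state_dict = torch.load(os.path.join('./savedModels', 'net_10.pth'))
    model.load_state_dict(state_dict)
    model.eval()  # switch to evaluation mode before running predictions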
Note that torch.save(network.state_dict(), save_path) saves the model weights for the device they currently live on. If you are training on a GPU and want to save the model for use on a CPU, you can write torch.save(network.cpu().state_dict(), save_path). But then we need to move the model back to the device we are training on. The save_network function will look as follows in this case.
def save_network(network, epoch_label):
    save_filename = 'net_%s.pth' % epoch_label
    save_path = os.path.join('./savedModels', save_filename)
    # Move the weights to CPU before saving so the checkpoint is CPU-friendly
    torch.save(network.cpu().state_dict(), save_path)
    # Move the model back to the GPU to continue training
    if torch.cuda.is_available():
        network.cuda()
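An alternative worth noting: even a checkpoint saved directly from GPU memory can be loaded on a CPU-only machine by passing map_location to torch.load. The snippet below is a small sketch of that approach; the checkpoint name is again a placeholder.

    # Sketch: loading a GPU-saved checkpoint on a CPU-only machine
    # ('net_10.pth' is a hypothetical checkpoint name)
    state_dict = torch.load('./savedModels/net_10.pth',
                            map_location=torch.device('cpu'))
    model.load_state_dict(state_dict)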
Now, at the end of the validation stage of each epoch, we can call this function to persist the model. However, this might consume a lot of disk space…
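To make that concrete, here is a minimal sketch of where the call would sit in a typical epoch loop; train_one_epoch and validate are hypothetical helpers standing in for your own training and validation code.

    # Sketch of the epoch loop; train_one_epoch and validate are
    # hypothetical stand-ins for your own training and validation code
    for epoch in range(num_epochs):
        train_one_epoch(model, train_loader)
        validate(model, val_loader)
        # Persist a checkpoint after every validation pass
        save_network(model, epoch)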