
In my previous blogs, we saw how to train models in PyTorch and how to plot the losses and error rates [1] [2].
We need to save our trained model so that we can use it for predictions whenever needed. We could save only the model we get after the final epoch, but because of overfitting, the best model may have been produced at an earlier epoch. So it is better to save the model periodically; later we can look at the loss and error graphs and pick the model that performs best for our predictions.
Let’s first define a function to save the models.
import os
import torch

def save_network(network, epoch_label):
    # Save only the learned parameters (the state dictionary), named after the epoch
    save_filename = 'net_%s.pth' % epoch_label
    save_path = os.path.join('./savedModels', save_filename)
    torch.save(network.state_dict(), save_path)
This function takes our model and the epoch number as inputs and saves the model's state dictionary. Instead of saving the state dictionary, we could save the entire model with torch.save(model, PATH), but this can introduce unexpected errors when we try to load the model on a machine other than the one it was trained on [3].
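For completeness, here is a minimal sketch of how a checkpoint saved this way could be loaded back for predictions. Net and the checkpoint name net_10.pth are hypothetical placeholders for your own architecture and saved epoch.

    # Sketch: loading a saved state dict back into a model
    # (Net and 'net_10.pth' are hypothetical placeholders)
    model = Net()
    state_dict = torch.load(os.path.join('./savedModels', 'net_10.pth'))
    model.load_state_dict(state_dict)
    model.eval()  # switch to evaluation mode before running predictions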
Note that torch.save(network.state_dict(), save_path) saves the model weights for the device they currently live on. If you are training on a GPU and want to save the model for use on a CPU, you can write torch.save(network.cpu().state_dict(), save_path). But then we need to move the model back to the device we are training on. The save_network function will look as follows in this case.
def save_network(network, epoch_label):
    save_filename = 'net_%s.pth' % epoch_label
    save_path = os.path.join('./savedModels', save_filename)
    # Move the weights to CPU before saving so the checkpoint is CPU-friendly
    torch.save(network.cpu().state_dict(), save_path)
    # Move the model back to the GPU to continue training
    if torch.cuda.is_available():
        network.cuda()
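An alternative worth noting: even a checkpoint saved directly from GPU memory can be loaded on a CPU-only machine by passing map_location to torch.load. The snippet below is a small sketch of that approach; the checkpoint name is again a placeholder.

    # Sketch: loading a GPU-saved checkpoint on a CPU-only machine
    # ('net_10.pth' is a hypothetical checkpoint name)
    state_dict = torch.load('./savedModels/net_10.pth',
                            map_location=torch.device('cpu'))
    model.load_state_dict(state_dict)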
Now, at the end of the validation stage of each epoch, we can call this function to persist the model. However, this might consume a lot of disk space…
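To make that concrete, here is a minimal sketch of where the call would sit in a typical epoch loop; train_one_epoch and validate are hypothetical helpers standing in for your own training and validation code.

    # Sketch of the epoch loop; train_one_epoch and validate are
    # hypothetical stand-ins for your own training and validation code
    for epoch in range(num_epochs):
        train_one_epoch(model, train_loader)
        validate(model, val_loader)
        # Persist a checkpoint after every validation pass
        save_network(model, epoch)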