Training a Neural Network in PyTorch for a Computer Vision Task — Person Re-Identification

Sybernix
Nov 21, 2021

Neural networks are powerful models, loosely inspired by the human brain, that can solve many problems that are difficult to solve with hand-written deterministic algorithms. PyTorch is one of the most widely used frameworks for writing and training neural networks in Python. Though neural networks are used to solve a variety of problems, we will focus on a computer vision problem called “person re-identification”. It is somewhat similar to facial recognition, but it uses full-body images of people to identify them. You can read more about this in my blog linked below.

In this blog, I have simplified the person re-id implementation at [1] so that the code is easy to understand even for beginners who are getting started with computer vision and deep neural networks.

In this blog, we will use a dataset named “Market 1501” to train our network. It can be downloaded at [2]. The torchvision ImageFolder loader that we use later expects the images to be arranged in a specific directory structure (one sub-directory per class), so we need to preprocess the data first. I have explained how to do the pre-processing in my previous blog [6].

Designing the Neural Network Model

There are many standard deep neural network models such as ResNet50, Inception V3, and DenseNet that are so powerful that they excel in several computer vision tasks. In this blog, we will use a modified version of ResNet50. In this section, let’s see how to create the necessary model.

First, let’s import the required libraries.

import torch
import torch.nn as nn
from torchvision import models
from torchinfo import summary

We can import ResNet50 as follows. Note that we are setting pretrained=True so that we get a network pre-trained on the ImageNet dataset, which has 1000 classes. You can check out the actual ResNet PyTorch implementation at [3].

model_ft = models.resnet50(pretrained=True)

The architecture of ResNet50 is as follows.

Image source: https://www.researchgate.net/publication/349717475_Performance_Evaluation_of_Deep_CNN-Based_Crack_Detection_and_Localization_Techniques_for_Concrete_Structures/figures?lo=1

There is a 7x7 convolution at the beginning followed by a max-pooling layer. Then there are 4 stages (layer1 to layer4), each made up of several residual blocks. Average pooling is the penultimate step. The final layer is a fully connected layer with 1000 outputs, one per ImageNet class.

We need to make a few modifications to the imported model so that it can be applied to our Market 1501 dataset. First, we need to replace the final fully connected layer that supports 1000 classes with our own classifier. The Market 1501 training set has 751 identities (classes), so we need a classifier that supports 751 classes.

We can create a fully connected layer in PyTorch using nn.Linear, which is part of the torch.nn module, as follows. Here class_num is the number of classes (751 in our case).

classifier = nn.Sequential(*[nn.Linear(2048, class_num)])

According to the PyTorch documentation [5], nn.Linear applies a linear transformation to the incoming data: y = xA^T + b.
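As a quick sanity check of that formula, the sketch below compares the output of nn.Linear with the same computation written by hand. The batch size of 4 and the random input are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

layer = nn.Linear(2048, 751)   # A has shape (751, 2048), b has shape (751,)
x = torch.randn(4, 2048)       # a batch of 4 feature vectors

y = layer(x)
manual = x @ layer.weight.T + layer.bias  # y = xA^T + b

print(y.shape)
print(torch.allclose(y, manual, atol=1e-4))
```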

Now, we need to merge the ResNet50 model and our fully connected layer. We also need to define how the forward propagation will pass through our network. This can be accomplished using the following code.

import torch
import torch.nn as nn
from torchvision import models
from torchinfo import summary


# Define the ResNet50-based model
class ft_net(nn.Module):
    def __init__(self, class_num=751):
        super(ft_net, self).__init__()
        # Load ResNet50 pre-trained on ImageNet
        model_ft = models.resnet50(pretrained=True)
        self.model = model_ft
        # Our own classifier: 2048 pooled features -> class_num scores
        classifier = nn.Sequential(*[nn.Linear(2048, class_num)])
        self.classifier = classifier

    def forward(self, x):
        x = self.model.conv1(x)
        x = self.model.bn1(x)
        x = self.model.relu(x)
        x = self.model.maxpool(x)
        x = self.model.layer1(x)
        x = self.model.layer2(x)
        x = self.model.layer3(x)
        x = self.model.layer4(x)
        x = self.model.avgpool(x)  # shape: (N, 2048, 1, 1)
        # Flatten to (N, 2048); unlike torch.squeeze, this keeps the batch
        # dimension even when N == 1.
        x = torch.flatten(x, 1)
        x = self.classifier(x)  # use our classifier
        return x

As you can see above, we have defined a function named forward and provided the forward propagation path. Initially, our input x passes through all the layers in the ResNet50 model, and at the end, we have appended our own classifier, which supports the 751 classes in the Market 1501 dataset.
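One detail in the step between average pooling and the classifier is worth a closer look. The pooled tensor has shape (N, 2048, 1, 1), and the two trailing dimensions must be removed before the linear layer. The sketch below, using random stand-in tensors, shows that torch.squeeze removes every size-1 dimension, including the batch dimension when N is 1, while torch.flatten(x, 1) always keeps it.

```python
import torch

single = torch.randn(1, 2048, 1, 1)  # pooled features for a batch of one image
pair = torch.randn(2, 2048, 1, 1)    # pooled features for a batch of two images

print(torch.squeeze(pair).shape)       # torch.Size([2, 2048])
print(torch.squeeze(single).shape)     # torch.Size([2048]): batch dimension is gone
print(torch.flatten(single, 1).shape)  # torch.Size([1, 2048]): batch dimension kept
```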

Let’s print the summary of our model using the torchinfo library and examine the structure of our model and the number of parameters.

print(summary(ft_net(), input_size=(2, 3, 256, 128)))

Here, input_size contains four values, as follows:

  1. 2 - batch size
  2. 3 - number of channels
  3. 256 - height of the input
  4. 128 - width of the input

We will get an output as follows when running the print command.

Modified ResNet50 model summary

Now, our neural network model is ready and we will see how we can train this model using our preprocessed dataset.

Training the Neural Network

First, we will import the necessary libraries.

import os
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torch.autograd import Variable
from simple_model import ft_net

Note that we have saved the model that we created in the previous section in a file named simple_model.py. Next, we will define some constants that are used further down in our code.

h, w = 256, 128
data_dir = '/home/niruhan/Personal/paper/Market-1501-v15.09.15/pytorch'
batchsize = 2
num_epochs = 1
use_gpu = torch.cuda.is_available()

h and w are the height and width of our input images. We keep batchsize at 2, but it can be increased if your GPU (or your machine) has enough memory. num_epochs is set to 1 because we just want to test our training code. It can be increased when doing the actual training.

use_gpu will be True if you have a CUDA-enabled build of PyTorch installed and a compatible GPU is available. It will be False otherwise, for example if you have installed the CPU-only version of PyTorch. Do not worry, our code can be tested with either build. However, training will be slow on a CPU, and you may need to rent some GPU instances to complete the training.
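An alternative to carrying a use_gpu flag around is to build a torch.device once and move both the model and the data to it. The following is a minimal sketch of that idiom, with a small linear layer and a random batch standing in for the real model and inputs.

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

layer = nn.Linear(4, 2).to(device)    # stand-in for the real model
batch = torch.randn(3, 4).to(device)  # stand-in for a batch of inputs

output = layer(batch)
print(device, output.shape)
```

The model and its inputs must live on the same device; otherwise PyTorch raises a runtime error on the first forward pass.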

You need to change the data_dir to point to the location that contains the pre-processed Market 1501 dataset. You can follow my blog at [6] to pre-process the data.

As the third step, we need to load the data using the PyTorch data loader. While loading, we can also apply some transformations to our data, either to make it easier to process or to augment the dataset. You can read about the different types of transformations and how they impact the input image at [7].

Note that we need to load the training dataset as well as the validation dataset. The training set is used to train the network, and the validation set is used to test the accuracy of our model after each epoch of training.

transform_train_list = [
    transforms.Resize((h, w), interpolation=3),  # 3 = bicubic interpolation
    transforms.Pad(10),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
]

transform_val_list = [
    transforms.Resize(size=(h, w), interpolation=3),  # 3 = bicubic interpolation
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
]

data_transforms = {
    'train': transforms.Compose(transform_train_list),
    'val': transforms.Compose(transform_val_list),
}

image_datasets = {}
image_datasets['train'] = datasets.ImageFolder(os.path.join(data_dir, 'train'),
                                               data_transforms['train'])
image_datasets['val'] = datasets.ImageFolder(os.path.join(data_dir, 'val'),
                                             data_transforms['val'])

dataloaders = {x: torch.utils.data.DataLoader(image_datasets[x], batch_size=batchsize,
                                              shuffle=True, num_workers=8)
               for x in ['train', 'val']}

After loading the data, we need to initialize our neural network model as follows.

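class_names = image_datasets['train'].classes
model = ft_net(len(class_names))
if use_gpu:
    model = model.cuda()  # the training loop sends the inputs to the GPU, so the model must be there too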

We need a loss criterion and an optimizer for training. We will use cross-entropy loss as our criterion.

criterion = nn.CrossEntropyLoss()
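For intuition, nn.CrossEntropyLoss takes raw scores (logits) and integer class labels, applies log-softmax internally, and averages the negative log-probability of the correct class over the batch. The sketch below checks this by hand on made-up logits for 2 samples and 3 classes.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

logits = torch.tensor([[2.0, 0.5, 0.1],
                       [0.2, 0.3, 3.0]])  # 2 samples, 3 classes
labels = torch.tensor([0, 2])

criterion = nn.CrossEntropyLoss()
loss = criterion(logits, labels)

# Same quantity by hand: mean of -log(softmax probability of the true class).
log_probs = F.log_softmax(logits, dim=1)
manual = -log_probs[torch.arange(2), labels].mean()

print(loss.item(), manual.item())
```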

Stochastic gradient descent can be used as the optimizer.

optim_name = optim.SGD

We can adjust the learning rate in our optimizer. Since we have imported a pre-trained model, we can use a lower learning rate (0.1 * lr) for all the pre-trained layers and the full learning rate (lr) for the last fully connected layer that we implemented. This can be achieved as follows.

lr = 0.05

ignored_params = list(map(id, model.classifier.parameters()))
base_params = filter(lambda p: id(p) not in ignored_params, model.parameters())
classifier_params = model.classifier.parameters()
optimizer = optim_name([
    {'params': base_params, 'lr': 0.1 * lr},
    {'params': classifier_params, 'lr': lr}
], weight_decay=5e-4, momentum=0.9, nesterov=True)
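The same parameter-group idea can be tried on a toy model to confirm that each group really gets its own learning rate. In the sketch below, TinyNet is a hypothetical stand-in for ft_net with a "backbone" layer and a "classifier" layer.

```python
import torch.nn as nn
import torch.optim as optim


class TinyNet(nn.Module):
    """Hypothetical stand-in for ft_net: a backbone plus a classifier."""

    def __init__(self):
        super().__init__()
        self.backbone = nn.Linear(8, 4)
        self.classifier = nn.Linear(4, 3)


model = TinyNet()
lr = 0.05

# Split parameters into "everything except the classifier" and "classifier".
classifier_ids = {id(p) for p in model.classifier.parameters()}
base_params = [p for p in model.parameters() if id(p) not in classifier_ids]

optimizer = optim.SGD([
    {'params': base_params, 'lr': 0.1 * lr},
    {'params': model.classifier.parameters(), 'lr': lr},
], weight_decay=5e-4, momentum=0.9, nesterov=True)

group_lrs = [group['lr'] for group in optimizer.param_groups]
group_sizes = [len(group['params']) for group in optimizer.param_groups]
print(group_lrs, group_sizes)
```

Options given outside the list, such as weight_decay and momentum, apply to every group, while the per-group 'lr' overrides the default.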

Now we are ready to train our model. We can iterate through the data in data loader as follows.

for data in dataloaders['train']:
    # get a batch of inputs
    inputs, labels = data

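Next, we wrap the inputs and labels in autograd Variables. Since PyTorch 0.4, Variable has been merged into Tensor and the wrapper simply returns the tensor, so this step only matters on very old versions.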

inputs, labels = Variable(inputs), Variable(labels)

Now, we can pass the inputs through our model as follows.

outputs = model(inputs)

Loss can be calculated as follows.

loss = criterion(outputs, labels)

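Now, we need to backpropagate our loss. This computes the gradients; the weights themselves are updated when we call optimizer.step(), which appears in the full loop below.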

loss.backward()
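To make the division of labour between loss.backward() and the optimizer concrete, here is a minimal sketch of one training step on a toy linear model (a stand-in for ft_net, with random inputs). It checks that backward() only fills in gradients and that the weights change only after optimizer.step().

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

model = nn.Linear(4, 3)  # hypothetical stand-in for ft_net
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)

inputs = torch.randn(2, 4)
labels = torch.tensor([0, 2])

before = model.weight.detach().clone()

optimizer.zero_grad()                    # clear gradients from any previous step
loss = criterion(model(inputs), labels)  # forward pass and loss
loss.backward()                          # compute gradients only
unchanged_after_backward = torch.equal(model.weight.detach(), before)
has_gradients = model.weight.grad is not None

optimizer.step()                         # apply the gradients to the weights
changed_after_step = not torch.equal(model.weight.detach(), before)

print(unchanged_after_backward, has_gradients, changed_after_step)
```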

These are the basic steps involved in training. However, we need to add a loop for epochs and conditionally switch between training and validation phase. So the entire loop will be as follows.

for epoch in range(num_epochs):
    print('Epoch {}/{}'.format(epoch, num_epochs - 1))
    print('-' * 10)
    # Each epoch has a training and validation phase
    for phase in ['train', 'val']:
        if phase == 'train':
            model.train(True)  # Set model to training mode
        else:
            model.train(False)  # Set model to evaluate mode

        # Iterate over data.
        for data in dataloaders[phase]:
            # get a batch of inputs
            inputs, labels = data
            now_batch_size, c, h, w = inputs.shape
            if now_batch_size < batchsize:  # skip the last batch
                continue
            # print(inputs.shape)
            # wrap them in Variable, if gpu is used, we transform the data to cuda.
            if use_gpu:
                inputs = Variable(inputs.cuda())
                labels = Variable(labels.cuda())
            else:
                inputs, labels = Variable(inputs), Variable(labels)

            # zero the parameter gradients
            optimizer.zero_grad()

            # -------- forward --------
            outputs = model(inputs)
            _, preds = torch.max(outputs.data, 1)
            loss = criterion(outputs, labels)

            # -------- backward + optimize --------
            # only if in training phase
            if phase == 'train':
                loss.backward()
                optimizer.step()
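The loop above computes preds but does not use it yet. As a rough sketch of how it could feed a per-batch accuracy, the example below uses made-up outputs for a batch of 4 images and 3 classes. The names running_corrects and accuracy are assumptions for illustration, and torch.no_grad() is shown as an optional way to skip gradient tracking when only measuring accuracy.

```python
import torch

# Hypothetical model outputs for a batch of 4 images and 3 classes.
outputs = torch.tensor([[2.0, 0.5, 0.1],
                        [0.2, 0.3, 3.0],
                        [1.5, 1.0, 0.2],
                        [0.1, 2.2, 0.3]])
labels = torch.tensor([0, 2, 1, 1])

with torch.no_grad():  # no gradients are needed to measure accuracy
    _, preds = torch.max(outputs, 1)  # index of the highest score per row
    running_corrects = (preds == labels).sum().item()

accuracy = running_corrects / labels.size(0)
print(preds.tolist(), accuracy)  # [0, 2, 0, 1] 0.75
```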

Voila!! You have all the code you need to train your model. Be warned that training can take many hours or even days to produce a usable model, depending on your hardware. You can find simple_model.py and simple_train.py in my GitHub repo at [8].

However, we need mechanisms to save our trained model so that it can be used for inference. We also need to plot the loss and accuracy after each epoch to find the best model and prevent overfitting. We will see how we can add these functionalities in my upcoming blogs.

References

[1] https://github.com/layumi/Person_reID_baseline_pytorch
[2] http://zheng-lab.cecs.anu.edu.au/Project/project_reid.html
[3] https://github.com/pytorch/vision/blob/main/torchvision/models/resnet.py
[4] https://pytorch.org/hub/pytorch_vision_resnet/
[5] https://pytorch.org/docs/stable/generated/torch.nn.Linear.html
[6] https://niruhan.medium.com/pre-processing-market1501-person-reid-dataset-for-pytorch-fbb4912f4cc5
[7] https://towardsdatascience.com/improves-cnn-performance-by-applying-data-transformation-bf86b3f4cef4
[8] https://github.com/niruhan/reid-implementation
