How to Add Data Transformations in PyTorch?


Overview

In this article, we will learn about data transformation in PyTorch. We will look at the different categories of data transformations available in PyTorch and go through the various options within each.

Introduction

To build any machine learning or deep learning-based system, the component of the pipeline that most strongly determines the quality of our models is the data used to train them.

Different models require input data for training in different formats. Moreover, data quality and quantity are also considered crucial parts of the modeling workflow.

To this end, the transformation of data forms an integral part of the model-building process. Let us first look at what exactly is meant by the transformation of data.

What are Data Transformations?

Transforming the data simply means subjecting it to techniques that change it from its original form. These transformations can touch any aspect of the data: changing the brightness, size, or shape of images; changing the length of sequences in text data; replacing some words in sentences while retaining the syntactic and semantic information; converting the feature values of columns in a tabular dataset; and so on.

The purpose of transforming data values could be manifold - we might want to transform the data to convert it into a form that is ready to be fed into the machine learning model, or we might want to transform data so that it forms better features for model training purposes.

Another useful purpose served by data transformation is called data augmentation wherein we modify the original data samples to create new ones while also retaining the original ones. This is done to augment the original data with synthetic data created out of it.

Need for Data Augmentation

Data augmentation is an extremely handy tool when building machine learning systems without enough data at hand. The synthetic data created by transforming the original samples increases the number of samples available for training. Although different from the original data, the synthetic data does not deviate from its distribution and thus does not add unnecessary noise to the training process.

And because the augmented samples differ from the originals, they also help ensure that we do not overfit our models.

Hence, with augmented data, we are able to train and build better models in terms of generalizability and accuracy of predictions.

How to Perform Data Transformation?

The torchvision.transforms module is the primary API dealing with data transformations in PyTorch. As the name suggests, it provides support for the most commonly used image transformations in the computer vision domain.

Most transformation classes in this module (for example, torchvision.transforms.PILToTensor) have a functional equivalent (in this case, torchvision.transforms.functional.pil_to_tensor).
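For instance, the sketch below applies the class-based transform and its functional counterpart to the same placeholder image (the blank PIL image is created purely for illustration):

```python
from PIL import Image
import torchvision.transforms as T
import torchvision.transforms.functional as F

img = Image.new("RGB", (32, 32))   # a blank placeholder PIL image

tensor_a = T.PILToTensor()(img)    # class-based transform
tensor_b = F.pil_to_tensor(img)    # equivalent functional form
```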

Functional transforms are useful when we need to build a more complex transformation pipeline, such as for image segmentation tasks, because they provide fine-grained control over the transformations. We will look more into functional transforms shortly.

We will first look at the various transform classes in the torchvision.transforms module. Some points to note before that -

  • Most of these transformations accept both PIL images and images in the form of PyTorch tensors, although some of them work only with PIL images while others work only with tensor images. PyTorch offers Conversion Transforms that can be used to convert tensor images to PIL images and vice versa. We discuss them shortly.

  • The transformations that accept single images as PyTorch tensors also accept a batch of tensor images. A Tensor Image is a tensor with its shape as (C, H, W), where C represents the number of colour channels, and H and W represent the image height and width respectively. A batch of Tensor Images is a tensor with dimensionality (B, C, H, W), where B corresponds to the number of images in a batch of data.

  • The range of the values expected for a tensor image is implicit and is defined by the tensor's data type. Tensor images of the float data type are expected to have values in the range [0, 1) whereas Tensor images with an integer data type are expected to have values in the range [0, MAX_DTYPE] where MAX_DTYPE denotes the largest value that can be represented in that data type.

Transformations Supporting PIL Image and torch.*Tensor

As discussed, most classes in the torchvision.transforms module can work with both PIL Images and PyTorch tensors. We discuss some examples of such transformations first.

(The full list is available in the torchvision.transforms documentation.)

1. transforms.Pad Pad is used for padding the given tensor or PIL image on all sides with a given “pad” value. The syntax and a sample call are shown below.
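(A minimal sketch; the padding, fill value, and input size are illustrative, and a recent torchvision version is assumed.)

```python
import torch
import torchvision.transforms as T

# pad 2 pixels on the left/right and 4 pixels on the top/bottom,
# filling the new pixels with 0 ("constant" mode is the default)
pad = T.Pad(padding=(2, 4), fill=0)

img = torch.rand(3, 32, 32)        # a random tensor image (C, H, W)
padded = pad(img)
print(padded.shape)                # torch.Size([3, 40, 36])
```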

The arguments -

  • padding - If a single integer is provided, it is used to pad all borders. If a sequence of length 2 is provided, it is used as the padding on the left/right and top/bottom respectively. If a sequence of length 4 is provided, it is used as the padding for the left, top, right, and bottom borders respectively.

  • fill (a number or a tuple) – Denotes the pixel fill value for the "constant" padding mode. If a tuple of length 3 is given, it is used to fill the red, green, and blue channels respectively. Only a number is supported for PyTorch tensors, while an integer or a tuple is supported for PIL Images.

2. transforms.RandomHorizontalFlip This transform horizontally flips the given image randomly with a given probability. The syntax is shown below.
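(A minimal sketch; the probability and input size are illustrative.)

```python
import torch
import torchvision.transforms as T

# flip the image horizontally with a 50% chance
flip = T.RandomHorizontalFlip(p=0.5)

img = torch.rand(3, 32, 32)
flipped = flip(img)                # flipped with probability p, otherwise unchanged
```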

Here the parameter p accepts a float-type value and denotes the probability of the image being flipped.

3. transforms.CenterCrop This transformation crops the given image at the center.
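(A minimal sketch; the crop size and input size are illustrative.)

```python
import torch
import torchvision.transforms as T

crop = T.CenterCrop(size=24)       # size can be an int or a (height, width) tuple

img = torch.rand(3, 32, 32)
cropped = crop(img)
print(cropped.shape)               # torch.Size([3, 24, 24])
```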

where size denotes the desired output size of the crop.

If the size of the input image is smaller than the desired size of the output along any edge, the input image is first padded with 0s and then center cropped.

4. transforms.Resize Resize is used to resize the input image to the desired size. The syntax is as follows -
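(A minimal sketch; the target size and input size are illustrative.)

```python
import torch
import torchvision.transforms as T

# resize to 224 x 224, a size commonly expected by pretrained vision models
resize = T.Resize(size=(224, 224))

img = torch.rand(3, 500, 375)
resized = resize(img)
print(resized.shape)               # torch.Size([3, 224, 224])
```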

Resizing comes in handy when the original data is present in a different dimensionality than what is expected by the model.

Also, downsampling an image lowers its resolution, which means fewer input features and, consequently, faster network training.

5. transforms.RandomRotation

This is used to rotate the input image by a random angle. If the value provided for degrees is a single number rather than a tuple (min, max), it is interpreted as the range (-degrees, +degrees).
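(A minimal sketch; the degree range is illustrative.)

```python
import torch
import torchvision.transforms as T

# rotate by a random angle sampled from (-30, +30) degrees
rotate = T.RandomRotation(degrees=30)

img = torch.rand(3, 32, 32)
rotated = rotate(img)
```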

6. transforms.ColorJitter ColorJitter is used to randomly change the brightness, contrast, saturation, and hue of the input image. The syntax is shown below.
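(A minimal sketch; the jitter strengths are illustrative.)

```python
import torch
import torchvision.transforms as T

# jitter brightness/contrast/saturation by up to ±40% and hue by up to ±0.1
jitter = T.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4, hue=0.1)

img = torch.rand(3, 32, 32)
jittered = jitter(img)
```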

Transforms on torch.*Tensor Only

Certain transforms work only on PyTorch tensors and do not support PIL Images. A full list of them is available in the torchvision.transforms documentation.

1. transforms.Normalize Normalize is used to normalize the input tensor image with the provided mean and standard deviation.

Given the mean (mean[1], ..., mean[n]) and the standard deviation (std[1], ..., std[n]) for n color channels, this transform will normalize each channel of the input tensor as:

output[channel] = (input[channel] - mean[channel]) / std[channel]

Note that this transform acts out of place, i.e. it does not mutate the input tensor.
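A minimal usage sketch is shown below; the mean and standard deviation used here are the commonly quoted ImageNet statistics, chosen purely for illustration:

```python
import torch
import torchvision.transforms as T

normalize = T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])

img = torch.rand(3, 32, 32)        # float tensor image with values in [0, 1)
normalized = normalize(img)        # (img - mean) / std, applied channel-wise
```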

Transforms on PIL Image only

We will now look at examples of transformations that expect a PIL Image as input.

1. transforms.RandomChoice

RandomChoice applies a single transformation randomly picked from a list of transforms.
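A sketch of how it can be used is shown below; the particular transforms in the list and the blank placeholder image are arbitrary choices:

```python
from PIL import Image
import torchvision.transforms as T

img = Image.new("RGB", (32, 32))   # a blank placeholder PIL image

# exactly one transform from the list is picked at random and applied
random_choice = T.RandomChoice([
    T.RandomHorizontalFlip(p=1.0),
    T.RandomRotation(degrees=15),
])
out = random_choice(img)
```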

2. transforms.RandomOrder

RandomOrder applies all transformations from a list of transforms objects in a random order.
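A sketch of how it can be used is shown below; again, the listed transforms and the placeholder image are arbitrary:

```python
from PIL import Image
import torchvision.transforms as T

img = Image.new("RGB", (32, 32))   # a blank placeholder PIL image

# all of the listed transforms are applied, but in a shuffled order
random_order = T.RandomOrder([
    T.RandomHorizontalFlip(p=0.5),
    T.ColorJitter(brightness=0.3),
    T.RandomRotation(degrees=10),
])
out = random_order(img)
```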

Composing Transformations

1. transforms.Compose Rather than subjecting the input image to successive transformations manually, we can chain the transformations together and pass the image through a single object that applies the composed transformations in succession. Compose is used to chain several transforms objects together.
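A sketch of a composed pipeline is shown below; the particular transforms chained here are arbitrary:

```python
import torchvision.transforms as T

# the transforms are applied to the image one after another, in order
transform = T.Compose([
    T.CenterCrop(size=224),
    T.ColorJitter(brightness=0.2),
    T.PILToTensor(),
])
```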

Here the parameter transforms represents a Python list of several transforms objects chained in succession.

Note that the Compose transform does not support torchscript, and hence we cannot script a Compose object using torch.jit.script.
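For instance, an attempt along the following lines would fail (a sketch; the exact error message depends on the torchvision version):

```python
import torch
import torchvision.transforms as T

transform = T.Compose([T.CenterCrop(10), T.PILToTensor()])

# raises an error - Compose is a plain Python class and is not scriptable
scripted = torch.jit.script(transform)
```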

Scriptable Transformations

As pointed out earlier, Compose is not compatible with torchscript. So, in order to script the transformations, we could use nn.Sequential instead of Compose like so:
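(A sketch following the pattern suggested in the torchvision documentation; the particular transforms and statistics are illustrative.)

```python
import torch
import torch.nn as nn
import torchvision.transforms as T

# only tensor-compatible (and hence scriptable) transforms are used here
transforms = nn.Sequential(
    T.CenterCrop(10),
    T.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
)
scripted_transforms = torch.jit.script(transforms)

img = torch.rand(3, 32, 32)
out = scripted_transforms(img)
```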

Importantly, while scripting transformations with torch.jit.script this way, we need to make sure that the transforms object passed to it contains only scriptable transforms, i.e. the ones that work with PyTorch tensors.

Note that some of the transformations defined above use a random number generator for their parameters. Such randomized transformations apply the same transformation to all the images in a given batch, but produce different transformations (due to the randomness) across different calls, i.e. across batches.

Functional Transformations

As opposed to randomized transformations, functional transformations do not use a random number generator for their parameters, hence providing us with more control over the data transformation pipeline. This also means that we need to manually specify or generate all the parameters, but in return the functional transforms give us reproducible results across different calls.

The functional forms of the transformations are provided by the torchvision.transforms.functional API, and a full list can be viewed in its documentation.

An example of using Functional Transformations to produce reproducible results is as follows -
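(A sketch; the angle range and image sizes are illustrative.)

```python
import random

import torch
import torchvision.transforms.functional as TF

img1 = torch.rand(3, 32, 32)
img2 = torch.rand(3, 32, 32)

# sample the parameter once ourselves ...
angle = random.uniform(-30.0, 30.0)

# ... and apply exactly the same rotation to both images
rotated1 = TF.rotate(img1, angle)
rotated2 = TF.rotate(img2, angle)
```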

Here, as can be seen, functional transforms with the same parameters are applied to multiple images.

Example - Applying Linear Transformations on Input Data

A linear transformation can be mathematically represented as y = wx + b.

This is also how input vectors in feed-forward neural networks are mathematically acted on to calculate inputs for further hidden layers.

PyTorch provides an easy way to apply linear transformation to any given vector by means of the torch.nn.Linear API.

We see a working example of this below:
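(A sketch; the feature dimensions and batch size are arbitrary.)

```python
import torch
import torch.nn as nn

# a linear transformation mapping 5 input features to 3 output features
linear = nn.Linear(in_features=5, out_features=3)

x = torch.randn(2, 5)              # a batch of 2 input vectors
y = linear(x)                      # computes y = x @ W.T + b

print(y.shape)                     # torch.Size([2, 3])
```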

Conclusion

In this article, we learned about data transformations in PyTorch. In particular,

  • We understood the benefits and uses of applying transformations to the original data.
  • Then, we learned how PyTorch's API could be used to transform the data.
  • We looked at the major types of data transformations available in PyTorch while looking at some specific examples therein.
  • We also learned how we could create scriptable transformations in PyTorch.