Siamese Networks in NLP

Overview

Siamese networks in NLP need only a few input images per class to train the neural network. Given a new input, the network predicts the output and classifies images by comparing them with the previously provided examples, so we can classify the data set without training the network again and again. A Siamese network is a neural network architecture with two or more identical sub-networks. Siamese neural networks belong to the class of neural network architectures, and the word identical means that the involved sub-networks have the same configuration with the same weights and parameters. So, if we update a parameter in one sub-network, the mirrored sub-network also gets updated.

Pre-requisites

Before learning about the siamese networks in NLP, let us first learn some basics about NLP itself.

  • NLP stands for Natural Language Processing. In NLP, we perform the analysis and synthesis of the input, and the trained NLP model then predicts the necessary output.
  • NLP is the backbone of technologies like Artificial Intelligence and Deep Learning.
  • In basic terms, we can say that NLP is nothing but the computer program's ability to process and understand the provided human language.
  • The NLP process starts by converting the input text into a series of tokens (the Doc object) and then performing several operations on the Doc object.
  • A typical NLP pipeline consists of stages like the tokenizer, tagger, lemmatizer, parser, and entity recognizer. At every stage, the input is a Doc object, and the output is the processed Doc.
  • Each stage makes its own changes to the Doc object and feeds it to the next stage of the pipeline, as sketched after this list.
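The short sketch below illustrates such a pipeline. It assumes spaCy and its en_core_web_sm model are installed; the article itself does not name a specific library, so treat this as one possible realization of the Doc-based pipeline described above.

```python
import spacy

# Load a small English pipeline: tokenizer, tagger, lemmatizer, parser, and entity recognizer.
nlp = spacy.load("en_core_web_sm")

# The pipeline converts the raw input text into a Doc object of processed tokens.
doc = nlp("Apple is looking at buying a U.K. startup for $1 billion.")

# Each token carries the annotations added by the successive pipeline stages.
for token in doc:
    print(token.text, token.lemma_, token.pos_, token.dep_)

# Entities found by the entity recognizer stage.
print([(ent.text, ent.label_) for ent in doc.ents])
```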

Introduction

An NLP model learns from a data set and then predicts outputs for new inputs. NLP is used as a base in many technologies like AI, ML, etc., and we use various techniques to train NLP models. In many areas, we need to recognize faces, signatures, postures, positions, and so on. Conventional training is good, but it only gives us a probability and takes a lot of time and a huge data set to train a model. This is where Siamese networks come in: they take only a few input images to train the neural network and classify new images by comparing them with the previously provided examples.

The Siamese neural network solves the major problem of needing a large dataset, which is what has made it so popular these days. Let us learn about the Siamese network in NLP in detail.

Siamese Neural Networks

A neural network architecture having two or more identical sub-networks is known as a Siamese network. Siamese neural networks belong to the class of neural network architectures, and the word identical means that the involved sub-networks have the same configuration with the same weights and parameters. So, if we update a parameter in one sub-network, the mirrored sub-network also gets updated.

Since an update in one sub-network is mirrored in the other, this architecture lends itself to detecting the similarity between inputs: the comparison is made between the feature vectors produced by the two sub-networks. This ability to compare makes Siamese networks very useful in many real-world applications; please refer to the later sections for the use cases.

Why are Siamese networks being used so extensively these days? In the past, we used neural networks to read, learn, and then predict multiple classes. This not only required large data sets, but adding new classes to or removing classes from the original data was also a major problem: to update the neural network, we had to update the entire data set and retrain, which was computationally expensive and time-consuming.

This is where Siamese networks (SNNs) come in: they learn a similarity function, which lets them complete the work faster and more efficiently. A Siamese network takes two images and checks whether they are the same, and in this way, it classifies the provided data. Thus, we can classify the data set without training the network again and again; a minimal sketch of this idea follows.
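The sketch below is a hypothetical PyTorch illustration (not the article's code): two inputs pass through one shared encoder, and the pair is judged "the same" when the distance between their feature vectors falls below a threshold. The layer sizes and the threshold are arbitrary assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# One shared encoder: both inputs go through the *same* weights, which is what
# makes the two sub-networks "identical".
encoder = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 64))

def are_similar(x1, x2, threshold=1.0):
    # Embed both inputs with the shared encoder, then compare the feature vectors.
    emb1, emb2 = encoder(x1), encoder(x2)
    distance = F.pairwise_distance(emb1, emb2)
    return distance < threshold  # True where the pair is judged "the same"

# Toy usage with a batch of 4 random flattened 28x28 "images".
x1, x2 = torch.randn(4, 784), torch.randn(4, 784)
print(are_similar(x1, x2))
```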

Let us now look at the functions used in the Siamese networks in NLP in the next section.

Loss Functions in Siamese Networks

We mainly use two loss functions with Siamese networks in NLP: triplet loss and contrastive loss. We cannot use the cross-entropy loss with Siamese networks because Siamese networks learn in a pairwise manner. Let us now learn about each of these loss functions.

Triplet Loss

In the triplet loss function, we compare the anchor or baseline input with the negative (dissimilar) input and the positive (similar) input. We then minimize the distance between the baseline input and the positive input and maximize the distance between the baseline input and the negative input.

The overall formula for the same is:

$$\mathcal{L}(A, P, N)=\max \left(\|f(A)-f(P)\|^2-\|f(A)-f(N)\|^2+\alpha,\ 0\right)$$

Let us go through the symbols in the above formula. The term α is a margin that stretches the gap between the dissimilar and similar pairs in the triplet (f(A), f(P), f(N)). Here, f(A) is the feature embedding of the anchor input, f(P) is the feature embedding of the positive input, and f(N) is the feature embedding of the negative input.

Now, during the training of Siamese networks in NLP, the triplet (anchor image, positive image, and negative image) is fed to the model as a single sample. The Siamese network learns the distances between the inputs and thus gets trained.
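As a minimal sketch of the formula above (assuming PyTorch, which the article does not specify; the margin value 0.2 is an arbitrary assumption), the loss can be written directly over batches of embeddings f(A), f(P), f(N):

```python
import torch

def triplet_loss(f_a, f_p, f_n, alpha=0.2):
    # L(A, P, N) = max(||f(A) - f(P)||^2 - ||f(A) - f(N)||^2 + alpha, 0)
    pos_dist = (f_a - f_p).pow(2).sum(dim=1)  # squared anchor-positive distance
    neg_dist = (f_a - f_n).pow(2).sum(dim=1)  # squared anchor-negative distance
    return torch.clamp(pos_dist - neg_dist + alpha, min=0.0).mean()

# Toy usage: random 128-dimensional embeddings for a batch of 8 triplets.
f_a, f_p, f_n = torch.randn(8, 128), torch.randn(8, 128), torch.randn(8, 128)
print(triplet_loss(f_a, f_p, f_n))
```

PyTorch also ships a built-in `nn.TripletMarginLoss` that implements the same idea (using non-squared distances by default).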

The formal definition of a Siamese network in terms of the triplet loss function is: a Siamese network is a network of similar neural networks in which we provide the input vectors to extract the features, and the features are later passed on to the triplet loss function in a few-shot learning process.

Contrastive Loss

Contrastive loss is the more popular of the two and is widely used these days. It differs from a conventional error-predicting loss in that it is distance-based, built on the Euclidean distance. Contrastive loss helps us learn embeddings in which two similar points end up with a low Euclidean distance and two dissimilar points end up with a large Euclidean distance.

The overall formula for the same is:

$$(1-Y)\,\frac{1}{2}\left(D_W\right)^2+(Y)\,\frac{1}{2}\left\{\max \left(0,\ m-D_W\right)\right\}^2$$

The formula for calculating the Euclidean distance $D_W$ between the two embeddings is:

$$D_W=\sqrt{\left\{G_W\left(X_1\right)-G_W\left(X_2\right)\right\}^2}$$
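A minimal PyTorch sketch of the two formulas above (the library choice and the margin value are assumptions, as is the label convention Y = 0 for a similar pair and Y = 1 for a dissimilar pair, which is what the (1 - Y) weighting implies):

```python
import torch
import torch.nn.functional as F

def contrastive_loss(g1, g2, y, m=1.0):
    # D_W: Euclidean distance between the two embeddings G_W(X1) and G_W(X2).
    d = F.pairwise_distance(g1, g2)
    # (1 - Y) * 1/2 * D_W^2         pulls similar pairs (Y = 0) together,
    # Y * 1/2 * max(0, m - D_W)^2   pushes dissimilar pairs (Y = 1) at least m apart.
    return ((1 - y) * 0.5 * d.pow(2) + y * 0.5 * torch.clamp(m - d, min=0.0).pow(2)).mean()

# Toy usage: a batch of 8 embedding pairs with random similar/dissimilar labels.
g1, g2 = torch.randn(8, 128), torch.randn(8, 128)
y = torch.randint(0, 2, (8,)).float()
print(contrastive_loss(g1, g2, y))
```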

Pros and Cons of Siamese Networks

So far, we have learned a lot about Siamese networks in NLP. Let us now look at some of the advantages and disadvantages of using them before getting into the code example.

Advantages or Pros of Siamese Networks in NLP

  • We only need to provide a few images per class to train a Siamese network in NLP; a few images are sufficient for it to recognize similar images in the future.
  • Siamese networks rely on one-shot learning, so they are more robust to class imbalance.
  • Adding a classifier on top of a Siamese network and averaging the two works better than averaging two correlated supervised models (for example, a GBM and an RF classifier).
  • Its way of learning is different from that of a plain classifier in NLP, which is why it is often preferred over a simple classifier.
  • Siamese networks in NLP learn progressively: the deeper layers learn the input first, and the upper layers then build on top of them. This helps the network place the same types of classes or concepts close together and thus learn semantic similarities.

Disadvantages or Cons of Siamese Networks in NLP

There are not many cons associated with Siamese networks in NLP. Let us discuss the few disadvantages that do exist.

  • Siamese networks take comparatively more time to train than a normal neural network.
  • Siamese networks learn from pairs, whose number grows quadratically with the data, so that all the available information is used; this makes training more time-consuming than normal classification-style learning.
  • A Siamese network does not output a probability; it only provides the distance between the provided inputs, i.e., how far apart their classes are.

Code example: Classification model using Siamese Networks

Importing the Necessary Modules
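The article's original code is not reproduced here, so the blocks that follow are a hypothetical PyTorch reconstruction of the walkthrough, built step by step; module names, hyperparameters, and file formats are assumptions. First, the imports used by the later blocks:

```python
import random
import struct
import urllib.request

import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader
```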

Defining the Global Variables
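Hypothetical global settings used by the later blocks; the article's actual values are not shown, so these are placeholders:

```python
BATCH_SIZE = 64
EPOCHS = 5
LEARNING_RATE = 1e-3
MARGIN = 1.0  # margin m of the contrastive loss
DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")
```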

Defining the Functions to Read the Labels of the Text from the File and Read the Images From the File
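A sketch of the two reader functions, assuming the labels and images are stored in the MNIST-style IDX binary format (an assumption; the article does not show the file layout):

```python
def read_labels(path):
    # IDX1 label file: an 8-byte big-endian header (magic, count), then one byte per label.
    with open(path, "rb") as f:
        _magic, _count = struct.unpack(">II", f.read(8))
        return np.frombuffer(f.read(), dtype=np.uint8)

def read_images(path):
    # IDX3 image file: a 16-byte big-endian header (magic, count, rows, cols), then the pixels.
    with open(path, "rb") as f:
        _magic, count, rows, cols = struct.unpack(">IIII", f.read(16))
        pixels = np.frombuffer(f.read(), dtype=np.uint8)
        return pixels.reshape(count, rows, cols).astype(np.float32) / 255.0
```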

Defining the Class to Call the URLs and Fetch Data Using the Various Functions
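A hypothetical dataset class in the spirit of the walkthrough: it downloads the label and image files from the given URLs (the URLs used later are placeholders), reads them with the functions above, and serves random positive and negative pairs:

```python
class SiamesePairDataset(Dataset):
    """Fetches the image/label files and serves random positive/negative pairs."""

    def __init__(self, image_url, label_url, image_path="images.idx", label_path="labels.idx"):
        # Download the raw files; substitute real URLs for the placeholders used in main().
        urllib.request.urlretrieve(image_url, image_path)
        urllib.request.urlretrieve(label_url, label_path)
        self.images = read_images(image_path)
        self.labels = read_labels(label_path)

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        img1, label1 = self.images[idx], self.labels[idx]
        # Half of the pairs are positive (same class) and half are negative (different class).
        make_positive = random.random() < 0.5
        while True:
            j = random.randrange(len(self.labels))
            if (self.labels[j] == label1) == make_positive:
                break
        img2 = self.images[j]
        # y = 0 for a similar pair, y = 1 for a dissimilar pair (contrastive-loss convention).
        y = 0.0 if make_positive else 1.0
        return (torch.from_numpy(img1).unsqueeze(0),
                torch.from_numpy(img2).unsqueeze(0),
                torch.tensor(y))
```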

Defining Another Class That Will Calculate the Resolution
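Without the original code it is unclear what "calculate the resolution" refers to; a reasonable guess is the module whose paired embeddings yield the distance used for comparison. The sketch below is such a Siamese encoder, with layer sizes that assume 28×28 single-channel images:

```python
class SiameseNetwork(nn.Module):
    """One shared encoder applied to both inputs; their embedding distance is the comparison."""

    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, 128),  # assumes 28x28 inputs (two 2x poolings -> 7x7)
        )

    def forward(self, x1, x2):
        # The same weights embed both inputs, which is what makes the network "Siamese".
        return self.encoder(x1), self.encoder(x2)
```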

Defining the Function to Train the Model
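A training loop sketch that feeds each (positive or negative) pair to the network and applies the contrastive_loss function sketched in the loss-function section; the optimizer choice is an assumption:

```python
def train(model, loader, epochs=EPOCHS):
    optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)
    model.train()
    for epoch in range(epochs):
        total = 0.0
        for x1, x2, y in loader:
            x1, x2, y = x1.to(DEVICE), x2.to(DEVICE), y.to(DEVICE)
            emb1, emb2 = model(x1, x2)
            loss = contrastive_loss(emb1, emb2, y, m=MARGIN)  # sketched earlier
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            total += loss.item()
        print(f"epoch {epoch + 1}: mean loss {total / len(loader):.4f}")
```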

Defining the Function to Test the Created Model
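A test sketch: since the network outputs distances rather than probabilities, pairs are judged "same class" when their embedding distance falls below a threshold (the threshold value is an assumption):

```python
def test(model, loader, threshold=0.5):
    model.eval()
    correct, seen = 0, 0
    with torch.no_grad():
        for x1, x2, y in loader:
            x1, x2, y = x1.to(DEVICE), x2.to(DEVICE), y.to(DEVICE)
            emb1, emb2 = model(x1, x2)
            distance = F.pairwise_distance(emb1, emb2)
            predicted_dissimilar = (distance > threshold).float()
            correct += (predicted_dissimilar == y).sum().item()
            seen += y.numel()
    print(f"pair accuracy: {correct / seen:.3f}")
```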

Defining the Main Function
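Finally, a main function that wires the pieces together; the URLs are placeholders, not real endpoints, and should be pointed at actual IDX files:

```python
def main():
    train_set = SiamesePairDataset("https://example.com/train-images.idx",
                                   "https://example.com/train-labels.idx")
    test_set = SiamesePairDataset("https://example.com/test-images.idx",
                                  "https://example.com/test-labels.idx")
    train_loader = DataLoader(train_set, batch_size=BATCH_SIZE, shuffle=True)
    test_loader = DataLoader(test_set, batch_size=BATCH_SIZE)

    model = SiameseNetwork().to(DEVICE)
    train(model, train_loader)
    test(model, test_loader)

if __name__ == "__main__":
    main()
```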

In the code above, we built a dataset class that provides random sample images and trained the model with positive and negative pairs. In each iteration, we provide both positive and negative samples so that the model trains adequately.

Applications of Siamese Networks

So far we have learned a lot about the Siamese networks in NLP and even saw a code example. Let us now see the areas in which this technique is used.

  • The Siamese network is used in pedestrian tracking for video surveillance. For this, we combine the SNN with the position and size features of the image patches and track multiple people in the camera's field of view: we detect the people's positions and learn the associations across frames to compute their trajectories.
  • The SNN is used for one-shot learning of a training model. We only need to provide a few images per class to train the Siamese networks in NLP. Only a few data sets of images are sufficient to recognize similar images in the future.
  • Cosegmentation is another use case of the siamese networks in NLP.
  • We even use the Siamese networks to match the resumes to the job. The networks try to match the candidate's resume with the job role to find whether the job offering is similar to the candidate's skills or not.
  • We also use the Siamese networks to classify the MNIST images.

Conclusion

  • Siamese networks in NLP take only a few input images to train the neural network; they classify new images by comparing them with the previously provided examples.
  • A Siamese network is a network of similar neural networks in which we provide the input vectors to extract the features, and the features are later passed on to the triplet function in a few-shot learning process.
  • Siamese neural networks belong to the class of neural network architectures, and the word identical means that the involved sub-networks have the same configuration with the same weights and parameters.
  • In the triplet loss function, we compare the anchor or baseline input with the negative (dissimilar) input and the positive (similar) input.
  • Contrastive loss is the more popular and widely used of the two loss functions. It differs from a conventional error-predicting loss in that it is distance-based.
  • Siamese networks in NLP learn progressively: the deeper layers learn the input first, and the upper layers then build on top of them. This helps the network place the same types of classes or concepts close together and thus learn semantic similarities.
  • The Siamese network is used in Pedestrian tracking for video surveillance. Cosegmentation is another use case of the Siamese networks in NLP. We even use the Siamese networks to match the resumes to the job. We also use the Siamese networks to classify the MNIST images.