What is Deep Learning?
You may have wondered how Google translates an entire web page into a different language in a matter of seconds, or how shopping applications like Amazon and Flipkart suggest future purchases. All of this is a product of deep learning, but the question remains: what is deep learning?
Deep learning is a subset of machine learning that uses artificial neural networks, which are loosely inspired by the working of the human brain. In deep learning, a model learns to perform tasks such as classification and regression on complex data, which can be either structured (tabular datasets) or unstructured (e.g., images, text, or sound). These algorithms can achieve state-of-the-art accuracy that sometimes even surpasses human-level performance.
How Does Deep Learning Work?
The artificial neuron derives its name and meaning from the neuron in the brain. It can be seen as a switch that either passes its input on to the next layer or blocks it. In deep learning, it is referred to as an artificial neuron or perceptron.
In the image below, the circles represent interconnected neurons. They are grouped into a hierarchy of layers termed the Input, Hidden, and Output Layers.
Flow of Working of Different Layers
- The first layer, the input layer, receives the input data and passes it to the first hidden layer.
- The hidden layers then perform calculations on the received data. One of the biggest challenges in building a neural network is deciding the number of hidden layers and the number of neurons in each.
- Finally, the output layer takes the values passed in from the layers before it and computes the output via its neurons.
Deep learning requires a large amount of data for best results. While processing the data, neural networks learn to classify it using the labels provided with the dataset, through highly complex mathematical calculations. For example, in facial recognition, the model first learns to detect and recognize edges and lines of the face, then more significant features, and finally overall representations of the face.
In a neural network, every node receives information in the form of inputs. The node multiplies the inputs by weights (randomly initialized before training), sums the results, and adds a bias value. Finally, a nonlinear activation function is applied to the result to determine how strongly the neuron fires.
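The computation inside a single node can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the input values, the weight range, and the choice of the sigmoid as the activation function are hypothetical, and the names `sigmoid` and `neuron_output` are made up for this example.

```python
import math
import random


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron_output(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through a nonlinearity."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(weighted_sum)


random.seed(0)  # fixed seed so the random weights are reproducible
inputs = [0.5, -1.2, 3.0]  # hypothetical feature values
weights = [random.uniform(-1.0, 1.0) for _ in inputs]  # random initialization
bias = 0.0

activation = neuron_output(inputs, weights, bias)
print(f"activation = {activation:.3f}")
```

An activation close to 1 means the neuron fires strongly, and a value close to 0 means it stays mostly silent. A full layer simply repeats this calculation for many neurons that share the same inputs.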
Examples of Deep Learning
Computer Vision for Driverless Cars: Neural networks are used to interpret traffic signals, road scenarios, and speed limits. Training such models requires a large amount of data, which improves accuracy and thus leads to better decision making. Driverless cars use these neural networks to collect data on their surroundings from cameras and other sensors, interpret it, and decide what actions to take.
Virtual Assistants and Chatbots: Deep learning helps to understand and translate speech and human language, which is a common functionality of virtual assistants like Cortana, Google Assistant, and Alexa.
Recommendation Engines: Shopping and OTT applications like Amazon Prime and Netflix store users’ data and buying habits, which are then used to train recommendation models that suggest what a user might buy or watch next.
Medical and Pharmaceutical Industry: Deep learning has been widely used in customizing medicines based on a particular genome and disease, and has attracted increased attention from the largest pharmaceutical companies.
Other use cases of deep learning include fraud detection, facial recognition, and translation, among many others.
But Why Is Deep Learning Becoming So Popular Now?
The answer to the above question can be condensed into 4 major reasons:
Increase in the Volume of Data: Data is everywhere now, and deep learning algorithms use a huge amount of data to train models and learn the complex structure hidden within it. More data generally gives not only better learning but also better generalization.
Better Algorithms: New research papers are published almost every day in the field of deep learning, producing more and more new and improved algorithms.
Improved Computational Power: Deep learning is popular for its accuracy, especially on unstructured data (e.g., images, videos, and text), but it requires very high computational power, which is becoming cheaper over time. Moreover, cloud platforms like Google Colab, Kaggle, and Binder offer free hosted notebooks for research and learning purposes, and Colab and Kaggle include free GPUs.
Increased Demand: A large share of deep learning’s popularity is attributed to factors such as the growing usage of learning analytics, the growing acceptance of assistants driven by machine learning, NLP, and computer vision techniques, and an increase in demand for AI platforms for manufacturing operations.
Various Algorithms in Deep Learning
Deep learning algorithms are widely used in industries to solve complex problems. These algorithms use different types of neural networks to perform specific tasks.
Artificial Neural Networks (ANNs)
These networks can make predictions and classifications by studying patterns in the data. The more data is provided, the more patterns can be learned. The two main processes involved in the functioning of neural networks are feedforward and backpropagation.
A feedforward network such as an ANN calculates output values from input values, while backpropagation is the algorithm used to adjust the weights and biases based on the errors in those output values.
The learning process begins by randomly initializing the weights and choosing hyperparameters such as the learning rate and the number of epochs. We also have to determine the number of input neurons, hidden neurons, and output neurons. An epoch is one full pass over the training data; once a set number of epochs is reached, the learning process stops. We can also set a minimum error value at which to stop the learning process.
The learning rate determines the size of each update step: the greater the learning rate, the faster the model learns, although a value that is too large can make training unstable. The weighted sum of the inputs (the inputs multiplied by the weights) is passed into the activation function. A commonly used activation function is the sigmoid function.
Backpropagation is the method of evaluating and updating the parameters (weights and biases) used in the feedforward network. The weights are adjusted according to the error obtained. To calculate the error, loss functions such as MSE (Mean Squared Error) and SSE (Sum of Squared Errors) are used. Optimization algorithms such as Gradient Descent, RMSProp, and Adam then use the gradients computed by backpropagation to update the weights.
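To make the feedforward and backpropagation steps concrete, here is a minimal sketch that trains a single sigmoid neuron with plain gradient descent on the MSE. Everything specific in it is an assumption chosen for illustration: the OR-gate dataset, the learning rate of 0.5, the 5000 epochs, and the function names `predict`, `mse`, and `train`. A real network has many neurons and layers, but the update rule follows the same pattern.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(x, weights, bias):
    """Feedforward step: weighted sum plus bias, then the sigmoid activation."""
    return sigmoid(sum(xi * wi for xi, wi in zip(x, weights)) + bias)


def mse(data, weights, bias):
    """Mean squared error over the whole dataset."""
    return sum((predict(x, weights, bias) - y) ** 2 for x, y in data) / len(data)


def train(data, learning_rate=0.5, epochs=5000, seed=0):
    rng = random.Random(seed)
    weights = [rng.uniform(-0.5, 0.5) for _ in data[0][0]]  # random initialization
    bias = 0.0
    for _ in range(epochs):  # one epoch = one full pass over the data
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in data:
            out = predict(x, weights, bias)
            # Chain rule: d(MSE)/d(weighted sum) for this sample.
            delta = 2.0 * (out - y) * out * (1.0 - out) / len(data)
            for i, xi in enumerate(x):
                grad_w[i] += delta * xi
            grad_b += delta
        # Gradient descent: step against the gradient, scaled by the learning rate.
        weights = [w - learning_rate * g for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b
    return weights, bias


# Hypothetical toy dataset: the logical OR of two binary inputs.
or_gate = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
weights, bias = train(or_gate)
print("final MSE:", round(mse(or_gate, weights, bias), 4))
```

Calling `train(or_gate, epochs=0)` returns the untrained starting weights, which makes it easy to compare the error before and after training.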
Convolutional Neural Networks (CNNs)
These neural networks are popularly known as ConvNets. They consist of multiple layers and are mainly used for image processing and object detection. CNNs are mostly used to detect objects, process medical images, forecast time series, and detect anomalies.
ConvNets work with the help of multiple layers that process and extract features from images. Three types of layers make up the ConvNet architecture: convolutional layers, pooling layers, and fully-connected (FC) layers. Stacking these layers forms a CNN architecture. The network reduces the images into a form that is easier to process, without losing features that are critical for getting a good prediction.
Convolution Layer – The Kernel
The kernel is the element of the convolutional layer that carries out the convolution operation. In the image above, the kernel K is shown in yellow and has the form of a 3x3x1 matrix. Here, the kernel shifts 9 times because stride = 1 (non-strided); each time, an element-wise multiplication is performed between K and the portion of the image over which the kernel is hovering, and the products are summed. CNNs then apply a ReLU (Rectified Linear Unit) layer element-wise to the result. The output is a feature map.
ReLU function: f(u) = max(0, u)
That is, the output is the larger of 0 and u.
The output of the convolution layer is known as a feature map, which gives us information about the image such as its corners and edges. Later, this feature map is fed to other layers to learn several other features of the input image.
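The sliding-kernel operation and the ReLU step can be sketched as follows. The 5x5 input image and the 3x3 kernel values are hypothetical, chosen so that the kernel visits 9 positions at stride 1; the function assumes a square image and a square kernel, and the names `relu` and `convolve2d` are made up for this example.

```python
def relu(u):
    """ReLU: f(u) = max(0, u)."""
    return max(0.0, u)


def convolve2d(image, kernel, stride=1):
    """Slide the kernel over the image; at each position multiply
    element-wise, sum the products, and apply ReLU."""
    k = len(kernel)
    out_size = (len(image) - k) // stride + 1
    feature_map = []
    for out_row in range(out_size):
        row_values = []
        for out_col in range(out_size):
            top, left = out_row * stride, out_col * stride
            total = sum(
                image[top + i][left + j] * kernel[i][j]
                for i in range(k)
                for j in range(k)
            )
            row_values.append(relu(total))
        feature_map.append(row_values)
    return feature_map


# Hypothetical 5x5 single-channel image and 3x3 kernel.
image = [
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
]
kernel = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
]

feature_map = convolve2d(image, kernel)
for row in feature_map:
    print(row)
```

With these values the 5x5 image shrinks to a 3x3 feature map, one value for each of the 9 kernel positions.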
The feature map is fed into a pooling layer in the next step. Pooling is a method of down-sampling that reduces the dimensions of the feature map. Pooling leads to a decrease in the computational power required to process the data. Moreover, it helps in extracting features that do not change with change in position in the image, which further helps in maintaining the effectiveness of the trained model.
Pooling is of 2 types: Max Pooling and Average Pooling.
Max Pooling returns the maximum value from the part of the image covered by the kernel, while Average Pooling returns the average of all the values from that part.
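Both pooling types can be sketched with one small function. The 4x4 feature map, the 2x2 window, and the stride of 2 are assumptions for illustration, and `pool2d` is a hypothetical helper name.

```python
def pool2d(feature_map, size=2, stride=2, mode="max"):
    """Down-sample a 2D feature map with max or average pooling."""
    if mode == "max":
        reduce_window = max
    elif mode == "average":
        reduce_window = lambda window: sum(window) / len(window)
    else:
        raise ValueError(f"unknown pooling mode: {mode}")

    rows = (len(feature_map) - size) // stride + 1
    cols = (len(feature_map[0]) - size) // stride + 1
    pooled = []
    for r in range(rows):
        pooled_row = []
        for c in range(cols):
            # Collect the values covered by the window at this position.
            window = [
                feature_map[r * stride + i][c * stride + j]
                for i in range(size)
                for j in range(size)
            ]
            pooled_row.append(reduce_window(window))
        pooled.append(pooled_row)
    return pooled


# Hypothetical 4x4 feature map.
feature_map = [
    [1, 3, 2, 4],
    [5, 6, 1, 2],
    [7, 2, 9, 0],
    [3, 4, 1, 8],
]

max_pooled = pool2d(feature_map, mode="max")      # [[6, 4], [7, 9]]
avg_pooled = pool2d(feature_map, mode="average")  # [[3.75, 2.25], [4.0, 4.5]]
print(max_pooled)
print(avg_pooled)
```

In both cases the 4x4 map is reduced to 2x2, so later layers have a quarter as many values to process.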
Fully Connected Layer
Finally, the feature map is flattened into a column vector. This vector is fed to a feedforward neural network, which updates its weights using backpropagation in every training iteration. After training for a certain number of epochs, the model can distinguish between the more significant and the less significant (low-level) features in images, and it uses a softmax classifier (the final layer of the network, which yields a probability score for each class present in the dataset) to classify the images into the desired class.
Well-known ConvNet architectures include LeNet, AlexNet, VGGNet, and ResNet. Many such models are available with weights pre-trained on datasets like ImageNet and COCO, so that they can be reused in other models as required.
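The flattening step and the softmax classifier can be sketched like this. The class scores are hypothetical stand-ins for the outputs of the last fully connected layer, and `flatten` and `softmax` are illustrative helper names.

```python
import math


def flatten(feature_map):
    """Turn a 2D feature map into a single flat vector."""
    return [value for row in feature_map for value in row]


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    highest = max(scores)
    # Subtracting the maximum keeps exp() from overflowing on large scores.
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


vector = flatten([[6, 4], [7, 9]])  # -> [6, 4, 7, 9]

# Hypothetical raw scores for three classes from the final layer.
scores = [2.0, 1.0, 0.1]
probabilities = softmax(scores)
predicted_class = probabilities.index(max(probabilities))
print(probabilities, predicted_class)
```

The class with the highest probability is taken as the prediction, which here is the class at index 0.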
Recurrent Neural Networks (RNNs)
RNNs are a powerful and robust type of neural network that maintains an internal memory of previous inputs. These networks have connections that form directed cycles, allowing the output from the previous step to be fed as input to the current step. RNNs are widely used in image captioning, time series forecasting, named entity recognition, handwriting recognition, and language translation.
Below is an unfolded RNN, in which the single recurrent network is drawn once for each time step: the output at time (t-1) is fed as input at time t, and similarly the output at time t is fed as input at time (t+1). Recurrent neural networks can process inputs of any length.
A real-life example of an RNN application is Google’s Autocomplete feature, which draws on a vast repository of the most frequently occurring consecutive words. This data is fed into a recurrent neural network to train a model. Now, if you search on Google for ‘How to copy a cell in’, Google will suggest ‘Excel’ as the next word, thus auto-completing the search, because the model has already been trained on a vast volume of data that helps it predict the user’s next word. These auto-completions are rendered automatically by the trained algorithm, which produces the queries most likely to capture the searcher’s intent based on the user’s behavior and patterns of similar searches.
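The recurrence can be sketched with a deliberately tiny model: a single recurrent unit with scalar inputs and a scalar hidden state. The tanh activation and the weight values are assumptions for illustration, and `rnn_step` and `run_rnn` are hypothetical names. Real RNNs use weight matrices and learn them during training.

```python
import math


def rnn_step(x_t, h_prev, w_x, w_h, bias):
    """One time step: combine the current input with the previous state."""
    return math.tanh(w_x * x_t + w_h * h_prev + bias)


def run_rnn(sequence, w_x=0.5, w_h=0.8, bias=0.0):
    """Process a sequence of any length, reusing the same weights at each step."""
    hidden = 0.0  # initial state before any input has been seen
    states = []
    for x_t in sequence:
        hidden = rnn_step(x_t, hidden, w_x, w_h, bias)
        states.append(hidden)
    return states


states = run_rnn([1.0, 0.0, 0.0])
print(states)
```

Even though the second and third inputs are zero, the hidden state stays non-zero because it still carries information from the first input. This carried-over state is the "memory" of the network.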
Generative Adversarial Networks (GANs)
GANs are a generative modeling method that uses deep learning to create new data instances resembling the training data. These networks have two components: a generator, which learns to generate fake data, and a discriminator, which learns to tell that generated data apart from real data.
GANs are applied to satellite imagery and to simulating gravitational lensing in dark-matter research. Game developers use these networks to upscale low-resolution, low-dimensional textures in old video games, recreating them in 4K or higher resolutions through image training. GANs are also used to generate realistic human images and cartoon/anime characters.
How Do GANs Work?
- The discriminator learns to distinguish between the generated fake data and the real sample data.
- During the initial phase of training, the generator produces obviously fake data, and the discriminator quickly learns to recognize it as fake.
- The GAN then sends the results to the generator and the discriminator to update the model.
Diagram of how GANs work
A GAN performs one iteration of gradient descent (an optimization algorithm for finding a local minimum of a differentiable function) on the discriminator, using real images and images produced by the generator. Then the discriminator is held fixed and the generator is trained for a single iteration to fool the discriminator. Training alternates between the two in this way until the generator produces good-quality images and the discriminator can no longer differentiate between real and fake images.
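The alternating update scheme can be sketched on a toy problem with one-dimensional numbers instead of images. Everything here is an assumption made for illustration: the real data is drawn from a normal distribution with mean 4, the generator is the linear function G(z) = a*z + b, the discriminator is the logistic function D(x) = sigmoid(w*x + c), and the gradients are written out by hand. It shows the order of the updates only and is not guaranteed to converge.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_toy_gan(steps=2000, lr=0.01, seed=0):
    rng = random.Random(seed)
    a, b = 1.0, 0.0  # generator parameters: G(z) = a*z + b
    w, c = 0.1, 0.0  # discriminator parameters: D(x) = sigmoid(w*x + c)
    for _ in range(steps):
        # Discriminator step (generator fixed): one real and one fake sample.
        x_real = rng.gauss(4.0, 1.0)
        x_fake = a * rng.gauss(0.0, 1.0) + b
        d_real = sigmoid(w * x_real + c)
        d_fake = sigmoid(w * x_fake + c)
        # Gradients of the loss -[log D(real) + log(1 - D(fake))].
        grad_w = -(1.0 - d_real) * x_real + d_fake * x_fake
        grad_c = -(1.0 - d_real) + d_fake
        w -= lr * grad_w
        c -= lr * grad_c

        # Generator step (discriminator fixed): try to fool the discriminator.
        z = rng.gauss(0.0, 1.0)
        d_fake = sigmoid(w * (a * z + b) + c)
        # Gradient of the loss -log D(G(z)) with respect to G(z).
        grad_out = -(1.0 - d_fake) * w
        a -= lr * grad_out * z
        b -= lr * grad_out
    return (a, b), (w, c)


(gen_a, gen_b), (disc_w, disc_c) = train_toy_gan()
print("generator:", gen_a, gen_b)
print("discriminator:", disc_w, disc_c)
```

Each loop iteration performs one gradient descent step on the discriminator followed by one on the generator, which is the alternation described above.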
Key Differences Between Machine Learning and Deep Learning
|Machine Learning||Deep Learning|
|Machine Learning is a superset of Deep Learning.||Deep Learning is a subset of Machine Learning.|
|Human intervention is required to identify and hand-code the features used, based on the data.||Deep learning models try to learn those features without any additional human intervention.|
|These models can be trained on lower-end machines, without much computing power, in less time.||These models require much more computing power and more time to train, due to the complexity of the mathematical calculations involved in the algorithms.|
|Algorithms tend to parse data in folds (sections or samples of the data), and the results from those folds are then combined to come up with a final result.||Deep learning systems look at an entire problem in one fell swoop.|
|The two broad categories of machine learning are supervised and unsupervised learning.||Deep learning is mainly based on a layered structure of neurons called an artificial neural network.|
|Since it is mainly used for structured data (Excel files or SQL databases), the output is usually a numeric value, such as a class label or a regression score.||Since it is mainly used for unstructured data, the output is more diverse: a score, an element, a classification, or simply text.|
|Applications include medical diagnosis, statistical prediction, classification, and fraud detection.||Applications include virtual assistants, shopping and entertainment, facial recognition, language translation, pharmaceuticals, and computer vision.|
Venn Diagram to Understand Usually Confused Terms AI/ML/DL
Career Prospects in Deep Learning
As the demand for AI and machine learning has increased over time, companies require skilled professionals with thorough knowledge of and experience in these growing technologies.
For those who acquire skills in machine learning and deep learning, there is a wide range of jobs in multinational corporations across India and the world, including roles as software engineers, electronic engineering systems analysts, data scientists, and engineers.
Graduates with degrees in AI, data science, or computer science related fields, which provide a skill set useful in a broad range of application sectors, are well placed to secure a highly paid job. Job roles such as Data Scientist, Data Engineer, Data Analyst, Machine Learning Engineer, and Computer Vision Engineer are among the career opportunities in this domain.
So to conclude, deep learning is a branch of artificial intelligence that mimics the workings of the human brain in processing data for use in object detection, speech recognition, language translation, and decision making. Deep learning is able to learn with minimal human intervention, drawing from data that is both unstructured and structured. The first few layers in the neural network perform basic processes like feature extraction in a series of stages, similar to what the human brain seems to do. The level of complexity of such features increases through the network, with the actual decisions taking place in the last few layers of the network structure. Deep learning models can be trained with the help of various libraries like TensorFlow, Keras, PyTorch, and fast.ai.