There are many kinds of neural networks, but the recurrent neural network (RNN) stands out because it has an internal memory that lets it remember previous inputs.
Thanks to this internal memory, recurrent neural networks are very useful for recognizing patterns in sequences of data such as text, genomes, handwriting, the spoken word, or stock market prices.
But what does this mean for us? As humans, we unconsciously remember previous events in order to produce an output. For example, when you add 1+1+1, you first remember that 1+1=2, and then you add 1 to that 2 to get 3.
Traditional neural networks have a major drawback here: they assume that all inputs and outputs are independent of each other, which is a very bad idea in many tasks.
A traditional neural network would fail completely on our previous example.
It would not be able to add 1+1+1 step by step, because it treats each input independently and has no way to link the intermediate result 1+1=2 to the next operation.
For a recurrent neural network, this is easy. RNNs were designed to overcome this drawback, and they can form a much deeper understanding of a sequence and its context than algorithms that treat inputs independently.
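The 1+1+1 example can be sketched in a few lines of Python. This is only an analogy, not a neural network: the function names are made up for illustration, and the running total stands in for the internal memory described above.

```python
def stateless_add(inputs):
    """Treat every input independently: each output sees only its own input."""
    return [x for x in inputs]


def stateful_add(inputs):
    """Carry a running total (the 'memory') from one step to the next."""
    total = 0
    outputs = []
    for x in inputs:
        total = total + x  # combine the new input with what is remembered
        outputs.append(total)
    return outputs


print(stateless_add([1, 1, 1]))  # [1, 1, 1]
print(stateful_add([1, 1, 1]))   # [1, 2, 3]
```

Only the second function can link 1+1=2 to the next operation, because it keeps something around between steps.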
This ability helps us build highly sophisticated software such as Siri from Apple and Google Translate and Google Voice from Google.
How does it work?
Before getting into how a recurrent neural network works, we first need to understand how feedforward neural networks work.
The major difference between these two kinds of network is the way they channel information through the series of mathematical operations performed at the nodes of the network.
Feedforward neural networks are the most basic type of artificial neural network; information flows only in the forward direction through every layer of the network.
The input layer accepts the inputs and feeds them to the hidden layers, which perform the calculations. The result is then forwarded to the output layer to produce the output.
In the simplest case, each neuron compares the weighted sum of its inputs with a threshold (usually 0). If the sum is above the threshold, the neuron fires and emits an activated value (usually 1); otherwise it emits a deactivated value (usually -1).
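As a rough sketch of that firing rule, a single threshold neuron might look like this in Python. The function name, the weights and the inputs are hypothetical values chosen only to show both outcomes.

```python
def threshold_neuron(inputs, weights, bias=0.0, threshold=0.0):
    """Return 1 if the weighted sum of the inputs exceeds the threshold, else -1."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if weighted_sum > threshold else -1


# Hypothetical weights, for illustration only.
print(threshold_neuron([1.0, 0.5], [0.8, -0.2]))   # 0.8 - 0.1 = 0.7 > 0, prints 1
print(threshold_neuron([1.0, 0.5], [-0.8, 0.2]))   # -0.8 + 0.1 = -0.7, prints -1
```

Networks trained with backpropagation usually replace this hard step with a smooth activation function, so that derivatives can be computed.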
Because information flows this way, it never touches a node twice in a feedforward neural network.
A feedforward neural network also has no memory, so it cannot remember the inputs it received previously and therefore performs poorly at predicting what comes next.
But that doesn’t mean feedforward networks are useless; they are very useful for classification. For example, given an image, a feedforward network may classify it as a bus, a van, a ship, and so on. The network has to be trained before it can make such predictions.
A recurrent neural network, on the other hand, works on the principle of saving the output of a layer and feeding it back to the input to help predict the layer’s next output.
In a recurrent neural network, the first layer is computed just as in a feedforward network, as the weighted sum of the input features.
The recurrent process starts once this is computed: from one time step to the next, each neuron remembers some of the information it had at the previous time step.
That sequential information is preserved in the recurrent network’s hidden state, which manages to span many time steps as it cascades forward to affect the processing of each new example.
The network finds correlations between events separated by many moments. These correlations are called “long-term dependencies” because an event downstream in time depends upon, and is a function of, one or more events that came before.
One way to think about recurrent neural networks is this: they are a way to share weights over time.
This makes each neuron act like a memory cell while performing computations. During forward propagation, the network has to carry along whatever information it will need later.
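A minimal sketch of this recurrence, assuming a single neuron with a scalar hidden state and a tanh activation, is shown below. The names rnn_step and rnn_forward and the weight values are assumptions made for illustration; a real network would use weight matrices and learned values.

```python
import math


def rnn_step(x, h_prev, w_x, w_h, b):
    """One time step: mix the new input with the previous hidden state."""
    return math.tanh(w_x * x + w_h * h_prev + b)


def rnn_forward(xs, w_x=0.5, w_h=0.8, b=0.0, h0=0.0):
    """Run the same weights over every time step and collect the hidden states."""
    h = h0
    states = []
    for x in xs:
        h = rnn_step(x, h, w_x, w_h, b)  # the same w_x, w_h, b at each step
        states.append(h)
    return states


# Only the first input is non-zero, yet later states are still non-zero:
# the hidden state carries a fading trace of that first input forward.
states = rnn_forward([1.0, 0.0, 0.0])
print(states)
```

The loop reuses one set of weights at every step, which is what sharing weights over time means in practice.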
If the prediction is wrong, the error is propagated backward during backpropagation, and the learning rate controls how large each small weight correction is, so that the network gradually works towards making the right prediction.
In an RNN, information loops within every layer. Recurrent networks are distinguished from feedforward networks by that feedback loop connected to their past decisions, ingesting their own outputs moment after moment as input.
In theory, a recurrent neural network can make use of information in arbitrarily long sequences, but in practice a standard RNN is limited to looking back only a few steps.
Here is what a typical recurrent neural network looks like:
The above diagram shows a recurrent neural network being unrolled (or unfolded) into a full network. By unrolling we simply mean that we write out the network for the complete sequence.
For example, if the sequence we care about is a sentence of 5 words, the network would be unrolled into a 5-layer neural network, one layer for each word.
Recurrent neural networks come in different types, depending on how many inputs and outputs they have. A One to One network has a single input and a single output, and a One to Many network has a single input and multiple outputs.
A Many to One network has multiple inputs and a single output, and a Many to Many network has multiple inputs and multiple outputs.
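The difference between these types comes down to which inputs are fed in and which outputs are kept. The sketch below illustrates this with a toy scalar recurrence; the step function, its fixed weights, and the choice to feed zeros after the first input in the One to Many case are all assumptions made for illustration.

```python
import math


def step(x, h):
    """Toy recurrence with hypothetical fixed weights (0.5 and 0.8)."""
    return math.tanh(0.5 * x + 0.8 * h)


def many_to_many(xs):
    """Multiple inputs, one output kept per time step."""
    h, outputs = 0.0, []
    for x in xs:
        h = step(x, h)
        outputs.append(h)
    return outputs


def many_to_one(xs):
    """Multiple inputs, only the final output kept."""
    return many_to_many(xs)[-1]


def one_to_many(x, n_steps):
    """Single input, then the state alone drives the remaining outputs."""
    h = step(x, 0.0)
    outputs = [h]
    for _ in range(n_steps - 1):
        h = step(0.0, h)  # assumption: no new input after the first step
        outputs.append(h)
    return outputs


print(len(many_to_many([1.0, 2.0, 3.0])))  # 3
print(many_to_one([1.0, 2.0, 3.0]))
print(len(one_to_many(1.0, 4)))            # 4
```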
How RNNs learn
Artificial neural networks are built from interconnected data processing components that are loosely designed to function like the human brain.
They are composed of layers of artificial neurons, or network nodes, that can process input and forward output to other nodes in the network.
The nodes are connected by edges, each with a weight that influences the strength of a signal and the network’s ultimate output.
In some cases, artificial neural networks process information in a single direction from input to output, as we saw with feedforward neural networks. RNNs, on the other hand, can be layered to process information in two directions.
Unlike feedforward neural networks, RNNs use feedback loops throughout the computational process to loop information back into the network, and they are trained with an algorithm called Backpropagation Through Time, or BPTT.
The feedback loops connect inputs together and are what enable RNNs to process sequential and temporal data.
Backpropagation Through Time
Backpropagation Through Time, or BPTT, is the training algorithm used to update the weights in recurrent neural networks, including LSTMs.
But what does backpropagation mean?
Backpropagation is a technique for working out how much each weight in a network contributed to the error of its outputs. Its ultimate goal is to minimize that error.
In neural networks, we first do forward propagation to get the output of the model and compare it with the correct output, which gives us the error.
In backpropagation, we then move backward through the network to find the partial derivatives of the error with respect to the weights. Each derivative, scaled by the learning rate, is subtracted from its weight to reduce the error.
The partial derivative ∂E/∂w expresses the relationship between the rates of change of the error and of a weight. These derivatives are used by our learning rule, gradient descent, to adjust the weights up or down, whichever direction decreases the error.
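To make the update rule concrete, here is a small sketch for the simplest possible model, a single weight with output y = w * x and error E = 0.5 * (y - target)^2, so that ∂E/∂w = (y - target) * x. The model, the function name and the numbers are assumptions chosen for illustration.

```python
def train_single_weight(x, target, w=0.0, learning_rate=0.1, steps=50):
    """Repeatedly apply gradient descent to one weight of the model y = w * x."""
    for _ in range(steps):
        y = w * x                          # forward propagation
        error_grad = (y - target) * x      # dE/dw for E = 0.5 * (y - target)**2
        w -= learning_rate * error_grad    # step against the gradient
    return w


# With x = 2 and target = 6 the error is smallest at w = 3.
w = train_single_weight(x=2.0, target=6.0)
print(w)  # very close to 3.0
```

When the output is too small the derivative is negative and the weight goes up; when it is too large the weight goes down.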
That is exactly how a neural network learns during the training process.
Backpropagation Through Time is the application of the backpropagation training algorithm to a recurrent neural network processing sequence data such as a time series.
The recurrent neural network is shown one input at each timestep and predicts the corresponding output. Conceptually, BPTT works by unrolling the network over all input timesteps.
Each timestep has one input, one output, and one copy of the network. The errors are calculated and accumulated across the timesteps, and the network is then rolled back up and its weights are updated.
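The procedure can be sketched for a toy scalar RNN with h_t = tanh(w_x * x_t + w_h * h_(t-1)), where the output at each step is taken to be the hidden state and the error is a sum of squared differences. These modelling choices, the function names and the numbers are assumptions made to keep the example short.

```python
import math


def forward(xs, w_x, w_h):
    """Unroll the network: one hidden state per input time step."""
    h, hs = 0.0, []
    for x in xs:
        h = math.tanh(w_x * x + w_h * h)
        hs.append(h)
    return hs


def loss(xs, targets, w_x, w_h):
    """Sum of squared errors accumulated over all time steps."""
    return sum(0.5 * (h - t) ** 2 for h, t in zip(forward(xs, w_x, w_h), targets))


def bptt(xs, targets, w_x, w_h):
    """Walk backward through the unrolled steps, accumulating the gradients."""
    hs = forward(xs, w_x, w_h)
    grad_w_x = grad_w_h = 0.0
    dh_next = 0.0  # gradient arriving from later time steps
    for t in reversed(range(len(xs))):
        h_prev = hs[t - 1] if t > 0 else 0.0
        dh = (hs[t] - targets[t]) + dh_next   # local error plus future error
        da = dh * (1.0 - hs[t] ** 2)          # back through tanh
        grad_w_x += da * xs[t]
        grad_w_h += da * h_prev
        dh_next = da * w_h                    # pass the gradient one step back
    return grad_w_x, grad_w_h


xs, targets = [1.0, 0.5, -0.5], [0.5, 0.2, -0.1]
print(bptt(xs, targets, w_x=0.4, w_h=0.7))
```

Because the same two weights are used at every step, their gradients are sums of contributions from all time steps.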
But as the number of time steps increases, so does the computation, which is a disadvantage of BPTT. The model can become noisy, and the high cost of a single parameter update makes full BPTT impractical for a large number of iterations.
This problem is addressed by Truncated Backpropagation Through Time, a modified version of BPTT for recurrent neural networks.
In this technique, the sequence is processed one timestep at a time, and periodically a BPTT update is performed over a fixed number of time steps.
Truncated BPTT is an approximation of full BPTT that is preferred for long sequences, since full BPTT’s forward/backward cost per parameter update becomes very high over many time steps.
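One common way to implement this, sketched below for a toy scalar RNN, is to split the sequence into windows of k steps, run BPTT inside each window only, and carry the hidden state forward as a plain number. The function name, the window size, the learning rate and the squared-error loss are assumptions for illustration; other truncation schemes exist.

```python
import math


def tbptt_train(xs, targets, w_x, w_h, k=4, learning_rate=0.05):
    """Truncated BPTT: one weight update per window of k time steps."""
    h_carry = 0.0  # hidden state carried across windows (no gradient through it)
    updates = 0
    for start in range(0, len(xs), k):
        chunk_x = xs[start:start + k]
        chunk_t = targets[start:start + k]

        # Forward pass through this window, starting from the carried state.
        hs = [h_carry]
        for x in chunk_x:
            hs.append(math.tanh(w_x * x + w_h * hs[-1]))

        # Backward pass restricted to this window.
        g_x = g_h = 0.0
        dh_next = 0.0
        for i in reversed(range(len(chunk_x))):
            dh = (hs[i + 1] - chunk_t[i]) + dh_next
            da = dh * (1.0 - hs[i + 1] ** 2)
            g_x += da * chunk_x[i]
            g_h += da * hs[i]
            dh_next = da * w_h

        w_x -= learning_rate * g_x
        w_h -= learning_rate * g_h
        h_carry = hs[-1]
        updates += 1
    return w_x, w_h, updates


xs = [0.1 * i for i in range(10)]
targets = [0.05 * i for i in range(10)]
print(tbptt_train(xs, targets, w_x=0.4, w_h=0.7, k=4))
```

Each update costs at most k backward steps regardless of how long the full sequence is, at the price of ignoring dependencies that reach further back than the window.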
Long Short-Term Memory units
One major drawback of the standard recurrent neural network is the vanishing gradient problem: as the error is propagated backward, the gradients shrink, so the network’s performance suffers and it cannot be trained properly.
This problem occurs especially in deeply layered neural networks, which are used to process complex data, and it becomes worse as the number of layers in the architecture increases.
Standard RNNs trained with a gradient-based learning method degrade as they get bigger and more complex. Tuning the parameters effectively at the earliest layers becomes too time-consuming and computationally expensive.
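The shrinking can be seen numerically in a toy scalar RNN with a tanh activation, where the gradient passed back one step is multiplied by w_h * (1 - h_t^2). The sketch below tracks the running product of these factors; the function name and the weight values are assumptions chosen for illustration.

```python
import math


def gradient_factors(num_steps, w_x=0.5, w_h=0.9, x=1.0):
    """Running product of d h_t / d h_(t-1) over an unrolled scalar RNN."""
    h = 0.0
    factor = 1.0
    factors = []
    for _ in range(num_steps):
        h = math.tanh(w_x * x + w_h * h)
        factor *= w_h * (1.0 - h * h)  # derivative of this step w.r.t. the previous state
        factors.append(factor)
    return factors


factors = gradient_factors(50)
print(factors[0], factors[9], factors[49])
```

With these values every step multiplies the gradient by a number below 1, so after 50 steps almost nothing of the error signal reaches the earliest steps.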
One solution to this problem is Long Short-Term Memory, or LSTM. LSTMs enable RNNs to remember their inputs over a long period of time.
With LSTM units, the recurrent neural network can figure out which data is important and should be remembered and looped back into the network, and which data can be forgotten. This is possible because LSTMs keep their information in a memory.
This memory is much like the memory of a computer, because the LSTM can read, write, and delete information from it.
The memory can be seen as a gated cell, where gated means that the cell decides whether to store or delete information based on the importance it assigns to that information.
Importance is assigned through weights, which are also learned by the algorithm. This simply means that the network learns over time which information is important and which is not.
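A single step of a gated cell can be sketched as follows, using the commonly taught LSTM formulation with forget, input and output gates and scalar values for brevity. The parameter names in the dictionary and their values are hypothetical; in a real network they are matrices learned during training.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, p):
    """One LSTM time step with scalar state; p holds the gate weights."""
    f = sigmoid(p["w_f"] * x + p["u_f"] * h_prev + p["b_f"])    # forget gate: delete
    i = sigmoid(p["w_i"] * x + p["u_i"] * h_prev + p["b_i"])    # input gate: write
    o = sigmoid(p["w_o"] * x + p["u_o"] * h_prev + p["b_o"])    # output gate: read
    g = math.tanh(p["w_g"] * x + p["u_g"] * h_prev + p["b_g"])  # candidate value
    c = f * c_prev + i * g   # keep part of the old memory, add part of the new
    h = o * math.tanh(c)     # expose part of the memory as the output
    return h, c


# Hypothetical parameters, for illustration only.
params = {name: 0.5 for name in
          ["w_f", "u_f", "w_i", "u_i", "w_o", "u_o", "w_g", "u_g"]}
params.update({"b_f": 0.0, "b_i": 0.0, "b_o": 0.0, "b_g": 0.0})

h, c = 0.0, 0.0
for x in [1.0, 0.0, 0.0]:
    h, c = lstm_step(x, h, c, params)
    print(h, c)
```

Each gate is a value between 0 and 1 computed from learned weights, which is how the cell assigns importance: a forget gate near 1 with an input gate near 0 leaves the memory almost untouched.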
A recurrent neural network built with LSTM units separates what it remembers into short-term and long-term memory.
Long Short-Term Memory units were invented by computer scientists Sepp Hochreiter and Jürgen Schmidhuber in 1997.
Create RNN from scratch with python