
Steps for training a Recurrent Neural Network

Given below are the steps for training a recurrent neural network; a minimal code sketch follows the list.

1. The initial input is sent into the network, with every unit having the same weights and
activation function.
2. Using the current input and the previous state output, the current state is calculated.
3. The current state h_t then becomes h_{t-1} for the next time step.
4. This repeats for all subsequent steps; the recurrence can run for as many steps as the
problem requires, combining information from all the previous steps.
5. The final output is then calculated from the state at the final time step, which carries
information from all the previous steps.
6. An error is then generated by calculating the difference between the actual output and the
output produced by our RNN model.
7. Finally, backpropagation occurs, wherein the error is backpropagated through the network
to update the weights.
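
A minimal sketch of these seven steps in Python/NumPy is given below. All sizes, the toy data, and the learning rate are illustrative stand-ins, not values from the text; the loss is a simple squared error on the final output.

    # Minimal sketch of the seven training steps for a vanilla RNN (illustrative values).
    import numpy as np

    rng = np.random.default_rng(0)
    n_in, n_hid, n_out, T = 3, 5, 2, 4          # toy dimensions and sequence length
    Wxh = rng.normal(0, 0.1, (n_hid, n_in))     # shared input-to-hidden weights (step 1)
    Whh = rng.normal(0, 0.1, (n_hid, n_hid))    # shared hidden-to-hidden weights
    Why = rng.normal(0, 0.1, (n_out, n_hid))    # hidden-to-output weights
    xs = rng.normal(size=(T, n_in))             # toy input sequence
    target = rng.normal(size=n_out)             # toy target for the final output

    for epoch in range(100):
        # Steps 2-4: h_t is computed from x_t and h_{t-1}, repeated for all steps.
        hs = {-1: np.zeros(n_hid)}
        for t in range(T):
            hs[t] = np.tanh(Wxh @ xs[t] + Whh @ hs[t - 1])
        # Step 5: the final output is computed from the last hidden state.
        y = Why @ hs[T - 1]
        # Step 6: error between the actual output and the RNN's output.
        err = y - target                        # gradient of 0.5 * ||y - target||^2
        # Step 7: backpropagation through time to update the shared weights.
        dWxh, dWhh = np.zeros_like(Wxh), np.zeros_like(Whh)
        dWhy = np.outer(err, hs[T - 1])
        dh = Why.T @ err                        # error flowing into the last hidden state
        for t in reversed(range(T)):
            da = (1 - hs[t] ** 2) * dh          # tanh'(a_t) = 1 - h_t^2
            dWxh += np.outer(da, xs[t])
            dWhh += np.outer(da, hs[t - 1])
            dh = Whh.T @ da                     # pass the error one step back in time
        for W, dW in ((Wxh, dWxh), (Whh, dWhh), (Why, dWhy)):
            W -= 0.05 * dW                      # plain gradient-descent update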

Advantages & Disadvantages of Recurrent Neural Network

The advantages and disadvantages of recurrent neural networks are listed below.


Advantages

1. RNN can process inputs of any length.


2. An RNN model is designed to remember information across all time steps, which is very
helpful in any time-series predictor.
3. The model size does not increase with the size of the input.
4. The weights can be shared across the time steps.
5. RNNs can use their internal memory to process arbitrary sequences of inputs, which is
not the case with feed-forward neural networks.

Disadvantages

1. Due to its recurrent nature, the computation is slow.


2. Training of RNN models can be difficult.
3. If we use ReLU or tanh as the activation function, it becomes very difficult to process
sequences that are very long.
4. RNNs are prone to problems such as exploding and vanishing gradients.

Mathematical Analysis of Recurrent Neural Networks


Math in a Vanilla Recurrent Neural Network

1. Vanilla Forward Pass

2. Vanilla Backward Pass

3. Vanilla Bidirectional Pass

4. Training of Vanilla RNN

5. Vanishing and exploding gradient problems

1. Vanilla Forward Pass

The forward pass of a vanilla RNN


a. The same as that of an MLP with a single hidden layer

b. Except that activations arrive at the hidden layer from both the current
external input and the hidden layer activations one step back in time.

For the input to hidden units we have (taking tanh as the hidden activation)

    a_t = W_xh x_t + W_hh h_{t-1}
    h_t = tanh(a_t)

For the output unit we have

    y_t = W_hy h_t

The complete sequence of hidden activations can be calculated by starting at t = 1 and recursively applying the three equations, incrementing t at each step.
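
A minimal sketch of the three forward-pass equations in Python/NumPy; sizes and values are illustrative:

    import numpy as np

    n_in, n_hid, n_out = 3, 5, 2
    Wxh = np.full((n_hid, n_in), 0.1)
    Whh = np.full((n_hid, n_hid), 0.1)
    Why = np.full((n_out, n_hid), 0.1)
    x_t = np.ones(n_in)
    h_prev = np.zeros(n_hid)            # h_0 = 0 at t = 1

    a_t = Wxh @ x_t + Whh @ h_prev      # input-to-hidden activation
    h_t = np.tanh(a_t)                  # hidden state
    y_t = Why @ h_t                     # output unit activation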
2. Vanilla Backward Pass

Given the partial derivatives of the objective function with respect to the network outputs, we now need the derivatives with respect to the weights. We focus on BPTT since it is both conceptually simpler and more efficient in computation time (though not in memory). Like standard back-propagation, BPTT consists of a repeated application of the chain rule.

Back-propagation through time

Don't be fooled by the fancy name: it is just standard back-propagation applied to the network unrolled over time.

1. The complete sequence of delta terms can be calculated by starting at t = T and recursively
applying the function below, decrementing t at each step:

    δ_t = tanh'(a_t) ⊙ (W_hy^T e_t + W_hh^T δ_{t+1})

where e_t is the derivative of the objective function with respect to the output y_t and ⊙ is the element-wise product.

2. Note that δ_{T+1} = 0, since no error is received from beyond the end of the sequence.

3. Finally, bearing in mind that the weights to and from each unit in the hidden layer are the same at every time-step, we sum over the whole sequence to get the derivatives with respect to each of the network weights, e.g.

    ∂L/∂W_hh = Σ_t δ_t h_{t-1}^T
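
A hedged sketch of the delta recursion and the weight-sharing sum in Python/NumPy; the hidden states and per-step output errors here are made-up stand-ins for values produced by a real forward pass:

    import numpy as np

    T, n_hid, n_out = 4, 5, 2
    Whh = np.full((n_hid, n_hid), 0.1)
    Why = np.full((n_out, n_hid), 0.1)
    h = np.tanh(np.random.default_rng(0).normal(size=(T, n_hid)))  # stand-in hidden states
    e = np.random.default_rng(1).normal(size=(T, n_out))           # stand-in dL/dy_t values

    delta = np.zeros((T + 1, n_hid))    # delta[T] = 0: no error beyond the sequence end
    for t in reversed(range(T)):
        # error from this step's output plus error flowing back from step t+1
        delta[t] = (1 - h[t] ** 2) * (Why.T @ e[t] + Whh.T @ delta[t + 1])

    # Because the weights are shared across time, the gradient is a sum over t:
    dWhy = sum(np.outer(e[t], h[t]) for t in range(T))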

3. Vanilla Bidirectional Pass

For many sequence labeling tasks, we would like to have access to future context as well as past context.
1. The algorithm looks like this: run the forward pass over the forward hidden layer from t = 1 to T and over the backward hidden layer from t = T to 1, storing both sequences of hidden activations; then compute the output at each step from both hidden layers. The backward pass visits the two hidden layers in the opposite directions.
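
A minimal sketch of the bidirectional forward pass in Python/NumPy; the weight names and sizes are illustrative:

    import numpy as np

    rng = np.random.default_rng(0)
    T, n_in, n_hid, n_out = 4, 3, 5, 2
    xs = rng.normal(size=(T, n_in))
    Wf, Uf = rng.normal(0, 0.1, (n_hid, n_in)), rng.normal(0, 0.1, (n_hid, n_hid))
    Wb, Ub = rng.normal(0, 0.1, (n_hid, n_in)), rng.normal(0, 0.1, (n_hid, n_hid))
    Vf, Vb = rng.normal(0, 0.1, (n_out, n_hid)), rng.normal(0, 0.1, (n_out, n_hid))

    hf = np.zeros((T, n_hid))                        # forward hidden layer
    hb = np.zeros((T, n_hid))                        # backward hidden layer
    for t in range(T):                               # forward states, t = 1..T
        prev = hf[t - 1] if t > 0 else np.zeros(n_hid)
        hf[t] = np.tanh(Wf @ xs[t] + Uf @ prev)
    for t in reversed(range(T)):                     # backward states, t = T..1
        nxt = hb[t + 1] if t < T - 1 else np.zeros(n_hid)
        hb[t] = np.tanh(Wb @ xs[t] + Ub @ nxt)
    # each output sees both past (hf) and future (hb) context
    ys = np.array([Vf @ hf[t] + Vb @ hb[t] for t in range(T)])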

4. Training of Vanilla RNN


We have discussed how RNNs can be differentiated with respect to suitable objective
functions, and thereby can be trained with any gradient-descent-based algorithm.

• Just treat them as a normal CNN with weights shared across time steps.

One of the great things about RNNs is that they leave lots of engineering choices:

• Preprocessing and post-processing

5. Vanishing and exploding gradient problems

During backpropagation, the same recurrent weight matrix is multiplied in at each time step. With a similar but simpler RNN formulation, it is easy to see how the gradient vanishes: repeated multiplication by a matrix with small weights shrinks the error signal geometrically, while large weights make it explode. A small numeric sketch is given below.
Applications of Recurrent Neural Networks (RNNs)

A. Prediction problems
RNNs are generally useful for sequence prediction problems, which come in many forms and
are best described by the types of inputs and outputs they support.

Sequence prediction problems include:

One-to-Many: An observation is mapped as input to a sequence with multiple steps as output.

Many-to-One: A sequence of multiple steps as input is mapped to a class or quantity prediction.

Many-to-Many: A sequence of multiple steps as input is mapped to a sequence with multiple
steps as output. The Many-to-Many problem is often referred to as sequence-to-sequence; illustrative shapes are sketched below.
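
As an illustration, the three types can be described by the (time steps, features) shapes of their input and output tensors; the sizes below are made up:

    # Illustrative tensor shapes for the three problem types (sizes are made up).
    T_in, T_out, n_feat, n_class = 5, 7, 8, 3
    one_to_many  = ((1, n_feat),    (T_out, n_class))   # one observation -> sequence
    many_to_one  = ((T_in, n_feat), (1, n_class))       # sequence -> single prediction
    many_to_many = ((T_in, n_feat), (T_out, n_class))   # sequence -> sequence (seq2seq)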

The problem with recurrent neural networks was that they were traditionally difficult to train.
The Long Short-Term Memory, or LSTM, network is one of the most successful RNN variants because
it solves the problems of training a recurrent network and has in turn been used on a wide range
of applications. RNNs and LSTMs have seen the most success when working with sequences of
words and paragraphs, generally in the field of natural language processing (NLP). They are also
used as generative models that produce sequence output, not only for text but for applications
such as generating handwriting.

B. Language Modelling and Generating Text


Taking a sequence of words as input, we try to predict the probability of the next word. This can
be considered one of the most useful approaches for translation, since the most likely
sentence would be the one that is correct. In this method, the output probability at a
particular time step is used to sample the word for the next iteration.
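
A hedged sketch of this sampling loop in Python/NumPy; the vocabulary and the (untrained) weights are made-up stand-ins for a trained language model:

    import numpy as np

    rng = np.random.default_rng(0)
    vocab = ["the", "cat", "sat", "on", "mat"]
    n_hid = 8
    Wxh = rng.normal(0, 0.3, (n_hid, len(vocab)))   # stand-in trained weights
    Whh = rng.normal(0, 0.3, (n_hid, n_hid))
    Why = rng.normal(0, 0.3, (len(vocab), n_hid))

    h = np.zeros(n_hid)
    word = 0                                        # start from "the"
    for _ in range(6):
        x = np.eye(len(vocab))[word]                # one-hot encode the current word
        h = np.tanh(Wxh @ x + Whh @ h)
        logits = Why @ h
        p = np.exp(logits - logits.max())
        p /= p.sum()                                # softmax over the vocabulary
        word = rng.choice(len(vocab), p=p)          # sample the next word from p
        print(vocab[word], end=" ")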

C. Machine Translation
RNNs in one form or another can be used for translating text from one language to another.
Almost all of the translation systems in use today employ some advanced version of an RNN.
The input can be in the source language and the output will be in the target language the user
wants.

Currently, one of the most popular and prominent machine translation applications is Google
Translate. There are also numerous custom recurrent neural network applications used by various
platforms to refine and curate content. E-commerce platforms like Flipkart, Amazon, and
eBay make use of machine translation in many areas, and it also helps with the efficiency of
search results.
D. Speech Recognition
RNNs can be used for predicting phonetic segments, taking sound waves from a medium as
the input source. The set of inputs consists of phonemes or acoustic signals from an audio
recording, processed in a suitable manner and taken as inputs. The RNN computes the
phonemes and then produces a phonetic segment along with the likelihood of the output. The steps
used in speech recognition are as follows:

• The input data is first processed and recognized through a neural network. The result
consists of a varied collection of input sound waves.

• The information contained in the sound waves is further classified by intent and by
keywords related to the query.

• The input sound waves are then classified into phonetic segments and pieced together
into cohesive words using an RNN application. The output consists of a pattern of phonetic
segments put together into a single whole in a logical manner.

Read this paper to learn more about speech recognition using RNNs.

E. Generating Image Descriptions


A combination of CNNs and RNNs is used to provide a description of what exactly is
happening inside an image. The CNN does the segmentation, and the RNN then uses the
segmented data to generate the description.

F. Call Center Analysis


This can be considered one of the major applications of RNNs in the field of audio processing.
Customer metrics are usually measured on the outcome of the call rather than the call itself.
However, analyzing the call itself provides businesses with insight into why the support staff
succeeded and what steps were taken to resolve the customer's issue. This learning can then
be studied and reapplied to other similar scenarios or used to train new support representatives.
The entire process can hence be automated by using recurrent neural networks to process
and synthesize actual speech from the call for analysis purposes. Such synthesized speech can be
further used as input to a tone-analysis algorithm that measures the emotions and sentiments
in various parts of the conversation. This helps the business identify when the
customer is satisfied with the service and support, and when a customer has faced issues.
Although these things can be done through simple human observation, automation helps
businesses quantify the results and derive insights from them for future use, since all
the output data is stored in a database.

G. Face detection, OCR Applications as Image Recognition

Image recognition is one of the major applications of computer vision. It is also one of the most
accessible forms of RNN to explain. At its core, the algorithm is designed to take one unit of
an image as input and produce a description of the image in the form of multiple groups of
output.

The image recognition framework includes:


• A convolutional neural network that processes the image and recognizes the features of the
picture,

• A recurrent neural network that makes use of the recognized features to make sense of the image
and put together a proper description of the input image.

There are numerous benefits of image recognition in the field of business: it can be used as a
streamlining tool that makes it easier for the customer to operate the service, find relevant
images, navigate through information, and make purchases, furthermore adding to the security of
the customer. The most prominent industries making use of image recognition are search engines,
e-commerce, social media, security, and networking.
Limitations of Recurrent Neural Networks

The basic RNN has some important drawbacks:

• Training time: Training an RNN is known to be very slow. For example, the RNNs used in the
experiments described in Mikolov (2010) took several weeks to train, although the authors
considered only about 17% of the NYT section of the English Gigaword corpus for training. It
usually takes about 10–50 training epochs to achieve convergence, although cases have been
reported where even thousands of epochs were needed. In addition, the size of the vocabulary |V|,
which for many language and speech applications is very large, plays a crucial role in the real
complexity of the training.
• Fixed number of hidden neurons: The number of hidden neurons n_H has to be fixed in advance.
However, in practice the user has no clue how to choose an appropriate number of hidden neurons,
since there is no generally accepted method for determining this number. Many rules of thumb are
available, but they can give very different values for n_H.
• Small context size in practice: Although in theory the context size that can be taken into
account is unlimited (if the genuine consecutive scheme is used, the history of words taken into
account equals all previous words relative to the current word), the range of context that is
actually accessed is quite limited. This observation is often referred to as the vanishing
gradient problem.

References

[1] Richard Socher, "Recurrent Neural Networks", CS224d: Deep Learning for Natural Language
Processing, Stanford University.

[2] Alex Graves, "Supervised Sequence Labelling with Recurrent Neural Networks", doctoral
thesis (Dr. rer. nat.).

[3] Priya Pedamkar, "Recurrent Neural Networks".

[4] Wim De Mulder, Steven Bethard, and Marie-Francine Moens, "A survey on the application of
recurrent neural networks to statistical language modeling".
