
RNN (Recurrent Neural Networks)

RNN:
 This type of neural network saves the output of a particular layer and feeds it back to the input in order to predict the output of the layer.
 A feed-forward neural network can be converted into a Recurrent Neural Network.
RNN:
 The nodes in the different layers of the network are compressed to form a single recurrent layer.


 A, B, and C are the parameters of the network.
Fully Connected RNN:
 “x” is the input layer, “h” is the hidden layer, and “y” is the output layer.
 A, B, and C are the network parameters (weight matrices) that are learned to improve the output of the model.
 At any given time instance ‘t’, the current input is a combination of the input x(t) and the state carried over from step t-1.
 The output at any given time is fed back to the input in order to improve the output, as the sketch below illustrates.
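 One common form of this recurrence is h(t) = tanh(A·x(t) + B·h(t-1)) with y(t) = C·h(t). Below is a minimal NumPy sketch of one such step, keeping the A, B, C naming above; all dimensions and values are illustrative assumptions, not a definitive implementation.

import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 3))   # input-to-hidden weights (3 input features, 4 hidden units)
B = rng.normal(size=(4, 4))   # hidden-to-hidden (recurrent) weights
C = rng.normal(size=(2, 4))   # hidden-to-output weights (2 outputs)

def rnn_step(x_t, h_prev):
    # Combine the current input x(t) with the state carried from the previous step.
    h_t = np.tanh(A @ x_t + B @ h_prev)
    y_t = C @ h_t                        # output at time t
    return h_t, y_t

h = np.zeros(4)
for x_t in rng.normal(size=(5, 3)):      # a toy sequence of 5 timesteps
    h, y = rnn_step(x_t, h)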
Reasons To Use RNN:
 RNNs were created in order to overcome the issues in feed-forward neural networks.
 The drawbacks of feed-forward NNs are as follows:
 They cannot handle sequential data.
 They consider only the present input.
 They cannot remember previous inputs.
 In order to overcome all these issues, RNNs can be used.
 RNNs can handle sequential data, accepting the current input as well as previously received inputs.
 They can memorize previous inputs thanks to their internal memory.


RNN - Working
 The input layer ‘x’ takes in the input to the neural network, processes it, and passes it to the middle layer.
 The middle layer ‘h’ consists of multiple hidden layers.
 Each hidden layer has its own activation function, weights, and biases.
 In a normal neural network, the parameters of one hidden layer are not affected by the previous layer.
 Since a normal neural network has no memory of previous inputs, we go for a Recurrent Neural Network (RNN) when such memory is needed.
RNN - Working
 RNNs standardize the activation functions, weights, and biases so that each hidden layer has the same parameters.
 Instead of creating multiple hidden layers, the network creates one and loops over it as many times as required, as the sketch below illustrates.
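 As a hedged illustration of this parameter sharing with Keras’s SimpleRNN (the layer sizes here are assumptions), the parameter count is the same no matter how many timesteps the single layer is looped over:

from keras.models import Sequential
from keras.layers import Input, SimpleRNN

for timesteps in (10, 1000):
    model = Sequential([Input(shape=(timesteps, 8)), SimpleRNN(16)])
    # The same weights are reused at every timestep, so the count is
    # independent of the sequence length: 16 * (8 + 16 + 1) = 400.
    print(timesteps, model.count_params())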


FNNs - RNNs
 A feed-forward neural network allows information to flow in only one direction.
 It goes from the input nodes, through the hidden layers, to the output nodes.
 There are no cycles or loops in the network.
FNNs - RNNs
 In a feed-forward neural network, decisions are based only on the current input.
 It does not memorize past data, nor does it look ahead to future inputs.
 FFNNs are used in general regression and classification problems.


APPLICATIONS - RNNs
 Image Captioning:
 This type of network can be used to caption an image by analyzing the activities present in it.
 Time Series Prediction:
 This type of neural network can be applied to any time-series prediction problem.
 For example, they are used to predict the prices of stocks in a specific month.


APPLICATIONS - RNNs
 Natural Language Processing:
 Text mining and sentiment analysis can be carried out using RNNs for Natural Language Processing (NLP).
 Machine Translation:
 If the input is given in one language, RNNs can be used to translate it into different languages as output.


Types Of RNNs
 One to One

 One to Many

 Many to One

 Many to Many
Types Of RNNs
 One To One RNN:
 This type of neural network is used to solve ML problems that have a single input and a single output.


Types Of RNNs
 One to Many RNN:

 This type of neural network has a single input and multiple outputs.
Types Of RNNs
 Many To One RNN:
 It takes a sequence of inputs and produces a single output.
 Sentiment analysis is a good example of this type of network, where a given sentence can be classified as expressing positive or negative sentiment.


Types Of RNNs
 Many To Many RNN:

 It takes a sequence of inputs and produces a sequence of outputs.
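 A hedged Keras sketch contrasting the many-to-one and many-to-many wirings (all sizes are illustrative): the return_sequences flag controls whether the LSTM emits one output for the whole sequence or one output per timestep.

from keras.models import Sequential
from keras.layers import Input, LSTM, Dense

# Many to One: read the whole sequence, emit a single vector
# (e.g. a positive/negative sentiment score).
many_to_one = Sequential([
    Input(shape=(20, 8)),             # 20 timesteps, 8 features each
    LSTM(16),                         # return_sequences=False by default
    Dense(1, activation='sigmoid'),
])

# Many to Many: keep one output per timestep.
many_to_many = Sequential([
    Input(shape=(20, 8)),
    LSTM(16, return_sequences=True),  # output shape: (20, 16)
    Dense(4, activation='softmax'),   # a prediction at every step
])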


Problems - RNNs
 Vanishing Gradient Problem:
 RNNs enable us to model time-dependent and sequential data problems.
 They are used in solving problems like stock market prediction, machine translation, and text generation.
 However, they are harder to train because of the gradient problem.
 RNNs suffer from the problem of vanishing gradients.
 Gradients carry the information used to update the RNN’s parameters.
 When the gradient becomes too small, the parameter updates become insignificant.
 As a result, it becomes difficult to learn long data sequences, as the toy example below shows.
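 A toy illustration of why the gradient vanishes: backpropagating through T timesteps multiplies the gradient by roughly the same recurrent factor T times, so any factor below 1 shrinks it towards zero (the factor 0.5 is an illustrative assumption).

w = 0.5                  # assumed recurrent gradient factor
for T in (1, 10, 50):
    print(T, w ** T)     # 0.5, ~0.001, ~8.9e-16: updates become insignificant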


Problems - RNNs
 Exploding Gradient Problem:

 When we train a neural network, if the gradient grows exponentially instead of decaying, it is called an exploding gradient.
 This problem arises when large error gradients accumulate.
 They result in very large updates to the neural network model weights during the training process.
 Training takes a long time, and performance and accuracy suffer when dealing with exploding gradients; a common mitigation is sketched below.
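 Mirroring the vanishing case, a recurrent factor above 1 blows the gradient up exponentially. A standard mitigation is gradient clipping, which Keras optimizers expose via clipnorm (the threshold 1.0 below is an illustrative assumption).

w = 1.5                  # assumed recurrent gradient factor
for T in (1, 10, 50):
    print(T, w ** T)     # 1.5, ~57.7, ~6.4e8: updates blow up

from keras.optimizers import Adam
optimizer = Adam(clipnorm=1.0)   # rescale any gradient whose L2 norm exceeds 1.0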


Gradient Problem Solutions
 Let us discuss the most popular and efficient way to deal with gradient problems, i.e., LSTMs.
 Suppose we want to predict the last word in the text: “The clouds are in the _________”.
 The answer is “sky”.
 We don’t require any further context to predict the last word in the sentence.
Gradient Problem Solutions
 Now consider this: “I have been staying in France for the last 10 years. I can speak fluent _________”.
 The word we predict will depend on the previous words in the context.
 Here, we need the context of France in order to predict the last word, and the most suitable answer is “French”.
 The gap between the relevant information and the point where it is needed may become very large.
 LSTMs are used to solve this problem.
BackPropagation Through Time
 This algorithm can be applied to an RNN that has time series data as its input.
 In an RNN, one input is fed into the network at a time and one output is obtained.
 In backpropagation through time, we use the current as well as the previous inputs as input.
 This is called a timestep, and one timestep will consist of many time series data points entering the RNN simultaneously.
 Once the neural network has been trained on a timestep and has given an output, the output is used to calculate and accumulate the errors.
 After that, the network is rolled back up, and the weights are recalculated and updated keeping the errors in mind, as the sketch below illustrates.
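 A minimal NumPy sketch of BPTT for the tanh step sketched earlier; the weights, sizes, and per-step squared-error loss are all illustrative assumptions.

import numpy as np

T, n_in, n_h = 5, 3, 4
rng = np.random.default_rng(1)
A = rng.normal(size=(n_h, n_in))        # input-to-hidden weights
B = rng.normal(size=(n_h, n_h)) * 0.1   # recurrent weights
xs = rng.normal(size=(T, n_in))         # one timestep of the series per row
targets = rng.normal(size=(T, n_h))     # toy per-step targets

# Forward pass: run all timesteps and accumulate the errors.
hs = [np.zeros(n_h)]
for t in range(T):
    hs.append(np.tanh(A @ xs[t] + B @ hs[-1]))
loss = sum(np.sum((h - y) ** 2) for h, y in zip(hs[1:], targets))

# Backward pass: "roll the network back up", carrying the gradient through h(t-1).
dA, dB, dh_next = np.zeros_like(A), np.zeros_like(B), np.zeros(n_h)
for t in reversed(range(T)):
    dh = 2 * (hs[t + 1] - targets[t]) + dh_next   # error at t plus carried error
    dz = dh * (1 - hs[t + 1] ** 2)                # back through tanh
    dA += np.outer(dz, xs[t])
    dB += np.outer(dz, hs[t])                     # hs[t] is h(t-1)
    dh_next = B.T @ dz                            # hand the gradient to step t-1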
LSTM:
 It is a kind of recurrent neural network.
 In an RNN, the output from the previous step is fed as input to the current step.
 The LSTM was designed by Hochreiter and Schmidhuber.
 It handles the problem of long-term dependencies in RNNs.
 A plain RNN cannot use words stored far back in long-term memory, though it can give accurate predictions from recent information.
 As the length of the gap increases, the performance of an RNN degrades.
 An LSTM can retain information for a very long period of time.
 It is used for processing, predicting, and classifying on the basis of time series data.
Structure Of LSTM:
 It has a chain structure that contains four neural networks and different memory blocks called cells.
Structure Of LSTM:

 Information is retained by the cells, and memory manipulations are done by the gates.
 There are three gates:
 Forget Gate: Information that is no longer useful in the cell state is removed with the forget gate.
Forget Gate:
 Two inputs, x_t (the input at a particular time instant) and h_t-1 (the previous cell output), are fed to the gate, multiplied by weight matrices, and a bias is added.
 The result is passed through a sigmoid activation function, which gives an output between 0 and 1.
 For a particular cell state, if the output is 0 that piece of information is lost, and if the output is 1 the information is retained for future use, as sketched below.
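 A hedged sketch of the forget gate, f_t = sigmoid(W_f · [h_t-1, x_t] + b_f); the weight matrix W_f and bias b_f stand for the gate’s own parameters and are assumptions of this sketch.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forget_gate(x_t, h_prev, W_f, b_f):
    z = np.concatenate([h_prev, x_t])  # the two inputs, stacked
    f_t = sigmoid(W_f @ z + b_f)       # values near 0 drop info, near 1 keep it
    return f_t                         # applied elementwise: c_t = f_t * c_(t-1)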


Input Gate:
 We can add some useful information to the cell state using the input gate.
Input Gate:

 First, the information is regulated using the sigmoid function, filtering the values to be remembered using the inputs h_t-1 and x_t.
 Then, a vector is created using the tanh function, which gives outputs from -1 to +1.
 It contains all the possible values from h_t-1 and x_t.
 Finally, the values of the vector and the regulated values are multiplied to obtain the useful information, as sketched below.
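 A matching sketch of the input gate, reusing NumPy and sigmoid from the forget-gate sketch (W_i, b_i, W_c, b_c are again illustrative gate parameters):

def input_gate(x_t, h_prev, W_i, b_i, W_c, b_c):
    z = np.concatenate([h_prev, x_t])
    i_t = sigmoid(W_i @ z + b_i)       # regulate: which values to let through
    c_hat = np.tanh(W_c @ z + b_c)     # candidate values in (-1, +1)
    return i_t * c_hat                 # the useful information added to the cell state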
Output Gate:
 Useful information can be extracted from the current cell state and presented at the output with the help of the output gate.
Output Gate:
 First, a vector is generated by applying the tanh function to the cell state.
 Then, the information is regulated using the sigmoid function, filtering the values to be remembered using the inputs h_t-1 and x_t.
 Finally, the values of the vector and the regulated values are multiplied.
 The result is sent as the output and as input to the next cell, as sketched below.
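 A sketch of the output gate, completing the cell; W_o and b_o are again illustrative parameters, and c_t is the current cell state.

def output_gate(x_t, h_prev, c_t, W_o, b_o):
    z = np.concatenate([h_prev, x_t])
    o_t = sigmoid(W_o @ z + b_o)   # regulate which parts of the state to expose
    h_t = o_t * np.tanh(c_t)       # squash the cell state, then filter it
    return h_t                     # sent as the output and on to the next cell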


Applications Of LSTM:
 Language Modelling.

 Machine Translation.

 Image Captioning.

 Handwriting Generation.

 Question Answering Chatbots.


Keras – Time Series Prediction Using LSTM RNN

 We are going to write a simple long short-term memory (LSTM) based RNN in order to do sequence analysis.
 A sequence is a set of values where each value corresponds to a particular instance of time.
 Consider the example of reading a sentence.
 When we understand the words in the correct order, we can understand the meaning of the sentence.


Time series prediction
 We understand each word in its context, and the sentence is classified as expressing positive or negative sentiment.
 Words are treated as values: the first value corresponds to the first word, the second value to the second word, and so on.
 The order is strictly maintained.
 Sequence analysis is used in NLP to perform sentiment analysis on a given text.
Time series prediction
 Let us create an LSTM model to analyze the IMDB movie reviews and find their positive/negative sentiment.
Model description:

The core features of the model are as follows −

Input layer using an Embedding layer with 128 features.

First layer, LSTM, consists of 128 units with dropout and recurrent dropout set to 0.2.

Output layer, Dense, consists of 1 unit with the ‘sigmoid’ activation function.

Use binary_crossentropy as the loss function.


Model description:

 Use adam as the optimizer.
 Use accuracy as the metric.
 Use 32 as the batch size.
 Use 15 epochs.
 Use 80 as the maximum length of each review; longer reviews are truncated.
 Use 2000 as the vocabulary size; only the 2000 most frequent words are kept.


Step 1: import the modules
 Import all the necessary modules

from keras.preprocessing import sequence


from keras.models import Sequential
from keras.layers import Dense, Embedding
from keras.layers import LSTM
from keras.datasets import imdb
Step 2 – Load Data
 Let us import the imdb dataset.

 (x_train, y_train), (x_test, y_test) = imdb.load_data(num_words = 2000)

 imdb is a dataset provided by Keras. It represents a collection of movies and their reviews.
 num_words keeps only the most frequent words: any word outside the top 2000 is dropped from the reviews.
Step 3 – Process The Data
 Let us change the dataset according to our model, so that it can be fed into the model. The data can be changed using the code below −

x_train = sequence.pad_sequences(x_train, maxlen=80)
x_test = sequence.pad_sequences(x_test, maxlen=80)
Step 3 – Process The Data
 sequence.pad_sequences converts the list of input sequences, each of varying length, into a 2D NumPy array of shape (data, timesteps). Basically, it adds the timestep concept to the given data, generating timesteps of length maxlen.
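 A tiny illustration of the default behaviour (the toy sequences below are assumptions): shorter sequences are left-padded with zeros, and longer ones are truncated from the front.

from keras.preprocessing import sequence

demo = [[1, 2, 3], [1, 2, 3, 4, 5, 6]]
print(sequence.pad_sequences(demo, maxlen=4))
# [[0 1 2 3]
#  [3 4 5 6]]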
Step 4 – Create The Model
model = Sequential()
model.add(Embedding(2000, 128))
model.add(LSTM(128, dropout = 0.2, recurrent_dropout = 0.2))
model.add(Dense(1, activation = 'sigmoid'))

 We have used an Embedding layer as the input layer and then added the LSTM layer. Finally, a Dense layer is used as the output layer.


Step 5 – Compile The Model

 Let us compile the model using the selected loss function, optimizer, and metrics.

model.compile(loss = 'binary_crossentropy', optimizer = 'adam', metrics = ['accuracy'])
Step 6 – Train The Model
 Let us train the model using the fit() method.

model.fit(x_train, y_train, batch_size = 32, epochs = 15, validation_data = (x_test, y_test))
Step 7 – Evaluate The Model
 Let us evaluate the model using the test data.
Step 7 – Evaluate The Model
score, acc = model.evaluate(x_test, y_test, batch_size = 32)
print('Test score:', score)
print('Test accuracy:', acc)

 Executing the above code will output the information below −

Test score: 1.145306069601178
Test accuracy: 0.81292
