
RNN (Recurrent Neural Networks)

RNN:
 This type of neural network saves the output of a particular layer and feeds it back to the input in order to predict the output of the layer.
 A feed-forward neural network can be converted into a Recurrent Neural Network.
RNN:
 The nodes in the different layers of the network are compressed to form a single recurrent layer.


 A, B, and C are the parameters of the network.
Fully Connected RNN:
 “x” is the input layer, “h” is the hidden layer, and “y” is the output layer.
 A, B, and C are the network parameters (weight matrices) that are learned to improve the output of the model.
 At any given time instance ‘t’, the current input is a combination of the input x(t) and the state carried over from step t-1.
 The output at any given time is fed back to the input in order to improve the output, as the sketch below illustrates.
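 One common form of this recurrence is h(t) = tanh(A·x(t) + B·h(t-1)) with y(t) = C·h(t). Below is a minimal NumPy sketch of one such step, keeping the A, B, C naming above; all dimensions and values are illustrative assumptions, not a definitive implementation.

import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 3))   # input-to-hidden weights (3 input features, 4 hidden units)
B = rng.normal(size=(4, 4))   # hidden-to-hidden (recurrent) weights
C = rng.normal(size=(2, 4))   # hidden-to-output weights (2 outputs)

def rnn_step(x_t, h_prev):
    # Combine the current input x(t) with the state carried from the previous step.
    h_t = np.tanh(A @ x_t + B @ h_prev)
    y_t = C @ h_t                        # output at time t
    return h_t, y_t

h = np.zeros(4)
for x_t in rng.normal(size=(5, 3)):      # a toy sequence of 5 timesteps
    h, y = rnn_step(x_t, h)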
Reasons To Use RNN:
 RNNs were created in order to overcome the issues in feed-forward neural networks.
 The drawbacks of feed-forward NNs are as follows:
 They cannot handle sequential data.
 They consider only the present input.
 They cannot remember previous inputs.
 In order to overcome all these issues, RNNs can be used.
 RNNs can handle sequential data, accepting the current input as well as previously received inputs.
 They can memorize previous inputs thanks to their internal memory.


RNN - Working
 The input layer ‘x’ takes in the input to the neural network, processes it, and passes it to the middle layer.
 The middle layer ‘h’ consists of multiple hidden layers.
 Each hidden layer has its own activation function, weights, and biases.
 In a normal neural network, the parameters of one hidden layer are not affected by the previous layer.
 Since a normal neural network has no memory of previous inputs, we go for a Recurrent Neural Network (RNN) when such memory is needed.
RNN - Working
 RNNs standardize the activation functions, weights, and biases so that each hidden layer has the same parameters.
 Instead of creating multiple hidden layers, the network creates one and loops over it as many times as required, as the sketch below illustrates.
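 As a hedged illustration of this parameter sharing with Keras’s SimpleRNN (the layer sizes here are assumptions), the parameter count is the same no matter how many timesteps the single layer is looped over:

from keras.models import Sequential
from keras.layers import Input, SimpleRNN

for timesteps in (10, 1000):
    model = Sequential([Input(shape=(timesteps, 8)), SimpleRNN(16)])
    # The same weights are reused at every timestep, so the count is
    # independent of the sequence length: 16 * (8 + 16 + 1) = 400.
    print(timesteps, model.count_params())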


FNNs - RNNs
 A feed-forward neural network allows information to flow in only one direction.
 It goes from the input nodes, through the hidden layers, to the output nodes.
 There are no cycles or loops in the network.
FNNs - RNNs
 In a feed-forward neural network, decisions are based only on the current input.
 It does not memorize past data, nor does it look ahead to future inputs.
 FFNNs are used in general regression and classification problems.


APPLICATIONS - RNNs
 Image Captioning:
 This type of network can be used to caption an image by analyzing the activities present in it.
 Time Series Prediction:
 This type of neural network can be applied to any time-series prediction problem.
 For example, they are used to predict the prices of stocks in a specific month.


APPLICATIONS - RNNs
 Natural Language Processing:
 Text mining and sentiment analysis can be carried out using RNNs for Natural Language Processing (NLP).
 Machine Translation:
 If the input is given in one language, RNNs can be used to translate it into different languages as output.


Types Of RNNs
 One to One

 One to Many

 Many to One

 Many to Many
Types Of RNNs
 One To One RNN:
 This type of neural network is used to solve ML problems that have a single input and a single output.


Types Of RNNs
 One to Many RNN:

 This type of neural network has a single input and multiple outputs.
Types Of RNNs
 Many To One RNN:
 It takes a sequence of inputs and produces a single output.
 Sentiment analysis is a good example of this type of network, where a given sentence can be classified as expressing positive or negative sentiment.


Types Of RNNs
 Many To Many RNN:

 It takes a sequence of inputs and produces a sequence of outputs.
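 A hedged Keras sketch contrasting the many-to-one and many-to-many wirings (all sizes are illustrative): the return_sequences flag controls whether the LSTM emits one output for the whole sequence or one output per timestep.

from keras.models import Sequential
from keras.layers import Input, LSTM, Dense

# Many to One: read the whole sequence, emit a single vector
# (e.g. a positive/negative sentiment score).
many_to_one = Sequential([
    Input(shape=(20, 8)),             # 20 timesteps, 8 features each
    LSTM(16),                         # return_sequences=False by default
    Dense(1, activation='sigmoid'),
])

# Many to Many: keep one output per timestep.
many_to_many = Sequential([
    Input(shape=(20, 8)),
    LSTM(16, return_sequences=True),  # output shape: (20, 16)
    Dense(4, activation='softmax'),   # a prediction at every step
])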


Problems - RNNs
 Vanishing Gradient Problem:
 RNNs enable us to model time-dependent and sequential data problems.
 They are used in solving problems like stock market prediction, machine translation, and text generation.
 However, they are harder to train because of the gradient problem.
 RNNs suffer from the problem of vanishing gradients.
 Gradients carry the information used to update the RNN’s parameters.
 When the gradient becomes too small, the parameter updates become insignificant.
 As a result, it becomes difficult to learn long data sequences, as the toy example below shows.
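 A toy illustration of why the gradient vanishes: backpropagating through T timesteps multiplies the gradient by roughly the same recurrent factor T times, so any factor below 1 shrinks it towards zero (the factor 0.5 is an illustrative assumption).

w = 0.5                  # assumed recurrent gradient factor
for T in (1, 10, 50):
    print(T, w ** T)     # 0.5, ~0.001, ~8.9e-16: updates become insignificant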


Problems - RNNs
 Exploding Gradient Problem:

 When we train a neural network, if the gradient grows exponentially instead of decaying, it is called an exploding gradient.
 This problem arises when large error gradients accumulate.
 They result in very large updates to the neural network model weights during the training process.
 Training takes a long time, and performance and accuracy suffer when dealing with exploding gradients; a common mitigation is sketched below.
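 Mirroring the vanishing case, a recurrent factor above 1 blows the gradient up exponentially. A standard mitigation is gradient clipping, which Keras optimizers expose via clipnorm (the threshold 1.0 below is an illustrative assumption).

w = 1.5                  # assumed recurrent gradient factor
for T in (1, 10, 50):
    print(T, w ** T)     # 1.5, ~57.7, ~6.4e8: updates blow up

from keras.optimizers import Adam
optimizer = Adam(clipnorm=1.0)   # rescale any gradient whose L2 norm exceeds 1.0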


Gradient Problem Solutions
 Let us discuss the most popular and efficient way to deal with gradient problems, i.e., LSTMs.
 Suppose we want to predict the last word in the text: “The clouds are in the _________”.
 The answer is “sky”.
 We don’t require any further context to predict the last word in the sentence.
Gradient Problem Solutions
 Now consider this: “I have been staying in France for the last 10 years. I can speak fluent _________”.
 The word we predict will depend on the previous words in the context.
 Here, we need the context of France in order to predict the last word, and the most suitable answer is “French”.
 The gap between the relevant information and the point where it is needed may become very large.
 LSTMs are used to solve this problem.
BackPropagation Through Time
 This algorithm can be applied to an RNN that has time series data as its input.
 In an RNN, one input is fed into the network at a time and one output is obtained.
 In backpropagation through time, we use the current as well as the previous inputs as input.
 This is called a timestep, and one timestep will consist of many time series data points entering the RNN simultaneously.
 Once the neural network has been trained on a timestep and has given an output, the output is used to calculate and accumulate the errors.
 After that, the network is rolled back up, and the weights are recalculated and updated keeping the errors in mind, as the sketch below illustrates.
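 A minimal NumPy sketch of BPTT for the tanh step sketched earlier; the weights, sizes, and per-step squared-error loss are all illustrative assumptions.

import numpy as np

T, n_in, n_h = 5, 3, 4
rng = np.random.default_rng(1)
A = rng.normal(size=(n_h, n_in))        # input-to-hidden weights
B = rng.normal(size=(n_h, n_h)) * 0.1   # recurrent weights
xs = rng.normal(size=(T, n_in))         # one timestep of the series per row
targets = rng.normal(size=(T, n_h))     # toy per-step targets

# Forward pass: run all timesteps and accumulate the errors.
hs = [np.zeros(n_h)]
for t in range(T):
    hs.append(np.tanh(A @ xs[t] + B @ hs[-1]))
loss = sum(np.sum((h - y) ** 2) for h, y in zip(hs[1:], targets))

# Backward pass: "roll the network back up", carrying the gradient through h(t-1).
dA, dB, dh_next = np.zeros_like(A), np.zeros_like(B), np.zeros(n_h)
for t in reversed(range(T)):
    dh = 2 * (hs[t + 1] - targets[t]) + dh_next   # error at t plus carried error
    dz = dh * (1 - hs[t + 1] ** 2)                # back through tanh
    dA += np.outer(dz, xs[t])
    dB += np.outer(dz, hs[t])                     # hs[t] is h(t-1)
    dh_next = B.T @ dz                            # hand the gradient to step t-1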
LSTM:
 It is a kind of recurrent neural network.
 In an RNN, the output from the previous step is fed as input to the current step.
 The LSTM was designed by Hochreiter and Schmidhuber.
 It handles the problem of long-term dependencies in RNNs.
 A plain RNN cannot use words stored far back in long-term memory, though it can give accurate predictions from recent information.
 As the length of the gap increases, the performance of an RNN degrades.
 An LSTM can retain information for a very long period of time.
 It is used for processing, predicting, and classifying on the basis of time series data.
Structure Of LSTM:
 It has a chain structure that contains four neural networks and different memory blocks called cells.
Structure Of LSTM:

 Information is retained by the cells, and memory manipulations are done by the gates.
 There are three gates:
 Forget Gate: Information that is no longer useful in the cell state is removed with the forget gate.
Forget Gate:
 Two inputs, x_t (the input at a particular time instant) and h_t-1 (the previous cell output), are fed to the gate, multiplied by weight matrices, and a bias is added.
 The result is passed through a sigmoid activation function, which gives an output between 0 and 1.
 For a particular cell state, if the output is 0 that piece of information is lost, and if the output is 1 the information is retained for future use, as sketched below.
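 A hedged sketch of the forget gate, f_t = sigmoid(W_f · [h_t-1, x_t] + b_f); the weight matrix W_f and bias b_f stand for the gate’s own parameters and are assumptions of this sketch.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forget_gate(x_t, h_prev, W_f, b_f):
    z = np.concatenate([h_prev, x_t])  # the two inputs, stacked
    f_t = sigmoid(W_f @ z + b_f)       # values near 0 drop info, near 1 keep it
    return f_t                         # applied elementwise: c_t = f_t * c_(t-1)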


Input Gate:
 We can add some useful information to the cell state using the input gate.
Input Gate:

 First, the information is regulated using the sigmoid function, filtering the values to be remembered using the inputs h_t-1 and x_t.
 Then, a vector is created using the tanh function, which gives outputs from -1 to +1.
 It contains all the possible values from h_t-1 and x_t.
 Finally, the values of the vector and the regulated values are multiplied to obtain the useful information, as sketched below.
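 A matching sketch of the input gate, reusing NumPy and sigmoid from the forget-gate sketch (W_i, b_i, W_c, b_c are again illustrative gate parameters):

def input_gate(x_t, h_prev, W_i, b_i, W_c, b_c):
    z = np.concatenate([h_prev, x_t])
    i_t = sigmoid(W_i @ z + b_i)       # regulate: which values to let through
    c_hat = np.tanh(W_c @ z + b_c)     # candidate values in (-1, +1)
    return i_t * c_hat                 # the useful information added to the cell state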
Output Gate:
 Useful information can be extracted from the current cell state and presented at the output with the help of the output gate.
Output Gate:
 First, a vector is generated by applying the tanh function to the cell state.
 Then, the information is regulated using the sigmoid function, filtering the values to be remembered using the inputs h_t-1 and x_t.
 Finally, the values of the vector and the regulated values are multiplied.
 The result is sent as the output and as input to the next cell, as sketched below.
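 A sketch of the output gate, completing the cell; W_o and b_o are again illustrative parameters, and c_t is the current cell state.

def output_gate(x_t, h_prev, c_t, W_o, b_o):
    z = np.concatenate([h_prev, x_t])
    o_t = sigmoid(W_o @ z + b_o)   # regulate which parts of the state to expose
    h_t = o_t * np.tanh(c_t)       # squash the cell state, then filter it
    return h_t                     # sent as the output and on to the next cell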


Applications Of LSTM:
 Language Modelling.

 Machine Translation.

 Image Captioning.

 Handwriting Generation.

 Question Answering Chatbots.


Keras – Time Series Prediction Using LSTM RNN

 We are going to write a simple long short-term memory (LSTM) based RNN in order to do sequence analysis.
 A sequence is a set of values where each value corresponds to a particular instance of time.
 Consider the example of reading a sentence.
 When we understand the words in the correct order, we can understand the meaning of the sentence.


Time series prediction
 We understand each word in its context, and the sentence is classified as expressing positive or negative sentiment.
 Words are treated as values: the first value corresponds to the first word, the second value to the second word, and so on.
 The order is strictly maintained.
 Sequence analysis is used in NLP to perform sentiment analysis on a given text.
Time series prediction
 Let us create an LSTM model to analyze the IMDB movie reviews and find their positive/negative sentiment.
Model description:

The core features of the model are as follows −

Input layer using an Embedding layer with 128 features.

First layer, LSTM, consists of 128 units with dropout and recurrent dropout set to 0.2.

Output layer, Dense, consists of 1 unit with the ‘sigmoid’ activation function.

Use binary_crossentropy as the loss function.


Model description:

 Use adam as the optimizer.
 Use accuracy as the metric.
 Use 32 as the batch size.
 Use 15 epochs.
 Use 80 as the maximum length of each review; longer reviews are truncated.
 Use 2000 as the vocabulary size; only the 2000 most frequent words are kept.


Step 1: import the modules
 Import all the necessary modules

from keras.preprocessing import sequence


from keras.models import Sequential
from keras.layers import Dense, Embedding
from keras.layers import LSTM
from keras.datasets import imdb
Step 2 – Load Data
 Let us import the imdb dataset.

 (x_train, y_train), (x_test, y_test) = imdb.load_data(num_words = 2000)

 imdb is a dataset provided by Keras. It represents a collection of movies and their reviews.
 num_words keeps only the most frequent words: any word outside the top 2000 is dropped from the reviews.
Step 3 – Process The Data
 Let us change the dataset according to our model, so that it can be fed into the model. The data can be changed using the code below −

x_train = sequence.pad_sequences(x_train, maxlen=80)
x_test = sequence.pad_sequences(x_test, maxlen=80)
Step 3 – Process The Data
 sequence.pad_sequences converts the list of input sequences, each of varying length, into a 2D NumPy array of shape (data, timesteps). Basically, it adds the timestep concept to the given data, generating timesteps of length maxlen.
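 A tiny illustration of the default behaviour (the toy sequences below are assumptions): shorter sequences are left-padded with zeros, and longer ones are truncated from the front.

from keras.preprocessing import sequence

demo = [[1, 2, 3], [1, 2, 3, 4, 5, 6]]
print(sequence.pad_sequences(demo, maxlen=4))
# [[0 1 2 3]
#  [3 4 5 6]]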
Step 4 – Create The Model
model = Sequential()
model.add(Embedding(2000, 128))
model.add(LSTM(128, dropout = 0.2, recurrent_dropout = 0.2))
model.add(Dense(1, activation = 'sigmoid'))

 We have used an Embedding layer as the input layer and then added the LSTM layer. Finally, a Dense layer is used as the output layer.


Step 5 – Compile The Model

 Let us compile the model using the selected loss function, optimizer, and metrics.

model.compile(loss = 'binary_crossentropy', optimizer = 'adam', metrics = ['accuracy'])
Step 6 – Train The Model
 Let us train the model using the fit() method.

model.fit(x_train, y_train, batch_size = 32, epochs = 15, validation_data = (x_test, y_test))
Step 7 – Evaluate The Model
 Let us evaluate the model using the test data.
Step 7 – Evaluate The Model
score, acc = model.evaluate(x_test, y_test, batch_size = 32)
print('Test score:', score)
print('Test accuracy:', acc)

 Executing the above code will output the information below −

Test score: 1.145306069601178
Test accuracy: 0.81292
