
Recurrent Neural Network (RNN)
Time-indexed data points
The time-indexed data points may be:
[1] Equally spaced samples from a continuous real-world process.
Examples include
● The still images that comprise the frames of videos
● The discrete amplitudes, sampled at fixed intervals, that comprise audio recordings
● Daily values of a currency exchange rate
● Rainfall measurements on successive days (at a certain location)
[2] Ordinal time steps, with no exact correspondence to durations.
● Natural language (word sequences)
● Nucleotide base pairs in a strand of DNA
Traditional Language Models
Traditional language models (e.g., n-gram models) predict the next word from a fixed-size window of preceding words, estimated from counts in a corpus.
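The original slide's content did not survive extraction; as a minimal illustration of such a model, here is a bigram count sketch (the corpus and helper names are made up for this example):

```python
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Estimate P(next | prev) from bigram counts."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    # Normalize counts into conditional probabilities.
    return {
        prev: {w: c / sum(nxt.values()) for w, c in nxt.items()}
        for prev, nxt in counts.items()
    }

model = train_bigram("the cat sat on the mat".split())
print(model["the"])   # {'cat': 0.5, 'mat': 0.5}
```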
RECURRENT NEURAL NETWORK (RNN)
Recurrent: the same task is performed for every element of a sequence.
Output depends on previous computations as well as new inputs.
RNNs have a “memory” of the past!

Apply the same set of weights (U, V, W) recursively.


RNN (FOLDED - UNFOLDED)

Hidden state at time step $t$: $s_t = f(U x_t + W s_{t-1})$

Output at time step $t$: $o_t = \mathrm{softmax}(V s_t)$

where $f$ is the activation function (typically tanh) and $x_t$ is the input at time step $t$.
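To make the recurrence concrete, here is a minimal NumPy sketch of the forward pass, assuming a tanh activation and the $s_t$/$o_t$ notation above (array shapes and names are illustrative, not from the slides):

```python
import numpy as np

def softmax(z):
    z = z - z.max()                      # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def rnn_forward(xs, U, V, W):
    """Run a simple RNN over a sequence of one-hot input vectors xs.

    s_t = tanh(U @ x_t + W @ s_{t-1})   -- hidden state
    o_t = softmax(V @ s_t)              -- output distribution
    """
    hidden_size = W.shape[0]
    s = np.zeros(hidden_size)            # initial hidden state s_{-1}
    states, outputs = [], []
    for x in xs:                         # the same U, V, W are reused at every step
        s = np.tanh(U @ x + W @ s)
        o = softmax(V @ s)
        states.append(s)
        outputs.append(o)
    return states, outputs
```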
Unfolded RNN (multiple hidden layers)
[Figure: an RNN unfolded along two axes — depth (stacked hidden layers) and time (successive time steps)]
Examples of Sequences
Applications of RNNs/LSTMs
[Figure: example tasks — image captioning, sequence classification, translation, named entity recognition]
CHARACTER-LEVEL LANGUAGE MODEL
One-hot vector inputs for a word sequence
Indices instead of one-hot vectors? Multiplying a weight matrix by a one-hot vector just selects one column, so an index lookup (as in an embedding layer) gives the same result without the multiplication.
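A quick NumPy sketch of that equivalence (the matrix and sizes are made up for illustration):

```python
import numpy as np

vocab_size, hidden_size = 5, 3
U = np.random.randn(hidden_size, vocab_size)

idx = 2                                  # token index
one_hot = np.zeros(vocab_size)
one_hot[idx] = 1.0

# Multiplying by a one-hot vector just selects column idx of U,
# so an index lookup gives the same result without the multiply.
assert np.allclose(U @ one_hot, U[:, idx])
```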
CHARACTER-LEVEL LANGUAGE MODEL (Generative Model)
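One way to use the trained model generatively is to sample one character at a time, feeding each sampled character back in as the next input. A minimal sketch, assuming the same $s_t$/$o_t$ model as above (function and variable names are illustrative):

```python
import numpy as np

def sample_text(U, V, W, char_to_ix, ix_to_char, seed_char, length):
    """Sample `length` characters from a trained char-level RNN."""
    vocab_size = len(char_to_ix)
    s = np.zeros(W.shape[0])             # initial hidden state
    ix = char_to_ix[seed_char]
    generated = [seed_char]
    for _ in range(length):
        x = np.zeros(vocab_size)
        x[ix] = 1.0                      # one-hot encode the current character
        s = np.tanh(U @ x + W @ s)
        z = V @ s
        p = np.exp(z - z.max())
        p /= p.sum()                     # softmax over the next-char distribution
        ix = np.random.choice(vocab_size, p=p)   # sample, then feed back
        generated.append(ix_to_char[ix])
    return "".join(generated)
```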
Simple and Real RNNs (Number of Parameters)
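The original slide's numbers did not survive extraction, but the parameter count follows directly from the weight shapes above. For vocabulary size $C$ and hidden size $H$ (a worked formula; the concrete values below are chosen for illustration):

$$\#\text{params} = \underbrace{H \times C}_{U} + \underbrace{H \times H}_{W} + \underbrace{C \times H}_{V} = 2HC + H^2$$

For example, $H = 100$ and $C = 8000$ gives $2 \cdot 100 \cdot 8000 + 100^2 = 1{,}610{,}000$ parameters.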
Generated Text
[Sample outputs omitted: text generated by a character-level RNN trained on Wikipedia, and generated C source code]
BACKPROPAGATION THROUGH TIME (BPTT)
Calculate the gradients of the error with respect to U, V, and W.
Sum up the gradients at each time step:
$$\frac{\partial E}{\partial W} = \sum_{t} \frac{\partial E_t}{\partial W} \qquad \text{(and likewise for } U \text{ and } V\text{)}$$

Calculate the gradients by the chain rule. Remember that the output is produced by the softmax function, $o_t = \mathrm{softmax}(V s_t)$, so for example at step 3:
$$\frac{\partial E_3}{\partial V} = \frac{\partial E_3}{\partial o_3}\,\frac{\partial o_3}{\partial V}$$
For W and U the chain is longer: $s_3$ is computed from $s_2$, which is computed from $s_1$, so $s_1$ and $s_2$ depend on W and U too. The gradient must therefore be unrolled through every earlier time step:
$$\frac{\partial E_3}{\partial W} = \sum_{k=0}^{3} \frac{\partial E_3}{\partial o_3}\,\frac{\partial o_3}{\partial s_3}\,\frac{\partial s_3}{\partial s_k}\,\frac{\partial s_k}{\partial W}$$
Summing the gradients at each time step shows that propagation through time in an RNN is equivalent to propagation through the layers of a feed-forward network (FNN): the unfolded RNN is a deep network whose layers share the same weights.
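A minimal NumPy sketch of these BPTT equations for one sequence, assuming the $s_t = \tanh(Ux_t + Ws_{t-1})$, $o_t = \mathrm{softmax}(Vs_t)$ model above and a cross-entropy loss against integer targets (all names are illustrative):

```python
import numpy as np

def bptt(xs, targets, U, V, W):
    """Forward pass, then backpropagation through time.

    xs      : list of one-hot input vectors
    targets : list of correct output indices
    Returns gradients dU, dV, dW summed over all time steps.
    """
    T, H = len(xs), W.shape[0]
    states, probs = {-1: np.zeros(H)}, {}

    # Forward: record every hidden state and output distribution.
    for t in range(T):
        states[t] = np.tanh(U @ xs[t] + W @ states[t - 1])
        z = V @ states[t]
        e = np.exp(z - z.max())
        probs[t] = e / e.sum()

    dU, dV, dW = np.zeros_like(U), np.zeros_like(V), np.zeros_like(W)
    ds_next = np.zeros(H)                   # gradient flowing back from step t+1

    # Backward: walk the time steps in reverse, summing gradients.
    for t in reversed(range(T)):
        dz = probs[t].copy()
        dz[targets[t]] -= 1.0               # d(cross-entropy)/d(logits) for softmax
        dV += np.outer(dz, states[t])
        ds = V.T @ dz + ds_next             # gradient w.r.t. hidden state s_t
        dsraw = (1.0 - states[t] ** 2) * ds # backprop through tanh
        dU += np.outer(dsraw, xs[t])
        dW += np.outer(dsraw, states[t - 1])
        ds_next = W.T @ dsraw               # pass the gradient on to step t-1
    return dU, dV, dW
```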
Gradients of Some Common Activation Functions
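The slide's plots are not reproduced here; the key facts they show are the standard derivatives (these formulas are standard results, not taken from the slides):

$$\sigma'(x) = \sigma(x)\bigl(1 - \sigma(x)\bigr) \le \tfrac{1}{4}, \qquad \tanh'(x) = 1 - \tanh^2(x) \le 1$$

Both derivatives approach zero as $|x|$ grows, which is the saturation the next slide refers to.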
VANISHING GRADIENT PROBLEM
Error gradients pass through the nonlinearity at every time step.
Saturation at both ends of the activation ⇒ near-zero local gradient.
The product of many such small factors can make the gradient vanish completely after a few time steps.
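A small numerical sketch of the effect, using the tanh backward rule from the BPTT code above (random weights and a single fixed hidden state, purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
H = 50
W = rng.normal(scale=0.05, size=(H, H))  # small recurrent weights
s = np.tanh(rng.normal(size=H))          # a hidden state after the tanh squashing
grad = rng.normal(size=H)                # gradient arriving at the last time step

# Push the gradient back through 20 identical time steps.
for t in range(20):
    grad = W.T @ ((1.0 - s**2) * grad)   # small weights and tanh' < 1 shrink it each step
    if t % 5 == 4:
        print(f"step -{t+1}: ||grad|| = {np.linalg.norm(grad):.2e}")
```

Running this prints a gradient norm that collapses by several orders of magnitude within a few steps, which is exactly the vanishing described above.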
