
Recurrent Neural Network (RNN)
Time-indexed data points
The time-indexed data points may be:
[1] Equally spaced samples from a continuous real-world process.
Examples include:
● The still images that comprise the frames of videos
● The discrete amplitudes sampled at fixed intervals that comprise audio recordings
● Daily values of a currency exchange rate
● Rainfall measurements on successive days (at a certain location)
[2] Ordinal time steps, with no exact correspondence to durations.
● Natural language (word sequences)
● Nucleotide base pairs in a strand of DNA
Traditional Language Models
RECURRENT NEURAL NETWORK (RNN)
Recurrent: perform the same task for every element of a sequence
Output depends on:
● previous computations, as well as
● new inputs
RNNs have a “memory” of the past!

Apply the same set of weights (U, V, W) recursively.

RNN (FOLDED - UNFOLDED)

Hidden state at time step t:  St = tanh(W·St-1 + U·Xt)
Output at time step t:        Y't = softmax(V·St)
(tanh is the activation function; Xt is the input at time step t)

Unfolded RNN (multiple hidden layers)
Diagram: the network unrolled along two axes: depth (stacked layers) and time (successive steps).
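As a minimal sketch of this unfolded computation in NumPy (the sizes, random weights, and toy sequence below are illustrative assumptions, not taken from the slides):

import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
n_in, n_hid = 8, 16                              # assumed sizes
U = rng.normal(scale=0.1, size=(n_hid, n_in))    # input  -> hidden
W = rng.normal(scale=0.1, size=(n_hid, n_hid))   # hidden -> hidden (recurrent)
V = rng.normal(scale=0.1, size=(n_in, n_hid))    # hidden -> output

s = np.zeros(n_hid)                              # initial hidden state S0
for x_t in rng.normal(size=(5, n_in)):           # unroll over a toy input sequence
    s = np.tanh(W @ s + U @ x_t)                 # St = tanh(W·St-1 + U·Xt)
    y_t = softmax(V @ s)                         # Y't = softmax(V·St)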
Examples of Sequences
Applications of RNN / LSTM:
● image captioning
● sequence classification
● named entity recognition
● translation
CHARACTER-LEVEL LANGUAGE MODEL
One-hot vector inputs for a word sequence
Indices instead of one-hot vectors?
CHARACTER-LEVEL LANGUAGE MODEL (Generative Model)
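A small sketch of the two input encodings contrasted above (the tiny vocabulary is a made-up example):

import numpy as np

vocab = sorted(set("hello"))                 # ['e', 'h', 'l', 'o']
char_to_ix = {c: i for i, c in enumerate(vocab)}

def one_hot(ch):
    v = np.zeros(len(vocab))
    v[char_to_ix[ch]] = 1.0
    return v

print(one_hot('h'))      # [0. 1. 0. 0.]  one-hot vector input
print(char_to_ix['h'])   # 1              index input
# Multiplying U by a one-hot vector just selects one column of U,
# so an integer index can be used as a direct (cheaper) lookup instead.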
Simple and Real RNN
(Number of Parameters)
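A quick parameter count for a simple RNN, assuming an illustrative vocabulary size of 80 (one-hot input and output) and a hidden size of 100; these numbers are assumptions, not the ones used in the slides:

vocab_size, hidden_size = 80, 100
n_U = hidden_size * vocab_size      # input  -> hidden:  8,000
n_W = hidden_size * hidden_size     # hidden -> hidden: 10,000
n_V = vocab_size * hidden_size      # hidden -> output:  8,000
print(n_U + n_W + n_V)              # 26,000 weights (biases not counted)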
Generated Text (trained on Wikipedia)
Generated Text (trained on C source code)
Generated Text
BACKPROPAGATION THROUGH TIME (BPTT)
Calculate the gradients of the error with respect to U, V, and W.

Sum up the gradients at each time step.
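In symbols (a standard way to write this, using the deck's notation, with one error term Et per time step and total error E = Σt Et): dE/dW = Σt dEt/dW, and likewise for U and V.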


BACKPROPAGATION THROUGH TIME (BPTT)
Calculate the gradients by the chain rule.
For example, for V:  dEt/dV = (dEt/dY't) · (dY't/dV)

Remember:
Y't = softmax(V·St)   (softmax fn)
BACKPROPAGATION THROUGH TIME (BPTT)

S1 and S2 depend on W and U too, so the chain rule has to be applied back through the earlier hidden states. For example, at time step 3:
dE3/dW = Σk=0..3 (dE3/dY'3) · (dY'3/dS3) · (dS3/dSk) · (dSk/dW)
BACKPROPAGATION THROUGH TIME (BPTT)
Sum up the gradients at each time step.

Propagation through time (RNN) = propagation through layers (FNN)
Gradients of Some Common Activation Functions
VANISHING GRADIENT PROBLEM
Error gradients pass through the nonlinearity at every step.
Saturation at both ends ==> zero gradient
The gradient vanishes completely after a few time steps.

The tanh derivative ranges from 0 to 1.
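A tiny numerical illustration (scalar toy case; the weight and the random pre-activations are made up): the gradient flowing back k steps picks up a factor of W · tanh'(·) at every step, and since the tanh derivative lies in (0, 1] the product shrinks toward zero.

import numpy as np

rng = np.random.default_rng(0)
w = 0.9                                   # a (scalar) recurrent weight
pre_acts = rng.normal(size=20)            # pre-tanh activations over 20 time steps

grad = 1.0
for a in pre_acts:                        # backpropagate through 20 time steps
    grad *= w * (1.0 - np.tanh(a) ** 2)   # tanh'(a) = 1 - tanh(a)^2, in (0, 1]
print(grad)                               # essentially zero: the gradient has vanished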


From Recurrent Neural Network (RNN) to Long Short-Term Memory (LSTM)
RNN Cell
Diagram: the input Xt (weighted by U) and the feedback ht-1 (weighted by W) are summed to give Ct, which passes through the activation function to give ht.
Use ht as feedback and as output: Y't = softmax(V·ht).
From RNN to LSTM
Diagram: the same RNN cell drawn with the tanh made explicit. Ct = W·ht-1 + U·Xt is passed through tanh to give ht, which is fed back to the next time step and drives the output Y't = softmax(V·ht).
From RNN to LSTM
Use feedback from two inputs:
● Ct-1: the previous cell STATE (before the tanh), carried as the memory
● ht-1: the previous cell OUTPUT (after the tanh)
Diagram: both Ct-1 (memory) and ht-1 (output) are passed from time step t-1 to time step t.
From RNN to LSTM
Attenuate the input and output of the activation function with gates:
● ft: "forget" gate (controls the feedback)
● it: "input" gate (controls the input)
● Ot: "output" gate (controls the output of the tanh)
Diagram: each gate multiplies its signal by an attenuation factor.
From RNN to LSTM
ft, it, and Ot are attenuation factors. All of them are based on:
● the input to the cell (Xt), with parameters [Uf, Ui, UO]
● the output of the previous cell (ht-1), with parameters [Wf, Wi, WO]
(Different parameters (weights) are used for each gate.)
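A sketch of how the three attenuation factors are computed from Xt and ht-1, each with its own weights (sizes, random values, and omitted biases are illustrative assumptions):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n_in, n_hid = 8, 16
rng = np.random.default_rng(0)
Uf, Ui, Uo = (rng.normal(scale=0.1, size=(n_hid, n_in)) for _ in range(3))
Wf, Wi, Wo = (rng.normal(scale=0.1, size=(n_hid, n_hid)) for _ in range(3))

x_t = rng.normal(size=n_in)              # input to the cell at time step t
h_prev = np.zeros(n_hid)                 # output of the previous cell, ht-1

f_t = sigmoid(Uf @ x_t + Wf @ h_prev)    # forget gate ft
i_t = sigmoid(Ui @ x_t + Wi @ h_prev)    # input gate it
o_t = sigmoid(Uo @ x_t + Wo @ h_prev)    # output gate Ot
# each gate is a vector of values in (0, 1): per-component attenuation factors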
From RNN to LSTM

The control gates depend on:
● the input Xt
● the previous output ht-1
Their values range from 0 to 1 (sigmoid).

Remember, in the RNN:
St = tanh(W·St-1 + U·Xt)
Y't = softmax(V·St)
LSTM Cell
Diagram of the full LSTM cell.
LSTM Cell
Cell State
The cell state carries the essential information over time.
LSTM Cell
Activation Functions
σ ∈ (0, 1): control gate, something like a switch
tanh ∈ (−1, 1): recurrent nonlinearity
LSTM Cell
Forget Gate
Decide what to forget and what to remember for the new memory.

Sigmoid output 1 ==> remember everything
Sigmoid output 0 ==> forget everything
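In symbols (the standard LSTM forget gate, written with the deck's parameter names): ft = σ(Uf·Xt + Wf·ht-1), and the part of the old memory that is kept is ft ⊙ Ct-1 (component-wise product).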
LSTM Cell
Input Gate
Decide what new information should be added to the new memory.

Modulate the input with it.
Generate the candidate memories.
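In symbols (standard LSTM; the candidate is written C̃t here, and its weights Uc and Wc are named by analogy with the gates, not taken from the slides): it = σ(Ui·Xt + Wi·ht-1), C̃t = tanh(Uc·Xt + Wc·ht-1), and the new information admitted into the memory is it ⊙ C̃t.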
LSTM Cell
Update State
Compute and update the current cell state Ct. It depends on:
● the previous cell state
● what we decide to forget
● what inputs we allow
● the candidate memories
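In symbols (the standard LSTM state update): Ct = ft ⊙ Ct-1 + it ⊙ C̃t, i.e. keep what the forget gate allows of the old state and add what the input gate allows of the candidate memories.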
LSTM Cell
Cell Output
Modulate the output with the output gate.
Does the cell state contain something relevant? ==> Sigmoid output 1
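Putting the four preceding slides together, a minimal NumPy sketch of one LSTM cell step (standard LSTM equations; the sizes, weight names, random values, and omitted biases are illustrative assumptions):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n_in, n_hid = 8, 16
rng = np.random.default_rng(0)
# one (U, W) pair per gate, plus one for the candidate memories
Uf, Ui, Uo, Uc = (rng.normal(scale=0.1, size=(n_hid, n_in)) for _ in range(4))
Wf, Wi, Wo, Wc = (rng.normal(scale=0.1, size=(n_hid, n_hid)) for _ in range(4))

def lstm_step(x_t, h_prev, C_prev):
    f_t = sigmoid(Uf @ x_t + Wf @ h_prev)       # forget gate
    i_t = sigmoid(Ui @ x_t + Wi @ h_prev)       # input gate
    o_t = sigmoid(Uo @ x_t + Wo @ h_prev)       # output gate
    C_tilde = np.tanh(Uc @ x_t + Wc @ h_prev)   # candidate memories
    C_t = f_t * C_prev + i_t * C_tilde          # update the cell state
    h_t = o_t * np.tanh(C_t)                    # modulate the output
    return h_t, C_t

h, C = np.zeros(n_hid), np.zeros(n_hid)
for x_t in rng.normal(size=(5, n_in)):          # unroll over a toy sequence
    h, C = lstm_step(x_t, h, C)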
Unrolled LSTM
Diagram: the LSTM cell unrolled over time steps t-1, t, and t+1.
