
LSTM (Long Short-Term Memory)

GRU – Gated Recurrent Unit
(reset gate and update gate; more compression than LSTM)

It combines the forget and input gates into a single update gate.


It also merges the cell state and hidden state, which makes it simpler
than the LSTM. There are many other variants too.

×, *: element-wise multiplication


GRUs also take x_t and h_{t-1} as inputs. They perform some
calculations and then pass along h_t. What makes them different
from LSTMs is that GRUs do not need a separate cell state to pass
values along. The calculations within each step ensure that the
h_t values being passed along either retain a large amount of
old information or are jump-started with a large amount of new
information.
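As a concrete illustration, here is a minimal NumPy sketch of one GRU step. The weight names (Wz, Uz, Wr, Ur, Wh, Uh) and the toy dimensions are assumptions for illustration, and the gating convention (z scales the old state) matches the formula used later in these slides; library implementations may differ in details such as bias handling and gate ordering.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, params):
    """One GRU step: takes x_t and h_{t-1}, returns h_t (no separate cell state)."""
    Wz, Uz, bz, Wr, Ur, br, Wh, Uh, bh = params
    z = sigmoid(Wz @ x_t + Uz @ h_prev + bz)              # update gate (merged forget/input)
    r = sigmoid(Wr @ x_t + Ur @ h_prev + br)              # reset gate
    h_tilde = np.tanh(Wh @ x_t + Uh @ (r * h_prev) + bh)  # candidate hidden state
    # blend old information (h_prev) with new information (h_tilde)
    return z * h_prev + (1.0 - z) * h_tilde

# toy dimensions (assumed): 3-dim input, 4-dim hidden state
rng = np.random.default_rng(0)
d_in, d_h = 3, 4
params = tuple(rng.normal(scale=0.5, size=s) for s in
               [(d_h, d_in), (d_h, d_h), (d_h,),    # update gate weights and bias
                (d_h, d_in), (d_h, d_h), (d_h,),    # reset gate weights and bias
                (d_h, d_in), (d_h, d_h), (d_h,)])   # candidate weights and bias
h = np.zeros(d_h)
for x in rng.normal(size=(5, d_in)):                 # 5 time steps
    h = gru_step(x, h, params)
```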
Feed-forward vs Recurrent Network
1. Feedforward network does not have input at each step
2. Feedforward network has different parameters for each layer

[Figure: a feedforward network: x → f^1 → a^1 → f^2 → a^2 → f^3 → a^3 → f^4 → y]

t is the layer index:
a^t = f^t(a^{t-1}) = σ(W^t a^{t-1} + b^t)
[Figure: a recurrent network unrolled in time: h^0 → f → h^1 → f → h^2 → f → h^3 → f → g → y^4, with inputs x^1, x^2, x^3, x^4 fed into f at each step]

t is the time step:
a^t = f(a^{t-1}, x^t) = σ(W^h a^{t-1} + W^i x^t + b^i)
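For contrast, a sketch of the recurrent case under the same assumptions: the same W^h, W^i, b^i are reused at every time step, and a new input x^t arrives at each step.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# a^t = f(a^{t-1}, x^t) = sigmoid(W_h a^{t-1} + W_i x^t + b_i)
# The SAME W_h, W_i, b_i are shared across all time steps.
def rnn(xs, a0, W_h, W_i, b_i):
    a = a0
    for x_t in xs:               # t = 1, 2, 3, ... indexes the time step
        a = sigmoid(W_h @ a + W_i @ x_t + b_i)
    return a

rng = np.random.default_rng(0)
d_in, d_h = 3, 4                 # assumed sizes for illustration
W_h = rng.normal(size=(d_h, d_h))
W_i = rng.normal(size=(d_h, d_in))
b_i = np.zeros(d_h)
a = rnn(rng.normal(size=(4, d_in)), np.zeros(d_h), W_h, W_i, b_i)  # 4 time steps
```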

We will turn the recurrent network 90 degrees, so that the time steps look like the layers of a deep feedforward network.


[Figure: a GRU unit turned 90 degrees, showing the reset gate r, the update gate z, and the candidate h']

• No input x^t at each step
• No output y^t at each step
• a^{t-1} is the output of the (t-1)-th layer
• a^t is the output of the t-th layer
• No reset gate

h' = σ(W a^{t-1})
z = σ(W' a^{t-1})

Highway Network: a^t = z ⊙ a^{t-1} + (1 - z) ⊙ h'
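A minimal sketch of one highway layer under exactly these equations (the weight names and toy width are assumptions; published highway networks also include bias terms and typically use a non-sigmoid transform for h'):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# h' = sigmoid(W a^{t-1}),  z = sigmoid(W' a^{t-1}),
# a^t = z * a^{t-1} + (1 - z) * h'   (element-wise)
def highway_layer(a_prev, W, W_gate):
    h_prime = sigmoid(W @ a_prev)   # candidate transformation of the previous layer
    z = sigmoid(W_gate @ a_prev)    # gate: how much of a^{t-1} is copied through
    return z * a_prev + (1.0 - z) * h_prime

rng = np.random.default_rng(0)
d = 4                               # assumed width; a highway layer keeps the same size
a = rng.normal(size=d)
for _ in range(3):                  # stack a few layers, each with its own parameters
    a = highway_layer(a, rng.normal(size=(d, d)), rng.normal(size=(d, d)))
```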

• Highway Network: a^t = z ⊙ a^{t-1} + (1 - z) ⊙ h'
  The gate controller z decides how much of a^{t-1} is copied through (the red arrow in the figure).
• Residual Network: a^t = a^{t-1} + h'
  The previous output a^{t-1} is always copied through and added.
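For comparison, a residual layer simply adds the copied a^{t-1} with no gate. The sketch below reuses the slide's σ(W a^{t-1}) form for h' so the two layers line up; actual ResNet blocks use stacked convolutions with ReLU.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Residual layer: a^t = a^{t-1} + h'. The previous output is always copied and
# added, whereas the highway layer scales the copy by the learned gate z.
def residual_layer(a_prev, W):
    h_prime = sigmoid(W @ a_prev)   # h' (illustrative; ResNet uses conv + ReLU)
    return a_prev + h_prime

rng = np.random.default_rng(0)
d = 4                               # assumed width for illustration
a = residual_layer(rng.normal(size=d), rng.normal(size=(d, d)))
```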

• Training Very Deep Networks: https://arxiv.org/pdf/1507.06228v2.pdf
• Deep Residual Learning for Image Recognition: http://arxiv.org/abs/1512.03385
Highway Network Experiments

[Figure: networks of different depths, each shown from input layer to output layer]

The Highway Network automatically determines the layers needed!
