
Gated RNNs

Geena Kim

University of Colorado Boulder


How hard is it to train an RNN?

• Slow to train (truncated backpropagation through time, TBPTT, helps)

• Can suffer from exploding/vanishing gradients (see the note below)

• Information from the first or early time steps gets lost as it propagates through many time steps
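
The exploding/vanishing behavior follows from the chain rule: backpropagation through time multiplies one Jacobian per step. For a vanilla RNN with h_t = \tanh(W h_{t-1} + U x_t),

\frac{\partial h_T}{\partial h_t} = \prod_{k=t+1}^{T} \operatorname{diag}\big(\tanh'(a_k)\big)\, W

where a_k = W h_{k-1} + U x_k is the pre-activation, so a product of T - t such factors tends toward zero or infinity depending on the largest singular value of W.
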
Remedies

• ReLU activation function
• Truncated BPTT
• Gradient clipping (see the sketch below)
• Learning rate scheduling
• Residual connections
• Gated architectures: LSTM, GRU
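
A minimal sketch of two of these remedies, truncated BPTT and gradient clipping, assuming PyTorch; the model, data, and loss below are toy stand-ins, not from the slides:

```python
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=8, hidden_size=32, batch_first=True)
optimizer = torch.optim.Adam(rnn.parameters(), lr=1e-3)

x = torch.randn(4, 100, 8)   # (batch, time, features) toy sequence
h = None
chunk = 20                   # truncation length for TBPTT

for t0 in range(0, x.size(1), chunk):
    optimizer.zero_grad()
    out, h = rnn(x[:, t0:t0 + chunk], h)
    loss = out.pow(2).mean()                 # stand-in loss
    loss.backward()
    # Rescale the global gradient norm to tame exploding gradients
    torch.nn.utils.clip_grad_norm_(rnn.parameters(), max_norm=1.0)
    optimizer.step()
    h = h.detach()           # cut the graph here: truncated BPTT
```

Detaching the hidden state between chunks is what makes the backward pass stop at the chunk boundary, so the cost of one update no longer grows with the full sequence length.
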
Long-term dependencies

• Skip connections

• Leaky units (update rule below)
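
A leaky unit keeps a running average of its past through a self-connection with weight near one; the standard update is

\mu^{(t)} = \alpha\, \mu^{(t-1)} + (1 - \alpha)\, v^{(t)}

where \alpha near 1 preserves information about the distant past and \alpha near 0 makes the unit track its current input v^{(t)}.
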
Long Short-Term Memory cell
What is an LSTM cell?

[Figure: a vanilla RNN cell vs. an LSTM cell. The RNN cell passes only the hidden state h_{t-1} -> h_t; the LSTM cell also carries a separate cell state c_{t-1} -> c_t. Both take the input x_t at each time step.]
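
A sketch of the interface difference the figure shows, assuming PyTorch: a vanilla RNN cell maps (x_t, h_{t-1}) to h_t, while an LSTM cell also threads a separate cell state.

```python
import torch
import torch.nn as nn

x_t = torch.randn(4, 8)                  # a batch of inputs at one time step
rnn_cell = nn.RNNCell(8, 32)
lstm_cell = nn.LSTMCell(8, 32)

h = torch.zeros(4, 32)                   # hidden state
c = torch.zeros(4, 32)                   # cell state (LSTM only)

h_next = rnn_cell(x_t, h)                # vanilla RNN: hidden state only
h_next, c_next = lstm_cell(x_t, (h, c))  # LSTM: hidden state + cell state
```
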
Inside the LSTM cell

[Figure: inside the LSTM cell. The cell state c_{t-1} flows along the top and is updated by an elementwise multiply and an add; four gates labeled f (forget), i (input), g (candidate), and o (output) read h_{t-1} and x_t to produce c_t and h_t.]
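
In a standard formulation (with [h_{t-1}, x_t] denoting concatenation), the four gates in the diagram compute:

f_t = \sigma(W_f [h_{t-1}, x_t] + b_f)    (forget gate)
i_t = \sigma(W_i [h_{t-1}, x_t] + b_i)    (input gate)
g_t = \tanh(W_g [h_{t-1}, x_t] + b_g)     (candidate)
o_t = \sigma(W_o [h_{t-1}, x_t] + b_o)    (output gate)
c_t = f_t \odot c_{t-1} + i_t \odot g_t
h_t = o_t \odot \tanh(c_t)

where \sigma is the logistic sigmoid and \odot is elementwise multiplication. The additive update of c_t is what lets gradients flow across many time steps.
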
Gated Recurrent Unit (GRU)

[Figure: inside the GRU cell. A reset gate r and an update gate z read h_{t-1} and x_t; the candidate g is blended with h_{t-1} through z and 1 - z to produce h_t. There is no separate cell state.]
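
The standard GRU equations corresponding to the diagram (conventions differ on which branch takes z_t versus 1 - z_t; one common form is):

r_t = \sigma(W_r [h_{t-1}, x_t] + b_r)             (reset gate)
z_t = \sigma(W_z [h_{t-1}, x_t] + b_z)             (update gate)
g_t = \tanh(W_g [r_t \odot h_{t-1}, x_t] + b_g)    (candidate)
h_t = z_t \odot h_{t-1} + (1 - z_t) \odot g_t

The GRU merges the LSTM's cell and hidden states and uses a single update gate z_t in place of separate forget and input gates.
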
