Idea: Make “remembering” easy
▪ Define a more sophisticated mechanism for updating the internal state
▪ By default, LSTMs remember the information from the previous step
▪ Stored information is overwritten only as an active choice
LSTM diagram
[Figure: the LSTM cell, input entering at the bottom and output leaving at the top; from http://colah.github.io/posts/2015-08-Understanding-LSTMs/]
LSTM diagram
[Figure: the same cell with the hidden states labeled: $h_{t-1}$ in, $h_t$ out]
LSTM diagram
Decide what to “forget”
[Figure: the forget step highlighted within the cell; $h_{t-1}$ in, $h_t$ out]
LSTM diagram
Add in “new” information
[Figure: the input step highlighted within the cell; $h_{t-1}$ in, $h_t$ out]
LSTM diagram
Decide what to “forget”
[Figure: the cell state runs along the top of the cell, $c_{t-1} \to c_t$; the gate reads $h_{t-1}$ and $x_t$]
$[h_{t-1}, x_t]$ denotes the concatenation of the two vectors
LSTM diagram
Decide what to “forget”
[Figure: the forget gate acting on the cell state $c_{t-1} \to c_t$]
$f_t = \sigma(W_f [h_{t-1}, x_t] + b_f)$
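To make the gate concrete, here is a minimal NumPy sketch of the forget-gate computation. The dimensions, random initialization, and variable names are illustrative only, not values from the slides.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy sizes and randomly initialized parameters -- purely illustrative.
rng = np.random.default_rng(0)
hidden, inputs = 4, 3
W_f = rng.normal(size=(hidden, hidden + inputs))  # forget-gate weights over [h_{t-1}, x_t]
b_f = np.zeros(hidden)

h_prev = rng.normal(size=hidden)       # h_{t-1}
x_t = rng.normal(size=inputs)          # x_t
z = np.concatenate([h_prev, x_t])      # [h_{t-1}, x_t], length hidden+inputs

f_t = sigmoid(W_f @ z + b_f)           # each entry in (0, 1): 1 = keep, 0 = forget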
LSTM diagram
Add in “new” information
[Figure: the input gate and candidate values feeding the cell state $c_{t-1} \to c_t$]
$i_t = \sigma(W_i [h_{t-1}, x_t] + b_i)$
$C'_t = \tanh(W_C [h_{t-1}, x_t] + b_C)$
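The input gate and candidate values follow the same pattern; continuing the sketch above (parameters again illustrative):

W_i = rng.normal(size=(hidden, hidden + inputs))
b_i = np.zeros(hidden)
W_C = rng.normal(size=(hidden, hidden + inputs))
b_C = np.zeros(hidden)

i_t = sigmoid(W_i @ z + b_i)        # how much of each candidate to let in
C_cand = np.tanh(W_C @ z + b_C)     # candidate values C'_t, each in (-1, 1)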
LSTM diagram
$C_t = f_t * C_{t-1} + i_t * C'_t$
The first term forgets the old state (or not); the second adds in the new information (or not).
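The update itself is elementwise; continuing the sketch:

C_prev = rng.normal(size=hidden)      # C_{t-1}, illustrative
C_t = f_t * C_prev + i_t * C_cand     # C_t = f_t * C_{t-1} + i_t * C'_t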
LSTM diagram
[Figure: the complete cell, with the final step producing the new hidden state $h_t$ from the updated cell state]
LSTM unrolled
[Figure: the cell unrolled across time steps, each step passing $h_t$ and $C_t$ to the next]
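Putting the pieces together, here is a minimal step function and unrolling loop, continuing the sketch above. Note that the output gate ($o_t$, $h_t$) does not appear in the slide equations; it is filled in here with the standard formulation from the referenced colah.github.io post.

def lstm_step(x_t, h_prev, C_prev, params):
    W_f, b_f, W_i, b_i, W_C, b_C, W_o, b_o = params
    z = np.concatenate([h_prev, x_t])    # [h_{t-1}, x_t]
    f_t = sigmoid(W_f @ z + b_f)         # decide what to "forget"
    i_t = sigmoid(W_i @ z + b_i)         # decide what to add
    C_cand = np.tanh(W_C @ z + b_C)      # candidate values C'_t
    C_t = f_t * C_prev + i_t * C_cand    # cell-state update from the slides
    o_t = sigmoid(W_o @ z + b_o)         # output gate (standard LSTM, not on the slides)
    h_t = o_t * np.tanh(C_t)             # new hidden state
    return h_t, C_t

# Output-gate parameters, same shape as the other gates.
W_o = rng.normal(size=(hidden, hidden + inputs))
b_o = np.zeros(hidden)
params = (W_f, b_f, W_i, b_i, W_C, b_C, W_o, b_o)

# "Unrolled" just means the same step, with the same shared parameters,
# applied to each element of the sequence in turn.
h, C = np.zeros(hidden), np.zeros(hidden)
for x_t in rng.normal(size=(5, inputs)):   # a toy sequence of five inputs
    h, C = lstm_step(x_t, h, C, params)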
Final Points
▪ This is the most common version of the LSTM, but there are many different “flavors”
– Gated Recurrent Unit (GRU)
– Depth-Gated RNN
▪ LSTMs have considerably more parameters than plain RNNs (a rough count follows below)
▪ Most of the big performance improvements in NLP have come from LSTMs, not from plain RNNs
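A back-of-the-envelope sketch of that parameter gap, assuming a plain Elman-style RNN and the standard four-gate LSTM above (biases included; the sizes are arbitrary):

h, d = 256, 128                      # hidden size, input size
rnn_params = h * (h + d + 1)         # one weight block over [h, x] plus a bias
lstm_params = 4 * h * (h + d + 1)    # four such blocks: f, i, C', o
print(rnn_params, lstm_params)       # 98560 394240 -- roughly 4x as many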