Lecture11 VDL
Lecture 12
◼ https://vlu.cs.uni-kl.de/
◼ Positions
  – A couple of PhD positions
  – Master's thesis
  – Master's project
  – Hiwi (student research assistant)
Mikolov, Chen, Corrado and Dean: Efficient Estimation of Word Representations in Vector Space. ICLR Workshops, 2013.
◼ Embed
◼ Encode
◼ (Attend)
◼ Decode
◼ Predict
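The five stages above can be sketched end to end in toy NumPy code. Everything here is a hypothetical illustration, not the lecture's implementation: the vocabulary size, dimensions, random weights, and example token ids are all made up, and the "RNN" is a single untrained tanh recurrence.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, d = 10, 4                     # hypothetical tiny vocabulary and hidden size

# Embed: map token ids to vectors via a lookup table
E = rng.normal(size=(vocab, d))
tokens = np.array([3, 1, 7])         # a toy three-word source sentence
x = E[tokens]                        # shape (3, d)

# Encode: a minimal (untrained) RNN producing one hidden state per token
W = rng.normal(size=(d, d))
U = rng.normal(size=(d, d))
h = np.zeros(d)
H = []
for x_t in x:
    h = np.tanh(W @ h + U @ x_t)
    H.append(h)
H = np.stack(H)                      # encoder hidden states, shape (3, d)

# (Attend): weight the encoder states by relevance to a query vector
q = rng.normal(size=(d,))            # stand-in for a decoder state
scores = H @ q
alpha = np.exp(scores - scores.max())
alpha /= alpha.sum()                 # attention distribution, sums to 1
context = alpha @ H                  # weighted sum of encoder states

# Decode: one RNN step conditioned on the attended context
s = np.tanh(W @ context + U @ q)

# Predict: project to vocabulary logits and normalize with a softmax
W_out = rng.normal(size=(vocab, d))
logits = W_out @ s
p = np.exp(logits - logits.max())
p /= p.sum()                         # probability over the vocabulary
```

With trained weights, `p` would be the model's next-token distribution; here it only demonstrates how the five stages compose.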
[Figure: sequence-to-sequence model with an encoder RNN and a decoder RNN]
• First we will show the mechanism via diagrams (no equations), then with equations.
[Figure: seq2seq with attention, shown step by step — the encoder RNN reads the French source "il a m’ entarté" ("he hit me with a pie"); on each decoder step, starting from <START>, attention scores against the encoder hidden states are turned via softmax into an attention distribution, which is used to form the attention output]
• The attention output mostly contains information from the hidden states that received high attention.
• Concatenate the attention output with the decoder hidden state, then use it to compute ŷ as before.
• We take a softmax to get the attention distribution for this step (a probability distribution that sums to 1).
• We use the attention distribution to take a weighted sum of the encoder hidden states, which gives the attention output.
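A single attention step as described in the two bullets above can be sketched in NumPy. The dimensions, the random stand-in states, and the dot-product scoring function are assumptions for illustration only:

```python
import numpy as np

def softmax(x):
    # numerically stable softmax
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(0)
n_src, d = 5, 4                    # hypothetical: 5 source words, hidden size 4
H = rng.normal(size=(n_src, d))    # encoder hidden states h_1 .. h_5
s = rng.normal(size=(d,))          # current decoder hidden state

# Attention scores for this step (here: dot product of the decoder
# state with each encoder hidden state)
scores = H @ s                     # shape (5,)

# Softmax -> attention distribution (a probability distribution, sums to 1)
alpha = softmax(scores)

# Weighted sum of the encoder hidden states -> attention output
a = alpha @ H                      # shape (4,)

# Concatenate attention output with the decoder hidden state;
# this concatenation is what feeds the output layer that computes ŷ
concat = np.concatenate([a, s])    # shape (8,)
```

Encoder states that score highly dominate `a`, which is exactly why the attention output mostly carries information from the highly attended positions.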