Image Captions With Deep Learning: Yulia Kogan & Ron Shiff
• “Semantically” close words have vectors that are close in the vector space.
• Semantic relations are preserved in the vector space:
  “king” + “woman” − “man” = “queen”
Word Vectors
• A word vector can be written as:
  x = W y, where y is a “one-hot” vector and W ∈ R^{d×|V|}
• Beneficial for most deep-learning tasks
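As a concrete sketch (the vocabulary size, embedding dimension, and random values below are illustrative assumptions), multiplying the embedding matrix W by a one-hot vector y simply selects one column of W — which is why word-vector lookup can be written as x = W y:

```python
import numpy as np

V = 5   # vocabulary size |V| (toy value)
d = 3   # embedding dimension d (toy value)

rng = np.random.default_rng(0)
W = rng.standard_normal((d, V))  # embedding matrix W in R^{d x |V|}

# One-hot vector y for the word with vocabulary index 2.
y = np.zeros(V)
y[2] = 1.0

# x = W y picks out column 2 of W: the word's vector.
x = W @ y
```

In practice libraries implement this as a direct column/row lookup rather than a matrix product, but the two are mathematically identical.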
RNN – Language Model
(Based on Richard Socher’s lecture – Deep Learning in NLP, Stanford)
A language model computes a probability for a sequence of words:
P(w_1, ..., w_T)
Examples:
• Word ordering:
  P(the cat is small) > P(small is the cat)
• Word choice:
  P(I am going home) > P(I am going house)
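A minimal sketch of how such preferences arise (the bigram table and all probabilities below are made up for illustration): a language model scores a sequence with the chain rule, here P(w_1, ..., w_T) = Π_t P(w_t | w_{t−1}), so a grammatical ordering accumulates higher conditional probabilities than a scrambled one.

```python
# Toy bigram model: every probability here is an illustrative made-up number.
bigram = {
    ("<s>", "the"): 0.4, ("the", "cat"): 0.3, ("cat", "is"): 0.5,
    ("is", "small"): 0.2, ("<s>", "small"): 0.01, ("small", "is"): 0.05,
    ("is", "the"): 0.02,
}

def seq_prob(words):
    """Score a word sequence under the toy bigram model via the chain rule."""
    p, prev = 1.0, "<s>"
    for w in words:
        p *= bigram.get((prev, w), 1e-6)  # small floor for unseen bigrams
        prev = w
    return p
```

With this table, `seq_prob("the cat is small".split())` is far larger than `seq_prob("small is the cat".split())`, mirroring the word-ordering example above.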
Recurrent Neural Networks Language Model
• Each output depends on all previous inputs
RNN – Language Model
• Input: word vectors x_1, ..., x_t, ..., x_T
• At each time step, compute:
  h_t = σ(W^{hh} h_{t−1} + W^{hx} x_t)
  ŷ_t = softmax(W^{s} h_t)
• Output: ŷ_{t,j} = P̂(x_{t+1} = v_j | x_t, ..., x_1)
• Dimensions:
  x_t ∈ R^d, h_t ∈ R^{D_h}, ŷ_t ∈ R^{|V|}
  W^{hx} ∈ R^{D_h × d}, W^{hh} ∈ R^{D_h × D_h}, W^{s} ∈ R^{|V| × D_h}
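The recurrence above can be sketched in a few lines of NumPy (the dimensions d, D_h, |V| and the random weight scales are illustrative assumptions; a real model would learn the weights):

```python
import numpy as np

d, Dh, V = 4, 6, 10  # toy dimensions: input, hidden, vocabulary
rng = np.random.default_rng(1)
Whx = rng.standard_normal((Dh, d)) * 0.1    # W^{hx}: input-to-hidden
Whh = rng.standard_normal((Dh, Dh)) * 0.1   # W^{hh}: hidden-to-hidden
Ws = rng.standard_normal((V, Dh)) * 0.1     # W^{s}: hidden-to-output

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()

def rnn_step(h_prev, x_t):
    """One step: h_t = sigma(W^{hh} h_{t-1} + W^{hx} x_t), y_hat = softmax(W^s h_t)."""
    h_t = sigmoid(Whh @ h_prev + Whx @ x_t)
    y_hat = softmax(Ws @ h_t)  # distribution over the next word
    return h_t, y_hat

h = np.zeros(Dh)
for x in rng.standard_normal((3, d)):  # a short sequence of word vectors
    h, y_hat = rnn_step(h, x)
```

Note how `h` is threaded through every step, which is exactly why each output depends on all previous inputs.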
Recurrent Neural Networks – Language Model
• Total objective is to maximize the log-likelihood w.r.t. the parameters θ:
  J^{ML}(θ) = Σ_{t=1}^{T} Σ_{j=1}^{|V|} y_{t,j} log(ŷ_{t,j})
  where ŷ_{t,j} = P̂(x_{t+1} = v_j | x_t, ..., x_1)
[Plot: −log(y) as a function of y]
• Gradient of the log-likelihood:
  ∂J/∂W = Σ_{t=1}^{T} ∂J_t/∂W
• By chain rule:
  ∂J_t/∂W = Σ_{k=1}^{t} (∂J_t/∂ŷ_t)(∂ŷ_t/∂h_t)(∂h_t/∂h_k)(∂h_k/∂W)
  ∂h_t/∂h_k = ∏_{i=k+1}^{t} ∂h_i/∂h_{i−1} = ∏_{i=k+1}^{t} diag(σ'(z_i)) W^{hh}
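For a single time step with a one-hot target, the cross-entropy term of the objective reduces to minus the log of the probability assigned to the true next word. A tiny numeric sketch (the softmax output values are made up):

```python
import numpy as np

# Illustrative softmax output over a 3-word vocabulary and a one-hot target.
y_hat = np.array([0.1, 0.7, 0.2])  # predicted distribution over next word
y = np.array([0.0, 1.0, 0.0])      # true next word is index 1

# -sum_j y_{t,j} log(y_hat_{t,j}) collapses to -log(y_hat at the true index).
loss = -np.sum(y * np.log(y_hat))
```

This is why the −log(y) curve matters: the loss blows up as the probability of the correct word approaches 0 and goes to 0 as it approaches 1.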
Vanishing/Exploding gradient problem
• Gradients can be very large or very small: ∂h_t/∂h_k is a product of t − k Jacobians, so its norm can grow or shrink exponentially in t − k.
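A sketch of the vanishing case (the hidden size, weight scale, and random z values are illustrative assumptions): multiplying the per-step Jacobians diag(σ'(z_i)) W^{hh} together shrinks the product's norm rapidly, since σ'(z) ≤ 0.25.

```python
import numpy as np

rng = np.random.default_rng(2)
Dh = 8
Whh = rng.standard_normal((Dh, Dh)) * 0.3  # smallish recurrent weights (toy)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Accumulate the Jacobian product dh_t/dh_k over 20 steps and track its norm.
J = np.eye(Dh)
norms = []
for _ in range(20):
    z = rng.standard_normal(Dh)                      # illustrative pre-activations
    J = np.diag(sigmoid(z) * (1 - sigmoid(z))) @ Whh @ J
    norms.append(np.linalg.norm(J, 2))               # spectral norm
```

With larger recurrent weights the same product can instead explode; either way the dependence on distant steps is distorted, which motivates the LSTM below.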
LSTM’s
• The full set of update equations:
  f_t = σ(W_f · [h_{t−1}, x_t] + b_f)
  i_t = σ(W_i · [h_{t−1}, x_t] + b_i)
  C̃_t = tanh(W_C · [h_{t−1}, x_t] + b_C)
  C_t = f_t ∘ C_{t−1} + i_t ∘ C̃_t
  o_t = σ(W_o · [h_{t−1}, x_t] + b_o)
  h_t = o_t ∘ tanh(C_t)
LSTM’s
• “Forget Gate”:
  f_t = σ(W_f · [h_{t−1}, x_t] + b_f)
LSTM’s
• “Input gate layer”:
  i_t = σ(W_i · [h_{t−1}, x_t] + b_i)
  C̃_t = tanh(W_C · [h_{t−1}, x_t] + b_C)
LSTM’s
• Updating the memory cell:
  C_t = f_t ∘ C_{t−1} + i_t ∘ C̃_t
• No longer exponential: the contribution of C_{t−3} to C_t is
  f_t ∘ f_{t−1} ∘ f_{t−2} ∘ C_{t−3}
  so when the forget gates stay close to 1, this additive path lets gradients flow over long spans without vanishing.
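The gate equations above can be sketched as a single NumPy step (a minimal sketch: the sizes, random weight scale, and zero biases are assumptions; the structure follows the slide's equations with each gate acting on the concatenation [h_{t−1}, x_t]):

```python
import numpy as np

rng = np.random.default_rng(3)
Dh, d = 4, 3  # toy hidden and input dimensions

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One weight matrix and bias per gate, each acting on [h_{t-1}, x_t].
Wf, Wi, WC, Wo = (rng.standard_normal((Dh, Dh + d)) * 0.1 for _ in range(4))
bf, bi, bC, bo = (np.zeros(Dh) for _ in range(4))

def lstm_step(h_prev, C_prev, x_t):
    concat = np.concatenate([h_prev, x_t])
    f_t = sigmoid(Wf @ concat + bf)        # forget gate
    i_t = sigmoid(Wi @ concat + bi)        # input gate layer
    C_tilde = np.tanh(WC @ concat + bC)    # candidate memory
    C_t = f_t * C_prev + i_t * C_tilde     # additive memory-cell update
    o_t = sigmoid(Wo @ concat + bo)        # output gate
    h_t = o_t * np.tanh(C_t)
    return h_t, C_t

h, C = np.zeros(Dh), np.zeros(Dh)
for x in rng.standard_normal((5, d)):  # a short input sequence
    h, C = lstm_step(h, C, x)
```

The key design point is the line computing `C_t`: the old cell state passes through elementwise gates rather than a repeated matrix multiply, which is exactly the "no longer exponential" property above.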
Acknowledgments:
4. Andrej Karpathy - http://karpathy.github.io/2015/05/21/rnn-effectiveness/
5. Richard Socher - http://cs224d.stanford.edu/
6. Christopher Olah - http://colah.github.io/posts/2015-08-Understanding-LSTMs/