Temporal/sequence problems
‣ How to cast as a supervised learning problem?
[Figure: USD/EUR exchange rate over time; the task is to predict the next value y(t) from a feature vector x(t) built from recent observations]
Temporal/sequence problems
‣ Language modeling: what comes next?
What are we missing?
‣ Sequence prediction problems can be recast in a form
amenable to feed-forward neural networks
‣ But we have to engineer how the "history" is mapped to a
vector (representation); this vector is then fed into, e.g.,
a feed-forward neural network
- how many steps back should we look?
- how do we retain important items mentioned far back?
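A minimal sketch of this recasting, assuming a fixed window of the k most recent values as the engineered "history" (the toy exchange-rate numbers are illustrative, not course data):

```python
import numpy as np

def make_windows(series, k):
    """Recast a 1-D time series as (history window -> next value)
    pairs usable by a feed-forward network; k past steps are kept."""
    X = np.array([series[t - k:t] for t in range(k, len(series))])
    y = np.array(series[k:])
    return X, y

rates = [1.10, 1.12, 1.11, 1.15, 1.14, 1.13]  # toy USD/EUR values
X, y = make_windows(rates, k=3)
# X[0] holds the 3 rates preceding y[0]
```

Choosing k is exactly the engineering question above: too small and distant context is lost, too large and the input grows without bound.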
‣ Sentiment classification
"I have seen better lectures" → −1
‣ Machine translation
"I have seen better lectures" → "Olen nähnyt parempia luentoja" (Finnish)
(encoding the source sentence, then decoding into the target language)
Key concepts
‣ Encoding (this lecture)
- e.g., mapping a sequence to a vector
‣ Decoding (next lecture)
- e.g., mapping a vector to a sequence
Encoding everything
words → vectors, e.g. [.1, .3, .4]ᵀ, [.7, .1, .0]ᵀ, [.2, .8, .3]ᵀ, …
sentences → vectors, e.g.
"Efforts and courage are not enough
without purpose and direction" — JFK
images → vectors, e.g. [.3, .3, .5]ᵀ, [.2, .4, .6]ᵀ, …
events → vectors
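The word-to-vector mapping above can be sketched as a lookup table (a toy illustration: the vocabulary, dimensionality, and random values here are assumptions, not learned embeddings):

```python
import numpy as np

rng = np.random.default_rng(3)
vocab = {"efforts": 0, "and": 1, "courage": 2}  # toy vocabulary
E = rng.normal(size=(len(vocab), 3))            # one 3-dim vector per word

def embed(word):
    """Map a word to its vector by table lookup; in practice
    the rows of E are learned jointly with the rest of the model."""
    return E[vocab[word]]

v = embed("courage")
```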
[Figure: the same RNN "lego piece" applied step by step: starting from the <null> (zero) state, each step combines the previous context/state with new information x_t; intermediate states summarize the prefix read so far ("Efforts", "Efforts and", …), and the final state represents the whole sentence as a vector (encoder); this is the basic RNN]

s_t = tanh(W^{s,s} s_{t−1} + W^{s,x} x_t)
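The basic RNN update can be sketched in NumPy (toy dimensions and random weights are assumptions; a real encoder learns W^{s,s} and W^{s,x}):

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 4, 3            # state dim d, input dim m (toy sizes)
W_ss = rng.normal(scale=0.1, size=(d, d))
W_sx = rng.normal(scale=0.1, size=(d, m))

def encode(xs):
    """Fold a sequence of input vectors into one state vector via
    the basic RNN update s_t = tanh(W_ss s_{t-1} + W_sx x_t)."""
    s = np.zeros(d)                        # the <null> start state
    for x in xs:
        s = np.tanh(W_ss @ s + W_sx @ x)   # same lego piece each step
    return s                               # sentence as a vector

sentence = [rng.normal(size=m) for _ in range(5)]
vec = encode(sentence)
```

Note that the same weight matrices are reused at every step; only the state changes as the sequence is read.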
What’s in the box?
‣ We can make the RNN more sophisticated…
[Figure: context/state s_{t−1} and new information x_t enter the box; a simple gated RNN produces the new context/state s_t]

g_t = sigmoid(W^{g,s} s_{t−1} + W^{g,x} x_t)
s_t = (1 − g_t) ⊙ s_{t−1} + g_t ⊙ tanh(W^{s,s} s_{t−1} + W^{s,x} x_t)
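A sketch of one gated update, under the same toy-dimension assumptions as before (random weights stand in for learned ones):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
d, m = 4, 3
W_gs, W_gx = rng.normal(size=(d, d)), rng.normal(size=(d, m))
W_ss, W_sx = rng.normal(size=(d, d)), rng.normal(size=(d, m))

def gated_step(s_prev, x):
    """One gated-RNN update: the gate g_t decides, coordinate by
    coordinate, how much old state to keep vs. overwrite."""
    g = sigmoid(W_gs @ s_prev + W_gx @ x)        # in (0, 1) elementwise
    cand = np.tanh(W_ss @ s_prev + W_sx @ x)     # candidate new state
    return (1 - g) * s_prev + g * cand           # elementwise mix

s = gated_step(np.zeros(d), rng.normal(size=m))
```

When g_t is near 0 the old state passes through almost unchanged, which is what lets gated units retain items mentioned far back.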
What’s in the box?
‣ We can make the RNN more sophisticated…
[Figure: context/state h_{t−1} and new information x_t enter the box; an LSTM (one of many variants) produces the new context/state h_t]

f_t = sigmoid(W^{f,h} h_{t−1} + W^{f,x} x_t)    forget gate
i_t = sigmoid(W^{i,h} h_{t−1} + W^{i,x} x_t)    input gate
o_t = sigmoid(W^{o,h} h_{t−1} + W^{o,x} x_t)    output gate
c_t = f_t ⊙ c_{t−1} + i_t ⊙ tanh(W^{c,h} h_{t−1} + W^{c,x} x_t)    memory cell
h_t = o_t ⊙ tanh(c_t)    visible state
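One LSTM step, transcribed directly from the equations above (toy dimensions and random weights are assumptions; a trained model learns all four weight pairs):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(2)
d, m = 4, 3
# one (hidden, input) weight pair per gate and for the memory candidate
W = {k: (rng.normal(size=(d, d)), rng.normal(size=(d, m)))
     for k in ("f", "i", "o", "c")}

def lstm_step(h_prev, c_prev, x):
    """One LSTM update, matching the slide's equations."""
    f = sigmoid(W["f"][0] @ h_prev + W["f"][1] @ x)   # forget gate
    i = sigmoid(W["i"][0] @ h_prev + W["i"][1] @ x)   # input gate
    o = sigmoid(W["o"][0] @ h_prev + W["o"][1] @ x)   # output gate
    c = f * c_prev + i * np.tanh(W["c"][0] @ h_prev + W["c"][1] @ x)
    h = o * np.tanh(c)                                # visible state
    return h, c

h, c = lstm_step(np.zeros(d), np.zeros(d), rng.normal(size=m))
```

The additive form of the memory-cell update (f_t ⊙ c_{t−1} + …) is what mitigates the vanishing-gradient issue listed under "Key things".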
Key things
‣ Neural networks for sequences: encoding
‣ RNNs, unfolded
- state evolution, gates
- relation to feed-forward neural networks
- back-propagation (conceptually)
‣ Issues: vanishing/exploding gradient
‣ LSTM (operationally)
Attribution List - Machine learning - 6.86x
1.
Unit 1 Lecture 8: Introduction to Machine Learning
Photo portrait of John F. Kennedy
Slides: #11
Object Source / URL: https://commons.wikimedia.org/wiki/File:John_F._Kennedy,_White_House_photo_portrait,_looking_up.jpg
Citation/Attribution: This file is a work of an employee of the Executive Office of the President of the United States, taken or made as part of that person's official duties.
As a work of the U.S. federal government, it is in the public domain.
2.
Unit 1 Lecture 8: Introduction to Machine Learning
John F Kennedy speech on May 25th, 1961
Slides: #11
Object Source / URL: https://www.defense.gov/Explore/Spotlight/Apollo-11/
Citation/Attribution: Image from defense.gov. This work is in the public domain.