Temporal/sequence problems
‣ How to cast as a supervised learning problem?
[Figure: USD/EUR exchange rate over time; a window of past observations is stacked into a feature vector and used to predict the next value y^(t)]
Temporal/sequence problems
‣ Language modeling: what comes next?
[Figure: language modeling; the words seen so far are mapped to a vector used to predict the next word y^(t)]
What are we missing?
‣ Sequence prediction problems can be recast in a form
amenable to feed-forward neural networks
‣ But we have to engineer how “history” is mapped to a
vector (representation); this vector is then fed into, e.g.,
a feed-forward neural network (see the sketch after this list)
- how many steps back should we look?
- how do we retain important items mentioned far back?
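A minimal sketch of this recasting, assuming numpy, a synthetic exchange-rate series, and a hypothetical window size k: each length-k history becomes one feature vector, and any feed-forward model could consume the resulting (X, y) pairs (here, plain least squares stands in for the network).

import numpy as np

def make_windows(series, k):
    # Each length-k history becomes one feature row; the next value is the target.
    X = np.array([series[t - k:t] for t in range(k, len(series))])
    y = np.array([series[t] for t in range(k, len(series))])
    return X, y

rng = np.random.default_rng(0)
rate = 1.1 + np.cumsum(rng.normal(scale=0.01, size=200))  # synthetic "USD/EUR" series
X, y = make_windows(rate, k=5)

# Simplest predictor on these features (linear least squares with a bias term);
# a feed-forward network would consume exactly the same (X, y) pairs.
w, *_ = np.linalg.lstsq(np.c_[X, np.ones(len(X))], y, rcond=None)
print("one-step-ahead prediction:", np.r_[rate[-5:], 1.0] @ w)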
‣ Sentiment classification
“I have seen better lectures” → -1
‣ Machine translation
“I have seen better lectures” → “Olen nähnyt parempia luentoja” (Finnish)
[Figure: the source sentence is first encoded into a vector, which is then decoded into the target sentence]
Key concepts
‣ Encoding (this lecture)
- e.g., mapping a sequence to a vector
‣ Decoding (next lecture)
- e.g., mapping a vector to a sequence
Encoding everything
[Figure: words, sentences, images, and events each encoded as real-valued vectors, e.g., the sentence “Efforts and courage are not enough without purpose and direction” (JFK)]
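For concreteness, a minimal sketch of the simplest such encoding: an embedding table that maps each word to a real-valued vector (the vocabulary, dimensions, and values here are hypothetical; in practice the table is learned).

import numpy as np

rng = np.random.default_rng(0)
vocab = {"efforts": 0, "and": 1, "courage": 2}   # hypothetical tiny vocabulary
E = rng.normal(size=(len(vocab), 3))             # one 3-d vector per word; learned in practice

def embed(word):
    return E[vocab[word]]                        # table lookup: word -> vector

print(embed("courage"))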
Example: encoding sentences
‣ Easy to introduce adjustable “lego pieces” and optimize
them for end-to-end performance
[Figure, built up step by step: an encoder “lego piece” is applied word by word to the sentence, starting from the <null> state; after each step the state summarizes the prefix read so far (“Efforts”, then “Efforts and”, and so on), and the final state gives the whole sentence as a vector]

s_t = tanh(W^{s,s} s_{t-1} + W^{s,x} x_t)

What’s in the box?
[Figure: the previous context (state) s_{t-1} and the new information x_t enter a box with parameters θ; the box outputs the new context (state) s_t: a basic RNN]

s_t = tanh(W^{s,s} s_{t-1} + W^{s,x} x_t)
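A minimal sketch of this encoder, assuming numpy and randomly initialized weights (in practice W^{s,s} and W^{s,x} are learned end-to-end):

import numpy as np

rng = np.random.default_rng(0)
d_s, d_x = 4, 3                          # state and input dimensions (arbitrary here)
W_ss = 0.5 * rng.normal(size=(d_s, d_s)) # W^{s,s}: how the old state carries over
W_sx = 0.5 * rng.normal(size=(d_s, d_x)) # W^{s,x}: how new information enters

def encode(xs):
    s = np.zeros(d_s)                    # the <null> initial state
    for x in xs:                         # the same "lego piece" at every step
        s = np.tanh(W_ss @ s + W_sx @ x)
    return s                             # the whole sequence as one vector

sentence = rng.normal(size=(9, d_x))     # e.g., 9 word vectors
print(encode(sentence))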
What’s in the box?
‣ We can make the RNN more sophisticated…
[Figure: the same box, now a simple gated RNN]

g_t = sigmoid(W^{g,s} s_{t-1} + W^{g,x} x_t)
s_t = (1 - g_t) ⊙ s_{t-1} + g_t ⊙ tanh(W^{s,s} s_{t-1} + W^{s,x} x_t)
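The same sketch adapted to the gated update (assumptions as above: numpy, random weights; ⊙ is elementwise multiplication):

import numpy as np

rng = np.random.default_rng(1)
d_s, d_x = 4, 3
W_gs, W_gx = rng.normal(size=(d_s, d_s)), rng.normal(size=(d_s, d_x))
W_ss, W_sx = rng.normal(size=(d_s, d_s)), rng.normal(size=(d_s, d_x))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_step(s_prev, x):
    g = sigmoid(W_gs @ s_prev + W_gx @ x)     # gate: how much to overwrite, per coordinate
    cand = np.tanh(W_ss @ s_prev + W_sx @ x)  # candidate new state
    return (1 - g) * s_prev + g * cand        # blend old state and candidate

print(gated_step(np.zeros(d_s), rng.normal(size=d_x)))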
What’s in the box?
‣ We can make the RNN more sophisticated…
[Figure: the same box, now an LSTM (one of many variants)]

f_t = sigmoid(W^{f,h} h_{t-1} + W^{f,x} x_t)    (forget gate)
i_t = sigmoid(W^{i,h} h_{t-1} + W^{i,x} x_t)    (input gate)
o_t = sigmoid(W^{o,h} h_{t-1} + W^{o,x} x_t)    (output gate)
c_t = f_t ⊙ c_{t-1} + i_t ⊙ tanh(W^{c,h} h_{t-1} + W^{c,x} x_t)    (memory cell)
h_t = o_t ⊙ tanh(c_t)    (visible state)
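And a sketch of one LSTM step under the same assumptions (numpy, random weights; biases omitted as in the equations above):

import numpy as np

rng = np.random.default_rng(2)
d_h, d_x = 4, 3
# One (recurrent, input) weight pair per gate/cell: f, i, o, c.
W = {k: (rng.normal(size=(d_h, d_h)), rng.normal(size=(d_h, d_x))) for k in "fioc"}

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(h_prev, c_prev, x):
    f = sigmoid(W["f"][0] @ h_prev + W["f"][1] @ x)                   # forget gate
    i = sigmoid(W["i"][0] @ h_prev + W["i"][1] @ x)                   # input gate
    o = sigmoid(W["o"][0] @ h_prev + W["o"][1] @ x)                   # output gate
    c = f * c_prev + i * np.tanh(W["c"][0] @ h_prev + W["c"][1] @ x)  # memory cell
    h = o * np.tanh(c)                                                # visible state
    return h, c

h, c = lstm_step(np.zeros(d_h), np.zeros(d_h), rng.normal(size=d_x))
print(h)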
Key things
‣ Neural networks for sequences: encoding
‣ RNNs, unfolded
- state evolution, gates
- relation to feed-forward neural networks
- back-propagation (conceptually)
‣ Issues: vanishing/exploding gradient
‣ LSTM (operationally)
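A small numeric illustration of the gradient issue (hypothetical weights): back-propagating through T steps of the basic RNN multiplies T Jacobians of the form diag(1 - s_t^2) W^{s,s}, and depending on the weight scale the product tends to collapse toward zero or blow up as T grows.

import numpy as np

rng = np.random.default_rng(3)
d = 4
for scale in (0.1, 2.0):                     # small vs. large recurrent weights
    W_ss = scale * rng.normal(size=(d, d))
    s, J = np.zeros(d), np.eye(d)
    for t in range(50):
        s = np.tanh(W_ss @ s + rng.normal(size=d))
        J = np.diag(1 - s**2) @ W_ss @ J     # chain rule through one tanh step
    print(f"scale={scale}: |ds_50/ds_0| = {np.linalg.norm(J):.2e}")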