
Very Deep Learning

Lecture 10

Dr. Muhammad Zeshan Afzal, Prof. Didier Stricker


MindGarage, University of Kaiserslautern
afzal.tukl@gmail.com



Recap



Mapping Types



Mapping types

◼ One to One



Mapping types

◼ One to Many



Mapping types

◼ Many to One

Image Source: Chen-Wen et al.: Outpatient Text Classification Using Attention-Based Bidirectional LSTM for Robot-Assisted Servicing in Hospital



Mapping types

◼ Many to Many

Image Source: https://arxiv.org/abs/1607.05781



Mapping types

◼ Many to Many

Image Source: https://medium.com/@gautam.karmakar/attention-for-neural-connectionist-machine-translation-b833d1e085a3



Feedforward vs Recurrent Neural Networks

◼ Recurrent Neural Networks (RNNs)


▪ Core idea: update the hidden state h based on the current input and the previous hidden state, using the same update rule (shared parameters) at each time step
▪ Allows processing of sequences of variable length, not only fixed-size vectors
▪ "Infinite" memory: h is a function of all previous inputs (long-term dependencies)
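To make the shared update rule concrete, here is a minimal PyTorch sketch of one vanilla-RNN step applied over a variable-length sequence. The weight names W_xh, W_hh and the tanh nonlinearity follow the standard textbook formulation and are illustrative, not the lecture's exact notation.

```python
# Minimal sketch of the shared RNN update rule (assumed vanilla-RNN form):
#   h_t = tanh(W_xh x_t + W_hh h_{t-1} + b)
import torch

torch.manual_seed(0)
input_dim, hidden_dim = 8, 16

W_xh = torch.randn(hidden_dim, input_dim) * 0.1   # input-to-hidden weights
W_hh = torch.randn(hidden_dim, hidden_dim) * 0.1  # hidden-to-hidden weights (shared over time)
b = torch.zeros(hidden_dim)

def rnn_step(x_t, h_prev):
    """One time step: the same parameters are reused at every step."""
    return torch.tanh(W_xh @ x_t + W_hh @ h_prev + b)

# Process a sequence of arbitrary length with the same update rule
sequence = [torch.randn(input_dim) for _ in range(5)]
h = torch.zeros(hidden_dim)          # initial hidden state
for x_t in sequence:
    h = rnn_step(x_t, h)             # h depends on all inputs seen so far
print(h.shape)                       # torch.Size([16])
```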



Truncated Backpropagation through Time

◼ Truncated Backpropagation through Time


▪ Backpropagating through very long sequences is expensive, so one typically uses truncated backpropagation through time in practice
▪ Carry hidden states forward in time indefinitely, but stop backpropagation after a fixed number of steps
▪ The total loss is the sum of the individual per-step losses (= negative log-likelihood)
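As an illustration, the following PyTorch sketch shows one common way truncated BPTT is implemented: the hidden state is carried across chunk boundaries, while detach() cuts the gradient graph so backpropagation stops at each chunk. The model, chunk length and loss here are illustrative assumptions, not the lecture's exact setup.

```python
# Sketch of truncated BPTT: state carried forward, gradients cut per chunk.
import torch
import torch.nn as nn

torch.manual_seed(0)
rnn = nn.RNN(input_size=8, hidden_size=16, batch_first=True)
head = nn.Linear(16, 8)               # per-step prediction head (illustrative)
optimizer = torch.optim.SGD(list(rnn.parameters()) + list(head.parameters()), lr=0.01)

x = torch.randn(1, 100, 8)            # one long sequence (T = 100)
target = torch.randn(1, 100, 8)
chunk_len = 25                        # truncation length

h = torch.zeros(1, 1, 16)             # initial hidden state
for start in range(0, x.size(1), chunk_len):
    x_chunk = x[:, start:start + chunk_len]
    y_chunk = target[:, start:start + chunk_len]

    out, h = rnn(x_chunk, h)
    loss = ((head(out) - y_chunk) ** 2).mean()   # sum/mean of per-step losses

    optimizer.zero_grad()
    loss.backward()                   # gradients flow only within this chunk
    optimizer.step()

    h = h.detach()                    # carry the state forward, stop backprop here
```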



Multilayer RNNs

◼ Multilayer RNNs
▪ Deeper multi-layer RNNs can be constructed by stacking RNN layers
▪ An alternative is to make each individual computation (= RNN cell) deeper
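Both options can be sketched as follows; this assumes PyTorch, an nn.LSTM stack for the first option, and a hypothetical DeepCell module (a small per-step MLP) for the second.

```python
# Option 1 (sketch): stack recurrent layers -- deeper across layers.
import torch
import torch.nn as nn

stacked = nn.LSTM(input_size=32, hidden_size=64, num_layers=3, batch_first=True)
x = torch.randn(4, 10, 32)                 # (batch, time, features)
out, (h_n, c_n) = stacked(x)
print(out.shape, h_n.shape)                # (4, 10, 64), (3, 4, 64)

# Option 2 (sketch): a single recurrent layer, but each per-step
# computation (the cell itself) is made deeper, e.g. with a small MLP.
class DeepCell(nn.Module):                 # hypothetical illustrative module
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_size + hidden_size, hidden_size),
            nn.Tanh(),
            nn.Linear(hidden_size, hidden_size),
            nn.Tanh(),
        )

    def forward(self, x_t, h_prev):
        return self.net(torch.cat([x_t, h_prev], dim=-1))

cell = DeepCell(32, 64)
h = torch.zeros(4, 64)
for t in range(x.size(1)):                 # unroll over time with the deeper cell
    h = cell(x[:, t], h)
```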



Bidirectional RNNs



Gated Recurrent Unit

◼ UGRNN: Update Gate Recurrent Neural Network


◼ GRU: Gated Recurrent Unit
◼ LSTM: Long Short-Term Memory
◼ LSTM was the first and the most transformative (it revolutionized NLP around 2015, e.g. at Google), but it is also the most complex model. UGRNN and GRU work similarly well.
◼ Common to all architectures: gates for filtering information



Update Gate RNN

◼ u_t is called the update gate, as it determines whether the hidden state h is updated or not
◼ s_t is the next target state, which is blended with h_{t−1} using the element-wise weights u_t
◼ Remark: gates use the sigmoid (∈ [0, 1]), the state computation uses tanh (∈ [−1, 1])
◼ ⊙ denotes the Hadamard (element-wise) product
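A minimal sketch of the UGRNN step described above, assuming the standard single-gate formulation; the weight names W_u, U_u, W_s, U_s are illustrative.

```python
# Sketch of one UGRNN step (assumed standard form):
#   u_t = sigmoid(W_u x_t + U_u h_{t-1} + b_u)    update gate, in [0, 1]
#   s_t = tanh   (W_s x_t + U_s h_{t-1} + b_s)    target state, in [-1, 1]
#   h_t = u_t ⊙ s_t + (1 - u_t) ⊙ h_{t-1}         element-wise blend
import torch

torch.manual_seed(0)
d_in, d_h = 8, 16
W_u, U_u, b_u = torch.randn(d_h, d_in) * 0.1, torch.randn(d_h, d_h) * 0.1, torch.zeros(d_h)
W_s, U_s, b_s = torch.randn(d_h, d_in) * 0.1, torch.randn(d_h, d_h) * 0.1, torch.zeros(d_h)

def ugrnn_step(x_t, h_prev):
    u_t = torch.sigmoid(W_u @ x_t + U_u @ h_prev + b_u)   # gate (sigmoid)
    s_t = torch.tanh(W_s @ x_t + U_s @ h_prev + b_s)      # target state (tanh)
    return u_t * s_t + (1.0 - u_t) * h_prev                # Hadamard-weighted update

h = ugrnn_step(torch.randn(d_in), torch.zeros(d_h))
```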



Gated Recurrent Unit

◼ Reset gate controls which parts of the state are used to compute the next target state
◼ Update gate controls how much information to pass on from the previous time step
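A minimal sketch of one GRU step, assuming the standard formulation with reset gate r_t and update gate z_t; the weight names are illustrative and biases are omitted for brevity.

```python
# Sketch of one GRU step (assumed standard formulation, biases omitted):
#   r_t = sigmoid(W_r x_t + U_r h_{t-1})               reset gate
#   z_t = sigmoid(W_z x_t + U_z h_{t-1})               update gate
#   s_t = tanh   (W_s x_t + U_s (r_t ⊙ h_{t-1}))       target state
#   h_t = z_t ⊙ h_{t-1} + (1 - z_t) ⊙ s_t
import torch

torch.manual_seed(0)
d_in, d_h = 8, 16
W_r, U_r = torch.randn(d_h, d_in) * 0.1, torch.randn(d_h, d_h) * 0.1
W_z, U_z = torch.randn(d_h, d_in) * 0.1, torch.randn(d_h, d_h) * 0.1
W_s, U_s = torch.randn(d_h, d_in) * 0.1, torch.randn(d_h, d_h) * 0.1

def gru_step(x_t, h_prev):
    r_t = torch.sigmoid(W_r @ x_t + U_r @ h_prev)        # which parts of the state to use
    z_t = torch.sigmoid(W_z @ x_t + U_z @ h_prev)        # how much old information to keep
    s_t = torch.tanh(W_s @ x_t + U_s @ (r_t * h_prev))   # target state from the reset state
    return z_t * h_prev + (1.0 - z_t) * s_t

h = gru_step(torch.randn(d_in), torch.zeros(d_h))
```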



Long Short-Term Memory

◼ Passes along a cell state c in addition to the hidden state h; has 3 gates:
◼ Forget gate determines which information to erase from the cell state
◼ Input gate determines which values of the cell state to update
◼ Output gate determines which elements of the cell state to reveal at time t
◼ Remark: the cell update tanh(·) creates new target values s_t for the cell state
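A minimal sketch of one LSTM step with forget, input and output gates, assuming the standard formulation; weight names are illustrative and biases are omitted.

```python
# Sketch of one LSTM step (assumed standard formulation, biases omitted):
#   f_t = sigmoid(...)   forget gate: what to erase from the cell state
#   i_t = sigmoid(...)   input gate:  which cell values to update
#   o_t = sigmoid(...)   output gate: which cell values to reveal
#   s_t = tanh(...)      new target values for the cell state
#   c_t = f_t ⊙ c_{t-1} + i_t ⊙ s_t
#   h_t = o_t ⊙ tanh(c_t)
import torch

torch.manual_seed(0)
d_in, d_h = 8, 16
def lin():  # helper: one input->hidden and one hidden->hidden weight matrix
    return torch.randn(d_h, d_in) * 0.1, torch.randn(d_h, d_h) * 0.1

(W_f, U_f), (W_i, U_i), (W_o, U_o), (W_s, U_s) = lin(), lin(), lin(), lin()

def lstm_step(x_t, h_prev, c_prev):
    f_t = torch.sigmoid(W_f @ x_t + U_f @ h_prev)
    i_t = torch.sigmoid(W_i @ x_t + U_i @ h_prev)
    o_t = torch.sigmoid(W_o @ x_t + U_o @ h_prev)
    s_t = torch.tanh(W_s @ x_t + U_s @ h_prev)
    c_t = f_t * c_prev + i_t * s_t        # cell state carries long-term information
    h_t = o_t * torch.tanh(c_t)           # hidden state exposes a filtered view of it
    return h_t, c_t

h, c = lstm_step(torch.randn(d_in), torch.zeros(d_h), torch.zeros(d_h))
```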



UGRNN vs. GRU vs. LSTM

                     UGRNN                  GRU                    LSTM
Gates                One gate               Two gates              Three gates
State exposure       Expose entire state    Expose entire state    Control exposure
Update mechanism     Single update gate     Single update gate     Input/forget gates
Parameters           Few                    Medium                 Many

A systematic study [Collins et al., 2017] states:


“Our results point to the GRU as being the most learnable of gated RNNs for shallow architectures,
followed by the UGRNN.”



RNN Applications



Multiple Object Recognition

◼ At each time step, perceive a glimpse (= image region) and predict a saccade (the next location to attend to)
Ba, Mnih and Kavukcuoglu: Multiple Object Recognition with Visual Attention. ICLR, 2015.



Recurrent Instance Segmentation

◼ At each time step, segment the next (not yet segmented) part of
an object
Romera-Paredes, Torr: Recurrent Instance Segmentation. ECCV, 2016.



Object Tracking

◼ Tracking of multiple objects by updating each object's hidden state using an RNN
He, Li, Liu, He and Barber: Tracking by Animation: Unsupervised Learning of Multi-Object Attentive Trackers. CVPR, 2019.



Image Generation

◼ Model for sequential image generation (red rectangle = attended region)


◼ Demonstrates that RNNs can also process/generate non-sequential data
Gregor, Danihelka, Graves, Rezende and Wierstra: DRAW: A Recurrent Neural Network For Image Generation. ICML, 2015.



Image Generation

◼ Generated images based on partially occluded inputs


◼ Demonstrates that RNNs can also process/generate non-sequential data
Oord, Kalchbrenner and Kavukcuoglu: Pixel Recurrent Neural Networks. ICML, 2016.



Modeling Road Layouts

◼ Iteratively generate or infer road layouts as spatial graphs

Chu et al.: Neural Turtle Graphics for Modeling City Road Layouts. ICCV, 2019.



Image Captioning

◼ Generate an image description by sequentially looking at the image

Xu et al.: Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. ICML, 2015.



Image Captioning

◼ Attention over time


◼ Top: soft attention. Bottom: hard attention
Xu et al.: Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. ICML, 2015.



Image Captioning

◼ Successful caption generations

Xu et al.: Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. ICML, 2015.



Image Captioning

◼ Incorrectly generated captions. The attention maps can reveal insights into what went wrong.

Xu et al.: Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. ICML, 2015.



What is VQA?

◼ VQA (Visual Question Answering): an algorithm must answer text-based questions about images

(Agrawal et al.)



Simple Architecture (Agrawal et al.)



Datasets

◼ What kinds of datasets are used for VQA?



Question Types (Kafle et al.)



Distribution (Kafle et al.)



Distribution (Kafle et al.)



Neural Machine Translation

Wu et al.: Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation. Arxiv, 2016.



Example: Character-level Language Model

Vocabulary: [h, e, l, o]

Example training sequence: "hello"

The Unreasonable Effectiveness of Recurrent Neural Networks (karpathy.github.io)



Character-level Language Models
◼ Generating natural language text character by character with an RNN
◼ In the example above: alphabet with 4 characters ('h', 'e', 'l', 'o')
◼ Each character is represented by a one-hot vector, e.g. (1, 0, 0, 0)^T
◼ The model predicts a distribution over the next character via the softmax function
◼ The character drawn from this distribution is fed as input to the RNN at the next time step

The Unreasonable Effectiveness of Recurrent Neural Networks (karpathy.github.io)
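The sampling loop described above can be sketched as follows; the RNN here is randomly initialized (untrained) and only illustrates the feed-back mechanics with the 4-character vocabulary.

```python
# Sketch of character-level sampling with the vocabulary [h, e, l, o].
# The RNN is untrained; this only shows how samples are fed back as inputs.
import torch
import torch.nn as nn

torch.manual_seed(0)
vocab = ['h', 'e', 'l', 'o']
V, d_h = len(vocab), 32

cell = nn.RNNCell(V, d_h)              # shared update rule
readout = nn.Linear(d_h, V)            # maps hidden state to character scores

def one_hot(idx):
    v = torch.zeros(1, V)
    v[0, idx] = 1.0                    # e.g. 'h' -> (1, 0, 0, 0)
    return v

h = torch.zeros(1, d_h)
idx = vocab.index('h')                 # seed character
generated = ['h']
for _ in range(10):
    h = cell(one_hot(idx), h)
    probs = torch.softmax(readout(h), dim=-1)              # distribution over next char
    idx = torch.multinomial(probs, num_samples=1).item()   # draw the next character
    generated.append(vocab[idx])
print(''.join(generated))
```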



Character-level Language Models
◼ 3-layer RNN
◼ 512 hidden nodes
◼ Trained on all works of William Shakespeare
◼ 4.4 million characters

The Unreasonable Effectiveness of Recurrent Neural Networks (karpathy.github.io)


Source: http://cs231n.stanford.edu/





Character-level Language Models

◼ 3-layer RNN trained for several days on the Linux source code (474 MB)
◼ Sampled code snippets do not compile but look reasonable overall
◼ The model learns that code starts with a license header, uses correct syntax, and adds comments
The Unreasonable Effectiveness of Recurrent Neural Networks (karpathy.github.io)





Character-level Language Models

◼ The behaviour of some hidden neurons (∼5%) is logical and human-interpretable

The Unreasonable Effectiveness of Recurrent Neural Networks (karpathy.github.io)





OCR



Multidimensional Recurrent Neural Networks



BLSTM (Bidirectional LSTM)

https://www.cs.toronto.edu/~graves/phd.pdf

Alex Graves, PhD thesis


Multidimensional RNN

https://www.cs.toronto.edu/~graves/phd.pdf

Alex Graves, PhD thesis


Multidimensional Multidirectional RNN

https://www.cs.toronto.edu/~graves/phd.pdf

Alex Graves, PhD thesis


2D Bidirectional LSTM

Afzal, Muhammad Zeshan, et al. "Document Image Binarization Using LSTM: A Sequence Learning Approach." Proceedings of the 3rd International Workshop on Historical Document Imaging and Processing. 2015.





Results

Afzal, Muhammad Zeshan, et al. "Document Image Binarization Using LSTM: A Sequence Learning Approach." Proceedings of the 3rd International Workshop on Historical Document Imaging and Processing. 2015.



Results

Afzal, Muhammad Zeshan, et al. "Document Image Binarization Using LSTM: A Sequence Learning Approach." Proceedings of the 3rd International Workshop on Historical Document Imaging and Processing. 2015.



OCR



Scene Text Analysis



Thanks a lot for your Attention

