
Very Deep Learning

Lecture 10

Dr. Muhammad Zeshan Afzal, Prof. Didier Stricker


MindGarage, University of Kaiserslautern
afzal.tukl@gmail.com



Recap



Mapping Types



Mapping types

◼ One to One



Mapping types

◼ One to Many



Mapping types

◼ Many to One

Image Source: Chen-Wen et al.: Outpatient Text Classification Using Attention-Based Bidirectional LSTM for Robot-Assisted Servicing in Hospital



Mapping types

◼ Many to Many

Image Source: https://arxiv.org/abs/1607.05781



Mapping types

◼ Many to Many

Image Source: https://medium.com/@gautam.karmakar/attention-for-neural-connectionist-machine-translation-b833d1e085a3



Feedforward vs Recurrent Neural Networks

◼ Recurrent Neural Networks (RNNs)


▪ Core idea: update the hidden state h based on the current input and the previous hidden state, using the same update rule (shared parameters) at each time step
▪ Allows processing of sequences of variable length, not only fixed-size vectors
▪ "Infinite" memory: h is a function of all previous inputs (long-term dependencies)
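To make the shared update rule concrete, here is a minimal PyTorch sketch of one vanilla-RNN step applied over a variable-length sequence. The weight names W_xh, W_hh and the tanh nonlinearity follow the standard textbook formulation and are illustrative, not the lecture's exact notation.

```python
# Minimal sketch of the shared RNN update rule (assumed vanilla-RNN form):
#   h_t = tanh(W_xh x_t + W_hh h_{t-1} + b)
import torch

torch.manual_seed(0)
input_dim, hidden_dim = 8, 16

W_xh = torch.randn(hidden_dim, input_dim) * 0.1   # input-to-hidden weights
W_hh = torch.randn(hidden_dim, hidden_dim) * 0.1  # hidden-to-hidden weights (shared over time)
b = torch.zeros(hidden_dim)

def rnn_step(x_t, h_prev):
    """One time step: the same parameters are reused at every step."""
    return torch.tanh(W_xh @ x_t + W_hh @ h_prev + b)

# Process a sequence of arbitrary length with the same update rule
sequence = [torch.randn(input_dim) for _ in range(5)]
h = torch.zeros(hidden_dim)          # initial hidden state
for x_t in sequence:
    h = rnn_step(x_t, h)             # h depends on all inputs seen so far
print(h.shape)                       # torch.Size([16])
```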



Truncated Backpropagation through Time

◼ Truncated Backpropagation through Time


▪ Backpropagating through very long sequences is expensive, so one typically uses truncated backpropagation through time in practice
▪ Carry hidden states forward in time indefinitely, but stop backpropagation after a fixed number of steps
▪ The total loss is the sum of the individual per-step losses (= negative log-likelihood)
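As an illustration, the following PyTorch sketch shows one common way truncated BPTT is implemented: the hidden state is carried across chunk boundaries, while detach() cuts the gradient graph so backpropagation stops at each chunk. The model, chunk length and loss here are illustrative assumptions, not the lecture's exact setup.

```python
# Sketch of truncated BPTT: state carried forward, gradients cut per chunk.
import torch
import torch.nn as nn

torch.manual_seed(0)
rnn = nn.RNN(input_size=8, hidden_size=16, batch_first=True)
head = nn.Linear(16, 8)               # per-step prediction head (illustrative)
optimizer = torch.optim.SGD(list(rnn.parameters()) + list(head.parameters()), lr=0.01)

x = torch.randn(1, 100, 8)            # one long sequence (T = 100)
target = torch.randn(1, 100, 8)
chunk_len = 25                        # truncation length

h = torch.zeros(1, 1, 16)             # initial hidden state
for start in range(0, x.size(1), chunk_len):
    x_chunk = x[:, start:start + chunk_len]
    y_chunk = target[:, start:start + chunk_len]

    out, h = rnn(x_chunk, h)
    loss = ((head(out) - y_chunk) ** 2).mean()   # sum/mean of per-step losses

    optimizer.zero_grad()
    loss.backward()                   # gradients flow only within this chunk
    optimizer.step()

    h = h.detach()                    # carry the state forward, stop backprop here
```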



Multilayer RNNs

◼ Multilayer RNNs
▪ Deeper multi-layer RNNs can be constructed by stacking RNN layers
▪ An alternative is to make each individual computation (= RNN cell) deeper
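Both options can be sketched as follows; this assumes PyTorch, an nn.LSTM stack for the first option, and a hypothetical DeepCell module (a small per-step MLP) for the second.

```python
# Option 1 (sketch): stack recurrent layers -- deeper across layers.
import torch
import torch.nn as nn

stacked = nn.LSTM(input_size=32, hidden_size=64, num_layers=3, batch_first=True)
x = torch.randn(4, 10, 32)                 # (batch, time, features)
out, (h_n, c_n) = stacked(x)
print(out.shape, h_n.shape)                # (4, 10, 64), (3, 4, 64)

# Option 2 (sketch): a single recurrent layer, but each per-step
# computation (the cell itself) is made deeper, e.g. with a small MLP.
class DeepCell(nn.Module):                 # hypothetical illustrative module
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_size + hidden_size, hidden_size),
            nn.Tanh(),
            nn.Linear(hidden_size, hidden_size),
            nn.Tanh(),
        )

    def forward(self, x_t, h_prev):
        return self.net(torch.cat([x_t, h_prev], dim=-1))

cell = DeepCell(32, 64)
h = torch.zeros(4, 64)
for t in range(x.size(1)):                 # unroll over time with the deeper cell
    h = cell(x[:, t], h)
```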



Bidirectional RNNs



Gated Recurrent Unit

◼ UGRNN: Update Gate Recurrent Neural Network


◼ GRU: Gated Recurrent Unit
◼ LSTM: Long Short-Term Memory
◼ LSTM was the first and the most transformative (it revolutionized NLP around 2015, e.g. at Google), but it is also the most complex model. UGRNN and GRU work similarly well.
◼ Common to all architectures: gates for filtering information



Update Gate RNN

◼ u_t is called the update gate, as it determines whether the hidden state h is updated or not
◼ s_t is the next target state, which is blended with h_{t−1} using the element-wise weights u_t
◼ Remark: gates use the sigmoid (∈ [0, 1]), the state computation uses tanh (∈ [−1, 1])
◼ ⊙ denotes the Hadamard (element-wise) product
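A minimal sketch of the UGRNN step described above, assuming the standard single-gate formulation; the weight names W_u, U_u, W_s, U_s are illustrative.

```python
# Sketch of one UGRNN step (assumed standard form):
#   u_t = sigmoid(W_u x_t + U_u h_{t-1} + b_u)    update gate, in [0, 1]
#   s_t = tanh   (W_s x_t + U_s h_{t-1} + b_s)    target state, in [-1, 1]
#   h_t = u_t ⊙ s_t + (1 - u_t) ⊙ h_{t-1}         element-wise blend
import torch

torch.manual_seed(0)
d_in, d_h = 8, 16
W_u, U_u, b_u = torch.randn(d_h, d_in) * 0.1, torch.randn(d_h, d_h) * 0.1, torch.zeros(d_h)
W_s, U_s, b_s = torch.randn(d_h, d_in) * 0.1, torch.randn(d_h, d_h) * 0.1, torch.zeros(d_h)

def ugrnn_step(x_t, h_prev):
    u_t = torch.sigmoid(W_u @ x_t + U_u @ h_prev + b_u)   # gate (sigmoid)
    s_t = torch.tanh(W_s @ x_t + U_s @ h_prev + b_s)      # target state (tanh)
    return u_t * s_t + (1.0 - u_t) * h_prev                # Hadamard-weighted update

h = ugrnn_step(torch.randn(d_in), torch.zeros(d_h))
```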



Gated Recurrent Unit

◼ Reset gate controls which parts of the state are used to compute the next target state
◼ Update gate controls how much information to pass on from the previous time step
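A minimal sketch of one GRU step, assuming the standard formulation with reset gate r_t and update gate z_t; the weight names are illustrative and biases are omitted for brevity.

```python
# Sketch of one GRU step (assumed standard formulation, biases omitted):
#   r_t = sigmoid(W_r x_t + U_r h_{t-1})               reset gate
#   z_t = sigmoid(W_z x_t + U_z h_{t-1})               update gate
#   s_t = tanh   (W_s x_t + U_s (r_t ⊙ h_{t-1}))       target state
#   h_t = z_t ⊙ h_{t-1} + (1 - z_t) ⊙ s_t
import torch

torch.manual_seed(0)
d_in, d_h = 8, 16
W_r, U_r = torch.randn(d_h, d_in) * 0.1, torch.randn(d_h, d_h) * 0.1
W_z, U_z = torch.randn(d_h, d_in) * 0.1, torch.randn(d_h, d_h) * 0.1
W_s, U_s = torch.randn(d_h, d_in) * 0.1, torch.randn(d_h, d_h) * 0.1

def gru_step(x_t, h_prev):
    r_t = torch.sigmoid(W_r @ x_t + U_r @ h_prev)        # which parts of the state to use
    z_t = torch.sigmoid(W_z @ x_t + U_z @ h_prev)        # how much old information to keep
    s_t = torch.tanh(W_s @ x_t + U_s @ (r_t * h_prev))   # target state from the reset state
    return z_t * h_prev + (1.0 - z_t) * s_t

h = gru_step(torch.randn(d_in), torch.zeros(d_h))
```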



Long Short-Term Memory

◼ Passes along a cell state c in addition to the hidden state h; has 3 gates:
◼ Forget gate determines which information to erase from the cell state
◼ Input gate determines which values of the cell state to update
◼ Output gate determines which elements of the cell state to reveal at time t
◼ Remark: the cell update tanh(·) creates new target values s_t for the cell state
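A minimal sketch of one LSTM step with forget, input and output gates, assuming the standard formulation; weight names are illustrative and biases are omitted.

```python
# Sketch of one LSTM step (assumed standard formulation, biases omitted):
#   f_t = sigmoid(...)   forget gate: what to erase from the cell state
#   i_t = sigmoid(...)   input gate:  which cell values to update
#   o_t = sigmoid(...)   output gate: which cell values to reveal
#   s_t = tanh(...)      new target values for the cell state
#   c_t = f_t ⊙ c_{t-1} + i_t ⊙ s_t
#   h_t = o_t ⊙ tanh(c_t)
import torch

torch.manual_seed(0)
d_in, d_h = 8, 16
def lin():  # helper: one input->hidden and one hidden->hidden weight matrix
    return torch.randn(d_h, d_in) * 0.1, torch.randn(d_h, d_h) * 0.1

(W_f, U_f), (W_i, U_i), (W_o, U_o), (W_s, U_s) = lin(), lin(), lin(), lin()

def lstm_step(x_t, h_prev, c_prev):
    f_t = torch.sigmoid(W_f @ x_t + U_f @ h_prev)
    i_t = torch.sigmoid(W_i @ x_t + U_i @ h_prev)
    o_t = torch.sigmoid(W_o @ x_t + U_o @ h_prev)
    s_t = torch.tanh(W_s @ x_t + U_s @ h_prev)
    c_t = f_t * c_prev + i_t * s_t        # cell state carries long-term information
    h_t = o_t * torch.tanh(c_t)           # hidden state exposes a filtered view of it
    return h_t, c_t

h, c = lstm_step(torch.randn(d_in), torch.zeros(d_h), torch.zeros(d_h))
```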



UGRNN vs. GRU vs. LSTM

                     UGRNN                  GRU                    LSTM
Gates                One gate               Two gates              Three gates
State exposure       Expose entire state    Expose entire state    Control exposure
Update mechanism     Single update gate     Single update gate     Input/forget gates
Parameters           Few                    Medium                 Many

A systematic study [Collins et al., 2017] states:


“Our results point to the GRU as being the most learnable of gated RNNs for shallow architectures,
followed by the UGRNN.”



RNN Applications



Multiple Object Recognition

◼ At each time step, perceive a glimpse (= image region) and predict a saccade (the next location to attend to)
Ba, Mnih and Kavukcuoglu: Multiple Object Recognition with Visual Attention. ICLR, 2015.



Recurrent Instance Segmentation

◼ At each time step, segment the next (not yet segmented) part of
an object
Romera-Paredes, Torr: Recurrent Instance Segmentation. ECCV, 2016.



Object Tracking

◼ Tracking of multiple objects by updating each object's hidden state using an RNN
He, Li, Liu, He and Barber: Tracking by Animation: Unsupervised Learning of Multi-Object Attentive Trackers. CVPR, 2019.



Image Generation

◼ Model for sequential image generation (red rectangle = attended region)


◼ Demonstrates that RNNs can also process/generate non-sequential data
Gregor, Danihelka, Graves, Rezende and Wierstra: DRAW: A Recurrent Neural Network For Image Generation. ICML, 2015.



Image Generation

◼ Generated images based on partially occluded inputs


◼ Demonstrates that RNNs can also process/generate non-sequential data
Oord, Kalchbrenner and Kavukcuoglu: Pixel Recurrent Neural Networks. ICML, 2016.



Modeling Road Layouts

◼ Iteratively generate or infer road layouts as spatial graphs

Chu et al.: Neural Turtle Graphics for Modeling City Road Layouts. ICCV, 2019.



Image Captioning

◼ Generate an image description by sequentially looking at the image

Xu et al.: Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. ICML, 2015.



Image Captioning

◼ Attention over time


◼ Top: soft attention. Bottom: hard attention
Xu et al.: Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. ICML, 2015.



Image Captioning

◼ Successful caption generations

Xu et al.: Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. ICML, 2015.



Image Captioning

◼ Incorrectly generated captions. The attention maps can reveal insights into what went wrong.

Xu et al.: Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. ICML, 2015.



What is VQA?

◼ VQA (Visual Question Answering): an algorithm must answer text-based questions about images

(Agrawal et al.)



Simple Architecture (Agrawal et al.)



Datasets

◼ What kinds of datasets are used for VQA?



Question Types (Kafle et al.)



Distribution (Kafle et al.)



Distribution (Kafle et al.)



Neural Machine Translation

Wu et al.: Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation. Arxiv, 2016.



Example: Character-level Language Model

Vocabulary: [h, e, l, o]

Example training sequence: "hello"

The Unreasonable Effectiveness of Recurrent Neural Networks (karpathy.github.io)



Character-level Language Models
◼ Generating natural language text character by character with an RNN
◼ In the example above: alphabet with 4 characters ('h', 'e', 'l', 'o')
◼ Each character is represented by a one-hot vector, e.g. (1, 0, 0, 0)^T
◼ The model predicts a distribution over the next character via the softmax function
◼ The character drawn from this distribution is fed as input to the RNN at the next time step

The Unreasonable Effectiveness of Recurrent Neural Networks (karpathy.github.io)
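The sampling loop described above can be sketched as follows; the RNN here is randomly initialized (untrained) and only illustrates the feed-back mechanics with the 4-character vocabulary.

```python
# Sketch of character-level sampling with the vocabulary [h, e, l, o].
# The RNN is untrained; this only shows how samples are fed back as inputs.
import torch
import torch.nn as nn

torch.manual_seed(0)
vocab = ['h', 'e', 'l', 'o']
V, d_h = len(vocab), 32

cell = nn.RNNCell(V, d_h)              # shared update rule
readout = nn.Linear(d_h, V)            # maps hidden state to character scores

def one_hot(idx):
    v = torch.zeros(1, V)
    v[0, idx] = 1.0                    # e.g. 'h' -> (1, 0, 0, 0)
    return v

h = torch.zeros(1, d_h)
idx = vocab.index('h')                 # seed character
generated = ['h']
for _ in range(10):
    h = cell(one_hot(idx), h)
    probs = torch.softmax(readout(h), dim=-1)              # distribution over next char
    idx = torch.multinomial(probs, num_samples=1).item()   # draw the next character
    generated.append(vocab[idx])
print(''.join(generated))
```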



Character-level Language Models
◼ 3-layer RNN
◼ 512 hidden nodes
◼ Trained on all works of William Shakespeare
◼ 4.4 million characters

The Unreasonable Effectiveness of Recurrent Neural Networks (karpathy.github.io)


Source: http://cs231n.stanford.edu/





Character-level Language Models

◼ 3-layer RNN trained for several days on the Linux source code (474 MB)
◼ Sampled code snippets do not compile but look reasonable overall
◼ The model learns that code starts with a license header, uses correct syntax, and adds comments
The Unreasonable Effectiveness of Recurrent Neural Networks (karpathy.github.io)





Character-level Language Models

◼ The behaviour of some hidden neurons (∼5%) is logical and human-interpretable

The Unreasonable Effectiveness of Recurrent Neural Networks (karpathy.github.io)





OCR



Multidimensional Recurrent Neural Networks



BLSTM (Bidirectional LSTM)

https://www.cs.toronto.edu/~graves/phd.pdf

Alex Graves, PhD thesis


Multidimensional RNN

https://www.cs.toronto.edu/~graves/phd.pdf

Alex Graves, PhD thesis


Multidimensional Multidirectional RNN

https://www.cs.toronto.edu/~graves/phd.pdf

Alex Graves, PhD thesis


2D Bidirectional LSTM

Afzal, Muhammad Zeshan, et al. "Document Image Binarization Using LSTM: A Sequence Learning Approach." Proceedings of the 3rd International Workshop on Historical Document Imaging and Processing. 2015.





Results

Afzal, Muhammad Zeshan, et al. "Document Image Binarization Using LSTM: A Sequence Learning Approach." Proceedings of the 3rd International Workshop on Historical Document Imaging and Processing. 2015.



Results

Afzal, Muhammad Zeshan, et al. "Document Image Binarization Using LSTM: A Sequence Learning Approach." Proceedings of the 3rd International Workshop on Historical Document Imaging and Processing. 2015.



OCR



Scene Text Analysis



Thanks a lot for your Attention

