
Very Deep Learning

Lecture 11

Dr. Muhammad Zeshan Afzal, Prof. Didier Stricker


MindGarage, University of Kaiserslautern
afzal.tukl@gmail.com



Recap



Gated RNNs

◼ UGRNN: Update Gate Recurrent Neural Network


◼ GRU: Gated Recurrent Unit
◼ LSTM: Long Short-Term Memory
◼ LSTM was the first and the most transformative (it revolutionized NLP in 2015, e.g. at Google), but it is also the most complex model. UGRNN and GRU work similarly well.
◼ Common to all architectures: gates for filtering information (a minimal GRU step is sketched below)
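
To make the gating idea concrete, here is a minimal NumPy sketch of a single GRU step. The weight names (W, U, b) and the dictionary layout are illustrative assumptions, not notation from the slides.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h_prev, W, U, b):
    """One GRU time step; W, U, b are dicts with keys 'z', 'r', 'h'."""
    z = sigmoid(W['z'] @ x + U['z'] @ h_prev + b['z'])               # update gate
    r = sigmoid(W['r'] @ x + U['r'] @ h_prev + b['r'])               # reset gate
    h_tilde = np.tanh(W['h'] @ x + U['h'] @ (r * h_prev) + b['h'])   # candidate state
    return (1.0 - z) * h_prev + z * h_tilde                          # gate blends old and new state

# Toy usage with random parameters (illustrative only)
rng = np.random.default_rng(0)
d_in, d_h = 3, 4
W = {k: rng.normal(size=(d_h, d_in)) for k in 'zrh'}
U = {k: rng.normal(size=(d_h, d_h)) for k in 'zrh'}
b = {k: np.zeros(d_h) for k in 'zrh'}
h = gru_step(rng.normal(size=d_in), np.zeros(d_h), W, U, b)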



Simple Architecture (Agrawal et al.)



OCR



Multidimensional RNN

https://www.cs.toronto.edu/~graves/phd.pdf

Alex Graves, PhD thesis


Multidimensional Multidirectional RNN

https://www.cs.toronto.edu/~graves/phd.pdf

Alex Graves, PhD thesis


2D Bidirectional LSTM

Afzal, Muhammad Zeshan, et al. "Document image binarization using LSTM: A sequence learning approach." Proceedings of the 3rd International Workshop on Historical Document Imaging and Processing. 2015.



OCR



Scene Text Analysis



Character-level Language Models
◼ Generating natural language text character by character with an RNN
◼ In the example on the right: alphabet with 4 characters ('h', 'e', 'l', 'o')
◼ Each character is represented by a one-hot vector, e.g. (1, 0, 0, 0)^T
◼ The model predicts a distribution over the next character via the softmax function
◼ The character drawn from that distribution is fed back as input to the RNN at the next time step (see the sampling sketch below)

The Unreasonable Effectiveness of Recurrent Neural Networks (karpathy.github.io)
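
As a concrete illustration of this sampling loop, here is a small NumPy sketch over the 4-character alphabet. The random parameters Wxh, Whh, Why are placeholders standing in for a trained model.

import numpy as np

chars = ['h', 'e', 'l', 'o']
rng = np.random.default_rng(0)

def one_hot(idx, size=4):
    v = np.zeros(size)
    v[idx] = 1.0
    return v

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Toy parameters (random for illustration); a trained model would learn these.
hidden = 8
Wxh = rng.normal(scale=0.1, size=(hidden, 4))
Whh = rng.normal(scale=0.1, size=(hidden, hidden))
Why = rng.normal(scale=0.1, size=(4, hidden))

def sample(start_idx, length=10):
    h = np.zeros(hidden)
    idx = start_idx
    out = [chars[idx]]
    for _ in range(length):
        h = np.tanh(Wxh @ one_hot(idx) + Whh @ h)   # RNN state update
        p = softmax(Why @ h)                        # distribution over the next character
        idx = rng.choice(4, p=p)                    # draw the next character and feed it back in
        out.append(chars[idx])
    return ''.join(out)

print(sample(start_idx=0))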



Natural Language Processing



Natural Language Processing

◼ Linguistics is the scientific study of language. It involves the analysis of language form, language meaning, and language in context, as well as of the social, cultural, historical, and political factors that influence language. (Wikipedia)

◼ Computational linguistics is an interdisciplinary field concerned with the computational modelling of natural language, as well as the study of appropriate computational approaches to linguistic questions. (Wikipedia)

◼ Natural language processing (NLP) is the branch of computer science, and more specifically of artificial intelligence (AI), concerned with giving computers the ability to understand text and spoken words in much the same way human beings can. (Wikipedia)



Natural Language Processing

◼ Natural language processing (NLP) is the branch of computer science, and more specifically of artificial intelligence (AI), concerned with giving computers the ability to understand text and spoken words in much the same way human beings can.
◼ It is a combination of
  • computational linguistics (rule-based modelling of human language)
  with
  • statistical models
  • machine learning models
  • deep learning models



Language Models



Language Models

◼ Word language model example:

p(Cat was sitting on the table <EOS>) = p(Cat)
    · p(was | Cat)
    · p(sitting | Cat was)
    · p(on | Cat was sitting)
    · p(the | Cat was sitting on)
    · p(table | Cat was sitting on the)
    · p(<EOS> | Cat was sitting on the table)

◼ Word language models are auto-regressive models that predict the next word given all previous words in the sentence
◼ A good model assigns high probability to the words that actually come next (a minimal scoring sketch follows below)
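
To make the chain-rule factorization concrete, here is a minimal Python sketch that scores a sentence word by word. next_word_prob is a hypothetical callback standing in for any model that returns p(word | context).

import math

def sentence_log_prob(words, next_word_prob):
    """Sum of log p(w_t | w_1 .. w_{t-1}); words should end with '<EOS>'."""
    log_p = 0.0
    for t, w in enumerate(words):
        log_p += math.log(next_word_prob(w, words[:t]))  # one factor of the product
    return log_p

# Toy usage with a uniform dummy model over a 10-word vocabulary (assumption)
uniform = lambda word, context: 0.1
print(sentence_log_prob("Cat was sitting on the table <EOS>".split(), uniform))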



Applications of Language Models



Training



Training Language Models



Evaluation of Language Models



Evaluating Character Language Models



Evaluating Word Language Models



Evaluating Language Models

◼ Shannon estimated that English text carries 0.6–1.3 bits per character
◼ For character language models, current performance is roughly 1 bit per character
◼ For word language models, perplexities of about 60 were typical until 2017
◼ According to Quora, English words have 4.79 letters on average (excluding spaces), i.e. about 5.79 characters per word once the space is counted
◼ Assuming 1 bit per character, this corresponds to a word-level perplexity of 2^5.79 ≈ 55.3 (see the quick check below)
◼ State-of-the-art models (GPT-2, Megatron-LM) reach perplexities of 10–20
◼ Be careful: these metrics are not comparable across vocabularies or datasets
◼ Evaluation Metrics for Language Modeling (thegradient.pub)
◼ The relationship between Perplexity and Entropy in NLP | by Ravi Charan | Towards Data Science
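
A quick sanity check of that arithmetic as a small Python snippet; the 4.79 letters/word figure is taken from the slide, and the extra character accounts for the space.

# Convert bits per character into an equivalent word-level perplexity.
bits_per_char = 1.0
chars_per_word = 4.79 + 1.0            # average letters per word plus one space
bits_per_word = bits_per_char * chars_per_word
perplexity = 2 ** bits_per_word        # perplexity = 2^(cross-entropy in bits)
print(round(perplexity, 1))            # ≈ 55.3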



Traditional Language Models



Language Models



n-gram Models



Training of n-gram Models



Sampling from n-gram Models



Thanks a lot for your attention

