
Very Deep Learning

Lecture 11

Dr. Muhammad Zeshan Afzal, Prof. Didier Stricker


MindGarage, University of Kaiserslautern
afzal.tukl@gmail.com



Recap



Gated RNNs

◼ UGRNN: Update Gate Recurrent Neural Network


◼ GRU: Gated Recurrent Unit
◼ LSTM: Long Short-Term Memory
◼ LSTM was the first and the most transformative (it revolutionized NLP in 2015, e.g. at Google), but it is also the most complex model. UGRNN and GRU work similarly well.
◼ Common to all architectures: gates for filtering information (a minimal GRU step is sketched below)
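
To make the gating idea concrete, here is a minimal NumPy sketch of a single GRU step. The weight names (W, U, b) and the dictionary layout are illustrative assumptions, not notation from the slides.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h_prev, W, U, b):
    """One GRU time step; W, U, b are dicts with keys 'z', 'r', 'h'."""
    z = sigmoid(W['z'] @ x + U['z'] @ h_prev + b['z'])               # update gate
    r = sigmoid(W['r'] @ x + U['r'] @ h_prev + b['r'])               # reset gate
    h_tilde = np.tanh(W['h'] @ x + U['h'] @ (r * h_prev) + b['h'])   # candidate state
    return (1.0 - z) * h_prev + z * h_tilde                          # gate blends old and new state

# Toy usage with random parameters (illustrative only)
rng = np.random.default_rng(0)
d_in, d_h = 3, 4
W = {k: rng.normal(size=(d_h, d_in)) for k in 'zrh'}
U = {k: rng.normal(size=(d_h, d_h)) for k in 'zrh'}
b = {k: np.zeros(d_h) for k in 'zrh'}
h = gru_step(rng.normal(size=d_in), np.zeros(d_h), W, U, b)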



Simple Architecture (Agrawal et al.)



OCR



Multidimensional RNN

https://www.cs.toronto.edu/~graves/phd.pdf

Alex Graves, PhD thesis


Multidimensional Multidirectional RNN

https://www.cs.toronto.edu/~graves/phd.pdf

Alex Graves, PhD thesis


2D Bidirectional LSTM

Afzal, Muhammad Zeshan, et al. "Document image binarization using LSTM: A sequence learning approach." Proceedings of the 3rd International Workshop on Historical Document Imaging and Processing. 2015.



OCR



Scene Text Analysis



Character-level Language Models
◼ Generating natural language text character by character with an RNN
◼ In the example on the right: alphabet with 4 characters ('h', 'e', 'l', 'o')
◼ Each character is represented by a one-hot vector, e.g. (1, 0, 0, 0)^T
◼ The model predicts a distribution over the next character via the softmax function
◼ The character drawn from that distribution is fed back as input to the RNN at the next time step (see the sampling sketch below)

The Unreasonable Effectiveness of Recurrent Neural Networks (karpathy.github.io)
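
As a concrete illustration of this sampling loop, here is a small NumPy sketch over the 4-character alphabet. The random parameters Wxh, Whh, Why are placeholders standing in for a trained model.

import numpy as np

chars = ['h', 'e', 'l', 'o']
rng = np.random.default_rng(0)

def one_hot(idx, size=4):
    v = np.zeros(size)
    v[idx] = 1.0
    return v

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Toy parameters (random for illustration); a trained model would learn these.
hidden = 8
Wxh = rng.normal(scale=0.1, size=(hidden, 4))
Whh = rng.normal(scale=0.1, size=(hidden, hidden))
Why = rng.normal(scale=0.1, size=(4, hidden))

def sample(start_idx, length=10):
    h = np.zeros(hidden)
    idx = start_idx
    out = [chars[idx]]
    for _ in range(length):
        h = np.tanh(Wxh @ one_hot(idx) + Whh @ h)   # RNN state update
        p = softmax(Why @ h)                        # distribution over the next character
        idx = rng.choice(4, p=p)                    # draw the next character and feed it back in
        out.append(chars[idx])
    return ''.join(out)

print(sample(start_idx=0))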



Natural Language Processing



Natural Language Processing

◼ Linguistics is the scientific study of language. It involves the analysis of language form, language meaning, and language in context, as well as of the social, cultural, historical, and political factors that influence language. (Wikipedia)

◼ Computational linguistics is an interdisciplinary field concerned with the computational modelling of natural language, as well as the study of appropriate computational approaches to linguistic questions. (Wikipedia)

◼ Natural language processing (NLP) is the branch of computer science, and more specifically of artificial intelligence (AI), concerned with giving computers the ability to understand text and spoken words in much the same way human beings can. (Wikipedia)



Natural Language Processing

◼ Natural language processing (NLP) is the branch of computer science, and more specifically of artificial intelligence (AI), concerned with giving computers the ability to understand text and spoken words in much the same way human beings can.
◼ It is a combination of
  • computational linguistics (rule-based modelling of human language)
  with
  • statistical models
  • machine learning models
  • deep learning models



Language Models



Language Models

◼ Word language model example:

p(Cat was sitting on the table <EOS>) = p(Cat)
    · p(was | Cat)
    · p(sitting | Cat was)
    · p(on | Cat was sitting)
    · p(the | Cat was sitting on)
    · p(table | Cat was sitting on the)
    · p(<EOS> | Cat was sitting on the table)

◼ Word language models are auto-regressive models that predict the next word given all previous words in the sentence
◼ A good model assigns high probability to the words that actually come next (a minimal scoring sketch follows below)
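
To make the chain-rule factorization concrete, here is a minimal Python sketch that scores a sentence word by word. next_word_prob is a hypothetical callback standing in for any model that returns p(word | context).

import math

def sentence_log_prob(words, next_word_prob):
    """Sum of log p(w_t | w_1 .. w_{t-1}); words should end with '<EOS>'."""
    log_p = 0.0
    for t, w in enumerate(words):
        log_p += math.log(next_word_prob(w, words[:t]))  # one factor of the product
    return log_p

# Toy usage with a uniform dummy model over a 10-word vocabulary (assumption)
uniform = lambda word, context: 0.1
print(sentence_log_prob("Cat was sitting on the table <EOS>".split(), uniform))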



Applications of Language Models



Training



Training Language Models



Evaluation of Language Models



Evaluating Character Language Models



Evaluating Word Language Models



Evaluating Language Models

◼ Shannon estimated that English text carries 0.6–1.3 bits per character
◼ For character language models, current performance is roughly 1 bit per character
◼ For word language models, perplexities of about 60 were typical until 2017
◼ According to Quora, English words have 4.79 letters on average (excluding spaces), i.e. about 5.79 characters per word once the space is counted
◼ Assuming 1 bit per character, this corresponds to a word-level perplexity of 2^5.79 ≈ 55.3 (see the quick check below)
◼ State-of-the-art models (GPT-2, Megatron-LM) reach perplexities of 10–20
◼ Be careful: these metrics are not comparable across vocabularies or datasets
◼ Evaluation Metrics for Language Modeling (thegradient.pub)
◼ The relationship between Perplexity and Entropy in NLP | by Ravi Charan | Towards Data Science
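
A quick sanity check of that arithmetic as a small Python snippet; the 4.79 letters/word figure is taken from the slide, and the extra character accounts for the space.

# Convert bits per character into an equivalent word-level perplexity.
bits_per_char = 1.0
chars_per_word = 4.79 + 1.0            # average letters per word plus one space
bits_per_word = bits_per_char * chars_per_word
perplexity = 2 ** bits_per_word        # perplexity = 2^(cross-entropy in bits)
print(round(perplexity, 1))            # ≈ 55.3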



Traditional Language Models



Language Models



n-gram Models



Training of n-gram Models



Sampling from n-gram Models



Thanks a lot for your attention

