Very Deep Learning - 3
Lecture 11
https://www.cs.toronto.edu/~graves/phd.pdf
Afzal, Muhammad Zeshan, et al. "Document image binarization using LSTM: A sequence learning approach." Proceedings of the 3rd International Workshop on Historical Document Imaging and Processing, 2015.
◼ Shannon estimated that English text carries 0.6–1.3 bits per character
◼ For character language models, current performance is roughly 1 bit per
character
◼ For word language models, perplexities of about 60 were typical until 2017
◼ According to Quora, the average English word has 4.79 letters (excluding spaces)
◼ Assuming 1 bit per character and counting the trailing space (4.79 + 1 = 5.79 characters per word), a word carries about 5.79 bits, giving a perplexity of 2^5.79 ≈ 55.3 (see the sketch after this list)
◼ State-of-the-art models (GPT-2, Megatron-LM) yield perplexities of 10–20
◼ Be careful: perplexity values are not comparable across vocabularies or datasets
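The bits-to-perplexity arithmetic above is easy to verify directly. The sketch below is a minimal illustration, not part of the original slide: it assumes the 4.79 letters-per-word figure and the 1 bit-per-character estimate from the bullets above, and adds a uniform-vocabulary example to make the comparability caveat concrete.

```python
import math

bits_per_char = 1.0                      # current character LMs (slide above)
chars_per_word = 4.79 + 1.0              # 4.79 letters plus one space = 5.79

# Entropy per word at 1 bit per character.
bits_per_word = bits_per_char * chars_per_word

# Perplexity is 2 raised to the entropy measured in bits.
word_perplexity = 2 ** bits_per_word
print(f"word perplexity: {word_perplexity:.1f}")   # ~55.3, as on the slide

# Frameworks usually report cross-entropy in nats; perplexity = exp(loss).
loss_nats = bits_per_word * math.log(2)
print(f"exp(loss): {math.exp(loss_nats):.1f}")     # ~55.3 again

# Why numbers are not comparable across vocabularies: a model that guesses
# uniformly over V tokens has entropy log2(V) bits, i.e. perplexity exactly V.
for vocab_size in (27, 10_000, 50_257):  # chars+space, small word vocab, GPT-2 BPE
    entropy_bits = math.log2(vocab_size)
    print(vocab_size, round(2 ** entropy_bits))
```

The last loop shows why the caveat above matters: an equally "random" character model and word model report very different perplexities, so scores should only be compared under the same tokenization and test set.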
◼ Evaluation Metrics for Language Modeling (thegradient.pub)
◼ The relationship between Perplexity and Entropy in NLP | by Ravi Charan | Towards Data Science