Exercise 1 Consider the following toy example (similar to the one from Jurafsky & Martin (2015)):
Training data:
<s> I am Sam </s>
<s> Sam I am </s>
<s> Sam I like </s>
<s> Sam I do like </s>
<s> do I like Sam </s>
Assume that we use a bigram language model based on the above training data.
1. What is the most probable next word predicted by the model for the following word sequences?
2. Which of the following sentences is better, i.e., gets a higher probability with this model?
Solution:
Bigram probabilities:
P(Sam|<s>) = 3/5     P(I|<s>) = 1/5       P(do|<s>) = 1/5
P(I|Sam) = 3/5       P(</s>|Sam) = 2/5
P(Sam|am) = 1/2      P(</s>|am) = 1/2
P(am|I) = 2/5        P(like|I) = 2/5      P(do|I) = 1/5
P(Sam|like) = 1/3    P(</s>|like) = 2/3
P(like|do) = 1/2     P(I|do) = 1/2
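These maximum-likelihood estimates can be verified with a short script. The sketch below (a minimal illustration, not part of the original exercise) counts bigrams over the five training sentences and divides each bigram count by the count of its history word:

```python
from collections import Counter

# Training sentences from the exercise, with <s> and </s> markers.
sentences = [
    "<s> I am Sam </s>",
    "<s> Sam I am </s>",
    "<s> Sam I like </s>",
    "<s> Sam I do like </s>",
    "<s> do I like Sam </s>",
]

unigrams = Counter()  # counts of each word used as a bigram history
bigrams = Counter()   # counts of each (history, word) pair
for sent in sentences:
    tokens = sent.split()
    for w1, w2 in zip(tokens, tokens[1:]):
        unigrams[w1] += 1
        bigrams[(w1, w2)] += 1

def p(word, history):
    """Maximum-likelihood bigram probability P(word | history)."""
    return bigrams[(history, word)] / unigrams[history]

print(p("Sam", "<s>"))  # 3/5 = 0.6
print(p("I", "Sam"))    # 3/5 = 0.6
```

Note that `<s>` only ever occurs as a history, so it is counted in `unigrams` but never as the second element of a bigram.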
Solution:
The probability of this sequence is 1/5 · 1/5 · 1/2 · 1/3 = 1/150.
The perplexity is then 150^(1/4) ≈ 3.5.
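As a cross-check: the perplexity of an N-word sequence with probability P is P^(−1/N). A minimal sketch using only the four bigram factors from the solution above (the sequence itself is not restated here):

```python
# Perplexity of an N-word sequence with probability P is P ** (-1/N).
# The four bigram factors below are the ones from the solution (N = 4).
prob = (1/5) * (1/5) * (1/2) * (1/3)   # = 1/150
perplexity = prob ** (-1/4)            # fourth root of 150
print(round(perplexity, 1))            # → 3.5
```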
Exercise 3 Take again the same training data. This time, we use a bigram LM with Laplace smoothing.
Solution:
1. If we include </s> (which can also appear as the second element of a bigram), we get a vocabulary
of size |V| = 6: {I, am, Sam, do, like, </s>}.
P(do|<s>) = 2/11    P(do|Sam) = 1/11    P(Sam|<s>) = 4/11    P(Sam|do) = 1/8
P(I|Sam) = 4/11     P(I|do) = 2/8       P(like|I) = 3/11
2. (8): 2/11 · 1/8 · 4/11 · 3/11 = 24/10648
   (9): 4/11 · 1/11 · 2/8 · 3/11 = 24/10648
The two sequences are equally probable.
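The equality can be verified with exact arithmetic. The sketch below recomputes the add-one (Laplace) smoothed probabilities from the training data and multiplies the factors listed in the solution for (8) and (9); the original sentences labelled (8) and (9) are not reproduced in this excerpt, so only their stated factors are used:

```python
from collections import Counter
from fractions import Fraction

# Training sentences from the exercise.
sentences = [
    "<s> I am Sam </s>",
    "<s> Sam I am </s>",
    "<s> Sam I like </s>",
    "<s> Sam I do like </s>",
    "<s> do I like Sam </s>",
]

unigrams, bigrams = Counter(), Counter()
for sent in sentences:
    tokens = sent.split()
    for w1, w2 in zip(tokens, tokens[1:]):
        unigrams[w1] += 1
        bigrams[(w1, w2)] += 1

V = 6  # vocabulary size: I, am, Sam, do, like, </s>

def p_laplace(word, history):
    """Add-one (Laplace) smoothed bigram probability, exact arithmetic."""
    return Fraction(bigrams[(history, word)] + 1, unigrams[history] + V)

# Products of the smoothed factors given in the solution for (8) and (9).
p8 = (p_laplace("do", "<s>") * p_laplace("Sam", "do")
      * p_laplace("I", "Sam") * p_laplace("like", "I"))
p9 = (p_laplace("Sam", "<s>") * p_laplace("do", "Sam")
      * p_laplace("I", "do") * p_laplace("like", "I"))
print(p8 == p9)  # → True: both reduce to 24/10648 = 3/1331
```

Using `Fraction` avoids floating-point rounding, so the equality check is exact.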
References
Jurafsky, Daniel & James H. Martin. 2015. Speech and language processing: An introduction to natural language
processing, computational linguistics, and speech recognition. Draft of the 3rd edition.