
Bahir Dar University

Bahir Dar Institute of Technology


Faculty of Computing
Department of Computer Science
Natural Language Processing (CoSc5262)

“Assignment Four: N-Gram Language Modeling”

Name: Molalegn Tamiru ID: BDU1300608

Submitted To: Dr. Milion M. (PhD)

June 01, 2021

Addis Ababa, Ethiopia


Consider the following toy example:

Training data:

I am Sam

Sam I am

Sam I like

Sam I do like

do I like Sam

Assume that we use a bi-gram language model based on the above training data. What is the
most probable next word predicted by the model for the following word sequences? Show.

(1) Sam . . .

(2) Sam I do . . .

(3) Sam I am Sam . . .

(4) do I like . . .

Solution:

Next word prediction is an input technology that simplifies typing by suggesting the next word for the user to select, since typing in a conversation consumes time [1]. N-grams are Markov models that estimate words from a fixed window of previous words. N-gram probabilities can be estimated by counting in a corpus and normalizing (the maximum likelihood estimate).

The bigram probability of a word W following the previous word Wi-1 can be estimated from counts as [2]:

P(W|Wi-1) = count(Wi-1, W)/count(Wi-1)
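As a sketch, this count-and-normalize step can be implemented in a few lines of Python over the toy corpus above (the variable and function names are illustrative, not part of the assignment; the denominator counts every occurrence of the previous word, including sentence-final ones, matching the counting convention used in the calculations below):

```python
from collections import Counter

# Toy training corpus from the assignment.
corpus = [
    "I am Sam",
    "Sam I am",
    "Sam I like",
    "Sam I do like",
    "do I like Sam",
]

unigrams = Counter()  # count(w), including sentence-final occurrences
bigrams = Counter()   # count(w_prev, w)
for sentence in corpus:
    tokens = sentence.split()
    unigrams.update(tokens)
    bigrams.update(zip(tokens, tokens[1:]))

def p(w, prev):
    """Maximum-likelihood estimate P(w | prev) = count(prev, w) / count(prev)."""
    return bigrams[(prev, w)] / unigrams[prev]

print(p("I", "Sam"))  # 3/5 = 0.6
```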

 (1) Sam . . .

 Check the probability of each candidate word following "Sam":

 P(I|Sam) = count(Sam, I)/count(Sam) = 3/5 = 0.6

 P(am|Sam) = count(Sam, am)/count(Sam) = 0/5 = 0

 P(do|Sam) = count(Sam, do)/count(Sam) = 0/5 = 0

 P(like|Sam) = count(Sam, like)/count(Sam) = 0/5 = 0

 P(Sam|Sam) = count(Sam, Sam)/count(Sam) = 0/5 = 0

 Therefore, the most probable word after "Sam" is "I".

 (2) Sam I do . . .

 A bigram model conditions only on the immediately preceding word, so only "do" matters. Check the probability of each candidate word following "do":

 P(I|do) = count(do, I)/count(do) = 1/2 = 0.5

 P(am|do) = count(do, am)/count(do) = 0/2 = 0

 P(like|do) = count(do, like)/count(do) = 1/2 = 0.5

 P(Sam|do) = count(do, Sam)/count(do) = 0/2 = 0

 P(do|do) = count(do, do)/count(do) = 0/2 = 0

 Therefore, "I" and "like" are equally probable after "do".

 (3) Sam I am Sam . . .

 The context again ends in "Sam", so the calculation is identical to (1):

 P(I|Sam) = count(Sam, I)/count(Sam) = 3/5 = 0.6

 P(am|Sam) = count(Sam, am)/count(Sam) = 0/5 = 0

 P(do|Sam) = count(Sam, do)/count(Sam) = 0/5 = 0

 P(like|Sam) = count(Sam, like)/count(Sam) = 0/5 = 0

 P(Sam|Sam) = count(Sam, Sam)/count(Sam) = 0/5 = 0

 Therefore, the most probable word after "Sam" is "I".

 (4) do I like . . .

 Check the probability of each candidate word following "like":

 P(I|like) = count(like, I)/count(like) = 0/3 = 0

 P(do|like) = count(like, do)/count(like) = 0/3 = 0

 P(Sam|like) = count(like, Sam)/count(like) = 1/3 ≈ 0.33

 P(am|like) = count(like, am)/count(like) = 0/3 = 0

 P(like|like) = count(like, like)/count(like) = 0/3 = 0

 Therefore, the most probable word after "like" is "Sam".
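All four predictions can be reproduced with a short, self-contained Python sketch (function and variable names are illustrative). Because a bigram model conditions only on the last word of the prompt, a single helper covers every case, and ties (as in prompt 2) are returned together:

```python
from collections import Counter

# Toy training corpus from the assignment.
corpus = [
    "I am Sam",
    "Sam I am",
    "Sam I like",
    "Sam I do like",
    "do I like Sam",
]

unigrams = Counter()  # count(w), including sentence-final occurrences
bigrams = Counter()   # count(w_prev, w)
for sentence in corpus:
    tokens = sentence.split()
    unigrams.update(tokens)
    bigrams.update(zip(tokens, tokens[1:]))

def predict_next(prompt):
    """Return the most probable next word(s) after the prompt's last word."""
    prev = prompt.split()[-1]
    probs = {w: bigrams[(prev, w)] / unigrams[prev] for w in unigrams}
    best = max(probs.values())
    return sorted(w for w, pr in probs.items() if pr == best)

print(predict_next("Sam"))           # ['I']
print(predict_next("Sam I do"))      # ['I', 'like']
print(predict_next("Sam I am Sam"))  # ['I']
print(predict_next("do I like"))     # ['Sam']
```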

[1] R. Nagata, H. Takamura, and G. Neubig, “Adaptive Spelling Error Correction Models for Learner
English,” Procedia Comput. Sci., vol. 112, pp. 474–483, 2017, doi: 10.1016/j.procs.2017.08.065.

[2] J. Lin, “N-Gram Language Models,” 2009.
