
Bahir Dar University

Bahir Dar Institute of Technology


Faculty of Computing
Department of Computer Science
Natural Language Processing (CoSc5262)

“Assignment Four: N-Gram Language Modeling”

Name: Molalegn Tamiru ID: BDU1300608

Submitted To: Dr. Milion M. (PhD)

June 01, 2021

Addis Ababa, Ethiopia


Consider the following toy example:

Training data:

I am Sam

Sam I am

Sam I like

Sam I do like

do I like Sam

Assume that we use a bi-gram language model based on the above training data. What is the
most probable next word predicted by the model for the following word sequences? Show.

(1) Sam . . .

(2) Sam I do . . .

(3) Sam I am Sam . . .

(4) do I like . . .

Solution:

Next word prediction is an input technology that simplifies typing by suggesting the next word for the user to select, since typing in a conversation consumes time [1]. N-grams are Markov models that estimate words from a fixed window of previous words. N-gram probabilities can be estimated by counting in a corpus and normalizing (the maximum likelihood estimate).

The bigram probability of a word W following the previous word Wi-1 can be estimated from counts as [2]:

P(W|Wi-1) = count(Wi-1, W)/count(Wi-1)
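As a sketch, this count-and-normalize step can be implemented in a few lines of Python over the toy corpus above (the variable and function names are illustrative, not part of the assignment; the denominator counts every occurrence of the previous word, including sentence-final ones, matching the counting convention used in the calculations below):

```python
from collections import Counter

# Toy training corpus from the assignment.
corpus = [
    "I am Sam",
    "Sam I am",
    "Sam I like",
    "Sam I do like",
    "do I like Sam",
]

unigrams = Counter()  # count(w), including sentence-final occurrences
bigrams = Counter()   # count(w_prev, w)
for sentence in corpus:
    tokens = sentence.split()
    unigrams.update(tokens)
    bigrams.update(zip(tokens, tokens[1:]))

def p(w, prev):
    """Maximum-likelihood estimate P(w | prev) = count(prev, w) / count(prev)."""
    return bigrams[(prev, w)] / unigrams[prev]

print(p("I", "Sam"))  # 3/5 = 0.6
```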

 (1) Sam . . .

 Check the probability of each candidate word following "Sam":

 P(I|Sam) = count(Sam, I)/count(Sam) = 3/5 = 0.6

 P(am|Sam) = count(Sam, am)/count(Sam) = 0/5 = 0

 P(do|Sam) = count(Sam, do)/count(Sam) = 0/5 = 0

 P(like|Sam) = count(Sam, like)/count(Sam) = 0/5 = 0

 P(Sam|Sam) = count(Sam, Sam)/count(Sam) = 0/5 = 0

 Therefore, the most probable word after "Sam" is "I".

 (2) Sam I do . . .

 A bigram model conditions only on the immediately preceding word, so only "do" matters. Check the probability of each candidate word following "do":

 P(I|do) = count(do, I)/count(do) = 1/2 = 0.5

 P(am|do) = count(do, am)/count(do) = 0/2 = 0

 P(like|do) = count(do, like)/count(do) = 1/2 = 0.5

 P(Sam|do) = count(do, Sam)/count(do) = 0/2 = 0

 P(do|do) = count(do, do)/count(do) = 0/2 = 0

 Therefore, "I" and "like" are equally probable after "do".

 (3) Sam I am Sam . . .

 The context again ends in "Sam", so the calculation is identical to (1):

 P(I|Sam) = count(Sam, I)/count(Sam) = 3/5 = 0.6

 P(am|Sam) = count(Sam, am)/count(Sam) = 0/5 = 0

 P(do|Sam) = count(Sam, do)/count(Sam) = 0/5 = 0

 P(like|Sam) = count(Sam, like)/count(Sam) = 0/5 = 0

 P(Sam|Sam) = count(Sam, Sam)/count(Sam) = 0/5 = 0

 Therefore, the most probable word after "Sam" is "I".

 (4) do I like . . .

 Check the probability of each candidate word following "like":

 P(I|like) = count(like, I)/count(like) = 0/3 = 0

 P(do|like) = count(like, do)/count(like) = 0/3 = 0

 P(Sam|like) = count(like, Sam)/count(like) = 1/3 ≈ 0.33

 P(am|like) = count(like, am)/count(like) = 0/3 = 0

 P(like|like) = count(like, like)/count(like) = 0/3 = 0

 Therefore, the most probable word after "like" is "Sam".
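All four predictions can be reproduced with a short, self-contained Python sketch (function and variable names are illustrative). Because a bigram model conditions only on the last word of the prompt, a single helper covers every case, and ties (as in prompt 2) are returned together:

```python
from collections import Counter

# Toy training corpus from the assignment.
corpus = [
    "I am Sam",
    "Sam I am",
    "Sam I like",
    "Sam I do like",
    "do I like Sam",
]

unigrams = Counter()  # count(w), including sentence-final occurrences
bigrams = Counter()   # count(w_prev, w)
for sentence in corpus:
    tokens = sentence.split()
    unigrams.update(tokens)
    bigrams.update(zip(tokens, tokens[1:]))

def predict_next(prompt):
    """Return the most probable next word(s) after the prompt's last word."""
    prev = prompt.split()[-1]
    probs = {w: bigrams[(prev, w)] / unigrams[prev] for w in unigrams}
    best = max(probs.values())
    return sorted(w for w, pr in probs.items() if pr == best)

print(predict_next("Sam"))           # ['I']
print(predict_next("Sam I do"))      # ['I', 'like']
print(predict_next("Sam I am Sam"))  # ['I']
print(predict_next("do I like"))     # ['Sam']
```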

[1] R. Nagata, H. Takamura, and G. Neubig, “Adaptive Spelling Error Correction Models for Learner
English,” Procedia Comput. Sci., vol. 112, pp. 474–483, 2017, doi: 10.1016/j.procs.2017.08.065.

[2] J. Lin, “N-Gram Language Models,” 2009.
