Professional Documents
Culture Documents
Graham Neubig
Nara Institute of Science and Technology (NAIST)
Review:
Calculating Sentence Probabilities
NOTE:
sentence start <s> and end </s> symbol
NOTE:
P(w0 = <s>) = 1
Incremental Computation
P( wiw 0 wi1 )P (w i)
=
Puni(w=system recognition speech ) =
P(w=speech) * P(w=recognition) * P(w=system) * P(w=</s>)
Puni(w=i am) =
P(w=i) * P(w=am) * P(w=</s>)
Puni(w=we are) =
P(w=we) * P(w=are) * P(w=</s>)
Puni(w=we am) =
Puni(w=i are) =
P(w=we) * P(w=am) * P(w=</s>)
P(w=i) * P(w=are) * P(w=</s>)
P( wiw 0 wi1 )P (w i)
c (w in+ 1 wi )
P( wiw in+ 1 wi1 )=
c ( wi n+ 1 wi1)
i live in osaka . </s>
i am a graduate student . </s>
my school is in nara . </s>
P(osaka | in) = c(in osaka)/c(in) = 1 / 2 = 0.5
n=2
P(nara | in) = c(in nara)/c(in) = 1 / 2 = 0.5
7
P(osaka | in)
P(nara | in)
Bigram:
Unigram:
2=0.95, 1 =0.95
2=0.95, 1 =0.90
2=0.95, 1 =0.85
2=0.95, 1 =0.05
2=0.90, 1 =0.95
2=0.90, 1 =0.90
2=0.05, 1 =0.10
2=0.05, 1 =0.05
Problems:
Too many options
Choosing takes time!
Using same for all n-grams
There is a smarter way!
c(Tokyo city) = 40
c(Tokyo is) = 35
c(Tokyo was) = 24
c(Tokyo tower) = 15
c(Tokyo port) = 10
c(Tottori is) = 2
c(Tottori city) = 1
c(Tottori was) = 0
i1
10
Witten-Bell Smoothing
i1
u( wi1 )
=1
u( wi1 )+ c ( wi 1)
For example:
2
Tottori=1
=0.6
2+ 3
30
Tokyo =1
=0.9
30+ 270
11
Programming Techniques
12
13
14
Exercise
15
Exercise
16
17
Thank You!
19