Probabilistic Models of
Pronunciation and Spelling
May 4, 2007
Artificial Intelligence Laboratory, Pusan National University
Minho Kim (김민호)
5.1 Dealing with Spelling Errors (1/2)
The detection and correction of spelling errors is an integral part of modern word processors
Applications in which even the individual letters aren't guaranteed to be accurately identified
Optical character recognition (OCR)
On-line handwriting recognition
Detection and correction of spelling errors, mainly in
typed text
OCR systems often
misread “D” as “O” or “ri” as “n”
producing ‘mis-spelled’ words like dension for derision
5.1 Dealing with Spelling Errors (2/2)
Kukich (1992) breaks the field down into three
increasingly broader problems:
non-word error detection (graffe for giraffe)
isolated-word error correction (correcting graffe to giraffe)
context-dependent error detection and correction
- there for three, dessert for desert, piece for peace
5.2 Spelling Error Patterns (1/2)
Single-error misspellings - Damerau (1964)
insertion: mistyping the as ther
deletion: mistyping the as th
substitution: mistyping the as thw
transposition: mistyping the as hte
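These four single-error operations can be used directly to enumerate every string that is one edit away from a typed non-word. Below is a minimal Python sketch of this idea; the function name single_edit_candidates and the example are illustrative, not from the text.

def single_edit_candidates(word, alphabet="abcdefghijklmnopqrstuvwxyz"):
    # All strings one Damerau edit away from word:
    # insertion, deletion, substitution, or transposition.
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    substitutes = [l + c + r[1:] for l, r in splits if r for c in alphabet]
    inserts = [l + c + r for l, r in splits for c in alphabet]
    return set(deletes + transposes + substitutes + inserts)

print("giraffe" in single_edit_candidates("graffe"))   # True: one insertion away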
Kukich (1992) breaks down human typing errors into two classes:
Typographic errors (spell as speel)
Cognitive errors (separate as seperate)
5.2 Spelling Error Patterns (2/2)
OCR errors are usually grouped into five classes
substitutions (e → c)
multi-substitutions (m → rn, he → b)
space deletions or insertions
failures (u → ~)
framing errors
5.3 Detecting Non-word Errors
Detecting non-word errors in text is generally done by looking each word up in a dictionary
dictionaries would need to be kept small:
large dictionaries contain very rare words that resemble misspellings of other words
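As a rough sketch of that lookup in Python (the tiny word list below is only a stand-in for a real lexicon):

LEXICON = {"the", "there", "three", "giraffe", "desert", "dessert"}

def non_word_errors(tokens):
    # Flag tokens that are not in the lexicon as candidate non-word errors.
    return [t for t in tokens if t.lower() not in LEXICON]

print(non_word_errors(["the", "graffe"]))   # ['graffe']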
5.4 Probabilistic Models (1/3)
5.4 Probabilistic Models (2/3)
Given a string of phones (say [ni]), which word corresponds to this string of phones?
Consider all possible words and choose the one for which P(word | observation) is highest:
ŵ = argmax_{w∈W} P(w | O)   (5.1)
ŵ : our estimate of the correct word w
O : the observation sequence [ni]
argmax_x f(x) : the x such that f(x) is maximized
5.4 Probabilistic Models (3/3)
Bayes' rule: P(w | O) = P(O | w) P(w) / P(O)   (5.2)
Substituting (5.2) into (5.1) gives (5.3):
ŵ = argmax_{w∈W} P(O | w) P(w) / P(O)   (5.3)
We can ignore P(O). Why? P(O) is the same for every candidate word w, so it cannot change which w maximizes the expression.
ŵ = argmax_{w∈W} P(O | w) P(w)   (5.4)
P(w) is called the prior probability
P(O | w) is called the likelihood
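A small sketch of applying equation (5.4): score each candidate word by prior times likelihood and take the argmax. The candidate words and probability values below are invented purely for illustration.

# Hypothetical candidates for the observation O = [ni]; the numbers are made up.
# P(w) would come from a word-frequency (prior) model,
# P(O|w) from a pronunciation or spelling-error (likelihood) model.
candidates = {
    "knee": (0.00002, 0.30),   # (P(w), P(O|w))
    "new":  (0.00100, 0.05),
    "need": (0.00050, 0.01),
}

# Equation (5.4): w_hat = argmax_w P(O|w) * P(w).
# P(O) is dropped because it is the same for every candidate word.
w_hat = max(candidates, key=lambda w: candidates[w][0] * candidates[w][1])
print(w_hat)   # 'new' under these made-up numbers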
5.5 Applying the Bayesian Method to Spelling (1/5)
5.5 Applying the Bayesian Method to Spelling (2/5)
5.5 Applying the Bayesian Method to Spelling (3/5)
5.5 Applying the Bayesian Method to Spelling (4/5)
5.5 Applying the Bayesian Method to Spelling (5/5)
5.6 Minimum Edit Distance (1/6)
string distance - some metric of how alike two strings
are to each other
minimum edit distance - the minimum number of
editing operations needed to transform one string into
another
operations - insertion, deletion, substitution
For example
the gap between intention and execution is five operations
trace, alignment, operation list (Figure 5.4)
5.6 Minimum Edit Distance (2/6)
5.6 Minimum Edit Distance (3/6)
Levenshtein distance
assign a particular cost or weight to each of the operations
simplest weighting factor:
each of the three operations has a cost of 1
Levenshtein distance between intention and execution is 5
alternate version - substitutions have a cost of 2 (why? a substitution can be viewed as one deletion plus one insertion)
The minimum edit distance is computed by dynamic
programming
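A sketch of that dynamic-programming computation in Python, filling the standard distance table with unit costs (Levenshtein distance); raising the substitution cost to 2 gives the alternate version mentioned above.

def min_edit_distance(source, target, ins_cost=1, del_cost=1, sub_cost=1):
    # D[i][j] = edit distance between the first i chars of source
    # and the first j chars of target.
    n, m = len(source), len(target)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = D[i - 1][0] + del_cost
    for j in range(1, m + 1):
        D[0][j] = D[0][j - 1] + ins_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = min(
                D[i - 1][j] + del_cost,        # delete source[i-1]
                D[i][j - 1] + ins_cost,        # insert target[j-1]
                D[i - 1][j - 1] + (0 if source[i - 1] == target[j - 1] else sub_cost),
            )
    return D[n][m]

print(min_edit_distance("intention", "execution"))               # 5
print(min_edit_distance("intention", "execution", sub_cost=2))   # 8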
5.6 Minimum Edit Distance (4/6)
Dynamic programming
a large problem can be solved by properly combining the solutions to various subproblems
minimum edit distance for spelling error correction
the Viterbi and forward algorithms for speech recognition
the CYK and Earley algorithms for parsing
5.6 Minimum Edit Distance (5/6)
5.6 Minimum Edit Distance (6/6)
5.8 The Bayesian Method for Pronunciation (1/6)
5.8 The Bayesian Method for Pronunciation (2/6)
5.8 The Bayesian Method for Pronunciation (3/6)
y represents the sequence of phones
w represents the candidate word
it turns out that confusion matrices don't do as well for pronunciation:
the changes in pronunciation between a lexical and a surface form are much greater
probabilistic models of pronunciation variation include a lot more factors than a simple confusion matrix can capture
One simple way to generate pronunciation likelihoods
is via probabilistic rules
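As an illustration of what such a rule might look like in code, here is a toy Python sketch with a single optional flapping rule; the rule, its context, and the probability 0.85 are placeholders, not figures from the text.

FLAP_PROB = 0.85                 # made-up probability that the rule applies
VOWELS = {"aa", "ae", "ah", "ao", "aw", "ax", "ay", "eh", "er", "ey",
          "ih", "ix", "iy", "ow", "oy", "uh", "uw"}

def surface_likelihoods(lexical):
    # Return {surface form: P(surface | lexical)} under one optional rule:
    # [t] between vowels surfaces as a flap [dx] with probability FLAP_PROB.
    forms = {tuple(lexical): 1.0}
    for i, phone in enumerate(lexical):
        if (phone == "t" and 0 < i < len(lexical) - 1
                and lexical[i - 1] in VOWELS and lexical[i + 1] in VOWELS):
            new_forms = {}
            for form, p in forms.items():
                flapped = form[:i] + ("dx",) + form[i + 1:]
                new_forms[flapped] = new_forms.get(flapped, 0.0) + p * FLAP_PROB
                new_forms[form] = new_forms.get(form, 0.0) + p * (1 - FLAP_PROB)
            forms = new_forms
    return forms

# e.g. a lexical pronunciation of "butter": [b ah t er]
print(surface_likelihoods(["b", "ah", "t", "er"]))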
5.8 The Bayesian Method for Pronunciation (4/6)
5.8 The Bayesian Method for Pronunciation (5/6)
5.8 The Bayesian Method for Pronunciation (6/6)
5.9 Weighted Automata (1/12)
Weighted Automata
simple augmentation of the finite automaton
each arc is associated with a probability
the probability on all the arcs leaving a node must sum to 1
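A minimal sketch of such an automaton as a Python data structure, with a check that the arcs leaving each state sum to 1; the states, phone symbols, and probabilities below are invented for illustration.

import math

# transitions[state] = list of (arc symbol, next state, probability)
transitions = {
    "start": [("ax", "s1", 0.68), ("ix", "s1", 0.20), ("ah", "s1", 0.12)],
    "s1":    [("b", "s2", 1.00)],
    "s2":    [("aw", "end", 0.85), ("ae", "end", 0.15)],
    "end":   [],
}

# The defining constraint: probabilities on all arcs leaving a state sum to 1.
for state, arcs in transitions.items():
    assert not arcs or math.isclose(sum(p for _, _, p in arcs), 1.0), state

def path_probability(symbols, state="start"):
    # Probability of following the arcs labelled by `symbols` from `state`.
    prob = 1.0
    for symbol in symbols:
        arc = next((a for a in transitions[state] if a[0] == symbol), None)
        if arc is None:
            return 0.0
        prob *= arc[2]
        state = arc[1]
    return prob

print(path_probability(["ax", "b", "aw"]))   # 0.68 * 1.00 * 0.85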
5.9 Weighted Automata (2/12)
5.9 Weighted Automata (3/12)
5.9 Weighted Automata (4/12)
5.9 Weighted Automata (5/12)
5.9 Weighted Automata (6/12)
5.9 Weighted Automata (7/12)
5.9 Weighted Automata (8/12)
5.9 Weighted Automata (9/12)
5.9 Weighted Automata (10/12)
5.9 Weighted Automata (11/12)
5.9 Weighted Automata (12/12)