
Chapter 5.

Probabilistic Models of
Pronunciation and Spelling

May 4, 2007

Artificial Intelligence Lab, Pusan National University
Minho Kim

Text: Speech and Language Processing

Pages 141 ~ 189
Outline
 Introduction
 5.1 Dealing with Spelling Errors
 5.2 Spelling Error Patterns
 5.3 Detecting Non-Word Errors
 5.4 Probabilistic Models
 5.5 Applying the Bayesian Method to Spelling
 5.6 Minimum Edit Distance
 5.7 English Pronunciation Variation
 5.8 The Bayesian Method for Pronunciation
 5.9 Weighted Automata
 5.10 Pronunciation in Humans
Introduction
 Introduce the problems of detecting and correcting spelling
errors
 Summarize typical human spelling error patterns
 The essential probabilistic architecture:
 Bayes Rule
 Noisy channel model
 The essential algorithms
 Dynamic programming
 Viterbi algorithm
 Minimum edit distance algorithm
 Forward algorithm
 Weighted automaton

5.1 Dealing with Spelling Errors (1/2)
 The detection and correction of spelling errors
 an integral part of modern word processors
 Applications in which even the individual letters aren't guaranteed to be accurately identified
 Optical character recognition (OCR)
 On-line handwriting recognition
 Detection and correction of spelling errors, mainly in
typed text
 OCR systems often
 misread “D” as “O” or “ri” as “n”
 producing ‘mis-spelled’ words like dension for derision

5.1 Dealing with Spelling Errors (2/2)
 Kukich (1992) breaks the field down into three increasingly broad problems:
 non-word error detection (graffe for giraffe)
 isolated-word error correction (correcting graffe to giraffe)
 context-dependent error detection and correction
- there for three, dessert for desert, piece for peace

5.2 Spelling Error Patterns (1/2)
 Single-error misspellings - Damerau (1964)
 insertion: mistyping the as ther
 deletion: mistyping the as th
 substitution: mistyping the as thw
 transposition: mistyping the as hte
 Kukich (1992) breaks down human typing errors
 Typographic errors (spell as speel)
 Cognitive errors (separate as seperate)

5.2 Spelling Error Patterns (2/2)
 OCR errors are usually grouped into five classes
 substitutions (e → c)
 multi-substitutions (m → rn, he → b)
 space deletions or insertions
 failures (u → ~)
 framing errors

5.3 Detecting Non-word Errors
 Detecting non-word errors in text is generally done by using a dictionary (a trivial lookup sketch follows below)
 it has been suggested that dictionaries would need to be kept small
 because large dictionaries contain very rare words that resemble misspellings of other words
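
As a minimal illustration of dictionary-based detection (a sketch only, with a toy word list, not the chapter's actual dictionary):

```python
# Trivial sketch of non-word error detection: flag any token that does
# not appear in the dictionary. The word list here is a toy example.
dictionary = {"the", "giraffe", "ran", "across", "three", "desert"}

def non_word_errors(tokens):
    return [t for t in tokens if t.lower() not in dictionary]

print(non_word_errors("the graffe ran across the desert".split()))  # ['graffe']
```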

5.4 Probabilistic Models (1/3)

 The intuition of the noisy channel model is to treat the surface form as an instance of the lexical form that has been passed through a noisy channel
 build a model of the channel so that we can figure out how it modified this "true" word and recover it
 sources of noise
 variation in pronunciation, variation in the realization of phones, acoustic variation due to the channel

5.4 Probabilistic Models (2/3)
 given a string of phones (say [ni]), which word corresponds to this string of phones?
 consider all possible words and choose the one for which P(word | observation) is highest
 (5.1)  ŵ = argmax_{w ∈ W} P(w | O)
 ŵ : our estimate of the correct w
 O : the observation sequence [ni]
 the function argmax_x f(x) returns the x such that f(x) is maximized

5.4 Probabilistic Models (3/3)
 (5.2)
 (5.3)
 substituting (5.2) into (5.1) to get (5.3)
 we can ignore P(O). Why?
 (5.4)
 P(w) is called the Prior probability
 P(O|w) is called the likelihood
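
The sketch below illustrates the decision rule in (5.4): score each candidate word by likelihood times prior and take the argmax. The candidate words and probability values are invented for illustration; they are not figures from the chapter or its training corpus.

```python
# Minimal sketch of the noisy channel decision rule (5.4):
# choose the candidate w that maximizes P(O|w) * P(w).
# All probabilities below are made-up illustrative numbers.

def best_candidate(prior, likelihood):
    """Return argmax_w P(O|w) * P(w) over the candidate words."""
    return max(prior, key=lambda w: likelihood[w] * prior[w])

# Hypothetical candidate corrections for one misspelling.
prior = {"across": 2.4e-4, "actress": 2.7e-5, "acres": 2.4e-5}       # P(w)
likelihood = {"across": 2.8e-6, "actress": 1.2e-4, "acres": 2.1e-4}  # P(O|w)

print(best_candidate(prior, likelihood))  # -> 'acres' with these numbers
```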

5.5 Applying the Bayesian Method to Spelling (1/5)

5.5 Applying the Bayesian Method to Spelling (2/5)

5.5 Applying the Bayesian Method to Spelling (3/5)

 P(acress | across) can be estimated from the number of times that e was substituted for o in some large corpus of errors
 confusion matrix
 a square 26 × 26 table
 number of times one letter was incorrectly used instead of another
 [o,e] in a substitution confusion matrix
 - the count of times e was substituted for o
5.5 Applying the Bayesian Method to Spelling (4/5)

 del[x,y] contains the number of times in the training set that the characters xy in the correct word were typed as x
 ins[x,y] contains the number of times in the training set that the character x in the correct word was typed as xy
 sub[x,y] contains the number of times that x was typed as y
 trans[x,y] contains the number of times that xy was typed as yx
 (a sketch of turning these counts into a channel likelihood follows below)
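
The sketch below shows one plausible way to turn these four tables into a channel likelihood P(typo | correct) for a single-error misspelling, normalizing each count by how often the relevant character or character pair occurs in the training set. The function name, the dict-based tables, and the exact normalizations are assumptions for illustration, not the book's code.

```python
# Sketch: channel likelihood for one single-error typo, built from the
# four confusion tables described above. del_, ins, sub, trans are
# assumed to be dicts keyed by (x, y); chars and bigrams hold counts of
# single characters and character pairs in the training set.
# Names and normalizations are illustrative assumptions.

def channel_likelihood(error_type, x, y,
                       del_, ins, sub, trans, chars, bigrams):
    if error_type == "deletion":        # correct "xy" was typed as "x"
        return del_[(x, y)] / bigrams[x + y]
    if error_type == "insertion":       # correct "x" was typed as "xy"
        return ins[(x, y)] / chars[x]
    if error_type == "substitution":    # correct "x" was typed as "y"
        return sub[(x, y)] / chars[x]
    if error_type == "transposition":   # correct "xy" was typed as "yx"
        return trans[(x, y)] / bigrams[x + y]
    raise ValueError(f"unknown error type: {error_type}")
```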

5.5 Applying the Bayesian Method to Spelling (5/5)

5.6 Minimum Edit Distance (1/6)
 string distance - some metric of how alike two strings
are to each other
 minimum edit distance - the minimum number of
editing operations needed to transform one string into
another
 operations - insertion, deletion, substitution
 For example
 the gap between intention and execution is five operations
 trace, alignment, operation list (Figure 5.4)

5.6 Minimum Edit Distance (2/6)

5.6 Minimum Edit Distance (3/6)
 Levenshtein distance
 assign a particular cost or weight to each of the operations
 simplest weighting factor
 each of the three operations has a cost of 1
 Levenshtein distance between intention and execution is 5
 alternate version - substitutions have a cost of 2 (a substitution can be viewed as one deletion plus one insertion)
 The minimum edit distance is computed by dynamic programming

5.6 Minimum Edit Distance (4/6)
 Dynamic programming
 a large problem can be solved by properly combining the solutions to various subproblems
 minimum edit distance for spelling error correction (a sketch follows below)
 Viterbi and forward algorithms for speech recognition
 CYK and Earley algorithms for parsing
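
Below is a minimal dynamic-programming sketch of minimum edit distance with the simple Levenshtein weighting above (insertion, deletion, and substitution each cost 1).

```python
# Minimum edit distance by dynamic programming.
# dist[i][j] = distance between source[:i] and target[:j].

def min_edit_distance(source, target):
    n, m = len(source), len(target)
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i                                   # i deletions
    for j in range(1, m + 1):
        dist[0][j] = j                                   # j insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub_cost = 0 if source[i - 1] == target[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,             # deletion
                             dist[i][j - 1] + 1,             # insertion
                             dist[i - 1][j - 1] + sub_cost)  # substitution
    return dist[n][m]

print(min_edit_distance("intention", "execution"))  # 5 with these unit costs
```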

5.6 Minimum Edit Distance (5/6)

5.6 Minimum Edit Distance (6/6)

5.8 The Bayesian Method for Pronunciation (1/6)

 The Bayesian algorithm can be used to solve what is often called the pronunciation subproblem in speech recognition
 for example, when [ni] occurs after the word I at the beginning of a sentence
 an investigation of the Switchboard corpus produces a total of 7 words
 the, neat, need, new, knee, to, you (see Chapter 4)
 two components
 candidate generation
 candidate scoring

5.8 The Bayesian Method for Pronunciation (2/6)

 Speech recognizers often use an alternative architecture, trading off extra storage for less on-line computation
 each pronunciation is expanded in advance with all possible variants, which are then pre-stored with their scores
 thus there is no need for candidate generation
 the surface form [ni] is simply stored with the list of words that can generate it (see the sketch below)
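
A minimal sketch of this pre-stored architecture, assuming a simple lookup table keyed by the observed phone sequence; the channel scores attached to each candidate word are invented for illustration.

```python
# Sketch: pre-stored pronunciation table. Each surface pronunciation is
# expanded offline and stored with its candidate words and channel
# scores, so no candidate generation is needed at run time.
# The scores below are illustrative only.

pronunciation_table = {
    ("n", "iy"): [("the", 1.0e-3), ("neat", 5.2e-4), ("need", 3.1e-4),
                  ("new", 2.4e-4), ("knee", 1.1e-4), ("to", 8.0e-5),
                  ("you", 6.0e-5)],
}

def candidates(observed_phones):
    """Look up pre-stored (word, score) pairs for a surface form."""
    return pronunciation_table.get(tuple(observed_phones), [])

print(candidates(["n", "iy"]))
```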

5.8 The Bayesian Method for Pronunciation (3/6)


 the scoring step mirrors (5.4): ŵ = argmax_w P(y | w) P(w)
 y represents the sequence of phones
 w represents the candidate word
 it turns out that confusion matrices don't do as well for pronunciation
 the changes in pronunciation between a lexical and a surface form are much greater
 probabilistic models of pronunciation variation include a lot more factors than a simple confusion matrix can capture
 One simple way to generate pronunciation likelihoods is via probabilistic rules

5.8 The Bayesian Method for Pronunciation (4/6)

 example rule: a word-initial [ð] becomes [n] if the preceding word ended in [n] or sometimes [m]
 the rule probability is estimated as ncount / envcount (a small sketch follows below)
 ncount : number of times lexical [ð] is realized word-initially by surface [n] when the previous word ends in a nasal
 envcount : total number of times word-initial lexical [ð] occurs when the previous word ends in a nasal
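
A minimal sketch of this relative-frequency estimate; the counts are invented for illustration, not Switchboard figures.

```python
# Sketch: estimate the rule probability as a relative frequency,
# p = ncount / envcount. Counts below are made up for illustration.

ncount = 55     # lexical [dh] realized as [n] word-initially after a nasal
envcount = 130  # all word-initial lexical [dh] tokens after a nasal
p_rule = ncount / envcount
print(f"P([n] | word-initial [dh], previous word ends in a nasal) = {p_rule:.2f}")
```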

5.8 The Bayesian Method for Pronunciation (5/6)

5.8 The Bayesian Method for Pronunciation (6/6)

 Decision Tree Models of Pronunciation Variation

5.9 Weighted Automata (1/12)
 Weighted Automata
 a simple augmentation of the finite automaton
 each arc is associated with a probability
 the probabilities on all the arcs leaving a node must sum to 1
 (a small sketch of a weighted pronunciation automaton follows below)
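
Below is a minimal sketch of a weighted automaton for one word's pronunciation variants and of summing the probability of every path that generates an observed phone string (a tiny forward-style computation). The phone labels, states, and arc probabilities are invented for illustration; they are not the chapter's figures.

```python
# Sketch of a weighted (probabilistic) automaton for one word's
# pronunciations. arcs[state] = list of (next_state, phone, probability);
# the probabilities on the arcs leaving a state sum to 1.
# All states, phones, and probabilities are illustrative.

arcs = {
    0: [(1, "ax", 0.68), (1, "ix", 0.20), (1, "ah", 0.12)],  # vowel variants
    1: [(2, "b", 1.0)],
    2: [(3, "aw", 1.0)],
    3: [(4, "t", 0.88), (4, "dx", 0.12)],                     # final consonant
    4: [],                                                    # final state
}
FINAL = 4

def string_probability(observed):
    """Sum the probability of every path that generates the phone string."""
    alpha = {0: 1.0}                      # state -> prob of reaching it
    for phone in observed:
        nxt = {}
        for state, p in alpha.items():
            for dest, label, prob in arcs[state]:
                if label == phone:
                    nxt[dest] = nxt.get(dest, 0.0) + p * prob
        alpha = nxt
    return alpha.get(FINAL, 0.0)

print(string_probability(["ax", "b", "aw", "t"]))   # 0.68 * 0.88
print(string_probability(["ix", "b", "aw", "dx"]))  # 0.20 * 0.12
```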

5.9 Weighted Automata (2/12)

5.9 Weighted Automata (3/12)

5.9 Weighted Automata (4/12)

5.9 Weighted Automata (5/12)

5.9 Weighted Automata (6/12)

5.9 Weighted Automata (7/12)

5.9 Weighted Automata (8/12)

5.9 Weighted Automata (9/12)

5.9 Weighted Automata (10/12)

5.9 Weighted Automata (11/12)

5.9 Weighted Automata (12/12)
