
Chapter 5.

Probabilistic Models of
Pronunciation and Spelling

May 4, 2007

Artificial Intelligence Lab, Pusan National University
Minho Kim

Text: Speech and Language Processing

Pages 141 ~ 189
Outline
 Introduction
 5.1 Dealing with Spelling Errors
 5.2 Spelling Error Patterns
 5.3 Detecting Non-Word Errors
 5.4 Probabilistic Models
 5.5 Applying the Bayesian Method to Spelling
 5.6 Minimum Edit Distance
 5.7 English Pronunciation Variation
 5.8 The Bayesian Method for Pronunciation
 5.9 Weighted Automata
 5.10 Pronunciation in Humans
Introduction
 Introduce the problems of detecting and correcting spelling
errors
 Summarize typical human spelling error patterns
 The essential probabilistic architecture:
 Bayes Rule
 Noisy channel model
 The essential algorithms
 Dynamic programming
 Viterbi algorithm
 Minimum edit distance algorithm
 Forward algorithm
 Weighted automaton

5.1 Dealing with Spelling Errors (1/2)
 The detection and correction of spelling errors
 an integral part of modern word processors
 Applications in which even the individual letters aren't guaranteed to be accurately identified
 Optical character recognition (OCR)
 On-line handwriting recognition
 Detection and correction of spelling errors, mainly in
typed text
 OCR systems often
 misread “D” as “O” or “ri” as “n”
 producing ‘mis-spelled’ words like dension for derision

5.1 Dealing with Spelling Errors (2/2)
 Kukich (1992) breaks the field down into three increasingly broad problems:
 non-word error detection (graffe for giraffe)
 isolated-word error correction (correcting graffe to giraffe)
 context-dependent error detection and correction
- there for three, dessert for desert, piece for peace

5.2 Spelling Error Patterns (1/2)
 Single-error misspellings - Damerau (1964)
 insertion: mistyping the as ther
 deletion: mistyping the as th
 substitution: mistyping the as thw
 transposition: mistyping the as hte
 Kukich (1992) breaks down human typing errors
 Typographic errors (spell as speel)
 Cognitive errors (separate as seperate)

5.2 Spelling Error Patterns (2/2)
 OCR errors are usually grouped into five classes
 substitutions (e → c)
 multi-substitutions (m → rn, he → b)
 space deletions or insertions
 failures (u → ~)
 framing errors

5.3 Detecting Non-word Errors
 Detecting non-word errors in text is generally done by using a dictionary (a trivial lookup sketch follows below)
 it has been suggested that dictionaries would need to be kept small
 because large dictionaries contain very rare words that resemble misspellings of other words
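
As a minimal illustration of dictionary-based detection (a sketch only, with a toy word list, not the chapter's actual dictionary):

```python
# Trivial sketch of non-word error detection: flag any token that does
# not appear in the dictionary. The word list here is a toy example.
dictionary = {"the", "giraffe", "ran", "across", "three", "desert"}

def non_word_errors(tokens):
    return [t for t in tokens if t.lower() not in dictionary]

print(non_word_errors("the graffe ran across the desert".split()))  # ['graffe']
```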

5.4 Probabilistic Models (1/3)

 The intuition of the noisy channel model is to treat the surface form as an instance of the lexical form that has been passed through a noisy channel
 build a model of the channel so that we can figure out how it modified this "true" word and recover it
 sources of noise
 variation in pronunciation, variation in the realization of phones, acoustic variation due to the channel

5.4 Probabilistic Models (2/3)
 given a string of phones (say [ni]), which word corresponds to this string of phones?
 consider all possible words and choose the one for which P(word | observation) is highest
 (5.1)  ŵ = argmax_{w ∈ W} P(w | O)
 ŵ : our estimate of the correct w
 O : the observation sequence [ni]
 the function argmax_x f(x) returns the x such that f(x) is maximized

5.4 Probabilistic Models (3/3)
 (5.2)
 (5.3)
 substituting (5.2) into (5.1) to get (5.3)
 we can ignore P(O). Why?
 (5.4)
 P(w) is called the Prior probability
 P(O|w) is called the likelihood
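
The sketch below illustrates the decision rule in (5.4): score each candidate word by likelihood times prior and take the argmax. The candidate words and probability values are invented for illustration; they are not figures from the chapter or its training corpus.

```python
# Minimal sketch of the noisy channel decision rule (5.4):
# choose the candidate w that maximizes P(O|w) * P(w).
# All probabilities below are made-up illustrative numbers.

def best_candidate(prior, likelihood):
    """Return argmax_w P(O|w) * P(w) over the candidate words."""
    return max(prior, key=lambda w: likelihood[w] * prior[w])

# Hypothetical candidate corrections for one misspelling.
prior = {"across": 2.4e-4, "actress": 2.7e-5, "acres": 2.4e-5}       # P(w)
likelihood = {"across": 2.8e-6, "actress": 1.2e-4, "acres": 2.1e-4}  # P(O|w)

print(best_candidate(prior, likelihood))  # -> 'acres' with these numbers
```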

5.5 Applying the Bayesian Method to Spelling (1/5)

5.5 Applying the Bayesian Method to Spelling (2/5)

5.5 Applying the Bayesian Method to Spelling (3/5)

 P(acress | across) can be estimated from the number of times that e was substituted for o in some large corpus of errors
 confusion matrix
 a square 26 × 26 table
 number of times one letter was incorrectly used instead of another
 [o,e] in a substitution confusion matrix
 - the count of times e was substituted for o
5.5 Applying the Bayesian Method to Spelling (4/5)

 del[x,y] contains the number of times in the training set that the characters xy in the correct word were typed as x
 ins[x,y] contains the number of times in the training set that the character x in the correct word was typed as xy
 sub[x,y] contains the number of times that x was typed as y
 trans[x,y] contains the number of times that xy was typed as yx
 (a sketch of turning these counts into a channel likelihood follows below)
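
The sketch below shows one plausible way to turn these four tables into a channel likelihood P(typo | correct) for a single-error misspelling, normalizing each count by how often the relevant character or character pair occurs in the training set. The function name, the dict-based tables, and the exact normalizations are assumptions for illustration, not the book's code.

```python
# Sketch: channel likelihood for one single-error typo, built from the
# four confusion tables described above. del_, ins, sub, trans are
# assumed to be dicts keyed by (x, y); chars and bigrams hold counts of
# single characters and character pairs in the training set.
# Names and normalizations are illustrative assumptions.

def channel_likelihood(error_type, x, y,
                       del_, ins, sub, trans, chars, bigrams):
    if error_type == "deletion":        # correct "xy" was typed as "x"
        return del_[(x, y)] / bigrams[x + y]
    if error_type == "insertion":       # correct "x" was typed as "xy"
        return ins[(x, y)] / chars[x]
    if error_type == "substitution":    # correct "x" was typed as "y"
        return sub[(x, y)] / chars[x]
    if error_type == "transposition":   # correct "xy" was typed as "yx"
        return trans[(x, y)] / bigrams[x + y]
    raise ValueError(f"unknown error type: {error_type}")
```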

5.5 Applying the Bayesian Method to Spelling (5/5)

5.6 Minimum Edit Distance (1/6)
 string distance - some metric of how alike two strings
are to each other
 minimum edit distance - the minimum number of
editing operations needed to transform one string into
another
 operations - insertion, deletion, substitution
 For example
 the gap between intention and execution is five operations
 trace, alignment, operation list (Figure 5.4)

5.6 Minimum Edit Distance (2/6)

5.6 Minimum Edit Distance (3/6)
 Levenshtein distance
 assign a particular cost or weight to each of the operations
 simplest weighting factor
 each of the three operations has a cost of 1
 Levenshtein distance between intention and execution is 5
 alternate version - substitutions have a cost of 2 (a substitution can be viewed as one deletion plus one insertion)
 The minimum edit distance is computed by dynamic programming

5.6 Minimum Edit Distance (4/6)
 Dynamic programming
 a large problem can be solved by properly combining the solutions to various subproblems
 minimum edit distance for spelling error correction (a sketch follows below)
 Viterbi and forward algorithms for speech recognition
 CYK and Earley algorithms for parsing
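
Below is a minimal dynamic-programming sketch of minimum edit distance with the simple Levenshtein weighting above (insertion, deletion, and substitution each cost 1).

```python
# Minimum edit distance by dynamic programming.
# dist[i][j] = distance between source[:i] and target[:j].

def min_edit_distance(source, target):
    n, m = len(source), len(target)
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i                                   # i deletions
    for j in range(1, m + 1):
        dist[0][j] = j                                   # j insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub_cost = 0 if source[i - 1] == target[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,             # deletion
                             dist[i][j - 1] + 1,             # insertion
                             dist[i - 1][j - 1] + sub_cost)  # substitution
    return dist[n][m]

print(min_edit_distance("intention", "execution"))  # 5 with these unit costs
```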

5.6 Minimum Edit Distance (5/6)

5.6 Minimum Edit Distance (6/6)

5.8 The Bayesian Method for Pronunciation (1/6)

 The Bayesian algorithm can be used to solve what is often called the pronunciation subproblem in speech recognition
 for example, when [ni] occurs after the word I at the beginning of a sentence
 an investigation of the Switchboard corpus produces a total of 7 words
 the, neat, need, new, knee, to, you (see Chapter 4)
 two components
 candidate generation
 candidate scoring

5.8 The Bayesian Method for Pronunciation (2/6)

 Speech recognizers often use an alternative architecture, trading off extra storage for less on-line computation
 each pronunciation is expanded in advance with all possible variants, which are then pre-stored with their scores
 thus there is no need for candidate generation
 the surface form [ni] is simply stored with the list of words that can generate it (see the sketch below)
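
A minimal sketch of this pre-stored architecture, assuming a simple lookup table keyed by the observed phone sequence; the channel scores attached to each candidate word are invented for illustration.

```python
# Sketch: pre-stored pronunciation table. Each surface pronunciation is
# expanded offline and stored with its candidate words and channel
# scores, so no candidate generation is needed at run time.
# The scores below are illustrative only.

pronunciation_table = {
    ("n", "iy"): [("the", 1.0e-3), ("neat", 5.2e-4), ("need", 3.1e-4),
                  ("new", 2.4e-4), ("knee", 1.1e-4), ("to", 8.0e-5),
                  ("you", 6.0e-5)],
}

def candidates(observed_phones):
    """Look up pre-stored (word, score) pairs for a surface form."""
    return pronunciation_table.get(tuple(observed_phones), [])

print(candidates(["n", "iy"]))
```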

5.8 The Bayesian Method for Pronunciation (3/6)


 the scoring step mirrors (5.4): ŵ = argmax_w P(y | w) P(w)
 y represents the sequence of phones
 w represents the candidate word
 it turns out that confusion matrices don't do as well for pronunciation
 the changes in pronunciation between a lexical and a surface form are much greater
 probabilistic models of pronunciation variation include a lot more factors than a simple confusion matrix can capture
 One simple way to generate pronunciation likelihoods is via probabilistic rules

5.8 The Bayesian Method for Pronunciation (4/6)

 example rule: a word-initial [ð] becomes [n] if the preceding word ended in [n] or sometimes [m]
 the rule probability is estimated as ncount / envcount (a small sketch follows below)
 ncount : number of times lexical [ð] is realized word-initially by surface [n] when the previous word ends in a nasal
 envcount : total number of times word-initial lexical [ð] occurs when the previous word ends in a nasal
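
A minimal sketch of this relative-frequency estimate; the counts are invented for illustration, not Switchboard figures.

```python
# Sketch: estimate the rule probability as a relative frequency,
# p = ncount / envcount. Counts below are made up for illustration.

ncount = 55     # lexical [dh] realized as [n] word-initially after a nasal
envcount = 130  # all word-initial lexical [dh] tokens after a nasal
p_rule = ncount / envcount
print(f"P([n] | word-initial [dh], previous word ends in a nasal) = {p_rule:.2f}")
```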

5.8 The Bayesian Method for Pronunciation (5/6)

5.8 The Bayesian Method for Pronunciation (6/6)

 Decision Tree Models of Pronunciation Variation

5.9 Weighted Automata (1/12)
 Weighted Automata
 a simple augmentation of the finite automaton
 each arc is associated with a probability
 the probabilities on all the arcs leaving a node must sum to 1
 (a small sketch of a weighted pronunciation automaton follows below)
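
Below is a minimal sketch of a weighted automaton for one word's pronunciation variants and of summing the probability of every path that generates an observed phone string (a tiny forward-style computation). The phone labels, states, and arc probabilities are invented for illustration; they are not the chapter's figures.

```python
# Sketch of a weighted (probabilistic) automaton for one word's
# pronunciations. arcs[state] = list of (next_state, phone, probability);
# the probabilities on the arcs leaving a state sum to 1.
# All states, phones, and probabilities are illustrative.

arcs = {
    0: [(1, "ax", 0.68), (1, "ix", 0.20), (1, "ah", 0.12)],  # vowel variants
    1: [(2, "b", 1.0)],
    2: [(3, "aw", 1.0)],
    3: [(4, "t", 0.88), (4, "dx", 0.12)],                     # final consonant
    4: [],                                                    # final state
}
FINAL = 4

def string_probability(observed):
    """Sum the probability of every path that generates the phone string."""
    alpha = {0: 1.0}                      # state -> prob of reaching it
    for phone in observed:
        nxt = {}
        for state, p in alpha.items():
            for dest, label, prob in arcs[state]:
                if label == phone:
                    nxt[dest] = nxt.get(dest, 0.0) + p * prob
        alpha = nxt
    return alpha.get(FINAL, 0.0)

print(string_probability(["ax", "b", "aw", "t"]))   # 0.68 * 0.88
print(string_probability(["ix", "b", "aw", "dx"]))  # 0.20 * 0.12
```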

5.9 Weighted Automata (2/12)

5.9 Weighted Automata (3/12)

5.9 Weighted Automata (4/12)

5.9 Weighted Automata (5/12)

5.9 Weighted Automata (6/12)

5.9 Weighted Automata (7/12)

5.9 Weighted Automata (8/12)

5.9 Weighted Automata (9/12)

5.9 Weighted Automata (10/12)

5.9 Weighted Automata (11/12)

5.9 Weighted Automata (12/12)
