Spring 2011
• Note
▫ This outline is produced by TAs
▫ There might be missing contents
▫ Ideas and meanings are more important than
formulas
• Lecture 1
▫ General concepts on speech and recognition
Lecture 2
• Speech is approximately stationary over 5-100 ms
▫ non-stationary over 200 ms
• Spectrogram shows speech intensity in different frequency bands
over time
• Diphthongs
▫ Can’t keep steady
▫ Start from one vowel move towards another
▫ /ay/ (buy) /aw/ (down) /ey/ (bait) /oy/ (boy)
/o/ (boat) /ju/ (you)
• Semivowels
▫ Acoustic characteristics strongly influenced by context
▫ /w/ /l/ /r/ /y/
• Nasal consonants
▫ Location of constriction determines which nasal consonant
▫ /m/ /n/ /ng/
• Unvoiced Fricatives
▫ Random noise
▫ Can’t see the fundamental frequency
▫ /f/ /θ/ /s/ /sh/ /h/
• Voiced Fricatives
▫ Can see the fundamental frequency
▫ /v/ /th/ /z/ /zh/
• Voiced and Unvoiced Stops
▫ Clear cut
▫ Voiced: /b/ /d/ /g/
▫ Unvoiced: /p/ /t/ /k/
Lecture 3
• Window size
▫ Narrowband spectral analysis
Window size L is large
Good resolution of fundamental frequency and its harmonics
For pitch detection
▫ Wideband spectral analysis
Window size L is small
Only formants
Mel scale:
• Approximately linear below 1 kHz
• Logarithmic above 1 kHz
• Based on human hearing perception
• Maps physical frequency (Hz) to perceived pitch (mel)
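The outline gives no formula for the mel scale. The sketch below uses one widely used analytic approximation, mel = 2595 log10(1 + f/700), which is an assumption here and not necessarily the curve used in the lecture. It shows the behaviour described above: roughly linear below 1 kHz and compressive above. The function names are illustrative only.

```python
import math

def hz_to_mel(f_hz):
    """Map frequency in Hz to mel (common approximation, assumed here)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

if __name__ == "__main__":
    # Equal steps in Hz cover fewer and fewer mels as frequency grows.
    for f in (250, 500, 1000, 2000, 4000, 8000):
        print(f"{f:5d} Hz -> {hz_to_mel(f):7.1f} mel")
```

With this approximation 1000 Hz maps to roughly 1000 mel, which is the usual anchoring point of the scale.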
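Figure 3.27 Linear prediction model of speech (excitation with pitch period, gain G, all-pole vocal tract filter 1/A(z))
• All-pole model in the z-transform domain:
$$H(z) = \frac{S(z)}{G\,U(z)} = \frac{1}{1 - \sum_{i=1}^{p} a_i z^{-i}} = \frac{1}{A(z)}$$
• Approximate vocal tract model
• H(z) is the vocal tract response
• p is the order of LPC (number of past samples used in the prediction)
• a_k are the LPC coefficients
• Equivalent form:
$$S(z) = \sum_{i=1}^{p} a_i z^{-i} S(z) + G\,U(z)$$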
Why is LPC a good model for speech recognition?
• Frequency domain (figure panels for p = 4, 8, 12, 16, 20)
Applications of LPC Analysis
• II: pitch detection
▫ Pitch is a subjective measurement associated with the
fundamental frequency of the vocal cord source
▫ e(n) is the source Gu(n) if the LPC model is good
▫ Prediction error:
$$e(n) = s(n) - \sum_{k=1}^{p} a_k s(n-k) = G\,u(n)$$
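To make the pitch-detection idea concrete, here is a minimal sketch. It assumes the autocorrelation method with the Levinson-Durbin recursion for estimating the a_k (the outline does not say which estimation method the lecture used) and a synthetic second-order all-pole signal driven by an impulse train. The helper names `synthesize`, `autocorr`, `levinson_durbin` and `prediction_error` are hypothetical.

```python
def synthesize(a_true, period, n_samples, gain=1.0):
    """s(n) = sum_k a_k s(n-k) + G u(n), with u(n) an impulse train."""
    s = []
    for n in range(n_samples):
        u = 1.0 if n % period == 0 else 0.0
        past = sum(a_true[k - 1] * s[n - k]
                   for k in range(1, len(a_true) + 1) if n - k >= 0)
        s.append(past + gain * u)
    return s

def autocorr(s, max_lag):
    """Autocorrelation r(0..max_lag) of the whole signal."""
    n = len(s)
    return [sum(s[i] * s[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]

def levinson_durbin(r, p):
    """Solve the LPC normal equations; returns ([a_1..a_p], residual energy)."""
    a = [0.0] * (p + 1)          # a[0] is unused
    err = r[0]
    for i in range(1, p + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err

def prediction_error(s, a):
    """e(n) = s(n) - sum_k a_k s(n-k), taking s(n) = 0 for n < 0."""
    e = []
    for n in range(len(s)):
        pred = sum(a[k - 1] * s[n - k]
                   for k in range(1, len(a) + 1) if n - k >= 0)
        e.append(s[n] - pred)
    return e

if __name__ == "__main__":
    s = synthesize([1.3, -0.6], period=50, n_samples=200)
    a, _ = levinson_durbin(autocorr(s, 2), 2)
    e = prediction_error(s, a)
    peaks = [n for n, v in enumerate(e) if abs(v) > 0.5]
    print("estimated a_k:", [round(x, 3) for x in a])
    print("error peaks at n =", peaks)  # spacing of the peaks = pitch period
```

When the model fits, the error is close to zero except at the excitation impulses, so the spacing between the error peaks gives the pitch period.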
2. Log: multiplication → addition (frequency domain: spectrum)
3. FFT (quefrency domain: cepstrum)
4. Low-pass filter: liftering (quefrency domain)
5. DTFT (frequency domain: spectrum)
▫ Mel frequency cepstral coefficients lead to another
warped frequency distortion measure.
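▫ Mel frequency cepstral coefficients:
$$\tilde{c}_n = \sum_{k=1}^{K} (\log \tilde{S}_k) \cos\left[ n \left( k - \tfrac{1}{2} \right) \frac{\pi}{K} \right], \quad n = 1, 2, \ldots, L$$
where \tilde{S}_k, k = 1, ..., K, are the power coefficients of S(ω) in the K mel-scale bands and L is
the truncated number of cepstral coefficients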
▫ Mel-frequency cepstral distance for S(ω) and S'(ω):
$$d_{\tilde{c}}^{2}(L) = \sum_{n=1}^{L} (\tilde{c}_n - \tilde{c}'_n)^2$$
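The two formulas above translate almost directly into code. The sketch below assumes the band powers are already available as a plain list (the mel filter bank itself is not shown), and the function names are illustrative only.

```python
import math

def mel_cepstrum(band_powers, num_coeffs):
    """c_n = sum_k log(S_k) cos[n (k - 1/2) pi / K], for n = 1..num_coeffs."""
    K = len(band_powers)
    log_s = [math.log(p) for p in band_powers]
    return [
        sum(log_s[k - 1] * math.cos(n * (k - 0.5) * math.pi / K)
            for k in range(1, K + 1))
        for n in range(1, num_coeffs + 1)
    ]

def mel_cepstral_distance(c, c_prime):
    """Squared Euclidean distance between two truncated cepstral vectors."""
    return sum((x - y) ** 2 for x, y in zip(c, c_prime))

if __name__ == "__main__":
    # Made-up band powers, for illustration only.
    s1 = [4.0, 9.0, 6.0, 3.0, 2.0, 1.5, 1.0, 0.5]
    s2 = [2.0 * p for p in s1]          # same spectral shape, twice the power
    s3 = [1.0, 1.5, 3.0, 6.0, 4.0, 2.0, 1.0, 0.5]
    c1, c2, c3 = (mel_cepstrum(s, 6) for s in (s1, s2, s3))
    print(mel_cepstral_distance(c1, c2))  # ~0: an overall gain change is ignored
    print(mel_cepstral_distance(c1, c3))  # > 0: the spectral shape differs
```

Because the sum starts at n = 1, a constant added to every log S_k (an overall gain change) cancels out, so the distance compares spectral shape only.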
Exercise 1
• Answer:
▫ It will not work when we use a very long frame.
▫ The speech signal is stationary only over a
sufficiently short period of time (5-100 ms).
▫ The signal characteristics change within a
very long window.
Exercise 2
A speech signal is sampled at a rate of 20,000 samples/second (Fs =
20 kHz). A 20-msec window is used for short-time spectral analysis,
and the window is moved by 10 msec in consecutive analysis frames.
Assume that we use a 512-point FFT to compute DFTs.
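The questions of this exercise are not reproduced in the outline, so the sketch below only works out the quantities that follow directly from the numbers given. Which of them the exercise actually asks for is an assumption, and the variable names are illustrative.

```python
fs = 20_000        # sampling rate in samples/second
window_ms = 20     # analysis window length in ms
shift_ms = 10      # window shift in ms
n_fft = 512        # FFT size

samples_per_window = fs * window_ms // 1000    # 400 samples in each frame
samples_per_shift = fs * shift_ms // 1000      # 200 samples between frame starts
frames_per_second = 1000 // shift_ms           # 100 analysis frames per second
bin_spacing_hz = fs / n_fft                    # 39.0625 Hz between DFT bins
zero_padding = n_fft - samples_per_window      # 112 zeros appended to each frame

if __name__ == "__main__":
    print(samples_per_window, samples_per_shift, frames_per_second)
    print(bin_spacing_hz, zero_padding)
```

Consecutive frames overlap by 200 samples (50%), and the 400-sample frame has to be zero-padded to fit the 512-point FFT.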
Local distances d(ix, iy):

iy
 6    3  2  3  2  2  2
 5    3  2  3  1  1  1
 4    3  2  2  1  2  2
 3    2  1  2  2  1  3
 2    1  1  1  1  2  3
 1    1  1  3  3  3  3
      1  2  3  4  5  6   ix

Assume local path constraints and slope weights of the form: point (ix, iy) is reached
from (ix-1, iy), (ix-1, iy-1) or (ix-1, iy-2), each with slope weight 1.

Accumulated distortion at each point: a + b*c
a: minimum partial accumulated distortion along a path connecting the
starting point and the previous point of this point
b: slope weight
c: local distance at this point
The minimum partial accumulated distortion along a path connecting (1,1) and
(2,1) is: D(2,1) = min{1+1*1} = 2, reached from (1,1)
The minimum partial accumulated distortion along a path connecting (1,1) and
(2,2) is: D(2,2) = min{1+1*1} = 2, reached from (1,1)
The minimum partial accumulated distortion along a path connecting (1,1) and
(2,3) is: D(2,3) = min{1+1*1} = 2, reached from (1,1)
The minimum partial accumulated distortion along a path connecting (1,1) and
(3,1) is: D(3,1) = min{2+1*3} = 5, reached from (2,1)
The minimum partial accumulated distortion along a path connecting (1,1) and
(3,2) is: D(3,2) = min{2+1*1, 2+1*1} = 3, reached from (2,1) or (2,2)
The minimum partial accumulated distortion along a path connecting (1,1) and
(3,3) is: D(3,3) = min{2+1*2, 2+1*2, 2+1*2} = 4, reached from (2,1), (2,2) or (2,3)
The minimum partial accumulated distortion along a path connecting (1,1) and
(3,4) is: D(3,4) = min{2+1*2, 2+1*2} = 4, reached from (2,2) or (2,3)
The minimum partial accumulated distortion along a path connecting (1,1) and
(3,5) is: D(3,5) = min{2+1*3} = 5, reached from (2,3)
The minimum partial accumulated distortion along a path connecting (1,1) and
(4,1) is: D(4,1) = min{5+1*3} = 8, reached from (3,1)
The minimum partial accumulated distortion along a path connecting (1,1) and
(4,2) is: D(4,2) = min{5+1*1, 3+1*1} = 4, reached from (3,2)
The minimum partial accumulated distortion along a path connecting (1,1) and
(4,3) is: D(4,3) = min{5+1*2, 3+1*2, 4+1*2} = 5, reached from (3,2)
The minimum partial accumulated distortion along a path connecting (1,1) and
(4,4) is: D(4,4) = min{3+1*1, 4+1*1, 4+1*1} = 4, reached from (3,2)
The minimum partial accumulated distortion along a path connecting (1,1) and
(4,5) is: D(4,5) = min{4+1*1, 4+1*1, 5+1*1} = 5, reached from (3,3) or (3,4)
The minimum partial accumulated distortion along a path connecting (1,1) and
(4,6) is: D(4,6) = min{4+1*2, 5+1*2} = 6, reached from (3,4)
The remaining points follow in the same way:
D(5,1) = min{8+1*3} = 11, reached from (4,1)
D(5,2) = min{8+1*2, 4+1*2} = 6, reached from (4,2)
D(5,3) = min{8+1*1, 4+1*1, 5+1*1} = 5, reached from (4,2)
D(5,4) = min{4+1*2, 5+1*2, 4+1*2} = 6, reached from (4,2) or (4,4)
D(5,5) = min{5+1*1, 4+1*1, 5+1*1} = 5, reached from (4,4)
D(5,6) = min{4+1*2, 5+1*2, 6+1*2} = 6, reached from (4,4)
D(6,6) = min{6+1*2, 5+1*2, 6+1*2} = 7, reached from (5,5)

Accumulated distortion D(ix, iy) (cells marked - are not computed):

iy
 6    -   -   -   6   6   7
 5    -   -   5   5   5   -
 4    -   -   4   4   6   -
 3    -   2   4   5   5   -
 2    -   2   3   4   6   -
 1    1   2   5   8  11   -
      1   2   3   4   5   6   ix

Backtracking from the end point: (6,6) ← (5,5) ← (4,4) ← (3,2) ← (2,1) or (2,2) ← (1,1)
• Total accumulated path distance = 7
• Average path distance = 7/6
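The whole worked example can be checked with a few lines of code. The sketch below is a minimal dynamic-programming implementation under the assumptions used above: start at (1,1), end at (6,6), predecessors (ix-1, iy), (ix-1, iy-1) and (ix-1, iy-2), and slope weight 1 on every step. The names `LOCAL`, `dtw` and `backtrack` are illustrative, and unlike the slides the code also fills in the unused cells of the last column.

```python
# Local distances from the example; LOCAL[iy - 1][ix - 1] = d(ix, iy).
LOCAL = [
    [1, 1, 3, 3, 3, 3],  # iy = 1
    [1, 1, 1, 1, 2, 3],  # iy = 2
    [2, 1, 2, 2, 1, 3],  # iy = 3
    [3, 2, 2, 1, 2, 2],  # iy = 4
    [3, 2, 3, 1, 1, 1],  # iy = 5
    [3, 2, 3, 2, 2, 2],  # iy = 6
]

def dtw(local, slope_weight=1):
    """Return accumulated distortions D[(ix, iy)] and back-pointers."""
    n_y, n_x = len(local), len(local[0])
    D = {(1, 1): local[0][0]}     # the path is forced to start at (1, 1)
    back = {}
    for ix in range(2, n_x + 1):
        for iy in range(1, n_y + 1):
            best, best_prev = None, None
            for step in (0, 1, 2):            # allowed moves in iy
                prev = (ix - 1, iy - step)
                if prev not in D:
                    continue                  # predecessor is unreachable
                cand = D[prev] + slope_weight * local[iy - 1][ix - 1]
                if best is None or cand < best:
                    best, best_prev = cand, prev
            if best_prev is not None:
                D[(ix, iy)] = best
                back[(ix, iy)] = best_prev
    return D, back

def backtrack(back, end):
    """Follow back-pointers from the end point to the start point."""
    path = [end]
    while path[-1] in back:
        path.append(back[path[-1]])
    return path[::-1]

if __name__ == "__main__":
    D, back = dtw(LOCAL)
    end = (6, 6)
    print("total accumulated distance:", D[end])   # 7, as in the example
    print("average path distance:", D[end] / 6)
    print("one optimal path:", backtrack(back, end))
```

Ties are broken in favour of the first predecessor tried, so the code reports the path through (2,2); the path through (2,1) has the same total distance.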