Review On ELEC333: Spring 2011 Nico & Wilber

Outline of this course
• Note
▫ This outline was produced by the TAs
▫ Some content may be missing
▫ Ideas and meanings are more important than formulas
• Lecture 1
▫ General concepts on speech and recognition
Lecture 2
• Speech is approximately periodic over intervals of 5-100 ms
▫ Non-periodic over intervals longer than 200 ms
• A spectrogram shows speech intensity in different frequency bands over time
• The vocal cords' frequency of vibration
▫ is the fundamental frequency of a sound
• The harmonics of vocal cord vibrations
▫ produce the regularly spaced vertical spikes in a spectrum
• The resonances of the vocal tract
▫ produce the formants: peaks in the spectrum
• Vocal tract configurations
▫ determine the formant positions in the spectrum, which leads to different vowel sounds
• Envelope = formants (the spectral envelope traces the formant peaks)
Lecture 2
• Vowel
▫ Different formant positions give rise to different vowels
▫ /a/ /i/ /u/ /e/ /o/
• Diphthongs
▫ Cannot be held steady
▫ Start from one vowel and move towards another
▫ /ay/ (buy) /aw/ (down) /ey/ (bait) /oy/ (boy) /o/ (boat) /ju/ (you)
• Semivowels
▫ Acoustic characteristics strongly influenced by context
▫ /w/ /l/ /r/ /y/
Lecture 2
• Nasal consonants
▫ Location of constriction determines which nasal consonant
▫ /m/ /n/ /ng/
• Unvoiced Fricatives
▫ Noise-like (random) source
▫ The fundamental frequency cannot be seen
▫ /f/ /θ/ /s/ /sh/ /h/
• Voiced Fricatives
▫ The fundamental frequency can be seen
▫ /v/ /th/ /z/ /zh/
• Voiced and Unvoiced Stops
▫ Clear cut
▫ Voiced: /b/ /d/ /g/
▫ Unvoiced: /p/ /t/ /k/
Lecture 3
• Window size
▫ Narrowband spectral analysis
  - Window size L is large
  - Good resolution of the fundamental frequency and its harmonics
  - Used for pitch detection
▫ Wideband spectral analysis
  - Window size L is small
  - Frequency resolution is poor
  - No harmonics can be seen
  - Used for formant detection
[Figure: narrowband spectrum (large window size) showing the fundamental frequency and its harmonics; wideband spectrum (small window size) showing only the formants]
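As a rough illustration of the window-size trade-off, the sketch below uses the rule of thumb that a window of duration T resolves frequency detail of about 1/T Hz (the exact factor depends on the window shape). The sampling rate, the two window lengths, and the 100 Hz pitch are illustrative assumptions, not values from the course.

```python
def approx_resolution_hz(window_samples, sample_rate_hz):
    """Rule-of-thumb frequency resolution: about 1 / (window duration)."""
    return sample_rate_hz / window_samples


def resolves_harmonics(window_samples, sample_rate_hz, pitch_hz):
    """Harmonics are spaced pitch_hz apart, so they are only separated
    when the resolution is finer than that spacing."""
    return approx_resolution_hz(window_samples, sample_rate_hz) < pitch_hz


FS = 16000     # assumed sampling rate (Hz)
PITCH = 100.0  # assumed fundamental frequency (Hz)

# Narrowband: 640 samples = 40 ms window, about 25 Hz resolution.
narrowband = resolves_harmonics(640, FS, PITCH)
# Wideband: 64 samples = 4 ms window, about 250 Hz resolution.
wideband = resolves_harmonics(64, FS, PITCH)

print(narrowband, wideband)
```

With these assumed numbers the long window separates the harmonics (useful for pitch), while the short window smears them and leaves only the broad formant envelope.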
Lecture 3
Mel scale:
• Linear below 1 kHz
• Logarithmic above 1 kHz
• Based on human hearing perception
• Related to frequency
• Emphasizes different frequency components according to speech perception
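The slides do not give a formula for the mel scale. A minimal sketch, assuming the commonly used mapping mel = 2595 log10(1 + f/700), which is roughly linear below 1 kHz and logarithmic above it; the function names are hypothetical.

```python
import math


def hz_to_mel(f_hz):
    """Assumed mel mapping (a common textbook formula, not from the slides)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


# The same 1 kHz step covers far fewer mels at high frequency than at low frequency.
low_step = hz_to_mel(1000.0) - hz_to_mel(0.0)
high_step = hz_to_mel(8000.0) - hz_to_mel(7000.0)
print(low_step, high_step)
```

Under this mapping 1 kHz lands very close to 1000 mel, and equal steps in Hz are compressed more and more as frequency increases.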

Linear Predictive Coding (LPC)
[Figure 3.27 Linear prediction model of speech: the source u(n), scaled by the gain G, drives the vocal tract block A(z) to produce the speech sound s(n)]
[Figure 3.28 Speech synthesis model based on LPC model: a voiced/unvoiced switch selects between an impulse train generator (set by the pitch period) and a random noise generator; the selected source u(n), scaled by the gain G of the source, drives the time-varying digital filter H(z) (set by the vocal tract parameters) to produce s(n)]
Linear Predictive Coding
We approximate a speech signal at time n, s(n) (prediction), as a linear combination of the past p samples (data):
$$ s(n) \approx \sum_{i=1}^{p} a_i \, s(n-i) + G\,u(n) $$
z-transform:
$$ S(z) = \sum_{i=1}^{p} a_i z^{-i} S(z) + G\,U(z) $$
All-pole transfer function:
$$ H(z) = \frac{S(z)}{G\,U(z)} = \frac{1}{1 - \sum_{i=1}^{p} a_i z^{-i}} = \frac{1}{A(z)} $$
• Approximate vocal tract model
• H(z) is the vocal tract response
• p is the order of LPC, the number of past samples used
• $a_i$ ($i = 1, \dots, p$) are the LPC coefficients

Why is LPC a good model for speech recognition?
• The LPC model is all-pole (i.e. it models spectral peaks)
▫ This is similar to the vocal tract transfer function during voiced speech production
▫ Good enough for unvoiced speech
• Good source-vocal tract separation
• Simpler computation
Applications of LPC Analysis
• I: Formant Position Detection
▫ LPC model: $H(e^{j\omega}) = G \,/\, \left(1 - \sum_{i=1}^{p} a_i e^{-j\omega i}\right)$
▫ As p increases, more peaks are shown in the LPC spectrum
▫ Larger p means a longer window
  - Capturing vocal cord pitch information
▫ In general, 8 ≤ p ≤ 10
[Figure: time-domain waveform and frequency-domain LPC spectra for p = 4, 8, 12, 16, 20]
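A minimal sketch of formant position detection from an LPC spectrum: evaluate |H(e^{jω})| on a frequency grid and pick the local maxima. The two-pole resonator, its 500 Hz centre frequency, the 8 kHz sampling rate, and the function names are assumptions made for illustration.

```python
import cmath
import math


def lpc_magnitude(a, omega, gain=1.0):
    """|H(e^{j omega})| = G / |1 - sum_i a_i e^{-j omega i}|."""
    denom = 1.0 - sum(a_i * cmath.exp(-1j * omega * (i + 1))
                      for i, a_i in enumerate(a))
    return gain / abs(denom)


def spectral_peaks_hz(a, sample_rate_hz, n_points=512):
    """Frequencies (Hz) of the local maxima of the LPC spectrum on a uniform grid."""
    omegas = [math.pi * k / n_points for k in range(n_points + 1)]
    mags = [lpc_magnitude(a, w) for w in omegas]
    peaks = []
    for k in range(1, n_points):
        if mags[k] > mags[k - 1] and mags[k] > mags[k + 1]:
            peaks.append(omegas[k] * sample_rate_hz / (2.0 * math.pi))
    return peaks


FS = 8000                  # assumed sampling rate (Hz)
RADIUS, CENTER = 0.95, 500.0   # assumed pole radius and resonance frequency
theta = 2.0 * math.pi * CENTER / FS
# A pole pair at RADIUS * e^{+/- j theta} gives these two LPC coefficients.
a = [2.0 * RADIUS * math.cos(theta), -RADIUS ** 2]

peaks = spectral_peaks_hz(a, FS)
print(peaks)
```

With a single pole pair the spectrum has one peak close to the resonance frequency; a higher order p adds more pole pairs and hence more peaks, as noted above.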
Applications of LPC Analysis
• II: Pitch detection
▫ Pitch is a subjective measurement associated with the fundamental frequency of the vocal cord source
▫ e(n) is the source Gu(n) if the LPC model is good
Prediction error:
$$ e(n) = s(n) - \sum_{k=1}^{p} a_k \, s(n-k) = G\,u(n) $$
▫ e(n) is large at the onset of each pitch period for voiced speech
▫ We can find the pitch period from the periodic onsets of large e(n)
[Figure: pitch period]
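The pitch detection idea can be sketched on synthetic data. The example assumes a perfect LPC model (the same coefficients are used for synthesis and for the prediction error), an impulse train with a period of 80 samples as the voiced source, and a simple threshold at half of the largest error. All of these are illustrative assumptions.

```python
def synthesize(a, excitation):
    """All-pole synthesis: s(n) = sum_k a_k s(n-k) + u(n)."""
    s = []
    for n, u in enumerate(excitation):
        past = sum(a[k] * s[n - 1 - k] for k in range(len(a)) if n - 1 - k >= 0)
        s.append(past + u)
    return s


def prediction_error(s, a):
    """e(n) = s(n) - sum_k a_k s(n-k)."""
    e = []
    for n in range(len(s)):
        pred = sum(a[k] * s[n - 1 - k] for k in range(len(a)) if n - 1 - k >= 0)
        e.append(s[n] - pred)
    return e


def pitch_period(e, threshold_ratio=0.5):
    """Most common spacing (in samples) between samples with large |e(n)|."""
    peak = max(abs(x) for x in e)
    onsets = [n for n, x in enumerate(e) if abs(x) > threshold_ratio * peak]
    gaps = [later - earlier for earlier, later in zip(onsets, onsets[1:])]
    return max(set(gaps), key=gaps.count) if gaps else None


A = [1.2, -0.5]   # assumed LPC coefficients
PERIOD = 80       # assumed pitch period in samples
excitation = [1.0 if n % PERIOD == 0 else 0.0 for n in range(400)]
speech = synthesize(A, excitation)
period = pitch_period(prediction_error(speech, A))
print(period)
```

On real speech the coefficients would be estimated per frame and the error would be noisier, so the threshold would need more care than in this sketch.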
Lecture 4
• A better representation model -> Cepstrum
• LPC parameters are poor representations of speech information:
▫ The speech spectrum |S(ω)| consists of two parts
▫ |E(ω)|, the quickly varying part (due to vocal cord vibrations)
▫ |Θ(ω)|, the slowly varying part (due to vocal tract responses)
▫ We want to separate these two parts
▫ But |E(ω)| and |Θ(ω)| are combined into |S(ω)| in a nonlinear fashion (by multiplication) and cannot be separated easily
Steps to compute and use the cepstrum:
1. FFT → S(ω) (frequency domain: spectrum)
2. Log: multiplication → addition
3. FFT (quefrency domain: cepstrum)
4. Low-pass filter: liftering
5. DTFT (back to the frequency domain: spectrum)
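Steps 1 to 4 can be sketched with a naive DFT on a short synthetic signal. The example assumes an "excitation" made of an impulse plus an echo 8 samples later (a quickly varying spectrum), a decaying exponential as the "vocal tract" envelope (a slowly varying spectrum), circular convolution, and a lifter cutoff of 4. Step 3 is written as an inverse DFT, which for a real, even log-magnitude spectrum differs from the forward transform only by a scale factor.

```python
import cmath
import math


def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def real_cepstrum(x):
    """Steps 1-3: DFT, log magnitude, inverse DFT."""
    n = len(x)
    log_mag = [math.log(abs(value)) for value in dft(x)]
    return [sum(log_mag[k] * cmath.exp(2j * math.pi * k * q / n)
                for k in range(n)).real / n
            for q in range(n)]


def circular_convolve(x, h):
    n = len(x)
    return [sum(x[m] * h[(t - m) % n] for m in range(n)) for t in range(n)]


N, DELAY, CUTOFF = 64, 8, 4          # assumed sizes
excitation = [0.0] * N
excitation[0], excitation[DELAY] = 1.0, 0.5
envelope = [0.6 ** t for t in range(N)]
signal = circular_convolve(excitation, envelope)

cep = real_cepstrum(signal)
# Step 4: low-pass liftering keeps only the low-quefrency (slowly varying) part.
liftered = [c if (q < CUTOFF or q > N - CUTOFF) else 0.0 for q, c in enumerate(cep)]
# The quickly varying part shows up as a peak at high quefrency.
peak_quefrency = max(range(CUTOFF, N // 2 + 1), key=lambda q: cep[q])
print(peak_quefrency)
```

Because the log turns the product of the two spectra into a sum, the cepstrum of the signal is the sum of the two individual cepstra, and the lifter can remove the high-quefrency part.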
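▫ Mel frequency cepstral coefficients lead to another warped frequency distortion measure.
▫ Mel frequency cepstral coefficients:
$$ \tilde{c}_n = \sum_{k=1}^{K} (\log \tilde{S}_k) \cos\left[ n \left(k - \tfrac{1}{2}\right) \frac{\pi}{K} \right] $$
where $\tilde{S}_k$, $k = 1, \dots, K$, are the power coefficients of S(ω), K is the number of these coefficients, and the cepstrum is truncated to L coefficients in the distance below
▫ Mel-frequency cepstral distance for S(ω) and S'(ω):
$$ d_{\tilde{c}}^{2}(L) = \sum_{n=1}^{L} \left(\tilde{c}_n - \tilde{c}'_n\right)^2 $$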
Exercise 1
• Q: When we compute the MFCC, the frame length is usually very short (20 ms). If we use a very long frame (200 ms), will it work? Please explain.
Exercise 1 (Cont.)
• Answer:
▫ It will not work with a very long frame.
▫ A speech signal is stationary only over a sufficiently short period of time (5-100 ms).
▫ The signal characteristics change within a very long window.
Exercise 2
A speech signal is sampled at a rate of 20,000 samples/second (Fs = 20 kHz). A 20-msec window is used for short-time spectral analysis, and the window is moved by 10 msec in consecutive analysis frames. Assume that we use a 512-point FFT to compute DFTs.
1. How many speech samples are used in each segment?
2. What is the frame rate of the short-time spectral analysis?
3. What is the resulting frequency resolution (spacing in Hz) between adjacent spectral samples?
Answer
1. How many speech samples are used in each segment?
20 ms of speech at a rate of 20,000 samples/sec gives:
20×10^-3 sec × 20,000 samples/sec = 400 samples
Each segment of speech is 400 samples in duration.
2. What is the frame rate of the short-time spectral analysis?
Since the shift between consecutive speech frames is 10 msec, the frame rate is:
frame rate = 1 / frame shift = 1/(10×10^-3 sec) = 100/sec
That is, 100 spectral analyses are performed per second of speech.
3. What is the resulting frequency resolution (spacing in Hz) between adjacent spectral samples?
The frequency resolution = sampling rate / DFT size = 20,000 Hz / 512 ≈ 39 Hz
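The three answers can be checked with a few lines of arithmetic. The variable names are illustrative; the numbers are the ones given in the exercise.

```python
FS = 20000          # sampling rate (samples/sec)
WINDOW_S = 0.020    # window length (sec)
SHIFT_S = 0.010     # frame shift (sec)
NFFT = 512          # DFT size (points)

samples_per_frame = round(FS * WINDOW_S)   # samples in each segment
frame_rate = 1.0 / SHIFT_S                 # frames per second
resolution_hz = FS / NFFT                  # spacing between adjacent DFT bins

print(samples_per_frame, frame_rate, resolution_hz)
```

Note that the 400-sample frame is zero-padded to 512 points, so the bin spacing is set by the DFT size rather than by the window length.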
Exercise 3
• Consider the problem of finding the best path through the 6×6 grid of local distances shown below. Assume that the path must begin at (1,1) and end at (6,6).
Local distances (rows iy = 6 down to 1, columns ix = 1 to 6):
iy = 6:  3 2 3 2 2 2
iy = 5:  3 2 3 1 1 1
iy = 4:  3 2 2 1 2 2
iy = 3:  2 1 2 2 1 3
iy = 2:  1 1 1 1 2 3
iy = 1:  1 1 3 3 3 3
   ix =  1 2 3 4 5 6
• Assume local path constraints and slope weights of the following form: point (ix, iy) can be reached from (ix-1, iy), (ix-1, iy-1) or (ix-1, iy-2), each with slope weight 1.
Each point (ix, iy) is updated with a + b*c, where
a: minimum partial accumulated distortion along a path connecting the starting point and the previous point of this point
b: slope weight
c: local distance of this point
The minimum partial accumulated distortion D(ix, iy) along a path connecting (1,1) and (ix, iy) is computed column by column. Each entry is written as D | predecessor(s) giving the minimum. The starting value is D(1,1) = 1.
Column ix = 2:
D(2,1) = min{1+1*1} = 2 | (1,1)
D(2,2) = min{1+1*1} = 2 | (1,1)
D(2,3) = min{1+1*1} = 2 | (1,1)
Column ix = 3:
D(3,1) = min{2+1*3} = 5 | (2,1)
D(3,2) = min{2+1*1, 2+1*1} = 3 | (2,1), (2,2)
D(3,3) = min{2+1*2, 2+1*2, 2+1*2} = 4 | (2,1), (2,2), (2,3)
D(3,4) = min{2+1*2, 2+1*2} = 4 | (2,2), (2,3)
D(3,5) = min{2+1*3} = 5 | (2,3)
Column ix = 4:
D(4,1) = min{5+1*3} = 8 | (3,1)
D(4,2) = min{5+1*1, 3+1*1} = 4 | (3,2)
D(4,3) = min{5+1*2, 3+1*2, 4+1*2} = 5 | (3,2)
D(4,4) = min{3+1*1, 4+1*1, 4+1*1} = 4 | (3,2)
D(4,5) = min{4+1*1, 4+1*1, 5+1*1} = 5 | (3,3), (3,4)
D(4,6) = min{4+1*2, 5+1*2} = 6 | (3,4)
Column ix = 5:
D(5,1) = 11 | (4,1)
D(5,2) = 6 | (4,2)
D(5,3) = 5 | (4,2)
D(5,4) = 6 | (4,2), (4,4)
D(5,5) = 5 | (4,4)
D(5,6) = 6 | (4,4)
Column ix = 6:
D(6,6) = 7 | (5,5)
• Total accumulated path distance = 7
• Average path distance = 7/6
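The whole exercise can be reproduced with a short dynamic-programming sketch. It assumes the local path constraints inferred from the worked steps (predecessors (ix-1, iy), (ix-1, iy-1), (ix-1, iy-2), slope weight 1), breaks ties by taking the predecessor with the smaller coordinates, and divides by the number of points on the path to match the slide's average. The names are illustrative.

```python
# Local distances, rows listed from iy = 6 (top) down to iy = 1, columns ix = 1..6.
ROWS_TOP_DOWN = [
    [3, 2, 3, 2, 2, 2],
    [3, 2, 3, 1, 1, 1],
    [3, 2, 2, 1, 2, 2],
    [2, 1, 2, 2, 1, 3],
    [1, 1, 1, 1, 2, 3],
    [1, 1, 3, 3, 3, 3],
]
SIZE = 6
SLOPE_WEIGHT = 1


def local_distance(ix, iy):
    return ROWS_TOP_DOWN[SIZE - iy][ix - 1]


def dtw():
    """Return (accumulated distances D, best path from (1,1) to (6,6))."""
    D = {(1, 1): local_distance(1, 1)}
    back = {}
    for ix in range(2, SIZE + 1):
        for iy in range(1, SIZE + 1):
            preds = [(ix - 1, iy - step) for step in (0, 1, 2)]
            reachable = [(D[p], p) for p in preds if p in D]
            if not reachable:
                continue  # this point cannot be reached from (1,1)
            a, best = min(reachable)  # ties: smaller predecessor coordinates
            D[(ix, iy)] = a + SLOPE_WEIGHT * local_distance(ix, iy)
            back[(ix, iy)] = best
    path = [(SIZE, SIZE)]
    while path[-1] != (1, 1):
        path.append(back[path[-1]])
    path.reverse()
    return D, path


D, path = dtw()
total = D[(SIZE, SIZE)]
average = total / len(path)
print(total, path, average)
```

Where two predecessors tie, as at (3,2), either choice gives the same total distance; the sketch simply reports one of the equally good paths.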
