¡ In the frequency domain:
$$X(\omega,\tau) = \frac{1}{p}\sum_{k} H(\omega_k)\,G(\omega_k)\,W(\omega-\omega_k,\tau)$$
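A small numerical check of the frequency-domain expression above. It assumes (our simplifications, not the slide's) that the glottal pulse is a unit impulse, so G(ω) = 1; that the vocal tract is a hypothetical one-pole filter h[n] = a^n, so H(ω) = 1/(1 - a e^{-jω}); and that the window is rectangular and spans exactly m pitch periods of p samples. The harmonics then sit at ω_k = 2πk/p, W(0) = mp, and the window transform vanishes at every other DFT bin. The DFT is therefore nonzero only at bins that are multiples of m, where its magnitude equals (1/p)·|H(ω_k)|·W(0) = m·|H(ω_k)|.

```python
import cmath
import math


def dft(x):
    """Direct O(N^2) discrete Fourier transform (fine for short examples)."""
    N = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
        for k in range(N)
    ]


def steady_state_period(p, a):
    """One period of a unit-impulse train (period p) filtered by h[n] = a**n.

    In steady state y[n] = sum_r a**(n + r*p) = a**n / (1 - a**p).
    """
    return [a**n / (1.0 - a**p) for n in range(p)]


p, m, a = 8, 4, 0.9                   # pitch period (samples), periods per window, pole
x = steady_state_period(p, a) * m     # rectangular window covering exactly m periods
X = dft(x)

peak = max(abs(v) for v in X)
nonzero_bins = [k for k, v in enumerate(X) if abs(v) > 1e-9 * peak]
print(nonzero_bins)  # only multiples of m, i.e. the harmonics omega_k = 2*pi*k/p

# Compare each harmonic bin with (1/p) * |H(omega_k)| * W(0), where W(0) = m*p.
for k in range(p):
    H = 1.0 / (1.0 - a * cmath.exp(-2j * math.pi * k / p))
    print(k, round(abs(X[m * k]), 6), round(m * abs(H), 6))
```

Because the window covers an integer number of periods, no energy leaks between harmonics; with any other window length the W(ω - ω_k, τ) terms overlap and smear the line spectrum.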
23/08/2019 Dr.Shikha Tripathi@PESU Blr 5
The peaks in the spectral envelope correspond to vocal-tract formant frequencies, F1, F2, …FN.
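One way to see the formant peaks: model the envelope as a cascade of two-pole resonators and pick the local maxima of its magnitude. The formant values (500 and 1500 Hz), the bandwidths, the 8 kHz sampling rate, and the pole mapping r = e^{-πB/fs}, θ = 2πF/fs are illustrative assumptions; with narrow bandwidths the envelope peaks land within a few hertz of the pole frequencies.

```python
import cmath
import math


def resonator_mag(f, formant, bandwidth, fs):
    """|H(f)| of a two-pole resonator H(z) = 1 / ((1 - p z^-1)(1 - p* z^-1)).

    Pole radius and angle use the common mapping
    r = exp(-pi * B / fs), theta = 2 * pi * F / fs.
    """
    r = math.exp(-math.pi * bandwidth / fs)
    p = r * cmath.exp(2j * math.pi * formant / fs)
    z_inv = cmath.exp(-2j * math.pi * f / fs)
    return 1.0 / (abs(1 - p * z_inv) * abs(1 - p.conjugate() * z_inv))


def envelope_peaks(formants, fs=8000, step=1):
    """Frequencies (Hz) of local maxima of a cascade of resonators."""
    freqs = list(range(0, fs // 2, step))
    mag = []
    for f in freqs:
        g = 1.0
        for F, B in formants:
            g *= resonator_mag(f, F, B, fs)
        mag.append(g)
    return [
        freqs[i]
        for i in range(1, len(mag) - 1)
        if mag[i] > mag[i - 1] and mag[i] >= mag[i + 1]
    ]


# Hypothetical two-formant envelope: F1 = 500 Hz (B = 60), F2 = 1500 Hz (B = 100).
print(envelope_peaks([(500, 60), (1500, 100)]))  # two peaks, close to 500 and 1500
```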
Illustration of periodic glottal flow: (a) typical glottal flow; (b) same as (a) with
lower pitch; (c) same as (a) with softer glottal flow.
¡ In continuous speech we change the resonant modes of the vocal cavity and stretch the vocal cords to modify the pitch period across different vowels. Thus the model should be time-varying.
Variable pitch period
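Because the pitch period changes over time, the excitation of a time-varying model is not a fixed impulse train. A minimal sketch, assuming a hypothetical per-sample F0 contour and simply spacing each pulse by the local period round(fs/F0) read at the previous pulse:

```python
def pulse_positions(f0_contour, fs):
    """Glottal-pulse sample indices for a time-varying pitch contour.

    f0_contour[n] is a (hypothetical) fundamental frequency in Hz at sample n.
    Each new pulse is placed one local pitch period after the previous one,
    using the period read from the contour at the previous pulse.
    """
    positions = [0]
    while True:
        period = round(fs / f0_contour[positions[-1]])
        nxt = positions[-1] + period
        if nxt >= len(f0_contour):
            return positions
        positions.append(nxt)


fs = 8000
steady = pulse_positions([100.0] * 800, fs)                 # constant 100 Hz
falling = [200.0 - 100.0 * n / 1599 for n in range(1600)]   # glide from 200 Hz to 100 Hz
glide = pulse_positions(falling, fs)

print(steady[:4])  # [0, 80, 160, 240]
periods = [b - a for a, b in zip(glide, glide[1:])]
print(periods)     # periods lengthen as the pitch falls
```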
Uniform lossless tube with ideal terminations: a tube of uniform cross section, extending from x = 0 to x = l, excited at x = 0 by an ideal source of volume velocity flow.
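Assuming the far end of the tube at x = l is an ideal open (pressure-release) termination, and treating the ideal volume-velocity source at x = 0 as a closed end, the tube is a quarter-wavelength resonator with resonances F_n = (2n - 1)c/(4l). A sketch with c = 35,000 cm/s (approximate speed of sound in warm, moist air) and l = 17.5 cm (a typical adult male vocal-tract length); both values are our assumptions:

```python
def uniform_tube_formants(length_cm, n_formants=3, c_cm_per_s=35000.0):
    """Resonances (Hz) of a lossless uniform tube, closed at x = 0, open at x = l.

    Quarter-wavelength resonator: F_n = (2n - 1) * c / (4 * l).
    """
    return [(2 * n - 1) * c_cm_per_s / (4.0 * length_cm)
            for n in range(1, n_formants + 1)]


print(uniform_tube_formants(17.5))  # [500.0, 1500.0, 2500.0]
print(uniform_tube_formants(14.0))  # shorter tube -> [625.0, 1875.0, 3125.0]
```

The resonances are evenly spaced by c/(2l), and all of them scale inversely with tube length.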
Concatenated tube model. The k-th tube has cross-sectional area Ak and length lk.
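In a concatenated tube model, each junction between sections k and k+1 partially reflects the travelling wave. A sketch of the junction reflection coefficients r_k = (A_{k+1} - A_k)/(A_{k+1} + A_k) (sign conventions differ between texts) and the one-way section delays τ_k = l_k/c; the areas, lengths, and c = 35,000 cm/s are hypothetical values:

```python
from fractions import Fraction


def reflection_coefficients(areas):
    """Reflection coefficient at each junction between tube k and tube k+1.

    Uses the convention r_k = (A_{k+1} - A_k) / (A_{k+1} + A_k); some texts
    use the opposite sign. Positive, finite areas give |r_k| < 1.
    """
    return [(a2 - a1) / (a2 + a1) for a1, a2 in zip(areas, areas[1:])]


def section_delays(lengths_cm, c_cm_per_s=35000.0):
    """One-way propagation delay (s) through each section: tau_k = l_k / c."""
    return [length / c_cm_per_s for length in lengths_cm]


areas = [Fraction(1), Fraction(2), Fraction(2), Fraction(4)]  # hypothetical A_k (cm^2)
print(reflection_coefficients(areas))  # [Fraction(1, 3), Fraction(0, 1), Fraction(1, 3)]
print(section_delays([3.5, 3.5, 3.5, 3.5]))  # about 1e-4 s per 3.5 cm section
```

Equal adjacent areas give r_k = 0 (no reflection), which is why a uniform tube needs no internal junctions.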
FIGURE 2.2. A block diagram of human speech production (labeled blocks: muscle force, lungs, trachea, vocal folds, pharyngeal cavity, tongue hump, oral sound output, nasal sound output).
Wideband Spectrogram
Narrowband Spectrogram
¡ The /t/ in ‘tea’ and ‘to’ is blurred in the narrowband spectrogram but sharp in the wideband spectrogram
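Why the /t/ release looks sharp in one display and blurred in the other can be seen from frame energies alone: a frame registers the burst whenever the burst falls anywhere inside its window, so the burst is smeared over roughly one window length. The 8 kHz rate, 4-sample hop, rectangular window, and the 32-sample (4 ms) versus 256-sample (32 ms) windows are illustrative choices, not values from the slides:

```python
def burst_smear_ms(window_len, hop, fs=8000, n_samples=800, burst_at=400):
    """Time span (ms) over which a single-sample burst shows up in frame energies.

    Uses a rectangular window for simplicity: a frame starting at `start`
    sees the burst whenever start <= burst_at < start + window_len.
    """
    x = [0.0] * n_samples
    x[burst_at] = 1.0  # idealized stop-release click
    frames_hit = 0
    for start in range(0, n_samples - window_len + 1, hop):
        energy = sum(v * v for v in x[start:start + window_len])
        if energy > 0:
            frames_hit += 1
    return 1000.0 * frames_hit * hop / fs


print(burst_smear_ms(32, 4))   # short (wideband-like) window: 4.0 ms
print(burst_smear_ms(256, 4))  # long (narrowband-like) window: 32.0 ms
```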
¡ Consider an utterance that transitions from normal voicing to diplophonic voicing as the pitch becomes very low (“Jazz Hour”)
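A rough way to see what diplophonic voicing does to the spectrum, assuming (as a simplification) that it can be modelled as alternating strong and weak glottal pulses: the true period doubles, so components appear at half the original harmonic spacing. The pulse amplitudes, period, and DFT length below are hypothetical:

```python
import cmath
import math


def dft_mag(x):
    """Magnitudes of a direct O(N^2) DFT."""
    N = len(x)
    return [abs(sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N)))
            for k in range(N)]


def pulse_train(n_samples, period, amplitudes):
    """Impulses every `period` samples, cycling through `amplitudes`."""
    x = [0.0] * n_samples
    for i, n in enumerate(range(0, n_samples, period)):
        x[n] = amplitudes[i % len(amplitudes)]
    return x


def active_bins(x, rel_tol=1e-9):
    """DFT bins whose magnitude is not numerically zero."""
    mag = dft_mag(x)
    peak = max(mag)
    return [k for k, v in enumerate(mag) if v > rel_tol * peak]


N, P = 64, 8
normal = pulse_train(N, P, [1.0])        # equal pulses: period P
diplo = pulse_train(N, P, [1.0, 0.5])    # alternating strong/weak pulses: period 2P
print(active_bins(normal))  # multiples of N/P = 8: harmonics of F0 only
print(active_bins(diplo))   # multiples of 4: extra components between the harmonics
```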
A basic tool for spectral analysis is the wideband spectrogram, which is discussed further in Chapter 6. A spectrogram converts a two-dimensional speech waveform (amplitude/time) into a three-dimensional pattern (amplitude/frequency/time) (Figure 3.10). With time and frequency on the horizontal and vertical axes, respectively, amplitude is indicated by the darkness of the display. Peaks in the spectrum (e.g., formant resonances) appear as dark horizontal bands [73]. Voiced sounds cause vertical marks in the spectrogram due to an increase in speech amplitude each time the vocal folds close. The noise in unvoiced sounds causes rectangular dark patterns, randomly punctuated with light spots due to instantaneous variations in energy. Spectrograms portray only spectral amplitude, ignoring phase information.
¡ Consider the spectrogram of short sections of vowels from a male speaker
¡ Formants for each vowel are noted by dots
Figure 3.10 Spectrogram of short sections of English vowels from a male speaker. Formants for each vowel are noted by dots.
[Plot: horizontal axis Frequency (kHz), 0 to 3; vertical axis ticks 0, 20, 40]
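A minimal magnitude spectrogram in plain Python, keeping only |X| (in dB) per frame and discarding phase. The Hann window, 64-sample frames, 32-sample hop, and 1 kHz test tone are our assumptions; a display would then map each dB value to a gray level, darker for larger values:

```python
import cmath
import math


def hann(N):
    """Symmetric Hann window of length N."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]


def spectrogram_db(x, frame_len, hop):
    """Log-magnitude short-time spectrum; phase is discarded.

    Returns one row per frame, each with frame_len // 2 + 1 values in dB
    covering bins 0 .. fs/2.
    """
    w = hann(frame_len)
    rows = []
    for start in range(0, len(x) - frame_len + 1, hop):
        seg = [x[start + n] * w[n] for n in range(frame_len)]
        row = []
        for k in range(frame_len // 2 + 1):
            X = sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                    for n in range(frame_len))
            row.append(20 * math.log10(abs(X) + 1e-12))  # small floor avoids log(0)
        rows.append(row)
    return rows


fs = 8000
tone = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(512)]  # 1 kHz test tone
S = spectrogram_db(tone, frame_len=64, hop=32)
# With 64-point frames at 8 kHz, bins are 125 Hz apart, so 1 kHz is bin 8.
print(len(S), len(S[0]))                   # 15 33
print({row.index(max(row)) for row in S})  # {8}
```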
¡ Wideband spectrograms employ 300 Hz
bandpass filters with response times of a few
ms, which yield good time resolution (for
accurate durational measurements) but
smoothed spectra.
¡ Smoothing speech energy over 300 Hz
produces good formant displays of dark
bands, where the center frequency of each
resonance is assumed to be in the middle of
the band (provided that the skirts of a
resonance are approximately symmetric).
¡ Vowels are voiced (except when whispered),
are the phonemes with the greatest intensity,
and range in duration from 50 to 400 ms in
normal speech.
¡ Like all sounds excited solely by a periodic glottal source, vowel energy is primarily concentrated below 1 kHz and falls off at about -6 dB/oct with frequency.
¡ Many relevant acoustic aspects of vowels can be seen in Figure 3.12, which shows brief portions of waveforms for five English vowels.
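The -6 dB/oct figure translates directly into relative levels: each doubling of frequency lowers the level by 6 dB, which is (to within about 0.02 dB) an amplitude proportional to 1/f. A short check, with 1 kHz as an arbitrary reference:

```python
import math


def tilt_db(f_hz, f_ref_hz, slope_db_per_oct=-6.0):
    """Level in dB relative to f_ref_hz for a spectrum falling at slope_db_per_oct."""
    return slope_db_per_oct * math.log2(f_hz / f_ref_hz)


# Relative to 1 kHz, a -6 dB/oct roll-off gives:
print(tilt_db(2000, 1000))  # -6.0  (one octave up)
print(tilt_db(4000, 1000))  # -12.0 (two octaves up)

# Halving the amplitude (1/f behaviour over one octave) costs about 6.02 dB:
print(round(20 * math.log10(1000 / 2000), 2))  # -6.02
```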
Figure 3.12 Typical acoustic waveforms for five English vowels. Each plot shows 40 ms of …
[Five panels, one per vowel, each plotted against time.]
¡ The signals are quasi-periodic due to repeated excitations of the vocal tract by vocal fold closures. Thus, vowels have line spectra with frequency spacing of F0 Hz (i.e., energy concentrated at multiples of F0).
¡ The largest harmonic amplitudes are near the low formant frequencies.
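Combining the two points above: a flat line spectrum at multiples of F0, shaped by a resonance, has its strongest harmonic at the one nearest the formant. A sketch with a hypothetical F0 of 120 Hz, a single 700 Hz formant of 100 Hz bandwidth, an 8 kHz sampling rate, and flat (impulse-train) excitation, all assumed for illustration:

```python
import cmath
import math


def resonator_mag(f, formant, bandwidth, fs=8000):
    """|H(f)| of a two-pole resonator (pole radius exp(-pi*B/fs), angle 2*pi*F/fs)."""
    r = math.exp(-math.pi * bandwidth / fs)
    p = r * cmath.exp(2j * math.pi * formant / fs)
    z_inv = cmath.exp(-2j * math.pi * f / fs)
    return 1.0 / (abs(1 - p * z_inv) * abs(1 - p.conjugate() * z_inv))


def harmonic_amplitudes(f0, formant, bandwidth, fs=8000):
    """Line spectrum: amplitude of each harmonic k*f0 below fs/2 under one resonance."""
    harmonics = [k * f0 for k in range(1, int((fs / 2) // f0) + 1)]
    return {h: resonator_mag(h, formant, bandwidth, fs) for h in harmonics}


amps = harmonic_amplitudes(f0=120, formant=700, bandwidth=100)
strongest = max(amps, key=amps.get)
print(strongest)  # 720: the harmonic closest to the 700 Hz formant
```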
Figure 3.13 Plot of F1 vs. F2 for vowels spoken by 60 speakers. (After Peterson and Barney [31].)
[Plot: horizontal axis First-formant frequency (Hz); vertical axis Second-formant frequency (Hz), 500 to 3500 Hz.]
¡ There is much overlap across speakers, such that vowels with the same F1 and F2 values are heard as different phonemes when uttered by different speakers.
¡ Other aspects of the vowels (e.g., F0, upper formants, bandwidths) enable listeners to make correct interpretations.
¡ Each speaker keeps his vowels well apart in three-dimensional F1-F3 space.
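Within one speaker, well-separated vowel regions in F1-F3 space allow even a naive nearest-target rule to label vowels. The target values below are hypothetical placeholders for a single speaker, not measurements, and plain Euclidean distance in Hz is a crude choice (perceptual frequency scales are often used instead):

```python
import math

# Hypothetical per-speaker vowel targets (F1, F2, F3) in Hz: placeholders,
# not measured data.
SPEAKER_VOWELS = {
    "i": (300, 2300, 3000),
    "a": (750, 1100, 2450),
    "u": (320, 900, 2250),
}


def classify_vowel(formants, targets=SPEAKER_VOWELS):
    """Label of the nearest vowel target in (F1, F2, F3) space (Euclidean, Hz)."""
    return min(targets, key=lambda v: math.dist(formants, targets[v]))


print(classify_vowel((290, 2250, 2950)))  # i
print(classify_vowel((700, 1150, 2500)))  # a
```

The same rule applied across speakers would fail exactly where the slide says it does: targets for one speaker can overlap another speaker's regions.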
Figure 3.14 The vowel triangle for the vowels of Table 3.2.
[Plot: horizontal axis F2 (Hz), 800 to 2000; vertical axis ticks 200 to 600.]