Spectrogram Reading
Why bother?
What's the point of spectrogram reading? Do people read
spectrograms as part of their job? Do computers read spectrograms
in order to recognize speech?
There are some jobs that require spectrogram reading (e.g. phonetic
time alignment), but not many. Automatic speech recognition
systems do not process speech in this way.
Primary reason for spectrogram reading:
If you're going to work on a problem, it's advisable to
understand the nature of that problem. Spectrogram reading
provides a direct method for hands-on learning of the
characteristics of speech. Studying phonetics, signal processing,
or techniques in speech recognition/speech synthesis does not
fully convey the complexity and structure of spoken language.
(source unknown)
Phonetics: Introduction
Phonology:
A description of the systems and patterns of sounds
that occur in a language (abstract), often involving
comparisons between languages and/or evolution of
a language over time.
Phonetics:
A branch of phonology that deals with individual speech
sounds, their production, and their written representation.
Phoneme:
A unit of speech that can be used to differentiate words
(e.g. cat /k ae t/ vs. bat /b ae t/).
Phonemes identify minimal pairs in a language.
The set of phonemes in a language is subject to interpretation;
most languages have 20 to 40 phonemes.
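The minimal-pair idea above can be sketched in a few lines of Python. This is only an illustration, assuming each word is given as a list of phoneme labels in the notation used on these slides; the function name `is_minimal_pair` is hypothetical.

```python
def is_minimal_pair(a, b):
    """Return True if two phoneme sequences differ in exactly one position."""
    if len(a) != len(b):
        return False
    # Count the positions at which the two sequences disagree.
    differences = sum(1 for x, y in zip(a, b) if x != y)
    return differences == 1


cat = "k ae t".split()
bat = "b ae t".split()
print(is_minimal_pair(cat, bat))  # True
```

Here "cat" and "bat" differ only in their first phoneme, so /k/ and /b/ must be distinct phonemes.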
Allophone:
A speech sound constituting one of the systematic phonetic
variants of a given phoneme. Different allophones are
predictable from environment (e.g. toe, caught,
fitness, writer; sill, still, spill)
Phone:
An acoustic realization of a phoneme. (Many different
phones may represent the same phoneme.)
The phoneme /s/ consists of more than 100 allophones
(Pickett, The Acoustics of Speech Communication, p. 7).
Syllable:
Unit of speech containing one or more phonemes.
A vowel in a syllable is called the syllable nucleus.
Most syllables contain one vowel (or diphthong);
some contain only a lateral (bott/le) or nasal
(butt/on) as the most intense sound.
Syllable boundaries are sometimes ambiguous
(tas/ty vs. tast/y vs. ta/sty)
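Because most syllables contain exactly one nucleus, the number of syllables in a transcription can be estimated by counting nuclei. The sketch below assumes the vowel labels used elsewhere on these slides; the function name `count_syllables` and the label "en" for a syllabic nasal are assumptions made for illustration. It counts syllables only and does not try to place the (sometimes ambiguous) boundaries.

```python
# Vowel and diphthong labels as they appear on these slides (assumed inventory).
VOWELS = {
    "iy", "ih", "ey", "eh", "ae", "aa", "ao", "ow", "uh", "uw",
    "ah", "ax", "ix", "ay", "oy", "aw", "er", "axr",
}


def count_syllables(phonemes, syllabic_consonants=()):
    """Count syllable nuclei: vowels plus any consonants marked as syllabic."""
    nuclei = VOWELS | set(syllabic_consonants)
    return sum(1 for p in phonemes if p in nuclei)


print(count_syllables("t ey s t iy".split()))  # 2 ("tasty")
# "en" is a hypothetical label for a syllabic /n/, as in "button".
print(count_syllables("b ah t en".split(), syllabic_consonants={"en"}))  # 2
```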
Coarticulation:
The blending of two or more adjacent phones, causing
a non-distinct boundary between them. Coarticulation
is caused by smooth changes in the articulators (lips,
tongue, jaw) over time.
Coarticulation Example:
"you are": /y uw aa r/
[Figure: "you are", with the /uw/ and /aa/ regions labeled]
Another Example of Coarticulation:
[Figure: the vocal tract, with labels: nasal tract, velic port, velum (soft palate), tongue, pharynx, glottis, (hard) palate, oral tract, alveolar ridge, lips, teeth, tongue tip, vocal folds (= vocal cords), larynx (voice box)]
Anterior
Coronal
Continuant
Strident
Voiced
Lateral: contact between corona of tongue and roof of mouth,
with lowering of sides of tongue (only /l/ in English)
Nasal
High
Low
Back
Round
Adapted from "Language" by C.E. Cairns and F. Williams, in Normal Aspects of Speech, Hearing,
and Language, edited by Minifie, Hixon, and Williams, 1973, p. 424, as printed in Daniloff, p. 51.
Sonorant: resonant quality of a sound; vowels are +sonorant,
stops and fricatives are -sonorant; nasals are also +sonorant.
Non-sonorants (e.g. stops, fricatives, affricates) are
formed by obstructing the airflow.
Syllabic: is the phoneme the main sound in a syllable?
Vowels are +syllabic, stops are usually -syllabic,
but there are syllabic nasals and liquids.
Tense: tense vowels are longer, more fully articulated, and
more distinct, e.g. /iy ey uw ow aa/; lax vowels
are less so, e.g. /ih eh uh ah/.
produced without a constriction in the vocal tract,
but also without voicing (/h/).
produced with aperiodic or extremely low-frequency
vibrations of the vocal cords.
          Unrounded    Unrounded    Rounded
High      i (iy)       ɨ (ix)       u (uw)
Mid       ɛ (eh)       ʌ (ah)       o (ow)
Low       æ (ae)       a (aa)       ɔ (ao)
          Front, -Round    Back, -Round    Back, +Round
High      iy, ih           ix              uw, uh
Mid       ey, eh           ah, ax          ow
Low       ae               aa              ao
Each column is subdivided into Tense and Lax (e.g. tense iy, ey, uw vs. lax ih, eh, uh).
From Schane, pp. 12-13. /ax/ is slightly more centralized than /ah/, and shorter in duration.
CNT      PCNT      EXAMPLES
12945    0.10002
15       0.00012   chui, des, kiwani, lui, moishe, pih, to
30       0.00023   bienvenue, des, eh, moshe, yahweh, zeh
5        0.00004   dhaka, lashua, losoya, pah, yeah
714      0.00552
2        0.00002   lheureux, milieu
6413     0.04955
170      0.00131
243      0.00188
962      0.00743
379      0.00293
167      0.00129
171      0.00132
226      0.00175
5137     0.03969
Total    0.21280
21% of words end in a vowel or diphthong.
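A statistic of this kind can be computed from any pronunciation lexicon by checking the last phoneme of each word. The sketch below is an illustration only: the four-word lexicon is a toy example built from words on these slides, not the data behind the table, and the function name `fraction_ending_in_vowel` is hypothetical.

```python
# Vowel and diphthong labels as they appear on these slides (assumed inventory).
VOWELS = {
    "iy", "ih", "ey", "eh", "ae", "aa", "ao", "ow", "uh", "uw",
    "ah", "ax", "ix", "ay", "oy", "aw", "er", "axr",
}


def fraction_ending_in_vowel(lexicon):
    """Return the fraction of words whose final phoneme is a vowel or diphthong."""
    if not lexicon:
        return 0.0
    hits = sum(1 for phones in lexicon.values() if phones and phones[-1] in VOWELS)
    return hits / len(lexicon)


# Toy lexicon (word -> phoneme list); not the corpus used for the table.
toy_lexicon = {
    "cat": ["k", "ae", "t"],
    "bat": ["b", "ae", "t"],
    "you": ["y", "uw"],
    "toe": ["t", "ow"],
}
print(fraction_ending_in_vowel(toy_lexicon))  # 0.5
```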
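[Figure: vowel chart arranged by height (High, Mid, Low) and frontness (Central and Back labels survive); vowels shown: ju, uw, ih, uh, ey, ix, ay, eh, oy, ax, ow, aw, ao, ah, ae, aa]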
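[Table: consonants by manner of articulation (obstruents: stops, fricatives, affricates; nasals; approximants: glides, retroflex, lateral), voicing, and place of articulation (bilabial, labiodental, dental, alveolar, palato-alveolar, palatal, velar, glottal). Surviving entries:]
stops: +voice, -voice
fricatives: +voice dh, zh; -voice th, sh
affricates: +voice jh; -voice ch
nasals: +voice ng
glides: +voice y, (w)
retroflex: +voice r
lateral: +voice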
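[Table, from Ladefoged, p. 44: consonants by place (Labial, Coronal, Dorsal) and by the features nasal, stop, fricative ("strong fricative"), approximant, sibilant, lateral, and anterior. Surviving entries:]
+nasal: ng
-nasal stop, -sibilant: p b, t d, k g
-nasal stop, +sibilant: ch jh
fricative, -sibilant: f v, th dh
fricative, +sibilant: s z (+anterior), sh zh (-anterior)
approximant, -lateral: y
approximant, +lateral: l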
Approximants: Terminology
Approximants are NOT the same as Semi-Vowels
(although Rabiner states they are the same). American
English /r/ is debatable, but we'll exclude it from the
Semi-Vowels for consistency. (Ladefoged p. 229)
Approximants can be divided into two groups: Liquids and Glides
Liquid = {/l/, /r/}, Glide = {/w/, /y/}
(Again, Rabiner confuses things by mixing up these sets)
Lateral = {/l/}
Retroflex = {/r/, /er/, /axr/}.
(In some cases, /er/ is considered a retroflex but /r/ isn't;
we'll keep things simple by calling /r/ a retroflex).
Central Approximants = {/r/, /w/, /y/},
Lateral Approximant = {/l/}
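The class definitions above map directly onto sets, which makes it easy to check that the two ways of dividing the approximants cover the same phonemes. This is a minimal sketch; the names `CLASSES` and `classify` are hypothetical, and the set members are exactly those listed above.

```python
LIQUIDS = {"l", "r"}
GLIDES = {"w", "y"}
APPROXIMANTS = LIQUIDS | GLIDES
LATERAL = {"l"}
RETROFLEX = {"r", "er", "axr"}
CENTRAL_APPROXIMANTS = {"r", "w", "y"}
LATERAL_APPROXIMANTS = {"l"}

CLASSES = {
    "approximant": APPROXIMANTS,
    "liquid": LIQUIDS,
    "glide": GLIDES,
    "lateral": LATERAL,
    "retroflex": RETROFLEX,
    "central approximant": CENTRAL_APPROXIMANTS,
    "lateral approximant": LATERAL_APPROXIMANTS,
}


def classify(phoneme):
    """Return the sorted names of every class that contains the phoneme."""
    return sorted(name for name, members in CLASSES.items() if phoneme in members)


# Liquids + glides and central + lateral approximants are the same four phonemes.
print(APPROXIMANTS == CENTRAL_APPROXIMANTS | LATERAL_APPROXIMANTS)  # True
print(classify("r"))  # ['approximant', 'central approximant', 'liquid', 'retroflex']
```

Note that /er/ and /axr/ are retroflex but not approximants under these definitions, so `classify("er")` returns only `['retroflex']`.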
Approximant
    Semi-Vowel / Glide: /y/, /w/
    Liquid
        Retroflex: /r/
        Lateral: /l/ (lateral approximant)