
CS 551/652:

Structure of Spoken Language


Lecture 2: Spectrogram Reading and
Introductory Phonetics
John-Paul Hosom
Fall 2010

Spectrogram Reading
Why bother??
What's the point of spectrogram reading? Do people read
spectrograms as part of their job? Do computers read spectrograms
in order to recognize speech?
There are some jobs that require spectrogram reading (e.g. phonetic
time alignment), but not many. Automatic speech recognition
systems do not process speech in this way.
Primary reason for spectrogram reading:
If you're going to work on a problem, it's advisable to
understand the nature of that problem. Spectrogram reading
provides a direct method for hands-on learning of the
characteristics of speech. Studying phonetics, signal processing,
or techniques in speech recognition/speech synthesis does not
fully convey the complexity and structure of spoken language.

More Formant Data

(source unknown)

Phonetics: Introduction
Phonology:
A description of the systems and patterns of sounds
that occur in a language (abstract), often involving
comparisons between languages and/or evolution of
a language over time.
Phonetics:
A branch of phonology that deals with individual speech
sounds, their production, and their written representation.
Phoneme:
A unit of speech that can be used to differentiate words
(e.g. cat /k ae t/ vs. bat /b ae t/).
Phonemes identify minimal pairs in a language.
The set of phonemes in a language is subject to interpretation;
most languages have 20 to 40 phonemes.
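
The minimal-pair idea above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: pronunciations are lists of phoneme symbols in the notation used here, and only same-length pairs that differ by one substitution are treated as minimal pairs. The function name is hypothetical.

```python
def is_minimal_pair(pron_a, pron_b):
    """Return True if two phoneme sequences differ in exactly one position."""
    if len(pron_a) != len(pron_b):
        return False
    # Count the positions where the two pronunciations disagree.
    differences = sum(1 for a, b in zip(pron_a, pron_b) if a != b)
    return differences == 1


cat = ["k", "ae", "t"]
bat = ["b", "ae", "t"]
print(is_minimal_pair(cat, bat))  # True: /k/ vs. /b/ distinguishes the words
```

Under this sketch, /k/ and /b/ count as separate phonemes because swapping one for the other changes the word.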

Phonetics: Introduction
Allophone:
A speech sound constituting one of the systematic phonetic
variants of a given phoneme. Different allophones are
predictable from environment (e.g. toe, caught,
fitness, writer; sill, still, spill)
Phone:
An acoustic realization of a phoneme. (Many different
phones may represent the same phoneme.)
The phoneme /s/ consists of more than 100 allophones
(Pickett, The Acoustics of Speech Communication, p. 7).

Phonemes indicated by / /; phones (allophones) indicated by [ ].

Phonetics: Introduction
Syllable:
Unit of speech containing one or more phonemes.
A vowel in a syllable is called the syllable nucleus.
Most syllables contain one vowel (or diphthong);
some contain only a lateral (bott/le) or nasal
(butt/on) as the most intense sound.
Syllable boundaries sometimes ambiguous
(tas/ty vs. tast/y vs. ta/sty)
Coarticulation:
The blending of two or more adjacent phones, causing
a non-distinct boundary between them. Coarticulation
is caused by smooth changes in the articulators (lips,
tongue, jaw) over time.

Phonetics: Introduction
Coarticulation Example:
[Figure: "you are" /y uw aa r/, with the /uw/ and /aa/ regions labeled]

Phonetics: Introduction
Another Example of Coarticulation:

Phonetics: Introduction (adapted from Schane, pp. 4-6)


Speech signal is continuous; we perceive discrete entities.
(How many sound units are in the word cat?)
One assumption of phonology: utterances can be represented as
a sequence of discrete units.
Are such units purely an invention of linguistics?
Spoonerisms ("belly jeans" vs. "jelly beans") and rhymes
indicate small units of language (spoonerisms are named after
Reverend William Archibald Spooner, 1844-1930)
Utterances of the same word(s) have many differences; we're
usually only interested in those differences that are linguistically
significant or that are perceived as different.
Implies a somewhat subjective nature to phonology, whereas
we want an objective measure of perceived or produced units.

Phonetics: Distinctive Phonetic Features


Phonemes do not differ randomly from one another; there
are relationships among phonemes (e.g. /p/ vs. /t/ vs. /ah/)
A (distinctive) feature is a phonetic property that can be
used to classify sounds [Ladefoged, p. 42]
Typically, features are associated with aspects of articulation
Features may be binary or multi-valued
Capital letters indicate feature name: Manner
square brackets [] indicate feature value: [+fricative]

Phonetics: Distinctive Phonetic Features


Exact set of features and feature values depends on goals
(no right or wrong set of features or values)
Distinctive features provide a vocabulary for describing speech
Are distinctive features purely an invention of linguistics?
memory tasks show that when people forget a phoneme, they
usually remember a phoneme with similar distinctive features

Phonetics: Distinctive Phonetic Features

[Figure: The Speech Production Apparatus (from Olive, p. 23).
Labeled parts: nasal tract, velic port, velum (soft palate), (hard) palate,
oral tract, alveolar ridge, lips, teeth, tongue tip, tongue, pharynx,
glottis (vocal folds and space between vocal cords),
vocal folds (= vocal cords), larynx (voice box)]

Phonetics: Distinctive Phonetic Features*


Feature       Description
Consonantal   produced with a constriction along center line of
              oral cavity. Only vowels, /w/, /h/, and /y/ are not.
Vocalic       largely unobstructed vocal tract. Vowels and
              liquids (/l/, /r/) are vocalic; glides (/w/, /y/) are not.
Anterior      point of articulation near alveolar ridge, including
              all labial and dental sounds.
Coronal       articulation involves front of tongue
Continuant    no complete obstruction in oral cavity; only nasals,
              stops, and affricates are non-continuant
Strident      articulation with long, narrow constriction,
              such as /s/, /z/, /f/, /v/, /sh/, /zh/, /ch/, /jh/
Voiced        vibration of the vocal folds occurs during articulation

Phonetics: Distinctive Phonetic Features*


Feature       Description
Lateral       contact between corona of tongue and roof of mouth,
              with lowering of sides of tongue (only /l/ in English)
Nasal         lowering of the velum, opening the velic port to the
              nasal cavity.
High          vowel with high tongue position (narrow constriction);
              in English, /iy/, /ih/, /uh/, /uw/
Low           vowel with low tongue position (no constriction);
              /ae/, /ao/, /aa/ are (some) low vowels in English.
Back          vowels produced with tongue toward back of mouth;
              /uw/, /uh/, /ah/, /ao/, /aa/, /ow/ are back vowels
Round         articulation involving rounding of the lips; only
              /uw/, /ow/, /ao/, and /uh/ are rounded in English.
              However, /uh/ may take an unrounded form.

* Adapted from "Language" by C. E. Cairns and F. Williams, in Normal Aspects of Speech, Hearing,
and Language, edited by Minifie, Hixon, and Williams, 1973, p. 424, as printed in Daniloff, p. 51.

Phonetics: More Distinctive Phonetic Features*


Feature       Description
Sonorant      resonant quality of a sound; vowels are +sonorant,
              stops and fricatives are -sonorant. Nasals are also +sonorant.
Obstruent     non-sonorants, e.g. stops, fricatives, affricates, which
              are formed by obstructing the airflow.
Syllabic      is the phoneme the main sound in a syllable?
              vowels are +syllabic, stops are usually -syllabic,
              but there are syllabic nasals and liquids.
Tense         tense vowels are longer, more fully articulated, and
              more distinct, e.g. /iy ey uw ow aa/; lax vowels
              are less so, e.g. /ih eh uh ah/.
Aspirated     produced without a constriction in the vocal tract,
              but also without voicing (/h/).
Glottalized   produced with aperiodic or extremely low-frequency
              vibrations of the vocal cords.

* from Schane, pp. 26-32

Phonetics: Distinctive Phonetic Features


Physiological Features:
Manner
stop /p/, fricative /s/, affricate /ch/, liquid /l/ /r/,
glide /j/ /w/, nasal /m/, vowel /ah/, aspiration /h/
Place
bilabial /p/, labiodental /f/, dental /th/, alveolar /t/,
palato-alveolar /r/, palatal /sh/, velar /k/, glottal /h/,
front /iy/, mid /ah/, back /aa/ (can combine mid + back)
Height
high /iy/, mid-high /ih/, mid /ax/, mid-low /eh/, low /aa/
or high /iy/, mid /eh/, low /aa/ (3 values, plus tense/lax)
Tenseness, Nasality, Rounding
same as previous descriptions
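
To make the idea of relationships among phonemes concrete, the sketch below stores a few phonemes as feature/value tables and lists the features on which two phonemes differ. The Manner and Place values are the examples given above; the Voiced values, the dictionary layout, and the function name are assumptions made for illustration, not a standard feature set.

```python
# Manner and Place values follow the examples listed above;
# the Voiced values are added here for illustration.
FEATURES = {
    "p":  {"Manner": "stop",  "Place": "bilabial", "Voiced": False},
    "t":  {"Manner": "stop",  "Place": "alveolar", "Voiced": False},
    "ah": {"Manner": "vowel", "Place": "mid",      "Voiced": True},
}


def differing_features(phoneme_a, phoneme_b):
    """Return the sorted names of features whose values differ."""
    fa, fb = FEATURES[phoneme_a], FEATURES[phoneme_b]
    return sorted(name for name in fa if fa[name] != fb[name])


print(differing_features("p", "t"))   # ['Place']
print(differing_features("p", "ah"))  # ['Manner', 'Place', 'Voiced']
```

In this toy table /p/ and /t/ differ only in Place, while /p/ and /ah/ differ in every listed feature, which is one way to express that /p/ is "closer" to /t/ than to /ah/.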

Phonetics: Distinctive Feature Relationships: Vowels


                Front                    Back
         Unrounded   Rounded      Unrounded   Rounded
High     i (iy)                   ɨ (ix)      u (uw)
Mid      ɛ (eh)                   ʌ (ah)      o (ow)
Low      æ (ae)                   a (aa)      ɔ (ao)

Tense and lax vowels:
Front, -Round:   High: iy (tense), ih (lax);   Mid: ey (tense), eh (lax);   Low: ae (lax)
Back, +Round:    High: uw (tense), uh (lax);   Mid: ow (tense);             Low: ao
Back, -Round:    High: ix (lax);               Mid: ah, ax (lax);           Low: aa

* from Schane, pp. 12-13. /ax/ is slightly more centralized than /ah/, and shorter in duration.

Phonetics: Distinctive Phonetic Features: The Case of /ae/


/ae/ is classified in the preceding table as lax, but we have been
considering it as tense.
One Rule for Differentiating Tense/Lax:
A lax vowel can never be a word-final stressed vowel
e.g. /iy/ can be word final: be /b iy/, tea /t iy/
/ih/ cannot be word final in a one-syllable word: /b ih/, /t ih/ do not occur
/ah/ can be word final, but only if unstressed.
According to this rule, both /eh/ and /ae/ are lax, because they
cannot be word-final stressed vowels. In this case, the tense vowel in
contrast to /eh/ is /ey/.
However, /ae/ is long in duration (e.g. Forgie and Forgie (1959) and
Peterson and Lehiste (1960)), making it acoustically more similar to
a tense vowel.
For spectrogram reading, we're more concerned with acoustics, so
we'll call /ae/ a tense vowel, although others may call it lax.

Phonetics: Distinctive Phonetic Features: The Case of /ae/


Looking at 130,000 words in the CMU dictionary:
PHN     CNT      PCNT      EXAMPLES
/iy/    12945    0.10002
/ih/    15       0.00012   chui, des, kiwani, lui, moishe, pih, to
/eh/    30       0.00023   bienvenue, des, eh, moshe, yahweh, zeh
/ae/    5        0.00004   dhaka, lashua, losoya, pah, yeah
/uw/    714      0.00552
/uh/    2        0.00002   lheureux, milieu
/ah/    6413     0.04955
/aa/    170      0.00131
/ao/    243      0.00188
/ey/    962      0.00743
/ay/    379      0.00293
/oy/    167      0.00129
/yu/    171      0.00132
/aw/    226      0.00175
/ow/    5137     0.03969
Total            0.21280
21% of words end in vowel/diphthong
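
A tally of word-final vowels like the one above could be computed roughly as follows. This is a minimal sketch, not the procedure used for the table: the lexicon is a hypothetical five-word toy in the lowercase notation of these slides, whereas the actual CMU dictionary uses uppercase symbols with stress digits and would need to be parsed first.

```python
from collections import Counter

# Vowel and diphthong symbols taken from the PHN column above.
VOWELS = {"iy", "ih", "eh", "ae", "uw", "uh", "ah", "aa", "ao",
          "ey", "ay", "oy", "yu", "aw", "ow"}

# Hypothetical toy lexicon (word -> phoneme list); not real CMU data.
LEXICON = {
    "be":  ["b", "iy"],
    "tea": ["t", "iy"],
    "cat": ["k", "ae", "t"],
    "bat": ["b", "ae", "t"],
    "toe": ["t", "ow"],
}


def final_vowel_stats(lexicon):
    """Map each word-final vowel to (count, fraction of all words)."""
    counts = Counter(p[-1] for p in lexicon.values() if p and p[-1] in VOWELS)
    total = len(lexicon)
    return {phoneme: (n, n / total) for phoneme, n in counts.items()}


stats = final_vowel_stats(LEXICON)
for phoneme, (count, fraction) in sorted(stats.items()):
    print(f"/{phoneme}/ {count} {fraction:.5f}")
```

On the toy lexicon this reports /iy/ in two of five words and /ow/ in one; /ae/ does not appear because it is never word-final here, in line with the rule for lax vowels.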

Phonetics: Distinctive Feature Relationships: Vowels


[Figure: vowel chart with columns Front, Central, Back and rows High, Mid, Low,
showing the positions of iy, ih, ey, eh, ae, ix, ax, ah, ju, uw, uh, ow, ao, aa,
ay, oy, aw]
from Ladefoged, pp. 38, 81, 218 with correction to /aw/

Phonetics: Distinctive Feature Relationships: Consonants


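[Table: consonants by Manner and Voicing (rows) and place of articulation (columns)]
Columns: bilabial, labiodental, dental, alveolar, palato-alveolar, palatal, velar, glottal
Rows:    obstruent: stops (+voice, -voice), fricatives (+voice, -voice), affricates (+voice, -voice)
         nasals (+voice)
         approximant: glides (+voice), retroflex (+voice), lateral (+voice)
Entries: fricatives: +voice dh, zh; -voice th, sh
         affricates: +voice jh; -voice ch
         nasals: ng
         glides: y, (w)
         retroflex: r
from Olive, p. 28 and Daniloff, p. 56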

Phonetics: Distinctive Feature Relationships: Consonants

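[Table: consonants by place (Labial, Coronal, Dorsal) and manner (stop, fricative, approximant);
other labels: strong fricative, +anterior, -anterior]
stop:         +nasal: ng (Dorsal)
              -nasal, -sibilant: p b (Labial), t d (Coronal), k g (Dorsal)
              +sibilant: ch jh
fricative:    -sibilant: f v (Labial), th dh (Coronal)
              +sibilant: s z (+anterior), sh zh (-anterior)
approximant:  -lateral: y
              +lateral: l
from Ladefoged, p. 44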

Approximants: Terminology
Approximants are NOT the same as Semi-Vowels
(although Rabiner states they are the same). American
English /r/ is debatable, but we'll exclude it from the
Semi-Vowels for consistency. (Ladefoged p. 229)
Approximants can be divided into two groups: Liquids and Glides
Liquid = {/l/, /r/}, Glide = {/w/, /y/}
(Again, Rabiner confuses things by mixing up these sets)
Lateral = {/l/}
Retroflex = {/r/, /er/, /axr/}.
(In some cases, /er/ is considered a retroflex but /r/ isn't;
we'll keep things simple by calling /r/ a retroflex).
Central Approximants = {/r/, /w/, /y/},
Lateral Approximant = {/l/}
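
The terminology above is a small exercise in set relationships, which the following sketch writes out with Python sets. The variable names are illustrative assumptions; the set contents are exactly the ones listed above.

```python
LIQUIDS = {"l", "r"}
GLIDES = {"w", "y"}
LATERAL = {"l"}
RETROFLEX = {"r", "er", "axr"}

# Approximants are divided into liquids and glides.
APPROXIMANTS = LIQUIDS | GLIDES

# Central approximants are the approximants that are not lateral.
CENTRAL_APPROXIMANTS = APPROXIMANTS - LATERAL
LATERAL_APPROXIMANTS = APPROXIMANTS & LATERAL

print(sorted(CENTRAL_APPROXIMANTS))      # ['r', 'w', 'y']
print(sorted(LATERAL_APPROXIMANTS))      # ['l']
print(sorted(RETROFLEX & APPROXIMANTS))  # ['r']
```

Note that /er/ and /axr/ are retroflex but fall outside the approximant set in this sketch, since only /r/ is listed among the liquids.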

Approximants: Terminology
Approximant
    Semi-Vowel / Glide: /y/, /w/       (central approximants)
    Liquid
        Retroflex: /r, er, axr/        (central approximants)
        Lateral: /l/                   (lateral approximant)
