
An Introduction to Speech Production

OVERVIEW OF SPEECH GENERATION


Speech is achieved by compression of the lung volume causing air flow which
may be made audible if set into vibration by the activity of the larynx. This sound can then be
made into speech by various modifications of the supralaryngeal vocal tract.
1. Lungs provide the energy source - Respiration
2. Vocal folds convert the energy into audible sound - Phonation
3. Articulators transform the sound into intelligible speech - Articulation

Fig 2.1 - An overview of the vocal tract showing structures that are important in speech sound
production and speech articulation


LUNG STRUCTURE AND FUNCTION
Expanding the thoracic cavity by expanding the rib cage (raising the ribs) and by
lowering the diaphragm increases lung volume, decreases air pressure in the lungs and so air
is drawn in from the outside to equalise pressure. Contracting the thoracic cavity by
contracting the rib cage (lowering the ribs) and by raising the diaphragm decreases lung
volume, increases air pressure in the lungs and so air is expelled from the lungs to equalise
pressure with the outside air.
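The volume and pressure relationship described above can be sketched with Boyle's law (P1 * V1 = P2 * V2 at constant temperature). This is a minimal sketch, not a physiological model: it assumes the airway is momentarily sealed while the volume changes, so the pressure differences are greatly exaggerated. The function names and the example volumes are illustrative assumptions and do not come from these notes.

```python
ATMOSPHERIC_KPA = 101.3  # approximate sea-level air pressure (illustrative)


def lung_pressure_after_volume_change(p_initial_kpa, v_initial_l, v_final_l):
    """Return the new lung pressure using Boyle's law: P1 * V1 = P2 * V2."""
    if v_initial_l <= 0 or v_final_l <= 0:
        raise ValueError("volumes must be positive")
    return p_initial_kpa * v_initial_l / v_final_l


def airflow_direction(p_lung_kpa, p_outside_kpa=ATMOSPHERIC_KPA):
    """Air moves from the higher pressure to the lower pressure."""
    if p_lung_kpa < p_outside_kpa:
        return "in"    # lung pressure is lower, so air is drawn in
    if p_lung_kpa > p_outside_kpa:
        return "out"   # lung pressure is higher, so air is expelled
    return "none"


# Expanding the thoracic cavity (hypothetical 3.0 L -> 3.5 L) lowers pressure.
p_expanded = lung_pressure_after_volume_change(ATMOSPHERIC_KPA, 3.0, 3.5)
# Contracting the thoracic cavity (hypothetical 3.5 L -> 3.0 L) raises pressure.
p_contracted = lung_pressure_after_volume_change(ATMOSPHERIC_KPA, 3.5, 3.0)

print(airflow_direction(p_expanded))    # breathing in
print(airflow_direction(p_contracted))  # breathing out
```

In real breathing the airway is open, so air flows continuously and the pressure difference at any instant is very small. The sketch only shows the direction of the effect.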






Breathing In and Breathing Out
Fig 2.2 - Flow Chart of Lung Function


LARYNX STRUCTURE AND FUNCTION
The larynx is a continuation of the trachea but the cartilage structures of the
larynx are highly specialised. The main cartilages are the thyroid, cricoid and arytenoid
cartilages. These cartilages variously rotate and tilt to effect changes in the vocal folds. The
vocal folds (also known as the vocal cords) stretch across the larynx and when closed they
separate the pharynx from the trachea. When the vocal folds are open breathing is permitted.
The opening between the vocal folds is known as the glottis. When air pressure below closed
vocal folds (sub-glottal pressure) is high enough the vocal folds are forced open, the vocal
folds then spring back closed under both elastic and aerodynamic forces, pressure builds up
again, the vocal folds open again, ... and so on for as long as the vocal folds remain closed
and a sufficient sub-glottal pressure can be maintained. This continuous periodic process is
known as phonation and produces a "voiced" sound source.
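The build-up and release cycle described above can be illustrated with a toy relaxation oscillator. This is only a sketch under strong assumptions: the thresholds, the pressure gain and the leak rate are invented numbers in arbitrary units, and the model ignores the elastic and aerodynamic detail of real vocal fold vibration. All names are hypothetical.

```python
def simulate_phonation(steps, pressure_gain=1.0, open_threshold=5.0,
                       close_threshold=1.0, leak=2.0):
    """Return a list of booleans: True when the glottis is open at that step."""
    pressure = 0.0
    is_open = False
    states = []
    for _ in range(steps):
        if is_open:
            # Air escapes through the open glottis, so pressure falls.
            pressure -= leak
            if pressure <= close_threshold:
                is_open = False  # folds spring back closed
        else:
            # Folds are closed, so sub-glottal pressure builds up.
            pressure += pressure_gain
            if pressure >= open_threshold:
                is_open = True   # folds are forced open
        states.append(is_open)
    return states


def count_openings(states):
    """Count closed-to-open transitions (one per vibratory cycle)."""
    previous = [False] + states[:-1]
    return sum(1 for prev, cur in zip(previous, states) if cur and not prev)


voiced = simulate_phonation(30)
no_airflow = simulate_phonation(30, pressure_gain=0.0)
print(count_openings(voiced), count_openings(no_airflow))
```

With no pressure supplied from the lungs (`pressure_gain=0.0`) the toy folds never open, which mirrors the requirement that a sufficient sub-glottal pressure be maintained.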
Different laryngeal adjustments affect the way that the vocal folds vibrate and can
result in different voice qualities, some of which are important linguistically in some
languages.
ARTICULATION
When sound is produced at the larynx, that sound can be modified by altering the
shape of the vocal tract above the larynx (supralaryngeal or supraglottal). The shape can be
changed by opening or closing the velum (which opens or closes the nasal cavity connection
into the oropharynx), by moving the tongue or by moving the lips or the jaw.

Fig 4.1 - The major vocal tract articulators

Distinction Between Consonants and Vowels
The distinction between vowels and consonants is based on three main criteria:-
1. physiological: airflow / constriction
2. acoustic: prominence
3. phonological: syllabicity
Sometimes, it is necessary to rely on two or three of these criteria to decide
whether a sound is a vowel or a consonant.
Physiological Distinction
In general, consonants can be said to have a greater degree of constriction than
vowels. This is obviously the case for oral and nasal stops, fricatives and affricates. The case
for approximants is not so clear-cut as the semi-vowels /j/ and /w/ are very often
indistinguishable from vowels in terms of their constriction.
Acoustic Distinction
In general, consonants can be said to be less prominent than vowels. This is
usually manifested by vowels being more intense than the consonants that surround them.
Sometimes, certain consonants can have a greater total intensity than adjacent vowels but
vowels are almost always more intense at low frequencies than adjacent consonants.
Phonological Distinction
Syllables usually consist of a vowel surrounded optionally by a number of
consonants. A single vowel forms the prominent nucleus of each syllable. There is only one
peak of prominence per syllable and this is nearly always a vowel. The consonants form the
less prominent valleys between the vowel peaks. This tidy picture is disturbed by the
existence of syllabic consonants. Syllabic consonants form the nucleus of a syllable that does
not contain a vowel. In English, syllabic consonants occur when an approximant or a nasal
stop follows a homorganic (same place of articulation) oral stop (or occasionally a fricative)
in words such as "bottle" /btl / or "button" /btn /.
The semi-vowels in English play the same phonological role as the other
consonants even though they are vowel-like in many ways. The semi-vowels are found in
syllable positions where stops, fricatives, etc. are found (eg. "pay", "may", and "say" versus
"way").
Classifying Consonants
Most English consonants can be classified using three articulatory parameters:
Voicing: vibration or lack of vibration of the vocal folds.
Place of Articulation: the point at which the air stream is most restricted.
Manner of Articulation: what happens to the moving column of air.
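As a sketch of how the three parameters combine into a consonant label, the example below stores a handful of English consonants with their voicing, place and manner. The class name, the helper function and the choice of sample consonants are illustrative assumptions. The place and manner labels follow the terminology used in these notes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Consonant:
    symbol: str
    voiced: bool
    place: str
    manner: str

    def label(self):
        """Build the conventional three-part description."""
        voicing = "voiced" if self.voiced else "voiceless"
        return f"{voicing} {self.place} {self.manner}"


# A small illustrative sample, not a full inventory.
ENGLISH_SAMPLE = [
    Consonant("p", False, "bilabial", "oral stop"),
    Consonant("b", True, "bilabial", "oral stop"),
    Consonant("m", True, "bilabial", "nasal stop"),
    Consonant("s", False, "alveolar", "fricative"),
    Consonant("z", True, "alveolar", "fricative"),
]


def find(voiced, place, manner):
    """Return the symbols in the sample that match all three parameters."""
    return [c.symbol for c in ENGLISH_SAMPLE
            if (c.voiced, c.place, c.manner) == (voiced, place, manner)]


for consonant in ENGLISH_SAMPLE:
    print(consonant.symbol, "=", consonant.label())
```

Changing only the voicing value separates /p/ from /b/, and changing only the manner separates /b/ from /m/, which is why all three parameters are needed.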

Stricture is the extent to which the oral tract is constricted. The following diagram
is ordered with the greatest "degree" (or "rank order") of stricture at the top and the least
degree of stricture at the bottom. The greatest degree of stricture is the "stop" with complete
closure or constriction of the oral cavity at some point.
Note that stricture specifically relates to oral cavity constriction and ignores the
state of the nasal cavity (ie. whether the velum is open or closed).

Stricture types: Degree of closure as a function of time.
Stop and Tap Stricture
Stop stricture is complete closure followed by release. At the release of an oral
stop there is a brief burst of noise which may be followed by a period of aspiration.
Note that tap stricture is really a special case of stop stricture. A tap is an
extremely brief stop.
Trill Stricture
A trill consists of a series of taps interspersed with narrow openings of a similar
cross-sectional area to a fricative.
Fricative Stricture
Fricative stricture consists of a very narrow opening that has a small enough
cross-sectional area to cause the air to flow turbulently. Turbulent air flow generates random
or aperiodic sound that characterises fricatives.
Approximant Stricture
Approximant stricture consists of an opening with a greater cross-sectional area
than a fricative but the opening is narrower than that of a vowel. The opening is greater than
that which would produce turbulent air flow and aperiodic noise.
Resonant Stricture
This is stricture typical of vowels. Semi-vowels are also often produced with
resonant stricture.
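The rank order of stricture can be sketched as an ordered enumeration. This assumes that the order in which the stricture types are presented above (stop, trill, fricative, approximant, resonant) matches the rank order of the diagram. The numeric values are arbitrary and only encode the ordering. Tap is treated as a brief stop, as in the text, and has no separate rank here.

```python
from enum import IntEnum


class Stricture(IntEnum):
    """Larger value means a greater degree of oral constriction."""
    RESONANT = 1
    APPROXIMANT = 2
    FRICATIVE = 3
    TRILL = 4
    STOP = 5  # a tap is treated as an extremely brief stop


def more_constricted(a, b):
    """Return whichever stricture type has the greater degree of closure."""
    return max(a, b)


# List the types from greatest to least stricture.
ranked = sorted(Stricture, reverse=True)
print([s.name.lower() for s in ranked])
```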
Manner of Articulation
Robert Mannell

CONSONANT MANNER OF ARTICULATION
There is considerable variation in the names applied to manners of articulation in
the literature. In some cases different names are applied to the same manner of articulation,
whilst in other cases labels divide up consonants in different ways.
In the present course we will mostly use the following labels for manner of
articulation:-
1) Oral Stops
Oral stops have stop stricture and have a closed velum (ie. no nasal airflow). Oral
stops are sometimes referred to as "plosives" or simply as "stops". Be warned that in the
literature the term "stop" can refer specifically to oral stops, to oral stops and nasal stops
collectively, or to stop stricture.
2) Nasal Stops
Nasal stops have stop stricture and have an open velum (ie. nasal airflow and
nasal resonance). Nasal stops are very often referred to simply as "nasals".
4) Fricatives
Fricatives are consonants with fricative stricture. Many systems include central
and lateral fricatives in the same manner category (but the IPA Pulmonic Consonant chart and
the chart below separate them). In most of the course notes for this subject the central and
lateral fricatives are included in a single manner category. Fricatives are sometimes referred
to as "spirants" but this term is now considered obsolete.
The strong fricatives [s z ʃ ʒ] are often termed "sibilant" fricatives.
5) Affricates
Affricates are commonly described as a complex combination of stop plus
fricative. Affricates can also be considered to represent one extreme end of a continuum of
stop aspiration. See the topic "Complex Articulations: Affrication" for more information. In
this course we will treat affricates as a manner of articulation because this is the customary
way of classifying /tʃ/ and /dʒ/ in English.
6) Approximants
Approximants are consonants with approximant stricture, although some
approximants also commonly display resonant stricture. It is very easy to become confused
about the terminology used in the literature when referring to this class of consonants. Very
often approximants are divided into the following two sub-classes:-
1. liquids (e.g. English, [ɹ] and [l])
2. semi-vowels (e.g. English, [w] and [j]) - also known as "glides"
When this system is used, liquids are effectively those approximants that are not
classified as semi-vowels. Semi-vowels are those consonants that are most like vowels in their
acoustic and articulatory characteristics and the semi-vowels often exhibit resonant stricture.
Very often semi-vowels are only distinguishable from vowels using phonological criteria (see
the topic "Distinction Between Consonants and Vowels" for details on the phonological
distinction between vowels and consonants).
The division of approximants into liquids and semi-vowels is of particular
relevance in this course to the topic "Distinctive Features", where the feature set is
different for liquids and semi-vowels.
7) Rhotics
Sometimes this further class of consonants is defined, but it is not strictly a
manner of articulation. The rhotic sounds are the so-called r-like sounds and include the
alveolar and retroflex approximants and the alveolar and uvular trills. In this course the term
"rhotic" is used when dealing with the consonants of Australian Aboriginal languages (see the
topic "The Phonetics and Phonology of Australian Aboriginal Languages"). In many
Australian languages there are two consonants in the rhotic class, the alveolar trill [r] and the
alveolar or post-alveolar approximant [ɹ]. The term "rhotic" is also used when referring to
the "rhotic" (eg. American) and "non-rhotic" (eg. Australian) dialects of English (see the topic
"The vowel systems of four English dialects: Centring Diphthongs and Non-rhotic Dialects
of English" for more information).
8) Obstruents versus Sonorants
Sometimes you will see consonants classified as "obstruents" or "sonorants".
Obstruents include the oral stops, the affricates and the fricatives. Sonorants include the nasal
stops, approximants and the vowels. For more information on these classes of consonants see
the topic "Distinctive Features".
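The obstruent and sonorant grouping given above amounts to a simple membership test on the manner label. The sketch below uses the manner names from these notes. The function name is a hypothetical choice, and any manner not listed in the paragraph above is rejected rather than guessed.

```python
OBSTRUENT_MANNERS = {"oral stop", "affricate", "fricative"}
SONORANT_MANNERS = {"nasal stop", "approximant", "vowel"}


def sonority_class(manner):
    """Classify a manner label as 'obstruent' or 'sonorant'."""
    if manner in OBSTRUENT_MANNERS:
        return "obstruent"
    if manner in SONORANT_MANNERS:
        return "sonorant"
    # Manners not mentioned in the text are deliberately left unclassified.
    raise ValueError(f"manner not covered by this sketch: {manner!r}")


for manner in ["oral stop", "nasal stop", "fricative", "approximant"]:
    print(manner, "->", sonority_class(manner))
```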
DEFINING MANNER OF ARTICULATION IN TERMS OF LATERALITY,
NASALITY AND STRICTURE
Manner of articulation can be described in terms
of Laterality, Nasality and Stricture. The following diagram shows how the various
manners of articulation can be defined in terms of their laterality, nasality and stricture
features.

Relationship between Manner of Articulation and laterality, nasality and stricture.
Note that there can be no lateral (oral or nasal) stops; lateral requires the air to be
directed around the sides of the tongue, stop requires the air to be totally obstructed in the
mouth. The features are therefore incompatible.
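A minimal sketch of this feature-based definition follows. It covers only the combinations that the text states explicitly: stop stricture with a closed or open velum, central and lateral fricatives and approximants, and the incompatibility of lateral with stop. Combinations that the text does not describe return `None` rather than a guessed label. The function name and string labels are assumptions made for illustration.

```python
def manner_of_articulation(stricture, nasal=False, lateral=False):
    """Map (stricture, nasality, laterality) to a manner label."""
    if stricture == "stop":
        if lateral:
            # Lateral needs air to pass around the sides of the tongue,
            # while stop needs the air to be totally obstructed in the mouth.
            raise ValueError("lateral and stop are incompatible features")
        # Open velum gives a nasal stop, closed velum gives an oral stop.
        return "nasal stop" if nasal else "oral stop"
    if stricture in ("fricative", "approximant"):
        if nasal:
            return None  # not described in the text
        return f"lateral {stricture}" if lateral else f"central {stricture}"
    return None  # other stricture types are outside this sketch


print(manner_of_articulation("stop"))
print(manner_of_articulation("stop", nasal=True))
print(manner_of_articulation("fricative", lateral=True))
```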

Place of articulation is defined in terms of the articulators involved in the
speech gesture. It is common to refer to a speech gesture in terms of an active articulator and a
passive articulator.
ACTIVE ARTICULATORS
An active articulator is the articulator that does all or most of the moving during a
speech gesture. The active articulator is usually the lower lip or some part of the tongue.
These active articulators are attached to the jaw which is relatively free to move when
compared to parts of the vocal tract connected directly to the greater mass of the skull.
PASSIVE ARTICULATORS
A passive articulator is the articulator that makes little or no movement during a
speech gesture. The active articulator moves towards the relatively immobile passive
articulator. Passive articulators are often directly connected to the skull. Passive articulators
include the upper lip, the upper teeth, the various parts of the upper surface of the oral cavity,
and the back wall of the pharynx.
NAMING PLACE OF ARTICULATION
The place of articulation of a consonant is generally named for
the passive articulator. Sometimes the active articulator is also explicitly included in the name
of a place of articulation by use of the prefixes "apico-" and "lamino-".
ILLUSTRATIONS OF PLACE OF ARTICULATION IN ENGLISH
The following links lead to diagrams that illustrate place of articulation in
English. These diagrams are applicable to most dialects of English. The possible exception is
the diagram for /r/ which may be articulated differently in some dialects of English.
1. Oral Stop Articulation
2. Nasal Stop Articulation
3. Fricative Articulation
4. Approximant Articulation
TABLE OF POSSIBLE AND IMPOSSIBLE ARTICULATIONS
The following table makes a distinction between articulations that are actually
used contrastively in the world's languages, articulations that are not used but are possible,
and articulations that are impossible. In some cases, articulations marked with "***" are
actually physically impossible and in some cases "***" marks articulations that are too
difficult to be considered serious possibilities for linguistic use.



Passive             Active Articulator
Articulator         Lower Lip     Tongue Tip        Tongue Blade       Front of Tongue   Back of Tongue   Root of Tongue   Vocal Folds

Upper Lip           bilabial      ---               ---                ***               ***              ***              ***
Upper Front Teeth   labio-dental  (apico-)dental    (lamino-)dental    ---               ***              ***              ***
Alveolar Ridge      ---           (apico-)alveolar  (lamino-)alveolar  ---               ***              ***              ***
Hard Palate         ***           retroflex         palato-alveolar    palatal           ***              ***              ***
Soft Palate         ***           ***               ***                ---               velar            ***              ***
Uvula               ***           ***               ***                ***               uvular           ***              ***
Pharynx Wall        ***           ***               ***                ***               ***              pharyngeal       ***
Vocal Folds         ***           ***               ***                ***               ***              ***              glottal

In the above table:
*** means not a possible articulation
--- means not found in any language (so far)
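The table can also be treated as a lookup keyed on the passive and active articulators. The sketch below transcribes the table cells. The constant and function names are hypothetical, and the optional prefixes "(apico-)" and "(lamino-)" are written out in full so that the entries are distinct.

```python
ACTIVE = ["lower lip", "tongue tip", "tongue blade", "front of tongue",
          "back of tongue", "root of tongue", "vocal folds"]

X = "***"  # not a possible articulation
U = "---"  # not found in any language (so far)

# One row per passive articulator, cells in the same order as ACTIVE.
ROWS = {
    "upper lip":         ["bilabial", U, U, X, X, X, X],
    "upper front teeth": ["labio-dental", "apico-dental", "lamino-dental", U, X, X, X],
    "alveolar ridge":    [U, "apico-alveolar", "lamino-alveolar", U, X, X, X],
    "hard palate":       [X, "retroflex", "palato-alveolar", "palatal", X, X, X],
    "soft palate":       [X, X, X, U, "velar", X, X],
    "uvula":             [X, X, X, X, "uvular", X, X],
    "pharynx wall":      [X, X, X, X, X, "pharyngeal", X],
    "vocal folds":       [X, X, X, X, X, X, "glottal"],
}


def lookup(passive, active):
    """Return the table cell for a passive/active articulator pair."""
    return ROWS[passive][ACTIVE.index(active)]


def named_places(passive):
    """Return every named place of articulation that uses this passive articulator."""
    return [cell for cell in ROWS[passive] if cell not in (X, U)]


print(lookup("soft palate", "back of tongue"))
print(named_places("upper front teeth"))
```

The second call returns three names for a single passive articulator, which shows why naming the passive articulator alone does not always identify a place of articulation.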
From the above table, it can be seen that places of articulation are completely
specified by both the active and the passive articulator. Some common articulatory
distinctions are not completely captured by specification of the passive articulator alone.
For example:-
Labiodental articulations cannot be fully specified by just the passive articulator (front
upper teeth) as this would fail to distinguish such articulations from dentals.
Dentals can be either apico-dentals or lamino-dentals (and in some languages these
can contrast). It is essential that the active articulator is specified to separate them.
Note that, with the exception of the lower lip and the vocal folds, the majority of
active articulators are different parts of the tongue. Refer to this figure from lecture 1 for the
location of these different parts of the tongue.