
Chapter 2: Describing Speech Sounds

Preliminaries
Different languages have different speech sounds. Anyone who has had any experience
with a second language can probably tell you that one of the most immediate clues to this difference is
the general impression of the way the language sounds. Even if we use English as a comparison point, it
is fairly obvious that:

o Some languages have sounds that English lacks
o Some languages lack sounds that English has
o Sometimes a sound in English may be similar in a different language, but pronounced in a slightly different way

Different languages have different sounds, different types of syllables, and different speech
rhythms. Thus, when describing or analyzing the speech sounds of a different language, we cannot rely
on the phonology of English to guide us; we have to become trained as phoneticians, listening beyond
the constraints that English places on us.
Likewise, we cannot rely on the Roman alphabet to capture all of the possible speech sounds
found in languages. The English spelling system is inadequate, oftentimes reflecting earlier historical
stages in the language, and several scholars have called for a spelling reform. The biggest problem for
doing phonology is that there is not always a one-to-one correspondence between sounds and letters,
and there are numerous inconsistencies in how sounds are represented in spelling. For instance, the
single letter <c> can be used to represent many different sounds (where we represent letters or
‘graphemes’ in angled brackets). Notice that the <c> in <cat> is made with the back of the tongue
stopping the airflow in the mouth, while the <c> in <cell> is made with the front of the tongue, and with
a rapid-moving airstream. Here we have a one-to-many correspondence between a symbol and sounds.
We can take the case of <c> again, and this time compare it with <k>. These are two distinct letters in
the Roman alphabet, and both are used to represent speech sounds in English. However, both letters
can be used to represent the same sound. Notice that <cat> and <keep> both start with the same
sound, though not with the same letter. This illustrates a many-to-one correspondence between
symbols and sounds. These inconsistencies in spelling/pronunciation have led some humorists to create
what are known as ghoti words (often attributed to George Bernard Shaw). The word ghoti was
designed to be pronounced as ‘fish’, but with the intention of using the most unintuitive uses of spelling
possible. So <gh> can be pronounced as [f] in the word ‘laugh’; <o> can be pronounced as [ɪ] as in
‘women’, and <ti> can be pronounced as [ʃ] in the word ‘invention’. Shaw and others used constructed
examples like these to point out the fact that the English spelling system is not a transparent one. This is
also the complaint of many second-language learners of English who attempt to learn all of the rules
(and numerous exceptions) to the spelling system.
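The one-to-many and many-to-one mappings described above can be made concrete with a small sketch. The grapheme-to-sound pairs below are toy illustrations drawn from the examples in this section, not a model of English spelling as a whole:

```python
# Toy illustration of English spelling inconsistency.
# One-to-many: the single grapheme <c> maps to different sounds.
c_examples = {
    "cat": "k",   # <c> pronounced [k]
    "cell": "s",  # <c> pronounced [s]
}

# Many-to-one: different graphemes map to the same sound [k].
k_graphemes = {"cat": "c", "keep": "k"}

# The 'ghoti' joke: each grapheme is read with its most
# unintuitive value attested elsewhere in English spelling.
ghoti = [
    ("gh", "f"),  # as in 'laugh'
    ("o", "ɪ"),   # as in 'women'
    ("ti", "ʃ"),  # as in 'invention'
]
pronunciation = "".join(sound for _, sound in ghoti)
print(pronunciation)  # fɪʃ – i.e. 'fish'
```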
So, it’s obvious that English spelling isn’t an ideal system to use. But why not just use the Roman
alphabet, and be consistent with spellings? It’s not so easy: this would only be effective for a language
with 21 consonants and 5 vowels (or fewer). Any more and we would run out of symbols, as the
alphabet has only 26 in total. This is exactly the case in English, which has 24 consonants and roughly 12
vowels (depending on the dialect – General American has 12, for instance, while Californian English and
other dialects have only 11), not counting diphthongs. In many cases we rely on digraphs, combinations
of symbols that express the sound in question, such as <ch> for the sound at the beginning of cheese.
While a handy orthographic device, digraphs don’t get around the problems mentioned above, and they
run into problems of their own when sequences of
sounds like <c> (or whatever sound <c> stands for) and <h> are allowed in a language – then the
problem is disambiguating whether the sequence <c>-<h> is a digraph meant to represent a single
sound, or instead a sequence of two separate sounds.
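The digraph-disambiguation problem can be demonstrated with a short sketch. Suppose an orthography uses <sh> as a digraph while also allowing <s> and <h> as separate letters; a naive longest-match reader then misparses a word like 'mishap', where <s> and <h> belong to two separate sounds. The grapheme inventory here is a hypothetical toy, not a real orthography:

```python
def tokenize(word, graphemes):
    """Greedy longest-match segmentation of a spelled word
    into graphemes (single letters or digraphs)."""
    tokens, i = [], 0
    max_len = max(len(g) for g in graphemes)
    while i < len(word):
        for size in range(max_len, 0, -1):  # try longest match first
            chunk = word[i:i + size]
            if chunk in graphemes:
                tokens.append(chunk)
                i += size
                break
        else:
            raise ValueError(f"no grapheme matches at {word[i:]!r}")
    return tokens

inventory = {"sh", "m", "i", "s", "h", "a", "p"}
print(tokenize("mishap", inventory))
# The reader cannot tell that <s>+<h> is here a sequence of two
# separate sounds, so it wrongly treats it as the digraph <sh>.
```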
For all of these reasons, we are left with the desire to use an alphabet that will be maximally
descriptive and will cover all of the speech sounds possible in languages (i.e. it will be universal). The
alphabet we use is the International Phonetic Alphabet (the IPA; cf. the International Phonetic
Association 1999, and the website of the IPA, http://www.langsci.ucl.ac.uk/ipa/). It is the alphabet that
will be used throughout this course, and it is the one that is standardly used in the field of linguistics. For
the purposes of this course, we will use the IPA as consistently as possible throughout (though if that is
for some reason not possible, we will point that out). It is a recognized standard, provides for a
transparent relationship between sound and symbol, and allows us to compare the sounds of two
different languages. If, for instance, we come across an IPA symbol like [j] in a grammar of one
language, and it shows up in the grammar of another language as well, we know that this specific sound
in these two languages is roughly equivalent (and something like the initial sound in English yes).
Without a community standard, we are left to decipher what the phonetic values are from writing
system to writing system.
Other systems certainly exist. Within the scientific community (primarily anthropologists and
linguists of the last century), there has been common use of another phonetic alphabet: the Americanist
alphabet (American Anthropological Association 1916). This alphabet also strives to maximize efficiency
by introducing a one-to-one correspondence between symbols and sounds, but uses different symbols.
For instance, familiar symbols such as š, č belong to the Americanist system. It is useful to know about
the existence of this alphabet, as it is still commonly used (or certain symbols are commonly used)
throughout the world; however, the number of symbols in this alphabet is not as complete as the IPA,
and so we will not be using it here.
It is also worth pointing out that local orthographies are oftentimes used. Are these bad?
Absolutely not! Orthographies can be helpful for communities (e.g. in terms of literacy, typing on a
computer, bilingualism, language revitalization, etc.). If one were to write teaching materials or a
pedagogical grammar of a language, then it would probably be advisable to use the local orthography;
however, whenever a local orthography is used, a phonetic description (with IPA equivalents) should be
provided so that other readers will know what the phonetic value of a particular sound is.
Since we now have the tools to describe speech sounds, we will next explore how speech is
produced. After discussing the mechanisms behind the production of speech, we will then explore
further the IPA and how it is organized. It is sometimes difficult to try and understand how a whole
system of descriptions gets put into actual sound. Since it is often easier to understand what one is
dealing with if one can actually hear the sound produced, the late Peter Ladefoged of UCLA has put
materials from his books “A Course in Phonetics” and “Vowels and Consonants” online, along with sound
files (these materials are found at the UCLA phonetics lab’s website: http://www.phonetics.ucla.edu/). It
is highly recommended that the reader explore these sounds while learning about them here.

Anatomy and physiology of speech


In order to understand the descriptive notation for speech sounds, we must first understand where and
how speech sounds are produced. For this, we need to become familiar with the primary anatomical
apparatus for speech production: the vocal tract. To define the vocal tract, we have to think about what
is necessary for producing a speech sound. The most intuitive response to this is airflow; without a
stream of air moving through our vocal tract, it would be very difficult to produce audible sounds for
speech. So where is the most appropriate place to begin describing speech? Respiration is a good start.
Respiration is the technical term for breathing, and the primary biological function of respiration is to
keep us alive. Respiration also has a secondary function, though: it provides the airflow that is necessary
for speech. Take a moment to test this necessity: exhale all of your breath, then attempt to utter a
sentence at normal speaking rate. This is difficult, if not impossible, to accomplish. Speech requires a
constant supply of airflow.
The respiratory tract is actually divided into two separate tracts: the upper respiratory tract,
and the lower respiratory tract. The lower respiratory tract consists of the lungs, the bronchi, the
trachea, and the larynx. The upper respiratory tract consists of the pharyngeal cavity, the oral cavity,
and the nasal cavity.
Typically everything from the larynx to the lips is considered the vocal tract (though of course
humans do more than just talk with this tract; we eat, breathe, etc.). The anatomical/physiological
configuration of the vocal tract allows for speech sounds to be created and manipulated in different
ways, including resonance (or filtering), articulation, and turbulence. We’ll explore these concepts as we
progress through the chapter.

The Larynx
Moving up from the lungs, the first anatomical structure relevant for speech is the larynx. The larynx is
a framework that is made up of 9 cartilages and 1 bone; it is bounded below by the trachea, and above
by the hyoid bone. The larynx is the structure that houses the vocal folds (often commonly referred to
as the vocal ‘cords’). The primary biological function of the larynx is to prevent debris from entering the
trachea and/or lungs (i.e. it prevents choking). In a sense, the larynx is a safety valve that blocks foreign
particles from making their way into the lungs, and its job is to expel any foreign substances that enter
the airway (the result being coughing). It also has the added job of sealing air inside the lungs, an action
known as “bearing down”; a good illustration comes in holding one’s breath, or trying to lift something
very heavy. When we perform these actions, the automatic response is to close off the airway.
The secondary function of the larynx is that it provides the source for speech sounds. What is meant
by ‘source’ will be expanded on below, but for now this means it gives the voice its quality. An
important structure in this regard is the glottis. More accurately, the glottis is not a structure at all, but
the space between the vocal folds. Since the vocal folds are movable structures, this space can vary.
The vocal folds themselves are structures of muscle and ligament (more precisely, the vocalis muscle,
the thyromuscularis muscle, and the vocal ligament). When the vocal folds are pulled apart (abducted),
this allows air to flow unimpeded through the glottis, resulting in a voiceless sound. When the vocal
folds are pulled closely together (adducted) and air is flowing through the glottis, this results in voiced
sounds. The combination of the vocal folds being adducted and thus blocking airflow, then being blown
apart by the increased pressure below them, then being adducted again through the Bernoulli effect
(with the cycle repeating itself) is what is known as the myoelastic aerodynamic theory of phonation.
This theory stands in contrast to the outdated idea that voicing results from fine-grained control of the
vocal folds, with nerve firings alone (without the help of the airstream) setting the vocal folds adducting
and abducting, adducting and abducting, etc.
The difference in voicelessness and voicing with respect to the configuration of the vocal folds is
schematized here (as seen from above the glottis looking down into the larynx):

[Figure: schematic views of the glottis from above – vocal folds abducted (voicelessness) vs. adducted (voicing)]

The larynx is also important to other aspects of speech sounds (specifically to voice quality, voice
pitch, and airflow mechanisms), which we will discuss in detail in later chapters.

Alaryngeal speech
What happens if someone has no larynx with which to create the source for speech sounds? This
may sound like a bizarre question, but it is the state of affairs that results when a laryngectomy is
performed, which involves the surgical removal of the larynx. This procedure essentially removes
the source of speech. There are two common methods for creating alaryngeal speech. One is to
employ esophageal speech, which uses air from the mouth which is pushed into the esophagus,
then released (with oscillation of esophageal structures instead of the larynx). Another is to use an
artificial device called an electrolarynx. This device creates vibrations, which when held against the
throat serve as “voicing” replacements. With an electrolarynx used as a voice source, the result is
that all speech using this device is continuously voiced (i.e. there are no voiceless sounds).

Source-Filter model of speech


Our approach to phonology assumes a source-filter model of speech. This breaks the speech
production process into two parts: a source, and a filter. The source is the airflow that is generated at
the lungs, which moves through the vocal tract. Since the larynx is the entryway for air flowing from
the lungs into the vocal tract, air passing through will create voiceless or voiced sounds (or even
others, as we’ll see later in the course). The filter is the vocal tract itself. Think of the various
configurations or shapes that the vocal tract can make – all of the different tongue positions, lip
positions, etc. It is in this way that the vocal tract filters the source airflow – it shapes this airflow to
give it a distinctive acoustic quality. A good analogy is a trombone: the source of the trombone’s
sound comes from the vibrating lips of the trombone player, and the changing shape of the
instrument (by manipulating the slide, essentially making the instrument longer or shorter) changes
the quality of the sound that is produced.
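The filtering idea can be illustrated with a standard textbook calculation (not from this chapter): model the vocal tract as a uniform tube closed at the glottis and open at the lips, whose resonances fall at odd multiples of c/4L. The tract length and speed of sound below are typical assumed values, not measurements:

```python
def tube_resonances(length_cm, n_formants=3, c_cm_per_s=35000):
    """Resonance frequencies (Hz) of a uniform tube closed at one
    end (the glottis) and open at the other (the lips):
    F_n = (2n - 1) * c / (4 * L)."""
    return [(2 * n - 1) * c_cm_per_s / (4 * length_cm)
            for n in range(1, n_formants + 1)]

# A typical adult male vocal tract is about 17.5 cm long; the tube
# model predicts resonances near 500, 1500, and 2500 Hz for a
# neutral (schwa-like) tract shape.
print(tube_resonances(17.5))  # [500.0, 1500.0, 2500.0]
```

Changing the tube's shape (as the tongue and lips do) shifts these resonances, which is exactly what "filtering the source" amounts to.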

Place of Articulation
Since we have discussed some aspects of the source of speech, it is now relevant to talk about the
filter: the vocal tract above the glottis. We can start with where in the vocal tract a speech sound is
produced (or articulated). This constitutes one descriptive parameter in phonetics/phonology: Place
of articulation. Articulators are parts of the vocal tract that form constrictions that result in the
sounds we use for speech. There are two different types of articulator: active articulators and
passive articulators. An active articulator is something that actually moves and touches (or
approximates) some other structure. For instance, the tongue is an active articulator, as it moves
around in the oral cavity to form constrictions. The upper teeth, or the palate, on the other hand, are
passive articulators: they are static structures that can be used to form constrictions, but they do not
themselves move.
Let’s start with the most anterior articulators in the vocal tract: the lips. Consonant sounds
made with the lips are generally called labial. The two ways of articulating labial sounds are with
both lips (bilabial), and by pressing the lower lip against the upper teeth (labiodental). Both of these
types of sound occur in English: bilabials include the first sound in pat [p] and mat [m], while
labiodentals occur as the first sounds in fat [f] and vat [v] (phonetic sounds will be placed in square
brackets; the symbols used are those of the IPA).
This about exhausts the possible sounds made with the lips (or a single lip in conjunction
with some other articulator).

Summary of Labial places of articulation


▪ Bilabial – made with both lips
▪ Labiodental – made with the lower lip and the upper teeth

The next most posterior articulator is the tongue. The tongue is a collection of muscles called a muscular
hydrostat. This means that there is no bone structure that supports the tongue, but rather muscles that
work together to support movements. This also means that the tongue preserves its volume. An oft-
cited analogy is a water balloon: if you squeeze a water balloon, it does not change volume; the water
merely moves into different areas. The tongue works in the same way. However, the tongue
does not act as one monolithic articulator; instead it seems to act as two distinct components (and
possibly three, as we will see below). Sounds produced with the “front” part of the tongue are referred
to as coronals. The class of coronal sounds includes those made with the blade of the tongue. The blade
of the tongue constitutes the portion that you can comfortably grab with your fingers, which is
approximately the first 15-20mm (Keating 1991). The absolute tip of the tongue is called the apex.
Sounds made with the apex are called apical, and those made with the blade only are called laminal.
Coronal sounds are by far the most common in languages, and all languages have at least one coronal
sound in their inventory (Maddieson 1984).
Now, if you move your tongue along the roof of your mouth and all the way down to your teeth,
you can tell that there are many different places that can be used in producing sound. The entire class of
coronals is typically broken down into the following places of articulation, going from anterior to
posterior in the oral tract: dental, alveolar, postalveolar, retroflex, and palatal. We can explore each
of these briefly in turn.

Dental is for the most part the most anterior place of articulation within the coronals. Dental
sounds can range from being interdental (with the tongue between both sets of teeth) to being truly
dental (with the tongue tip placed behind the upper dentition). True dental sounds in English include
voiceless [θ] and voiced [ð], the first sounds in ‘thin’ and ‘this’, respectively. Depending on the
dialect/idiolect of English, these sounds can be interdental, dental, or vary between both.
If we move slightly further back, we come to a noticeable ridge on the roof of the mouth. This is
the alveolar ridge. It stretches back from the upper dentition, and then sharply slopes upward. You can
feel this structure for yourself if you place your finger on the roof of your mouth (or simply move your
tongue along the roof). The upward sloping area is sometimes called the ‘corner’. Placing the tongue tip
against the alveolar ridge results in an alveolar sound, something which is extremely common across
languages (Maddieson 1984). In English this includes voiceless [t] and [s], as well as voiced [d] and [z]
and nasal [n].
Moving further back along the roof of the mouth we come to a long, extended surface called the
palate. Placing the tongue at the most anterior part of the palate, or just at the corner, produces
postalveolar sounds. Postalveolars are found at the beginning of English words such as ‘shirt’ [ʃ],
‘church’ [ʧ], and ‘judge’ [ʤ], and also at the end of the word ‘beige’ [ʒ].
Retroflexes are produced with the tongue tip (the apex) or the tongue blade curled back so that
the underside of the tip/blade forms a constriction with the roof of the mouth, typically at the alveolar
ridge, or possibly along the corner. Retroflex consonants are famously found in many of the languages of
India, though some English speakers’ production of [ɹ] involves a retroflex gesture.
This takes us to the most posterior coronal articulation, produced by the tongue blade touching
or approximating the palate. These sounds are termed palatal. English has one nasal palatal sound (in
onion) [ɲ], and a palatal consonant produced by approximating the tongue to this fixed position, the first
sound in ‘yellow’ [j].
There are also other coronal sounds available for speech, though these are either rarer than the five we
have talked about, or they are articulatorily similar to a subset of them. One extremely rare sound is the
linguolabial – made with the tip of the tongue touching the top lip. Linguolabials are found in Tangoa
(Oceanic; spoken in Vanuatu), which has a linguolabial nasal, voiceless plosive, and voiced
fricative [n̼, t̼, ð̼], respectively. Other coronal sounds include the alveolo-palatals, which are similar to
the postalveolars but articulated slightly further back, and potentially with different parts of the tongue
blade.

Summary of the major coronal places of articulation:


• Dental – made with the tongue blade at the upper dentition
• Alveolar – made with the tongue blade at the alveolar ridge
• Postalveolar – made with the tongue blade posterior to the alveolar ridge
• Retroflex – made with the tongue apex or underside of apex at alveolar ridge
• Palatal – made with the tongue blade at the hard palate

Dorsal sounds are made with the body of the tongue (the dorsum). There are two primary
places where dorsals are made: at the velum, and at the uvula. The entire palate extends back all the
way to the uvula; however, only part of this structure is bone. This is usually called the ‘hard palate’, or
just simply the palate. The portion where the bone ends and only tissue remains is termed the ‘soft
palate’, or more precisely, the velum. Sounds created by articulating the tongue body near or at the
velum are termed ‘velar’. By placing the tongue body all the way back to the uvula, this produces a
uvular sound. Velars are very common in languages; uvulars are less common.

Summary of the dorsal places of articulation:


• Velar – made with the tongue dorsum at the velum
• Uvular – made with the tongue dorsum at the uvula

One place of articulation that is typologically rare, but nonetheless worth discussing, is the
pharyngeal, sometimes called radical. The primary articulator here is the tongue root (termed the
radix), or possibly the constriction of the pharyngeal walls (recall that the ‘pharynx’ is actually just a
cavity, and not an articulator on its own). Because of the limited structure and space for articulation in
the pharyngeal cavity (note its smaller size compared to the oral cavity), there is a much smaller range of
consonants made at this place. Because of the limitations in the range of movement of the tongue root,
certain pharyngeal sounds are impossible; this includes closing off the pharynx completely, creating a
nasal pharyngeal, etc. The only possible pharyngeal sounds are fricatives (discussed below), and possibly
approximants (though not indicated on the IPA chart, as these are not contrastive with pharyngeal
fricatives – no language uses these sounds to signal a difference in meaning). Despite these limitations,
the actions of the tongue root can contribute toward the articulation of vowels.
Finally, while it is not recognized as a “place” phonologically, it is possible to make consonant
sounds at the glottis. This includes the first sound in ‘hat’ [h], and also the glottal stop, an extremely
common sound cross-linguistically, which is present as the sound between the vowels in ‘uh-oh’ [ʔ]. The
former requires the glottis to be pulled open while airflow rushes through, while the latter requires the
glottis to be closed. Like the pharyngeals, glottals are limited by the constraints that anatomy and
physiology impose on them.

Manner of Articulation
Where in the vocal tract a sound is made is not the same as how it is made. Thus we draw a distinction
between place of articulation and manner of articulation. Place refers to the where, manner refers to
the how. Manner of articulation refers to the type of airflow that is created by articulators. For instance,
[t] and [s] are made in the exact same place in the vocal tract with the exact same articulators: the
tongue blade (active) and the alveolar ridge (passive). Their difference lies in how these sounds are
created by using those articulators. [t] is made by touching the tongue tip to the alveolar ridge and
creating a blockage of airflow (called a stop, or plosive). [s], on the other hand, is created by moving the
tongue tip very close to the alveolar ridge, but not close enough to stop air from flowing. In fact, the
very small space between the tongue tip and the alveolar ridge is enough to shoot airflow through the
narrow channel in a very turbulent manner, creating the high-frequency noise characteristic of this
sound (a fricative).
We can class the different manners into two general batches of sounds: obstruents and
sonorants. These terms themselves don’t refer to a manner of articulation; rather, they are phonological
categories that are based loosely on the degree of stricture used in producing the sound. Thus, while
they are not used phonetically as descriptive labels, they are useful categorizing devices (both practically
speaking, and in terms of how sounds pattern phonologically). Obstruents include stops, fricatives, and
affricates; sonorants include nasals, liquids, approximants, and vowels. We’ll deal with each of these in
turn.
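The obstruent/sonorant split just introduced can be encoded as a small lookup. The table below is a toy subset of English consonants drawn from this chapter, not a full inventory:

```python
# (place, manner) for a few English consonants described in this chapter.
CONSONANTS = {
    "p": ("bilabial", "stop"),
    "t": ("alveolar", "stop"),
    "s": ("alveolar", "fricative"),
    "ʧ": ("postalveolar", "affricate"),
    "m": ("bilabial", "nasal"),
    "n": ("alveolar", "nasal"),
    "l": ("alveolar", "lateral approximant"),
    "j": ("palatal", "approximant"),
}

# Obstruents are stops, fricatives, and affricates; the remaining
# manners here (nasals, approximants) belong to the sonorants.
OBSTRUENT_MANNERS = {"stop", "fricative", "affricate"}

def is_obstruent(symbol):
    """Classify a consonant as obstruent (True) or sonorant (False)
    based on its manner of articulation."""
    _, manner = CONSONANTS[symbol]
    return manner in OBSTRUENT_MANNERS

obstruents = [s for s in CONSONANTS if is_obstruent(s)]
print(obstruents)  # ['p', 't', 's', 'ʧ']
```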
Stops, or plosives, involve a complete blockage of airflow in the vocal tract. English has bilabial
stops [p, b], alveolar stops [t, d], and velar stops [k, g]. Many other languages have palatal stops [c, ɟ],
uvular stops [q, ɢ], etc. Because they create a complete blockage, stops are typically shorter in duration
than many other consonants. Voiced stops are also less common than voiceless stops, for reasons of
aerodynamics. Since a stop closes off the oral tract and doesn’t allow air to flow through it, then
pressure above the glottis and pressure below the glottis will quickly become the same. And since
voicing requires a constant flow of air across the glottis (see the discussion of the myoelastic
aerodynamic theory of phonation above), then voicing will cease shortly after the oral closure is made
(cf. Ohala & Riordan 1979).
A fricative is created in almost the same manner as a stop, but without full closure at the point
of articulation. Instead, there is a space small enough between the articulators to create a turbulent
(aperiodic) stream of airflow. This turbulence is perceived as high frequency noise. Thus, English
fricatives include [f], [θ], [ð], [s], [z], [ʃ] and [ʒ]. English is uncommon in having such a large number of
fricatives in its consonant inventory – most languages have far fewer fricatives.
Affricates are a special type of stop – one with a “noisy” release. Many accounts of affricates
consider them to be phonetically stop+fricative sequences, as the release period of the sound is
characterized by a high-frequency burst similar to fricatives. English has postalveolar affricates in church
and judge, which begin and end with [ʧ] and [ʤ], respectively. English is relatively deficient in its
inventory of affricates, compared to what is possible in other languages. For instance, Japanese has the
alveolar affricate [ʦ], which is very common cross-linguistically, and Tahltan (Athabaskan, British
Columbia, Canada) has interdental [tθ], alveolar [ʦ], postalveolar [ʧ], a laterally-released affricate [tɬ],
plus aspirated and ejective versions of all of these. This makes 12 affricates total in this language! While
affricates are often conceptualized as stop+fricative sequences, there are differences between these
sequences and actual affricates. For example, in English there is a contrast between the stop + fricative
sequence (over the word boundary) ‘white shoes’ vs. the affricate in ‘why choose’.
This concludes the obstruents. Next we explore the sonorants. In contrast to obstruents,
sonorants are sounds that are produced with a relatively loose constriction in the vocal tract. Sonorants
include nasals, approximants (glides, laterals, and rhotics), and sounds with the most open degree of
stricture, vowels.
The velum is a unique structure in the vocal tract in that it is allowed movement – it raises and
lowers, acting like a hinge. The space between the uvula and the pharyngeal wall is called the
velopharyngeal port. When raised, the velum comes in contact with the posterior pharyngeal wall,
sealing the velopharyngeal port and preventing any air from escaping through the nasal cavity. When
the velum is lowered, this opens the velopharyngeal port, allowing air to freely pass through the nasal
cavity. This movement of air through the nasal cavity creates nasal sounds. If there is a complete
blockage of airflow in the oral tract, then a stop or plosive is created. However, if air is allowed to flow
through the nasal cavity, then a nasal stop is created. Nasal stops are found in English (and nearly all
languages of the world), and include bilabials in the first sound of more [m] and alveolars in the first
sound of nose [n]. There is also a velar nasal in English, though we don’t indicate this in our writing
system: the medial sound in the word hanger [ŋ]. Nasals can be made at other places of articulation, such
as a labiodental [ɱ] in the fast-speech pronunciation of emphatic, or the uvular nasal [ɴ] found in the
Inuit languages. Vowels can also be nasalized, as we will discuss below.
Approximants have a degree of stricture that is very wide. This class includes the palatal and labio-
velar glides [j] and [w] in English, as well as the voiced labial-palatal glide [ɥ] in French. Glides have the
widest degree of stricture for a consonant without crossing over into vowel territory. For instance, [j]
and [i] are articulated in roughly the same way, with [j] having a slightly smaller stricture. If you were to
open the stricture up any further, you would end up with [i]. The same goes for [w] and [u].
The term ‘approximant’ also applies to the phonological class of liquids, which includes the laterals
and the rhotics (generally the ‘l’ and ‘r’ sounds of languages, which come in a variety of types). Laterals
are produced with a constriction that blocks central airflow, and forces the air around the side of the
tongue. This can involve one side, or it can involve both. You can test this in English: articulate an [l] like
in lip and hold it; then suck in a breath of air. Where the tongue feels cold is where the air flows
laterally. For some speakers this will be only one side of the tongue, for others it will be the other, and
yet for others it will be both sides. Native languages of the northwest coast of North America are known
for being very rich in lateral sounds, including lateral approximants like [l], lateral fricatives such as [ɬ],
and lateral affricates such as [tɬ]. These sounds are articulated as alveolars, but other sounds can be
laterals, too, such as the palatal lateral [ʎ] found in Castilian Spanish, the velar lateral [ʟ], or the retroflex
lateral [ɭ]. In contrast to laterals, rhotics are produced with a centrally-directed airstream. There is a
wide range of rhotics, and they prove extremely difficult to classify in phonetic terms. However,
phonologically the rhotics form a class in that they pattern together across languages. The alveolar
approximant [ɹ] is the primary rhotic in English, though the alveolar flap [ɾ] found word-medially in fast-
speech butter and matted is also a type of rhotic sound that is common cross-linguistically.

Complex Consonants
In addition to the garden-variety sounds discussed above, other types of “complex” consonant are
possible. These sounds are produced with more than one articulation. This includes consonants like the
labial-velar glide in English [w], the first sound in war and witness. This sound is articulated with both
the tongue dorsum, and also with the lips; if you tried to articulate it without either of these
components, you would get a very non-English like sound. There are also doubly-articulated consonants,
such as the labial-velar stops found in some languages of West Africa, such as Yoruba [k͡p, ɡ͡b], where the
tie bar [ ͡ ] over the symbols indicates that they constitute one single sound, and not simply a sequence
of [k]+[p] or [ɡ]+[b]. In these languages, sounds like [k͡p] behave as single stops, and not sequences of two stops.

Describing Vowels
Describing vowels is a slightly different endeavor than describing consonants, primarily because there
are no hard-and-fast boundaries that separate one vowel from another. You can test this by articulating
the high front unrounded vowel [i] – the vowel sound in heat – and attempting to move to the low front
unrounded vowel [æ] in hat. It is possible to produce these two vowel sounds, but also all of the vowels
in between.
Generally, vowels are located on a space called the vowel quadrilateral. This space is roughly
equivalent to how vowels are perceived. The vocal tract can be conceptualized as a set of tubes –
one vertical tube (for the pharyngeal cavity), and another horizontal tube (for the oral cavity). These
tubes are manipulated in such a way as to change their shape – and thus, to change the acoustic quality
of the resulting sound. Recall our discussion of filtering above. Different vowel sounds require a slightly
different shape in these tubes. The primary way of changing tube shape is by moving the tongue, and
the acoustic effect of filtering in this way relates to the different resonances created by these different
tubes. These tubes/cavities have natural frequencies at which they resonate (much like a tuning fork):
they will resonate at some frequencies, but not at others. The tubes/cavities thus act as an
amplification device, enhancing certain of the frequencies at which the vocal folds vibrate; the
enhanced frequency bands are termed formants. Vowels are characterized by four formants (labelled
F1–F4), of which F1 and F2 are perhaps the most important for the production and discrimination of a
vowel's quality.
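The resonance idea can be made concrete with the classic quarter-wavelength approximation for a uniform tube closed at one end (the glottis) and open at the other (the lips). This simple model is not given in the text and the tract length used is an illustrative assumption, but it shows how resonant frequencies fall out of tube geometry:

```python
# Quarter-wavelength resonances of a uniform tube closed at the glottis
# and open at the lips: F_n = (2n - 1) * c / (4 * L).
# Illustrative values: speed of sound c ~ 35000 cm/s, tract length L ~ 17.5 cm.

def tube_formants(length_cm=17.5, c_cm_per_s=35000, n_formants=4):
    """Return the first n resonant frequencies (Hz) of the uniform-tube model."""
    return [(2 * n - 1) * c_cm_per_s / (4 * length_cm)
            for n in range(1, n_formants + 1)]

print(tube_formants())  # [500.0, 1500.0, 2500.0, 3500.0]
```

On these assumptions a neutral, schwa-like tract resonates near 500, 1500, 2500, and 3500 Hz; moving the tongue perturbs the tube shapes and shifts these formants away from the neutral values.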
The actual mechanism of changing tube shape (and thus changing the formants) is by moving
the tongue about in the oral tract. We can see in the illustration below how this is achieved: given a
fixed point on the tongue, that point will be highest and most anterior for the vowel [i]; it will be at its
lowest (yet still anterior) point at [a]. If the tongue is retracted, this yields the vowel quality [ɑ], and if it
is raised to its highest point while still retracted, this gives [u]. Notice that [u] is not as high as [i], for the
reason that the tongue is a hydrostat and cannot expand in volume, and also because the tongue is fixed
to the floor of the mouth by the frenulum linguae (the small flap of tissue on the underside of the
tongue, restricting its movements).

i                     u
  e                 o
    ɛ             ɔ
      a         ɑ

We can see exactly how the tongue position varies by observing these still photographs from an x-ray
film of the famous phonetician Daniel Jones (Jones 1972). The circles indicate a fixed point on the
tongue, while the x's indicate a fixed point on the maxilla, which, since it is static, provides a
non-moving reference point with which to compare the moving tongue point.

It is these articulatory configurations, and ultimately their acoustic/perceptual implementation, which
serve as the basis for the vowel quadrilateral. The vowel quadrilateral schematically places all vowel
sounds as they stand in relation to one another in these terms. The chart below illustrates how the F1
and F2 values of a vowel determine its placement on the chart.

[vowel quadrilateral plotted along the F2 and F1 dimensions]

This vowel quadrilateral serves as the basis for the IPA vowel chart. The dimensions that are used for
describing vowels, then, are (i) height, (ii) backness, and (iii) lip rounding. Height on the chart is inversely
correlated with F1 (the higher the F1 value, the lower the vowel), backness is correlated with F2 (the
higher the F2 value, the more front the vowel is), and lip rounding, while a gesture independent of the
tongue, still has the effect of lengthening the horizontal oral tract tube (when you round your lips, you
also protrude them, thus creating a longer “tube”), and so has a noticeable effect on F3. Here we are
using the term “height”, but on the IPA chart this dimension ranges from “close” to “open”.
Phoneticians will occasionally use “open” and “close” (as on the IPA chart), while phonologists will
primarily use “high” and “low” (as on the chart below); it’s important to keep this difference in mind.
Notice also that the term “backness” is being used here instead of “frontness”. This is a phonological
bias. Of course, we label vowels as “front” or “back” or “central”, depending on their relative F2 values;
however, we will see in later chapters that the primary concept along this dimension is “back”. Observe
how these values work for the vowels of English:

        Front        Central        Back

        i                           u
        ɪ                           ʊ
        e                           o
        ɛ            ə ʌ ɜ          ɔ
        æ
        a                           ɑ
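The correlations just described – F1 inversely tracking height, F2 tracking frontness – can be sketched as a toy classifier. The formant values and cutoffs below are illustrative round numbers assumed for this example (in the spirit of published averages for American English), not figures from this chapter:

```python
# Illustrative average formant values (Hz) for four American English vowels.
VOWELS = {
    "i": (270, 2290),   # high front
    "æ": (660, 1720),   # low front
    "u": (300, 870),    # high back
    "ɑ": (730, 1090),   # low back
}

def describe(symbol):
    """Classify a vowel's height (from F1) and backness (from F2)."""
    f1, f2 = VOWELS[symbol]
    # Higher F1 = lower vowel; higher F2 = fronter vowel. Cutoffs are arbitrary.
    height = "high" if f1 < 450 else "low" if f1 > 600 else "mid"
    backness = "front" if f2 > 1500 else "back" if f2 < 1200 else "central"
    return height, backness

print(describe("i"))  # ('high', 'front')
print(describe("ɑ"))  # ('low', 'back')
```

Real vowel spaces overlap across speakers, so fixed cutoffs like these are only a heuristic; the point is simply the direction of the two correlations.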

The late Peter Ladefoged composed the following list of words, in which the only sound
that differs from word to word is the vowel. The list illustrates the different monophthong
vowels found in English (we have arranged them roughly as they appear in the vowel
quadrilateral, and added the acronym of the Department of Housing and Urban Development – HUD, as
it is known in the US – to illustrate the central vowels).

[i] heed                        [u] who’d
[ɪ] hid                         [ʊ] hood
[ɛ] head     [ʌ, ə] HUD         [ɔ] hawed
[æ] had                         [ɑ] hod

In addition to these vowel qualities, there also exist what are termed diphthongs, which are
defined as movement from one vowel quality to another within a single vowel sound. Some
languages lack diphthongs entirely, and some have large numbers of them. English has several; the
vowels in (West Coast) American English boil [oi], paid [ei], side [ai] and close [oʊ] are all diphthongs.

Diacritics
In addition to these basic symbols for consonants and vowels, we also have a set of diacritics
available in the IPA. These diacritics play the role of adding a phonetic value to the symbol which
they are modifying. For instance, the IPA does not have a category strictly for dental sounds, so the
method of indicating dentality is to use the symbol for the alveolar and to complement this with a
diacritic meaning ‘dental’. Thus, [t] would be alveolar, while [t̪] would be dental. The same goes for
ejectives, where an alveolar ejective stop would use the symbol for the voiceless alveolar stop [t]
and add the ejective diacritic [’] to yield [t’]. The diacritics have their own sub-chart in the IPA.
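In Unicode IPA fonts, diacritics work just as described: the dental diacritic is a combining character appended after the base symbol, and the ejective apostrophe is a modifier letter. A brief sketch:

```python
import unicodedata

t = "t"
dental = t + "\u032A"     # t + COMBINING BRIDGE BELOW  -> [t̪]
ejective = t + "\u02BC"   # t + MODIFIER LETTER APOSTROPHE -> [tʼ]

print(dental, ejective)
print(unicodedata.name("\u032A"))  # COMBINING BRIDGE BELOW
print(unicodedata.name("\u02BC"))  # MODIFIER LETTER APOSTROPHE
```

Note that [t̪] is two code points rendered as one glyph, which is worth remembering when counting "characters" in a transcription.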

Speech Sciences and the IPA


Finally, for those interested or working in the speech sciences (including audiology and speech-
language pathology), the IPA also includes a set of symbols for disordered speech (found at their
website).

Presenting Consonants and Vowels


When presenting the full inventory of sounds in a given language, these should be displayed in
roughly the order in which they appear in the IPA chart. Obviously this is not possible for many
sounds (such as dentals or ejectives, or any other sound that requires a diacritic to distinguish it
from another existing sound), so we should use the IPA chart as a rough guideline and arrange
sounds as best we can, always being sure to label place of articulation, manner of articulation,
voicing, etc., and to make clear to the reader what the phonetic/phonological values of these sounds
are. Here are a few examples.
Since we have been dealing with English sounds in order to help give us a reference point, it
may be informative to give the inventory of English. Below is the consonant inventory. Note that
[w] is listed as a bilabial approximant, and also in parentheses as a velar approximant to highlight
the fact that it is phonetically a labio-velar sound.

English consonant inventory


              Bilabial  Labiodental  Alveolar  Postalveolar  Palatal  Velar  Glottal
Plosive       p b                    t d                              k ɡ
Nasal         m                      n                                ŋ
Fricative               f v          s z       ʃ ʒ                           h
Affricate                                      ʧ ʤ
Approximant   w                      ɹ                        j       (w)
Lateral
approximant                          l
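When working with an inventory programmatically, one way to keep the place, manner, and voicing labels explicit is a simple mapping from symbol to feature labels. The fragment below is a hypothetical sketch covering only a handful of English consonants, with feature tuples of our own devising rather than anything prescribed by the chart:

```python
# A fragment of the English consonant inventory as (place, manner, voicing).
INVENTORY = {
    "p": ("bilabial", "plosive", "voiceless"),
    "b": ("bilabial", "plosive", "voiced"),
    "s": ("alveolar", "fricative", "voiceless"),
    "z": ("alveolar", "fricative", "voiced"),
    "ʃ": ("postalveolar", "fricative", "voiceless"),
    "ŋ": ("velar", "nasal", "voiced"),
}

def sounds_with(place=None, manner=None):
    """List symbols matching the given place and/or manner of articulation."""
    return sorted(
        sym for sym, (p, m, _v) in INVENTORY.items()
        if (place is None or p == place) and (manner is None or m == manner)
    )

print(sounds_with(manner="fricative"))  # ['s', 'z', 'ʃ']
```

A structure like this also makes it easy to regroup sounds when, as with the Gitksan laterals below, the phonologically useful classes cut across the IPA chart's own categories.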

To give a slightly different example, here is the phonological inventory of Gitksan, a Tsimshianic
language spoken in northern British Columbia, Canada.

Gitksan consonant inventory


              Bilabial  Alveolar     Palatal  Velar    Uvular  Glottal
Plosive       p         t                     k kʷ     q       ʔ
Affricate               ʦ
Ejective      p’        t’ ʦ’ ɬ’              k’ kʷ’   q’
Fricative               s ɬ                   x xʷ             h
Nasal         m         n
Laryngealized
nasal
Approximant   w         l            j        (w)
Laryngealized
approximant   w         l            j        (w)

Notice here how we have simplified the chart a bit. Each of the laterals in the inventory could have
formed its own category (as they do in the IPA chart); however, having numerous separate categories
for each lateral may not be helpful, and may not represent very well what the phonological
relationships are between sounds. For instance, the lateral fricative [ɬ] participates with [s] in a well-
defined phonological class in the language, and therefore it makes sense to put the two sounds
together under the heading of fricative. Sometimes executive decisions (with justification) like these
must be made by the researcher.

Discussion
This concludes our discussion of the phonetics and description of consonants and vowels for the
moment, but we will return in later chapters to other sounds that are available to English and other
languages. There are lots of texts that can be suggested for further reading, but two important ones
are Pullum & Ladusaw (1996), which is a guide to the usage of the symbols of the IPA, and the actual
handbook of the IPA (International Phonetic Association 1999), which also includes brief phonetic
descriptions of various languages. An electronic version of the IPA chart can be found on the IPA
website at: http://www.langsci.ucl.ac.uk/ipa/IPA_chart_(C)2005.pdf. Instructions for how to use
Unicode IPA fonts can be found at: http://www.phon.ucl.ac.uk/home/wells/ipa-unicode.htm, and an
IPA font can be downloaded from the Summer Institute of Linguistics at:
http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&id=FontsInCyberspace (this should be taken
as a suggestion for locating a good phonetic font, and not necessarily an endorsement of the
organization).

Bibliography
American Anthropological Association. 1916. Phonetic transcription of Indian languages. Report of
Committee of American Anthropological Association. Washington: Smithsonian miscellaneous
collections, vol. 66 no. 6.
International Phonetic Association. 1999. Handbook of the International Phonetic Association: A
guide to the use of the International Phonetic Alphabet. Cambridge: Cambridge University
Press.
Jones, Daniel. 1972. An outline of English phonetics (9th ed.). Cambridge: W. Heffer & Sons Ltd.
Keating, Patricia A. 1991. Coronal places of articulation. In C. Paradis & J-F. Prunet (eds.), The special
status of coronals: Internal and external evidence. Phonetics and phonology, vol. 2. New York:
Academic Press.
Ladefoged, Peter. 2005. Vowels and consonants, 2nd ed. Blackwell.
Ladefoged, Peter. 2006. A course in phonetics, 5th ed. Thomson/Wadsworth Publishers.
Maddieson, Ian. 1984. Patterns of sounds. Cambridge: Cambridge University Press.
Maddieson, Ian. 1989. Linguo-labials. In VICAL 1: Oceanic Languages, Part II: Papers from the Fifth
International Conference on Austronesian Linguistics, Auckland, New Zealand, January 1988, ed.
by R. Harlow & R. Hooper, 349–375. Auckland: Linguistic Society of New Zealand.
Pullum, Geoffrey K. & William A. Ladusaw. 1996. Phonetic symbol guide. Chicago: University of
Chicago Press.
