First, study the following four figures carefully. Never mind the physics or the technicalities; just look
at them as pieces of art. They show the acoustic/artistic characteristics of the words ice (upper row) and
eyes (lower row) as produced by an American speaker (left column) and a Brazilian speaker (right
column; but it could typically have been almost any other "foreigner"). How are they different?
What seems to be the main feature the Native Speaker (NS) uses to distinguish the two words? The NS
will have no problem at all with NS speech, even in a noisy environment. But on hearing the speech of
a Non-Native Speaker (NNS), the NS will have to put in much more effort to be sure of perceiving it
correctly, even in a noise-free environment. Why?
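[Figure: four panels showing the words ice (upper row) and eyes (lower row), spoken by an American speaker (left column) and a Brazilian speaker (right column).]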
1 Adapted from fig. 5 in: de Castro Gomes, M. L. (2013). Understanding the Brazilian way of speaking English. In
Levis, J. & LeVelle, K. (Eds.). Proceedings of the 4th Pronunciation in Second Language Learning and Teaching
Conference, Aug. 2012. (pp. 279-289). Ames, IA: Iowa State University. Available from
http://jlevis.public.iastate.edu/pslltconference/4th%20Proceedings/Castro%20Gomes%20PSLLT%202012.pdf
2 Use the fantastic Praat software, freely downloadable from http://www.fon.hum.uva.nl/praat/
Kjellin (2017): Phonetic Details of a Foreign Accent. Most recent typo corrections 2020-01-21 kl 16:01 2/10
ice (again)
Notice how the amplitude of the oscillogram
rapidly decreases as the word is about to go from
vowel to consonant (from shaded to unshaded area),
and how this transition actually begins well before
the [s] itself. And even continues a little bit into it.
The whole trough in the oscillogram is the region of
articulatory overlap between the [i] and the [s];
coarticulation.
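One way to put a number on such a trough is to track the amplitude frame by frame. The following is only a minimal sketch: the signal is a synthetic fading sine wave, not real speech, and the function name rms_envelope, the 8 kHz sampling rate and the 20 ms frame length are assumptions chosen for illustration.

```python
import math

def rms_envelope(samples, frame_len):
    """Return the RMS amplitude of consecutive, non-overlapping frames."""
    envelope = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        envelope.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return envelope

# Synthetic stand-in for a vowel fading towards a consonant (NOT real speech):
# a 200 Hz sine sampled at 8 kHz whose amplitude falls linearly from 1.0 to 0.1.
SAMPLE_RATE = 8000
N = 1600  # 0.2 seconds of signal
signal = [
    (1.0 - 0.9 * n / (N - 1)) * math.sin(2 * math.pi * 200 * n / SAMPLE_RATE)
    for n in range(N)
]

envelope = rms_envelope(signal, frame_len=160)  # 160 samples = 20 ms per frame
for i, value in enumerate(envelope):
    print(f"{i * 20:3d} ms  RMS {value:.3f}")
```

On a real recording the same envelope would be expected to start falling before the consonant begins, which is the overlap described above.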
The formants are acoustical phenomena that occur thanks to resonances in the speech tube. Take a half-
empty bottle, blow air across its neck, and you will get a tone. That's its resonance tone. Empty the bottle,
blow again, and you will get another tone. Higher or lower? It is lower, because there is more air to agitate.
But it is still the same bottle. The resonance frequency depends on the amount of air, or, more exactly, the
mass of that air, not on the bottle itself. A smaller mass vibrates faster than a bigger mass. Fast vibration =
high frequency, which we will hear as a high tone. And vice versa. If you set up an array of bottles with
carefully chosen masses of air in them, you can play Beethoven or Mozart, or whatever, when you agitate
those resonance frequencies by hitting the bottles with a stick or blowing air across their necks. Because
you have very cunningly planted some mutually suitable formants into them and organised them in an easily
playable array! You are a genius.
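The bottle experiment can be roughed out in a few lines. This is a sketch under stated assumptions, not a measurement: it uses the standard Helmholtz resonator approximation for a bottle blown across its neck, in which the enclosed air volume plays the part that the text gives to the amount of air. The bottle dimensions are hypothetical and the end correction of the neck is ignored.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, air at roughly 20 degrees Celsius

def helmholtz_frequency(air_volume_m3, neck_area_m2, neck_length_m):
    """Helmholtz approximation: f = (c / 2*pi) * sqrt(A / (V * L))."""
    return (SPEED_OF_SOUND / (2 * math.pi)) * math.sqrt(
        neck_area_m2 / (air_volume_m3 * neck_length_m)
    )

# Hypothetical bottle (assumed values): 0.75 litre body,
# neck 2 cm in inner diameter and 8 cm long.
NECK_AREA = math.pi * 0.01 ** 2   # m^2
NECK_LENGTH = 0.08                # m
FULL_VOLUME = 0.75e-3             # m^3

half_empty = helmholtz_frequency(FULL_VOLUME / 2, NECK_AREA, NECK_LENGTH)
empty = helmholtz_frequency(FULL_VOLUME, NECK_AREA, NECK_LENGTH)

print(f"half-empty bottle: {half_empty:.1f} Hz")
print(f"empty bottle:      {empty:.1f} Hz")
```

Doubling the air volume divides the frequency by the square root of two, so the empty bottle gives the lower tone, as stated above.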
The human speech tube too is an ingenious apparatus
with a number of such "bottles" coupled in series, and
they have soft walls so you can continuously vary their
volumes (read: vary the masses of air contained in them)
on the fly, in order to get the mutually suitable formants
of your choice at any instant. You agitate those air
masses by literally blowing across their common neck,
which happens to be the larynx in your own neck. The
result is several resonance sounds at the same time.
Several variable resonance sounds at the same time.
They are the vowel formants, and thanks to smart
technology they show up as black bands in various
combinations in the spectrogram. Each particular
combination is unique for each particular vowel. Each
vowel spectrum is the unique key to the unique keyhole
in your ear. If it fits, you will perceive the vowel.
[Figure: spectrogram with the [a] and [ɪ] portions marked.]
Roughly speaking, you have one such "bottle" in your pharynx (the cavity behind your tongue), another bottle in
the oral cavity above your tongue, another bottle between your lips if you round and protrude them as for an oo, yet
another bottle under your tongue blade if you were to raise it, such as for an American vocalic r sound as in her,
and yet another bottle in your nasal cavity. Again roughly speaking, your pharyngeal bottle is the source for the
first formant, F1, which is also the lowest formant; and the oral cavity is the source for the second formant, F2,
the next lowest formant. The other cavities make the higher formants. The nasal formant is special, because this
cavity is coupled in parallel instead of in series. F1 and F2 are the main components for vowel keyholes.
F1 indicates mouth opening and is highest in open vowels like [a], and lowest in closed vowels like [i, u].
F2 mainly reacts to tongue position and is highest in front vowels like [i], and lowest in back vowels like [u].
This is not the whole truth about speech acoustics, but it is a good-enough approximation.
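The "key and keyhole" idea for F1 and F2 can be mimicked with a toy lookup. This is only a sketch: the reference values are rough, textbook-style averages for an adult male voice, assumed here for illustration, and the plain distance in Hz used by the hypothetical function nearest_vowel is a crude stand-in for what the ear actually does.

```python
import math

# Rough (F1, F2) reference values in Hz, assumed for illustration only.
REFERENCE_FORMANTS = {
    "i": (270, 2290),  # closed front vowel: low F1, high F2
    "a": (730, 1090),  # open vowel: high F1, F2 close above it
    "u": (300, 870),   # closed back vowel: low F1, low F2
}

def nearest_vowel(f1, f2):
    """Return the reference vowel whose (F1, F2) pair lies closest in Hz."""
    return min(
        REFERENCE_FORMANTS,
        key=lambda v: math.hypot(
            f1 - REFERENCE_FORMANTS[v][0], f2 - REFERENCE_FORMANTS[v][1]
        ),
    )

for vowel, (f1, f2) in REFERENCE_FORMANTS.items():
    print(f"[{vowel}]  F1 = {f1} Hz  F2 = {f2} Hz  F2 - F1 = {f2 - f1} Hz")

print(nearest_vowel(300, 2200))
print(nearest_vowel(700, 1200))
```

The F2 - F1 column shows the pattern described in the text: a wide gap for [i], and a narrow gap for [a] and [u], with [u] sitting lower.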
Opera singers have to practise very hard to be able to precisely adjust their loudest formants separately
depending on each note they sing, so that no formant note will conflict with the song note (f0) and ruin
the aria. In so-called throat-singing, such as for example khöömei of the Mongolian4 and Tuvan5
tradition, they actually sing with the formants instead, while keeping the f0 steady most of the time, as a
drone. Listen and enjoy!
We looked at the F1, the lowest black band in the
spectrogram. The F2 is the next lowest formant. It's
a bit trickier to follow in this case, but notice the
band just above F1 in the early [a] part, suddenly
weakening and stepping upwards to end far
separated from the F1 towards the end of the [ɪ]
part. Those are the main features of their "keys":
F1-F2 are close together in [a], as close as possible
and a little bit up from the bottom. And they are
widely separated in [ɪ], as widely as possible, F1
low and F2 high. These are the acoustic results of
quite wide open jaws in [a] moving towards a little
smile in [ɪ], and the tongue moving from back to
front.
[Figure: spectrogram with the [a] and [ɪ] portions marked.]
In the case of moo, the F1 and F2 are again very close to each other, but as far down towards the
bottom as they can go. When you perceive some "low" tone in your moo and some "high" tone in mee, it
is in fact the second formant, F2, that you hear. If you whisper the vowels, the F2 will dominate the
sound even more. Listen! Whispering is a very handy and helpful trick for all language teachers in the
world! Stay alert for more about this later.
4 https://www.youtube.com/watch?v=hV8EJOvvPvY
5 https://www.youtube.com/watch?v=UCsXbNvlWXA
Whistled language of the island of La Gomera (Canary Islands), the Silbo Gomero
https://www.youtube.com/watch?v=PgEmSb0cKBg