
© Olle Kjellin 2023: Kjellin-Practise-Pronunciation-w-Audacity 1/26

May be updated at any time; this is version 2.9 last edited on March 13, 2023 at 12:20

Quality Practise Pronunciation With Audacity – The Best Method!


A tutorial by Olle Kjellin,1,2 MD, PhD

Are you learning a new language? Do you, like me, have the ambition to learn it well, even to sound as
"native" as possible, or at least to have a listener-friendly pronunciation that will not embarrass you or
annoy native speakers of the language? This paper will show you how, and explain why, it is possible to achieve
that, even if you are not a child (children are allegedly fast learners, but many of us regard that as an urban
legend). In these 26 pages, with their 35 illustrations, you will learn how to:
• Produce perfect pronunciation exercises with your favourite sentences for free.
• Practice in the way that will give you the best result, for example perfect pronunciation, if you wish.
This is the ultimate tutorial for the first few days to weeks of learning a new language with little or no “accent”.

1 Introduction
There is as yet, to my knowledge, no freely or commercially available pronunciation practice material that is "best" for
my purpose. So I produce my own material, and so could you. It is easy with Audacity, which is a very powerful free
program for recording and editing sound. It takes time, to be sure, but it is time well spent that not only yields really
good results for my pronunciation exercises but also makes learning faster. This tutorial will show both how to utilize
Audacity and how best to perform the exercises, and why it works, according to my knowledge and experience.
Hopefully, it will suit you too.
There are many commercially available language courses, and I have several of them. For instance, Pimsleur and
Rosetta Stone produce really good courses. But still I want to add my own modifications to make them even better, or
supplement them with other material that I make myself according to the guidelines in this tutorial. As you will soon
see, I practice without any text in the beginning, in accordance with all recommendations based on research as well as
on my own experience: Avoid written materials as long as possible in the beginning! Most writing systems
do not represent the pronunciation well enough; rather, they confuse the learner and lead to faulty, "broken"
pronunciation. If you still do want written support, you should learn the IPA (International Phonetic Alphabet
http://en.wikipedia.org/wiki/International_Phonetic_Alphabet) and try the Glossika method
(http://www.glossika.com/).3
For serious learners of English, Richard Cauldwell's Speech in Action http://www.speechinaction.com/ is an
unsurpassed source, a must.
If you don't want to pay for CDs or online courses, there are some quite good free materials available, too. If you have
native-speaking friends to record, do so. Otherwise, I can recommend book2, also called 50Languages, from Goethe-Verlag
http://www.goethe-verlag.com/book2/. I often download their sound files and modify them as below for my own
language studies. Here is a tutorial (in Swedish and English) for the best ways of utilising the 50Languages audio
courses: https://bit.ly/50Lang-metodik
As a competitor or complement to my Audacity method, please do try this fantastic, incredible app (for Android):
Ear2Memory: http://ear2memory.com It has a bit of a learning curve, but that is easily overcome and the user will be
greatly rewarded!
Two other recently developed apps that seem very good and very much compatible with my method:
Chorusing https://www.chorusing.com/ (Free)
WorkAudioBook http://www.workaudiobook.com/ (Limited free)
However, beware of all the amateurish materials that abound on the Internet. Most of them are simply bad! Some are even
incorrect, presented by self-appointed “teachers” who seemingly have no clue what they are doing.

1 This author is a language nerd having tried to learn many foreign languages and taught Swedish to foreigners and to Swedish teachers
intermittently since 1970. Furthermore, I am a linguist and phonetician with a medical Ph.D. in Speech Physiology. In addition, I am also a
medical radiologist subspecialized in the anatomy and functions of the speech and swallowing organs. And, because of my interest in the
neurology of learning, forgetting and communication, I also worked for 6 years in a memory and dementia clinic. See further section 21, p. 25
2 New in this version (2,7): Minor changes.
3 If you are reading the PDF version electronically, the links and cross-references should be clickable.

This paper was born from my Swedish cookery-book tutorial on Audacity, originally written as a handout to participants
in my own pronunciation classes. That handout had only a brief description of the theory behind the practice method at
the end, but many readers wanted more of the theory, and they wanted it from the start, so here it is! Also, as a
testimonial to the success of the Quality Repetition method, I might mention that many of my pron-class
participants thought my courses were too short, regardless of their education levels (even many MDs or other
academics who were unsatisfied with their Swedish pronunciation), and despite the fact that the courses were one whole
week long, about 35 hours mainly consisting of intensive chorus and individual practice on just 12 representative
sentences, such as how to say the participants' own street addresses! And all of them were very angry that they were not
given this chance to pronounce correctly from their first week of studying Swedish. So this is a method that should be
adopted and adhered to during the very first couple of weeks of learning a new language. Maybe 2-8 weeks, depending
on your previous knowledge and the difficulty of the language in question. After that period you will master the
language like a native 3-6-year-old toddler and can just as quickly move on to attain a very high level of the language(s)
you are learning. Why? Because native infants too begin by acquiring the pronunciation without having to be confused by
reading or consulting grammar books. Obviously that's the best “method”, as they all typically end up speaking as,
ahem, native speakers. :-)

2 How to Practice Pronunciation the Best – in Theory


My method of practising pronunciation is very effective while, at the same time, being embarrassingly simple and not
novel at all. It's all about simply repeating many, many, many times. Deliberate, tenacious practice. Purposeful,
persistent practice. Actually this is the classical method for learning anything that you want to hone your skills in, such
as in sports, arts, hunting, playing instruments or computer games, dancing, typing, operating on brains, reading x-rays,
writing calligraphy, doing flower arrangement, cooking fugu, or whatever skill you want to acquire. It's neither a new
nor a unique method at all, but rather self-evident for elite performers in all those areas, so it is doubtful if I could call it
"my" method at all. But sadly, deliberate practice has been out of fashion in language pedagogy4 for decades! It has
been scorned at as "skinnerism" or whatever. This is a very unfortunate situation, and I want to turn it back to normal
again. Deliberate, persistent practice in a special way to be described below and termed Quality Repetition is "my
method"5, or, in fact, everybody's method known since prehistoric times. It is effective because it is based on
neurophysiology. Toddlers do it all the time in their own, innate, smart way when they acquire their first language. (Or
languages; because there may be more than one. There actually are no physiological, only practical limits to the number
of languages6 that can be acquired in parallel. Only time is the limit.) Adults are well advised to peek at toddlers'
"methods" and adapt them to their own capabilities and constraints. This paper will show you how to do that.
Teachers usually teach the alphabet and the grammar well and carefully, but seldom pronunciation to a sufficient
degree. Many of them even think it is wasted time to practice pronunciation with adult learners, in the (false) belief that
they will never succeed anyway. Particularly not with the prosody (rhythm and intonation of speech), which often is
alleged to be "the most difficult" thing to learn in a new language, although it is arguably the most important thing to
learn if you want to get a listener-friendly pronunciation with a good communicative function.
Is it really true, then, that it is so incredibly difficult to learn L2 (second-language) pronunciation, and that the prosody
is particularly difficult? – No, on the contrary! Not only is it quite possible to learn excellent L2 pronunciation,
equal to or not very different from native pronunciation, but the prosody is even the easiest part of it! This claim of
mine is based on my long-time experience as a notorious language learner and teacher coupled with my medical
training focused on the physiology of the voice and speech organs and of the brain and neuromuscular systems in
learning and forgetting. There is plenty of scientific evidence to back this up (though mainly in the medical literature);
see the selected bibliography in section 23 on page 25.
It has become more and more recognized among language teachers in recent decades that speech prosody is
overwhelmingly and undeniably the most important factor for reaching a native, near-native, or at least listener-friendly,
pronunciation. The prosody is for speech what the carrier wave is for a radio transmission. The "program" is
superimposed on the carrier wave, and the wave as such should not normally be perceived consciously. Therein lies
another great potential and important function of prosody: Namely, by suddenly varying the pitch, loudness or length of
sounds and words in unexpected ways, i.e., by adding emphasis, the speaker can choose to bring prosody up to a
conscious and conspicuous level and attract the listener's attention to the paralinguistic contents of the message. This
corresponds to the italics, boldface, underscores, exclamation marks, etc., of writing.
The prosody of any language typically consists of fewer than ten or so rules based on only three fundamental elements
that every (=indeed every!) language uses in its own specific prosody, viz. (1) voice pitch, (2) voice loudness, and (3)
length of sounds. These three mechanisms are well developed from the moment of birth (listen to a baby!), they work in
4 And mathematics, too, I have been told.
5 This nice term, though, was coined by Judy B. Gilbert. See further below.
6 … or musical instruments, or branches of athletics, or jobs, or indeed whatever.

the same way for all human beings, and actually prosody is the very first aspect of the language that children
master; usually before the age of two. Prosody is even regarded as the “handle” for further language acquisition, and
children born with impairments of prosodic perception (called dysprosodia) may not develop any normal language at
all.
There are only partial differences in the details of how these three fundamental elements are controlled and utilized in
different languages, in varying proportions of importance per each specific language. For example, what may be called
"stress" or prominence is often signalled by certain aspects of pitch and/or loudness in the stressed syllable (as in
Spanish, Hungarian or Finnish), or on the pre-stress syllable (as quite often in Polish, Russian and maybe French), often
accompanied by a slight lengthening of the stressed syllable (as in Russian and Spanish), or a significant lengthening
(as in French and English), or signalled almost only by the length (as in Swedish), whereas length happens to have
nothing at all to do with stress in some other languages (such as Finnish, Hungarian, Czech, Yakut). And some
languages don't even use "stress" at all, but have other means of prosodic signalling (such as Japanese, Somali, and
maybe French). Pitch is used to signal the morphological structure of words in some languages (such as Swedish,
Japanese and Tibetan), or to signal lexical identity of words in other languages (such as all Chinese languages, Thai,
Vietnamese, and many African and Native American languages; so-called tone languages).
Common to all these uses still is that, regardless of language, they always involve the very same three fundamental
elements ─ pitch, loudness and length ─ to signal all those lexical, grammatical, emotional and other characteristics
involved in the spoken conversation. Also, in every culture there are songs, and songs too consist of notes in sequences
of varying pitches, loudnesses and lengths. So indeed, each one of us above toddler age already masters all the prosodic
requirements to be used in any other language; we just have to learn how to tweak our existing skills for the specific
needs of the new, particular language we are learning. And please do carefully note: All prosodic uses of pitch, loudness
and length appear in each and every utterance regardless of its contents, so it is a very, very good and time-efficient idea
to concentrate mainly on the prosody from the very outset of learning a new language. Don't care too much about the
particulars of vowels and consonants until you feel confident with the prosody. This happens to be exactly how normal
children learn their first (“native”) language(s), and they succeed perfectly, by definition. And mind you, on their
way to perfection, they will prioritize the prosody and typically “cheat” in the beginning with the most difficult
consonants and vowels as long as the correct prosody is preserved. So, why not do the same as adults? Do cheat with
the particulars of single sounds until you feel mature enough to master them, but never ever cheat with the prosody!
In contrast to the surprisingly small number of prosodic details to learn, there are typically some 30-40 vowel and
consonant sounds (some languages have fewer, some have more, some have considerably more), but not all of them
appear every time, in every utterance. So they are indeed of less importance than the prosody, at least in the
beginning. You can see proof of that in children's first-language acquisition, as already hinted above. By the time the
toddlers can say 25-30 words in their emerging language, their prosody is already identical with that of the adults
around them. Nevertheless, it will usually take some 5-6 years or even more before they can master all the vowels and
consonants. Despite this, they are never perceived as having any “foreign accent” at all – thanks to their correct prosody!
Therefore, I always practice prosody first and foremost, even if my tongue will stumble on some individual vowels and
consonants that may pop up every now and then.
But, how to do it then, if you haven't got a teacher to help you? The answer is in this tutorial. Produce your own
materials for pronunciation exercises and heed my advice here! Read more about the methodology and its
neurophysiological foundations in the next few sections. Finally read the cookery-book instructions for the use of
Audacity in the second half of this article, from section 9 onwards.
And if you are a language teacher, study this tutorial carefully and apply the methods in your classroom. Your students
will be very satisfied. And do not worry, they will not think the many repetitions make it boring, because, for them,
every new repetition is a better version than all the previous ones, and that feeling of success will have a tremendous
and even addictive effect on the brain's reward centres.
You may want to read more about this in my Facebook group https://www.facebook.com/groups/best.pronunciation
where you can find several of my essays on various aspects of the method, and also testimonials by “victims” who
actually tried it out.

3 Neurophysiology of Speech Perception, Production and Learning

The short version: In the way to be described below I will train my ears with the correct speech rhythm and
melody according to the model and saturate my brain's primary hearing centres as well as its hearing
perception centres without speaking anything myself. I should not torture my ears to hear myself speak with
a faulty accent (as I would do in the beginning, if I didn't saturate my ears first). Subconsciously and
gradually, by shadowing, mirroring and imitation, I will train and automatise my mirror neurons (imitation
neurons), which are then used to guide my speech muscles to my own pronunciation when, eventually, I can
start saying the practice phrases (calibration sentences) myself without help. In this process my brain will
actually be physically changed due to its plasticity. This is learning on the neuroanatomical scale. My brain
will connect and match the sounds that I hear with the sounds that I can make and the sounds that I should
make. This is a kind of pattern recognition process. Therefore, I should not trouble my speech muscles to
learn first to speak with a funny pronunciation (as, again, I would do in the beginning, if I didn't saturate my
ears first). Instead, I will first make the (correct) model utterances resound as an audio template like a din in
my head, and that will direct my speech muscles accordingly. It will then even be difficult for me to
pronounce much differently from the model. (Incidentally, this is also how our native, first-language speech
is mirrored, acquired, controlled and monitored, the speech muscles then being guided by internally
“hearing” and predicting how the result would and should sound for a given articulation. More than 50,000
years of human language evolution cannot be wrong.)
In conclusion: I will practice pronunciation with my ears and let automated nerve reflexes do the rest. I will
then have created an “audio-motor procedural memory” for the target language, with a result as native-like
as I have the time and motivation to aspire to.

The longish version:


Neurophysiology is the science of how the nervous system functions in normal life. The nervous system is deeply
involved in everything we do and experience and thus is of utmost importance. To really be able to appreciate the best,
neurophysiology-based method of learning pronunciation, as presented here, it is a good idea to spend enough time on
this chapter, in order to understand some of the neurophysiological processes involved.
The nervous system works in two principal systems/directions: (1) the sensory (afferent) system sends signals into the
brain, monitoring our body, its inside, outside and surroundings, with the senses of hearing, vision, taste, smell, and
feelings of touch, pain, temperature, vibration, balance, position, hunger, thirst, and more. (Yes, we do have many more
than five senses!) And (2) the motor (efferent) system plans and composes chains of muscular activities and sends
signals from the brain out, to execute those compositions, leading to finely tuned, monitored and controlled sequences
of movements such as walking, typing, speaking, maintaining equilibrium, playing tennis, etc. Literally everything we
do. The sensory and the motor systems constantly work together via innumerable feedback pathways. In our exercises
with the pronunciation of a new language, we want to optimise those feedback pathways and make the two systems
cooperate and be synchronised on our own terms. To this end, four important neurophysiological components
will naturally, conveniently, and rather effortlessly, come to our assistance:

A) Hearing
The primary hearing centres are neuronal arrays situated in the temporal lobes, bilaterally (both sides), also called the
auditory cortex. They belong mainly to the sensory system. The auditory nerves from both ears are connected to the
brain stem, and then relayed in a series of neurons (nerve cells) to the primary hearing areas and also to other places,
bilaterally. About 60% of the nerve fibres from one ear cross over to the other side (i.e., from the left ear to the right
temporal lobe, and vice versa), while some 40% remain on the same side. Many of the crossed pathways cross back
again after relaying their signals to various other locations and reflex circuits, for example for directional hearing and
head-turning reflexes towards a sound. See the schematic picture in http://bit.ly/auditory-path-2. A useful reflex is the
Lombard reflex, which causes me subconsciously and irresistibly to speak louder in a loud environment. Replaying my
material loudly thus is a good idea: it will activate my speech organs more strongly than playing it softly. The auditory
system is replete with reflexes of various kinds.
Speech may seem to be a sequence of distinct words, each made up of distinct sounds. In reality, however, speech is a
continuous stream of interwoven sounds. Sounds are caused by pressure variations in a medium (for example air, water
or human tissues). The words from a speaker as well as all other natural sounds and noises from around us are an
extremely complex mixture of regular and irregular waves (vibrations) and noises in air travelling to our eardrums. The
eardrums move along with these pressure variations, transferring them to a chain of three small bones in the middle ear, the
ossicles. Pure physics. An intricate mechanism amplifies the air waves in the middle ear while transforming them from
air to water waves in the inner ear via the oval window, and then actually zooms in on speech-relevant vibrations
(particularly those pertaining to the speaker's speech rhythm), synchronizes with them, performs a basic sorting of
relevant sounds, filters out non-speech sounds, and converts all into electrochemical signals in the neurons leading to
the brain, where they are further sorted into higher-order categories of many kinds (phonetic, phonological,
morphological, syntactic, lexical, semantic, non-speech, etc.), by which they can be identified and hopefully
correctly comprehended in their particular context.7 You may be surprised to learn that the auditory nerve actually
contains many more efferent (motor) fibres than afferent (sensory) ones! However, the sorting and filtering
mechanisms of the inner ear are dependent on this arrangement.
The primary hearing centres register the physical characteristics of the incoming signals, such as pitch, loudness and
length, and map them tonotopically along the cerebral cortex. Tonotopic mapping means that neurons are ordered with
those for low pitch at the anterior end and high pitch at the posterior, much like the keys on a piano, in a simple,
straightforward array. This is nicely illustrated in http://bit.ly/tonotopic. Corresponding "periodotopic" mapping
probably exists for the temporal (timing) aspects of sounds too.
From the primary auditory cortex, signals are then relayed on to higher-order hearing-perception and comprehension
centres (see B, below), and to mirror neurons (see D, below). The pathways to the mirror neurons are the shorter and faster ones,
which has important implications for our practice method. Presumably, the efferent fibres to the inner ear mentioned
above are connected with the mirror neurons.
We hear and perceive our own speech in three different ways. One is by air-ossicle conduction: the sounds from our
mouth go around the cheeks into the ears and are converted to nerve signals via the eardrums and ossicles etc. as above.
The second way is by bone conduction: These waves travel directly through the soft tissues and bone into the inner ear.
This is louder and much faster than air conduction, not only because the route is so much shorter and even bypasses the
eardrum and middle ear, but also because the waves travel more than four times faster in water and solids than in air. So
we really can't know how our own air-conducted speech sounds until we listen to a recording. Some people don't like to
hear themselves on a recording, but that is how we "really" sound to other people, like it or not. The bone conduction
pathway enables a very fast route for auditory feedback, which is very important for the pre- and subconscious
monitoring of what we are saying while we talk. (As for articulatory feedback, there are also feedback loops through
proprioception, i.e., senses of muscular and joint positions and movements. However, although not unimportant, the
proprioceptive routes in general are too slow for real-time feedback. The auditory nerve is short, thick and fast. The
proprioceptive nerves are long, thin and slow.)
The third way of perceiving our own words is psychological: We "know" what we said or ought to have said, because
we wanted to say it. It is usually correct, but occasionally it happens that the mouth said it wrong, or even another word.
In most cases we can correct ourselves immediately, but at times it happens that the mistake goes undetected, and we
can swear we are correct even when we are not. Only a recording can reveal the truth then. Incidentally, this can also
happen when perceiving another person's speech. We may hear only what we expected to hear, or what we could
comprehend, and we can honestly swear we are correct even when we are not. Again, only a recording can solve the
issue. There is never ever any point in arguing about what somebody did or did not say.

B) Perception
The brain's hearing perception "centres" also belong to the sensory system and are responsible for how we understand
speech and language. They are vast, complex, intertwined systems of nerve circuits and networks mainly distributed in
the parietal lobes and around the angles between the temporal and parietal lobes (Wernicke's area). These centres,
circuits and networks continuously exchange information with one another, with the primary auditory centres
in the temporal lobes, and with the mirror neurons mainly in the frontal lobes (see below), across both the right and
the left brain, and with innumerable other networks that, all taken together, represent functions for speech, language,
memory, emotions, etc.
Please note: No brain is “half”, even if they are called hemispheres. The brains are a paired organ, just like the eyes,
ears, kidneys, lungs, hands, feet, etc., none of which ever are “halves”. And like all the other paired organs, both brains
can perform the same actions simultaneously. However, in some special cases conflicts might arise if both brains
competed about what to do. So with thick bundles of extra fibres between them (the corpus callosum), the right and left
brain will communicate, negotiate, and decide which side is to do what and how much. As a result one side may become
well trained (dominant) and the other side dominated, "ring-rusty" or even inhibited, but nevertheless always prepared
to jump in and substitute if the dominant side should falter. Making one side dominant is called lateralization.
Don't ever believe in the urban myths about right-left brain separation of tasks. Some of them may contain a little truth
to a certain extent, but not in the way they are presented by non-experts in the media. The differences of lateralization
and dominance often amount to only a few percent of the total, bilateral activity. Speech and language are such highly
specialized and finely trained functions, that the non-dominant brain is less prepared for these functions ─ but not
unable to jump in and substitute, for example after an injury. In the majority of people the left brain dominates for many
aspects of language, while the right brain usually is dominant for the prosodic factors. However, both the right and the
left brain do indeed cooperate closely and extensively all the time, even in language and speech.
7 People with hearing impairment using hearing aids or cochlear implants can't sort out sounds like that in their inner ear. A noisy environment or
several speakers talking simultaneously will impose great difficulties on them.

Eating and drinking utilize the same anatomical structures in the face, mouth and pharynx as speech, but the
controlling neural networks are different from those for speech, and left-right dominance is random at about 50:50. So, in
cases of a unilateral stroke, about half of the patients get swallowing problems, depending on whether the stroke is on
the swallowing-dominant side or not. However, in most of these cases the swallowing functions return more or less
completely in about 3-4 months. This is not thanks to any healing of dead neurons but due to the brain's plasticity (see
next section), by which the intact, contralateral, previously non-dominant brain slowly re-learns how to control
swallowing. The same goes for all other lost functions when they resolve after some time. To some extent this is a
spontaneous process, but usually very intensive and extensive rehabilitation activities are needed to alert, coach and
exercise the substituting brain.

C) Plasticity (Neuroplasticity)
Learning, and getting results from training, is only possible thanks to the plasticity of the brains. This means their ability
to adapt, reorganize connections, change, and even grow anatomically, in response to incoming stimuli and identified
needs, in effect relocating functions between the right/left brain pair as well as within each brain separately. This is one
of the most fascinating functions of the brains. It happens very fast, and it occurs in both the sensory and the motor
system. And it is not necessary to have had a stroke to induce plasticity; it is a normal function of all brains at all ages!
A connection between two neurons is called a synapse. Plasticity primarily affects the number of synapses. On
average, each neuron has input synapses from about 10,000 other neurons and constantly receives various signals from
all of them, some excitatory, some inhibitory.
When a neuron has accumulated enough signals of the kind that it is specialized for, it will "fire", in its turn
sending a signal on through its output synapses to, again, some 10,000 other neurons. One adult pair of brains
has about 100 billion (100,000,000,000) neurons. Multiply these three factors and find that this is indeed a huge
network of some ten billion billion (=10,000,000,000,000,000,000) connections (synapses). In comparison, the World
Wide Web is a very tiny network. According to the available Internet statistics from today, March 11, 2023, there are
only about 5.28 billion users in the world right now (equating, for the comparison, one internet-connected user with one
brain synapse). (https://de.statista.com/themen/42/internet/#editorsPicks)
At birth, we are even bestowed with some 200 billion neurons, but with only rather few synapses. However, in response
to all and any incoming stimuli and physical activities of the child, zillions of new synapses are formed each minute and
connect all the involved neurons. To accommodate all the new synapses, the neurons form extensive systems of
branches and twigs, in a process called arborization. It is therefore very important to present as many different stimuli as
possible to a child from birth to adulthood, to promote arborization and synapse formation. The more modalities of
different kinds that are involved and coupled (eyes, ears, hands, body movements, right side, left side, etc.) in motor
pattern formation, the better and more robust the skills and long-term memories will be. "Neurons that fire together,
wire together" (Hebb's principle). This too has pedagogical implications, because the same applies to all ages. For
example, adult learners too should practice prosody, complete with appropriate body language.
Unused neurons are weeded out or made dormant. For instance, surprisingly, a newborn baby has neural pathways from
the primary auditory centres in the temporal lobes to the visual centres in the occipital lobes, but since such pathways
are generally not needed, they will shrink and almost disappear. Unless the child is blind, of course, in which case these
neurons are kept active, and retrained, to serve the visual centres, which would otherwise have been unemployed, but
now will be used for auditory tasks instead. That is an example of neuroplasticity. Even blindness acquired in
adulthood will induce similar activation of the visual centres by auditory input. That is so impressive! Moreover, every
instance of normal learning of anything at all, at any age at all, is accomplished through these same neuroplasticity
mechanisms, and they work perfectly throughout our entire lifetime! This is very encouraging news. There is thus no a
priori reason to give up learning anything, not even a second language, after puberty or any other mythological age
limit, although usually the earlier the better, when possible. The neuronal molecules don't really know the owner's age.
In response to a new stimulus it takes only seconds for small "knobs" (dendritic spines) to form on the branches of
neurons. This time-lapse video of knob formation https://www.youtube.com/watch?v=s9-fNLs-arc illustrates learning
on the scale of branches of a single neuron! If the stimulus is not repeated, the new knobs will disappear. If the stimulus
is repeated sufficiently many times, the knobs will develop further and form permanent synapses and wire together all
neurons that happened to be involved in that task, for instance the pronunciation of a new speech sound, or a whole
sentence with correct rhythm and intonation pattern with concomitant body movements and gestures ─ and grammatical
structure! The results are long-term memories. Such wired-together networks may be re-used in total or in parts in the
formation of yet other networks, and hence assist in recall, cueing, and mental associations of all kinds. All this is the
neurophysiological rationale for multi-modal multiple repetitions in any learning process. The bad news is that there
is no shortcut to learning and long-term memory, only repetitive work. Deliberate, persistent, repetitive practice.
See this amazing video by Harvard Professor Jeff Lichtman on the abundance of criss-crossing neurons and synaptic
connections here (starting at 21:40): https://youtu.be/2QVy0n_rdBI?t=1299


Ever since we start speaking as toddlers and throughout all our lives, every time we say anything at all, every utterance
will serve as an instance of practice that will form new synapses and thus further consolidate and reinforce our speech
habits as represented in our mirror neurons and elsewhere. And so we will all become super experts in all the procedures
involved in hearing and speaking our first language(s). The robustness of this procedural memory and other long-term
memory in general is a linear effect of the number of repetitions. It is statistical learning.
Thus, procedural memories for skilled actions form like paths in a lawn: They emerge wherever you tread frequently
enough, nowhere else, and never independently of older paths, i.e., prior knowledge. But fortunately, there is no best-
before date for neuroplasticity. As we grow older, we will, in many cases (but not all, depending on the type of task),
need more repetitions per item to learn it and automatize it than at younger ages. That is the only age effect. And there is
no neurophysiological difference between language learning and any other types of motor learning. So forget the
disheartening myths about age and language learning, at least as concerns pronunciation. (It may be true for grammar,
which usually is more complicated.) Just repeat a larger number of times if you are "older." And be sure to make it right
from the beginning, to avoid arborization and synapse formation for unwanted pronunciation. Because wrong
pronunciation too will induce all these plasticity processes in the same way and end up stored as unwanted, faulty
motor "skills" in your long-term, procedural memory. You don't really aspire for that. Fossilization in second language
users (i.e. a petrified foreign accent in spite of many years' use of the new language) is more due to faulty instruction
and insufficient training during the first few weeks of learning a new language than to any biological constraints,
and thus is preventable (if you do want to prevent it). Due to the time handicap of adult learners there is little chance for
us ever to catch up with a native speaker in every respect, but it is indeed perfectly feasible to sound like a native
speaker in the limited number of sentences we are able to say. This is particularly true of the prosody of the language,
the easiest part to master, because you already master the three fundamental elements of it: the pitch, the loudness, and
the length of sounds, as outlined above. Surprised? No need to be. This is natural, unavoidable neurophysiology.
In experimental conditions it has been found that automating a new (simple) motor skill takes up to about 15 minutes of
repetitions. Can you practice the same sentence for 15 minutes? It seems like a good idea to do so. However, depending
on the difficulty of the task and your previous experience with similar skills, of course, it may take more or less
time than that to learn a new motor pattern. For example, the 15 click consonants in Zulu are quite a challenge for most
non-South African speakers, but presumably easy-peasy for Xhosa speakers (who have 21 click consonants). When,
however, you can say 20-30 sentences in a native or near-native way in your new language, after hours of deliberate,
persistent practice on only them, you will also, automatically, be able to say 20-30 million other sentences with the
same, excellent pronunciation. Because they all follow exactly the same rules of prosody and pronunciation. So part of
the trick for the adult language learner is to have a very limited curriculum of such “calibration sentences” for the
initial pronunciation training period, to make it really possible to learn them completely and perfectly.

D) Mirror neurons
Our pair of brains contains numerous mirror neurons, also called imitation neurons. They were discovered only in the late
20th century, and their functions are highly relevant for language learning and acquisition; this may be the most fascinating
area of recent research in neuroscience. The human mirror-neuron system is involved in understanding others’ actions
and the intentions behind them, and it underlies mechanisms of observational learning. Research on the mechanism
involved in learning by imitation has shown that there is a strong activation of the mirror-neuron system during new
motor pattern formation. It has been suggested that the mirror-neuron system is the basic mechanism from which
language developed. Some functional deficits typical of autism spectrum disorder, such as deficits in imitation,
emotional empathy, and attributing intentions to others, seem to have a clear counterpart in malfunctions of the mirror-
neuron system.
Surprisingly, the mirror neurons belong to the motor system! They are motor neurons primarily involved in finely tuned
muscular actions, movements and procedures that we can perform. But secondarily, they are also recruited when we
observe other people perform similar actions and procedures with which we ourselves already have prior experience
and interest. In essence, mirror neurons are a kind of action and pattern recognition mechanism essential for the
perception and appreciation of what other people are doing, saying, or intending. Therefore, the mirror neurons are also
crucially involved when we want to shadow, mirror and imitate what others do or say, such as the teacher in a language
class. Our ability for, and agility in, such action recognition, mirroring and imitation depend heavily on these mirror
neurons' prior experience of the same sort, and to some extent on our motivation and desire to perceive the signals.
Learning motor skills is the result of inducing the formation of new mirror-neuron networks by plasticity processes. As
simple as that. The amount of mirror activation correlates with the degree of our motor skill for that action. Experiments
have shown an increase in mirror activation over time in people who underwent a period of motor training in which
they became skilful. It works after brain injuries too; data on plasticity induced by motor observation provide a
conceptual basis for application of action-observation protocols in stroke rehabilitation.

Since we all as adults already have ample experience and skill in speaking as such (in our first language), our mirror
neurons are ready to recognize, mirror and imitate the new language almost directly (after due listening practice, as
above; otherwise not). This is in stark contrast to pre-linguistic toddlers, who have to train both their mirror neurons and
their speech organs from scratch, which will take many times longer than for adults. (Thus, small children do not
necessarily learn languages more quickly than adults, except for the fact that they usually spend far more practice time
per day on it and get much more immediate feedback than adults typically do.)
A little handicap we have as adult learners is that our mirror neurons are heavily biased in favour of our first
language(s), so they will tend only to "recognize" and do what they already know or think they should expect (the
action recognition function). In this process they may happen to reinterpret what they hear into something more
familiar. That is, they may miss many details and get a more or less distorted picture that better conforms with their
experience: Deaf by preconceptions (amblyacusia). This happens particularly if we start reading too soon into the
language course. Learning a new language should always be done without reference to the writing, initially. Because the
letters (particularly if based on a similar script system as own, or transcribed to our own script) will in all likelihood
signal their usual meanings to us, namely the sounds of our own native language instead of the new language. This will
lead to suboptimal perception, suboptimal recognition, and suboptimal imitation of the new details, the situation we call
"foreign accent". To avoid this, we would need a teacher pointing out the minute details and giving immediate feedback
for the learners to perceive and modify their pronunciation habits in accordance with the patterns of the new language
correctly. However, since we already are super elite players of our speech instruments as such, this actually is no big
deal, but we do need to get the detailed information and pay much attention to it until our new pronunciation becomes
automatic and starts working subconsciously. We are better than parrots. We use both quality and quantity for learning.
So, in addition to a good teacher, we need extensive and deliberate listening practice, as recommended in this tutorial. If
you have no teacher, studying phonetics is a good option. It is a good idea even if you do have a teacher. And it is an
unavoidable must if you are a teacher.
The actions of mirror neurons are subconscious most of the time, but sometimes they surface in comical ways:
Examples that everybody surely has experienced are when we are watching a soccer/football game on TV and feel
twitches in our own legs as if to try to kick the ball; or when we are listening to a person with a hoarse voice and feel
urged to clear our own throats. Such urges are due to the fact that (1) we recognise the situation, and (2) there are direct
neuronal pathways from the primary auditory cortex (in the temporal lobes) to those mirror neurons (in the frontal
lobes) that monitor and control the speech and voice muscles (or leg muscles). These direct pathways do not involve
understanding of the contents of what is being said! This makes it very fast to shadow or mirror what somebody is
saying, even before you know what s/he is saying. This also makes it very efficient to practice pronunciation in
chorus with your class, or in unison with your recordings, because your mirror-neuron system will compel your
speech and voice muscles to act according to the loud and overwhelming auditory input. This will push you into
getting a native-like rhythm and intonation, virtually without even a chance of getting it wrong. You will certainly
appreciate and enjoy that!
Indeed, experiments have confirmed that the coupling of observation and execution significantly increases plasticity
in the motor cortex. After a training period in which participants simultaneously performed and observed congruent
movements, there was a potentiation of the learning effect. "Observation" here might mean only the auditory input, but
best of all would be a live teacher, whose lip shapes, facial expressions, gestures and all body language could be
observed and mimicked at the same time. Multimodal learning is the best.
All of this, all that is known about mirror neurons in speech-related activities, lends very strong, neurophysiological
support for the method as advocated in this tutorial, in which we practice multimodally multitudinous times in chorus
along with the teacher and class or a recording. We call it Quality Repetition. (This term was coined by Judy B.
Gilbert, well-known author of many books on English pronunciation for foreign or immigrant learners, when we gave
workshops together long ago. Judy also introduced the use of a big rubber band to indicate the long sounds of English.
This is more than a toy gadget; it is the powerful addition of another modality, vision, to the exercises. It will
significantly increase the neuronal traffic between the left and right brain and assist in making that detail ─ length ─
more salient and robust in the learners' procedural memory. I use the rubber band extensively in my Swedish classes
too, where segmental length contrasts are even more significant than in English.)
Most mirror neurons seem to be distributed in the frontal lobes, which are the "headquarters" of motor activities.
Neuronal networks involved in speech and facial expressions are concentrated in Broca's area (and its homologue on
the non-dominant side) where there is an abundance of mirror neurons. Actually, these mirror neurons for speech (and
hand movements!) also monitor the results of our own speech by continuously mirroring and monitoring our own
spoken output in real time. That is, they compare what they hear us say ourselves with the memory of what they think we
should say and should sound like. This is important, because it enables us to modify our speech on the fly, should the
need arise due to some temporary constraints, such as if we are chewing gum at the same time, or have a
congested nose, or are whispering or shouting, or whatever circumstances force the speech muscles to act
differently from the usual ways. This is called compensatory articulation, in which we can instantly modify, adapt and
correct our articulation by result-guided processes based on the audio-motor procedural memory stored with our
mirror neurons. "Audio-motor" = the coupling of sounds and speech gestures. 8 All motor movements (including vocal
ones) are organised towards goals, targets, results.
Actually, there is always a natural variation in our pronunciation of any sound or sound sequence, not only depending
on such factors as the degree of stress, the surrounding sounds, head and neck position, how much air we have left in
the lungs, etc., but also random variations because we are only human. Not least, there are immense anatomical
differences between individuals. Some of us are small, some are big, some are adults, some are children, some are
males, some are females, some are non-binary, some are skinny, some are obese. Some have high-domed palates, some
have flat palates, some have narrow palates, some have wide palates, some have big chins, some have small chins, some
have their teeth pointing this way, some have their teeth pointing that way, some have lost some teeth. Etc. etc. etc. All
these factors lead to ever so different acoustical properties, but all are still able to produce virtually the same language-
related acoustic output as everybody else speaking the same language. Speech is virtually independent of any
anatomical differences between vocal apparatuses. In effect, each individual has their own full set of unique goal-
oriented basic compensatory articulation to accommodate that individual's particular anatomy and all other factors
to achieve the same acoustic results as the other people around. One could say that we all are experts on applied,
acoustical phonetics. This competence pivots on the auditory-goal-guided processes of the audio-motor procedural
memory. And this is why we have to train our ears first in learning a new language (just as we did for our first
language). Each person has to tune their own, individual speech apparatus to harmonize with the ambient community.
The previous two paragraphs contain very important messages. Please read them again.
Another important thing to know is that "a pronunciation" is not a kind of event in which we hit a canonical bull's eye in
the middle of a target; never ever so. It is rather a whole cloud of permitted variants around that bull's eye on a
multifaceted, polygonal target slate, or region, bordering on and bumping into its surrounding sounds, much like the
electron cloud around an atom. For the atom, it is totally unimportant where the electron is, as long as it stays connected
with that atom. For the listener of speech, it is totally unimportant where in the target region the speaker hits, as long as
it is in the right region, for example "th" instead of "f" or "t" or "s". Try saying these with varying positions of the
tongue and lips. In a natural context, the native listener will not discern anything of the physical variation (if not too
excessive), but will perceive the "target region" as an abstract category (called a phoneme in linguistics). This is called
categorical perception, in which the native speaker is effectively deaf to the internal variations but a super-expert at
detecting the minutest transgression across the boundaries; for instance, if you say sink instead of think by a small
alteration in the position of the tongue tip. Or sh*t instead of sheet by making the vowel too short. A huge difference by
a minimal difference, as it were. The categorical perception has its counterpart in articulation and other aspects of
speech; we may call it categorical production. This is where the compensatory articulation comes into play: We never
need to pronounce "bull's-eye" canonical sounds. It is usually even much better to hit somewhere else in the target
region (the "category"), conveniently on the part of it that is nearest our previous target region which we just
pronounced, and with any temporary constraints that may have applied. For this reason too, nothing is better than a live
teacher who makes us practice repeatedly in chorus with all the natural and other variations, a teacher who acts as a
Quality Controller giving us immediate feedback on whether our products happen to fall inside or outside the stipulated
limits, and generously lets us hit anywhere between the limit lines a great number of times accompanied by cheers and
encouragement. Then, by sheer statistical learning, we will acquire a feel for the limits of native or native-like
compensatory articulation. Categorical rather than precise articulation is the best goal.
Interestingly, the acoustical boundaries of a given sound, or phoneme, are not static, but move and change depending on
surrounding sounds, degrees of stress, and other factors. On moving from one sound to the next, the speaker should
lazily aim just to barely cross the nearest boundary and not go any further, thereby quickly achieving sufficient
categorical (phonological) contrast with minimal effort. This phenomenon is one of several factors behind what
phoneticians call coarticulation, the intertwined articulation of adjacent sounds almost "together". For example, note the
difference in the English /n/ sound if you say “tin” versus “tin can” or “tin bottle.” Exaggerated articulation, on the
other hand, with less coarticulation, is a sure sign of foreignness, be it ever so perfect as such. It may even have reduced
intelligibility for native listeners due to its relative lack of coarticulation. Quality Repetition helps you achieve the
natural articulation and coarticulation.
IPA transcription is a research instrument intended to show one particular pronunciation at one particular event and thus
does not reflect the natural variation. Therefore, although not worthless, IPA transcriptions are not optimal for the
purpose of teaching or practising pronunciation. When I learn languages or teach Swedish, I very seldom use IPA
transcriptions. The ears are much more powerful. But depending on the learner's experience with IPA and awareness of
natural variation, it might still be a useful substitute when no teacher is available.

8 Of course, there is also input from sensory organs of touch in lips, tongue and pharynx, and proprioceptive information of muscular and joint
positions and movements, but "audio-sensory-propriocipio-motor" would be too cumbersome a word. Let "audio-motor" cover it all.

4 How to practise pronunciation the best – in real life!


Here are my suggestions for the very first steps into a new language. If possible, I begin by listening rather casually to
as much spoken language as possible, such as what we can find on YouTube or radio stations. This will give me the general
“feel” for the language. How do people speak? How do they interact? How do they interrupt one another? How do they
emphasize? How do they express astonishment? How do they laugh? How is their general voice setting? High pitched?
Low pitched? Tongue fronted? Tongue far back? Melodious? What kind of rhythm? Etc.
Next, I download the 100 bilingual9 free mp3 audio lessons on that language from 50Languages (book2.de) and listen to
them in random order, over and over again until I begin recognising and understanding many of the words and
sentences. So far this is pure ear training, and I don't try to speak yet. By the way, do not do this on-line, because then
you will be tempted to read the written texts, which will hinder you from hearing the fine and crucial details. Also, in
the case of Latinised transcriptions from other script systems, you can't always trust them. (For example, the Latinised
Arabic at 50Languages is almost totally incomprehensible.)
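If you like scripting, this "listen in random order" step can also be automated. The little sketch below is my own illustration, not anything provided by 50Languages: it assumes the lesson mp3 files have already been downloaded into a local folder (hypothetically called "lessons" here) and writes a shuffled .m3u playlist that any ordinary media player can loop through.

    # Sketch only: build a shuffled .m3u playlist from already-downloaded lesson mp3 files.
    # The folder name "lessons" and the file pattern are assumptions; adjust them to your setup.
    import random
    from pathlib import Path

    lesson_dir = Path("lessons")               # hypothetical folder with the downloaded mp3 lessons
    tracks = sorted(lesson_dir.glob("*.mp3"))  # collect every lesson file
    random.shuffle(tracks)                     # random order, as recommended above

    playlist = Path("random_lessons.m3u")
    playlist.write_text("\n".join(str(t) for t in tracks), encoding="utf-8")
    print(f"Wrote {len(tracks)} tracks to {playlist}")

Re-run the script whenever you want a fresh random order; the tool does not matter, only the repeated, text-free listening does.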
For the next stage, when I am getting more serious about one particular language, I will pick one of the lessons that
might be more interesting to know and that doesn't sound too difficult. It could be lesson 1 or 30 or any other lesson.
Ideally, if I have enough time, I will edit the lesson with Audacity, so that I can get many repetitions of each sentence
consecutively. Otherwise I will have to make do with “ordinary” repetitions of the lesson in toto. (Each lesson of
50Languages contains 20 sentences and is about 4 minutes long. All their language lessons contain exactly the same
sentences.)
NB. This is important: When I play my practice sentences, I now set my player to Repeat 1, so I can listen lots and
lots of times without having to press the play button every time. Hundreds of times. Maybe thousands of times, over and
over again. This is very efficient, and necessary for the statistical training of my ears first. 10 Particularly if it is a
completely new language of which I have no prior knowledge. Furthermore, since I will make perhaps 6 copies of every
item in each track (see below), I will get 6 exemplars (repetitions) of each even when I have not set it to Repeat 1, as
when I review my material at a later stage. This will efficiently remind me of all that I had forgotten. I know of no
commercially available material that is as good as this.
In the beginning I set the volume of the player to quite loud to "push" the sounds into my head. Little by little my ears
will be "saturated", and I will be able to discern words and feel an urge to mirror and gradually to speak in unison with
the recording. Thanks to neural reflex circuits between the ears and the speech organs (the Lombard reflex, and others),
I too will be speaking in quite a loud voice, reflexively. This is good for training all my 146 speech muscles. And thanks
to some other nerve circuits (including the mirror neurons, that compel me to mirror and imitate correctly), it will
actually be quite difficult for me to pronounce with any other rhythm and melody than in my model sentences. That is, I
will automatically and irresistibly get the correct prosody from the beginning! If, at this stage, I don't get all the vowels
and consonants correct, this does not matter much, really, as explained above. I cheat or remain silent across the
difficult ones. As I stated above: The prosody is the easiest part of the new language. Irresistibly.
Little by little I will start softening the sound level, more and more. Finally, I will hardly hear the sounds at all while I
still keep repeating. At that stage I will speak it almost by myself, like a native! Without the help of a teacher. But direct,
immediate feedback with comments by a live, well-educated and dedicated teacher with the same amount of patience
would of course have been even better, much better.
With this method I can fairly quickly learn the pronunciation, at least the prosody, of any language. I only need a few
short recordings. I edit them, and then listen to them hundreds of times. I can even have them droning off the car media
player while I am driving, because being repetitive they don't distract my attention from driving, while I can still listen
attentively enough to train my ears and my mirror neurons.
Initially I don't necessarily have to understand anything at all, but of course it would be more fun and efficient if I
could. With time I will be able to discern more and more. I will be like a little child conquering his or her first language,
but I will do it faster than a child. With my recordings, I have no teacher who gets fatigued, no difficult letters, no
boring text, no complicated grammar, no confusing explanations. Only pronunciation, pronunciation, pronunciation,
pronunciation, ... Particularly the rhythm and intonation; the prosody. When my new pronunciation is ready (!) after

9 “Bilingual” means that you will have one “start” language saying a sentence and one target language immediately saying the translation, often
with both a male and female voice. The start language does not have to be my first language. Rather, it should be another language that I have
already studied previously and master fairly well, at least on this simple level that 50Languages offers. In this way I will keep bettering the
previous language too, instead of forgetting it, while I'm learning the new language. This is called the ladder method and is recommended by the
real polyglots. Surprisingly, this will lead to NOT mixing them up. It is as if each of them were assigned its own language-specific box in my brain,
instead of them all being drowned in the same general “foreign-language box” resulting in inevitable muddling up and confusion, if I study my
foreign languages sequentially instead of in parallel.

10 Being 75 years old, I do need many repetitions to remember. But age does not prohibit me from statistical learning and acquiring native or near-
native pronunciation. For how could it? On the neuro-molecular level, the learning processes are still the same as when I was new-born.

some time (days to weeks) with hundreds of exercises with the same small amount of practice sentences, then it is time
for me to move on with a good textbook and/or teacher. I will be on the approximate level of a native 2-4 year old
toddler. That is, I will have a native or near-native prosody, as explained above. But in addition, I will also have quite
good command of most if not all the vowels and consonants, because my speech apparatus is mature (in contrast to a
toddler, that is). And I will have a basic, working vocabulary and a set of useful, idiomatic sentences. My basic practice
sentences will now be my calibration sentences for all of my future learning. The front door to my new language will
be wide open. I can begin functioning in simple conversations. Fortunately, my interlocutors can't know what I do NOT
know. Thanks to my pronunciation they will think I know very much more than I actually do, even when I hesitate and
don't find the right words. They will find it natural that I still have some empty slots in my command of their vocabulary, but they will not know that nearly all the slots are still totally empty... As a result it will be easy for me to make
contact with native speakers; they will not shun me because of my pronunciation. On the contrary, they will respect me
because I am respecting their language.
This situation, in my opinion, is far better than hurrying through a language course and superficially learning many lessons, but with unbrushed prosody and faltering pronunciation, hoping that I will deal with that “little detail” later on. Because the sad truth, as you may have inferred by now, would most likely rather be that I would have learnt and automatized such unbrushed, “broken” pronunciation that neither I myself nor my teacher nor any other native interlocutor will like, much less respect. And that will be very difficult to remedy at a later stage. This is called fossilisation.
An advantageous spin-off effect of the Quality Repetition method is the fact that, in all languages, there are close
connections between the pronunciation and the grammar, particularly between their prosody and syntax. Hence,
focusing so hard on the pronunciation of whole sentences initially will also help me approach and master the grammar
better later on.
I will also claim that the method I advocate here is very time-efficient, because it will not take a long time to master 20-30 sentences to the level I aspire to. Of course the required time is very individual, depending on many factors such as previous experience of learning languages, time available for practice, and the difficulty of the particular language. But I would dare say that on average it should take not much more than, say, 100 hours or so of active exercises. The other alternative, that of initially learning a “broken” pronunciation, will take most people more than a lifetime to repair! Fossilisation is unnecessary and perfectly avoidable.
More than occasionally I encounter adult learners of Swedish who speak with an ever so slight foreign accent or even
no detectable foreignness at all. Asked about how they attained this, they will recount something similar to the method I
advocate here. Some people have that innate instinct and motivation. However, and unfortunately, the majority who
don't have that innate drive are usually let down by the educational system. Deliberate practice of pronunciation is not
pervasive in the classrooms, because most teachers feel undereducated and insecure about how to do it. This paper suggests the best remedy so far. It targets both teachers and learners. Please do as I tell you! :-)

5 Research
The scientific and empirical underpinnings for this method are sketched in my 1998 article "Accent Addition : Prosody
and Perception Facilitate Second Language Learning" (see link in the bibliography), and detailed in my 2002 book
"[Pronunciation, Language and the Brain. Theory and Methods for Language Education]" with more than 200
annotated references (sorry, only in Swedish so far). But when they were written, we didn't know as much about mirror
neurons as we do now. So the present paper is an important update.
Classroom research on pronunciation and its teaching methods is very difficult to perform rigorously, and there are no
“hard data” on how this Quality Repetition method fares in reality. Teacher educators on higher academic levels never
fail to point that out. It is their duty by profession and training to take a critical stance on all new methods and fads. I
too hold a PhD in speech science and am also trained in higher medical research methodology, so I know. However, there
are no “hard data” on how the hitherto used methods or non-methods fare either, except that in this case we can find a
clear answer from the results of a large-scale de-facto experiment, namely traditional language teaching: Go out and
speak with any “foreigner” you may meet and listen to their usually quite broken, fossilized pronunciation. That is the
result of the presently used “methods” of pronunciation teaching in the second-language classrooms. They are the
victims of a gigantic, unscientific test with no control group. To be sure, there are scientific articles reporting on the
foreign-accentedness of L2 learners after such-and-such a time in the new country, but in none of those that I have seen are the modes and amounts of classroom instruction and practice detailed in the text! They seem to consider time to be
the main or even the only factor influencing the ultimate, inevitable attainment of L2 pronunciation. Here I want to
repeat my strong contention that faulty pronunciation of a second or foreign language is much more the result of
suboptimal instruction and practice than of age, or whatever.
So, in conclusion, dear fellow teachers and learners, at least try out the Quality Repetition method seriously and
diligently! At worst, you will be no worse off than without it.

6 An interesting article
Here is a fantastic review by Patricia Kuhl (2010) of brain mechanisms in language acquisition. I particularly liked this
figure showing focal activities in the brains at birth and at 6 and 12 months of age, when subjected to audio input of
speech.

The brain silhouettes in the upper row show activity in the so-called Wernicke's area, where speech input is received
(from the primary auditory centres in the temporal lobes), analysed, parsed, processed, interpreted and understood. So it
is no surprise that this area is activated by auditory input, most vigorously in the newborns and subsequently reflecting
increasing automaticity in the older infants.
The lower row shows activity in the motor output area, called Broca's area, where muscular action sequences for facial
and speech muscles are composed, before sending them to the actual motor neurons a bit further back in the brain for
subsequent neural commands directly to the muscles. Note that this recorded motor activity still comes from audio input of speech alone, because this is one important area where mirror neurons develop and, after training, help recognize familiar sounds by comparing the input sounds with the sounds that the individual can produce himself/herself through activity in this very area. So in Broca's area there is no activity in the newborn brains, but increasing activity with age, i.e., with training! Similar brain activity development can be predicted in adults exposed to a completely unknown language on the first day (equivalent to newborn) and after 6 and 12 months (or weeks? or days?) of intensive and extensive listening and practising. (I hope that this kind of study on adult learners will be done soon. Or perhaps done
already?)
This excellent review paper thus supports the notion of mirror neurons' involvement in audio-motor learning and
formation of audio-motor memory templates for goal-guided speech production, as a function of statistical learning as
explained above.
The paper also reviewed studies that show that multi-national pre-school staff can safely use their respective L1s (i.e.,
their first, or best, languages) when interacting with the children, even if they don't speak the same languages. The
children's brains will develop normally in response to each language and attain a higher cognitive level than if exposed
to only one language. Kuhl states, "... post-exposure MMN phonetic discrimination data revealed that infants showing
greater phonetic learning had higher cognitive control scores post-exposure. ... Taken as a whole, the data are consistent
with the notion that cognitive skills are strongly linked to phonetic learning at the initial stage of phonetic
development."11
Other research reviewed shows that in listeners' brains "cognitive effort is increased when processing nonnative
speech," and that training studies "show that adults can improve nonnative phonetic perception when training occurs
under more social learning conditions, and MEG measures before and after training indicate that neural efficiency
increases after training."12

11 MMN = MisMatch Negativity, a kind of electro-encephalography revealing if a particular perception happens or not.

12 MEG = MagnetoEncephaloGraphy, a research method that records the brain's tiny magnetic fields. No needles involved.



7 Beware of Minimal Pairs


Here comes a serious warning: Don't ever practice much with minimal pairs! Minimal pairs are good for phonological
research and for making learners aware of crucial, phonological distinctions, such as of the vowels in ship and sheep, or
the initial consonants in tin, thin and sin. So, of course some listening practice and some pronunciation practice with
minimal pairs will obviously have to take place, but only initially, for creating that awareness. Not more. They should
never be automated pairwise, because of Hebb's principle, "neurons that fire together, wire together." That is, if the
words are automated together, they will always pop up in my mind together. Even if (or, rather, particularly if) I master
the distinction to exquisite perfection after pairwise practice, then every time I am about to say one of them in context,
both of them will appear in my mind as if in a multiple-choice test, forcing me to hesitate for a fraction of a second, and
distressingly often pick the wrong one. Usually, I will notice the mistake and immediately correct myself. But my
fluency will be ruined, a totally unnecessary break that will embarrass me every time. "Oh horror! I chose the wrong
word again even though I know perfectly well that a stalactite – or was it a stalagmite? – hangs from the ceiling … or
was it vice versa? ..." – So please be sure to avoid that trap. Don't ever practice contrastive, confusable pairs on the
same day! Practice just one of them. Then the other one will automatically become ... eh ... the other one.
A conspicuous example of the destructiveness of minimal-pair exercises is the /r/ versus /l/ issue for Japanese learners
of English. They will struggle with that pair daily ever since they begin learning English in school. Even those who are
highly proficient in the English language as well as in the phonetic realization of [r] and [l] will fumble with them
almost every time and make many unnecessary and sometimes embarrassing mistakes. On the other hand, those
Japanese persons whom I met who spoke Swedish, Russian, Tibetan, Chinese or any other foreign language generally
fared much better, making no or far fewer such mistakes. Presumably they did not have to practice light-right etc. as minimal pairs in those other languages, the way they had in English.
This happens not only in pronunciation but in grammar and vocabulary too, such as gender le-la in French, or en-ett and
de-dem in Swedish. I'm sure every reader of this paper can recognize the situation. For instance, native speakers of
English have a notorious tendency to pick the wrong alternative of their and there and even they're in writing their own
language. This is not due to low education or low IQ, but much more likely to natural Hebbian muddle-up: Their
teachers obviously were very meticulous about teaching the distinction a zillion times at school, but ... but ... So don't
ever practice much with two similar things. Put them each in their own natural (and different!) context, and Quality
Practice only one of them on the first day, and the other one on another day much later (if still necessary then). For
instance, Monday: There was a fluffy sheep in the barn. Thursday: I saw a big ship in the harbour.
A pervasive, non-linguistic example of the deleterious effect of pairwise practice is many people's notorious difficulty in even keeping right and left apart. I call it the "stalactite-stalagmite effect". Completely unnecessary, if you ask me.

8 Beware of Reading and Writing


People who can read and write do want to read and write. This is a serious dilemma for a beginning language learner.
On the one hand, reading adds an extra modality to the learning situation, which is good for memory. And it is true that
you should read as much as possible in all languages you are learning as well as in your own language. You don't even
have to understand everything, because thanks to the statistical characteristics of texts and of the brains' statistical
learning and plasticity mechanisms, you will still gain both grammatical and lexical competence from the sheer amount
of reading. Neurons that fire together will wire together – if they are fired frequently enough. On the other hand, as has
been pointed out above, reading and writing should be avoided in the beginning stage of any language course, until the
pronunciation is mastered reasonably well (which is also the naturally smart condition for first-language learners).
When, eventually, the writing system is introduced, the learner should be warned that there is no connection whatsoever
between the letters and the sounds. The script signs are only arbitrary, abstract symbols of the sounds, very abstract and
very arbitrary. Who said you should use Roman letters instead of runes or Greek or Hindi letters for English? Often
enough there may be more than one symbol or constellation of symbols for one given sound (such as English e, ee, ea,
ie, ei for /i/13; Swedish even has a sound /ʃ/ with more than 60 different spellings); or more than one sound to a given
symbol (such as English c for /k/ or /s/ as in cancer; or y for sometimes a vowel, sometimes a consonant); or two
different sounds together in one symbol (as x for /ks/ in English, but not at all so for the superficially “same” symbol x
in Albanian or Vietnamese); or some symbols not sounded at all, depending on the context or other circumstances (such
as English "silent e", or th in sixths). At best, the orthography is a museum and mausoleum of historic pronunciations. In
these respects, some languages (English, French, Swedish, Tibetan) are worse than others (Finnish, Hungarian, Spanish,
Serbo-Croat, Yakut, Japanese). In languages such as English, Swedish and Russian, reductions are very important. That

13 In accordance with international conventions, the slashes /.../ are used to indicate an intended sound (phoneme) regardless of its spelling or
phonetic details.

is, the pronunciation may be far, far away from what you might expect from the letters, usually depending on the degree
of stress. If you learn the prosody well, reductions will come naturally. And vice versa: if you make the reductions well,
the prosody will come out more naturally. But if you learn from the writing, you may miss the reductions completely,
and thereby the prosody too. Or, similarly, if you are learning Japanese ありがとう, 'thanks', from the romanized
transcription "arigato", you may be misled to believe that the Latin letter r represents an /r/ sound, and that the letter g
represents a /g/ sound, which, however, happens to be utterly wrong. Orthographies, transcriptions and transliterations
are nothing else than a bunch of compromises between convenience and phonetic details. (Who, besides phoneticians, wants to write or read “aɾiŋatoː”?)
So, if possible, stay away from any script until you can speak! Billions of toddlers can't be wrong about this “method”. :-)

The next main section will tell you, step by step, how to use the free Audacity sound editor to customise your own audio
lessons. Other programs also exist, particularly the Ear2Memory app mentioned in the introduction, but when I wrote the bulk of this article, I was only familiar with Audacity. It would take too much time to rewrite the cookery-book, or even to update it for Audacity's recent evolution. So here you are, dear reader! I'm confident that you can adapt it to your own needs
and wishes.

9 Download, install and start Audacity.


Available for PC, Mac and Linux.

Get Audacity at https://www.audacityteam.org/ (Fig. 1).


Audacity means boldness, courage. It is pronounced [ɔːˈdæsəti]
with the stress on -da; it's not a "city". :-)
I don't know why they chose this name, but the software is
fabulous. Powerful. You can do lots of things with it and have lots
of fun.
At the time of writing this section, I was using Audacity version 2.0.5. Today, version 3.2 is available with many new functions. Please bear with any differences in the screenshots.
Fig. 1
Do have a look at one or more of the many tutorials that are
available in English and several other languages.
They are under the Help tab (Fig. 2).
Now, start your Audacity and start enjoying!

Fig. 2

Check the settings. For example, there are 56 languages to choose between in the program. The default language is English, and the translations are of varying quality; in some cases only some of the functions are translated at all. You will find the language settings and other preferences under Edit → Preferences → Interface, shortcut Ctrl+P (Fig. 3).

Fig. 3

In Fig. 4 you can see the Swedish interface.

For the time being, accept all standard and default settings. For most purposes they are the best.
Fig. 4

Sometimes it may be difficult to set your computer and Audacity for recording from a microphone or the speaker sound,
e.g., from YouTube or some pod radio. If so, ask someone who understands your computer to help you.
You will probably have to import a separate component to handle mp3 files. If so, follow the link and tips that may pop
up and install that component too. Or else, skip mp3 and use only wav.
Hint: When using a microphone, be sure to place it at your cheek a little bit behind the angle of your
mouth, so as not to blow air into the mic and cause a noisy recording.
NB: In most laptops the built-in mic makes rather low-quality sound, so a separate mic is recommended!

More hints: If you want to make phonetic analyses, use the wav format, not mp3. The program of first
choice for phonetics is Praat (Dutch for "speech"). Praat too is free, extremely versatile and powerful and
used by most of the phoneticians in the world. Unfortunately it is not so intuitive, but there are lots of
detailed help files, tutorials and active user groups. Download it from http://www.fon.hum.uva.nl/praat/
One very good tutorial for both Praat and phonetics is available at http://swphonetics.com/praat/ by
renowned Swedish-British phonetician Sidney Wood.

10 Look at all the buttons


At the top left of the program window you will find buttons of the same kind as on an old cassette player (Fig. 5), so you will likely feel comfortable starting to use Audacity at once:
Pause ─ Play ─ Stop ─ Skip to Start ─ Skip to End ─ Record.
Fig. 5
If you hover the mouse over a button, a help text kindly appears. The same for all other buttons too. In many cases you
will also find a help text in the left bottom margin.
Hint: The easiest way to record is by pressing R on the keyboard. Then stop and play with the space bar.
Replaying will start from the beginning, or where you have placed the marker.
To continue recording in the same track, press Shift+R. Because if you only press R now, the recording
will start in a new track below your previous track, instead of at the end of the previous recording.
Now record something (R), or open any pre-existing recording you may have (File → Open...). Experiment with the buttons! Nothing can go wrong; your original recording will not be affected, and everything you do can be undone.
Here in Fig. 6 I opened a part of my Pimsleur Hungarian course.
What we see are the sound tracks for the right and left channel (stereo). Curves like these are called oscillograms. They show the pressure variations of the sound waves. A straight line is silence. The height of the peaks is the amplitude (≈ loudness).
If you like, you can select just one part of the recording and listen to that selection only. Do as in a text document: press and hold down the left mouse button and drag to the length you like (shaded area in Fig. 6).
Fig. 6
Or Select All with Ctrl+A.
Hint: Hold down the Shift key when you start playing, and the
selection will be replayed repeatedly in a loop until you stop it. I
recommend this procedure for pronunciation practice.

Examine what happens with the green Play button when you press Shift!
The selection can be treated like any selected text in Word or OpenOffice: Copy, Cut, Paste, Delete, Move, Change,
and very much more. We will return to that later. Please do experiment with various menu choices and keyboard
commands that you think look interesting!
Hint: Everything that you do can be undone in the ordinary way with Ctrl+Z or Edit → Undo. You
can undo exactly EVERYTHING that you may have done. And redo, and re-undo what you redid, etc.
For Redo, press Ctrl+Y. Try it! Do several things and undo and redo them back and forth as much as you
like. You can even undo opening the file, and undo undoing that you undid opening the file. :-)
Undo and Redo also exist as buttons with curved arrows, as in many other ordinary programs today (Fig. 7).

Fig. 7
Hint: When you quit the program, it will ask if
you want to save changes. Always reply No!
(Fig. 8) I will explain later below.
Fig. 8

11 Zooming
Look at the View menu.
There are several alternatives for zooming in and out. (Fig. 9)
Try them out, and learn the keyboard commands! That will speed
up and simplify your work significantly.

What I use most of the time is Ctrl+E (Zoom to Selection) and Ctrl+F (Fit in Window), and I often zoom in and out with Ctrl+1 and Ctrl+3.
In version 2.2.2 you can zoom in and out with the scroll wheel.
Ctrl+F means "show the full recording in the window" and is handy for getting an overview of absolutely everything without having to scroll back and forth or zoom out in small steps.
Fig. 9

Ctrl+E will zoom in and enlarge my selection, making it fill the whole window. Then I often use Ctrl+3 (= zoom out a little) directly afterwards to get a better perspective on each side of the selection.
Fig. 10 shows what my selection from Fig. 6 may look like after Ctrl+E+3:

Fig. 10

If you place the marker on the lower edge of the stereo sound channels, you can resize both channels up and down symmetrically (see Fig. 11 a and b). The marker changes its shape to ↨.
Fig. 11
If you place the marker on the line between the channels, you can resize them reciprocally (i.e., make one wider, the other one narrower; see Fig. 12 a and b). Usually there is no need to do this.
Fig. 12
12 Stereo or mono?
Usually mono is enough (it occupies half the space on my hard disk), so I will remove one channel.
Click the little triangle ▼ (Fig. 13) to get a drop-down menu (Fig. 14).
Choose Split Stereo to Mono, and the channels will split into two identical mono channels.
Pick either one and close it with the little cross × in its upper left corner (Fig. 15).
Fig. 13
The other option here (Split Stereo Track) will keep the right and left channels different as in the original (if you really used a stereo microphone). You might want to experiment with each channel separately, and then join them again. You may get funny or artistic effects!
Fig. 14

However, for the purpose of pronunciation exercises, mono is enough, occupies the least space
on your drive, and is the best choice.
Fig. 15
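(If you prefer to script this step instead of clicking, the following is a minimal sketch of my own in Python with the third-party pydub library, which in turn needs ffmpeg for mp3 files. It is only an alternative outside Audacity, not part of the Audacity workflow, and the file names are placeholders.)

# Sketch only: stereo-to-mono conversion with pydub, comparable in effect
# to Split Stereo to Mono followed by keeping one channel.
from pydub import AudioSegment

audio = AudioSegment.from_file("lesson.mp3")    # placeholder file name
mono = audio.set_channels(1)                    # mix down to a single channel
mono.export("lesson_mono.mp3", format="mp3")    # roughly half the file size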

Hint: Remember that you may Undo (Ctrl+Z) at any time, and Redo (Ctrl+Y) (and "un-undo" and "un-
redo") as many times as you like, if needed or wanted. If you ever should feel total panic, wondering
what on earth you have done, then just close the program, and as always answer No to the question if you
want to save changes! Next time you open the file, everything is as it was from the beginning. The
original recording will never be affected by our manipulations.

13 Recording during my (Olle Kjellin's) pronunciation classes


It's a good idea to record as much as possible during our pronunciation exercises, both the teacher's voice and your own
voice. We will practice a lot in chorus. In that procedure, everybody's speech will be "pushed", as it were, to more or
less inevitably get the same rhythm and melody (intonation) as the teacher's voice. (Thanks to the brain's mirror
neurons.) If you use a headphone and record yourself with its mic correctly placed near your mouth, your own voice
will dominate in the recording. It's your own voice with this excellent rhythm and melody. Choose the best of these
recordings for your future pronunciation exercises. Then use Audacity to practice in unison with yourself, imitate your
own voice! That is what best suits your own ears, your own brain, and your own speech apparatus. This is very
effective, and often much better than most commercial (and usually expensive) language courses on DVD, CD or
cassette.

Hint: When you have temporarily stopped the recording during class and then start recording again, a new
track will be created below the previous one. This does not matter much, but makes the editing
cumbersome afterwards. It is better to continue recording in the same track as before. To achieve this,
press Shift+Record (Shift+R).
Alternatively, use the Pause button instead of Stop. Then just un-pause to continue recording.

14 Manipulate and adapt your recorded material


At times, the recording will be too soft (weak), and you will just get a very thin sound track (Fig. 16):

Fig. 16

Then do like this: Select a part as explained above (or select all with Ctrl+A). Go to the Effect menu and click Amplify... (Fig. 17).
A dialogue opens (Fig. 18) in which Audacity suggests the largest possible amplification in dB without clipping the peaks, in this particular instance 21.4 dB.
Fig. 17
Fig. 18

Generally it is best to accept the suggested degree of amplification. But if you think it
got too loud, just Undo (Ctrl+Z) and then do the Amplify anew with a lower dB value.
Again and again, until you are satisfied.
Fig. 19 is the result of the 21.4 dB amplification in this particular example.
If we ever should want to make the sound softer instead, we will use the same Amplify menu but put a minus (-) in front of the dB value.
Fig. 19
Hint: Sometimes there are spikes of artefact noises in the midst of the utterance that I want to amplify.
Then I zoom into the noise until I can delimit and select only the spike, exactly, and de-amplify it
significantly. Finally I will zoom out again and amplify the whole utterance in the usual way as described
above, the noise being gone.
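(For those who like to script such volume adjustments, here is a hedged sketch of the same ideas in Python with the third-party pydub library. The gain values and file names are only examples, and this is not how Audacity itself computes its suggestion.)

# Sketch only: gain changes in the spirit of Audacity's Amplify effect.
from pydub import AudioSegment
from pydub.effects import normalize

audio = AudioSegment.from_file("quiet_recording.wav")   # placeholder file name

louder = audio.apply_gain(21.4)    # boost by 21.4 dB (a positive value)
softer = audio.apply_gain(-6.0)    # a negative value attenuates instead

# Or let pydub choose the largest boost that still avoids clipping,
# roughly like accepting Audacity's suggested value:
peaked = normalize(audio, headroom=0.1)

peaked.export("amplified.wav", format="wav")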

Hint: After selecting something, but before doing anything with the selection, I press Z on the keyboard.
This will move the edges of the selection to the nearest zero value in the amplitude curve. This essentially
removes the risk of getting irritating clicks in the manipulated result. (I press Z so often that it has become
like a subconscious reflex, even if it often is unnecessary. But it takes less than a second, and nothing can
be destroyed.)
Fig. 20 shows a very zoomed-in picture of the left edge of a selection before I pressed "Z", and Fig. 21
shows the result after "Z". Notice how the edge of the selection and the amplitude curve now cross the zero
line at the same place.
Edit: In more recent versions of Audacity, there are good click-removal algorithms available.

Fig. 20
Fig. 21
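(To illustrate what "snapping to the nearest zero value" means, here is a small, purely illustrative Python/NumPy sketch of my own. It only shows the idea behind the Z shortcut; it is not Audacity's actual algorithm.)

# Illustration only: find the zero crossing nearest to a chosen sample index,
# so that a cut made there starts at (almost) zero amplitude and is less
# likely to produce a click.
import numpy as np

def nearest_zero_crossing(samples: np.ndarray, index: int) -> int:
    """Return the index of the zero crossing closest to `index`."""
    signs = np.sign(samples)
    crossings = np.where(np.diff(signs) != 0)[0]   # sign changes between i and i+1
    if crossings.size == 0:
        return index                               # no crossing found; keep as is
    return int(crossings[np.argmin(np.abs(crossings - index))])

# Tiny usage example with a synthetic sine wave:
wave = np.sin(np.linspace(0, 20 * np.pi, 10_000))
print(nearest_zero_crossing(wave, 1234))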

15 Help! They speak too fast!

Do like this: Select an utterance in the usual way, then click Effect → Change Tempo... (Fig. 22).
I get a dialogue (Fig. 23), where I enter minus 20 percent.
Audacity will then work for a little while, filling in extra sound that makes the speech slower. It's a kind of cheating, but the result is usually good enough for the purpose.
If it sounds too bad, try a smaller tempo change, for instance -15 percent. More than -20 will seldom be good.
Try out various tempo changes!
Fig. 22
Fig. 23

If your recording has too slow a tempo, you can speed it up with a positive percentage. I do it most of the time with the
Book2 recordings, especially on the renderings in my own language. (Most speech samples in Book2 are somewhat
slow as they are intended for learners, and then combined to be used bilingually in a great number of possible
permutations.)

Remember (again) that you can always Undo (Ctrl+Z) and try other values until you are satisfied. Or just for fun!
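(If you ever want to batch-process many files, a similar pitch-preserving time stretch can be scripted. The sketch below is my own suggestion using the third-party librosa and soundfile libraries, not what Audacity does internally; rate=0.8 corresponds roughly to the -20 percent change discussed above, and the file names are placeholders.)

# Sketch only: slow speech down (or speed it up) without changing the pitch,
# comparable in spirit to Audacity's Change Tempo effect.
import librosa
import soundfile as sf

y, sr = librosa.load("fast_phrase.wav", sr=None)      # keep the original sample rate
slower = librosa.effects.time_stretch(y, rate=0.8)    # 0.8 = play at 80 % of the speed
faster = librosa.effects.time_stretch(y, rate=1.15)   # values above 1 speed it up

sf.write("slower_phrase.wav", slower, sr)
sf.write("faster_phrase.wav", faster, sr)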

16 Prepare sound tracks for practising with your smartphone, CD, mp3 or
computer
Let's assume that I have an audio recording from a language class, or a chat over a cup of coffee with friends, or a radio
program, or a TV drama, or an old language course on a cassette tape, or something from YouTube, or whatever, with
useful phrases that I want to practice my pronunciation with. In the following example I have chosen a little phrase
embedded in a dialogue. The phrase happens to be about 2.31 seconds long (displayed in the bottom margin; Fig. 24).
This duration is very suitable for pronunciation exercises. Remember that! About 2 seconds is the best duration for
practice sentences! Maybe even shorter when you are a beginner, and probably quite a bit longer when you are getting
more advanced. I listen a couple of times with Shift+Spacebar (=Shift+Play), and take note of its time position that
is displayed along the upper border; in this case just before 15 seconds measured from start (Fig. 24, upper margin).
This is useful to know if the total recording is very long and I might get lost when I zoom out...
I then press Z and modify the amplitude and tempo as above, if needed.
I also want some "air" around my practice phrase, so I will create silence before and after it. I zoom in a bit and put the
marker at the left edge of my selection, press Z and click the menu Generate → Silence (Fig. 25) and get a dialogue to
choose the duration of the silence, for example 2 seconds (Fig. 26). I do the same at the end of the selection.

Fig. 24 Fig. 25 Fig. 26

My track now looks as in Fig. 27; no sound is lost, just pushed aside by 2 seconds in each direction:

Fig. 27

Hint: Be sure now to extend the selection a little bit into the silences, ideally some 600 ms (milliseconds) at the end, because ca 600-800 ms (0.6-0.8 seconds) of silence between the repetitions,
neither longer nor shorter, will typically make it easy to practice in unison with the program with a
comfortable rhythm. The interval also depends on the rhythm and tempo of the sample. Test it by Shift-
playing your selection a couple of times, stop and adjust the included silences and Shift-play again, until
you obtain the rhythm that feels the most comfortable to you.

The next thing is to make the selection repeat itself a couple of times. Go to Effect →
Repeat... (Fig. 28) and specify the number of repetitions (Fig. 29). I often enter 5,
which will give me 6 exemplars total (Fig. 30).

Fig. 28
Fig. 29
Fig. 30

Hint: This 600-800 ms silent interval between the repetitions will give precisely enough time for breathing and
contemplating how to modify one's pronunciation for the next round. Because this is all about chorus
practice together with the recording; not any "listen and say after me" as in olden times. (The "listen and
say after me" procedure is ineffective in the beginning of second language learning; perhaps better a little
later on, when the pronunciation is solidly mastered.)
While my six exemplars of the practice sentence are still selected, it is time to save them. However, in Audacity we typically don't "save" the file, but export the selection. Go to the menu File → Export selection... (Fig. 31):

Fig. 31
...and first choose a suitable location to save it, and then a suitable file name (for instance part of the sentence itself). I
can also choose the file format, such as MP3, WAV, AIFF, Ogg or other (Fig. 32):

Fig. 32

Hint: Write the track number before the file name (with a leading zero for 01-09). This will simplify the
sorting later.

Hint: I put my practice sentences in Dropbox directly. This will give me immediate back-ups in case of a
hard-disk crash after all this work, and best of all, I can access the most recent version of my files at once
from any other computer and my smartphone. No need for a memory stick.
If you haven't got Dropbox yet, please use this "invitation" link from me http://db.tt/tsfzycJ4, and we
will both get a little extra bonus space.

Remember when you close the program to reply No to save. We have already exported what we wanted to keep.
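(The whole workflow of this section, that is silence around the phrase, short pauses, six repetitions and export, can also be scripted when you have many phrases to prepare. The following is a minimal sketch of mine in Python with the third-party pydub library, which needs ffmpeg for mp3; the durations simply mirror the values suggested above, and every file name is a placeholder.)

# Sketch only: build a practice track in the spirit of this section.
from pydub import AudioSegment

phrase = AudioSegment.from_file("phrase.wav")       # one ~2-second model sentence

lead_silence = AudioSegment.silent(duration=2000, frame_rate=phrase.frame_rate)  # 2 s of "air" at the start
pause = AudioSegment.silent(duration=700, frame_rate=phrase.frame_rate)          # 600-800 ms between repetitions

track = lead_silence
for _ in range(6):                                  # 6 exemplars in total
    track += phrase + pause

# A leading track number with a zero keeps the exported files nicely sorted.
track.export("01_phrase.mp3", format="mp3")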

Extra: If you reply Yes to Save, you will save a Project, a special Audacity file that is quite big but
allows many exciting possibilities. For instance, you can annotate your recording. Or you can create
music, or sing in chorus with yourself in several different tracks while you are playing various
instruments in several other tracks. You can manipulate and mix them in innumerable ways. Professional
musicians do so. There are lots of fun things to do with Audacity. When you are ready, you concatenate
them all into a final version with two stereo channels, export them to a WAV file, burn ten CDs, and try to
sell them on the Flea Market on Saturday! Or at least one CD to your mother.

17 Now YOU try! Experiment with Audacity and yourself. Nothing can go wrong!

18 More hints
For quickly and easily getting the pitch contour (also known as F0 extraction) of your practice sentence(s), please use
the free program WaveSurfer (not for stereo tracks). Read about it here: http://en.wikipedia.org/wiki/WaveSurfer
and download it from here: https://sourceforge.net/projects/wavesurfer/
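(As an alternative of my own suggestion, the pitch contour can also be extracted with Praat's algorithms from Python via the third-party praat-parselmouth package together with matplotlib; the file name below is a placeholder.)

# Sketch only: extract and plot the F0 (pitch) contour of a practice sentence.
import parselmouth
import matplotlib.pyplot as plt

snd = parselmouth.Sound("sentence.wav")        # placeholder file name
pitch = snd.to_pitch()                         # Praat's default pitch analysis

times = pitch.xs()                             # time points in seconds
f0 = pitch.selected_array["frequency"]         # F0 in Hz; 0 where unvoiced

plt.plot(times, f0, ".")
plt.xlabel("Time (s)")
plt.ylabel("F0 (Hz)")
plt.title("Pitch contour")
plt.show()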

In the Glossika group on Facebook, some very good suggestions came up:
Alexander Giddings wrote:
It just occurred to me that the quickest and most effective way to edit the A files [of the Glossika language courses]
may be simply to use the repeat function over each group of two target sentences (following the primer) and then the
truncate silence feature over the whole file once you are finished, which will give you a pause of exactly the same
length (i.e. 600-800 milliseconds) between each repetition and between each group of repetitions. ... There is one
downside, however, which is that any sentence-internal pauses (as in the mini-dialogues) longer than the specified
truncate length will be condensed in the same way.
Rand added:
Here is how I quickly edit glossika files down for choral repetition in Audacity: use the "sound finder" feature. It
will automatically find each phrase and break it up for you. It won't break up short pauses within the phrase because
you set the duration of silence that it considers a break. You can also tell it how much before or after the phrase
(silence) to include in the output file. Set this really short and put your iPod on repeat and you have mobil choral
repetition. Then you export and it will auto sequentially name the files for you, I always make it 1-50 for each c file
(ex sentence 605 is En_Zhs_13_05, meaning 13th C file, track 5). Takes me about 3 minutes from start to finish
breaking the C file into individual files then putting them together as a playlist on iTunes and putting it on my
phone.
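(For readers who script rather than click, the same split-at-the-pauses idea can be sketched in Python with the third-party pydub library. This is comparable in spirit to the Sound Finder / truncate-silence workflows quoted above, but it is not Audacity's own feature, and the thresholds and file names are only starting points to tune by ear.)

# Sketch only: cut a long lesson file into one file per phrase at the silent
# gaps, then export the pieces with sortable, numbered file names.
from pydub import AudioSegment
from pydub.silence import split_on_silence

lesson = AudioSegment.from_file("lesson13.mp3")      # placeholder file name

phrases = split_on_silence(
    lesson,
    min_silence_len=700,    # a gap must last at least ~700 ms to count as a break
    silence_thresh=-40,     # anything below -40 dBFS is treated as silence
    keep_silence=300,       # keep a little silence at each edge of a phrase
)

for i, phrase in enumerate(phrases, start=1):
    phrase.export(f"lesson13_{i:02d}.mp3", format="mp3")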

19 Other software
New info, March 10, 2019: There is a very neat program and app called WorkAudioBook, with which you can easily
replay a zillion times any sentence or phrase of an mp3 audio file (e.g. audio books and 50Languages lessons) for
language practice, including automatically finding and jumping to the next phrase. (Thanks to Piotr Szafraniec for this info.) With its associated software you can extract sound tracks from films and YouTube videos and save them to mp3,
including subtitles.
http://workaudiobook.com/WorkAudioBook/Download(Windows).aspx
I haven't yet found a way to change the tempo as in Audacity, but maybe there is one.
I think that Audacity, used as I explain in this tutorial, is superior for the beginning stages of language learning. But when you get so advanced that you can enjoy audio books, you will probably prefer WorkAudioBook. As for myself,
I'm not very fond of films and audio books, but I will perhaps use it for news podcasts in various languages.

20 Even more later


I may want to add or delete stuff here, whatever comes to my mind and whenever that happens. Keep alert for new
versions. You can identify the version and its last edited date and time in the header on each page. This version contains
some corrections and additions compared to earlier versions.

21 Why should you believe my assertions?


This author is a language nerd who has experienced many kinds of language pedagogy, some with success, some
without. I was born a monolingual Swede. I have worked quite a lot as a teacher of Swedish to learners with other first
languages. In this role, I have always had a special interest in pronunciation and grammar, and the associations between
the pronunciation and the grammar, and how grammar functions with pronunciation in the brain. I also educate other
teachers and produce textbooks and learning materials. Furthermore, as I mentioned in Footnote 1, I am a linguist and
phonetician. I got my Ph.D. in speech science (or, formally, Dr. of Medical Science in the physiology of speech) at the
Research Institute of Logopedics and Phoniatrics, Faculty of Medicine, Tokyo University, in 1976, defending my thesis
in Japanese. The thesis dealt with the associations between speech physiology, phonetics, phonology and prosody in
Tibetan14. Not a very outstanding thesis, but I learnt a great deal about speech science. In addition, I am an MD, now a
retired diagnostic radiologist, specialized in examining the speech and swallowing organs in the assessment of
dysphagia (swallowing problems) of various kinds. I believe that few other phoneticians have seen the speech organs as
"live shows" as many times as I have, and it has been very informative to see the vast interindividual variability of the
same anatomy, all still achieving the very same results. Moreover, I worked for several years in a geropsychiatric clinic
assessing and treating memory disorders and dementias. I learnt a lot on how the brain and its neurons work in learning
and forgetting.
The common denominator between all these fields is communication and human social interaction. No other person in
the world whom I know of (yet) has the combined competences from these disciplines of both the humanities and the
medical sciences, synthesising them all into a well-designed methodology for language learning, as presented here, a
methodology that is based both on personal experience of language learning as well as teaching, and on what
neuroscience really knows about the human functions of learning, hearing and speech.

22 Was this useful to you?


I spent a substantial amount of time on learning all this, and on writing this tutorial. Any comments, praise, critique,
corrections, suggestions and requests will be appreciated. Please go to FACEBOOK and join this group,
https://www.facebook.com/groups/best.pronunciation/ and give me some feedback.

23 Selected bibliography
Cattaneo, L., & Rizzolatti, G. (2009). The Mirror Neuron System. Archives of Neurology, 66(5), 557–560. Available at
http://archneur.jamanetwork.com/article.aspx?articleid=796996
Ericsson, K. A., Krampe, R. T., & Tesch-Römer, C. (1993). The Role of Deliberate Practice in the Acquisition of
Expert Performance. Psychological Review, 100(3), 363–406. Available at:
http://graphics8.nytimes.com/images/blogs/freakonomics/pdf/DeliberatePractice(PsychologicalReview).pdf
Ericsson, K. A. (2000). How experts attain and maintain superior performance: Implications for the enhancement of
skilled performance in older individuals. Journal of Ageing and Physical Activity, 8, 346-352. (Updated excerpt
available at: http://www.psy.fsu.edu/faculty/ericsson/ericsson.exp.perf.html or
http://www.freezepage.com/1404355998UGCCCQIQAR)
Hurford, J.R. (2002). Language beyond our grasp: what mirror neurons can, and cannot, do for language evolution. In
D. Kimbrough Oller, U. Griebel, & K. Plunkett, eds. The Evolution of Communication Systems: A Comparative
Approach. Cambridge MA: MIT Press. Available at: http://www.lel.ed.ac.uk/~jim/mirrormit.pdf.
Kjellin, O. (1999). Accent Addition : Prosody and Perception Facilitate Second Language Learning. In O. Fujimura, B.
D. Joseph, & B. Palek, eds. Linguistics and Phonetics Conference 1998 (LP’98). Columbus, Ohio: The Karolinum
Press, pp. 1–25. Available at: http://olle-kjellin.com/SpeechDoctor/ProcLP98.html. (Recommended reading!)

14 Kjellin, O. (1977). Observations on consonant types and “tone” in Tibetan. Journal of Phonetics, 5, 317–338.

Kjellin, O. (2002). Uttalet, språket och hjärnan. Teori och metodik för språkundervisningen [Pronunciation, Language
and the Brain. Theory and Methods for Language Education]. [book in Swedish] Uppsala: Hallgren och Fallgren
Studieförlag AB.
(The book is out of print as of Feb 2021, and I'm now in the process of uploading a slightly revised version of it as a
free google doc for anyone to enjoy: https://bit.ly/Uttalet)

Kuhl, P. K. (2010). Brain mechanisms in early language acquisition. Neuron, 67(5), 713–27.
https://doi.org/10.1016/j.neuron.2010.08.038
Rizzolatti, G. (2005). The mirror neuron system and its function in humans. Anatomy and Embryology, 210(5-6), 419–
21. Available at: http://link.springer.com/article/10.1007/s00429-005-0039-z?LI=true
Romberg, A. R., & Saffran, J. R. (2010). Statistical learning and language acquisition. WIREs Cogn Sci. Retrieved
May 14, 2012, from http://wires.wiley.com/WileyCDA/WiresArticle/wisId-WCS78.html
Skoyles, J.R. (1998). Speech phones are a replication code. Medical Hypotheses, (50), pp.167–173. Available at:
http://human-existence.com/publications/Medical Hypotheses 98 Skoyles Phones.pdf.
Tettamanti, M. et al. (2005). Listening to action-related sentences activates fronto-parietal motor circuits. Journal of
cognitive neuroscience, 17(2), pp. 273–81. Available at: http://www.ncbi.nlm.nih.gov/pubmed/15811239.
***
