Series Editors
Luke Plonsky
Martha Young-Scholten
Volume 64
Second Language
Pronunciation
Edited by
Ubiratã Kickhöfel Alves and
Jeniffer Imaregna Alcantara de Albuquerque
ISBN 978-3-11-073951-0
e-ISBN (PDF) 978-3-11-073612-0
e-ISBN (EPUB) 978-3-11-073614-4
ISSN 1861-4248
www.degruyter.com
About the Authors
Akiyo Joto is a professor emeritus at the Prefectural University of Hiroshima in Japan. She holds
an MA in English linguistics awarded by Okayama University, Japan, and an MA in TEFL
conferred by Ball State University, USA. Her main research concerns the analysis of English
pronunciations of native Japanese speakers and its application to teaching English sounds to
Japanese learners from the perspective of contrastive phonetics between English and Japanese.
She is currently working on the development of a teacher’s manual of English sounds with video
instructions for elementary school English education in Japan. Email: joto@pu-hiroshima.ac.jp
Anabela Rato is an Assistant Professor and the Associate Chair of the Undergraduate Program
in Portuguese Studies at the Department of Spanish and Portuguese (University of Toronto,
Canada). She is also the Chair of the Canadian Association of Teachers of Portuguese (CATPor).
She received her Ph.D. in Language Sciences, with a specialization in English Linguistics, and
her Master’s degree in English Language, Literature, and Culture from the University of Minho,
Portugal. Her research interests include Second Language (L2) Speech Learning, Heritage
Language (HL) Phonological Acquisition, Speech Perception and Production, Phonetic Training,
and Applied Phonetics. Email: anabela.rato@utoronto.ca
Cosme Daniel Paz is a PhD student and graduate assistant in Agricultural Sciences at
Universidad Nacional de Mar del Plata (UNMdP), Argentina. He is a member of the Research
Group Cuestiones del Lenguaje at UNMdP - ANPCYT - INTA. He is an Agricultural Engineer
graduated from Universidad Nacional de Salta (UNSA), Argentina, 2011. From 2012-2016, he
was awarded a CONICET doctoral scholarship. Main research areas: Statistical analysis
related to L2 speech development. Email: cosmepaz@gmail.com
Denis Liakin is Full Professor of French and Linguistics in the Department of French Studies at
Concordia University in Montreal. Prof. Liakin completed a PhD in Linguistics at the University
of Western Ontario (2003) and joined Concordia University in 2004. His research interests
include effects of computer technology on L2 learning, corrective phonetics and second
language acquisition of syntax. His current SSHRC project investigates the pedagogical use of
mobile devices for improving L2 pronunciation. Email: denis.liakin@concordia.ca
Denise Cristina Kluge is a professor at Federal University of Rio de Janeiro (UFRJ) at the
Department of Anglo-Germanic Languages and also part of the Graduate Studies Program in
Language at Federal University of Paraná (UFPR). She graduated in Portuguese and English
teaching from Universidade do Vale do Rio dos Sinos - Unisinos (2000), and did her MA
(2004) and PhD (2009) in Linguistics at Federal University of Santa Catarina (UFSC) in Brazil.
Her research interests include speech perception and production, perceptual training, the effect
of visual cues, the acquisition/learning of an additional language, and pronunciation teaching.
Email: deniseckluge@gmail.com
https://doi.org/10.1515/9783110736120-202
Diana Oliveira received her Ph.D. in Language Sciences, with a specialization in Applied
Linguistics, in 2020 and her Master’s degree in Portuguese as Foreign or Second Language
in 2016 from the University of Minho, in Portugal. She is a junior researcher at CEHUM,
UMinho, and is interested in individual differences in second language speech learning.
Email: oliveira.diana27@gmail.com
Elena Kkese (PhD in Linguistics) has taught at secondary and tertiary education since 2004.
Elena’s research focuses on phonetics and its relation to phonology, bilingualism, sociophonetics,
sociolinguistics, teaching and education. Her research interests include the speech and visual
perception and production of phonetic and contextual information in L1 and L2 and the
implications to L2 pronunciation and literacy. She is the author of Identifying Plosives in L2
English: the case of L1 Cypriot Greek speakers, L2 Writing Assessment: The Neglected Skill of
Spelling, as well as Speech Perception and Production in L2. Email: elenakkese@hotmail.com
Ellen Simon is an Associate Professor in English Linguistics at Ghent University, Belgium. Her
research field is that of second language phonetics and phonology and she has published in
a.o. Second Language Research, Journal of Child Language, Journal of Phonetics and International
Journal of Bilingualism. She has published two book volumes with Academia Press: Voicing in
Contrast (2010) on the acquisition of the English voicing contrast by native speakers of Dutch
and Media-induced Second Language Acquisition (Simon & Van Herreweghe, 2018) on the
acquisition of English by primary school children in Flanders. She is currently working on issues
of accent variation, intelligibility and the effect of training and exposure on L2 speech learning.
Email: ellen.simon@ugent.be
Idée Edalatishams received her PhD in Applied Linguistics and Technology from Iowa State
University, where she worked as a communication consultant at the Writing Center and the
Center for Communication Excellence and taught first-year composition and a range of
graduate and undergraduate ESL courses. Her primary research is in spoken corpus
linguistics, pronunciation, and multilingual speakers’ oral communication. She has
developed the Corpus of Teaching Assistant Classroom Speech and is the Faculty ESL
Specialist at George Mason University Writing Center, where she develops programming,
Ilvi Blessenaar, MA is a speech and language therapist, clinical linguist, lecturer at the
Utrecht University of Applied Sciences Utrecht (HU), Department for Speech and Language
Therapy and a junior researcher at Research Group for Speech and Language Therapy. Her
teaching and research focus is on the role of Speech Language Therapists in second
language pronunciation in the Netherlands and Belgium. She is also active in the field of
children with Developmental Language Disorders (DLD) and Speech Sound Disorders (SSD).
Email: ilvi.blessenaar@hu.nl
Lily Compton is the Graduate Communication Programs Coordinator at the Iowa State
University’s Center for Communication Excellence. Her primary research is in curriculum and
instructional technology, online education, teacher education, and oral communication. She
taught and designed curriculum for courses for the oral communication skills of
International Teaching Assistants (ITAs), methods for teaching English as a Second
Language, and instructional technology for online language learning. She oversees the
institutional language tests for ITAs and trains the test raters. She also mentors and
supervises the English Speaking Consultants and instructors of the ITA oral communication
courses. Email: lcompton@iastate.edu
Lizet van Ewijk (PhD) is a speech and language therapist and senior Lecturer at the Utrecht
University of Applied Sciences Utrecht (HU), Department for Speech and Language Therapy
and a senior researcher at Research Group for Speech and Language Therapy. Her teaching
and research focus is on improving communication opportunities for adults with
communicative vulnerability. Email: lizet.vanewijk@hu.nl
María Claudia Troglia holds a degree in English Language Teaching (Universidad Nacional de Mar
del Plata – UNMDP -, 2010). Currently, she is a teaching assistant at Discurso Oral II at the English
Teacher Training Program at UNMDP. She is also a teaching practice instructor (English Teacher
Training Program, Instituto Superior Idra, Mar del Plata, Argentina. 2019–2021). She is a member
of the research group Cuestiones del Lenguaje at UNMDP. Email: claudiatroglia@gmail.com
Natallia Liakina’s professional experience includes teaching French as a second language at the
university level in Ontario and in Quebec. Since 2006, she has taught at the French Language
Centre at McGill University. Her current research is focused on corrective phonetics and the
impact of new technologies such as speech technologies and augmented reality games on L2
teaching and learning both in the classroom setting and online. As part of her work at McGill, she
has taught FSL classes and developed educational materials. Email: natallia.liakina@mcgill.ca
Pauline Degrave is Assistant Professor in Dutch Didactics at UCLouvain, Belgium. Her key
research interests are foreign language acquisition - especially Dutch by French-speaking
learners - and the relationship between music and language. She explored the effect of musical
training and abilities as well as the use of music in foreign language classrooms. Specialized in
pedagogy (secondary school and higher education), she has been teaching Dutch to French
speakers for more than 10 years. She has published several Dutch handbooks and research
articles in International Review of Applied Linguistics in Language Teaching and Journal of
Language Teaching and Research. Email: pauline.degrave@uclouvain.be
Pedro Luis Luchini holds a Post-doctoral degree in Linguistics from Universidad Federal Rio
Grande Do Sul, Porto Alegre, Brazil (2019), a PhD in Letters from Universidad Nacional de
Mar del Plata (UNMdP) (2015), and an MA in ELT and Applied Linguistics (AL) from King’s College,
University of London, UK (2003). Currently, he is a full professor and research group director at
Cuestiones del Lenguaje, UNMdP, Argentina. Main research areas: AL with a focus on English
pronunciation. Email: luchinipedroluis@gmail.com
Quentin Decourcelle is a teacher of Dutch as a Foreign Language for native speakers of French.
He graduated from Ghent University with a Master in Linguistics and Literature, with English as
the main subject. In 2018, he successfully completed a Master of Advanced Studies in
Linguistics, in which he specialized in multilingual and foreign language learning and teaching.
His research interests include incidental language learning, grammar acquisition and language
training. He was affiliated to the English Section of the Linguistics Department at Ghent
University from 2018 to 2021. Email: decourcelle.quentin@hotmail.be
Ronaldo Lima Jr is a professor at the Federal University of Ceará, Brazil, where he teaches
English and general phonetics and phonology at both undergraduate and graduate levels. He
is the founder and current director of the Laboratory of Phonetics and Multilingualism
(LabPhoM) at the Federal University of Ceará. He has a doctorate in Linguistics, a master’s in
Applied Linguistics, and his main research interest is in the phonological development of
nonnative languages. Email: ronaldo.limajr@gmail.com
Tim Kochem is a Lecturer in the English Department at Iowa State University. His primary
research is in L2 pronunciation pedagogy, language teacher education, educational technology,
and distance education. He worked as an English Writing, English Speaking, and Interpersonal
Communications Consultant at the Center for Communication Excellence for four years. He has
also taught a global online course for the Online Professional English Network (OPEN), Using
Educational Technology in the English Language Classroom, as well as introductory courses in
public speaking and linguistics at Iowa State University. Email: tkochem@iastate.edu
Tracey Derwing, Professor Emeritus, has extensively researched L2 pronunciation and fluency,
especially the relationships among intelligibility, comprehensibility, and accent. She has also
investigated native speakers’ speech modifications for L2 speakers and has conducted
workplace studies involving pragmatics and pronunciation. For several years she directed a
research center on immigration and integration. Currently, she serves on a committee that
advises the Canadian government on language training for newcomers. Much of Tracey’s work
has been conducted with Murray Munro – together they wrote Pronunciation Fundamentals:
Evidence-based perspectives for L2 teaching and research, in addition to dozens of research
articles. Email: tderwing@ualberta.ca
Wellington Mendes is an English Teacher in the Federal Center for Technological Education of
Minas Gerais (CEFET-MG), where he develops research related to second language speech. His
Master’s degree is in Theoretical and Descriptive Linguistics from the Federal University of
Minas Gerais (UFMG), where he is also completing his PhD on the acquisition of English as a
Second Language and its relationship with sound variation and change. He is certified in both
Teaching English as a Foreign Language by the University of Toronto and in English Language
Teaching by the Federal University of Minas Gerais. Email: wellington.matt@gmail.com
Yuri Nishio received her Ph.D. from Nagoya University, Japan, in 2007. Since 2016, she has
worked as a professor at Meijo University’s Faculty of Foreign Studies. She teaches English
phonetics and seminars related to second language acquisition. She is the head of the
Intercultural Cooperative Research Center for analyzing the effectiveness of study abroad
programs. She is interested in the mechanisms of perception and production of English
sounds by Japanese speakers, developing ICT materials to help Japanese learners improve
their pronunciation, and in creating comprehensive teaching guidelines for English
phonetics. Email: ynishio@meijo-u.ac.jp
Contents
About the Authors
Ronaldo Lima Jr
A dynamic account of the development of English (L2) vowels by
Brazilian learners through communicative teaching and through explicit
instruction
Conclusion
Tracey M. Derwing
An overview of pronunciation teaching and training
Index
Ubiratã Kickhöfel Alves, Jeniffer Imaregna Alcantara de
Albuquerque
Introduction
Pronunciation teaching and phonetic training in second
language development: What do they have to offer?
The learning process of a new sound system may be challenging not only to stu-
dents, but also to their teachers. When facing this challenge, L2 learners need to
develop new strategies to perceive as well as to produce those new sounds. In
turn, when trying to help their students in this task, teachers may find it difficult
to set the goals to be reached in their pronunciation classes, as well as to decide
on which aspects have to be taught and how these aspects should be addressed
in their classrooms.
In order to help both learners and teachers overcome these challenges, re-
search on L2 pronunciation (be it in the classroom or in the language labora-
tory) plays a fundamental role. Considering this scenario, as we go through the
pages of the most consolidated journals on L2 learning and teaching, we may
easily notice that there has been a significant increase in the number of studies
focusing on L2 pronunciation instruction and perceptual/production training in
the last two decades. This growth accompanies the rising number of studies on
L2 acquisition in general, being the result of new developments in both the
fields of L2 speech and L2 teaching.
As for the developments in the field of L2 speech, the last twenty years have
witnessed a significant growth in proposals of new L2 perceptual models,
such as the Native Language Magnet Model (NLM – Kuhl 2000), the Perceptual
Assimilation Model-L2 (PAM-L2 – Best and Tyler 2007), the Second Language Linguistic
Perception model (L2LP – Escudero 2005) and the recent Revised Speech
Learning Model (SLM-r – Flege and Bohn 2021),¹ among others. Even though all
these models are related with regard to their empirical object of investigation,
each of them reflects a different view of language and phonetic primitives,
ranging from a psychoacoustic account, such as the SLM-r, to a direct-realist,
articulatory basis, as claimed in the PAM-L2. These different accounts and the
discussions proposed in each of them have contributed to different fields of
¹ This is a revised (and updated) version of Flege’s (1995) Speech Learning Model.
https://doi.org/10.1515/9783110736120-001
life and help identify influencing factors. The application of the model is illus-
trated with a case study of a Syrian refugee living in the Netherlands. This chapter
reflects the interdisciplinary status of the field of L2 teaching, as new pedagogical
approaches may be adapted or developed from previous research carried out in a
variety of related fields of knowledge.
The third part of the book, which addresses pronunciation training and its
implications for the classroom, is opened with a chapter by Susan Jackson and
Walcir Cardoso. In this chapter, the authors carry out an artificial language
learning experiment in order to investigate whether the inconsistent grapheme-
to-phoneme correspondence for /h/-initial words in English has an impact on
Francophone learners’ ability to encode the fricative /h/ as part of a newly-learned
word. In this learning experiment, the students were taught English
pseudo-words by associating auditorily presented stimuli with non-objects and
were placed into one of three learning conditions: auditory + congruent spell-
ing, auditory + congruent/incongruent (inconsistent) spelling, and auditory
only. The accuracy rates in a subsequent word-picture matching task suggest
that the acquisition of a novel phoneme is more difficult when the grapheme-
phoneme correspondence of the target language is inconsistent. This study
shows how training studies (especially artificial language experiments) may
contribute to showing the main developmental processes and sources of diffi-
culties faced by learners. These results have important implications for pronun-
ciation teaching, especially for the design of pronunciation materials/classes.
The next two chapters in this third block deal with different training activities
that can be implemented in language classrooms. In Chapter 9, Yuri Nishio and
Akiyo Joto tested how effective an Information and Communications Technology (ICT)
self-learning system is in teaching English vowel and consonant sounds to
Japanese learners, by focusing on the pronunciation of the names of the letters of
the English alphabet. The Japanese participants in the experiment were divided into
two groups, each of which was trained on a different platform. In one of the
platforms, besides the native speakers’ video of the articulation of the sound, the
learners were shown a self-learning video, in which they could visualize their own
production. The results showed that both groups benefitted from training, and the
members of the group who were able to visualize their own face showed some ad-
vantages concerning the learning of consonants. Chapter 10, in turn, focuses on
Automatic Speech Recognition-based applications. In this chapter, Natallia Liakina
and Denis Liakin address the different types of implicit and explicit corrective
feedback provided by these apps and discuss their impact on the acquisition of L2
pronunciation. The authors also report on the results of an action research study on the
use of three different ASR-based tools, with a special focus on the learners’ percep-
tions of the usefulness of the different types of feedback provided by each one of
these tools. Together, these two chapters make it clear that the use of technologies
may be an aid in the L2 classroom, and training approaches using such technolo-
gies may be implemented in classroom activities aiming at the teaching of pronun-
ciation. In other words, teaching and training approaches may be merged with the
aim of helping learners achieve higher levels of speech intelligibility.
Finally, the last three chapters deal with High Variability Phonetic Training
(HVPT) and its empirical implications for L2 speech and teaching. In Chapter
11, Ellen Simon, Bastien de Clercq, Pauline Degrave and Quentin Decourcelle
investigate the robustness of HVPT on the perception of non-native Dutch con-
trasts by French-speaking learners. By ‘robustness’, the authors refer to (i) the
generalizability of the training to novel tokens and talkers; (ii) the long-term
effects of HVPT; and (iii) the effect of HVPT in non-optimal listening conditions.
Their results, which show variability in the efficacy of HVPT in most robustness
variables, are discussed in view of the moderating variables examined. This is
an innovative study as it focuses on a target language other than English, show-
ing that investigations on different L2 systems have become a common (and de-
sired) research practice in the last few years.
Also verifying the effects of HVPT on retention and generalization, Pollianna
Milan and Denise Kluge carry out an experiment on the effects of HVPT on
the perception and production of heterotonics by Brazilian learners of Spanish.
This study is innovative not only concerning the L1 and L2 systems involved,
but also in its focus on heterotonic words. The study also innovates in adopting
a Complex, Dynamic Systems perspective in an HVPT study, focusing on both
individual and group analyses. In the same fashion as in Simon et al.’s study,
Milan and Kluge’s results also show variability among participants, which is ex-
plained according to the tenets of a Complex, Dynamic account. Finally, closing
the last module of this volume, Anabela Rato and Diana Oliveira present a sys-
tematic review of 27 perceptual training studies, carried out over the last 40
years, which include the testing of generalization and retention of learning. As
it provides a detailed picture of the HVPT research scenario, this chapter also
presents suggestions for future research, paving the way for new studies on per-
ceptual training both in the laboratory and in the classroom.
The concluding chapter of the book is authored by Tracey Derwing. This con-
clusion not only presents a summary of the current research questions addressed
in pronunciation teaching and training studies, but also predicts future scenarios
for both researchers and practitioners in the field. Given her vast experience in
both L2 teaching and L2 acquisition studies, Professor Derwing’s chapter pro-
vides suggestions to bridge the gap between pronunciation researchers and prac-
titioners, which is one of the most important goals set for this book.
All in all, the chapters in this volume are grounded in different views of
language acquisition (ranging from traditional accounts, such as fossilization,
to more innovative approaches, which view language as a Complex, Dynamic
system) and different teaching perspectives and frameworks (such as Celce-Murcia
et al.’s, the TPACK framework for pronunciation teaching and the ICF
model, among others). Therefore, it is not by chance that the label ‘different ap-
proaches’ is part of the title of this volume. We see these different approaches
as exciting and positive, as they reflect the interdisciplinary nature as well as
the growth this field has had throughout the years. We hope the chapters in
this volume contribute to new theoretical and methodological developments in
L2 pronunciation teaching and training studies, consolidating the contribution
of this research theme to the field of L2 acquisition.
References
Albuquerque, Jeniffer Imaregna Alcantara de. 2019. Caminhos dinâmicos em inteligibilidade
e compreensibilidade de línguas adicionais: Um estudo longitudinal com dados de fala
de haitianos aprendizes de Português Brasileiro [Dynamic paths of intelligibility and
comprehensibility in additional languages: a longitudinal study on speech data from
Haitian learners of Brazilian Portuguese]. Porto Alegre, Brazil: Universidade Federal do
Rio Grande do Sul dissertation.
Beckner, Clay, Nick C. Ellis, Richard Blythe, John Holland, Joan Bybee, Jinyun Ke, Morten
H. Christiansen, Diane Larsen-Freeman, William Croft & Tom Schoenemann. 2009.
Language is a Complex Adaptive System: Position paper. Language Learning 59(s.1). 1–26.
Best, Catherine & Michael D. Tyler. 2007. Nonnative and second-language speech perception:
Commonalities and complementarities. In Ocke-Schwen Bohn & Murray J. Munro (eds.),
Language Experience in Second Language Speech Learning: In honor of James Emil Flege,
13–34. Amsterdam: John Benjamins.
Bybee, Joan. 2001. Phonology and Language Use. Cambridge: Cambridge University Press.
Bybee, Joan. 2008. Usage-based grammar and second language acquisition. In Peter
Robinson & Nick Ellis (eds.), Handbook of Cognitive Linguistics and Second Language
Acquisition, 216–235. New York: Routledge.
Celce-Murcia, Marianne, Donna M. Brinton, Janet M. Goodwin & Barry Griner. 2010. Teaching
Pronunciation: A Course Book and Reference Guide. Cambridge: Cambridge University
Press.
De Bot, Kees. 2017. Complexity Theory and Dynamic Systems Theory: Same or different?
In Lourdes Ortega & ZhaoHong Han (eds.), Complexity Theory and Language
Development: In Celebration of Diane Larsen-Freeman, 51–58. Amsterdam: John
Benjamins.
De Bot, Kees, Wander Lowie & Marjolijn H. Verspoor. 2007. A Dynamic Systems Theory
approach to second language acquisition. Bilingualism: Language & Cognition 10(1). 7–21.
Nagle, Charles, Pavel Trofimovich & Annie Bergeron. 2019. Toward a dynamic view of second
language comprehensibility. Studies in Second Language Acquisition 41(4). 647–672.
Saito, Kazuya. 2012. Effects of instruction on L2 pronunciation development: a synthesis of 15
quasi-experimental intervention studies. TESOL Quarterly 46(4). 807–819.
Saito, Kazuya & Luke Plonsky. 2019. Effects of second language pronunciation teaching revisited:
a proposed measurement framework and meta-analysis. Language Learning 69(3).
652–708.
Thomson, Ron I. 2018. Measurement of accentedness, intelligibility, and comprehensibility.
In Okim Kang & April Ginther (eds.), Assessment in Second Language Pronunciation,
11–29. London & New York: Routledge.
Thomson, Ron I. & Tracey M. Derwing. 2015. The effectiveness of L2 pronunciation instruction:
A narrative review. Applied Linguistics 36(3). 326–344.
Thomson, Ron I. & Tracey M. Derwing. 2016. Is phonemic training using nonsense or real
words more effective? In John Levis, Huong Le, Ivana Lucic, Evan Simpson & Sonca Vo
(eds.), Proceedings of the 7th annual Pronunciation in Second Language Learning and
Teaching Conference, Dallas, Texas, 2015, 88–97. Ames, IA: Iowa State University.
Trofimovich, Pavel, Charles L. Nagle, Mary Grantham O’Brien, Sara Kennedy, Kym Taylor Reid,
& Lauren Strachan. 2020. Second language comprehensibility as a dynamic construct.
Journal of Second Language Pronunciation 6(3). 430–457.
Verspoor, Marjolijn H. 2017. Complex Dynamic Systems Theory and L2 pedagogy: lessons to
be learned. In Lourdes Ortega & ZhaoHong Han (eds.), Complexity Theory and Language
Development: In Celebration of Diane Larsen-Freeman, 143–162. Amsterdam: John
Benjamins.
World Health Organization. 2001. International Classification of Functioning, Disability and
Health. Geneva: World Health Organization.
World Health Organization. 2003. How to use the ICF: A practical manual for using the
International Classification of Functioning, Disability and Health (ICF). Geneva: World
Health Organization.
Part I: Pronunciation development and
intelligibility: Implications for teaching
and training studies
Thaïs Cristófaro Silva, Wellington Mendes
Plural formation in English: A Brazilian
Portuguese case study
Abstract: This study examines the role of orthography in the production of plural
forms in English by Brazilian Portuguese (BP) speakers. Two orthographic
patterns were examined for English nouns whose plural is pronounced as a (stop +
sibilant) sequence: [ps, ts, ks, bz, dz, gz]. One of the patterns presents two letters
word-finally – cups, cats, marks – whereas the other one presents a silent <e>
between two consonants: grapes, plates, cakes. The question we posed is
whether these different orthographic patterns would trigger different pronuncia-
tions for Brazilian L2 learners of English. An ongoing sound change involving
[Cs] ~ [Cis] in regular plural forms in BP was also considered. An experiment was
designed to test the production of regular plural forms in English and Brazilian
Portuguese to examine (stop + sibilant) sequences. Results showed that English
learners are more likely to pronounce a vowel when the orthographic pattern is
<Ces> rather than <Cs>. These results are discussed in the light of proposals
which suggest that phonological and orthographic representations are activated
in L2 production (Bassetti 2017; Hamann and Colombo 2017; Rastle et al. 2011).
The role played in L2 English by an ongoing sound change in the L1 is also
addressed. It was shown that [Cs] sequences constitute a robust pattern in Brazilian
Portuguese, which is adopted in L2 English. The [Cs] ~ [Cis] alternation
observed in BP and adopted in L2 English offers evidence that subphonemic
properties are part of phonological representations. The emergence of [z] is a
challenge for BP speakers learning English, as this pattern does not occur in
BP. Finally, some suggestions for the pronunciation teaching of regular plural
nouns in English are presented.
Thaïs Cristófaro Silva, National Council for Scientific and Technological Development,
Research Supporting Foundation of Minas Gerais, Federal University of Minas Gerais
Wellington Mendes, Federal Center for Technological Education of Minas Gerais, Federal
University of Minas Gerais
https://doi.org/10.1515/9783110736120-002
1 Introduction
This paper aims to investigate the pronunciation of regular plural nouns in
English which are produced by Brazilian Portuguese speakers of L2 English
(BP-EL2). The investigation was twofold. Firstly, it considered whether different
orthographic patterns would trigger different pronunciations of plural forms, by
assessing how orthographic and phonological representations can be related.
Secondly, it considered how an ongoing sound change in
Brazilian Portuguese (BP) as a first language (L1) carries over into L2 English. An Exemplar
Model approach is proposed to account for the findings, which also incorporates
the revised Speech Learning Model (SLM-r) (Flege and Bohn 2021). The model to
be presented captures the relationship between speech production, perception
and orthography in L1 and L2 and conceives language as a dynamic system. This
first section reviews the literature on the relationship between orthography and
pronunciation. Then, a review of epenthesis in BP-EL2 production of past and
participle forms as well as 3rd person singular present and regular plural forms is
presented. This motivates the present study and offers insights on new ways of
approaching L2 pronunciation.
Studies on the relationship between orthography and phonology have in-
creased in recent years (Bassetti 2017; Colantoni, Steele, and Escudero 2015;
Hamann and Colombo 2017; Rafat 2015; Zhou 2021). The main research questions
on this topic aim to explain how L2 learners mediate the relationship between
the phonological and orthographic knowledge already acquired in the
L1 in order to build an L2.
The major contribution from works on the relationship between orthogra-
phy and phonology is to model representations as being multimodal, in which
perception, production and orthography interact. In the past, several works ad-
dressed the relationship between orthography and the pronunciation of BP-EL2
speakers. The main concern was the presence of an epenthetic vowel in BP-EL2
which would reflect a letter corresponding to a vowel. A major characteristic of
BP phonology is to insert an epenthetic vowel to prevent illicit consonantal
clusters which are orthographically represented by two contiguous consonantal
letters (Collischonn 2002). This strategy applies in the native lexicon, as in
dogma ['dɔ.gi.mə] or afta ['a.fi.tə], as well as in loanwords, word-initially or
word-finally, as in Skype [is.'kaj.pi] (Gomes 2019), and word-medially, as in
Plural formation in English: A Brazilian Portuguese case study 15
Alveopalatal affricates [tʃ, dʒ] may occur in BP when followed by a high front vowel, reflect-
ing a palatalization process: tia [tia] ~ [tʃia] (aunt), dia [dia] ~ [dʒia] (day).
16 Thaïs Cristófaro Silva, Wellington Mendes
person singular present and regular plural forms in English. BP-EL2 speakers
tend to optionally insert an epenthetic vowel between the two word-final conso-
nants, for example, cakes [keɪks] ~ ['keɪ.kis] (Cristófaro-Silva 2011). Interest-
ingly, works that considered 3rd person singular present and regular plural
forms in English spoken by BP-EL2 speakers addressed voicing agreement
rather than epenthesis. Let us consider works on voicing agreement and then
we will return to the alternation between [Cs] ~ [Cis].
Zanfra (2013) studied the voicing of sibilants in English by BP-EL2 speakers.
Although her focus was not specifically on plural forms, her results throw some
light on the current discussion. First, she considered cases where a word-final [s]
was expected to be pronounced (e.g. house, bus). Her results showed that [s] was
recurrent in words whose orthography ended in the letter <s>, as in bus, as op-
posed to words that presented a silent <e> word-finally, as in house. In the latter
case, higher rates of [z] were attested, suggesting that the silent letter <e> played a
role in the pronunciation of the word-final sibilant. It is worth mentioning that [z]
occurred followed by a vowel: ['haʊzi] house. The voicing of the sibilant, in this
case, is explained by a regressive assimilation rule involving adjacent segments in
BP word boundaries. Only voiceless sibilants occur word-finally followed by a
pause in BP, as in mês [mes] ‘month’. The regressive assimilation rule predicts that
if the next word begins with a vowel or a voiced consonant, then [z] occurs: mês
anterior [mez ə̃.te.ɾi.ˈoɾ] ‘previous month’ and mês bonito [mez bo.ˈni.tʊ] ‘beautiful
month’. If a voiceless consonant follows the sibilant, then [s] occurs: mês passado
[mes pa.ˈsa.dʊ] ‘last month’. Zanfra (2013) tested whether the BP voicing assimila-
tion rule involving adjacent segments in word boundaries would apply in BP-EL2
learners’ productions. Her results showed that sibilants tended to be voiced when
followed by a voiced consonant (e.g. The house backyard is huge) or by a vowel
(e.g. The mouse I saw is white). Conversely, a sibilant was voiceless when the fol-
lowing context was a pause (e.g. I won’t go if he goes.) or a voiceless consonant
(e.g. These pancakes are great). Zanfra (2013) suggested that BP-EL2 speakers
transfer the BP regressive assimilation rule into their L2 English.
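The regressive assimilation rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name final_sibilant and the two segment sets are hypothetical and deliberately simplified, and the rule is applied categorically, as in the rule-based accounts, not as a model of variable production.

```python
# Simplified, hypothetical segment classes (not a full BP inventory).
VOWELS = set("aeiouɐəɛɔɪʊ")
VOICED_CONSONANTS = set("bdgɡvzʒmnɲlɾʎ")


def final_sibilant(next_word_ipa=None):
    """Predict the word-final sibilant of a BP word such as 'mês'.

    next_word_ipa is a broad transcription of the following word,
    or None when the word is followed by a pause.
    """
    if not next_word_ipa:
        return "s"  # only voiceless sibilants occur before a pause
    # Skip stress marks and syllable dots to reach the first segment.
    stripped = next_word_ipa.lstrip("ˈˌ'.")
    if not stripped:
        return "s"
    first = stripped[0]
    if first in VOWELS or first in VOICED_CONSONANTS:
        return "z"  # regressive voicing assimilation
    return "s"      # a voiceless consonant follows


# Examples taken from the text: mês, mês anterior, mês bonito, mês passado.
for context in (None, "ə̃.te.ɾi.ˈoɾ", "bo.ˈni.tʊ", "pa.ˈsa.dʊ"):
    print(context, "->", final_sibilant(context))
```

The sketch reproduces the four cases given in the text: [s] before a pause and before passado, [z] before anterior and bonito.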
Fragozo (2017) investigated the voicing of sibilants in English regular plural
forms and 3rd person singular present forms produced by BP-EL2 speakers. She assessed the
extent to which a sibilant would be manifested as voiced after a voiced conso-
nant, as in dogs or clubs, which would reflect the acquisition of a progressive
assimilation rule from English. The underlying representation for regular plural
and 3rd person singular present is assumed to be /z/ (Hayes 2011). The progres-
sive assimilation rule predicts that if a vowel or a voiced consonant precedes
/z/, the output is [z], as in keys, dogs. If a voiceless consonant precedes /z/, it
surfaces as [s], as in cats. Finally, if a sequence of sibilants occurs, the outcome
is [ɪz], as in kisses. Fragozo (2017) also examined words in context to verify if
corresponds to our main interest, shows that [Cs] alternates with [Cis] word-
finally.2 The last column presents the gloss.
The alternation between [Cs] ~ [Cis] word-finally in BP follows from the reduction
and eventual loss of unstressed high front vowels flanked by a consonant
and a final sibilant (Cristófaro-Silva, Almeida, and Guedri
2008; Leite 2006; Soares 2016). The alternation between the presence and ab-
sence of an unstressed high vowel between a consonant and a sibilant also ap-
plies to BP-EL2 plural forms, as in cakes [keɪks] ~ ['keɪ.kis]. This paper intends
to investigate [Cs] ~ [Cis] in English regular plural forms produced by BP-EL2
speakers attempting to address the question of whether an ongoing sound
change from the L1 plays a role in L2 learning.
The role of orthography in L2 pronunciation will also be addressed in this
paper. BP has only <Ces>3 as the orthographic correlate for [Cs] ~ [Cis], as
shown in Table 1, whereas English has two orthographic
correlates for [Cs]: <Ces> as in grapes and <Cs> as in cats.4
This paper is organized as follows. The next section presents the EMPL2
model (Exemplar Model in L2 Phonology). The third section describes the methodology
adopted in this study. The fourth section presents and discusses the
results and is followed by suggestions for the teaching of English plural forms
to BP-EL2 speakers.
2 In BP, word-final voiceless alveolar fricatives remain voiceless regardless of the alternation
between [Cs] ~ [Cis]. As previously mentioned, only if the next word begins with a vowel or a
voiced consonant will [z] occur (e.g. mês anterior [mez ə̃.te.ɾi.ˈoɾ] ‘previous month’). A question
that arises is whether both the alternation between [Cs] ~ [Cis] and the presence of a following
vowel have an influence on the voicing property of the word-final sibilant in L2 English. This
question will be addressed at the end of our analysis.
3 BP presents words such as cheques (checks) and mangues (wetlands), which display the
<Cues> orthographic pattern. For the sake of clarity, this pattern will be represented in this
paper as <Ces>.
4 A restricted number of plural forms in English present the <Cues> orthographic pattern, e.g.,
tongues and techniques. Due to the limited number of examples, they are not considered in
this paper.
Exemplar Models claim that linguistic representations are shaped from expe-
rience (Bybee 2001, 2008, 2010). Any exemplar which is experienced is mapped
and abstractly represented by phonological and semantic identity and similarity.
In terms of the sounds of a given language, any fine phonetic detail as well as
contextual information is mapped onto abstract representations. Within an Exem-
plar Model approach, aspirated as well as unaspirated stops in English are present
in phonological representations, as well as the contextual information which de-
fines that aspiration of stops occurs in stressed position. Abstract representations
also contain grammatical information which emerges from the categorization of
experienced exemplars: “Lexical organization provides generalizations and seg-
mentation at various degrees of abstraction and generality. Units such as mor-
pheme, segment, or syllable are emergent in the sense that they arise from the
relations of identity and similarity that organize representations” (Bybee 2001: 7).
Within this view, the three plural suffixes for English nouns – [z], [s], [ɪz] –
emerge from language experience offering grammatical generalizations. That
means that any given noun has a plural morpheme associated with it when the
plural is regular. Irregular plurals have a special grammatical representation.
Consider Figure 1.
the SLM-r also share the assumption that perception and production interact in a
dynamic fashion to construct abstract representations. The SLM-r suggests a
three-level model: sensory motor level, a phonetic category level and a lexico-
phonological level (Flege and Bohn 2021: 12), which could be accommodated in
three levels of Exemplar Models: neuromotor production schemas, perceptual-
articulatory categories and constructions. The EMPL2 adds an orthographic level
which is present in literate speakers’ representations. Consider Figure 2.
Figure 2 presents a network that zooms in on the part of Figure 1 covering regular plural
formation with the morpheme [s]. Orthographic representations are
presented inside angle brackets and phonetic representations are presented
without any brackets. All speakers have phonetic representations which are in
fact formed of all the exemplars experienced for the category. Thus, several ex-
perienced instances of cup form the exemplar for this word. For the sake of clar-
ity, this is simplified in Figure 2 to a single exemplar. Only literate speakers
have access to orthographic representations which are connected to their corre-
sponding phonetic category. In the diagram of Figure 2, all the words shown
end in a voiceless consonant, i.e., C-. A generalization that emerges from this
network is that any noun that ends in a voiceless consonant (except sibilants)
will receive the morpheme [s] if it is a regular plural. It can also be inferred from the
network in Figure 2 that a noun in the plural that ends in a final [s] presents a
voiceless consonant word-finally in its singular form, and that the plural forms of
the nouns presented take <s> as the orthographic representation of the plural
regardless of which final consonant appears in the spelling of the noun.
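The generalization about the English regular plural (the suffixes [z], [s] and [ɪz]) can be stated compactly as a selection function. The following Python sketch is only an illustration: the function name plural_allomorph and the segment sets are hypothetical, the voiceless set is limited to a few non-sibilant consonants, and the categorical rule is the textbook description, not the exemplar network proposed here.

```python
# Hypothetical, simplified segment classes for the final sound of the singular.
SIBILANTS = {"s", "z", "ʃ", "ʒ", "tʃ", "dʒ"}
VOICELESS_NON_SIBILANTS = {"p", "t", "k", "f", "θ"}


def plural_allomorph(final_segment):
    """Return the regular plural suffix expected after final_segment."""
    if final_segment in SIBILANTS:
        return "ɪz"  # a sequence of sibilants is broken up, as in kisses
    if final_segment in VOICELESS_NON_SIBILANTS:
        return "s"   # voiceless consonant, as in cats or cups
    return "z"       # vowel or voiced consonant, as in keys or dogs


# Final segments of the singular forms of words mentioned in the text.
examples = {"key": "iː", "dog": "ɡ", "cat": "t", "kiss": "s", "cup": "p"}
for noun, segment in examples.items():
    print(noun, "->", plural_allomorph(segment))
```

In these terms, Cs-nouns are the nouns for which the function returns [s] and Cz-nouns those for which it returns [z].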
Adult L2 learners have primarily written input. As they do not have the L2
network for orthography, they adopt the potential corresponding orthography
from their L1. Besides pronunciation and orthography, the model presented in
Figures 1 and 2 also includes a perceptual level which is connected to the network.
Modelling perception and production within a single model has been proposed
by the Bidirectional Phonetics-Phonology model (Boersma 2011; Boersma
and Hamann 2009). More recently, the model has been extended with a reading grammar
that encompasses orthography and is referred to as the BiPhon Model (Hamann and
Colombo 2017; Zhou 2021). The main difference between the EMPL2 proposal and
the BiPhon Model lies in how abstract representations are related to empirical
data. In the BiPhon Model, simple abstract representations are processed in a
complex manner, whereas in the EMPL2, representations are complex and mapping
is simple (Johnson 1997). We tested the EMPL2 model in plural formation in
English by BP-EL2 speakers, focusing on the distribution of [s, z].5
3 Methodology
This study investigated the production of plural nouns in two languages: L1 Bra-
zilian Portuguese and L2 English. In BP’s regular plural forms, only a voiceless
consonant occurs word-finally. Thus, all nouns in their plural forms will be re-
ferred to as Cs-nouns. In English regular plural forms, either a voiceless or a
voiced sibilant may occur word-finally depending on the final consonant of the
noun in the singular. The data from English will be referred to as Cs-nouns and
Cz-nouns depending on how the sibilant is expected to be pronounced in English.
5 Since our objective is to investigate the production of [Cs] ~ [Cis] sequences, analyzing [ɪz]
would take us beyond this paper as the absence of [i] would trigger two adjacent sibilants.
Cristófaro-Silva, Almeida and Guedri (2008) analyzed adjacent sibilants in BP. Future studies
could consider these cases in L2 English.
A set of 36 plural nouns ending in a (stop + sibilant) sequence was considered
in BP, all of which present a single orthographic pattern: <Ces>, as in cheques
[ʃɛks] ~ ['ʃɛ.kis] ‘cheques’. For the L2 English case study, a set of 36 words was
selected, of which 15 display the orthographic pattern <Ces>, as in grapes
[ɡreɪps], and the other 21 display the orthographic pattern <Cs>, as in
maps [mæps]. This distribution is shown in Table 2.
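Table 2 (cluster types of the target words)
Language               Noun type   Cluster types
Brazilian Portuguese   Cs-nouns    ps, ts, ks, bs, ds, gs
L2 English             Cs-nouns    ps, ts, ks
L2 English             Cz-nouns    bz, dz, gz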
Table 2 shows the distribution of the target words used for the BP and L2 English
experiments. The uppermost part of the table lists BP Cs-nouns divided by
cluster type. As mentioned earlier, all 36 BP nouns display the <Ces> orthographic
pattern. The bottom of the table lists L2 English targets, which comprise
both Cs-nouns (e.g. cups [kʌps]) and Cz-nouns (e.g. bags [bæɡz]). L2
English words have also been divided by their orthographic patterns: words
such as grapes, gates and cakes are spelled with <Ces> word-finally, whereas
words such as cups, cats and books end in <Cs>.
In order to disguise the purpose of the experiment, a set of 72 filler items
were added to the words listed in Table 2 during the trials. Filler items consisted
of singular nouns that did not have a consonant cluster in word-final position,
as in ball [bɔːl] for English and banana [ba.'nɐ̃.nə] ‘banana’ for BP. All filler
items were discarded for the purpose of analysis.6 Stimuli presentation was ran-
domized with the sort_rand macro of Microsoft PowerPoint 2019.
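For readers who wish to reproduce a comparable design outside PowerPoint, the mixing of targets and fillers in random order, and the later exclusion of fillers, can be sketched in Python. This is not the procedure used in the study: the function names and the short word lists are hypothetical and serve only to illustrate the idea.

```python
import random


def build_trial_order(targets, fillers, seed=None):
    """Mix target and filler words and return them in random order."""
    rng = random.Random(seed)  # a fixed seed makes the order reproducible
    trials = [(word, "target") for word in targets]
    trials += [(word, "filler") for word in fillers]
    rng.shuffle(trials)
    return trials


def drop_fillers(trials):
    """Keep only target words, as fillers are discarded before analysis."""
    return [word for word, kind in trials if kind == "target"]


# A handful of words mentioned in the text, used purely as an illustration.
targets = ["grapes", "cakes", "cups", "maps", "bags"]
fillers = ["ball", "arrow", "bee"]

order = build_trial_order(targets, fillers, seed=1)
print([word for word, _ in order])
print(drop_fillers(order))
```

Tagging each item as target or filler at presentation time keeps the later filtering step trivial.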
The experiment comprised two tasks that were performed by all partici-
pants, which took place one after the other. The first one consisted of a picture-
counting task in which participants were asked to count and name the items
shown in the pictures. Short carrier sentences that did not include orthographic
stimuli of the target words were given, as illustrated in Table 3.
6 BP filler items include the following words: abelha (bee), avenida (avenue), bambu (bamboo),
banana (banana), batata (potato), bingo (bingo), bolo (cake), brinquedo (toy), cadeira
(chair), caminho (path), caneta (pen), carteira (wallet), cobra (snake), copo (cup), corvo (crow),
estátua (statue), família (family), festa (party), flecha (arrow), foto (photo), gato (cat), gravata
(tie), lago (lake), lenço (handkerchief), logotipo (logotype), menino (boy), mesa (table), metrô
(metro), mochila (backpack), pizza (pizza), sapo (frog), sapato (shoe), sofá (sofa), tornado (tor-
nado), torta (pie) and vulcão (volcano). English filler items include the following words: arrow,
avenue, bamboo, banana, bee, bingo, boy, country, cowboy, crow, day, eye, family, key, logo,
metro, party, pen, photo, pie, pizza, potato, radio, sky, spa, statue, tie, tissue, tomato, tornado,
toy, tree, volcano, way, window and zoo.
This research has been approved by the ethics committee from the Universidade Federal de
Minas Gerais, reference number: CAAE: 15116119.9.0000.5149.
Reference for Languages were invited to take part in this research. They were not
given any information about the aim of the experiment.
Prior to the experiment, all participants filled out consent and screening
forms, ensuring that their data would be used for scientific purposes only. Due
to the recent COVID-19 pandemic, all interactions were performed remotely
through a video call on Google Meet. Experiments were recorded with the Open
Broadcaster Software Studio at 48 kHz sampling rate. The obtained recordings
were converted into WAVEform audio format by the software Adobe Premiere
2020, which was able to maintain the same sampling rate as the original
files. The average time to complete the experiment was 45 minutes. A total of
648 tokens were collected for the L2 English study. For the BP study, 432 tokens
were collected. Samples were edited and manually annotated using Praat TextGrids
(Boersma and Weenink 2020). R Studio (R Studio Team 2020) was
used for statistical analysis. The chosen test was Pearson’s Chi-square test,
available in base R (function chisq.test), which assesses
the significance of the effect of each variable. The adopted significance threshold
was 0.05, in agreement with general linguistic investigations (Levshina 2015).
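To make the logic of the test concrete, the Pearson Chi-square statistic for a contingency table can be computed by hand, as in the Python sketch below. The counts are invented for illustration and are not data from this study, and the function name chi_square is hypothetical. Note also that R's chisq.test applies a continuity correction to 2 x 2 tables by default, which this sketch omits, so the two would not return identical values for the same table.

```python
def chi_square(table):
    """Return Pearson's chi-square statistic and degrees of freedom.

    table is a list of rows of observed counts (no continuity correction).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


# INVENTED counts: rows are orthographic patterns (<Ces>, <Cs>),
# columns are realizations ([Cs], [Cis]).
observed = [[80, 20],
            [95, 5]]

stat, df = chi_square(observed)
CRITICAL_VALUE_DF1 = 3.841  # chi-square critical value for df = 1, alpha = 0.05
print(round(stat, 3), df, stat > CRITICAL_VALUE_DF1)
```

With these invented counts the statistic exceeds the critical value, so the association between spelling pattern and realization would count as significant at the 0.05 threshold.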
Two main research questions were investigated. The first one is related to
the relationship between phonological and orthographical representations. BP
plural forms whose final orthography is <Ces> were examined as well as regular
plural forms in English whose orthographical representations were either <Ces>
or <Cs>. The hypothesis posited was that the orthographic pattern <Ces> will
present more realizations of a vowel intervening between the final consonants
than the orthographic pattern <Cs>. This would show that the letter <e> favours the
realization of a vowel. We also assessed whether or not visual input influences
the pronunciation of a vowel between the word-final consonants.
The second question addressed the role of an ongoing sound change in-
volving [Cs] ~ [Cis] in L2. The hypothesis posited is that the most common pat-
tern from the L1 will emerge in the L2. This will offer evidence that it is not just
sounds that are transferred from the L1 to the L2, but rather patterns that reflect
subphonemic alternations.
Finally, this research considered the voicing of word-final sibilants.
In BP only voiceless sibilants occur word-finally, unless a vowel follows, in
which case a voiced sibilant occurs. In English, both voiced and voiceless sibilants occur
word-finally. When a vowel follows the sibilant, its voicing remains as it
was (rather than changing as in BP). We posited that word-final voiceless
sibilants will be favoured in L2 English, as this is the more robust pattern in L1.
We also posited that a voiced sibilant occurs at higher rates in intervocalic
position [Cis + vowel].
[Figure 3: Frequency (%) of [Cs] by orthographic pattern. Brazilian Portuguese <Ces>: 62; English <Ces>: 83; English <Cs>: 96.]
Figure 3 shows the rates of [Cs] in regular plural forms in BP and BP-EL2.8 The
leftmost column shows that regular plural forms in BP, whose orthography is
8 For the purpose of the present discussion, we refer to [Cs] as a (consonant + sibilant) sequence.
As will be discussed later, the sibilant may be either voiceless ([Cs]) or voiced ([Cz]). At this stage,
voicing is not relevant.
We acknowledge that a bigger set of data may be required to shed new light on this matter.
consonants is low: 38% in BP and 10.5% in BP-EL2. That means that in most
cases [Cs] occurs in regular plural forms in BP and in BP-EL2. Within an Exem-
plar Model, the [Cs] pattern is more robust than [Cis]. We suggest that the robust-
ness of [Cs] in the L2 comes from the ongoing sound change in the L1, where [Cs]
occurs at higher rates than [Cis]. We claim that an ongoing sound change in the
L1 – which reflects subphonemic information – plays an important role in shaping
L2 linguistic knowledge. In other words, phonetic detail has an impact on L2
phonology. This issue is further explored in the following pages.
A further question we posited concerned the voicing of the
word-final sibilant in [Cs] and [Cis]. This was the main issue considered by Zanfra
(2013) and Fragozo (2017) within a rule-based approach. Their analyses claimed
that voicing in BP-EL2 did not achieve the rates expected in English due to constraints
of the BP distribution of sibilants and regressive assimilation. BP only presents
voiceless sibilants word-finally. However, across word boundaries, BP sibilants are
voiced when followed by a voiced consonant or a vowel: mês [mes] ‘month’, mês
bonito [mez bo.ˈni.tʊ] ‘beautiful month’, mês anterior [mez ə̃.te.ɾi.ˈoɾ] ‘previous
month’. According to Zanfra’s (2013) and Fragozo’s (2017) proposals, the regressive
assimilation rule caused sibilants to be voiced when they were followed by
a voiced consonant or a vowel.
In this paper, we offer an alternative view to the preceding rule-based ap-
proaches. Within the scope of the EMPL2, it is suggested that generalizations
from an ongoing sound change in BP phonology are transferred into BP-EL2,
where phonetic detail plays an important role in shaping mental representa-
tions. General results showed that [Cs] occurred in 62% of cases in BP and in
89.5% of cases in BP-EL2 (Figure 3). We suggest that these results show that
[Cs] is a robust pattern which is adopted in L2 English. Consider Figure 4.
[Figure 4: Rates of word-final voiceless sibilants per phonetic environment in BP and L2 English. Y-axis: Frequency (%). Environments: Cs + C-, Cis + C-, Cs + V, Cis + V (BP, white bars); Cs#, Cs + V, Cis#, Cis + V (L2 English Cs-nouns, grey bars); Cz#, Cz + V, Ciz#, Ciz + V (L2 English Cz-nouns, black bars).]
10 The white bars aggregate BP data of the picture-counting task and the reading task, with a
total of 432 tokens. 216 tokens consist of target words being followed by a voiceless consonant,
whereas the other 216 tokens consist of target words being followed by a vowel. The grey bars
aggregate L2 English data of Cs-nouns and consider both the picture-counting task and the
reading task, with a total of 432 tokens. The black bars aggregate L2 English data of Cz-nouns
and also consider both production tasks, with a total of 432 tokens.
All bars in Figure 4 show the rates for word-final voiceless sibilants where
the alternation between [Cs] ~ [Cis] is observed in regular plural forms. The
white bars illustrate data from BP where sibilants are followed by a voiceless
consonant (1st and 2nd white bars) or by a vowel (3rd and 4th white bars).10 Results
show that a voiceless sibilant always occurs when it is followed by a voiceless
consonant (1st and 2nd white bars). This was somewhat expected as only
voiceless sibilants occur word-finally in BP. The third white bar shows that in
85% of the cases in which [Cs] is followed by a vowel, a voiceless sibilant oc-
curs. However, when [Cis] occurs, voiceless sibilants were produced in 58% of
the cases, as seen in the fourth bar. In BP, it is traditionally assumed that a
voiceless sibilant is produced as voiced when flanked by two vowels. If
this generalization applied to all cases in our data, then the third and fourth
white bars should present 100% of voiced sibilants, which is not the case. In
intervocalic position ([Cis] followed by a vowel), 42% of sibilants are voiced, and in cases
where [Cs] is followed by a vowel, 15% of voiced sibilants occurred (3rd white
bar). What the results presented in the third and fourth white bars show is that
a voiced sibilant may or may not occur in BP when the following environment
is a vowel. Thus, what takes place is not the application of a rule as posited by
Zanfra (2013) or Fragozo (2017), but rather a variable pattern involving the
[Cs] ~ [Cis] alternation.
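Since Figure 4 reports voiceless rates only, the voiced rates quoted above are simply their complements. The short Python sketch below makes this arithmetic explicit for the two BP prevocalic environments, using the percentages given in the text; the variable names are hypothetical.

```python
# Voiceless rates (%) for BP sibilants followed by a vowel, as reported in the text.
voiceless_rate = {"Cs + V": 85, "Cis + V": 58}

# Voiced rate is the complement of the voiceless rate in each environment.
voiced_rate = {env: 100 - pct for env, pct in voiceless_rate.items()}

for env, pct in voiced_rate.items():
    print(env, "voiced:", pct, "%")
```

A categorical voicing rule would predict a voiced rate of 100% in both environments, which is far from the 15% and 42% obtained.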
Within an Exemplar Model, results from BP show that sibilants followed
by a voiceless consonant have a very robust pattern in BP (1st and 2nd
white bars). If generalizations from the [Cs] ~ [Cis] alternation in BP apply to L2 English,
it is expected that a voiceless sibilant occurs word-finally in Cs-nouns
and Cz-nouns. This follows from the fact that voiceless sibilants categorically
occur in word-final position in BP (Cristófaro-Silva 2003). On the other
hand, exemplars for sibilants followed by a vowel may display variability in
L2 as being either voiced or voiceless. This follows from the findings shown
in the third and fourth white bars in Figure 4.
The grey and black bars illustrate results for regular plural forms in English
spoken by BP-EL2 speakers where [Cs] ~ [Cis] is observed. The grey bars illus-
trate results for plural forms which are expected to present a voiceless sibilant
word-finally: Cs-nouns. The black bars illustrate plural forms which are expected
to present a voiced sibilant word-finally: Cz-nouns. An overview of BP-EL2 data
shows that voiceless sibilants occur at high rates in Cs-nouns and Cz-nouns. It
is also observed that Cs-nouns display higher rates of voiceless sibilants than
Cz-nouns. This is expected as BP only presents voiceless sibilants word-finally.
What we have to account for are the cases in which a voiceless sibilant
is expected in Cs-nouns but a voiced one occurs. Similarly, for Cz-nouns, we
have to account for cases in which a voiceless sibilant occurs when a voiced
one is expected. Consider Table 4.
[Table 4: Rates (%) of voiceless [s] and voiced [z] sibilants in Cs-nouns and Cz-nouns, by environment (rows 1 to 4 for Cs-nouns: Cs#, Cs + V, Cis#, Cis + V).]
Table 4 presents the rates of voiceless and voiced sibilants (cf. Figure 4). The
upper part of the table shows results for Cs-nouns and the lower part
shows results for Cz-nouns. Unexpected realizations of the plural morpheme
are presented in the shaded areas of the table.
Our proposal based on the EMPL2 accounts for the high rates of the expected
morpheme [s] (3rd column), as it reflects the more robust pattern in BP. Cases in
which an unexpected voiced sibilant [z] occurred in Cs-nouns tended to present an
adjacent vowel (4%, 10% and 19%), which, as in BP, favours voiced sibilants
(cf. 3rd and 4th white bars in Figure 4). The unexpected voiced sibilants in Cs-nouns
can be accounted for as the adoption of a subphonemic pattern observed in BP.
Cz-nouns present more unexpected voiceless sibilants than voiced ones
(except for the Ciz+V environment, which is discussed below). A high
number of unexpected [s] reflects the BP robust pattern, which is adopted in L2
English. An expected [z] occurs at 14% word-finally, which possibly reflects the
emergence of English phonology, where [z] occurs word-finally. In the other
three environments, higher rates of [z] are observed. Notice that in these contexts
an adjacent vowel occurs. The highest rates of [z] occur in intervocalic position
(40%), which is an environment that favours voiced sibilants in BP. These
results can be understood as reflecting exemplar patterns from the ongoing
sound change involving [Cs] ~ [Cis] in BP being adopted in BP-EL2 phonology.
In general, we can conclude that Cs-nouns present an expected plural
morpheme at higher rates than Cz-nouns. The expected plural morpheme in
Cz-nouns word-finally is low (14%). Although this is an expected pattern in
English, it appears to be challenging to BP learners. An adjacent vowel con-
tributes to the occurrence of a voiced sibilant, especially when the [Cis] pat-
tern is manifested (cf. lines 3 and 4 for Cs and Cz-nouns in Table 4).
Our results throw some light on the line of research carried out by Zanfra (2013)
and Fragozo (2017), who investigated the voicing of sibilants followed by a vowel
within rule-based approaches. We account for the fact that voiceless sibilants have
the highest rates in regular plural forms in BP-EL2, as [s] is the most robust exemplar
in word-final position in BP. We also account for the fact that the pattern [Cis] favours
a voiced sibilant in BP-EL2, as voiced sibilants are favoured in similar contexts
in BP (i.e. when they are followed by a word-initial vowel). This indicates that L1 exemplar
patterns which reflect subphonemic information are adopted in the L2. Finally,
our analysis explains why [z] presents a low rate of production in BP-EL2: it is
an emergent pattern in the L2, since it does not occur word-finally in BP (unless
it is followed by a word-initial vowel, as stated above). Since [z] does not occur
word-finally in both languages, its exemplars are not robust in the L2. It will be
through experience that such exemplars become robust and more recurrent.
Thus, Fragozo’s (2017) interpretation that partial voicing in English prevents [z] from
occurring does not hold. A final word must be said about intelligibility and comprehensibility
(Derwing 2018). It is likely that the facts discussed in this paper do not
affect intelligibility in BP-EL2. For example, the plural forms monks and monkeys in BP-EL2
are likely to present the same alternating forms – [mʌŋks] ~ [ˈmʌŋ.kɪs]11 – which
will possibly be resolved by the context in which they occur. Our main concern
in this paper was rather to consider the role of orthography in L2 phonological
representations (which seem to be influenced by the orthographic patterns
rather than by the visual presentation of the words) and to account for the variable
patterns observed in BP-EL2, which, in our assumption, come from an ongoing
sound change in BP. Further investigations on whether or not intelligibility and
comprehensibility are affected are desirable. The next section considers some suggestions
for teaching the pronunciation of English regular plural nouns to BP
learners.
11 Assuming that all the other segmental content is pronounced accordingly (Cristófaro-Silva
2011).
6 Conclusions
This study examined the role of orthography in the production of English plural
forms by Brazilian Portuguese (BP) speakers. It also considered the role
played by the [Cs] ~ [Cis] ongoing sound change from BP into L2 English. Results
showed that the orthographic pattern <Ces> favours the occurrence of a vowel at higher
rates than the <Cs> pattern. It was also shown that it was not visual access to orthographic
forms that triggered a vowel to occur. We suggest that orthography
has a permanent effect on literate individuals’ mental representations. A bigger
set of data in future investigations may provide further insights on this proposal.
Concerning the role played by the BP ongoing sound change involving the
[Cs] ~ [Cis] alternation, it was shown that it has an impact on the L2. The analy-
sis based on the EMPL2 showed that robust patterns from the L1 are adopted in
L2, including fine phonetic detail that reflects subphonemic properties. The
proposal put forward in this paper offers a more comprehensive analysis than
previous rule-based models, as it explains the different pathways or trends that
BP-EL2 learners use to produce regular plural forms in English: from Cs-nouns
to Cz-nouns word-finally.
This study opens a number of questions that could be addressed in future
studies. All participants were classified as having an intermediate level of profi-
ciency in English. If our proposal is correct, we expect that students at advanced
levels will present similar distributions to those we found, but at lower rates.
This is because they will have had greater exposure to the L2 and therefore will
have more robust exemplars in the foreign language. A similar study could also
be carried out for the 3rd person singular present in English regular verbs, which
presents a similar distribution to the regular plural formation. Cases in which [z]
occurs word-finally in English singular nouns could be considered in order to as-
sess whether or not morphophonological generalizations contribute to improving
phonological knowledge. Other ongoing sound changes could be considered to
evaluate their impact on BP-EL2 phonology. Additionally, it might be worth as-
sessing whether the production of voiced and voiceless sibilants actually affects
intelligibility in BP-EL2. Finally, our recommendations for the teaching of English
plural forms could also be tested in order to verify whether they indeed improve
pronunciation as suggested.
References
Alves, Ubiratã, Susiele Silva, Luciene Brisolara & Ana Paula Engelbert (eds.). 2020. Fonética e
Fonologia de Línguas Estrangeiras: subsídios para o ensino. [Phonetics and phonology of
foreign languages: a support for teaching practices]. Campinas: Pontes Editores.
Bassetti, Bene. 2017. Orthography affects second language speech: Double letters and
geminate production in English. Journal of Experimental Psychology: Learning, Memory,
and Cognition 43(11). 1835–1842.
Boersma, Paul. 2011. A programme for bidirectional phonology and phonetics and their
acquisition and evolution. In Anton Benz and Jason Mattausch (eds.), Bidirectional
Optimality Theory, 33–53. Amsterdam and Philadelphia: John Benjamins Publishing
Company.
Boersma, Paul & Silke Hamann. 2009. Phonology in Perception. Berlin and New York: Walter
de Gruyter.
Boersma, Paul & David Weenink. 2020. Praat: Doing Phonetics by Computer [Computer
program]. Version 6.1.30, retrieved 3 November 2020 from http://www.praat.org.
Bybee, Joan. 1995. Regular morphology and the lexicon. Language and Cognitive Processes
10(5). 425–455. https://doi.org/10.1080/01690969508407111 (accessed 28 May 2021).
Bybee, Joan. 2001. Phonology and Language Use. Cambridge: Cambridge University Press.
Bybee, Joan. 2002. Word frequency and context of use in the lexical diffusion of phonetically
conditioned sound change. Language Variation and Change 14(3). 261–290.
Bybee, Joan. 2008. Usage-based grammar and second language acquisition. In Peter
Robinson & Nick Ellis (eds.), Handbook of Cognitive Linguistics and Second Language
Acquisition, 216–235. New York: Routledge.
Bybee, Joan. 2010. Language, Usage and Cognition. Cambridge: Cambridge University Press.
Colantoni, Laura, Jeffrey Steele & Paola Escudero. 2015. Second Language Speech.
Cambridge: Cambridge University Press.
Collischonn, Gisela. 2002. A epêntese vocálica no português do sul do Brasil [Vowel
epenthesis in the South of Brazil]. In Leda Bisol and Cláudia Brescancini (eds.), Fonologia
e Variação: Recortes do Português Brasileiro [Phonology and Language Variation: Issues
in Brazilian Portuguese], 205–230. Porto Alegre: EDIPUCRS.
Cristófaro-Silva, Thaïs. 2003. Fonética e Fonologia do Português: Roteiro de Estudos e Guia de
Exercícios [Phonetics and Phonology of Brazilian Portuguese: study guide and exercises],
7th edn. São Paulo: Contexto.
Cristófaro-Silva, Thaïs. 2011. Pronúncia do Inglês para Falantes do Português Brasileiro.
[English Pronunciation for Brazilian Speakers]. São Paulo: Contexto.
Cristófaro-Silva, Thaïs, Leonardo Almeida & Cristine Guedri. 2008. Phonological traces in the
loss of a plural marker in Brazilian Portuguese. Estudos Linguísticos [Linguistic Studies]
1(1). 285–299. Lisboa: Edições Colibri/CLUNL.
https://clunl.fcsh.unl.pt/wp-content/uploads/sites/12/2018/02/thais-silva.pdf (accessed 03 July 2021).
Cristófaro-Silva, Thaïs & Daniela Guimarães. 2021. Paper submitted to Seminário de Ciências
da Fala [Speech Sciences Seminar], Federal University of Minas Gerais, 18–19 October.
Delatorre, Fernanda. 2006. Brazilian EFL learners production of vowel epenthesis in words
ending in -ed. Santa Catarina: Federal University of Santa Catarina thesis.
38 Thaïs Cristófaro Silva, Wellington Mendes
Derwing, Tracey. 2018. Putting an accent on the positive: New directions for L2 pronunciation
and instruction. International Symposium on Applied Phonetics, University of Aizu, Japan,
2018, 12–18.
Flege, James Emil. 1995. Second language speech learning: Theory, findings, and problems. In
Winifred Strange (ed.), Speech Perception and Linguistic Experience: Issues in Cross-
Language Research, 233–277. Timonium, MD: York Press.
Flege, James Emil & Ocke-Schwen Bohn. 2021. The revised speech learning model (SLM-r). In
Ratree Wayland (ed.), Second Language Speech Learning: Theoretical and Empirical
Progress, 3–83. Cambridge: Cambridge University Press.
Fragozo, Carina. 2017. Aquisição de regras fonológicas do Inglês por falantes de Português
Brasileiro [Acquisition of phonological rules in English by Brazilian Portuguese
speakers]. São Paulo: University of São Paulo dissertation.
Gomes, Maria Lúcia de Castro. 2009. A produção de palavras do inglês com o morfema ED por
falantes brasileiros: uma visão dinâmica [A dynamic view on the production of English
-ed morphemes by Brazilian speakers]. Curitiba: Federal University of Paraná dissertation.
Gomes, Matheus Freitas. 2019. A redução segmental em sequências #(i)sC no português
brasileiro [Vowel lenition in #(i)sC clusters in Brazilian Portuguese]. Belo Horizonte:
Federal University of Minas Gerais MA thesis.
Hamann, Silke & Ilaria Colombo. 2017. A formal account of the interaction of orthography and
perception. Natural Language and Linguistic Theory 35(3). 683–714.
Harris, John & Jonathan Kaye. 1990. A tale of two cities: London glottalling and New York City
tapping. Berlin and New York: Walter de Gruyter.
https://doi.org/10.1515/tlir.1990.7.3.251 (accessed 28 May 2021).
Hayes, Bruce. 2011. Introductory phonology. Oxford: John Wiley and Sons.
Horta, Bruno, Thaïs Cristófaro-Silva & Victor Soares. 2021. O Ensino de Pronúncia de Inglês
[Teaching English Pronunciation]. To appear in the journal Colineares.
http://natal.uern.br/periodicos/index.php/RCOL.
Johnson, Keith. 1997. Speech perception without speaker normalization: An exemplar model.
In Keith Johnson & John Mullenix (eds.), Talker Variability in Speech Processing, 145–165.
San Diego: Academic Press.
Leite, Camila Tavares. 2006. Seqüências de (oclusiva alveolar+ sibilante alveolar) como um
padrão inovador no português de Belo Horizonte [(alveolar stop + alveolar sibilant)
clusters as an innovative pattern in the Brazilian Portuguese spoken in the city of Belo
Horizonte]. Belo Horizonte: Federal University of Minas Gerais MA thesis.
Levshina, Natalia. 2015. How to do Linguistics with R: Data Exploration and Statistical
Analysis. Amsterdam and Philadelphia: John Benjamins Publishing Company.
Nascimento, Katiene. 2016. Emergência de padrões silábicos no português brasileiro e seus
reflexos no inglês língua estrangeira [Emerging sound patterns in Brazilian Portuguese
and their impact on English as a Foreign Language]. Fortaleza: Universidade Estadual do
Ceará dissertation.
Rafat, Yasaman. 2015. The interaction of acoustic and orthographic input in the acquisition of
Spanish assibilated/fricative rhotics. Applied Psycholinguistics 36(1). 43–66.
Rastle, Kathleen, Samantha McCormick, Linda Bayliss & Colin Davis. 2011. Orthography
influences the perception and production of speech. Journal of Experimental Psychology:
Learning, Memory, and Cognition 37(6). 1588–1594.
RStudio Team. 2020. RStudio: Integrated Development for R. [Computer program]. Retrieved
2 December 2020 from http://www.rstudio.com.
Silveira, Rosane. 2007. O papel desempenhado pelo tipo de tarefa e pela ortografia na
produção de consoantes em final de palavra [The role of task type and orthography on
the production of word final consonants]. Revista de Estudos da Linguagem [Language
studies journal] 15(1). 147–180.
Soares, Victor Hugo Medina. 2016. Encontros consonantais em final de palavra no português
brasileiro [Word-final consonant clusters in Brazilian Portuguese]. Belo Horizonte:
Federal University of Minas Gerais MA thesis.
Zanfra, Mayara. 2013. Phonological context as a trigger of voicing change: a study on the
production of English /s/ and /z/ in word-final position by Brazilians. Florianópolis:
Federal University of Santa Catarina MA dissertation.
Zhou, Chao. 2021. L2 speech learning of European Portuguese /l/ and /ɾ/ by L1-Mandarin
learners: Experimental evidence and theoretical modelling. Lisbon: University of Lisbon
dissertation.
Elena Kkese, Sviatlana Karpava
Effect of task, word length and frequency
on speech perception in L2 English:
Implications for L2 pronunciation teaching
and training
Abstract: This study presents the findings of three perception tasks examining
the relative difficulty encountered by learners of L2 English in perceiving conso-
nants and vowels in high- and low-frequency words. The tasks focused on the
word level and involved a phoneme identification task, a discrimination task,
and a word dictation task. The participants were 130 students at public and pri-
vate universities in Greek-speaking Cyprus, exposed to L2 English as the lan-
guage of instruction. Overall, the findings indicate a task effect. Word length is
also a significant factor for speech perception based on the findings. Moreover,
the results of the study indicated difficulties with word frequency. According to
the item analysis, low-frequency words are more difficult to perceive, especially
with respect to consonants in the word dictation task. This could be attributed to
the acoustic-orthography interface in L2 phonology. Age, gender, as well as years
of L2 instruction and use, are statistically significant factors for speech
perception. The overall trend is in line with the Native Language Magnet Model
(NLM; Kuhl 2000), suggesting that non-native contrasts may be difficult to
discriminate when the prototype of an L1 category closely resembles two L2 phones.
https://doi.org/10.1515/9783110736120-003
(Best 1984) suggesting that listeners extract the invariants of articulatory gestures.
According to this model, children identify and learn to hear high-level articulatory
gestures, which differentiate L1 sound contrasts and facilitate L1 perception. These
L1-specific high-level articulatory gestures are used in new language environ-
ments. Beginner listeners assimilate L2 sounds to L1 sounds, which are perceived
as most similar given that non-native environments lack familiar articulatory ges-
tures. Discrimination is expected to be excellent in the cases when an L2 contrast
is perceptually assimilated to different native categories (two-category assimila-
tion). However, discrimination is expected to be poor when contrasting L2 sounds
are assimilated to the same L1 category (single category assimilation). In the case
of an L2 contrast, in which the one member is assimilated as a good version and
the other as a poor version of a native category (category-goodness assimilation),
the perceptual difficulty depends on the degree of difference in category goodness
between the two L2 phones while discrimination is expected to be moderate to
good. The next type involves cases where one L2 phone is categorised and the
other is not (uncategorised-categorised assimilation) while discrimination is good.
When both L2 phones are uncategorised (uncategorised-uncategorised assimila-
tion), discrimination may be poor or very good depending on the auditory and
phonetic similarities between the L2 phones. Finally, when the two non-native
phones are very different from the articulatory gestures of the L1 phonemes, these
are not perceived as speech sounds (non-assimilable assimilation); discrimination
may be poor to very good depending on the similarity of the sounds.
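The assimilation types and their predicted discrimination levels, as summarised in the paragraph above, can be collected in a small lookup table. The following is only an illustrative sketch: the dictionary name, the function name, and the key spellings are hypothetical conveniences, and the predicted levels are copied from the summary above rather than from any published implementation of the model.

```python
# Predicted discrimination for each assimilation type, as summarised in the text.
# The names below are hypothetical; they are not part of any published tool.
PAM_PREDICTIONS = {
    "two-category": "excellent",
    "single-category": "poor",
    "category-goodness": "moderate to good",
    "uncategorised-categorised": "good",
    "uncategorised-uncategorised": "poor or very good",
    "non-assimilable": "poor to very good",
}


def predicted_discrimination(assimilation_type: str) -> str:
    """Return the discrimination level the text associates with an assimilation type."""
    key = assimilation_type.strip().lower()
    if key not in PAM_PREDICTIONS:
        raise ValueError(f"Unknown assimilation type: {assimilation_type!r}")
    return PAM_PREDICTIONS[key]


if __name__ == "__main__":
    for name in PAM_PREDICTIONS:
        print(f"{name}: {predicted_discrimination(name)}")
```

A table of this kind makes it easy to see that only the single-category case is predicted to be uniformly poor, while the last two types depend on the similarity of the sounds involved.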
In turn, the Speech Learning Model (SLM; Flege 1995, 2002) holds that
the problems in acquiring L2 sounds result from the learners’ tendency to
relate new sounds to the existing positional allophones. This process is called
“equivalence classification”, and because of it, L2 sounds are filtered out by L1
phonology. The model suggests that because “the mechanisms and processes
used in learning L1 sound system remain intact over the life span” (Flege 1995:
239), adults can learn to perceive new L2 properties accurately. Learners
can create new categories provided that they perceive the phonetic
differences between L2 sounds. According to the model, L2 sounds are more
difficult to perceive if they are similar to L1 sounds, whereas L2 sounds that
differ from L1 sounds are easier to perceive. One main difference compared to
NLM is that SLM predicts one common phonological space for both the L1 and
the L2 systems. Recently, Flege and Bohn (2021) have revised the SLM; the
revised Speech Learning Model (SLM-r) is an individual differences model aiming
to account for how phonetic systems reorganise over the life span based on the
phonetic input received during L2 learning.
Therefore, the three above-mentioned perceptual models aim to explain L1
and L2 speech perception and production as well as the connection between
Based on word frequency (Kkese 2016; Kkese and Karpava 2019; Pierrehumbert
2003), high-frequency words can be accessed faster while there will be fewer
problems retrieving these words when information is missing or when there is
noise in the acoustic signal. Low-frequency words, however, cannot be identi-
fied on the basis of fewer perceptual cues and, as a result, cannot be that easily
predicted (Kkese 2016).
The present study aims to investigate the perception of the complete set of
English consonants and vowels by CG listeners of L2 English when these are
found in high- and low-frequency words, according to the NLM theory, taking
extra-linguistic and linguistic factors into consideration. Even though L2 En-
glish is widely used in Greek-speaking Cyprus (Kkese 2016), phonetic research
comparing L1 CG and L2 English is limited, focusing on plosive consonant per-
ception on a word (Kkese 2016, 2020a, 2020b; Kkese and Petinou 2017a, 2017b)
or utterance level (Kkese 2016), as well as consonant and vowel perception on a
word level (Karpava and Kkese 2020; Kkese and Karpava 2019).
To gain an insight into the inventories of consonants and vowels in SBE (Stan-
dard British English) and CG, it is important to briefly describe the differences
between the two systems. Even though SBE and CG share a similar alphabet,
there are many differences in terms of phonology that merit attention. To start
with, CG is a southeastern dialect of Greek, spoken in Greek-speaking Cyprus.
The dialect is closer to Ancient Greek, since it differs from Greek at the
phonological, lexical, and syntactic levels (Petinou and Terzi 2002).
CG has a complicated consonant system consisting of approximately 51 sounds
(Table 1), including voiceless plosives and affricates, voiceless and voiced
fricatives, nasals, and liquids (Arvaniti 1999). Consonants are further distinguished
based on consonant length (Arvaniti 2010; Kkese 2016). The vowel inventory
consists of the five simple vowels /i e a u o/, and there are no
diphthongs (Table 2). Moreover, vowels in CG do not differ in terms of duration
(Lengeris 2009) or a tense-lax or long-short distinction (Arvaniti 2007).
Table 1: CG consonants.
Plosive     p pʰ: b t tʰ: d c cʰ: ɟ k kʰ: g
Affricate   ts ʧ ʧ: ʤ
Fricative   f f: v v: θ θ: ð ð: ʃ ʃ: ʒ ʒ: ç ç: j j: x x: ɣ ɣ: s s: z z:
Nasal       m m: n n: ɲ ŋ
Lateral     l l: ʎ
Tap         ɾ
Trill       r

Table 2: CG vowels.
Front       i e
Central     a
Back        u o

The target English variety investigated in the present study is Standard British
English (SBE), given that this is the variety to which students were exposed in
English phonetics and phonology modules at the specific universities.1 SBE has a
consonant system of only twenty-four sounds (Table 3). Concerning vowels, SBE
has a more complicated vowel system, consisting of at least twenty sounds
(Deterding 2004). Specifically, there are twelve monophthongs
/i: ɪ ɛ æ u: ʊ ɔ: ɒ ɑ: ʌ ɜ: ə/ (Table 4), which are stressed phonemes except for
the unstressed schwa [ə] (Cruttenden 2014). Furthermore, this variety has eight
diphthongs /aɪ eɪ ɔɪ aʊ əʊ ɪə ɛə ʊə/: the five closing diphthongs /aɪ eɪ ɔɪ aʊ əʊ/
and the three centering diphthongs /ɪə ɛə ʊə/ (Cruttenden 2014). Triphthongs are
also present in SBE; these consist of the five closing diphthongs with /ə/ added
at the end, resulting in /aɪə eɪə ɔɪə aʊə əʊə/. Duration differences between the
lax and tense vowels are also important; /ɪ ɛ æ ʊ ɒ ʌ/ are lax vowels while
/i: u: ɔ: ɑ: ɜ:/ are tense.

1 Participants at the specific universities had to attend one English phonetics and
phonology module two times a week (for three hours in total).
Table 3: SBE consonants.
Plosive               p b t d k g
Affricate             ʧ ʤ
Fricative             f v θ ð s z ʃ ʒ h
Nasal                 m n ŋ
Approximant           ɹ j
Lateral approximant   l

Table 4: SBE monophthongs.
Front       i: ɪ e æ
Central     ʌ ə ɜ:
Back        u: ʊ ɔ: ɒ ɑ:

One major difference between SBE and CG involves the consonantal inventories
of the two language varieties. Even though both varieties have a voiceless/
voiced distinction, the consonantal inventory of CG is considerably larger, as
indicated in Table 1, owing to additional consonants as well as the consonant
length distinction, both of which are lacking in SBE. Allophonic differences also
account for some of the differences; even though some consonants are shared by
SBE and CG, such as the plosive consonants /p t k/, which occur across the two
phonetic inventories, CG also includes /pʰ tʰ kʰ/. Whereas in SBE these are
allophonic differences (non-contrastive), in CG these are separate phonemes.
A second major difference between SBE and CG relates to the phonological
make-up of their vowel inventories. Namely, CG has seven fewer monophthongal vowel
categories than SBE, while differences can be observed between the ones that
are present based on vowel transcriptions alone. Whereas the five monophthongs
are orthographically similar in SBE and CG, they differ considerably at the pho-
netic level. This implies that there is not an orthographic-acoustic link between
SBE and CG, as the same grapheme can represent different phonemes in the two
languages. Specifically:
1. the grapheme ‘a’ can be represented with the phoneme /a/ in CG as in [ˈgata]
(cat), [ˈkap:a] (cape), but with /æ ɑ: ə/ in SBE as in [pʰæt] (pat), [pʰɑːt] (part),
[əˈpʰɑːt] (apart);
2. the grapheme ‘i’ can be represented with the phoneme /i/ in CG as in [ˈmiti]
(nose), [miˈsi] (half) but with /i: ɪ/ in SBE as in [ˈli:tə] (litre), [ˈlɪtə] (litter);
3. the grapheme ‘e’ can be represented with the phoneme /e/ in CG as in
[ˈmeres] (days), [ˈslaises] (fetes) but with /ɛ ɜ:/ in SBE as in [end] (end), [ɜːnd]
(earned);
4. the grapheme ‘o’ can be represented with the phoneme /o/ in CG as in [ˈkopos]
(trouble), [ˈponos] (pain) but with /ɔ: ɒ/ in SBE as in [pʰɔːt] (port), [kʰɒt] (cot);
5. the grapheme ‘u’ can be represented with the phoneme /u/ in CG as in
[sɣuˈrus] (curly ones), [ˈkuklus] (dolls) but with /u: ʊ/ in SBE as in [ˈlu:kə]
(lucre), [fʊl] (full).
vowels imposing more vowel variability (Recasens and Espinosa 2006); more
complex L1 vowel inventories, thus, could facilitate listeners’ ability to attend to
cues in a native-like manner when perceiving L2 English vowels (Hacquard, Wal-
ter, and Marantz 2007; Kivistö-de Souza and Carlet 2014). Therefore, the present
study aimed to examine the relative difficulty encountered by CG listeners of L2
English in perceiving consonants and vowels focusing on consonant voicing and
vowel length. Specifically, the following research questions were investigated:
1. Is there a task effect on the consonant and vowel perception in L2 English?
Do learner variables such as age, gender, and years of L2 instruction and
use correlate with the results in the three tasks?
2. What is the effect of word length on the discrimination of L2 English conso-
nantal and vocalic contrasts?
3. What is the effect of word frequency on the perception of L2 English conso-
nant and vowel sounds?
Given that we assume that speech sounds are perceived categorically, vowels
are expected to be more difficult for the L2 listeners of English. One of the au-
thors’ intentions, thus, was to examine L2 speech perception as a function of
the type of task used for discrimination. L2 sounds were further examined
based on word length; words with fewer syllables were expected to be more dif-
ficult for L2 learners due to the lack of suprasegmental information (Kkese
2016). The authors also hypothesised that low-frequency words would be less
efficiently processed compared to high-frequency words since the former may
not be known to most people. This word frequency effect (Monsell, Doyle, and
Haggard 1989) suggests low-frequency words in L2 English may be distin-
guished with more difficulty by the L2 listeners. The findings seem to have sig-
nificant implications for L2 pronunciation teaching and training.
2 Methodology
2.1 Participants
A total of 130 adults with normal hearing and vision participated in this study;
they were selected on the basis of self-reported language background
information, which was obtained via a language background questionnaire. The
questionnaire consisted of seven questions designed to gather general
information about the participants, their first language, and their L2 English
usage and exposure. Based on the participants’ responses,
84 were female and 46 male L1 CG speakers and their mean age was 20, ranging
from 17 to 28 (SD=2.91). They were all undergraduate students attending two pub-
lic and one private universities in Greek-speaking Cyprus, exposed to L2 English
as the language of instruction; 59 participants were attending a public university
and 71 were students at the private university. With regard to L2 English, the par-
ticipants’ mean age of exposure to the language was 9.6, ranging from 0 to 19
(SD=3.42), and the mean number of years of formal instruction in L2 English was
10, ranging from 0 to 28 (SD=2.63). Concerning visits to English-speaking coun-
tries, 60 participants reported positively while 70 reported that they had never
been to an English-speaking country. Most students reported that they use L2 En-
glish in their everyday life (90 participants) while only 35 participants responded
that they do not generally use the language. Finally, in terms of L2 proficiency,
the mean IELTS score obtained was 6.5, ranging from 5 to 9 (SD=1.3),
indicating low intermediate to advanced L2 English proficiency.
For the present study, non-probability convenience sampling was used given
that the participants were attending General English, Academic Writing and/or
Linguistics courses taught by the two researchers in L2 English. The only
participants who were excluded from the sample were students whose L1 was not CG. It
is worth mentioning that participants had no previous knowledge of English
phonetics and phonology at the beginning of the study. Participation was on a
completely voluntary basis and students were assured of the confidentiality
of their personal information. They agreed to take part in the study by signing a
consent form. The participants were divided into five groups (n=26 each) and
were tested on consonant and vowel identification in familiar and non-familiar
real words spoken by native SBE speakers.
2.2 Procedure
All perceptual tasks took place in three quiet computer rooms at the universities
with individual computers and headphones (listening volume was set at 75 dB)
and were always closely monitored by the two researchers. The research period
involved one fall semester while data were collected in different sessions in
which the phoneme identification task was administered first (first session). The
next two sessions (sessions two and three) involved the administration of the dis-
crimination task; during the second session, the task focusing on consonants was
administered while session three involved the administration of the discrimina-
tion task focusing on vowels. The last task administered was the word dictation;
this task, however, was completed in six different sessions, as described below.
The four tasks were pre-recorded using Audacity 1.3 Beta software for recording
and editing sounds. All the speakers were native SBE speakers; for the first two
tasks, namely the phoneme identification and discrimination tasks, the same
female speaker (age 35) was used. For the word dictation task, one female and one
male speaker were used, as the acoustic input was from the online Macmillan
Dictionary.
During the first session, participants had to listen to five different words
that consisted of a minimal set, namely a target word and its four foils; they
were required to circle the word they could hear for a second time. Overall, the
participants had to respond to 20 minimal sets involving 100 words in total. For
sessions two and three, the discrimination tasks focusing on consonants and
vowels involved two-alternative forced-choice tasks in which participants had
to respond to 60 minimal pairs (120 words in total) in each task by circling
whichever of the two words they heard for a second time. In sessions four to nine, the
participants had to listen to the six dictation tasks and record on a given score-
sheet the words they could hear; the task involved 120 words in total while
each session consisted of 20 words, namely ten words involving consonants
and ten involving vowels. For all the perceptual tests, participants had the
printed form in front of them, and they could listen to every task for a second
time, which allowed them to complete any missing information.
2.3 Stimuli
The target sounds were a set of SBE consonants and vowels, which are problematic
for CG listeners of L2 English; these were placed word-initially, -medially or -finally
in real high- and low-frequency words, which had a transparent spelling. All three
tasks focused on the word level and the words were checked against the minimal
pairs for English RP (Received Pronunciation) lists by John Higgins (2008) and the
word list of Francis and Kucera (1982). The decision to examine these sets of pho-
nemes was driven by predictions of L2 speech perception models such as NLM,
PAM, and SLM, which take into account the influence of the L1 inventory when
predicting difficulties in L2 acquisition as well as evidence from previous studies
in L2 speech perception by CG listeners (Karpava and Kkese 2020; Kkese 2016,
2020a, 2020b; Kkese and Karpava 2019; Kkese and Petinou 2017a, 2017b). Partici-
pants had to undertake the three tasks, developed by the researchers, which
aimed to expose them to both L2 consonant and vowel contrasts in different trials.
They had to respond via a circling response mode and/or recording target words
while their answers were scored as correct or incorrect, generating an overall
correct score percentage. In addition, correct score percentages for every
consonant and vowel category were obtained.
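The scoring described above (each answer marked correct or incorrect, then an overall percentage and one percentage per sound category) can be illustrated with a short sketch. This is not the authors' scoring procedure: the function name, the tuple layout, and the example responses are hypothetical, and exact matching after trimming whitespace and lowercasing is an assumption about how a written response might be compared with its target.

```python
from collections import defaultdict


def percent_correct(responses):
    """Compute overall and per-category percentages of correct answers.

    `responses` is an iterable of (category, target_word, answer) tuples.
    An answer counts as correct when it matches the target exactly after
    trimming whitespace and lowercasing (an assumption of this sketch).
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for category, target, answer in responses:
        totals[category] += 1
        if answer.strip().lower() == target.strip().lower():
            correct[category] += 1
    per_category = {c: 100.0 * correct[c] / totals[c] for c in totals}
    n_items = sum(totals.values())
    overall = 100.0 * sum(correct.values()) / n_items if n_items else 0.0
    return overall, per_category


# Hypothetical responses, invented purely for illustration.
example = [
    ("/ð/", "then", "den"),
    ("/ð/", "they", "they"),
    ("/h/", "hat", "hat"),
    ("/h/", "hill", "hill"),
]
overall, per_category = percent_correct(example)
print(overall)        # 75.0
print(per_category)   # {'/ð/': 50.0, '/h/': 100.0}
```

Keeping the per-category totals separate from the overall score mirrors the two levels of reporting used in the results: one figure per task and one figure per consonant or vowel category.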
With reference to the consonants used for this task, these involved /ð/-/d/, /s/-/z/,
/θ/-/ð/, /p/-/f/, /t/-/d/, /m/-/n/, /θ/-/s/, /p/-/b/, /k/-/g/, and /l/-/r/. Vowels
included /ɪ/-/i:/, /e/-/i:/, /æ/-/e/, /ɒ/-/ɔ:/, /u:/-/ʊ/, /ɑ:/-/æ/, /ə/-/ɑ:/, /ɜ:/-/e/,
/ɔ:/-/ɜ:/, and /ʌ/-/ɜ:/. The focus of this task was on minimal pairs, so two dis-
crimination tasks (see Appendices 2a and 2b) were developed to address both
consonants and vowels. Each involved a total of 120 mono- or bi-syllabic words
presented in two fully randomised blocks of 60 minimal pairs. Distractors made
up eight of the minimal pairs; specifically, two to four distractors were used for
every 16 presentations. Consonants served as the distractors of the discrimination
task focusing on vowels, while the distractors were vowels for the minimal pairs
focusing on consonants. The same female native speaker of SBE was used.
The SBE consonants included in the word dictation task were /ð z θ v d ŋ h b g ɹ/;
vowels involved /æ ɜː ɔː i: u: ɑ: e ʌ ə ʊ/. With regard to consonants, these involved
sounds which are problematic to CG speakers, mainly voiced consonants (Kkese
2016); as for vowels, five short and five long vowels were chosen given that in CG
there is no distinction in vowel length (Kkese 2016). The task (see Appendices 3a
and 3b) was made of 120 mono- and bi-syllabic words, out of which 60 involved
consonants and 60, vowels. There were ten conditions for consonant sounds
(6 words each) and ten conditions for vowel sounds (6 words each). The dicta-
tion task was split into six dictation sessions, each consisting of 20 words (ten for
consonants and ten for vowels). The acoustic input for isolated words from the
online Macmillan Dictionary was used, and two speakers were employed, namely
a female and a male native SBE speaker.
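The design arithmetic of the dictation task (twenty sound conditions with six words each, delivered in six sessions of twenty words) can be checked with a short sketch. The allocation rule used here, one word per condition in every session, is an assumption made for illustration; the text states only that each session contained ten consonant words and ten vowel words. All names in the block are hypothetical.

```python
# Target sounds of the word dictation task, as listed in the text.
CONSONANTS = ["ð", "z", "θ", "v", "d", "ŋ", "h", "b", "g", "ɹ"]
VOWELS = ["æ", "ɜː", "ɔː", "iː", "uː", "ɑː", "e", "ʌ", "ə", "ʊ"]
WORDS_PER_CONDITION = 6  # six words per target sound


def build_sessions():
    """Return one list of items per session.

    Each item is (sound_type, target_sound, word_index). Assumption of this
    sketch: session k takes the k-th word of every condition, so the number
    of sessions equals the number of words per condition.
    """
    sessions = []
    for word_index in range(WORDS_PER_CONDITION):
        session = [("consonant", sound, word_index) for sound in CONSONANTS]
        session += [("vowel", sound, word_index) for sound in VOWELS]
        sessions.append(session)
    return sessions


sessions = build_sessions()
total_words = sum(len(session) for session in sessions)
print(len(sessions), total_words)  # 6 120
```

Under this assumed allocation every session is balanced across all twenty target sounds, which is one simple way of arriving at the ten-plus-ten split per session reported above.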
3 Results
3.1 Target perception: Vowels vs. consonants
The data for each task were analysed in terms of the participants’ target
perception of vowels and consonants, which is presented as mean percentages
of the participants’ correct answers. The results of the study indicated
that the students were able to identify the vowel sounds /æ/, /ɜ:/, /e/, /ʌ/, /ʊ/, /e/
better in the phoneme identification task than in the other two tasks. This was not
the case for the sound /u:/, since performance was better in the dictation task;
also, for the sounds /ɔ:/, /ɑ:/, /i:/, performance was better in the discrimination
task, see Table 5. Taking each task into consideration, the most challenging vowels
in the phoneme identification task were the long vowels /ɑ:/ (20%), /u:/ (60.76%)
and /ɔ:/ (61.53%), which share backness.
It should be noted that the dictation task was quite challenging for the
participants, as their performance was below or barely above 50% accuracy for
most of the categories.
The most difficult vowels for perception in the dictation task were the long
vowels /ɜ:/ (21.90%), /ɑ:/ (42.47%) and /i:/ (47.68%); the short vowels /ʌ/
(40.62%), /æ/ (44.17%) and /ʊ/ (44.35%) also caused difficulties for the
participants. The other vowel sounds were perceived slightly better (above 50%): /ɔ:/
(53.81%) and /e/ (52.86%). It should be noted that the students were quite
successful regarding the perception of [ə] (74.62%).
With respect to the discrimination task, the most challenging vowel pairs for
perception (below 50%) were /ɒ ɔ:/ (44.83%) and /u: ʊ/ (47.98%); see Table 5.
[Table 5. Columns: Category; Phoneme identification task; Dictation task; Category; Minimal pair task. Cell values not preserved.]
As for the consonants, the results of the study indicated that the students were
able to identify the consonant sounds /z/, /d/, /b/, /g/, /t/ better in the phoneme
identification task than in the other two tasks. This was not the case for the sounds
/ð d/, /ð θ/, /p f/, /m n/, /θ s/, /l r/, which were perceived better in the discrimina-
tion task, and the sound /h/, which was perceived better in the dictation task.
Looking into each task, the most challenging consonants (below 50% of the
target-like perception) in the phoneme identification task were /ð/ (34.60%), /f/
(35.38%) and /θ/ (43.80%), which are similar in terms of manner of articulation.
The other difficult consonant sounds were /n/ (65.38%) and /s/ (73%), which are
comparable concerning the place of articulation.
The most difficult consonants for perception (below 50% of the target-like
perception) in the dictation task were /ð/ (23.13%) and /θ/ (37%), which are
close in terms of manner and place of articulation, /v/ (42.91%), as well as /ŋ/
(29%) and /g/ (40.07%), these latter sharing the same place and voicing. The
other consonant sounds that caused some difficulties were /d/ (54.44%) and /b/
(56.74%), which are similar with respect to the voicing and manner of articula-
tion, and /z/ (59.38%) and /r/ (70.20%), which are similar regarding voicing. It
should be noted that the students were successful in perceiving /h/ (85.18%).
As for the discrimination task, no consonant pair fell below 50% target-like
perception. Still, some pairs caused difficulties: /ð θ/ (68.62%), /p b/ (70.51%)
and /t d/ (73.44%), which differ in voicing; see Table 6.
[Table 6. Columns: Category; Phoneme identification task; Dictation task; Category; Minimal pair task. Cell values not preserved.]
With reference to non-target perception, several vowels were replaced in the three
perceptual tasks; namely, the substitutions mostly involved the vowels /æ e ɜ:/.
Both in the phoneme identification and dictation tasks, /æ/ was mainly substituted
by /ʌ/, as well as by /ɑː/. Concerning the discrimination task, /e/ was mostly re-
placed by /æ/ since the two vowels are front and unrounded. The vowel sound /ɜ:/
was misperceived as /ɔː/, /ʊ/ and /uː/ in the phoneme identification task, depending
on duration. In the dictation task, /ɜ:/ was replaced by /eə/, /ʌ/, /ɑː/ and /æ/.
In the discrimination task, /e/ was mostly substituted by /ɜ:/.
In the phoneme identification task and dictation task, /ɔ:/ was replaced
by /ɒ/. In the discrimination task, /ɜ:/ was more often substituted by /ɔ:/, and /ɔ:/ was
more often misperceived as /ɒ/. In the dictation task, /ɔ:/ was also replaced by /əʊ/.
In the phoneme identification task and discrimination task, /e/ was replaced
by /iː/; in the dictation task, /e/ was substituted by /eɪ/, /ʌ/ and /æ/.
In all three tasks, /ɑ:/ was misperceived as /æ/. In the dictation task, /ɑ:/
was further substituted by /aʊ/, /aɪ/, /ʌ/, and by /ɒ/, which are central/back
vowels. In the discrimination task, /ʌ/ was misperceived as /ɜ:/.
In the phoneme identification task and the minimal pair task, the sound /u:/
was replaced by /ʊ/. In the dictation task, /u:/ was misperceived as /iː/ and /ɪ/.
In the minimal pair task, the sound /ɪ/ was substituted mostly by /i:/. In the dictation
task, the sound /i:/ was misperceived as its short counterpart /ɪ/ and as /u:/.
In the discrimination task, /ə/ was replaced by /ɑː/ and in the dictation task by /ɪ/
and /aʊ/ (see Table 7).
Table 7: Vowel perception across the three tasks: non-target perception and types of errors.
/iː/ /e/ /ʌ/ /ɑː/ /æ/ /ɒ/ /ɜː/ /ɔː/ /uː/ /ʊ/ /ɪ/ /əʊ/ /aɪ/ /aʊ/ /eə/ No production
P /æ/ % % % % % N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A %
M æ/e % N/A % N/A N/A % N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A
D /æ/ % % % % % % % % % % % % % % % % %
Category Non-target /ɔː/ /uː/ /ʊ/ /ɒ/ /ɜ:/ /ɑː/ /ʌ/ /æ/ /e/ /i:/ /ɪ/ /əʊ/ /aɪ/ /aʊ/ /eə/ No production
P /ɜ:/ % % % % % N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A %
M ɜ:/e % N/A N/A N/A N/A % N/A N/A N/A % N/A N/A N/A N/A N/A N/A N/A
D /ɜ:/ % % % % % % % % % % % % % % % % %
Elena Kkese, Sviatlana Karpava
Category Non-target /ʊ/ /ɒ/ /uː/ /ɜː/ /ɔ:/ /ɑː/ /ʌ/ /æ/ /e/ /i:/ /ɪ/ /əʊ/ /aɪ/ /aʊ/ /eə/ No production
P /ɔ:/ % % % % % N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A %
M ɔ:/ɜ: % N/A N/A N/A % % N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A
D /ɔ:/ % % % % % % % % % % % % % % % % %
Category Non-target /iː/ /ʌ/ /æ/ /ɪ/ /e/ /ɑː/ /ɒ/ /ʊ/ /uː/ /ɔ:/ /eɪ/ /əʊ/ /aɪ/ /aʊ/ /eə/ No production
P /e/ % % % % % N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A %
M e/i: % % N/A N/A N/A % N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A
D /e/ % % % % % % % % % % % % % % % % %
Category Non-target /æ/ /e/ /ʌ/ /iː/ /ɑ:/ /ɒ/ /ɪ/ /ʊ/ /uː/ /ɔ:/ /eɪ/ /əʊ/ /aɪ/ /aʊ/ /eə/ No production
P /ɑ:/ % % % % % N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A %
M ɑ:/æ % % N/A N/A N/A % N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A
D /ɑ:/ % % % % % % % % % % % % % % % % %
Category Non-target /æ/ /ɑː/ /e/ /ɜ:/ /ʌ/ /ɒ/ /ɔ:/ /uː/ /ɪ/ /ɔ:/ /eɪ/ /əʊ/ /aɪ/ /aʊ/ /eə/ No production
P /ʌ/ % % % % % N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A %
M ʌ/ɜ: % N/A N/A N/A % % N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A
D /ʌ/ % % % % % % % % % % % % % % % % %
Category Non-target /ɜː/ /ʊ/ /ɔː/ /ɒ/ /u:/ /ʌ/ /e/ /iː/ /ɪ/ /æ/ /eɪ/ /əʊ/ /aɪ/ /aʊ/ /eə/ No production
P /u:/ % % % % % N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A %
M u:/ʊ % N/A % N/A N/A % N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A
D /u:/ % % % % % % % % % % % % % % % % %
Category Non-target /uː/ /ɜː/ /ɒ/ /ɔː/ /ʊ/ /ʌ/ /e/ /iː/ /ɪ/ /æ/ /eɪ/ /əʊ/ /aɪ/ /aʊ/ /eə/ No production
P /ʊ/ % % % % % N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A %
M u:/ʊ % % N/A N/A N/A % N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A
D /ʊ/ % % % % % % % % % % % % % % % % %
Category Non-target /ʊ/ /ɒ/ /uː/ /ɜː/ /ɔ:/ /ʌ/ /æ/ /e/ /ɪ/ /iː/ /eɪ/ /əʊ/ /aɪ/ /aʊ/ /eə/ No production
P /ɔ:/ % % % % % N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A %
M ɒ/ɔ: % N/A % N/A N/A % N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A
D /ɔ:/ % % % % % % % % % % % % % % % % %
Category Non-target /ʌ/ /æ/ /ɪ/ /e/ /iː/ /ə/ /uː/ /ʊ/ /ɒ/ /ɜː/ /eɪ/ /əʊ/ /aɪ/ /aʊ/ /eə/ No production
P /iː/ N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A
M ɪ/i: % N/A N/A % N/A % N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A
D /i:/ % % % % % % % % % % % % % % % % %
(continued)
Effect of task, word length and frequency on speech perception in L2 English
Table 7 (continued)
/iː/ /e/ /ʌ/ /ɑː/ /æ/ /ɒ/ /ɜː/ /ɔː/ /uː/ /ʊ/ /ɪ/ /əʊ/ /aɪ/ /aʊ/ /eə/ No production
Category Non-target /ʌ/ /ɑː/ /ɪ/ /e/ /ə/ /æ/ /ɜː/ /ɔ:/ /ɒ/ /iː/ /eɪ/ /əʊ/ /aɪ/ /aʊ/ /eə/ No production
P /ə/ N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A
M ə/ɑ: % N/A % N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A
D /ə/ % % % % % % % % % % % % % % % % %
✶ P = Phoneme identification task; M = Discrimination task; D = Dictation task; N/A = Not available
The analysis of the non-target perception revealed that in the phoneme identification
task and in the dictation task, students misperceived /ð/, /θ/ and /t/ as /d/. In
the discrimination task, /d/ was more often substituted by /ð/. In addition, in the
dictation task, /ð/ was also perceived as /w/, /v/, /l/, and /j/.
In the phoneme identification task, /z/ was misperceived as /dʒ/; in the dis-
crimination task, /z/ was more commonly replaced by its voiceless counterpart /s/.
In the dictation task, /z/ was perceived as /s/, /d/ and /dʒ/, which could be due to
the similarity in voicing and manner of articulation. In the phoneme identification
task, /θ/ was misperceived as /d/ and /v/; in the discrimination task, /θ/ was more
commonly perceived as /ð/, while in the dictation task, the students perceived /θ/
as /b/, /f/, /t/, /d/, /k/, /p/, /v/ and /l/.
In the phoneme identification task, /f/ was misperceived as /v/, /j/ and /w/.
In the discrimination task, /f/ was substituted by /p/. In the phoneme identifica-
tion task, /d/ was replaced by /b/, /p/ and /t/. In the discrimination task, /d/ was
substituted by /t/. In the dictation task, /d/ was misperceived as /t/, /b/ and /k/.
In the phoneme identification task, /n/ was identified as /m/, /ŋ/, /r/ and /l/.
In the discrimination task, /n/ was replaced by /m/. In the phoneme identification
task, /s/ was changed into /z/ and /ʃ/; in the discrimination task, /θ/ was per-
ceived as /s/, which is mainly due to the matching in voicing and manner of
articulation.
In the phoneme identification task, /b/ was substituted by /p/; in the dis-
crimination task, /p/ was replaced mostly by /b/; in the dictation task, /b/ was
substituted by /p/, /d/, /k/, /t/ and /l/. In the phoneme identification task, /g/
was substituted by /p/; in the discrimination task, /g/ was not differentiated
from /k/. In the dictation task, /g/ was misperceived as /t/, /d/, /k/, and /r/,
based on voicing, manner and place of articulation.
In the phoneme identification task, /t/ was substituted by /n/, /m/, and /ŋ/.
In the discrimination task, /t/ was misperceived as /d/, as they are congruent in
manner and place of articulation. In the discrimination task, /r/ was more often
misperceived as /l/. In the dictation task, /r/ was substituted by /l/, /b/, /k/ and /p/. In
the dictation task, /v/ was represented by /t/, /w/, /l/, /f/, /b/, /n/, /s/, and /ð/; in
turn, /ŋ/ was replaced by /n/, /m/, and /ʃ/ (see Table 8).
Table 8: Consonant perception across the three tasks: non-target perception and types of
errors.
P /ð/ % % % % % N/A N/A N/A N/A N/A
M ð/d % N/A % N/A N/A % N/A N/A N/A N/A
D /ð/ % % % % % % % % % %
Category Non-target /dʒ/ /s/ /ʃ/ /tʃ/ /z/ /ð/ /d/ /b/ /v/
Category Non-target /v/ /d/ /w/ /h/ /θ/ /ð/ /b/ /f/ /t/
Category Non-target /v/ /w/ /h/ /j/ /θ/ /ð/ /f/ /p/ /t/
Category Non-target /g/ /p/ /b/ /t/ /d/ /ð/ /f/ /ʃ/ /r/
Category Non-target /m/ /ŋ/ /r/ /l/ /n/ /ð/ /f/ /p/ /r/
Category Non-target /z/ /ʃ/ /dʒ/ /tʃ/ /s/ /θ/ /f/ /p/ /r/
Category Non-target /p/ /d/ /k/ /g/ /b/ /θ/ /f/ /ʃ/ /t/
Table 8:
/r/ /k/ /l/ /m/ /n/ /z/ /p/ /s/ /dʒ/ /g/ /j/ No production
N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A %
N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A
% % % % % % % % N/A N/A % %
/r/ /k/ /l/ /m/ /n/ /w/ /p/ /t/ /dʒ/ /g/ /j/ No production
N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A %
N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A
N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A %
/k/ /z/ /p/ /m/ /n/ /z/ /p/ /s/ /dʒ/ /g/ /j/ No production
N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A %
N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A
% % % N/A N/A N/A N/A N/A N/A N/A N/A %
/k/ /z/ /ʃ/ /m/ /n/ /z/ /p/ /s/ /dʒ/ /g/ /j/ No production
N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A %
N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A
N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A
/v/ /w/ /k/ /n/ /z/ /dʒ/ /θ/ /s/ /m/ /g/ /j/ No production
N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A %
N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A
% % % % % % % % N/A N/A N/A %
/v/ /w/ /k/ /n/ /z/ /dʒ/ /θ/ /s/ /m/ /g/ /j/ No production
N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A %
N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A
N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A
/v/ /w/ /k/ /n/ /z/ /dʒ/ /θ/ /s/ /m/ /g/ /j/ No production
N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A %
N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A
N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A
/r/ /v/ /w/ /l/ /m/ /n/ /s/ /θ/ /dʒ/ /g/ /j/ No production
N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A %
N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A
% % % % % % % N/A N/A N/A N/A %
Table 8 (continued)
Category Non-target /t/ /d/ /p/ /b/ /k/ /g/ /f/ /θ/ /t/
Category Non-target /m/ /n/ /ŋ/ /b/ /t/ /d/ /f/ /θ/ /t/
Category Non-target /l/ /r/ /m/ /n/ /ŋ/ /d/ /f/ /θ/ /b/
P N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A
M l/r % % % N/A N/A N/A N/A N/A N/A N/A
D /r/ % % N/A % % N/A % % N/A %
Category Non-target /b/ /f/ /t/ /r/ /h/ /w/ /k/ /l/ /m/
P /v/ % % % % % % % % % %
M /ŋ/ % % N/A % N/A N/A N/A N/A % %
D /h/ % % N/A N/A % % N/A N/A N/A N/A
✶ P = Phoneme identification task; M = Discrimination task; D = Dictation task; N/A = Not available
Table 8 (continued)
/r/ /k/ /l/ /m/ /n/ /z/ /p/ /s/ /dʒ/ /g/ /j/ No production
/r/ /v/ /w/ /l/ /m/ /n/ /s/ /θ/ /dʒ/ /g/ /j/ No production
N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A %
N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A
% N/A N/A % % % N/A N/A N/A N/A N/A %
/r/ /v/ /w/ /l/ /ʃ/ /p/ /s/ /θ/ /dʒ/ /g/ /j/ No production
N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A %
N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A
N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A
/k/ /p/ /g/ /ð/ /ʃ/ /p/ /s/ /θ/ /dʒ/ /g/ /j/ No production
N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A
N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A
% % % % N/A N/A N/A N/A N/A N/A N/A %
/n/ /p/ /θ/ /s/ /ð/ /d/ /v/ /k/ /dʒ/ /g/ /ʃ/ No production
[Figure: vowels vs. consonants, percentage scale 0–100%, by task: phoneme identification task, minimal pair task, dictation task.]
According to paired-samples t-tests (run in IBM SPSS Statistics 25), the difference
between target vowel and consonant perception is statistically significant, with a
large effect size, in all three tasks: the phoneme identification task
(t(129) = −3.293, p = .001✶✶, d = 0.826✶✶), the discrimination task
(t(129) = −12.366, p < .001✶✶, d = 0.937✶✶), and the dictation task
(t(129) = −9.958, p < .001✶✶, d = 1.190✶✶).
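A paired-samples comparison of this kind can be sketched in a few lines of Python. The sketch below is not the authors' SPSS procedure: the scores are hypothetical, the function name is an assumption, and the effect size computed here is Cohen's d_z (mean difference divided by the standard deviation of the differences), which may differ from the formula behind the d values reported in the chapter, since the chapter does not state which one was used.

```python
import math
import statistics


def paired_t(x, y):
    """Paired-samples t statistic, degrees of freedom, and Cohen's d_z."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD (n - 1 denominator)
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean_diff / (sd_diff / math.sqrt(n))
    d_z = mean_diff / sd_diff
    return t, n - 1, d_z


# Hypothetical per-listener accuracy (%), NOT data from the study.
vowel_scores = [62.0, 70.0, 55.0, 68.0, 74.0, 60.0]
consonant_scores = [71.0, 78.0, 66.0, 70.0, 85.0, 69.0]

t, df, d_z = paired_t(vowel_scores, consonant_scores)
print(f"t({df}) = {t:.3f}, d_z = {d_z:.3f}")
```

With vowels entered first, a negative t indicates that the consonant scores are higher on average, which is the direction of the t values reported above. The p-value is left out because it requires the t distribution, which the standard library does not provide.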
The findings of this study showed that the word length effect depends on the
task and the type of the sound. In the phoneme identification and discrimina-
tion tasks, two-syllable words elicited more target perception of the vowel and
consonant sounds. In the dictation task, one-syllable words elicited more target
perception of the vowel sounds, whereas two-syllable words elicited more tar-
get perception of the consonant sounds and overall.
According to a one-way ANOVA, there is no statistically significant difference
among the three tasks regarding vowel perception in one-syllable words
(F(2, 128) = 1.525, p = .121) or consonant perception in two-syllable words
(F(2, 128) = 1.389, p = .177), but there is a statistically significant difference
the Perceptual Assimilation Model (PAM and PAM-2: Best 1993, 1994, 1995; Best
and Tyler 2007), which follows an ecological approach to speech perception
(Best 1984), articulatory gestures that are used by L1 learners for perception are
also used by L2 learners for L2 discrimination; this suggests that L2 sounds are
assimilated to different/single native categories, depending on how similar the
sounds are. According to the Speech Learning Model (SLM: Flege 1995, 2002),
the process of L2 perception is constrained by L1 phonology since there is one
common phonological space for both L1 and L2 systems. L2 learners compare
new sounds and the L1 positional allophones; therefore, L2 perception is easier
if L1 and L2 sounds are different.
Our findings support the NLM theory as L2 learners seem to have difficulty
with the sounds that are similar in both the L1 and the L2, at least in terms of some
of their acoustic cues. Most of the vowel sounds that received high perception
scores in L2 English (/æ/, /ɜ:/, /ʌ/, /u:/, /ɔ:/, /ɑ:/, /i:/) differ from the L1 CG
vowels in terms of vowel length, highness, frontness, and roundedness.
This partially supports the Native Language Magnet theory (NLM and NLM-e:
Kuhl 1993; Kuhl et al. 2008) as the students found it easier to perceive the sounds
that are different from their L1 sounds or those that are not present in their L1
sound system. In terms of consonants, the L1 CG consonant sound inventory is
richer than L2 English. The results showed that the students had higher percep-
tion scores for the consonant sounds that are similar in both languages: /z/, /d/,
/b/, /g/, /t/, /ð d/, /ð θ/, /p f/, /m n/, /θ s/, /l r/, /h/. This could support the Per-
ceptual Assimilation Model (PAM and PAM-2: Best 1993, 1994, 1995; Best and
Tyler 2007) as well as the Speech Learning Model (SLM: Flege 1995, 2002) in case
the participants were more advanced in the L2.
Regarding the first research question, there is an effect of the task on the
vowel and consonant perception. The task effect may be due to the peculiarities
of each task, as in the phoneme identification and discrimination tasks, the stu-
dents had to listen to almost identical words and choose the target sound that
was repeated; what differed between the two tasks was the number of words in-
volved, since in the phoneme identification task, minimal sets of five words were
involved, while the discrimination task included only two words. However, in
the dictation task, there was a link between oral and written form, as the learners
had to listen to the aural input, decode it and then encode it by writing a relevant
word that is in line with the orthographic rules of the English language. L2 learn-
ers seemed to have difficulty in perceiving non-native sounds, mapping the
speech signal to meaning, decoding and encoding. According to the NLM theory,
listeners seem to be better at discriminating between- as opposed to within-
category contrasts for both consonants and vowels (Kuhl 1993).
The dictation task was found to be more difficult for the students than the
phoneme identification and discrimination tasks. Vowel perception was better
in the phoneme identification task, while consonant perception rates seemed to
be higher in the minimal pair and the dictation tasks. It was found that age,
gender, years of studying L2 English, visits to English-speaking countries as
well as contact with English people are significant factors that affect vowel and
consonant perception. The importance of these extra-linguistic factors has been
the focus of previous studies: age (Hurford 1991; Lenneberg 1967; Long 1990;
Patkowski 1990; Scovel 1969; Walsh and Diller 1981), gender (Moyer 2016; Oh
2011), years of studying L2 English (Best and Tyler 2007), visits to English-
speaking countries (Schumann 1978), and reported use in the L2 (Johnson and
Krug 1980; Krashen et al. 1978; Schumann 1978). Based on these studies, when
language learners start learning the L2 early and are exposed to enough com-
prehensible input, they are more successful.
The non-target perception of L2 English by CG speakers can be explained
by the differences in the sound systems and grapheme-phoneme correspond-
ences between L1 and L2, which is in line with previous studies (Flege and Way-
land 2019; Karpava and Kkese 2020; Kkese and Karpava 2019; Wang and Chen
2019). There are acoustic and functional differences between vowels and conso-
nants (Bonatti et al. 2005), as vowels tend to cause more difficulties to L2 listen-
ers (Pereira 2014). This depends on the L1 vowel inventory, how rich it is, and
whether L2 learners have a cue to process L2 sounds in a non-native language
(Hacquard, Walter, and Marantz 2007; Kivistö-de Souza and Carlet 2014). The
vowel system of English is more complex than that of CG; given that sounds are
perceived categorically, this can explain why vowels are more difficult for
L2 perception.
As for the second research question, the results of the three tasks may sug-
gest that some one-syllable words are more difficult than two-syllable words for
L2 perception, as they seem to lack information on primary stress. On the other
hand, polysyllabic words can be affected by different parameters including
loudness, length, pitch, and quality (Goldsmith 1990; Roach 2009). Moreover,
polysyllabic words adhere to various patterns of stress placement; a disyllabic
adjective such as lovely [ˈlʌv.li] and its three-syllable noun counterpart
loveliness [ˈlʌv.li.nəs] are both stressed on the first syllable, while a
disyllabic verb such as confuse [kənˈfjuːz] is stressed on the second syllable.
Finally, with regard to the third research question, low-frequency words
were misperceived more in the dictation task. This is in line with Monsell,
Doyle, and Haggard (1989), who suggest that low-frequency words are misper-
ceived more in L2 English compared to high-frequency words due to the lack of
the familiarity effect. Based on the familiarity effect, words that are frequent
are acquired early and are more likely to be known compared to less frequent
words. This implies that high-frequency words need less time to be processed than
low-frequency words. However, this was not the case in the phoneme identification
and the discrimination tasks, as high-frequency words involving vowels
as the target sounds were misperceived more.
5 Pedagogical implications
The current study aimed to examine the L2 perception of vowel and consonant
sounds by adult L1 CG listeners, who ranged from low-intermediate to advanced
L2 proficiency. L2 learners tend to adjust the target L2 sounds to
the existing L1 cues, as revealed by the results of the present study. However,
L2 perceptual ability may be influenced by further factors such as
word length, lexical frequency, and type of task, as well as other individual dif-
ferences such as the age of acquisition, exposure to L2 learning, L2 proficiency,
and living in an L2-speaking country. Taken together, these factors may affect
perception, pointing to the need for pronunciation instruction experience and
specifically the need to incorporate bottom-up processing activities to support
L2 listening in the L2 classroom context. Bottom-up processing allows the L2
learners, especially at the early stages of L2 acquisition, to segment the speech
stream into meaningful units. Even though further studies are needed to test
the generalisability of the findings to different L2 learners (i.e., of different L2
proficiency and age profiles), pronunciation-focused teaching could considerably
help L2 learners of English, both in the investigated context and in L2 settings
more generally, to master different acoustic-orthographic dimensions of L2 English at
both controlled and spontaneous speech levels (Saito 2015). Nonetheless, in the
educational system of Greek-speaking Cyprus, L2 learners of English do not
receive any extensive pronunciation training.
The focus of the current study has been speech perception in L2 English; speech
production was therefore not examined. Further studies could
focus on the impact of instruction on the perception and production abilities of L1
CG speakers of L2 English, in an effort to help L2 English instructors understand
what makes the perception-production link difficult and thus help L2 listeners
overcome these difficulties (Kkese 2016). The current study also points to the
need for examining pronunciation instruction integrated into meaning-oriented
instruction contexts (Lee and Lyster 2016). Given that the study presented in this
chapter involved perception tasks conducted in an ‘isolated’ setting without any
sentential/communicative context, it would be very interesting to investigate how
pronunciation instruction integrated into meaning-oriented instruction contexts
Consonants
I. Condition [ð]
there /ðeə(r)/ [ð] (high frequency, initial position, syllable, male voice)
thy /ðaɪ/ [ð] (low frequency, initial position, syllable, male voice)
southern /ˈsʌðən/ [ð] (high frequency, middle position, syllables, female voice)
heather /ˈheðə(r)/ [ð] (low frequency, middle position, syllables, male voice)
smooth /smu:ð/ [ð] (high frequency, final position, syllable, female voice)
lathe /leɪð/ [ð] (low frequency, final position, syllable, female voice)
II. Condition [z]
zap /zæp/ [z] (low frequency, initial position, syllable, male voice)
zebra /ˈzebrə/ [z] (high frequency, initial position, syllables, male voice)
muzzle /ˈmʌz(ə)l/ [z] (low frequency, middle position, syllable, male voice)
puzzle /ˈpʌz(ə)l/ [z] (high frequency, middle position, syllable, female voice)
demise /dɪˈmaɪz/ [z] (low frequency, final position, syllables, female voice)
confuse /kənˈfjuːz/ [z] (high frequency, final position, syllables, female voice)
III. Condition [θ]
thick /θɪk/ [θ] (high frequency, initial position, syllable, male voice)
Thursday /ˈθɜː(r)zdeɪ/ [θ] (high frequency, initial position, syllables, male voice)
ether /ˈiːθə(r)/ [θ] (low frequency, middle position, syllables, male voice)
anthem /ˈænθəm/ [θ] (low frequency, middle position, syllables, female voice)
wreath /riːθ/ [θ] (low frequency, final position, syllable, male voice)
depth /depθ/ [θ] (high frequency, final position, syllable, female voice)
IV. Condition [v]
vent /vent/ [v] (low frequency, initial position, syllable, female voice)
vault /vɔːlt/ [v] (low frequency, initial position, syllable, male voice)
beaver /ˈbiːvə(r)/ [v] (low frequency, middle position, syllables, male voice)
cover /ˈkʌvə(r)/ [v] (high frequency, middle position, syllables, female voice)
behave /bɪˈheɪv/ [v] (high frequency, final position, syllables, male voice)
give /ɡɪv/ [v] (high frequency, final position, syllable, male voice)
V. Condition [d]
dough /dəʊ/ [d] (low frequency, initial position, syllable, male voice)
doctor /ˈdɒktə(r)/ [d] (high frequency, initial position, syllables, female voice)
udder /ˈʌdə(r)/ [d] (low frequency, middle position, syllables, female voice)
fodder /ˈfɒdə(r)/ [d] (low frequency, middle position, syllables, male voice)
sad /sæd/ [d] (high frequency, final position, syllable, male voice)
red /red/ [d] (high frequency, final position, syllable, female voice)
VI. Condition [ŋ]
dung /dʌŋ/ [ŋ] (low frequency, final position, syllable, male voice)
sing /sɪŋ/ [ŋ] (high frequency, final position, syllable, female voice)
cunning /ˈkʌnɪŋ/ [ŋ] (low frequency, final position, syllables, female voice)
finger /ˈfɪŋɡə(r)/ [ŋ] (high frequency, middle position, syllables, male voice)
juncture /ˈdʒʌŋktʃə(r)/ [ŋ] (low frequency, middle position, syllables, female voice)
tongue /tʌŋ/ [ŋ] (high frequency, middle position, syllable, male voice)
VII. Condition [h]
hour /ˈaʊə(r)/ [h] (high frequency, initial position, syllable, female voice)
heir /eə(r)/ [h] (low frequency, initial position, syllable, male voice)
whelp /welp/ [h] (low frequency, middle position, syllable, male voice)
vehicle /ˈviːəkl/ [h] (high frequency, middle position, syllables, female voice)
downright /ˈdaʊnˌraɪt/ [h] (low frequency, final position, syllables, male voice)
although /ɔːlˈðəʊ/ [h] (high frequency, final position, syllables, female voice)
VIII. Condition [b]
blunder /ˈblʌndə(r)/ [b] (low frequency, initial position, syllables, male voice)
blatant /ˈbleɪt(ə)nt/ [b] (low frequency, initial position, syllables, male voice)
debunk /diːˈbʌŋk/ [b] (low frequency, middle position, syllables, female voice)
table /ˈteɪb(ə)l/ [b] (high frequency, middle position, syllable, male voice)
pub /pʌb/ [b] (high frequency, final position, syllable, female voice)
club /klʌb/ [b] (high frequency, final position, syllable, female voice)
IX. Condition [g]
gruff /ɡrʌf/ [g] (low frequency, initial position, syllable, male voice)
gaunt /ɡɔːnt/ [g] (low frequency, initial position, syllable, male voice)
cognate /ˈkɒɡneɪt/ [g] (low frequency, middle position, syllables, female voice)
angry /ˈæŋɡri/ [g] (high frequency, middle position, syllables, female voice)
frog /frɒɡ/ [g] (high frequency, final position, syllable, male voice)
colleague /ˈkɒliːɡ/ [g] (high frequency, final position, syllables, female voice)
X. Condition [ɹ]
rankle /ˈræŋk(ə)l/ [r] (low frequency, initial position, syllable, female voice)
ribald /ˈrɪb(ə)ld/ [r] (low frequency, initial position, syllables, male voice)
firm /fɜː(r)m/ [r] (high frequency, middle position, syllable, male voice)
corn /kɔː(r)n/ [r] (high frequency, middle position, syllable, female voice)
bicker /ˈbɪkə(r)/ [r] (low frequency, final position, syllables, male voice)
colour /ˈkʌlə(r)/ [r] (low frequency, final position, syllables, female voice)
Vowels
I. Condition [æ]
ant /ænt/ [æ] (high frequency, initial position, syllable, female voice)
amber /ˈæmbə(r)/ [æ] (low frequency, initial position, syllables, female voice)
barren /ˈbærən/ [æ] (low frequency, middle position, syllables, male voice)
clamour /ˈklæmə(r)/ [æ] (low frequency, middle position, syllables, male voice)
add /æd/ [æ] (high frequency, initial position, syllable, male voice)
ankle /ˈæŋk(ə)l/ [æ] (high frequency, initial position, syllable, female voice)
urge /ɜː(r)dʒ/ [ɜː] (high frequency, initial position, syllable, female voice)
earn /ɜː(r)n/ [ɜː] (high frequency, initial position, syllable, male voice)
culvert /ˈkʌlvə(r)t/ [ɜː] (low frequency, middle position, syllables, male voice)
immerse /ɪˈmɜː(r)s/ [ɜː] (low frequency, middle position, syllables, female voice)
aver /əˈvɜː(r)/ [ɜː] (low frequency, final position, syllables, female voice)
stir /stɜː(r)/ [ɜː] (high frequency, final position, syllable, male voice)
oar /ɔː(r)/ [ɔː] (high frequency, initial position, syllable, female voice)
almost /ˈɔːlməʊst/ [ɔː] (high frequency, initial position, syllables, female voice)
adorn /əˈdɔː(r)n/ [ɔː] (low frequency, middle position, syllables, male voice)
appal /əˈpɔːl/ [ɔː] (low frequency, middle position, syllables, female voice)
roar /rɔː(r)/ [ɔː] (high frequency, final position, syllable, male voice)
sore /sɔː(r)/ [ɔː] (high frequency, final position, syllable, female voice)
V. Condition [u:]
use /juːz/ [u:] (high frequency, initial position, syllable, female voice)
union /ˈjuːnjən/ [u:] (high frequency, initial position, syllables, female voice)
traduce /trəˈdjuːs/ [u:] (low frequency, middle position, syllables, male voice)
extrude /ɪkˈstruːd/ [u:] (low frequency, middle position, syllables, male voice)
crew /kruː/ [u:] (high frequency, final position, syllable, female voice)
lieu /luː/ [u:] (low frequency, final position, syllable, male voice)
VI. Condition [ɑː]
arm /ɑː(r)m/ [a:] (high frequency, initial position, syllable, female voice)
arch /ɑː(r)tʃ/ [a:] (high frequency, initial position, syllable, female voice)
alarm /əˈlɑː(r)m/ [a:] (high frequency, middle position, syllables, male voice)
ghastly /ˈɡɑːs(t)li/ [a:] (low frequency, middle position, syllables, male voice)
ajar /əˈdʒɑː(r)/ [a:] (low frequency, final position, syllables, male voice)
spar /spɑː(r)/ [a:] (low frequency, final position, syllable, female voice)
VII. Condition [e]
egg /eɡ/ [e] (high frequency, initial position, syllable, female voice)
end /end/ [e] (high frequency, initial position, syllable, male voice)
beget /bɪˈɡet/ [e] (low frequency, middle position, syllables, male voice)
inept /ɪˈnept/ [e] (low frequency, middle position, syllables, female voice)
stench /stentʃ/ [e] (low frequency, middle position, syllable, male voice)
entry /ˈentri/ [e] (high frequency, initial position, syllables, male voice)
VIII. Condition [ʌ]
utter /ˈʌtə(r)/ [ʌ] (low frequency, initial position, syllables, female voice)
utmost /ˈʌtməʊst/ [ʌ] (low frequency, initial position, syllables, female voice)
blood /blʌd/ [ʌ] (high frequency, middle position, syllable, male voice)
flood /flʌd/ [ʌ] (high frequency, middle position, syllable, male voice)
sunder /ˈsʌndə(r)/ [ʌ] (low frequency, middle position, syllables, male voice)
other /ˈʌðə(r)/ [ʌ] (high frequency, initial position, syllables, female voice)
IX. Condition [ə]
alone /əˈləʊn/ [ə] (high frequency, initial position, syllables, female voice)
again /əˈɡen/ [ə] (high frequency, initial position, syllables, male voice)
harangue /həˈræŋ/ [ə] (low frequency, middle position, syllables, male voice)
raiment /ˈreɪmənt/ [ə] (low frequency, middle position, syllables, female voice)
comma /ˈkɒmə/ [ə] (high frequency, final position, syllables, female voice)
swagger /ˈswæɡə(r)/ [ə] (low frequency, final position, syllables, male voice)
X. Condition [ʊ]
book /bʊk/ [u] (high frequency, middle position, syllable, female voice)
truce /truːs/ [u] (low frequency, middle position, syllable, male voice)
bullion /ˈbʊliən/ [u] (low frequency, middle position, syllables, male voice)
gruel /ˈɡruːəl/ [u] (low frequency, middle position, syllables, male voice)
should /ʃʊd/ [u] (high frequency, middle position, syllable, female voice)
hood /hʊd/ [u] (high frequency, middle position, syllable, female voice)
References
Arvaniti, Amalia. 1999. Greek voiced stops: Prosody, syllabification, underlying
representations or selection of the optimal? In Amalia Moser (ed.), Proceedings of the 3rd
International Linguistics Conference for the Greek Language, 1997, 383–390. Athens:
Ellinika Grammata.
Arvaniti, Amalia. 2007. Greek phonetics: The state of the art. Journal of Greek Linguistics 8(1).
97–208.
Arvaniti, Amalia. 2010. A (brief) overview of the phonetics and phonology of Cypriot Greek. In
A. Voskos, D. Goutsos & A. Mozer (eds.), The Greek Language in Cyprus: From Antiquity to
Today, 107–124. Athens: University of Athens.
Best, Catherine. 1984. Discovering messages in the medium. In Hiram Fitzgerald, Barry Lester
& Michael Yogman (eds.), Theory and Research in Behavioral Pediatrics, 97–145. Boston,
MA: Springer.
Best, Catherine. 1993. Emergence of language-specific constraints in perception of non-native
speech: A window on early phonological development. In Bénédicte de Boysson-Bardies,
Scania de Schonen, Peter Jusczyk, Peter McNeilage & John Morton (eds.), Developmental
Neurocognition: Speech and Face Processing in the First Year of Life, 289–304.
Dordrecht: Springer.
Best, Catherine. 1994. The emergence of native-language phonological influences in infants: A
perceptual assimilation model. In Judith C. Goodman & Howard C. Nusbaum (eds.), The
Development of Speech Perception: The Transition from Speech Sounds to Spoken Words,
233–277. Cambridge, MA: The MIT Press.
Best, Catherine. 1995. A direct realist view of cross-language speech perception. In Winifred
Strange (ed.), Speech Perception and Linguistic Experience: Issues in Cross-language
Research, 171–204. Timonium, MD: York Press.
Best, Catherine & Gerald McRoberts. 2003. Infant perception of non-native consonant
contrasts that adults assimilate in different ways. Language and Speech 46(2–3).
183–216.
Best, Catherine & Michael Tyler. 2007. Nonnative and second-language speech perception:
Commonalities and complementarities. In Ocke-Schwen Bohn & Murray Munro (eds.),
Language Experience in Second Language Speech Learning: In honor of James Emil Flege,
13–34. Amsterdam: John Benjamins.
Bonatti, Luca, Marcela Peña, Marina Nespor & Jacques Mehler. 2005. Linguistic constraints on
statistical computations: The role of consonants and vowels in continuous speech
processing. Psychological Science 16(6). 451–459.
Bradlow, Ann, David Pisoni, Reiko Akahane-Yamada & Yoh’ichi Tohkura. (1997). Training
Japanese listeners to identify English /ɹ/ and /l/. Journal of the Acoustical Society of
America 101(4). 2299–2310.
Carr, Phillip. 1999. English Phonetics and Phonology. An introduction. Oxford: Blackwell
Publishers.
Cruttenden, Alan. 2014. Gimson’s Pronunciation of English. Abingdon: Routledge.
Deterding, David. 2004. How many vowel sounds are there in English? STETS Language and
Communication Review 19(10). 19–21.
Flege, James Emil. 1995. Second language speech learning: Theory, findings, and problems. In
Winifred Strange (ed.), Speech Perception and Linguistic Experience: Issues in Cross-
language Research, 233–277. Timonium, MD: York Press.
Flege, James Emil. 2002. Interactions between the native and second-language phonetic
systems. In Petra Burmeister, Thorsten Piske & Andreas Rohde (eds.), An Integrated View
of Language Development: Papers in Honor of Henning Wode, 217–244. Trier:
Wissenschaftlicher Verlag.
Flege, James Emil & Ocke-Schwen Bohn. 2021. The revised Speech Learning Model. In Ratree
Wayland (ed.), Second Language Speech Learning: Theoretical and Empirical Progress,
3–83. Cambridge: Cambridge University Press.
Flege, James Emil & Ratree Wayland. 2019. The role of input in native Spanish late learners’
production and perception of English phonetic segments. Journal of Second Language
Studies 2(1). 1–44.
Francis, Nelson & Henry Kucera. 1982. Frequency Analysis of English Usage: Lexicon and
Grammar. Boston: Houghton Mifflin.
Fry, Dennis, Arthur Abramson, Peter Eimas & Alvin Liberman. 1962. The identification and
discrimination of synthetic vowels. Language and Speech 5(4). 171–189.
Goldsmith, John. 1990. Autosegmental and Metrical Phonology. Oxford: Basil Blackwell.
82 Elena Kkese, Sviatlana Karpava
Kkese, Elena & Kakia Petinou. 2017b. Factors affecting the perception of plosives in second
language English by Cypriot-Greek listeners. In Elena Babatsouli (ed.), Proceedings of the
International Symposium on Monolingual and Bilingual Speech, Chania, Greece, 2017,
162–167. Chania, Greece: Institute of Monolingual and Bilingual Speech.
Krashen, Stephen, Zelinski Stanley, Jones Carl & Usprich Celia. 1978. How important is
instruction? English Language Teaching Journal 32(4). 257–261.
Kuhl, Patricia. 2000. A new view of language acquisition. Proceedings of the National
Academy of Sciences of the United States of America 97(22). 11850–11857.
Kuhl, Patricia. 1993. Innate predispositions and the effects of experience in speech
perception: The Native Language Magnet Theory. In Benedicte de Boysson-Bardies,
Scania de Schonen, Peter Jusczyk, Peter McNeilage & John Morton (eds.), Developmental
Neurocognition: Speech and Face Processing in the First Year of Life, 259–274. Dordrecht:
Springer.
Kuhl, Patricia, Barbara Conboy, Sharon Coffey-Corina, Denise Padden, Maritza Rivera-Gaxiola
& Tobey Nelson. 2008. Phonetic learning as a pathway to language: New data and Native
Language Magnet Theory Expanded (NLM-E). Philosophical Transactions of the Royal
Society B: Biological Sciences 363(1493). 979–1000.
Lee, Andrew & Roy Lyster. 2016. The effects of corrective feedback on instructed L2 speech
perception. Studies in Second Language Acquisition 38(1). 35–64.
Lengeris, Angelos. 2009. Perceptual assimilation and L2 learning: Evidence from the
perception of Southern British English vowels by native speakers of Greek and Japanese.
Phonetica 66(3). 169–187.
Lenneberg, Eric. 1967. Biological Foundations of Language. New York: Wiley.
Long, Michael. 1990. Maturational constraints on language development. Studies in Second
Language Acquisition 12(3). 251–285.
Lovatt, Peter, S. E. Avons & Jackie Masterson. 2000. The word-length effect and disyllabic
words. Quarterly Journal of Experimental Psychology 53A(1). 1–22.
Monsell, Stephen, Michael Doyle & Patrick Haggard. 1989. Effects of frequency on visual word
recognition tasks: Where are they? Journal of Experimental Psychology: General 118(1).
43–71.
Moyer, Alene. 2016. The puzzle of gender effects in L2 phonology. Journal of Second Language
Pronunciation 2(1). 8–28.
Norris, Dennis. 2013. Models of visual word recognition. Trends in Cognitive Sciences 17(10).
517–524.
Oh, Eunjin. 2011. Effects of speaker gender on voice onset time in Korean stops. Journal of
Phonetics 39(1). 59–67.
Patkowski, Mark. 1990. Age and accent in a second language: A reply to James Emil Flege.
Applied Linguistics 11(1). 73–89.
Pereira, Yasna. 2014. Perception and production of English vowels by Chilean learners of
English: Effect of auditory and visual modalities on phonetic training. London: University
College London dissertation.
Petinou, Kakia & Arhonto Terzi. 2002. Clitic misplacement in normally developing and
language impaired Cypriot-Greek children. Language Acquisition 10(1). 1–29.
Pierrehumbert, Janet. 2003. Phonetic diversity, statistical learning, and acquisition of
phonology. Language and Speech 46(Pt 2–3). 115–154.
84 Elena Kkese, Sviatlana Karpava
Raphael, Lawrence, Gloria Borden, & Katherine Harris. 2007. Speech Science Primer:
Physiology, Acoustics, and Perception of Speech. Baltimore, Philadelphia: Lippincott
Williams & Wilkins.
Recasens, Daniel & Aina Espinosa. 2006. Dispersion and variability of Catalan vowels. Speech
Communication 48(6). 645–666.
Repp, Bruno H. 1981. Two strategies in fricative discrimination. Perception and Psychophysics
30(3). 217–227.
Repp, Bruno. 1984. Categorical perception: Issues, methods, findings. In Norman Lass (ed.),
Speech and Language: Advances in Basic Research and Practice, 244–335. Orlando, FL:
Academic Press.
Roach, Peter. 2004. British English: Received pronunciation. Journal of the International
Phonetic Association 34(2). 239–245.
Roach, Peter. 2009. English Phonetics and Phonology. Cambridge: Cambridge University
Press.
Saito, Kazuya. 2015. Communicative focus on L2 phonetic form: Teaching Japanese learners to
perceive and produce English /ɹ/ without explicit instruction. Applied Psycholinguistics
36(2). 377–409.
Schumann, John (1978). The acculturation model for second-language acquisition. In
R. Gingras (ed.), Second-language acquisition and foreign language teaching, 27–50.
Arlington, VA: Center for Applied Linguistics.
Scovel, Tom. 1969. Foreign accents, language acquisition and cerebral dominance. Language
Learning 19(3–4). 245–54.
Thomson, Ron. 2012. Improving L2 listeners’ perception of English vowels: A computer-
mediated approach. Language Learning 62(4). 1231–1258.
Walsh, Terence & Diller Karl. 1981. Neurolinguistic considerations on the optimal age
for second language learning. In K. Diller (ed.), Individual Differences and Universals in
Language Learning Aptitude, 510–524. Rowley, MA: Newbury House.
Wang, Xinchun & Jidong Chen. 2019. English speakers’ perception of Mandarin consonants:
The effect of phonetic distances and L2 experience. In Sasha Calhoun, Paola Escudero,
Marija Tabain & Paul Warren (eds.), Proceedings of the 19th International Congress of
Phonetic Sciences, Melbourne, Australia, 2019, 250–254. Canberra: Australasian Speech
Science and Technology Association Inc.
Wang, Yuling, Minghu Jiang, Yunlong Huang & Qiu Peijun. 2021. An ERP study on the role of
phonological processing in reading two-character compound Chinese words of high and
low frequency. Frontiers in Psychology 12. https://www.frontiersin.org/article/10.3389/
fpsyg.2021.637238.
Werker, Janet & Richard Tees. 1984. Phonemic and phonetic factors in adult cross‐language
speech perception. The Journal of the Acoustical Society of America 75(6). 1866–1878.
Zhang, Qin., John X. Zhang & Lingyue Kong. 2009. An ERP study on the time course of
phonological and semantic activation in Chinese word recognition. International Journal
of Psychophysiology 73(3). 235–245.
Pedro Luis Luchini, Cosme Daniel Paz, María Claudia Troglia
L2 accented speech measured
by Argentinian pre-service teachers
Abstract: Five international students from Argentina, Belgium, China, Japan
and Poland recorded a picture narrative in English that was later rated for
comprehensibility and accentedness by a group of 22 Spanish-L1 Argentinian
prospective English language teachers. After the listening task, the raters
completed a complementary activity in which they identified the linguistic
factors that, in their view, had either eased or impaired the rating task.
Results varied across the 5 speech samples owing to the wide range of
phonetic-phonological and syntactic-semantic differences arising from the
transfer of each speaker’s L1 background to L2 production. To determine the
degree of association between comprehensibility and accentedness, correlation
analyses were conducted. The correlation was significant for the Belgian and
Japanese speakers, but not for the other speakers. Ratings varied widely among
raters, though the differences were not statistically significant. Data from
the complementary task were clustered into linguistic factors: pronunciation,
fluency, lexicogrammar and speech rate. Frequency analyses revealed that
fluency and prosody emerged as facilitating factors, while sounds and
lexicogrammar appeared as impeding factors. Based on these findings, some
suggestions for L2 pronunciation pedagogy and future research are made.
Pedro Luis Luchini, Cosme Daniel Paz, María Claudia Troglia, Universidad Nacional de Mar
del Plata
https://doi.org/10.1515/9783110736120-004
1 Introduction
For a long time, the teaching of English pronunciation was neglected in the field
of Applied Linguistics (Lee, Jang, and Plonsky 2014). Today, however, pronunciation
is present in numerous worldwide academic settings and well-known journals
in which issues related to L2 pronunciation pedagogy, assessment and
research such as intelligibility, comprehensibility and degree of accentedness are
discussed (Bøhn and Hansen 2017; Derwing and Munro 2015; Derwing, Munro,
and Wiebe 1998; Munro and Derwing 1995).
Motivated by these investigations, in this study, we aim to explore the extent
to which the English produced by international students, with an intermediate
level of proficiency, affects comprehensibility and degree of accentedness as
measured by a group of 22 L1-Spanish prospective English language teachers.
Listeners assessed 5 recordings of picture narratives produced by 5 students from
Argentina, Belgium, China, Japan and Poland, respectively. Using a Likert-type
scale, they indicated degree of comprehensibility and accentedness. After
performing the measurement task, the listeners completed a complementary activity
in which they wrote a brief report about each speaker’s productions, describing
the linguistic factors (pronunciation, fluency, lexicogrammar aspects, and speech
rate) that had facilitated or impaired the completion of the perceptual task.
The first part of the paper introduces the literature review, followed by the
method section, in which context, participants and materials are described.
The next section presents the results along with a general discussion. Finally,
some pedagogical implications for teaching L2 pronunciation are addressed,
and some avenues for future research are delineated.
2 Literature review
For the last twenty-five years or so, as a result of new technological advances and
the spread of worldwide globalization, English has become a lingua franca
(Jenkins 2000; Seidlhofer 2011; Walker 2010). For non-native speakers, English has
thus become an additional language used for international communication. To
facilitate and safeguard global verbal interaction, L2 speakers and listeners need
to be both intelligible and comprehensible. L2 pronunciation plays an essential
role in communication as it constitutes the scaffolding of L2 speech; therefore, it
must be treated as a priority in language teaching (Levis 2005, 2006).
The prevailing requirement for L2 learners to strive for nativelikeness, which
still affects some pronunciation teaching practices, no longer seems to be a
realistic goal to achieve successful communication. A more contemporary competing
ideology, however, recognizes that L2 learners’ speech needs to be easily
understood for communication to be successful, even if their foreign accents are salient
or very strong. In view of this new competing belief, and to meet this goal,
teaching practices need to be aligned with the principle of intelligibility (Levis 2005).
Instruction, then, should focus on those L2 speech aspects that have an effect on
understanding rather than on those that are comparably unproductive for that
each other. That is how speakers can have a very accented L2 speech, and still be
fully comprehensible (Derwing and Munro 2015). The linguistic factors associated
with comprehensibility are more numerous and more varied than those tied to
accentedness. Accentedness correlates with the appropriate use of segments,
while comprehensibility shows a stronger association with suprasegmental
features (stress, rhythm and intonation), fluency and lexico-grammatical and
discursive aspects (Crowther et al. 2015a, 2015b, 2017; Isaacs and Trofimovich 2012;
Saito, Trofimovich, and Isaacs 2015; Saito et al. 2016a, 2016b).
To date, little research has delineated the qualities of perceived L2
comprehensibility and accentedness in ELF contexts (Pickering 2006) in a way that can
inform teaching practices. Additional empirical studies need to be conducted in
this context to measure these constructs and identify those linguistic factors
that can influence non-native listeners’ impressions of L2 accented speech.
More knowledge about these linguistic factors may help L2 teachers determine
which aspects of L2 speech deserve to be taught and which can be left out,
enabling them to set appropriate learning goals. This information may also
help teachers gain a better understanding of how to integrate
pronunciation skills with other linguistic areas such as grammar, lexis and
discourse competence, as well as improve the way L2 speaking skills are assessed
(Celce-Murcia et al. 2010; Isaacs 2009; Kennedy and Trofimovich 2010; Saito
and Lyster 2011).
To address this research need, the current study sets out to explore the
extent to which the English produced by international students affects
comprehensibility and accentedness as measured by L1-Spanish prospective English
language teachers. The innovative nature of this classroom-based study lies in
the fact that comprehensibility and accentedness measurements will not rely
on expert native-speaker listener ratings (Piske, MacKay, and Flege 2001), but
on non-native speaker listeners. That is, 22 Spanish-L1 listeners judged a set of
5 picture narratives, recorded by different L2 learners from diverse L1
backgrounds, using a 9-point numerical scale. Raters then wrote evaluative reports
in which they identified and described the linguistic factors that had either
enhanced or obstructed their understanding.
3 Research questions
This study presents a series of questions that constitute the main objective of
this research. In the first place, an answer will be given to the extent to which
speech in English, produced by international speakers with different accents,
4 Method
4.1 Context and Argentinian participants
Data were collected from 22 Spanish-L1 students at a public university in
Argentina who were in the 2nd year of a TEFL program. The group consisted of 4 men and
18 women, aged 19–25 (SD = 2.01). After giving their consent to take part in the
research, they completed the listening task and the evaluative report as part of a
classroom activity. At the time of completing these tasks, none reported hearing problems.
Students from Argentina, Belgium, China, Japan and Poland recorded a picture
narrative, sequenced in a series of 8 pictures (Derwing et al. 2009). To avoid
task repetition effects (Bygate 2001; Lambert 2017), each student received a
different set of 2 sequenced pictures of the same story. The missing pictures were
replaced by blank spaces for the students to recreate their own narratives. They
were allotted 2 minutes of planning time before recording.
At the time of recording, these students were participating in an English as a
Foreign Language study program in St. Albans, England. Prior to data collection,
consent was requested from the school authorities to conduct the experiment.
The students were selected on the basis of their level of linguistic competence in
English (B1, as stipulated by the Common European Framework of Reference for
Languages; henceforth, CEFR). As a requirement for entering this school, all students
had taken a placement test administered by the institution.
students’ speech samples. The trainees completed the listening task individually
in a university classroom. They were granted autonomy to listen to the speech
samples as many times as they needed. To determine comprehensibility
measurements, listeners judged each recording using a Likert-type scale with a
progression of 1–9, in which 1 corresponded to L2 speech that was very difficult to
understand, while 9 was equivalent to L2 speech that was very easy to
understand. To establish degree of accentedness, listeners used the same scale in
which 1 represented very accented speech, while 9 indicated native-like accent.
The complementary task required listeners to write a brief evaluation report
in which they described the linguistic factors that had facilitated or obstructed
the realization of the measurement tasks (comprehensibility & accentedness). In
this complementary task, the students were asked to refer to segmental aspects
(pronunciation of individual vowels and consonants, deletion or addition of
sounds), prosody (word/sentence stress, rhythm and intonation), speech rate
(speakers’ overall pacing and speed of utterance delivery), lexical and
grammatical accuracy (speakers’ choice of words to accomplish the given task and
grammatical aspects in relation to word order, morphology, tense inflections, plurals,
subject/verb agreement, among others) and fluency (flow, continuity, automaticity,
or smoothness of speech, often associated with frequency, length and
distribution of pauses).
4.4 Analysis
A descriptive analysis of the data was carried out using frequency and spider graphs.
Simple Spearman correlation analyses were performed. An analysis of variance
(ANOVA) was also conducted with a significance level of P = 0.05 for each of
the variables measured, considering raters and speaker backgrounds as factors.
The effect sizes were estimated with Cohen’s d.
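The Spearman correlation mentioned above can be illustrated with a minimal
sketch. It is not the authors’ analysis script: the function names are
hypothetical, the ratings are invented for illustration, and only the Python
standard library is assumed. The coefficient is computed as the Pearson
correlation of the rank-transformed ratings, with tied ratings sharing the
mean of their ranks.

```python
from statistics import mean

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i..j, made 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; assumes neither list is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman_rho(x, y):
    """Spearman rank correlation as Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical 1-9 ratings given by eight raters to one speaker
# (illustrative values only, not data from the study).
comprehensibility = [3, 4, 2, 5, 6, 3, 7, 4]
accentedness = [2, 3, 2, 4, 5, 1, 6, 4]

rho = spearman_rho(comprehensibility, accentedness)
print(f"Spearman rho = {rho:.2f}")
```

On the scales described in the method, a positive coefficient would mean that
speech rated as closer to native-like was also rated as easier to understand.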
5 Results
This section answers the questions raised in this research. We first asked to what
extent the variety of accents produced by L2 international speakers
influenced the measurements of comprehensibility and accentedness according to the
perception of a group of Argentine listeners. Figure 1 shows the relative frequency
of assessment for comprehensibility and accentedness according to speakers’
backgrounds. Tables 1 & 2 show the statistical analysis for these variables.
Figure 1: Bar graph of the relative frequency for comprehensibility (black bars) and
accentedness (gray bars) for the different speaker backgrounds.
(Panels: Argentina, Belgium, China, Japan, Poland; x-axis: Likert-type scale, 1–9;
y-axis: relative frequency, %.)
F.V. SS DF MS F p-value
F.V. SS DF MS F p-value
Table 3: Accentedness for the different nationalities. Values are the means ± SE.
The means were tested with two-way analysis of variance (ANOVA) for significant
effects. Different letters indicate significant differences (p < 0.05).
Table 4: Comprehensibility for the different nationalities. Values are the means ± SE.
The means were tested with two-way analysis of variance (ANOVA) for significant
effects. Different letters indicate significant differences (p < 0.05).
The second question was about the degree of association between
comprehensibility and accentedness of English spoken by 5 international students and
evaluated by a group of Argentine raters. Results of this association are shown
in Figure 2 below.
Figure 2: Dispersion graph between comprehensibility and accentedness for the different
speaker backgrounds. Only significant correlations (P < 0.01) are shown as a solid line.
(Panels: Argentina, Belgium, China, Japan, Poland; x-axis: comprehensibility;
y-axis: degree of accentedness.)
The third question concerned the variability among the raters’ assessments.
The analyses of variance (Tables 1 & 2) show no statistically significant
rater effect for comprehensibility or accentedness (P = 0.2366 and
P = 0.3229, respectively).
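How a rater effect and a speaker effect can be separated in such a design is
sketched below. This is an illustration under stated assumptions, not the
authors’ procedure: it assumes one rating per rater and speaker (so no
interaction term can be estimated), the function name and the small table are
hypothetical, and only F statistics are computed, since a p-value would
additionally require the F distribution, which the Python standard library
does not provide.

```python
def two_way_anova(table):
    """F statistics for a two-way layout with one observation per cell.

    table[i][j] is the rating given by rater i to speaker j.
    """
    n_raters, n_speakers = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (n_raters * n_speakers)
    rater_means = [sum(row) / n_speakers for row in table]
    speaker_means = [
        sum(table[i][j] for i in range(n_raters)) / n_raters
        for j in range(n_speakers)
    ]
    # Sums of squares for each factor, the total, and the residual.
    ss_raters = n_speakers * sum((m - grand) ** 2 for m in rater_means)
    ss_speakers = n_raters * sum((m - grand) ** 2 for m in speaker_means)
    ss_total = sum((v - grand) ** 2 for row in table for v in row)
    ss_error = ss_total - ss_raters - ss_speakers
    df_raters, df_speakers = n_raters - 1, n_speakers - 1
    df_error = df_raters * df_speakers
    ms_error = ss_error / df_error  # assumes the residual is not zero
    return {
        "F_raters": (ss_raters / df_raters) / ms_error,
        "F_speakers": (ss_speakers / df_speakers) / ms_error,
        "df_error": df_error,
    }

# Hypothetical ratings: 4 raters (rows) by 3 speakers (columns).
ratings = [
    [7, 4, 2],
    [8, 5, 2],
    [6, 4, 1],
    [7, 6, 3],
]
result = two_way_anova(ratings)
print(result)
```

A small F for raters alongside a large F for speakers would correspond to the
pattern reported here: raters behaving similarly while speaker background
matters.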
Our last question refers to the linguistic factors that, according to the raters’
opinions expressed in the complementary task, facilitated or hindered the
realization of the measurement tasks. The most salient linguistic aspects,
determined by frequency of occurrence, were counted, identified and clustered into
segments (pronunciation of individual vowels and consonants, and deletion or
addition of sounds), prosody (stress placement both at word and sentence levels),
rhythm (determined by the succession of stressed and unstressed syllables,
where stressed syllables tend to occur at roughly regular intervals of time),
lexico-grammar (the speaker’s choice of words to accomplish the task set, and
aspects related to word order, sentence structure, morphology, tense inflections,
plurals, agreement, among others), speech rate (speaker’s overall pacing and
speed of utterance delivery) and fluency (flow, continuity, automaticity, or
smoothness of speech, often associated with frequency, length and distribution
of pauses). The information from this analysis is summarized in the spider
graphs shown below.
With reference to the Argentine speaker, listeners identified lexicogrammar
aspects and sounds as factors that hampered the realization
of the listening task, with obstruction rates of 25% and 35%, respectively.
By contrast, prosody and fluency were reported as assisting successful task
completion at rates of about 28% and 56%, respectively.
Figure 3: Spider graph showing facilitating and obstructing factors for comprehensibility for
the Argentine speaker. (Series: obstructing factors, facilitating factors.)
Regarding the Belgian speaker, the rate for sounds was 46%, emerging as
the most salient factor that hampered the measurement task. Conversely,
listeners rated prosody at 50% as the most noticeable factor that facilitated the
listening task. Fluency, lexicogrammar and speech rate were assigned the same
frequency as both facilitating and obstructing factors. This could be due to
raters’ perceptual differences when judging unfamiliar accented speech.
As for the Chinese speaker, prosody averaged 39% as the main obstructing
factor for the measurement task, followed by sounds with an average of about
21%. By contrast, fluency and sounds were labeled as facilitators, averaging
33% and 29%, respectively. It is worth pointing out that lexicogrammar aspects
were rated as both hindering and enabling components for the completion of
the listening task. This final result could be partly attributed to the natural
variability in raters’ perceptual skills.
Concerning the Japanese speaker, raters identified no linguistic factor that
facilitated the completion of the assessment task. Instead, listeners
rated all factors as obstacles. Sounds were the most obstructing
component for task completion, averaging 30%. Fluency and prosody
followed sounds in order of importance, with an average of 22% each.
Figure 4: Spider graph showing facilitating and obstructing factors for comprehensibility
for the Belgian speaker. (Axes: fluency, speech rate, sounds, prosody, lexico-grammar;
scale: 0–60; series: obstructing factors, facilitating factors.)
Figure 5: Spider graph showing facilitating and obstructing factors for comprehensibility
for the Chinese speaker. (Axes: fluency, speech rate, sounds, prosody, lexico-grammar;
scale: 0–60; series: obstructing factors, facilitating factors.)
These results illustrate the role that pronunciation (sounds & prosody) plays in
speech perception and production, and why it should be a concern in foreign
language classrooms.
Figure 6: Spider graph showing facilitating and obstructing factors for comprehensibility
for the Japanese speaker. (Series: obstructing factors, facilitating factors.)
Finally, as to the Polish speaker, listeners rated fluency as the major factor that
obstructed the realization of the listening task, at 39%. Lexicogrammar
followed fluency in order of importance with an average of 29%. In contrast,
speech rate, sounds and prosody were reported as enabling factors for the
accomplishment of the listening task, with averages of 18%, 27% and
36%, respectively. These findings show the contribution of prosodic
characteristics to the perception of foreign-accented speech.
In the next section, the four questions initially posed will be critically
analyzed, covering general aspects of the results obtained.
Figure 7: Spider graph showing facilitating and obstructing factors for comprehensibility for
the Polish speaker. (Series: obstructing factors, facilitating factors.)
6 Discussion
For the first question, in general terms, the Argentine, Chinese, Polish and
Belgian speakers were perceived as highly comprehensible, while
the Japanese speaker received the lowest scores. Regarding accentedness, the
Argentine, Belgian and Japanese speakers were perceived as having a strong
foreign accent. The Chinese speaker, however, was perceived as having a weak foreign
accent, while the Polish speaker was rated at a medium level. In all, the speakers’ L1
background had a statistically significant effect on the two variables analyzed.
There is little research on the effect of speakers’ L1 on listener ratings of L2
comprehensibility and accentedness, and the existing studies have revealed mixed
findings. Anderson-Hsieh, Johnson, and Koehler (1992) showed that prosody was
highly correlated with speakers’ L2 assessment scores regardless of their L1
background, while sound deviations were dependent on speakers’ L1. Kang
(2010) demonstrated that Asian learners (China/Japan) had a
stronger foreign L2 accent than speakers with other L1 backgrounds
(Arabic, Russian, etc.) due to recurrent misuse of emphatic stress. Crowther
et al. (2014) confirmed that the relative association of comprehensibility
and accentedness with linguistic factors varies according to the speakers’ L1. In
their study, they stated that the perception of Chinese speakers’ L2 speech was
highly influenced by segmental aspects and that of Hindi speakers by
lexico-grammatical variables, while for Farsi speakers no correlation with any of
the linguistic factors examined was found. From our findings, we can conclude
that the speakers’ L1s played a crucial role in determining listeners’ ratings of
the L2 speech dimensions explored. The next research step would then be to carry
out an investigation that allows us to identify and explain the nature of this
relationship.
The second question enquired about the correlation between the variables
of comprehensibility and accentedness. Comprehensibility and accentedness
seem to operate independently. Comprehensibility is associated with several
linguistic factors, including prosody, speech rate, lexis and grammatical
aspects of speech, while accentedness is primarily tied to segmental accuracy and
word stress (Saito, Trofimovich, and Isaacs 2017; Trofimovich and Isaacs 2012).
In the present study, the listeners’ ratings for comprehensibility and
accentedness for the Argentine, Chinese and Polish speakers showed no association.
Although these speakers’ L2 speech was perceived as having a strong foreign accent,
raters considered them fairly comprehensible. However, for the Belgian and
Japanese speakers a positive linear correlation was observed. This means that
L2 speech perceived by raters as having a strong foreign accent also required greater
cognitive effort on their part to be understood.
In the third question, rater consistency was evaluated. Although there was
variation among the raters’ scores, these differences were not statistically
significant, which means that assessors largely behaved similarly in how they
rated.
The fourth question examined the complementary task in which listeners
identified the linguistic aspects that, in their understanding, had facilitated or
hindered the realization of the measurement tasks. Generally, fluency emerged
as a facilitating factor for the Argentine and Chinese speakers, while it became
a hindrance for the Japanese and Polish speakers. Fluency had the same
proportion as both a facilitating and an impeding factor for the Belgian speaker. For
both the Argentine and Belgian speakers, prosody served as a promoting factor.
However, for the Japanese and Chinese speakers, prosody hindered
understanding. Segmental accuracy, for its part, constituted an adverse factor for the
Argentine, Belgian and Japanese speakers, while for the Chinese and Polish
speakers it facilitated task completion. Speech rate scored similar results
both as a facilitator and an impeding factor for the Argentine, Chinese and
Belgian speakers. This factor was facilitating for the Polish speaker, while for the
Japanese speaker it became an obstacle. Finally, for the Argentine and Polish speakers
The net contribution results from the difference between facilitating and obstructing factors’
frequency.
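The net contribution defined above can be worked through in a short sketch.
The percentages are those reported in the results for the Polish speaker;
treating a factor as 0% in the direction for which no figure is reported is
an assumption made only for this illustration, and the variable names are
hypothetical.

```python
# Frequencies (%) reported in the results for the Polish speaker.
facilitating = {"speech rate": 18, "sounds": 27, "prosody": 36}
obstructing = {"fluency": 39, "lexicogrammar": 29}

factors = ["fluency", "lexicogrammar", "speech rate", "sounds", "prosody"]

# Net contribution = facilitating frequency minus obstructing frequency.
# Assumption: a factor with no reported figure in one direction counts as 0.
net_contribution = {
    factor: facilitating.get(factor, 0) - obstructing.get(factor, 0)
    for factor in factors
}

for factor, net in net_contribution.items():
    role = "facilitating" if net > 0 else "obstructing" if net < 0 else "neutral"
    print(f"{factor}: {net:+d} ({role})")
```

A positive value marks a factor that, on balance, helped listeners; a
negative value marks one that, on balance, hindered them.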
source of input in English for them. Foreign accent may also expose them to social
or professional discrimination by other teachers, non-native and native
alike (Derwing, Rossiter, and Munro 2002). To avoid this, non-native teachers
need to reduce their foreign accents.
The most common task used to elicit L2 speech for measurements of com-
prehensibility and accentedness has been a picture narrative. Nearly all data
that show the linguistic interdependence between these two L2 speech dimen-
sions emerge from this single task type. Few studies have examined this phe-
nomenon across different task types (Crowther et al. 2015a, 2015b, 2017), and
based on their results, task-type effects should not be ignored (Skehan 2009;
Skehan and Foster 1997). It would thus be interesting to conduct studies similar
to the present one using different task types and to compare the results.
8 Conclusion
This study allowed us, on the one hand, to inquire about the influence of L2
accented speech on the attribution of comprehensibility and accentedness. On
the other hand, we were also able to distinguish the linguistic factors that had
an impact on the L2 speech perception of Argentine listeners. Each international
speaker’s L1 influence on the L2 clearly affected the comprehensibility and
accentedness ratings. A direct correlation between
these variables was observed for the Belgian and Japanese speakers, but
not in the rest. Among the linguistic factors that influenced the Argentine lis-
teners’ L2 speech perception, fluency and prosody proved, in general, to have
helped them complete the measurement task successfully, while sounds and
lexicogrammar emerged as obstructing factors. These findings may shed light
on new pedagogical paradigms for teaching L2 pronunciation in diverse con-
texts, including ELF. It would certainly be valuable for other teachers and re-
searchers to replicate and expand on this study, incorporating speakers and
listeners from different L1 backgrounds. Cross-comparisons of this kind may
help to elucidate the L2 pronunciation aspects and features that
should be a focus in pronunciation classrooms, so that teachers can help
learners become more efficient in their L2 production.
References
Anderson-Hsieh, Janet, Ruth Johnson & Kenneth Koehler. 1992. The relationship between
native speaker judgments of nonnative pronunciation and deviance in segmentals,
prosody, and syllable structure. Language Learning 42(4). 529–555.
doi:10.1111/j.1467-1770.1992.tb01043.x
Bøhn, Henrik & Thomas Hansen. 2017. Assessing pronunciation in an EFL context: Teachers’
orientations towards nativeness and intelligibility. Language Assessment Quarterly 14(1).
54–68. doi:10.1080/15434303.2016.1256407
Bygate, Martin. 2001. Effects of task repetition on the structure and control of oral language.
In Martin Bygate, Peter Skehan & Merrill Swain (eds.), Researching Pedagogic
Tasks: Second Language Learning, Teaching and Testing, 23–48. London: Pearson
Education Limited.
Celce-Murcia, Marianne, Donna Brinton, Janet Goodwin & Barry Griner. 2010. Teaching
Pronunciation: A Course Book and Reference Guide. Cambridge: Cambridge University
Press.
Crowther, Dustin, Pavel Trofimovich, Talia Isaacs & Kazuya Saito. 2015a. Does a speaking
task affect second language comprehensibility? The Modern Language Journal 99(1).
80–95. doi:10.1111/modl.12185
Crowther, Dustin, Pavel Trofimovich, Kazuya Saito & Talia Isaacs. 2014. Second language
comprehensibility revisited: Investigating the effects of learner background. TESOL
Quarterly 49(4). 814–837.
Crowther, Dustin, Pavel Trofimovich, Kazuya Saito & Talia Isaacs. 2015b. Second language
comprehensibility revisited: Investigating the effects of learner background. TESOL
Quarterly 49(4). 814–837.
Crowther, Dustin, Pavel Trofimovich, Kazuya Saito & Talia Isaacs. 2017. Linguistic dimensions
of L2 accentedness and comprehensibility vary across speaking tasks. Studies in Second
Language Acquisition 40(2). 443–457. doi:10.1017/S027226311700016X
Derwing, Tracey & Murray Munro. 2005. Second language accent and pronunciation teaching:
A research-based approach. TESOL Quarterly 39(3). 379–397. doi:10.2307/3588486
Derwing, Tracey & Murray Munro. 2009. Putting accent in its place: Rethinking obstacles to
communication. Language Teaching 42(4). 476–490.
Derwing, Tracey & Murray Munro. 2015. Pronunciation Fundamentals: Evidence-based
Perspectives for L2 Teaching and Research. Amsterdam: John Benjamins.
Derwing, Tracey, Murray Munro & Ron Thomson. 2004. Second language fluency: Judgments on
different tasks. Language Learning 54(4). 655–679.
Derwing, Tracey, Murray Munro, Ron Thomson & Marian Rossiter. 2009. The relationship
between L1 fluency and L2 fluency development. Studies in Second Language Acquisition
31(4). 533–557.
Derwing, Tracey, Murray Munro & Grace Wiebe. 1998. Evidence in favor of a broad framework
for pronunciation instruction. Language Learning 48(3). 393–410.
Derwing, Tracey, Marian Rossiter & Murray Munro. 2002. Teaching native speakers to listen to
foreign-accented speech. Journal of Multilingual and Multicultural Development 23(4).
245–259.
Isaacs, Talia. 2009. Integrating form and meaning in L2 pronunciation instruction. TESL
Canada Journal 27(1). 1–12.
Isaacs, Talia & Pavel Trofimovich. 2012. Deconstructing comprehensibility: Identifying the
linguistic influences on listeners’ L2 comprehensibility ratings. Studies in Second
Language Acquisition 34(3). 475–505. doi:10.1017/S0272263112000150
Jenkins, Jennifer. 2000. The Phonology of English as an International Language. Oxford:
Oxford University Press.
Kang, Okim. 2010. Relative salience of suprasegmental features on judgments of L2
comprehensibility and accentedness. System 38(2). 301–315.
doi:10.1016/j.system.2010.01.005
Kennedy, Sara & Pavel Trofimovich. 2010. Language awareness and second language
pronunciation: A classroom study. Language Awareness 19(3). 171–185.
Lambert, Craig. 2017. Tasks, affect and second language performance. Language Teaching
Research 21(6). 657–664. doi:10.1177/1362168817736644
Lee, Junkyu, Juhyun Jang & Luke Plonsky. 2014. The effectiveness of second language
pronunciation instruction: A meta-analysis. Applied Linguistics 36(3). 345–366.
doi:10.1093/applin/amu040
Levis, John. 2005. Changing contexts and shifting paradigms in pronunciation teaching.
TESOL Quarterly 39(3). 369–377.
Levis, John. 2006. Pronunciation and the assessment of spoken language. In Rebecca Hughes
(ed.), Spoken English, TESOL and Applied Linguistics, 245–270. London: Palgrave
Macmillan. doi:10.1057/9780230584587_11
Munro, Murray & Tracey Derwing. 1995. Foreign accent, comprehensibility, and intelligibility
in the speech of second language learners. Language Learning 45(1). 73–97.
doi:10.1111/j.1467-1770.1995.tb00963.x
Munro, Murray & Tracey Derwing. 1999. Foreign accent, comprehensibility, and intelligibility
in the speech of second language learners. Language Learning 49(1). 285–310.
doi:10.1111/0023-8333.49.s1.8
Pickering, Lucy. 2006. Current research on intelligibility in English as a Lingua Franca. Annual
Review of Applied Linguistics 26. 219–233. doi:10.1017/S0267190506000110
Piske, Thorsten, Ian MacKay & James E. Flege. 2001. Factors affecting degree of foreign accent
in an L2: A review. Journal of Phonetics 29. 191–215. doi:10.1006/jpho.2001.0134
Rossiter, Marian, Tracey Derwing, Linda Manimtim & Ron Thomson. 2010. Oral fluency: The
neglected component in the communicative language classroom. The Canadian Modern
Language Review 66(4). 583–606.
Saito, Kazuya. 2013. Effects of instruction on L2 pronunciation development: A Synthesis of 15
quasi-experimental intervention studies. TESOL Quarterly 46(4). 842–854.
Saito, Kazuya & Roy Lyster. 2011. Effects of form-focused instruction and corrective feedback
on L2 pronunciation development of /ɹ/ by Japanese learners of English. Language
Learning 62(2). 595–633. doi:10.1111/j.1467-9922.2011.00639.x
Saito, Kazuya, Pavel Trofimovich & Talia Isaacs. 2015. Second language speech production:
Investigating linguistic correlates of comprehensibility and accentedness for learners at
different ability levels. Applied Psycholinguistics 37(2). 217–240. doi:10.1017/
S0142716414000502
Saito, Kazuya, Pavel Trofimovich & Talia Isaacs. 2017. Using Listener Judgments to Investigate
Linguistic Influences on L2 Comprehensibility and Accentedness: A Validation and
Generalization Study. Applied Linguistics 38(4). 439–462. doi:10.1093/applin/amv047
Saito, Kazuya, Stuart Webb, Pavel Trofimovich & Talia Isaacs. 2016a. Lexical profiles of
comprehensible second language speech. Studies in Second Language Acquisition 38(4).
677–701. doi:10.1017/S0272263115000297
Saito, Kazuya, Stuart Webb, Pavel Trofimovich & Talia Isaacs. 2016b. Lexical correlates of
comprehensibility versus accentedness in second language speech. Bilingualism:
Language and Cognition 19(3). 597–609. doi:10.1017/S1366728915000255
Seidlhofer, Barbara. 2011. Understanding English as a Lingua Franca. Oxford: Oxford
University Press.
Skehan, Peter. 2009. Modelling second language performance: Integrating complexity,
accuracy, fluency and lexis. Applied Linguistics 30(4). 510–532.
Skehan, Peter & Pauline Foster. 1997. Task type and task processing conditions as influences
on foreign language performance. Language Teaching Research 1. 185–211.
Trofimovich, Pavel & Talia Isaacs. 2012. Disentangling accent from comprehensibility.
Bilingualism: Language and Cognition 15(4). 905–916. doi:10.1017/S1366728912000168
Walker, Robin. 2010. Teaching the Pronunciation of English as a Lingua Franca. Oxford: Oxford
University Press.
Jeniffer Imaregna Alcantara de Albuquerque,
Ubiratã Kickhöfel Alves
Dynamic paths of intelligibility
and comprehensibility: Implications
for pronunciation teaching from
a longitudinal study with Haitian learners
of Brazilian Portuguese
Abstract: An agenda of studies has shed some light on pronunciation phenom-
ena through the lens of intelligibility and comprehensibility studies (Derwing
and Munro 2015; Munro and Derwing 1995) as complex, dynamic and multimodal
constructs (Albuquerque 2019; Nagle, Trofimovich, and Bergeron 2019; Nagle
et al. 2021; Zielinski and Pryor 2020). This chapter presents the results of a 12-
point longitudinal data collection conducted with three Haitian speakers (all of
them with different lengths of residence in Brazil and showing different proficiency
levels in Portuguese) when listened to by two Brazilian listeners (showing
different levels of experience in Second Languages and exhibiting different de-
grees of contact with foreigners) and discusses intelligibility and comprehensibil-
ity in the speaker-listener binomial relationship. The study included an oral
repetition task (aiming to obtain the listeners’ oral comprehension of the speak-
ers’ productions) and a comprehensibility task (with a 9-point Likert scale). Re-
sults indicate individual differences between listener-speaker relationships, as
variability may lead to learning (Lowie and Verspoor 2019). Intelligibility and
comprehensibility results reveal an influence of the participants’ personal profile,
i.e., contact with foreigners (for the listeners), formal versus informal language
learning process and amount of time in immersion context (for the speakers).
Both constructs varied in the binomial relationships, and they seemed connected
to both speakers’ improvement in lexical complexity and pronunciation and lis-
teners’ ability to accommodate new data from the speaker’s productions. Our
general findings suggest benefits of a binomial listener-speaker pairing design in
Acknowledgements: The longitudinal study from which our data is drawn was partly funded by
the Brazilian government (CAPES and CNPq funding agencies). We are deeply grateful to the
participants in our data collections.
https://doi.org/10.1515/9783110736120-005
1 Introduction
Although there seems to be no one-size-fits-all view regarding pronunciation
teaching (Levis 2020), one of the most prominent research agendas since the
late 1980s has been pursued by Tracey Derwing and Murray Munro in their
discussions of intelligibility and comprehensibility. A wealth of previous research
by these authors and their collaborators has investigated several variables
and contexts underlying the above-mentioned constructs: listener judgments and
their connection to pronunciation changes (Derwing, Munro, and Wiebe 1998); the
distinction between the constructs of intelligibility, comprehensibility and accentedness
(Derwing and Munro 1997); a closer analysis of comprehensibility
judgments and their specific features (Isaacs and Trofimovich 2012); the influence
of methodological features regarding speech assessment (O’Brien 2014); pedagog-
ical aspects towards intelligibility (Derwing and Munro 2015; Levis 2020). Among
these contributions, there are fewer works on longitudinal and dynamic analyses
of speech rating (Albuquerque and Alves 2020; Derwing and Munro 2013; Nagle,
Trofimovich, and Bergeron 2019; Nagle et al. 2021), which is our major focus in
this chapter.
To cope with this dynamic view of language, many studies see development
as a scenario of constant change instead of a one-point-in-time picture (De
Bot 2017; Larsen-Freeman 2015; Lowie 2017; Lowie and Verspoor 2019; Verspoor
et al. 2011; Verspoor, Lowie, and Van Dijk 2008). Yet, dynamic studies on in-
telligibility and comprehensibility do vary on the notion of “time” and time-
scales, whether operationalizing it as a real-time multiple click measurement
(Nagle, Trofimovich, and Bergeron 2019; Nagle et al. 2021) or as a change in an
L2 learner’s trajectory in months/years (Albuquerque 2019; Albuquerque and
Alves 2020; Zielinski and Pryor 2020). In addition, another important premise
of Complex Dynamic Systems Theory (CDST) is assuming variability as part of
the system’s changes and as a potential force towards learning (Lowie and
Verspoor 2019; Van Geert and Van Dijk 2002). As Larsen-Freeman (2020: 295)
argued, variability should not be set aside in language teaching and learning
theories; instead, it should be considered an “indispensable source of information”,
since it may lead to new learning processes.
2 Background literature
2.1 L2 oral intelligibility and comprehensibility:
Contingency in the migration processes
Brazil, especially the south of the country, has received a great number of refugees
who faced natural disasters and war incidents in their homelands. According to the
United Nations High Commissioner for Refugees (UNHCR) 2019 report, 79.5 million
people were forcibly displaced from their countries, seeking international protection.
Brazil ranks fifth in the world in the number of asylum-seekers received, providing
around 260 thousand people with temporary or long-term asylum (UNHCR 2020).
In this scenario, an unprecedented migration process from Haiti took place
after the 2010 earthquake that devastated the country. As a result, learning Brazilian
Portuguese became the most urgent need for those migrants. According to
Cadely (2012 apud Silva 2015), Haitians speak Haitian Creole and about 10% of
the population speaks French (those who were able to receive formal instruction).
Also, some speak and understand a little Spanish (due to geographical influence,
since the country is surrounded by Spanish-speaking countries). As fascinating
as this language mixture may sound (especially when assuming a complex
dynamic system perspective), this new context of teaching brought up a lot of
doubts. However, not many studies have conducted a deeper investigation into
the oral production and comprehension aspects in these communities.
In the last five years, some studies have been trying to focus on the difficul-
ties in oral production and comprehension which emerge from the linguistic
contact between Haitians and Brazilians (Albuquerque and Alves 2017, 2020;
Machry Da Silva 2017; Silva 2015) and comprehension strategies which may
help them to sound more intelligible. In line with these works, the present
study aims to fill in an L2 diversity gap (since most studies concern English as
an L2 and few studies focus on contributions from other languages), as well as
provide future input for further studies on the interaction between Haitians and
Brazilians.
The data discussed in this chapter may help pronunciation studies to fill in
at least three important gaps. The first one is connected to the fact that most
works focus on developmental data of English as an L2, i.e., on the oral produc-
tion of non-native speakers of English, whether more basic or advanced learn-
ers, in perception or intelligibility and comprehensibility studies whose judges
are usually native listeners. By analyzing data from Brazilian Portuguese, we
may broaden the scope of more linguistic and sociolinguistic findings to lesser-
researched languages.
The second gap is related to the common assumption that learners can usu-
ally achieve higher levels of proficiency in an L2 if they are in an immersion
context, i.e., developing the language by living in a country where this lan-
guage is used as a native one. However, it is important to situate the Haitian
learners in the Brazilian context. According to Norton (2013), when learning a
language, it is important that learners have access to both symbolic and material
resources, the former connected to cultural aspects and the latter to material
benefits that may emerge from learning the language, such as getting a job, good
housing, etc. Just living in the country, unfortunately, does not guarantee a
complete immersion in the language or having access to these symbolic and
material resources. A report by the UNHCR (2020) indicates that even after living
for more than one year in Brazil, a great number of Haitians do not feel
fully immersed in the country, nor are they able to achieve a solid basic level in Brazilian
Portuguese (Albuquerque 2019).
Last, but not least, the third gap, and maybe the key one in our investiga-
tion, is connected to the lack of studies that focus on Haitian learners’ personal
trajectories. By taking into account individual development over time, we also
assume that language, as well as its associated constructs such as intelligibility and
comprehensibility, are complex, dynamic systems. Therefore, by adopting a
CDST approach, one can propose major implications to both language develop-
ment and L2 pronunciation teaching.
this mistake may not be directly connected only to speech production issues,
but with greater cognitive functions related to memory and orthographic proc-
essing, for example. Munro and Derwing (2020) acknowledge this argument
and raise questions about memory load, which Kang, Thomson, and
Moran (2018) had previously referred to, explaining that a transcription task
could increase working memory load, with a potential impact on the
results. To reflect on the role of transcription in intelligibility studies,
and following Albuquerque (2019), Alves, Albuquerque, and Bondaruk (2021)
and De Weers (2020), we propose that the construct could be operationalized
in a way that promotes a more active reply from participants, by allowing
them to recover either fine detail or more general information (i.e., individual
sounds, group of sounds, semantic content) through an oral repetition task.
This method of data collection was employed in the present study.
Notwithstanding the gaps outlined above (which are intrinsic to any observed
phenomenon), there is a growing discussion concerning both intelligibility
and comprehensibility as dynamic processes. Although the
term ‘dynamic’ did not appear in Derwing and Munro’s first works, in many of their
investigations (and related studies) that followed the seminal contribution of
1995, the authors shed light on the dynamic aspects of oral interaction, e.g., on
listeners’ variability in judgements and its relation to both intelligibility and
comprehensibility ratings; the dependence of speakers’ intelligibility on their life
trajectories and on the multiple variables that have influenced their language development;
and the interdependence of listener and speaker in an interaction. This motivates
us to pursue a dynamic account of intelligibility and comprehensibility.
using a timescale of 2–5 minute intervals (in the 2019 study) and of 2–3 minute
intervals using 100-millimeter scales, obtaining seven ratings per interlocutor in a
17-minute task interaction (in the 2020 study). The general results of the 2019
study pointed to a great deal of individual variability and to the fact that clips
that received lower grades would frequently receive lower global ratings. In turn,
Nagle et al. (2021) not only showed a U-shaped function of comprehensibility rat-
ings throughout time (i.e., beginning with high levels and also finishing with high
ones), but also a ‘pairability’ among interlocutors’ ratings, in the sense that their
evaluation became quite similar over time.
The findings reported by Nagle, Trofimovich, and Bergeron (2019) and
Nagle et al. (2021) pave a more organic path towards comprehensibility. How-
ever, some questions concerning how individual variability plays a role still re-
main, since the overall results of these studies focus more on revealing group
tendencies and group alignment.
In addition, another study focusing on a dynamic view of comprehensibility
was presented in Zielinski and Pryor (2020), which is an exploratory investigation
of everyday English use in individual trajectories over time. The study was con-
ducted in a 10-month timescale with 14 L2 English learners (8 beginners and 6
intermediate), who were interviewed four times during this period. Besides taking
into account the comprehensibility of beginners, who are not commonly investi-
gated, the study highlights learners’ non-linear trajectories of English use, by
showcasing the importance of individual variability. Our study is aligned with this
investigation since we understand that in order to analyze ‘change’, whether in L2
comprehensibility or intelligibility, individual variability must be accounted for.
These issues considered, the current study sees both intelligibility and compre-
hensibility as imbricated in a comprehension gradient, in which there are stages of
more macro or micro tuning of different subsystems’ (e.g. phonic, lexical, syntac-
tic, semantic) association and a constant cognitive accommodation process (Alves,
Albuquerque, and Bondaruk 2021). It is important to state that this recognition or
tuning process does not follow a linear order. We stress the need for insights in the
process of intelligibility and comprehensibility development and how it changes
over time for the listener-speaker binomial relationship.
4 Methods
4.1 Participants
[Table 1: participant profiles for the speakers (S1, S2, S3) and the listeners (LA, LB, LC), including age; the cell values were not recoverable from the source.]
For both speakers and listeners, the study used a time window of 6 months,
with data collected every 15 days, which resulted in a total of 12 data points in
time. The data collection points for listeners and speakers can be seen in Figures 1
and 2 and were based on Yu and Lowie’s (2020) layout.
As for recording and sentence editing, all speakers’ data were segmented
in Praat, version 6.0.53 (Boersma and Weenink 2019). Moreover, the audio files
were edited in Audacity, version 2.3.2 (2019) and normalized to an intensity of −5 dB.
Speakers would receive weekly general oral and written linguistic feedback
from the first author, so that they could keep training their Portuguese.
https://www.ufrgs.br/acervocelpebras/acervo/
Data points 1–12: 02/11/2018, 16/11/2018, 30/11/2018, 14/12/2018, 04/01/2019, 18/01/2019, 01/02/2019, 15/02/2019, 01/03/2019, 15/03/2019, 29/03/2019, 12/04/2019.
Figure 1: Speakers’ recording dates (dd/mm/yyyy).
Data points 1–12: 05/11/2018, 19/11/2018, 03/12/2018, 17/12/2018, 07/01/2019, 21/01/2019, 04/02/2019, 18/02/2019, 04/03/2019, 18/03/2019, 01/04/2019, 15/04/2019.
Figure 2: Listeners’ receiving dates (dd/mm/yyyy).
As for the listeners’ group, on the first day of the data collection, they
received an email with all the necessary guidelines to perform the audio
evaluation. Each week, they would receive the data pack and download it to
their personal computer. Overall, listeners had the task to evaluate all the
audios, save them in a .zip file and send them back to the researcher. All data
were obtained on the AEPI2 app (Bondaruk, Albuquerque, and Alves 2018).
The intelligibility data were coded from the listener’s oral productions based on
what they were able to repeat or explain from the Haitian speakers’ produc-
tions. As this was an oral production task, listeners were told they could orally
reproduce a wide range of nuances: from some sounds, full words or the whole
sentence. For scoring, the researcher took into account content
words only, i.e., if listeners were not able to retrieve the articles or prepositions in a
sentence such as “In Brazil it is too hot”, this would not be considered a mistake,
since these are function words and do not carry the main meaning of the sentence.
Each content word was scored as one point, and the total number of correct
words was converted into a percentage. In addition, the comprehensibility
data were coded using the raw Likert scale scores for each data point.
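The scoring rule just described can be illustrated with a minimal Python sketch. It is not the authors’ coding procedure, which was carried out by the researcher; the function names and the small English function-word list are hypothetical and serve only to show how a percentage of retrieved content words might be computed.

```python
import re

# Hypothetical, deliberately small function-word list (illustration only).
FUNCTION_WORDS = {"in", "it", "is", "the", "a", "an", "of", "to", "on", "at"}

def content_words(sentence):
    """Lower-case the sentence, keep alphabetic tokens, drop function words."""
    tokens = re.findall(r"[^\W\d_]+", sentence.lower())
    return [t for t in tokens if t not in FUNCTION_WORDS]

def intelligibility_score(target, repetition):
    """Percentage of the target's content words found in the listener's repetition."""
    target_words = content_words(target)
    if not target_words:
        return 0.0
    repeated = set(content_words(repetition))
    hits = sum(1 for word in target_words if word in repeated)
    return 100.0 * hits / len(target_words)

# Missing function words ("in", "it", "is") do not lower the score.
full = intelligibility_score("In Brazil it is too hot", "Brazil too hot")
# Missing the content word "too" does.
partial = intelligibility_score("In Brazil it is too hot", "in Brazil is hot")
```

Under these assumptions the first repetition receives the full score, while the second retrieves two of the three content words.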
The data were analyzed using moving min and max graphs and Monte
Carlo simulations, according to the methodology proposed by Verspoor, De Bot,
and Lowie (2011). In the moving min-max graphs, the data are depicted together with their
moving minima and maxima, and variability patterns
may be seen through different bandwidths. This way, we had access to the
binomial’s developmental trajectory and potential changes in both intelligibil-
ity and comprehensibility over time. The moving average of both intelligibility
and comprehensibility performances and the moving minima and maxima of
the two constructs were extracted by a predetermined moving window of 2 posi-
tions (as the total data point is composed of 12 points in time). Each point in
time presented a set of 14 sentences that were extracted from the conversations
with the Haitian speakers (being four sentences for Speaker 1, five sentences for
Speaker 2 and five sentences for Speaker 3). Monte Carlo simulations were run
to explore possible unexpected changes in the binomial developmental trajec-
tory. The simulations were calculated through resampling the original data and
reshuffling them 5000 times (Van Geert, Steenbeek, and Kunnen 2012).
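As a rough illustration of these two techniques, the Python sketch below computes moving minima and maxima over a window of 2 positions and runs a simple reshuffling test with 5000 iterations. The test criterion used here (the largest rise between two consecutive data points) and all function names are assumptions made for the example; the chapter does not specify which criterion was tested, and the score series is invented.

```python
import random

def moving_min_max(values, window=2):
    """Return (minima, maxima) over overlapping windows of the given size."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    windows = [values[i:i + window] for i in range(len(values) - window + 1)]
    return [min(w) for w in windows], [max(w) for w in windows]

def largest_rise(values):
    """Assumed test criterion: biggest increase between consecutive points."""
    return max(b - a for a, b in zip(values, values[1:]))

def monte_carlo_p(values, statistic=largest_rise, n_iter=5000, seed=0):
    """Share of reshuffled series whose statistic is at least the observed one."""
    rng = random.Random(seed)       # fixed seed so the sketch is reproducible
    observed = statistic(values)
    shuffled = list(values)
    count = 0
    for _ in range(n_iter):
        rng.shuffle(shuffled)       # reshuffle the original data points
        if statistic(shuffled) >= observed:
            count += 1
    return count / n_iter

# Invented intelligibility percentages for 12 data points (not study data).
scores = [40, 35, 45, 30, 38, 36, 33, 36, 56, 70, 90, 95]
minima, maxima = moving_min_max(scores, window=2)
p_value = monte_carlo_p(scores)
```

With 12 data points and a window of 2, each of the two moving series has 11 values. A small share of reshuffled series matching or exceeding the observed criterion would suggest that the observed jump is unlikely to arise from random ordering alone.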
5 Results
Albuquerque (2019) pointed out that through a product, inferential analysis of
the whole group, it could be generally observed that intelligibility decreased
from data point 1 (first data collection point) to 12 (last data collection point). Al-
though obtaining a descending curve seemed to be counterintuitive, the individ-
ual speaker-listener binomial analyses pointed out that none of the binomial
relationships between speakers and listeners presented such a high decrease
movement at data point 12. In contrast, non-linear developmental trajectories
among the binomial relationships were observed as the main occurrence. More-
over, the descriptive differences among the binomials led to individual differen-
ces that seemed to depend on the speaker-listener relationship, i.e., it could not
be stated that a speaker is intelligible by him/herself or that the listener is able to
linearly understand random speakers throughout time. Not only intelligibility,
but also comprehensibility seem to vary among the binomial relationships, and
they may be connected to both speakers’ improvement in lexical complexity and
The uneven number of words produced by the participants is connected to the participants’
proficiency level, i.e., Speaker 1 was the least proficient among all participants and was not
able to produce full/complete sentences in the first data points. Therefore, it was decided to
maintain the speaker’s productions in the sample as she was the only more basic participant
in the study, and because a CDST approach reinforces the need for natural data.
pronunciation and listeners’ ability to accommodate new data and nuances from
the speakers’ productions.
[Figure panels: Speaker 1, Speaker 2 and Speaker 3. Each panel plots intelligibility (0–100%) across data points 1–12 for Listeners A, B and C and their average.]
(30%). Also, the speakers seemed to present different learning stages. Standing
by the premise that none of the stages are linear, Speaker 1 seemed to present
three stages, one that went from data point 1 to 8 and another from data point 8
to 10, and a smaller one from 10 to 12. Data point 8 could be taken as a develop-
mental “jump” (in which intelligibility was at 36% for most listeners) and it
reached 56% for Listener A and 80% for Listener C, in data point 9, and 95% for
Listener B, in data point 10. However, Speakers 2 and 3 displayed more diffuse
developmental stages across the different listeners, especially
considering some peaks (e.g. between data points 3–5) and valleys (e.g. for
Speaker 3 at data point 7 and for Speaker 2 at data point 8).
Figures 6–8 present the intelligibility graphs in which we focus on the lis-
teners’ rating patterns.
[Figures 6–8: panels for Listener A, Listener B and Listener C. Each panel plots intelligibility (0–100%) across data points 1–12 for Speakers 1, 2 and 3 and their average.]
Again, when focusing on the listeners and their rating trajectories, a great
deal of fluctuation can be observed. A converging point among the graphs seems
to rely on how different listeners rate speakers who present a basic knowledge of
Portuguese, since Speaker 1 received the lowest intelligibility ratings, being the
lowest one among all the speakers and for all listeners at data point 8, varying
from 33% to 38%. Nevertheless, Speaker 1 is also the one who received the high-
est intelligibility ratings from all listeners towards the last data point. When ac-
counting for developmental stages, one cannot either clearly draw this scenario
or point out that specific listeners are more dynamic raters than others. Nevertheless,
it may be observed that up to the third or fourth data point, listeners
did not seem to vary much in their ratings, whereas a more dynamic perception
started to take place from data point 7/8 onwards.
Interestingly, Figures 6–8 portray a very diverse scenario, in which it is diffi-
cult to point out specific developmental stages for all listeners and all speakers,
since results portray a potential influence of the speaker-listener relationship
over intelligibility ratings. As visual inspection may work as a resourceful tool to
analyze variability in longitudinal studies (Van Dijk, Verspoor, and Lowie 2011),
we present the moving min-max graphs for all listeners and speakers in their bi-
nomial settings.
We present below the results for the min-max graphs for the intelligibility
construct, in the selected binomial relationships: (i) S1-LA; S1-LB; S1-LC; (ii) S2-LA;
S2-LB; S2-LC; (iii) S3-LA; S3-LB; S3-LC. Overall, it can be observed that results
reached ceiling effects, which will be discussed at the end of this
124 Jeniffer Imaregna Alcantara de Albuquerque, Ubiratã Kickhöfel Alves
chapter, concerning the methodological issues of the study. Thus, all min-
max graph analyses will take into account the min results.
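As an illustration only, the moving min-max technique can be sketched in Python as follows. The function name, the example scores and the window size are assumptions made for this sketch; the window actually used in the study is not stated in this section.

```python
from typing import List, Tuple


def moving_min_max(scores: List[float], window: int = 2) -> List[Tuple[float, float]]:
    """Return the (min, max) pair of each sliding window over a score series.

    The default window of 2 is an assumption for illustration.
    """
    if window < 1 or window > len(scores):
        raise ValueError("window must be between 1 and len(scores)")
    return [
        (min(scores[i:i + window]), max(scores[i:i + window]))
        for i in range(len(scores) - window + 1)
    ]


# Hypothetical intelligibility percentages for one speaker-listener pair
# across 12 data points (not the study's data).
example_scores = [33, 75, 60, 50, 66, 33, 50, 40, 100, 100, 80, 100]
bands = moving_min_max(example_scores, window=2)

# Each tuple is one point of the min line and the max line of the graph.
for position, (low, high) in enumerate(bands, start=1):
    print(position, low, high)
```

When the max line sits at 100% for most windows (a ceiling effect), the min line is the more informative of the two, which is why the analyses above focus on the min results.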
In general, in Figure 9 some fluctuation for the binomial settings S1-LA and
S1-LB can be observed towards the first half of the data points, from data point 1 to
5, and for S1-LC, from data point 1 to 3. Yet, all binomial settings seem to present a
rather stable development in the mid part of the data points and an increase of
the min values from data points 9 to 10 (S1-LA, from 0% to 33% and S1-LB, from
0% to 66%) and from data points 8 to 9 (S1-LC, from 20% to 50%), which may
indicate a developmental change for some of the binomial relationships.
In the min-max graphs of S2-LA; S2-LB; S2-LC shown in Figure 10, one can
observe some growing fluctuations in the intelligibility scores in two moments
of the binomial setting S2-LA (from data points 1 to 2, from 33% to 75%, and
from data points 6 to 7, from 33% to 50%). Moreover, in the graphs of S2-LB, we
can also find moments of fluctuations which may indicate that intelligibility in-
creases (from data points 5 to 6, from 0% to 66%, and from data points 8 to 9,
from 0% to 25%) and a great descending moment (from data points 1 to 4, from
60% to 0%). Also, for S2-LC a possible, but minor developmental peak may be
observed in one moment (from data points 4 to 5, from 50% to 60%), as well as
a great descending moment (from data points 9 to 10, from 50% to 0%).
As for the binomial relationships S3-LA, S3-LB and S3-LC, although ceiling effects can also be observed, different valley and peak patterns can be seen. A rather wider bandwidth (the distance between the lowest and highest values in moments of fluctuation) can be observed in the graphs in Figure 11, which can be connected to how listeners were accommodating the speaker’s productions. One can notice a major growing fluctuation for S3-LA in one moment (from data points 7 to 8, from 0% to 75%). In addition, S3-LB presented two somewhat similar moments indicating an increase in intelligibility scores (from data points 7 to 8, from 0% to 71%, and from data points 10 to 11, from 0% to 37%). Both binomial relationships presented a min line showing a larger increase from data points 7 to 8. Finally, S3-LC presented two major growing moments in its intelligibility scores (from data points 3 to 4, from 25% to 62%, and from data points 8 to 10, from 62% to 100%).
The variability ranges may point to different accommodation processes within each listener-speaker relationship. They might be the result of coincidental fluctuations or of significant “tuning” moments, which may be evoked by an “oh, I got it” assumption, in which speakers decide to use new and risky forms and listeners try to accommodate this content.
Figure 9: Moving min-max intelligibility binomials for Speaker 1 (S1-LA, S1-LB, S1-LC). [Three panels: Speaker 1 - Listener A, Speaker 1 - Listener B, Speaker 1 - Listener C. X-axis ticks: 1–11; y-axis: 0% to 100%.]
Figure 10: Moving min-max intelligibility binomials for Speaker 2 (S2-LA, S2-LB, S2-LC). [Three panels: Speaker 2 - Listener A, Speaker 2 - Listener B, Speaker 2 - Listener C. X-axis ticks: 1–11; y-axis: 0% to 100%.]
Figure 11: Moving min-max intelligibility binomials for Speaker 3 (S3-LA, S3-LB, S3-LC). [Three panels: Speaker 3 - Listener A, Speaker 3 - Listener B, Speaker 3 - Listener C. X-axis ticks: 1–11; y-axis: 0% to 100%.]
Taking into account the previously mentioned binomial relationships, we can observe that variability is present in all of them and that all of them, at different data points, seem to present developmental jumps. In order to take a closer look at the significant peaks in the binomial pairs, we ran Monte Carlo simulations (Verspoor, De Bot, and Lowie 2011). These simulations (5000 iterations) revealed that, among the nine binomials, significant intelligibility peaks (p ≤ 0.05) were found for Speaker 1-Listener A (p = 0.0248), Speaker 3-Listener A (p = 0.000) and Speaker 2-Listener B (p = 0.0438). For the other binomial relationships, in contrast, the peaks were likely the result of coincidental fluctuations.
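To make the logic of such a simulation concrete, a minimal sketch of a Monte Carlo (resampling) test for a developmental peak is given below. It is not the procedure used in the study: the peak criterion (the largest increase between two consecutive data points), the shuffling scheme, the function names and the example series are all assumptions made for illustration.

```python
import random
from typing import List


def largest_jump(series: List[float]) -> float:
    """Peak criterion assumed here: the largest rise between consecutive points."""
    if len(series) < 2:
        raise ValueError("series needs at least two data points")
    return max(later - earlier for earlier, later in zip(series, series[1:]))


def monte_carlo_peak_p(series: List[float], iterations: int = 5000, seed: int = 0) -> float:
    """Estimate how often a randomly reordered series shows a jump at least
    as large as the observed one (the simulated p-value)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = largest_jump(series)
    shuffled = list(series)
    hits = 0
    for _ in range(iterations):
        rng.shuffle(shuffled)  # destroys the temporal order, keeps the values
        if largest_jump(shuffled) >= observed:
            hits += 1
    return hits / iterations


# Hypothetical min-line values for one binomial (not the study's data).
example_min_line = [0, 0, 33, 0, 0, 0, 0, 0, 0, 33, 66, 100]
p_value = monte_carlo_peak_p(example_min_line, iterations=5000, seed=1)
print(p_value)
```

A small simulated p-value would suggest that a jump of the observed size is unlikely to arise from the same values in random order, whereas a large one is compatible with coincidental fluctuation.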
When tracing back characteristics from the speakers’ and listeners’ profiles, we may observe that Listener A’s background as a more experienced language teacher may have helped this listener tune in to the productions of both Speakers 1 and 3, i.e., the participant may be well adapted to “class-like” speech production. Speakers 1 and 3, in turn, could be placed at the two extremes of a proficiency scale (in a more traditional classification), i.e., during the data collection, Speaker 1 was starting formal classes of Portuguese and had been living in Brazil for a short time, whereas Speaker 3 had been taking classes for over a year and had been living in Brazil for longer than six months. More importantly, both of them received formal training in a “class-like” scenario. Listener B, in contrast, had some teaching experience but closer contact with foreigners speaking Portuguese as an L2, which might have helped this listener accommodate the more informal speech of Speaker 2, who had less formal training in Portuguese (and did not engage as much in formal lessons as Speaker 3, for example).
Graphs in Figures 12–17 explore not only the listeners’ relationships toward dif-
ferent speakers, but also how speakers were rated by distinct listeners concern-
ing the comprehensibility dimension.
The graphs present the 12-point longitudinal data collection on the X-axis and the degree of difficulty felt by the listeners when rating speakers’ productions on the Y-axis, in which 1 stands for “very difficult to understand” and 9 for “very easy to understand”.
Similarly to the intelligibility analysis, one can also observe that listeners and
speakers’ trajectories are non-linear and thus show moments of progression and
regression in comprehensibility development. In addition, in tune with the intelli-
gibility results, it can be observed that Speaker 1 also presented the lowest com-
prehensibility scores in many data points and to different listeners (2.2 in data
point 6 for Listener B and 4 in data points 7 and 8 for Listeners A and C, respec-
tively), i.e., her oral productions were generally considered more difficult to under-
stand. Speakers 2 and 3’s graphs, on the other hand, display a different scenario
Figure 12: Comprehensibility binomials for Speaker 1 (S1-LA, S1-LB and S1-LC). [Line graph. X-axis: data points 1–12; y-axis: comprehensibility ratings 1–9; series: Listeners A, B, C and Average.]
Figure 13: Comprehensibility binomials for Speaker 2 (S2-LA, S2-LB and S2-LC). [Line graph. X-axis: data points 1–12; y-axis: comprehensibility ratings 1–9; series: Listeners A, B, C and Average.]
Figure 14: Comprehensibility binomials for Speaker 3 (S3-LA, S3-LB and S3-LC). [Line graph. X-axis: data points 1–12; y-axis: comprehensibility ratings 1–9; series: Listeners A, B, C and Average.]
data point 1 and goes until data point 5, and another one from data point 8 until
the last data point.
Figures 15–17 present the comprehensibility graphs in which we focus on
the listeners’ possible rating patterns.
Figure 15: Comprehensibility binomials for Listener A (S1-LA, S2-LA and S3-LA). [Line graph. X-axis: data points 1–12; y-axis: comprehensibility ratings 1–9; series: Speakers 1, 2, 3 and Average.]
Figure 16: Comprehensibility binomials for Listener B (S1-LB, S2-LB and S3-LB). [Line graph. X-axis: data points 1–12; y-axis: comprehensibility ratings 1–9; series: Speakers 1, 2, 3 and Average.]
Figure 17: Comprehensibility binomials for Listener C (S1-LC, S2-LC and S3-LC). [Line graph. X-axis: data points 1–12; y-axis: comprehensibility ratings 1–9; series: Speakers 1, 2, 3 and Average.]
Once again, when focusing on the listeners and their rating trajectories, a great deal of fluctuation can be observed. Notwithstanding, when taking a closer look at each listener, it is clear that Listener A started by rating the three different speakers similarly. Listener A displayed an initial pattern which groups the distinct speakers in such a way that ratings from data points 1 to 3 range from 6 to 7 on the Likert scale, meaning that all speakers seemed to be relatively easy to understand in the first data collection points. Yet, some event seemed to cause major changes from data point 4 onwards. A similar movement may be observed for Listener C. In contrast, Listener B exhibited a different movement over time, since initial ratings (in data points 1 and 2) converged to a similar rating range in the last point (with Likert scores varying from 4 to 6). In addition, Listeners A and C generally displayed lower ratings for Speaker 1, i.e., they identified Speaker 1’s productions as being mostly more difficult to understand than those of Speakers 2 and 3.
As with Figures 3–8, it is challenging to indicate specific developmental stages for all listeners and all speakers in our discussion of Figures 12–17, since once again results portray a potential influence of the speaker-listener relationship on comprehensibility ratings. As visual inspection may work as a resourceful tool to analyze variability in longitudinal studies (Van Dijk, Verspoor, and Lowie 2011), we present the moving min-max graphs for all listeners and speakers in a binomial setting.
Overall, in Figures 18–20, it can be observed that not all results reach ceiling effects as in the intelligibility results, perhaps due to the nature of the construct, which is a more subjective measure of perceived comprehension difficulty. A visual inspection may suggest that the S1-LB binomial relationship presents more variability than S1-LA, for example. In the min-max graphs of the S1-LA binomial, we can observe some fluctuations in many moments, but variability may be taking place in data points 3 to 5, since sentences are now scored as more difficult to understand (scores go from 5 points to 1 point on the Likert scale). Moreover, in the S1-LB graph, we can observe three main developmental stages, which can be analyzed through the bandwidth (the variation between min and max results over time). The first one is a rather narrow bandwidth, which may indicate less variability, around data points 2 and 3, oscillating between 1 (the min) and 5 (the max). This is followed by two slightly wider bandwidths, which may indicate more variability: one around data point 4, in which the scores move from a min value of 2 to a max value of 7, and the other from data point 6 onwards. In the last of this series of pairs, S1-LC, one can visualize two to three stages in which variability changes, starting with a fairly narrow bandwidth (from data points 1 to 4), followed by a wide bandwidth (from data points 5 to 7) and, finally, reaching a rather wide bandwidth (from data points 8 to 9, with a min value of 1 and a max value of 9).
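The notion of bandwidth used above (the distance between the min and max lines at each point of a moving min-max graph) can be illustrated with a short Python sketch. The function name, the window size of two and the ratings are hypothetical and serve only to show the calculation.

```python
from typing import List


def moving_bandwidth(ratings: List[float], window: int = 2) -> List[float]:
    """Width (max minus min) of each sliding window over a rating series.

    The window size is an assumption; it is not stated in this section.
    """
    if window < 1 or window > len(ratings):
        raise ValueError("window must be between 1 and len(ratings)")
    widths = []
    for start in range(len(ratings) - window + 1):
        chunk = ratings[start:start + window]
        widths.append(max(chunk) - min(chunk))
    return widths


# Hypothetical comprehensibility ratings on the 1-9 scale (not the study's data).
example_ratings = [5, 4, 5, 1, 3, 7, 6, 1, 9, 8, 7, 9]
widths = moving_bandwidth(example_ratings)

# Index of the window with the widest band, i.e. the most variable moment.
widest = max(range(len(widths)), key=widths.__getitem__)
print(widths, widest)
```

Narrow bands correspond to the more stable stretches described in the text, and wide bands to moments of increased variability.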
In the min-max graphs of the S2-LA binomial, as ceiling effects were observed, the analyses were based on the min results. One can notice two main stages where we believe a significant change would occur: from data points 1 to 2 (where scores on the Likert scale go from 1 to 5 points, indicating that the data are easier to understand) and a quite abrupt change in the comprehensibility scores from data points 8 to 10. In addition, in the S2-LB graph, fewer ceiling effects as well as more stable scores are observed. In the least stable moment, where increased variability may be observed, there is a rather narrow descending moment (from data points 8 to 10, in which Likert scale scores varied from a max of 7 or 8 to a min of 1). In addition, for S2-LC, a great descent from data points 8 to 11 can be found (in which Likert scale scores varied from a max of 7 to a min of 1, indicating that productions were perceived as difficult to understand).
Figure 18: Moving min-max comprehensibility binomials for Speaker 1 (S1-LA, S1-LB, S1-LC). [Three panels: Listener A - Speaker 1, Listener B - Speaker 1, Listener C - Speaker 1. X-axis ticks: 1–11; y-axis: ratings 1–9.]
Figure 19: Moving min-max comprehensibility binomials for Speaker 2 (S2-LA, S2-LB, S2-LC). [Three panels: Listener A - Speaker 2, Listener B - Speaker 2, Listener C - Speaker 2. X-axis ticks: 1–11; y-axis: ratings 1–9; series: comprehensibility performance, min, max.]
Figure 20: Moving min-max comprehensibility binomials for Speaker 3 (S3-LA, S3-LB, S3-LC). [Three panels: Listener A - Speaker 3, Listener B - Speaker 3, Listener C - Speaker 3. X-axis ticks: 1–11; y-axis: ratings 1–9.]
In the min-max graphs of the S3-LA binomial, we can observe a major descending moment (between data points 2 and 3, in which Likert scale scores moved from 5 to 2), a more stable comprehensibility rating process in data points 3 to 7, followed by a major increase in comprehension levels in data point 7 (in which Likert scale scores varied from 2 to 7, meaning the production was easier to understand). In addition, the S3-LB graph presents a rather narrow and somewhat stable development from data points 1 to 5, a subtle decrease in data point 5 (Likert scale judgements go from 2 to 8 points) and a rather wide increase in the Likert scale scores from data points 6 to 7 (from 2 to 9 points). Finally, S3-LC presents a bandwidth with possibly significant variability between data points 2 and 3, whose Likert scale points go from 3 to 9.
To this point, we have showcased the descriptive analyses based on moving min-max graphs from the comprehensibility results. When compared to the intelligibility rates, we see that if participants have to deal with a more subjective dimension of how difficult or easy it is to understand someone, more fluctuations and variations between rather narrow and wide bandwidths may be found in the same pairings. Taking into account the binomial relationships previously mentioned, it can be observed that variability is present in all of them and that all of them, at different data points, seem to present developmental jumps. In order to take a closer look at the significant peaks in the binomial pairs, we ran Monte Carlo simulations (Verspoor, De Bot, and Lowie 2011). These simulations (5000 iterations) revealed that, among the nine binomials, a significant comprehensibility peak (p ≤ 0.05) was found only for Speaker 3-Listener A (p = 0.0156), and marginal significance was found for Speaker 3-Listener B (p = 0.0534). In the other binomial relationships, in contrast, the peaks were likely the result of coincidental fluctuations.
When aligning the Monte Carlo results with the speakers’ and listeners’ profiles, we may observe that Listeners A and B shared some similarities in their profiles which may come in handy when dealing with so much fluctuation, given that the former was a more experienced teacher and the latter had more experience with foreign speech. Speaker 3, as we have previously explored, had more formal experience with Portuguese (and may thus be portrayed as a more accurate learner) and had been living in Brazil for a longer period than Speaker 1, yet for less time than Speaker 2. A possible explanation may be provided as we consider complex accommodating and tuning-in processes. Speaker 2, for example, presented a very diverse context for Portuguese usage, since the participant spoke Portuguese not only at work, but also freely in general conversation. The amount and diversity of contact with Portuguese had an impact on this learner’s production and on the reception of their speech.
6 Discussion
In this chapter, we aimed to display some important features and nuances of
variability and potential individual development concerning intelligibility and
comprehensibility constructs by discussing binomial listener-speaker relation-
ships. The pairs of participants who took part in the analysis consisted of three
Haitian learners (referred to as the ‘speakers’), all of them showing both different
lengths of residence in Brazil and proficiency levels in Brazilian Portuguese,
and three Brazilians (referred to as the ‘listeners’), who had distinct experiences
with other L2s (like English or German) and exhibited different degrees of con-
tact with foreigners. The findings provided interesting results on the impor-
tance of longitudinal studies in exploring how speaker and listener may work
as a binomial when one assumes ‘understanding’ as a non-linear process over
time. Also, the study raised necessary theoretical-methodological issues.
This study took into consideration some gaps concerning the intelligibil-
ity construct raised by some authors throughout the years: criticism of the
use of transcription in intelligibility tasks (Alves, Albuquerque, and Bon-
daruk 2021; Munro and Derwing 2020; Zielinski 2006) and concerns related
to working memory load (Kang, Thomson, and Moran 2018). Taking this criticism into account, in this study we chose to adopt an oral repetition task. The task was a holistic attempt to help listeners recover oral information more freely, i.e., participants could retrieve either small pieces of information (sounds, syllables, isolated words) or bigger blocks of information (parts of the sentence or its general idea). One of the major contributions of this study lies in not requiring listeners to recover words orthographically, but rather as idea chunks or as the whole idea, as it was semantically conveyed. An example can be seen in a sentence produced by Speaker 2.
Table 3 presents Listeners A and C’s comments on what they understood of
Speaker 2’s productions. It is important to state that these comments were all
collected in the AEPI app in a written form. The researchers asked all listeners
to provide all sorts of impressions on the productions: from more detailed notes
(related to sound comprehension) to more general ideas (semantic content). It
can be observed that the listener who had some previous teaching experience
Speaker:
Aimed production by the speaker: “Curitiba is too hot” (in BP “Curitiba é muito calor”).
Actual produced sentence by the speaker: “Curitiba is too ‘hor’” (in BP “Curitiba é muito caror”).
Listener:
Listener A’s comprehension: “I think he said ‘Curitiba is too warm’, but I am not sure because there was a problem with a sound”.
Listener C’s comprehension: “I understood he said ‘Curitiba is very expensive’, but the pronunciation of the final sound caused me problems, it may be another word”.
and Bergeron (2019) and Nagle et al. (2021) in pointing out the role of variability
in the comprehensibility dimension.
Generally, both intelligibility and comprehensibility constructs have shown
a non-linear behavior over time, with each binomial relationship presenting dif-
ferent patterns and variability contours. Also, we side with Ranta and Meckel-
borg (2013) and Zielinski and Pryor (2020) by observing that learning a language
in an immersion environment does not guarantee a constant increase of profi-
ciency over time, as made clear in our results.
The study faced some issues concerning the lengths of the sentences. Since speakers had different proficiency scores, they initially differed in their access to a wide lexicon, which, in turn, led to shorter sentences being produced by Speaker 1, for example, when compared to Speakers 2 and 3. Therefore, sentences ranged from 3 to 8 words, with 5 or 6 words being the most frequent pattern. This may have influenced the ceiling effects of both intelligibility and comprehensibility results. Yet, intelligibility scores might have been more strongly influenced, since intelligibility is a more objective measure and the chances of getting all words either wrong or right are high in shorter sentences (which could explain the ceiling effects). This sort of issue has also occurred in other studies which work with more naturalistic data. In the same fashion, Munro and Derwing (2020) report that the listeners’ performance in the 1995 experiment was probably connected to sentence length (which varied from 7 to 13 words).
When taking comprehensibility into account, many scale lengths have been used by researchers, but there is no agreement on which would be the best fit (Munro 2018). Zielinski and Pryor (2020) argue that it is of major importance that scales provide raters with a comfortable range to evaluate comprehensibility. In their study, they used a 5-point scale, which may have influenced the way participants rated beginners and intermediate learners, since variability may differ depending on the perceived fluency. In our study, despite the room for identifying more subtle differences afforded by our 1–9 scale, a similar effect may have taken place, since we also had speakers of distinct proficiency levels and the listeners might have used different criteria to score their comprehension. Thus, Speaker 1, who was the least proficient one, may have reached lower and higher levels more frequently than Speakers 2 and 3. Zielinski and Pryor (2020) also point to an effect mentioned by Munro and Derwing (2015): unsupervised rating. According to the authors, participants may not be able to constantly control the underlying conditions for rating, and in a longitudinal study this effect can increase, since conditions have to be frequently revisited.
Tracey Derwing and Murray Munro have had a major impact on how both in-
telligibility and comprehensibility have been researched, and their findings
have served as a true pedagogical tool towards pronunciation teaching and
learning. This way, intelligibility and comprehensibility may be seen through a
lens on how learners’ “mistakes” can be overlooked by taking into account in-
dividual variability as a key element to analyze development.
Although not all studies attempt to conduct longitudinal investigations, this kind of research can reveal that development does not generally follow a flat or balanced line. Rather, it is by analyzing variability that potential learning processes may emerge. According to Van Dijk, Verspoor, and Lowie (2011), more traditional paradigms assume a contrast between competence and performance, in which the latter is usually connected to an intrinsic and intense variability process and the former is related to the stability of sounds and forms. Thus, according to more traditional paradigms, learners’ mistakes are connected to their system’s irregularities and should be discarded, diminished or eliminated. However, we understand that in order to learn, individuals have to make mistakes.
Instead of being something to be left out, as if it were “noise” in a balanced system, “variability is not something to be ignored, but rather offers an indispensable source of information” (Larsen-Freeman 2020: 295). This information has a huge impact on both the teaching and the learning of pronunciation, since language teachers will have to analyze their students’ development over time, and in such a way that individual variability is not set aside in favor of group tendencies.
As we reflect more specifically upon the constructs of intelligibility and
comprehensibility, we conclude that variability can be seen as an important
strategy in listener-speaker pairing in class, i.e., instead of pairing students fre-
quently in the same groups, learners’ production and comprehension strategies
will probably increase if they learn how to accommodate new details, e.g., dif-
ferences in vowels and consonants. Also, when varying what sort of content
and how learners have to retrieve it, teachers may be helping students to de-
velop fine-grained detail and more holistic recovery processes.
Last, but not least, we would like to raise awareness once again of the interdependence of speaker and listener in an oral communication moment, as a sort of comprehension dance. As it has long been known that individuals are not intelligible or comprehensible by themselves, but rather in a context- or even person-dependent way, it seems that viewing the constructs not only as partially interconnected (Derwing and Munro 2015) but also as tuned in a speaker-listener binomial relationship may have important implications for studies on language development and L2 pronunciation teaching. In this sense, it is important to mention the possibility that talker familiarity may be taking place longitudinally, as the listeners slowly learn how to deal with accented speech (Albuquerque and Alves 2017). We should highlight the importance of “learning to listen”, which is made possible as both speakers and listeners are exposed to variation in the language input, leading them to become familiar with different varieties of the language (Leung 2012, 2014), including L2-accented speech. These results, therefore, highlight the importance of teaching not only how to pronounce, but also how to listen. This latter type of learning is of paramount importance regardless of whether we are dealing with native or non-native speakers of a language.
7 Conclusion
This longitudinal study was set as an exploratory attempt to bring nonmain-
stream data, originated from Haitian learners of Brazilian Portuguese, to a well-
known field of research on intelligibility and comprehensibility. Although the
pairs of speakers and listeners selected for this study formed a small group to
be analyzed, the aim of the investigation was to highlight the binomial mem-
bers’ personal trajectories to observe individual differences over time instead of
regular group tendencies.
Intelligibility results have shown significant variability peaks in the binomial relationships for Speaker 1-Listener A, Speaker 3-Listener A and Speaker 2-Listener B. As for comprehensibility, a significant result was found for Speaker 3-Listener A and a marginally significant one for Speaker 3-Listener B. Overall, only one pair, Speaker 3-Listener A, displayed significant variability patterns in both the intelligibility and the comprehensibility tasks. Thus, we can observe, once again, how personal features such as having previous contact with foreign speech samples (for listeners) and receiving formal L2 tuition (for speakers) seem to have an effect on intelligibility and comprehensibility.
We hope this chapter has contributed to paving the way for future studies
that take the individuals and their interactions as the locus of analysis in in-
telligibility and comprehensibility studies. Therefore, we state the importance
of longitudinal studies on intelligibility and comprehensibility as well as taking
variability as an important cue for learning, in order to improve not only L2
teaching methods, but also learners’ strategies towards L2 development.
References
Abercrombie, David. 1949. Teaching pronunciation. English Language Teaching 3(5). 113–122.
Albuquerque, Jeniffer Imaregna Alcantara. 2019. Caminhos dinâmicos em Inteligibilidade e
Compreensibilidade de Línguas Adicionais: um estudo longitudinal com dados de fala de
haitianos aprendizes de Português Brasileiro [Dynamic paths in intelligibility and
comprehensibility in Additional Languages: a longitudinal study with data from Haitian
learners of Brazilian Portuguese]. Porto Alegre: Universidade Federal do Rio Grande do
Sul dissertation.
Albuquerque, Jeniffer Imaregna Alcantara & Ubiratã Kickhöfel Alves. 2017.
Compreensibilidade em L2: Uma discussão sobre o efeito da experiência do ouvinte e do
tipo de meio em excertos do Português Brasileiro produzidos por um Falante haitiano
[L2 Comprehensibility: a discussion on listener’s experience effects and type of medium
in the Brazilian Portuguese data produced by a Haitian speaker]. Revista X 12(2). 43–64.
Albuquerque, Jeniffer Imaregna Alcantara & Ubiratã Kickhöfel Alves. 2020. Os construtos de
‘inteligibilidade’ e ‘compreensibilidade’ em dados do Português Brasileiro como língua
adicional: um olhar via Sistemas Dinâmicos Complexos [The constructs of ‘intelligibility’
and ‘comprehensibility’ in Brazilian Portuguese data as an Additional Language through
the lens of Complex Dynamic Systems]. Signótica 32. Retrieved from:
https://www.revistas.ufg.br/sig/article/view/58214.
Alves, Ubiratã Kickhöfel, Jeniffer Imaregna Alcantara Albuquerque & Patrick D. Bondaruk.
2021. L2 intelligibility and comprehensibility: trying out new measurements with AEPI.
Anales de Lingüística 5. 21–39. Retrieved from:
https://revistas.uncu.edu.ar/ojs3/index.php/analeslinguistica/article/view/4587
Baba, Kyoko & Ryo Nitta. 2014. Phase transitions in development of writing fluency from a
complex dynamic systems perspective. Language Learning 64(1). 1–35.
Boersma, Paul & David Weenink. 2019. Praat: doing phonetics by computer [Computer
software]. Version 6.0.53. http://www.praat.org/.
Bondaruk, Patrick D., Jeniffer Imaregna Alcantara de Albuquerque & Ubiratã Kickhöfel Alves.
2018. AEPI – Aplicativo para Estudos de Percepção e Inteligibilidade [AEPI – An app for
perceptual and intelligibility studies]. Version 0.01. https://en:aepi.e-pi.co. (Accessed
21 February 2021).
Cadely, Jean-Robert. 2012. Haiti: The politics of language. Journal of Teaching and Education
1(3). 389–394.
De Bot, Kees. 2017. Complexity Theory and Dynamic Systems Theory: Same or different? In
Lourdes Ortega & ZhaoHong Han (eds.), Complexity Theory and Language Development:
In Celebration of Diane Larsen-Freeman, 51–58. Amsterdam: John Benjamins.
Derwing, Tracey & Murray Munro. 1997. Accent, comprehensibility and intelligibility: Evidence
from four L1s. Studies in Second Language Acquisition 19(1). 1–16.
https://doi.org/10.1017/S0272263197001010
Derwing, Tracey & Murray Munro. 2013. The development of L2 oral language skills in two L1
groups: A 7‐year study. Language Learning 63(2). 163–185.
Derwing, Tracey & Murray Munro. 2015. Pronunciation Fundamentals: Evidence-based
Perspectives for L2 Teaching and Research. Amsterdam: John Benjamins.
Derwing, Tracey, Murray Munro & Grace Wiebe. 1998. Evidence in favor of a broad framework
for pronunciation instruction. Language Learning 48(3). 393–410.
De Weers, Noortje. 2020. A critical (re)assessment of the effect of speaker ethnicity on speech
processing and evaluation. Burnaby: Simon Fraser University dissertation.
Isaacs, Talia & Pavel Trofimovich. 2012. Deconstructing comprehensibility: Identifying the
linguistic influences on listeners’ L2 comprehensibility ratings. Studies in Second
Language Acquisition 34(3). 475–505.
Kang, Okim, Ron I. Thomson & Meghan Moran. 2018. Empirical approaches to measuring the
intelligibility of different varieties of English in predicting listeners comprehension.
Language Learning 68(1). 115–146.
Larsen-Freeman, Diane. 2015. Ten “lessons” from complex dynamic systems theory: What is
on offer. In Zoltán Dörnyei, Peter D. MacIntyre & Alastair Henry (eds.), Motivational
Dynamics in Language Learning, 1–11. Bristol: Multilingual Matters.
Larsen-Freeman, Diane. 2020. Epilogue. In Wander Lowie, Marije Michel, Merel Keijzer &
Rasmus Steinkrauss (eds.), Usage-Based Dynamics in Second Language Development,
295–300. Bristol: Multilingual Matters.
Leung, Alex Ho-Cheong. 2012. Bad influence? – An investigation into the purported negative
influence of foreign domestic helpers on children’s second language English acquisition.
Journal of Multilingual and Multicultural Development 33(2). 133–148.
Leung, Alex Ho-Cheong. 2014. Input multiplicity and the robustness of phonological
categories in child L2 phonology acquisition. Concordia Working Papers in Applied
Linguistics 5. 401–415.
Levis, John. 2020. Revisiting the Intelligibility and Nativeness Principles. Journal of Second
Language Pronunciation 6(3). 310–328.
Lowie, Wander. 2017. Lost in state space? Methodological considerations in Complex Dynamic
Theory approaches to second language development research. In Lourdes Ortega &
ZhaoHong Han (eds.), Complexity Theory and Language Development: In Celebration of
Diane Larsen-Freeman, 123–141. Amsterdam: John Benjamins.
Lowie, Wander & Marjolijn Verspoor. 2019. Individual differences and the ergodicity problem.
Language Learning 69(S1). 184–206. doi:10.1111/lang.12324.
Machry da Silva, Susiele. 2017. Aprendizagem do português por haitianos: percepção das
consoantes líquidas /l/ e /ɾ/. [The learning of Portuguese by Haitians: a perception study
of liquid consonants /l/ and /ɾ/]. Ilha do Desterro: A Journal of English Language,
Literatures in English & Cultural Studies 70(3). 47–62.
Munro, Murray. 2018. Dimensions of pronunciation. In Okim Kang, Ron I. Thomson &
John M. Murphy (eds.), The Routledge Handbook of Contemporary English Pronunciation,
413–431. New York: Routledge.
Munro, Murray & Tracey Derwing. 1995. Foreign accent, comprehensibility and intelligibility in
the speech of second language learners. Language Learning 45(1). 73–97.
Munro, Murray & Tracey Derwing. 2015. A prospectus for pronunciation research in the 21st
century: A point of view. Journal of Second Language Pronunciation 1(1). 11–42.
Munro, Murray & Tracey Derwing. 2020. Foreign accent, comprehensibility and intelligibility,
redux. Journal of Second Language Pronunciation 6(3). 283–309.
Nagle, Charles, Pavel Trofimovich & Annie Bergeron. 2019. Toward a dynamic view of second
language comprehensibility. Studies in Second Language Acquisition 41(4). 647–672.
https://doi.org/10.1017/S0272263119000044
Nagle, Charles, Pavel Trofimovich, Mary G. O’Brien & Sara Kennedy. 2021. Beyond
linguistic features: Exploring the behavioral and affective correlates of
144 Jeniffer Imaregna Alcantara de Albuquerque, Ubiratã Kickhöfel Alves
Acknowledgement: This project has been partially funded by the Brazilian National Council for
Scientific and Technological Development (CNPq), grant number 471868/2014-0.
https://doi.org/10.1515/9783110736120-006
148 Ronaldo Lima Jr
1 Introduction
There are several aspects of English pronunciation that cause difficulties
for Brazilian learners, and vowels are among the most challenging. Since
Brazilian Portuguese has only seven vowels, /i e ɛ a ɔ o u/, it comes as no surprise
that learning a language with more vowels, as is the case of English, is
especially challenging for Brazilian learners, who need to create new vowel
categories in the vocalic space. Among the English vowels, the pairs /i ɪ/, /ɛ æ/
and /u ʊ/ are particularly challenging for Brazilian learners because of the expected
difficulty of perceiving and producing L2 sounds that are very similar yet not
contrasted in the learner’s L1 (Flege 1995; Flege and Bohn 2021).
When acquiring their native language, people learn how to accommodate
the variation of the acoustic signal into prototypical phonological categories so
that communication can take place, and the brain does so by taking statistics of
the input and assigning exemplars to the corresponding categories (Bybee 2003;
Cristófaro Silva 2003; Kuhl et al. 2008; Leather 2003; Pierrehumbert 1990). Hav-
ing learned the L1 so well is what makes it challenging to perceive and produce
L2 sounds that are very close, but not identical, to an L1 sound, especially when
there are two L2 sounds competing for some acoustic (and perceptual) space that is
occupied by a single vowel of the L1. This is the case of English vowels /i ɪ ɛ æ u ʊ/,
which tend to be perceived and produced by Brazilian learners within the pro-
totypical categories of Brazilian Portuguese /i ɛ u/, respectively (Bion et al.
2006; Lima Jr 2015; Nobre-Oliveira 2007; Rauber 2006). That is why (i) analy-
ses of the production of these six English vowels by Brazilian learners will be
presented in this paper.
The word development, instead of acquisition, in the title of this chapter was
intentionally chosen since it will be argued that language (whether L1 or L2) is a
complex dynamic system, and the development of L2 is a dynamic process (Beck-
ner et al. 2009; De Bot 2008; De Bot, Lowie, and Verspoor 2007; Larsen-Freeman
1997). Under this perspective of language development, the phonological categories
created for communication in the L1 are seen as attractor states for the L2
(Lima Jr 2013). Attractors are states of temporary accommodation of a complex
dynamic system, where the system finds temporary stability. These states are
temporary due to the dynamic nature of such systems, which may move, or even
keep moving, from one attractor state to another. That is why development is bet-
ter suited than acquisition as the former captures the dynamic, never-ending
change in time as the system moves through different attractor states.
Some attractor states require more energy for the system to move away
from them than others. De Bot, Lowie, and Verspoor (2007) illustrate this fact
A dynamic account of the development of English (L2) vowels 149
with the image of a surface, like a table, with some holes on it, of different sizes
and depths, and a ball moving from one hole to another. As we tilt the surface,
depending on how we do it, the ball resting in one hole might get out of it and
stop in another one, and the bigger and deeper the hole, the more we must tilt
the surface to get the ball out of it. In other words, more energy will be needed
to take the ball from one attractor state to another depending on how strongly
that state is attracting the ball.
In this metaphor, the table/surface is the learner (and their L2 developing
system); the ball is their pronunciation of the L2, in this case, the English vowels
/i ɪ ɛ æ u ʊ/; and the holes are the prototypical phonological categories of the L1,
in this case, Brazilian Portuguese /i ɛ u/. The energy to tilt the surface is related
to the nature, strength, frequency, quantity, quality, etc. of perturbation intro-
duced to the systems; in this case, perturbations might be language lessons, ex-
posure to the L2, interaction with L2 speakers, experiences abroad, etc. That is
why (ii) this paper seeks to compare two types of perturbation: having communi-
cative language lessons and having explicit instruction on pronunciation.
Another typical characteristic of complex dynamic systems is the non-
linearity between cause and effect, between perturbation and movement of the
system. To illustrate this characteristic, Bak and Weissman (1997) use the image
of someone dropping sand on a surface. In the beginning, it is possible to drop
several grains of sand, one onto the other, with the sand forming a cone-shaped
pile. However, as more grains are added to the system, the pile becomes steeper
and steeper, with the system reaching a critical point at which one single grain
of sand may cause an avalanche, which, in turn, may also cause other ava-
lanches, not predictable in number or dimension. As Johnson (1997) puts it, a
linear relation is like the volume knob of a radio, with each and every nuance of
change on the knob causing the same change of volume. A non-linear relation,
on the other hand, is like the tuning knob of a radio, for at the same time that a
small change on the knob might cause a great effect (getting out of a station),
great changes might also have no result at all (as when navigating through static
radio frequencies).
This means that potential effects of communicative language lessons or ex-
plicit instruction will probably not be seen equally among all learners, and
some effects might not be seen immediately, as they might contribute to getting
a learner’s L2 developing system closer to a critical point, but may not necessar-
ily cause the aforementioned avalanche. Added to the dynamic nature of such
systems, this means that L2 development is better studied through longitudinal
studies (De Bot and Larsen-Freeman 2011; Lima Jr 2016a; Verspoor, De Bot, and
Lowie 2011), and this is why (iii) the data presented in this paper comprise four
2 Method
2.1 Participants
Participants in this study are coded with letters, and this loss of participants along the way
is the reason some letters are skipped in this paper (see Table 1). The data of the missing par-
ticipants appeared in a preliminary analysis previously reported (Lima Jr 2016b), but they did
not do all four recordings reported in this paper.
2.2 Data
The ten participants were recorded individually at the end of the first, second,
third and fourth semesters of their college studies in English Language Teaching.
The recordings were conducted in a silent room with a supercardioid Shure
150B lapel microphone connected to a Zoom H4n recorder. The audio was captured
in mono, with a sampling rate of 44 kHz, and later saved in .wav format.
Students were recorded reading words inserted in the carrier sentence
“I said token this time”, which controls for the prosodic context of the target
word. The corpus was composed of three words for each target vowel. The
words, presented in Table 1, were all monosyllabic and with a CVC structure,
with most Cs being voiceless plosives, to prevent acoustic bias from neighbor-
ing segments and to help later identify, segment and label the vowels in PRAAT
(Boersma and Weenink 2019).
The sentences were shown in a slide presentation, with each slide containing the
carrier sentence with a different, randomly selected target word. Each word was
presented four times, generating 12 tokens per vowel per participant, which gen-
erated 72 tokens per participant, and a total of 720 vowels per semester. In the
end, 2,880 vowels were identified, segmented, and labeled in PRAAT.
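The token counts above follow directly from the design and can be checked with a few lines of arithmetic. The sketch below is only a sanity check; the constant names are illustrative and do not come from the study's own scripts.

```python
# Corpus size implied by the design described above.
WORDS_PER_VOWEL = 3   # three words for each target vowel
REPETITIONS = 4       # each word presented four times
TARGET_VOWELS = 6     # /i ɪ ɛ æ u ʊ/
PARTICIPANTS = 10
RECORDINGS = 4        # one recording per semester

tokens_per_vowel = WORDS_PER_VOWEL * REPETITIONS
tokens_per_participant = tokens_per_vowel * TARGET_VOWELS
tokens_per_recording = tokens_per_participant * PARTICIPANTS
total_tokens = tokens_per_recording * RECORDINGS

print(tokens_per_vowel, tokens_per_participant, tokens_per_recording, total_tokens)
# prints: 12 72 720 2880
```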
A common method to extract formant values is through Linear Predictive
Coding (LPC), which is an algorithm that decomposes the acoustic signal and
estimates the resonances generated in the vocal tract. However, automatic
LPC analyses have been criticized (e.g., Vallabha and Tuller 2002; Wempe and
Boersma 2003) because they may introduce systematic errors in the formant
extraction depending on the parameters set beforehand by the researcher.
With the automatic LPC analysis, the researcher needs to define the order of
the LPC (i.e., the number of formants to be found) and the maximum (ceiling)
frequency below which to look, which is usually set at 5 kHz for men and 5.5 kHz for
women. However, different men and women might have different frequency ceilings,
which, if not set accordingly, might lead the LPC into identifying peaks that
do not exist and overlooking peaks that do.
3 Results
The first step in the analysis was to visually inspect individual vowel spaces,
comparing the distributions of the speakers’ vowels in the four different record-
ings. For this comparison, the vowel spaces were plotted by recording (so four
plots for each speaker), and each plot contained every occurrence of the six En-
glish vowels, as well as the mean F1 and F2 values for each vowel. Figure 1 has
an example of such a plot, containing the vowels of speaker A in their first re-
cording. The actual vowels produced by the speaker are plotted as the smaller
and lighter phonetic symbols, and the larger and darker phonetic symbols are
at the mean values of F1 and F2 for each vowel. The ellipses represent one stan-
dard deviation from the mean.
In the vowel space of Figure 1, it is easy to see that speaker A already had
two separate categories for the /i ɪ/ pair. The occurrences of each of these vow-
els are very far from one another, generating clearly separated averages with
$\sqrt{(F1_a - F1_b)^2 + (F2_a - F2_b)^2}$.
Figure 1: Vowel space of speaker A’s productions in recording 1. Smaller and lighter
phonetic symbols are the individual vowels produced by the speaker at their F1-F2
intersections; larger and darker phonetic symbols are located at the mean F1-F2 values for
each vowel, surrounded by a 1-standard-deviation ellipse. Colors represent different vowels.
ellipses that do not touch each other. It is also easy to identify that, on the
other hand, speaker A’s /ɛ/ and /æ/ are completely overlapped, with nearly
identical F1-F2 means, and ellipses that overlap almost entirely. The vowels /u ʊ/,
despite being slightly more separated than /ɛ æ/, still occupy the same area of the
vowel space, with their ellipses overlapping almost completely.
To compare possible changes throughout the four recordings, images of the
four vowel spaces, like the ones in Figure 2, were inspected.
By visually inspecting the four vowel spaces, one can see that: (i) the /i ɪ/
pair is kept as separate categories throughout the four recordings; (ii) the /ɛ æ/
pair gets separated in the third recording (as a possible effect of the English
Phonetics and Phonology course) and is kept separated in the fourth recording;
and (iii) the /u ʊ/ pair seems to be separating in the third recording, still with
some overlap of the ellipses, but gets overlapped again in the fourth recording.
The four plots depict the dynamic nature of phonological development as
well as its gradient emergence. Sometimes it was not easy to decide whether
Figure 2: Vowel spaces of speaker A’s productions in the four recordings. Smaller and lighter
phonetic symbols are the individual vowels produced by the speaker at their F1-F2
intersections; larger and darker phonetic symbols are located at the mean F1-F2 values for
each vowel, surrounded by a 1-standard-deviation ellipse. The vowel space in the top-left corner
is the first recording, top right is the second one, bottom left is the third one, and bottom
right is the last recording. Colors represent different vowels.
two vowels were overlapping or whether their distance was enough for the productions
to be heard as two different vowels. Also, the data collected are simply
four snapshots of the vowel spaces at four different points in time within two
years of language development, and different configurations probably took
place at different moments within those two years in both directions, bringing
pairs of target vowels closer together and moving them farther apart. Nonetheless, one
recording a semester is what was feasible at the time, and, for research purposes,
it is useful to classify the pairs of target vowels in each
recording for every participant as either overlapped or as distinct vowels, so
some criteria were needed.
When two vowels had at least half of their ellipses overlapping, they were
considered overlapping vowels right away; and when less than half of the ellip-
ses overlapped or when they did not overlap at all, they were marked as poten-
tial candidates of separate vowel categories. To confirm the status of those
potential candidates for separate vowels, the Euclidean Distance between the
vowels in each pair was used. As was explained in the Method section, the Eu-
clidean Distance is a measure of dissimilarity that can be used to calculate the
distance between two points in an x-y plane, like the F1-F2 vowel space. Since
F2 values change in greater increments than F1 values, the Euclidean Distances
need to be calculated with normalized/standardized values (z-scores in this
case). To give the reader an idea of the scale of distances resulting from this
calculation of Euclidean Distances with normalized F1-F2 values, Figure 3
presents the productions of speaker A in all four recordings with the distances
between /i/ and /ɪ/ (1.26, 1.14, 1.14 and 1.02 for recordings 1, 2, 3 and 4) and the
distances between /ɛ/ and /æ/ (0.09, 0.12, 0.97 and 0.68, respectively) marked
on the plot.
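The calculation described above can be sketched as follows. This is a minimal illustration, not the study's own script: it assumes that F1 and F2 are z-scored across all of one speaker's tokens in a recording using the population standard deviation, and the formant values in `tokens` are invented for the example.

```python
from math import sqrt
from statistics import mean, pstdev


def z_scores(values):
    """Standardize a list of values to mean 0 and (population) SD 1."""
    m = mean(values)
    s = pstdev(values)
    return [(v - m) / s for v in values]


def mean_euclidean_distance(tokens, vowel_a, vowel_b):
    """Distance between the mean normalized F1-F2 points of two vowels.

    tokens: list of (vowel_label, F1_hz, F2_hz) for one speaker and recording.
    """
    f1z = z_scores([t[1] for t in tokens])
    f2z = z_scores([t[2] for t in tokens])

    def centroid(label):
        idx = [i for i, t in enumerate(tokens) if t[0] == label]
        return mean(f1z[i] for i in idx), mean(f2z[i] for i in idx)

    (f1a, f2a), (f1b, f2b) = centroid(vowel_a), centroid(vowel_b)
    return sqrt((f1a - f1b) ** 2 + (f2a - f2b) ** 2)


# Hypothetical formant values in Hz, for illustration only.
tokens = [
    ("i", 300, 2300), ("i", 320, 2250),
    ("ɪ", 420, 2000), ("ɪ", 440, 1950),
    ("ɛ", 600, 1800), ("ɛ", 620, 1780),
    ("æ", 610, 1790), ("æ", 630, 1770),
]

print(mean_euclidean_distance(tokens, "i", "ɪ"))
print(mean_euclidean_distance(tokens, "ɛ", "æ"))
```

With these invented values the /i ɪ/ distance comes out much larger than the /ɛ æ/ distance, which is the kind of pattern shown for speaker A's first recording.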
In a previous study with the same method of data collection and analysis
(Lima Jr 2015), the Euclidean Distances between the normalized mean formant
values of a group of ten native speakers of American English were reported as
0.46 for /i ɪ/, 0.38 for /ɛ æ/ and 0.33 for /u ʊ/. Therefore, in this study, those
potential separate vowels (based on the overlap of the ellipses) were in fact
considered separate vowel categories only if their Euclidean Distances were
at least 0.3. Based on these two criteria, Table 2 shows the recordings
in which there is a contrast between the target vowels for each speaker.
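The two criteria can be combined into a simple decision rule, sketched below. The 0.3 threshold and speaker A's /ɛ æ/ distances are taken from the text; the ellipse-overlap fractions are hypothetical values standing in for the visual judgment of the plots, and the function name is illustrative.

```python
ED_THRESHOLD = 0.3  # minimum normalized Euclidean Distance adopted in this study


def is_contrast(ellipse_overlap, distance, threshold=ED_THRESHOLD):
    """Apply the two criteria to one vowel pair in one recording.

    ellipse_overlap: fraction (0 to 1) of the ellipses that overlap;
                     a hypothetical numeric stand-in for visual inspection.
    distance: Euclidean Distance between the normalized F1-F2 means.
    """
    if ellipse_overlap >= 0.5:
        # At least half of the ellipses overlap: overlapping vowels right away.
        return False
    # Candidate for separate categories: confirm with the distance criterion.
    return distance >= threshold


# Speaker A, /ɛ æ/, recordings 1 to 4: distances reported in the text.
distances = [0.09, 0.12, 0.97, 0.68]
# Overlap fractions are assumed for illustration only.
overlaps = [0.9, 0.9, 0.0, 0.1]

results = [is_contrast(o, d) for o, d in zip(overlaps, distances)]
print(results)  # [False, False, True, True]
```

Under these assumed overlap values, the rule marks a contrast for speaker A's /ɛ æ/ pair only in recordings 3 and 4.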
As can be seen, there are all types of developmental routes, from a learner
who did not develop separate vowel categories at all (speaker D); to those who
developed them along the way, especially after taking the English Phonetics and
Phonology course (recording 3; learners A and N, for instance); and those who
created new phonetic categories but then lost them (K and L). Of the 10 learners, 7
already had separate vowel spaces for [i ɪ] in recording 1, and the other 3 learners
Figure 3: Euclidean Distances between /i/ and /ɪ/ and between /ɛ/ and /æ/ for Speaker
A in all four recordings.
Table 2: Pairs of target vowels consisting of two separate categories marked YES for each
recording of every participant.

Participant  Recording  /i ɪ/  /ɛ æ/  /u ʊ/
A            1          YES    no     no
A            2          YES    no     no
A            3          YES    YES    YES
A            4          YES    YES    YES
B            1          YES    no     no
B            2          YES    no     no
B            3          YES    no     no
B            4          YES    no     no
D            1          no     no     no
D            2          no     no     no
D            3          no     no     no
D            4          no     no     no
E            1          no     no     YES
E            2          no     no     YES
E            3          no     no     YES
E            4          no     no     no
G            1          YES    no     no
G            2          YES    no     YES
G            3          YES    no     YES
G            4          YES    no     YES
K            1          YES    no     no
K            2          YES    no     no
K            3          YES    YES    no
K            4          YES    no     no
L            1          YES    no     YES
L            2          YES    no     YES
L            3          YES    YES    YES
L            4          YES    no     no
M            1          no     YES    no
M            2          no     YES    no
M            3          no     YES    no
M            4          no     YES    no
did not develop these categories in the other three recordings. For the /ɛ æ/ pair,
only one student already had separate categories for them in recording 1 (M),
three learners developed separate categories for them in recording 3 (right after
the Phonetics and Phonology course) and kept them in recording 4 (A, F and N),
and two participants also produced them as separate vowels in recording 3 but
not anymore in recording 4 (K and L). For the high back vowels, two learners pro-
duced them separately in recordings 1 through 3 but not in recording 4 (E and L),
three learners created separate vowel categories for them along the way (A, F and
G), and only one already had them separate from recording 1 onwards (N). Only
three learners got to recording 4 with separate phonetic categories for all three
pairs (A, F and N). The column with the most YESes is the one for the /i ɪ/ pair, and
the one with the fewest is the /ɛ æ/ one, confirming previous findings that, of
those three pairs, /ɛ æ/ is the most challenging for Brazilians (Lima Jr 2015).
Lastly, as an attempt to look at a general vowel development index for each
learner and for the group as a whole, the sum of the Euclidean Distances of the
three target pairs of vowels was used to fit a Bayesian mixed-effects model. The
expectation was that learners would increase their distances as they advanced in
time in their studies. In the model, the fixed effects were the intercept and the
slope of the trend for the population of all 10 learners, and the random effects
were the deviations in intercept and in slope that each subject’s own trend had
from the population values.3 Regularizing priors were used,4 allowing for sums of
Euclidean Distances within a realistic range, and allowing for both positive and
negative slopes. Figure 4 presents graphs containing the main results of the
model.
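The structure of the model described above can be written out as follows. This is a sketch under stated assumptions: the Gaussian likelihood, the symbols, and the coding of time $t_{ij}$ are chosen here for illustration, since the text specifies only the fixed and random effects.

```latex
% y_{ij}: sum of the three Euclidean Distances for learner j at recording i
% t_{ij}: time of recording i for learner j (coding assumed)
\begin{align}
  y_{ij} &= (\beta_0 + u_{0j}) + (\beta_1 + u_{1j})\, t_{ij} + \varepsilon_{ij},
  \qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2) \\
  \begin{pmatrix} u_{0j} \\ u_{1j} \end{pmatrix}
  &\sim \mathcal{N}\!\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \Sigma \right)
\end{align}
```

Here $\beta_0$ and $\beta_1$ are the population intercept and slope (the fixed effects), and $u_{0j}$ and $u_{1j}$ are learner $j$'s deviations from them (the random effects). Priors on $\beta_0$, $\beta_1$, $\sigma$ and $\Sigma$ would complete the Bayesian specification.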
In Figure 4, each panel represents one participant. The four black dots in
each panel are the sums of the Euclidean Distances of the three pairs of L2 vow-
els in each recording, and the black dashed line, which is repeated in every in-
dividual plot, represents the tendency of the group as a whole, derived from
the fixed effects given by the model,5 which favors the hypothesis that learners
should increase their distances with time of study. The orange lines in each
graph are a sample of 100 probable lines predicted by the model for each
speaker considering the random effects. They show that not all speakers had a
positive correlation between the sums of Euclidean Distances and time. The
expectation was that learners should increase the distances between contrasting
vowels as they advance in their study of English, but only six of them (A, B, F,
G, L, N) ended up with a clear positive correlation, some with lines
much higher and steeper than the group tendency. Of
the other four, one had a clear negative slope (D), and the others had lines that
indicate either stagnation or extremely mild movement. This result highlights
the degree of variance found among speakers.
Lastly, the dotted blue line, with no slope and repeated in all individual
plots at 1.17, marks the sum of the Euclidean Distances from normalized mean
F1-F2 values of a group of ten native speakers of American English. This serves
as a reference, showing that three of the four learners that showed no satisfactory
progress (D, E and M) had sums of Euclidean Distances below that of the group
of native speakers; and that all learners with positively correlated lines had dis-
tances above that of the native speakers. Most learners produced their vowels
with Euclidean Distances greater than those of the group of native speakers
(above the fixed dotted line). This does not mean that they necessarily produced
vowels in separate phonetic categories because, in many cases, even though the
mean F1-F2 values were somewhat distant, the one-standard-deviation ellipses in
their vowel spaces were still overlapping due to variability, which did not happen
with the group of native speakers. This means that at some point in their devel-
opmental routes, the learners were able to produce some of the target words with
distinct vowel categories, but not all of them, or not all the time, resulting in
great variance, and thus large ellipses in their vowel spaces, whereas the native
speakers were able to maintain their vowel categories completely separate (with
More specifically, the median of the posterior distributions for the intercept (1.16) and the
slope (0.14).
Figure 4: Result from the Bayesian mixed-effects model fit to the sum of Euclidean Distances.
Each panel corresponds to one participant; the black dots are the sums of the Euclidean
Distances of the three pairs of L2 vowels in each recording, four for each participant; the black
4 Discussion
There was a lot of variability in the observed development of the learners, which
was expected given that each learner has their own complex dynamic L2 system.
Each system is made up of so many elements, whose interactions among themselves
and with the environment give rise to performance in the L2, that it
is unreasonable to expect all learners to exhibit the same developmental pattern.
Each lesson, be it a holistic communicative lesson or some explicit pronuncia-
tion instruction, is a perturbation of the system, but each system is at a different
stage, some closer to a critical point that might lead to an avalanche (returning
to the metaphor from the introduction) and others still in the beginning of the
sand-accumulation process. Since the cause-effect relation is non-linear in com-
plex dynamic systems, it is only natural that one observes different behaviors
from different learners’ L2 systems.
This confirms the need to analyze L2 developmental data individually,
even if also looking into group tendencies, for a lot of information is lost in a
more traditional design looking only at grouped data (Lima Jr 2016a; Verspoor,
Lowie, and Van Dijk 2008; Verspoor and Van Dijk 2012). A linear regression
looking only at group tendencies would lead one to ignore the fact that some
students really excelled in their developmental (increasing) trajectory, such as
participants A, F, G, L and N (see their [orange] trend lines in Figure 4); and to
also ignore participants who had decreasing trend lines, such as speaker D.
Another characteristic of complex dynamic systems is that they are sensi-
tive to initial/previous states. Among the productions of all learners, there was
a total of 11 vowel contrasts already present in recording 1 (7 for /i ɪ/, 1 for /ɛ æ/
and 3 for /u ʊ/). Even controlling for some individual variables (only learners
who had never been to an English-speaking country, did not have contact with
English native speakers, and had not taken extracurricular English lessons
Figure 4 (continued)
dashed line is the trend line for the group; the orange lines are probable trend lines from the
model for each participant; and the blue dotted line is the sum of Euclidean Distances for the
three target pairs of vowels for a control group of native speakers of English.
not display any vowel contrast whatsoever in any of the recordings. It is possible
that, later on, and triggered by other perturbations of their systems, those learn-
ers that showed no (immediate) effect will move their systems away from the at-
tractor states of the prototypical L1 vowel categories.
In total, there were 11 vowel contrasts in the first recording, 12 in the second,
18 in the third (the semester they took the Phonetics and Phonology course), and
15 in the last one. No student had distinct vowels for all three pairs in the first
two recordings; three learners presented distinct vowels for all three pairs in the
third recording (speakers A, L and N); and, in the last recording, speaker L did
not present the contrasts in all three pairs anymore, but another learner
(speaker F) showed distinct vowels in all pairs. Together with the discussion
so far, this result highlights the positive role of explicit pronunciation
instruction in the development of new vowel categories, without diminishing
the positive influence of communicative lessons in creating and/or
maintaining newly created vowel contrasts.
Finally, the results section attempted to categorize students’ productions
into “yes” and “no” concerning the presence of separate vowels in the three
pairs in focus. However, language development is not categorical, but gradient
in nature. It was not always easy to decide if two vowels should be considered
“with” or “without” a contrast. That is why some criteria needed to be defined
and followed for the categorization of the results. Nevertheless, the gradience
found in the data cannot be overlooked. There were cases of students classified
with “no contrast”, for instance, who were on the brink of creating new catego-
ries. The binary classification of participants may give the wrong impression
that all learners with a “no” in Table 2 produced the contrasts equally over-
lapped, which was not the case. Some students moved their vowels apart, just
not enough to fulfill the pre-established criteria. Likewise, not all speakers with
contrasting vowels in Table 2 produced them equally well. Some produced
them at the threshold of the criteria, whereas others produced truly separated
vowels, with the ellipses far from touching each other. There was variation
even within the same speaker. Speakers F, G and K, for instance, all marked
with separate categories for [i ɪ] in all recordings, produced much more clearly
separated contrasts in the last two recordings, showing an influence of the explicit
instruction that the categorical treatment of the data does not capture.
5 Conclusion
The goal of this study was to investigate possible effects of communicative lan-
guage lessons and of explicit pronunciation instruction on the development of
English vowels /i ɪ ɛ æ u ʊ/ by Brazilian learners in the first four semesters of
their college studies in English Language Teaching. This was done by analyzing
the emergence of new vowel categories for the L2 vowels, and the developmen-
tal route of each learner through visual inspection of vowel spaces, calculation
of Euclidean Distances between contrasting vowels, and the results of a Bayes-
ian mixed-effects model with the sum of the Euclidean Distances, which helped
look both into group trend and individual variation.
The analyses showed a lot of variability in the development of the target
vowels by the learners, which is expected when L2 developing systems are seen
as complex dynamic systems. Many learners developed new vowel categories
throughout the first four semesters, and more contrasts are expected to develop
as they continue their studies. The main conclusion is that, even though com-
municative lessons play an important role in the development (and also in the
maintenance) of the L2 vowel system, explicit pronunciation instruction had a
greater impact on the emergence of new vowel contrasts.
Future investigations of this nature should include an analysis of the duration
of the vowels as well as the analysis of less monitored production (reading a text
or speaking spontaneously). Future research could also include perceptual studies
as an attempt to witness the emergence of both perceptual and productive vowel
categories. Lastly, as has been argued throughout this paper, investigations of L2
development are more informative if done with longitudinal data, so collecting
data at more time points within those years and/or for
more than two years would provide even more information from which to draw
inferences about the L2 developmental process.
References
Arantes, Pablo. 2010. Formants.Praat. [Computer software].
Arantes, Pablo. 2011. Collectformants.Praat. [Computer software].
Bak, Per & Michael Weissman. 1997. How nature works: The science of self-organized
criticality. American Journal of Physics 65(6). 579–580.
Beckner, Clay, Richard Blythe, Joan Bybee, Morten H Christiansen, William Croft, Nick C. Ellis,
John Holland, Jinyun Ke, Diane Larsen-Freeman & Tom Schoenemann. 2009. Language is
a complex adaptive system: position paper. Language Learning 59(s1). 1–26.
Bion, Ricardo Augusto Hoffmann, Paola Escudero, Andréia S. Rauber & Barbara O. Baptista.
2006. Category formation and the role of spectral quality in the perception and
production of English front vowels. In Richard M. Stern (ed.), Ninth International
Conference on Spoken Language Processing, Pittsburgh, USA, 2006, 1363–1366. Baixas,
France: International Speech Communication Association.
Boersma, Paul & David Weenink. 2019. Praat: doing phonetics by computer [Computer
software]. Version 6.1.03. http://www.praat.org/ (accessed 8 October 2019).
Bybee, Joan. 2003. Phonology and Language Use. Cambridge: Cambridge University Press.
Cristófaro Silva, Thaïs. 2003. Descartando fonemas: a representação mental da fonologia de
uso [Discarding phonemes: the mental representation of use phonology]. In Dermeval da
Hora & Gisela Collischonn (eds.), Teoria Linguística: Fonologia e Outros Temas [Linguistic
Theory: Phonology and other topics], 200–251. João Pessoa: Editora Universitária.
De Bot, Kees. 2008. Introduction: second language development as a dynamic process. The
Modern Language Journal 92(2). 166–178.
De Bot, Kees & Diane Larsen-Freeman. 2011. Researching second language development from
a Dynamic Systems Theory perspective. In Marjolijn Verspoor, Kees De Bot &Wander
Lowie (eds.), A Dynamic Approach to Second Language Development: Methods and
Techniques, 5–24. Amsterdam: John Benjamins Publishing.
De Bot, Kees, Wander Lowie & Marjolijn Verspoor. 2007. A dynamic systems theory approach
to Second Language Acquisition. Bilingualism: Language and Cognition 10(1). 7–21.
Flege, James Emil. 1995. Second language speech learning: theory, findings, and problems. In
Winifred Strange (ed.), Speech perception and linguistic experience: Issues in cross-
language research, 233–277. York: York Press.
Flege, James Emil & Ocke-Schwen Bohn. 2021. The revised Speech Learning Model (SLM-R). In
Ratree Wayland (ed.), Second Language Speech Learning: Theoretical and Empirical
Progress, 3–83. Cambridge: Cambridge University Press.
Johnson, Keith. 1997. Acoustics and Auditory Phonetics. Malden: Blackwell Publishing.
Kuhl, Patricia, Barbara T. Conboy, Sharon Coffey-Corina, Denise Padden, Maritza Rivera-
Gaxiola & Tobey Nelson. 2008. Phonetic learning as a pathway to language: new data
and native language magnet theory expanded (NLM-E). Philosophical Transactions of the
Royal Society B: Biological Sciences 363(1493). 979–1000.
Larsen-Freeman, Diane. 1997. Chaos/Complexity science and Second Language Acquisition.
Applied Linguistics 18(2). 141–65.
Leather, Jonathan. 2003. Phonological acquisition in multilingualism. In María del Pilar
García Mayo and María Luisa García Lecumberri (eds.), Age and the Acquisition of English
as a Foreign Language, 23–58. Clevedon: Multilingual Matters.
Lima Jr, Ronaldo Mangueira. 2013. Complexity in second language phonology acquisition.
Revista Brasileira de Lingüística Aplicada 13(2). 549–576.
Lima Jr, Ronaldo Mangueira. 2015. A influência da idade na aquisição de seis vogais do inglês
por alunos brasileiros [The influence of age on the acquisition of six English vowels by
Brazilian learners]. Organon 30(58). 15–31.
Lima Jr, Ronaldo Mangueira. 2016a. A necessidade de dados individuais e longitudinais para
análise do desenvolvimento fonológico de L2 como sistema complexo [The need of
individual and longitudinal data for the analysis of L2 phonological development as a
complex system]. ReVEL 14(27). 203–225.
Lima Jr, Ronaldo Mangueira. 2016b. Análise longitudinal de vogais do inglês-L2 de brasileiros:
dados preliminares [A longitudinal analysis of English-L2 vowels by Brazilians:
166 Ronaldo Lima Jr
An extra layer of support: Developing an English-speaking consultation program
Tim Kochem, Idée Edalatishams, Lily Compton, Elena Cotos, Iowa State University
https://doi.org/10.1515/9783110736120-007
1 Introduction
Effective oral communication is a staple in academic settings in the United
States (US), especially for postgraduate and postdoctoral students. These indi-
viduals are often required to deliver presentations (both formally at conferences
and informally in front of peers), act as teaching assistants, and conduct re-
search with peers and colleagues. While native speakers of English may require
some instruction on the specifics of these oral communication tasks, interna-
tional students who are nonnative speakers of English may require both task
knowledge and more general skills of English, such as grammar or pronuncia-
tion. Compounding this issue is that most English language instructors within
higher education regularly assess international students on general English
skills, but many do not possess a knowledge of English that would allow them
to deliver effective instruction to their students, who are typically placed in
their courses based on institutional test scores or entrance exam results (such
as TOEFL or IELTS). Students whose scores indicate a more advanced level of
proficiency also find themselves in a precarious situation: their English skills
may be good enough to pass a test and to not require additional coursework,
yet are still problematic enough to require additional assistance in order to
meet the oral communication demands of academia.
Of all the English skills, pronunciation is perceived to be one of the most
difficult skills to teach (Baker 2014; Couper 2017). In fact, studies have found
that amongst English as a second language (ESL) instructors, pronunciation is
often neglected due to instructors’ lack of confidence, skill, or knowledge
(Baker 2014; Derwing 2019). A common belief is that residing in a naturalistic
setting presents ample opportunities to engage with the target language, which
should be sufficient for language growth (Lightbown and Spada 2006). How-
ever, this belief does not hold quite as true for pronunciation, as some studies
have found that pronunciation rarely improves past the first year in a target
language environment without explicit instruction (e.g., Derwing and Munro
2013). The need for explicit instruction creates an unusual challenge, in that international
students studying in the US may not be able or willing to spend time
taking a full course in oral communication, but still need fine-tuning if they are
to be successful in their academic careers.
To meet this challenge, Iowa State University’s Center for Communication
Excellence (CCE) developed a model for assisting international students with
their English-speaking needs. Following an overview indicating the importance
of English-speaking ability in general, we introduce the English-Speaking
Consultation (ESC) model, and then describe the specifics of ESC consultant
training. Next, we cover how technology plays an essential role in the
ESCs, including how the consultants are trained to use technology and how
technology is leveraged for supplementary language instruction. To do this,
we use the technological pedagogical content knowledge framework (TPACK:
Mishra and Koehler 2007) as an underpinning to connect the consultants’
training in technology with their ability to deliver effective ESL pronunciation
tutoring. Finally, this chapter integrates personal insights and reflections
from two English-speaking consultants to further contextualize how ESCs are
conducted, as well as how the ESC training has influenced their knowledge of
pronunciation training.
instruction that focuses on oral communication skills related to giving oral pre-
sentations, participating in small and large group discussions, and asking and
answering questions in class (Berman and Cheng 2001). Kim (2006) adds that
due to the differences in educational practices between the US and interna-
tional graduate students’ home countries, language instruction should also
help students gain meta-knowledge about the oral communication skills required
of them in the context of US academia, and assist them in developing an
understanding of the value of, and need for, active participation through speaking.
Explicit instruction on prosodic features of speech can also activate non-native
speakers’ knowledge of the prosodic structures in their first languages, helping
them obtain better results in their pronunciation practice (Liu 2020). These
identified needs are targeted in the English-Speaking Consultations (ESCs),
which are in high demand at Iowa State University. The next section describes
how and by whom this support is provided.
The ESC model was developed and implemented in the Center for Communica-
tion Excellence (CCE) housed within Iowa State University’s Graduate College.
The CCE aims to support the academic and professional communication needs of
all graduate students and postdoctoral scholars at the university. Since its founding
in 2015, the CCE has launched seven programs specializing in different aspects
of written and oral communication, all of them grounded in research
on communication genres and the scholarship of teaching and learning. The English
Language Development Program (ELDP) is one of the seven programs. It
was designed to provide opportunities for individualized language practice and
improvement by offering three types of service: English writing consultations
(EWCs), English-speaking consultations (ESCs), and Peer Speaking Practice Groups
(PSPGs). The EWCs are one-on-one tutoring sessions focusing on macro- and
micro-level aspects of writing, and the ESCs focus on oral communication profi-
ciencies including pronunciation. The PSPGs, in turn, engage small groups of par-
ticipants to practice speaking on various topics, and assigned facilitators generally
focus on pronunciation topics as they arise. This chapter focuses on elements of
the ESCs with a specific emphasis on pronunciation.
The ESCs are 50-minute-long tutoring sessions. They can be of two types:
Type 1 – Assistance with specific English-speaking tasks, and Type 2 – Develop-
ment of general oral communication English skills. For Type 1 ESCs, consultants
focus on helping students prepare for speaking tasks such as conference presen-
tations, thesis/dissertation defenses, job talks, etc. Often, students and novice
scholars feel apprehensive or nervous about high-stakes performances (e.g., the-
sis/dissertation defenses or job interviews), or concerned that they might be
viewed as incompetent by peers or faculty during low-stakes performances (e.g.,
group reports or class presentations). In Type 1 ESCs, consultants give concrete
recommendations with regard to the target task, which helps to increase
speaker confidence. In Type 2 ESCs, consultants focus on specific language traits
to help students develop their speaking skills. Students seeking these consulta-
tions may be new to the US or at the beginning of their graduate studies, or they
may have experienced difficulties in expressing themselves clearly in academic
settings. Type 2 ESCs are often scheduled on a recurring basis so that consultants
can develop individualized plans after conducting a needs analysis. For each
type of consultation, specific recommended procedures were developed and are
put in place. Figure 1 shows the flowchart for recommended procedures for both
consultation types.
Overall, there are four key components: needs analysis, consensus build-
ing, formative tasks, and recommended next steps. Needs analysis is the first
step in the Type 1 ESCs and Phase 1 in the Type 2 ESCs. Since Type 1 ESCs are
based on a specific task, the needs analysis is completed in approximately five
minutes to get information about the task, context, and target audience. For
Type 2 ESCs, the needs analysis process may take one to two sessions. Consul-
tants use two carefully structured tools, the self-assessment interview and the
English language skills diagnostic tool, to (1) help establish rapport and credi-
bility, (2) discuss expectations about time and effort, and (3) analyze strengths
and weaknesses. After the needs analysis, both consultants and learners work
together to establish short- and long-term goals. Consensus building is a key
component of all ESCs to encourage learners’ sense of self-efficacy and whole-
hearted buy-in to the process. Thus, Step 2 in Type 1 ESCs and Step 2 in Type 2
ESCs (Phase 1) include a discussion about priorities based on the amount of
time available and the needs analysis. This step usually takes about 5–10
minutes.
The next step is the implementation of formative tasks. For Type 1 ESCs, role-
play is the most common formative task to allow the consultants to focus on the
specific language task. During the role-play, the consultants work on organization,
The final step shown in Figure 1 at the end of both consultation types is the
“English-Speaking Consultation Evaluation”. This step is not directly related to the
learning tasks but is rather a protocol for the CCE to collect feedback from students
about their experiences in an effort to improve the protocol and consultation
quality. The accumulated feedback is compiled and reviewed each semester.
It is worth mentioning that students often make appointments with more
than one consultant, which is why it is important to enact a procedure for infor-
mation transfer among consultants. This is particularly true for Type 2 ESCs,
4 English-speaking consultants
4.1 General ESC training
Of the five major aspects of oral communication covered in the ESC training,
pronunciation receives the most attention as it is arguably both the most com-
plex to teach and the most sought-after instruction from international students
in speaking consultations. Even for those consultations that focus on a particu-
lar discourse setting (e.g., interview, presentation), there is always an element
Speaking and pronunciation
– Identify vowel and consonant differences between the first language and the English language
– Use correct vowels and consonants for effective oral communication
– Use thought groups, intonation patterns, word stress, focus, and/or volume appropriately for effective oral communication
– Use appropriate speech patterns in English to enhance overall fluency
– Use different communication strategies to maintain and repair oral discourse
however, additional features may need attention as well. Students may not be
aware of specific pronunciation errors, and they may seek help because a pro-
fessor or colleague mentioned they should work on their pronunciation, with-
out providing any concrete examples of their mispronunciations. This gives
further justification for the extended consultant training in pronunciation features.
Consultations are first and foremost a service provided for international
students, so the needs and goals of the student come first. Still, the ability to
evaluate, assess, and prioritize pronunciation errors can sometimes reveal er-
rors that are more detrimental to a student’s ability to produce comprehensible
and intelligible speech.
Therefore, the module for pronunciation focuses on training the consultants
in intelligibility-based pronunciation instruction (Levis 2018). In other words, the
goal of pronunciation instruction is to help learners produce speech that is easily
decoded and understood. This is accomplished by centering on six major
topics and respective subtopics, which are listed in Table 2.
Table 2: Major pronunciation topics and subtopics covered in the Pronunciation module of the
ESC training.
Topic Subtopic(s)
For each topic, trainees are first presented with real-life scenarios where
students express their concerns. Here’s an example of a scenario for thought
groups:
An hour ago, you were meeting with a student who speaks so fast that you – and others,
according to the student – find following what she says very difficult. The student you are
meeting now is not nearly as fluent, tripping over almost every other word, her speech epito-
mizing the label ‘broken English.’ Using standard pronunciation training terminology, how
would you define each student’s problem? What during-appointment and homework activi-
ties would you assign for each student? Why?
The trainees begin with some of their own ideas based on their own language
teaching experiences and knowledge. They then go through readings grounded
in theory and research that were specifically written for this training. The mod-
ule also includes recommended techniques and tools for actual ESCs, which
were curated by an experienced practitioner and researcher of L2 pronunciation
instruction.
Before trainees can be taught to tutor in pronunciation, they must have a
firm understanding of the phonetic and phonological features of spoken En-
glish. After trainees have shared their understanding of the topic, they engage
with instructional materials that enhance their knowledge of these pronuncia-
tion features. For example, in the above scenario about thought groups, the lit-
erature review for the training program emphasizes the importance of thought
grouping followed by five features that are common in thought groups. Figure 4
shows excerpts from the training module related to the scenario.
Figure 4 is but a small snapshot of the instructional materials that trainees en-
gage with throughout the pronunciation module. As Figure 4 shows, the train-
ees are given instruction not only on the phonetic and phonological aspects of
language, but also on their direct and indirect impact on producing comprehensible
and intelligible speech. Developing effective oral communication,
rather than focusing solely on accent reduction, is a more reasonable
and often more achievable goal for learners, who are also immersed in their own studies.
The trainees are also exposed to content that promotes their understanding
of pronunciation errors common to speakers of certain first languages, being
guided in terms of how to identify said errors. That is, in addition to an over-
view of common errors, there is an extended literature review that describes in
detail how to identify such errors, reasons why they occur, and how to approach
them. This provides the trainees with ways to contextualize their instruction,
which is often preferable to teaching
pronunciation in a vacuum (Levis 2018).
Once the trainees have sufficient knowledge of the phonetic and phonologi-
cal features of English, as well as a firm understanding of their impact on produc-
ing comprehensible and intelligible speech, they move on to the final section of
each topic, which is the activities and techniques for effective teaching. Here,
they are provided with extensive pedagogical knowledge, as well as key take-
aways towards the end of the teaching section (see Figure 5). The takeaways pro-
vide a quick reference point for trainees when consulting, in case they need a
refresher or an activity on the go. This section is arguably the most important, as
teachers who report a lack of confidence in their ability to teach pronunciation
will sometimes provide instruction as written in a textbook or forgo instruction
altogether (Baker 2014; Derwing 2019; Levis and Kochem in press).
To end each topic, the trainees are asked to revisit the initial teaching scenario
and revise their answers as necessary. Even if a trainee was correct in their
identification of the pronunciation error at the beginning, at this point they
should elaborate more on not only the issue at hand, but also how they would provide
instruction that would be both meaningful and effective for the individual
students.
A YouTube video, for instance, might be a great tool to practice listening to tar-
get language sounds. However, it is important to select videos with potential
for language learning. In that regard, closed-captions and visuals in the video
can scaffold students’ attention to target sounds. Additionally, using the transcripts
and functions such as pause and rewind can help foster their attention
to the target language sounds. These are technological content and technologi-
cal pedagogical skills.
However, simply listening to the target language sounds on YouTube and
using functions such as closed-captions and transcripts cannot improve stu-
dents’ pronunciation skills. Since the selected videos would likely be created
for purposes unrelated to language instruction, the trainees and consultants
need to identify relevant learning objectives and supplement the learning with
practice activities. For example, with the techniques listed in Figure 5, e.g. Ana-
lyze2Imitate, consultants can model how to use the transcript from a YouTube
video and mark all pauses from a segment with a slash, followed by capital-
izing words that have the emphasis as illustrated in the example below:
After that, they could model the technique of Analyzing your own practice talk to
imitate the pauses and emphasis using the marked transcript. Students then can
practice to demonstrate understanding of the tasks and later extend this practice
to their own recordings to facilitate comparison of their version with the version
from the video. These carefully selected instructional techniques, combined with
the understanding of students’ needs, would illustrate the trainee’s or consultant’s
intersectional grasp of technological, pedagogical, and content knowledge, that is,
technological pedagogical content knowledge (TPACK). An expert may give a
very clear and detailed description of a pronunciation feature, which may be fur-
ther complemented by visual aids. However, these videos rarely have additional
practice for their viewers, which is a common pitfall of such instruction. Under-
standing and acknowledging this pitfall, the trainee or consultant can provide
practice activities that complement or supplement the instructional video (for an
example, see Table 3).
Table 3
PK: Knowledge about using free videos as multimodal resources for instructional use
TK: Knowledge about YouTube and how to search for videos and use functions like the pause and rewind buttons
TCK: Knowledge about selecting specific and appropriate YouTube videos for listening to target language sounds based on topic, vocabulary level, type of English, etc.
TPK: Knowledge about using transcripts or closed captioning to facilitate the listening of target language sounds or using the playback speed to adjust the pace of speech according to learners’ level
TPACK: Knowledge about using a selected YouTube video with specific instructional techniques to meet a desired learning objective, e.g. reducing pauses or emphasizing important words
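The transcript-marking step described for Analyze2Imitate (a slash at each pause, capitals for emphasized words) can also be prepared programmatically when a consultant wants a clean marked handout. The following is only a minimal sketch under stated assumptions: the function name `mark_transcript`, its parameters, and the sample sentence are hypothetical illustrations, not part of the ESC training materials, and the pause and emphasis positions are assumed to have been identified by ear beforehand.

```python
def mark_transcript(words, pause_after, emphasized):
    """Return a marked transcript string.

    words:       list of transcript tokens, in order
    pause_after: set of word indexes after which the speaker pauses
    emphasized:  set of word indexes that carry emphasis
    (All three are supplied by the person annotating the video segment.)
    """
    marked = []
    for i, word in enumerate(words):
        # Emphasized words are written in capitals.
        marked.append(word.upper() if i in emphasized else word)
        # A slash marks a pause following this word.
        if i in pause_after:
            marked.append("/")
    return " ".join(marked)


# Hypothetical sample sentence, invented for illustration only.
sample = "today I want to talk about our results".split()
print(mark_transcript(sample, pause_after={0, 5}, emphasized={0, 7}))
# prints: TODAY / I want to talk about / our RESULTS
```

Students could then read the marked output aloud, record themselves, and compare their pauses and emphasis with the original video.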
6.1 Billy
However, the first component of the training that was quite helpful for one-
on-one consulting, which was not exclusive to the pronunciation module but
certainly helped regardless, was the identification of Type I and II consulta-
tions. Following the flowchart in Figure 1 helps me to stay on track and identify
the needs of the learner first, which to me is a more crucial step than in a class-
room setting. Oftentimes, classes are designed for a specific purpose (e.g., aca-
demic, business), but in a consultation, you really have no idea who your next
student will be, what background they’re coming from, and what they want to
work on. Starting by identifying the needs of the student, the flowchart pro-
vides a nice pathway to achieve their goals. Sometimes it’s a very specific goal,
such as practicing their thesis or dissertation presentation, and sometimes it’s
much more general, such as talking with labmates or other colleagues.
An additional benefit that I gained, and continue to gain from my consul-
tant peers and through professional development, which we engage in bi-
weekly, is how to leverage the use of technology for both face-to-face tutoring
and for continued instruction outside of the consultations. The need for continued
instruction cannot be overstated when it comes to pronunciation instruction
– gains in pronunciation often require the breaking down of some speech
habits while simultaneously building new ones. Quite often, this is a matter of
first language influence (though it is certainly not limited to it), where students
are using the speech features of their native language in the L2. To break this
cycle, students require much more instructional time than the one-hour consul-
tation can give.
Therefore, by implementing technological resources (such as English Ac-
cent Coach, YouGlish, or dozens of other tools), we can assign ‘homework’ for
our students to help them continue their learning at least a little bit every day.
This continual learning strategy typically results in more automatized speech
though it does require additional effort on the part of the student. Likewise, this
approach with technology both appeals to most students and provides opportunities
for them to encounter language (e.g., grammar, vocabulary) that they
may otherwise not be aware of. For example, using a web page for consonant clus-
ters (e.g., https://usefulenglish.ru/phonetics/practice-consonant-clusters) can ex-
pose students not only to common consonant clusters in English, but also to new
vocabulary items, which can help learners develop new ways to express them-
selves as well as concepts and ideas that are relevant to their field or discipline.”
6.2 Ali
“One of the qualities I looked for in a graduate assistantship was the opportu-
nity to support academic communication. Over the first few years of my gradu-
ate studies in Applied Linguistics, I had taught or tutored students on academic
writing skills. Regardless of their level, discipline, and first language back-
ground, I would always sense their need for oral communication support. With
the opportunity of involvement in the development of the ESC model and train-
ing, I felt empowered to examine and analyze oral communication needs of
graduate students in depth and to understand what types of support can best
address these needs. As I started offering consultations, Type I consultations
were easier for me. Since the student would come in with a specific task, needs
analysis seemed to be a much more straightforward process. Type II consultations,
however, required me to draw heavily on my ESL educator knowledge to
identify the different aspects of English speaking and pronunciation that a stu-
dent needed to focus on. At times, choosing or designing activities that targeted
specific linguistic features such as a vowel/consonant, or a prosodic feature
such as thought grouping, could be challenging. Many times, students had con-
cerns about their “fluency,” without recognizing the numerous factors that play
a role in this broad concept. My ESC training and ESL educator knowledge in
general and in pronunciation teaching in particular would help me explain con-
cepts such as connected speech, shortened vowels in unstressed syllables,
filled/unfilled pauses, speech rate, etc. and walk the student through the steps
of setting priorities and practicing in these areas. As I gained experience work-
ing with students from the same language backgrounds, and as our repertoire
of ESC activities grew larger, I began to feel more confident in finding activities
targeting specific linguistic features.
As someone who enjoys working with individual students more than teach-
ing a class, offering ESCs has been a wonderfully rewarding experience, allow-
ing me the opportunity to connect with graduate students with non-English
language backgrounds (like myself) in ways that no other graduate assistant-
ship would allow me to. I was able to draw on my own experiences of learning
English as a second language and encountering linguistic challenges as a grad-
uate student to understand other students’ challenges, especially those very
new to living and studying in a dominantly English-speaking country. In meet-
ing with a student on a regular basis for Type II consultations, we would have
informal conversations about difficulties in interpersonal communication and
sometimes chat about awkward positions we had found ourselves in, because
of not understanding a joke or other reasons related to the cultural differences
between our country of origin and the US. Sharing such experiences and later
laughing about such memories together were valuable to me and motivating for
the student. Many of the students I worked with had not been in the country
long enough to establish relationships with anybody other than a few friends
who spoke the same native language as theirs. ESCs would allow them to de-
vote at least an hour every week to speak in English, both conversationally and
in practicing actual academic oral communication tasks.
Working as an English-speaking consultant also prepared me for a career
in supporting academic communication needs of students, with a holistic view
on academic communication that accounts for not only written communication,
but also oral communication and the skills needed in performing a variety of
tasks such as teaching as a graduate assistant, giving a presentation, holding a
conversation with a colleague, etc. While writing as a common academic com-
munication mode has received extensive attention in both practice and re-
search, speaking and pronunciation are areas that students can always use
more help with. Working as an English-speaking consultant shaped my career
as an ESL specialist to a great extent, but also motivated me to conduct re-
search on particular characteristics of non-native English speech in academic
settings.”
References
Anderson‐Hsieh, Janet, Ruth Johnson & Kenneth Koehler. 1992. The relationship between
native speaker judgments of nonnative pronunciation and deviance in segmentals,
prosody, and syllable structure. Language Learning 42(4). 529–555.
Baker, Amanda. 2014. Exploring teachers’ knowledge of second language pronunciation
techniques: Teacher cognitions, observed classroom practices, and student perceptions.
TESOL Quarterly 48(1). 136–163.
Bent, Tessa, Ann R. Bradlow & Bruce L. Smith. 2007. Intelligibility of non-native speech. In
Ocke-Schwen Bohn & Murray J. Munro (eds.), Language Experience in Second Language
Speech Learning: In Honor of James Emil Flege, 331–348. Philadelphia, PA: John
Benjamins.
Berman, Robert & Liying Cheng. 2001. English academic language skills: Perceived difficulties
by undergraduate and graduate students, and their academic achievement. Canadian
Journal of Applied Linguistics 4(1). 25–40.
Celce-Murcia, Marianne, Donna M. Brinton, Janet M. Goodwin & Barry Griner. 2010. Teaching
Pronunciation: A Course Book and Reference Guide. Cambridge: Cambridge University
Press.
Couper, Graeme. 2017. Teacher cognition of pronunciation teaching: Teachers’ concerns and
issues. TESOL Quarterly 51(4). 820–843.
Derwing, Tracey M. 2019. Utopian goals for pronunciation research revisited. In John Levis,
Charles Nagle & Erin Todey (eds.), Proceedings of the 10th Pronunciation in Second
Language Learning and Teaching Conference, Ames, USA, 2018, 27–35. Ames, IA: Iowa
State University.
Derwing, Tracey M. & Murray J. Munro. 2005. Second language accent and pronunciation
teaching: A research‐based approach. TESOL Quarterly 39(3). 379–397.
Derwing, Tracey M. & Murray J. Munro. 2013. The development of L2 oral language skills in two
L1 groups: A 7‐year study. Language Learning 63(2). 163–185.
Ferris, Dana. 1998. Students’ views of academic aural/oral skills: A comparative needs
analysis. TESOL Quarterly 32(2). 289–316.
Ferris, Dana & Tracy Tagg. 1996. Academic oral communication needs of EAP learners: What
subject‐matter instructors actually require. TESOL Quarterly 30(1). 31–58.
Flege, James Emil & Serena Liu. 2001. The effect of experience on adults’ acquisition of
a second language. Studies in Second Language Acquisition 23(4). 527–552.
Gallego, Juan Carlos. 1990. The intelligibility of three nonnative English-speaking teaching
assistants: An analysis of student-reported communication breakdowns. Issues in
Applied Linguistics 1(2). 219–237.
Hahn, Laura D. 2004. Primary stress and intelligibility: Research to motivate the teaching of
suprasegmentals. TESOL Quarterly 38(2). 201–223.
Hoekje, Barbara & Jessica Williams. 1992. Communicative competence and the dilemma of
international teaching assistant education. TESOL Quarterly 26(2). 243–269.
Hubbard, Philip. 2013. Making a case for learner training in technology enhanced language
learning environments. CALICO Journal 30(2). 163–178.
Im, Jiyon & John Levis. 2015. Judgments of non-standard segmental sounds and international
teaching assistants’ spoken proficiency levels. In Greta Gorsuch (ed.), Talking Matters:
Research on Talk and Communication of International Teaching Assistants, 113–142.
Stillwater, OK: New Forums Press.
Jenkins, Jennifer. 2002. A sociolinguistically based, empirically researched pronunciation
syllabus for English as an international language. Applied Linguistics 23(1). 83–103.
Kang, Okim. 2010. ESL learners’ attitudes toward pronunciation instruction and varieties of
English. In John Levis & Kimberly LeVelle (eds.), Proceedings of the 1st Pronunciation
in Second Language Learning and Teaching Conference, Ames, USA, 2009, 105–118.
Ames, IA: Iowa State University.
Kim, Soonhyang. 2006. Academic oral communication needs of East Asian international
graduate students in non-science and non-engineering fields. English for Specific
Purposes 25(4). 479–489.
Lee, Junkyu, Juhyun Jang & Luke Plonsky. 2015. The effectiveness of second language
pronunciation instruction: A meta-analysis. Applied Linguistics 36(3). 345–366.
Levis, John M. 2005. Changing contexts and shifting paradigms in pronunciation teaching.
TESOL Quarterly 39(3). 369–377.
Levis, John M. 2018. Intelligibility, Oral Communication, and the Teaching of Pronunciation.
Cambridge: Cambridge University Press.
Levis, John M. 2020. Conversations with experts – In conversation with John Levis, Editor of
Journal of Second Language Pronunciation. RELC Journal 52(3). 1–14.
Levis, John M. & Tim Kochem. In press. Pronunciation tutoring as teacher preparation. In
Veronica G. Sardegna & Anna Jarosz (eds.), English pronunciation teaching: theory,
practice, and research findings. Bristol: Multilingual Matters.
Lightbown, Patsy & Nina Spada. 2006. How Languages are Learned, 3rd edn. New York, NY:
Oxford University Press.
Liu, Jiang. 2020. A combination of metalinguistic instruction and task repetition in teaching
Chinese prosody. In Okim Kang, Shelley Staples, Kate Yaw & Kevin Hirschi (eds.),
Sereno, Joan, Lynne Lammers & Allard Jongman. 2016. The relative contribution of segments
and intonation to the perception of foreign-accented speech. Applied Psycholinguistics
37(2). 303–322.
Spada, Nina & Yasuyo Tomita. 2010. Interactions between type of instruction and type of
language feature: A meta‐analysis. Language Learning 60(2). 263–308.
Subtirelu, Nicholas Close. 2017. Students’ orientations to communication across linguistic
differences with international teaching assistants at an internationalizing university in
the United States. Multilingua 36(3). 247–280.
Swan, Michael & Bernard Smith. 2001. Learner English: A teacher’s guide to interference and
other problems. Cambridge: Cambridge University Press.
Thomson, Ron I. & Tracey M. Derwing. 2015. The effectiveness of L2 pronunciation instruction:
A narrative review. Applied Linguistics 36(3). 326–344.
Yanagi, Miho & Amanda A. Baker. 2016. Challenges experienced by Japanese students with
oral communication skills in Australian universities. TESOL Journal 7(3). 621–644.
Zielinski, Beth W. 2008. The listener: No longer the silent partner in reduced intelligibility.
System 36(1). 69–84.
Ilvi Blessenaar, Lizet van Ewijk
Putting participation first: The use
of the ICF-model in the assessment
and instruction of L2 pronunciation
Abstract: L2 pronunciation training should unequivocally be linked to complex
daily life experiences (Derwing 2017). Each client comes from a different back-
ground, participates in a different environmental context and engages in different
activities within those contexts (Threats 2008). This is a particularly challenging
aspect in the L2 practice (Derwing 2017). The International Classification of Func-
tioning, Disability and Health, also known as the ICF-Model (WHO 2001, 2013),
offers a conceptual framework that acknowledges the intricate dimensions of
human functioning and incorporates personal and contextual factors that can
influence participation in daily life (Heerkens and de Beer 2007; Ma, Threats,
and Worrall 2008). This paper provides an exploration of the application of
this model to pronunciation and intelligibility difficulties in L2 learning. We
apply the model to a specific L2 learner, Mahmout, and demonstrate how its use
allows for consideration of factors much broader than the phonological or phonetic
challenges Mahmout faces. Mahmout must be able to generalize what
he has learned into functional communicative competences to improve his participation.
The ICF-model (WHO 2001, 2013) is used globally in a broad array of
healthcare professions, including that of Speech and Language Therapists (SLTs). Yet,
it is not a customary tool, nor probably an obvious one, used by L2-professionals
(Blake and McLeod 2019). Of course, our goal is not to classify pronunciation
problems of L2 learners as disabilities. The model proves a useful tool to view the
individual L2 learner as a whole, and part of a larger system. It may allow L2
professionals to tailor their intervention to the individual’s needs and situation
and, consequently, to establish priorities in instruction that enable
appropriate goal setting for each individual (Blake and McLeod 2019). It allows
identification of influencing barriers or facilitating factors within the stagna-
tion or improvement of pronunciation (Blake and McLeod 2019; Howe 2008).
https://doi.org/10.1515/9783110736120-008
1 Introduction
The International Classification of Functioning, Disability and Health (ICF) was
introduced by the World Health Organisation (WHO) in 2001, with two main
goals: to offer a conceptual framework for health and health-related states, and
to create a common language for researchers, clinicians, educators, and policy
makers. The introduction of the ICF was a milestone towards more holistic and
person-centred care, as it offers a biopsychosocial perspective on health and in-
cludes the influence of personal and environmental factors. It offers a philoso-
phy, a way of acknowledging the complex dimensions of human functioning
and its interaction with its environment. It assists in comprehensively describing
a person’s individual functioning profile, which in turn helps to better understand
the person’s specific needs.
With the introduction of the ICF, WHO provided a new perspective on the
terms health and disability, acknowledging that all people can and will experience
some level of ‘disability’ in their life. The ICF aims to be universally applicable to
all people, regardless of aetiology. Despite its obvious roots in healthcare, this
approach to health and life opens up possibilities for application to (groups of)
people who may not have poor health status in the biological sense but are
hindered in their ability to function and participate fully in life due to other
(external) challenges. These challenges can span the total breadth of human
experiences and can be related to the person as much as to the system around
the person. The ICF is a social model that regards limitations in functioning as a
socially created problem and not as an attribute of an individual (Cerniauskaite
et al. 2011; Jelsma 2009; Üstün et al. 2003). In other words, if society, the environment,
were maximally adjusted, the person would not experience limitations.
If an L2 learner lived in a community that was completely accepting
of linguistic and cultural differences, the L2 learner would experience far
fewer constraints on his/her functioning in a new language context.
The ICF model (ICF) is used globally in a broad array of healthcare profes-
sions, yet it is neither a customary tool, nor probably an obvious one, in the field
of L2 learning and teaching around the world. “Pathologizing” L2 speech is a
harmful and unwanted practice in the broad field of L2. The authors strongly
condemn this phenomenon and the discrimination associated with it and do not
wish to contribute to it in any way. In fact, the aim and philosophy of the model
and the WHO’s core value are quite the opposite of pathologizing: “equity, inclusion
and the aim of all to achieve a life where each person can exploit his or her
opportunities to the fullest possible degree” (WHO 2002). The application of the
model in the L2 context contributes towards placing pronunciation at the heart
2 What is ICF?
2.1 A little bit of history
The first attempt to classify and describe consequences of health and health-
related experiences was the development of the International Classification of
Impairments, Disabilities, and Handicaps (ICIDH) in the 1980s. This classifica-
tion system already aimed to advance the idea that health is much more than
the absence of illness. This system, however, did not reflect the (influence of
the) complex interrelations and interactions between various factors in people’s
lives (Ma, Threats, and Worrall 2008). In 1993 the WHO started developing the
ICF after numerous field trials and consultations. All 191 WHO Member States
in the Fifty-fourth World Health Assembly officially endorsed it on 22 May 2001
to be used in their policymaking, and scientific standardisation in research,
planning, and care. By doing so, the WHO shifted the perspective on health
from cause (illness, disability, handicap) to impact (WHO 2002). It also reso-
nates with the current WHO definition of health (1948), which describes health
as “a state of complete physical, mental and social well-being and not merely
the absence of disease or infirmity” (WHO 1948).
The recently suggested update of this definition (Huber et al. 2011), which conceptualizes
health as “the ability to adapt and self-manage in the face of social,
physical, and emotional challenges”, takes the concept of health even further
away from the biomedical approach to health. With its six dimensions ranging
from bodily functions to the spiritual/existential dimension, health is clearly
seen as something much broader than the absence of disease.
We can define the ICF as a universal, neutral and social model. It describes
domains of functioning applicable to every human being. In line with WHO’s
core value, it applies to all people irrespective of their culture, health condition,
gender or age. It espouses a neutral perspective. Ultimately, the ICF is about
people and its premise is that of focussing on the positive abilities of the indi-
vidual (Cerniauskaite et al. 2011; Üstün et al. 2003; WHO 2002).
What exactly does the ICF-model entail? The model is composed of three do-
mains (Figure 1): ‘Body Structures and Functions’, ‘Activities and Participation’
and ‘Contextual Factors’. The ICF categories were defined using neutral
language without negative connotations, so that they can indicate neutral aspects of
health and health-related states under the umbrella term of functioning.
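[Figure 1: The ICF model, with the domains ‘Body Structures and Functions’, ‘Activities and Participation’ and ‘Contextual Factors’]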
The starting point within the ICF philosophy is always the perspective of the per-
son him- or herself. This means that the ‘formulated wants, needs and goals’ are
the starting point of any conversation and possible intervention. The diagram
identifies the three levels of human functioning classified by ICF: functioning at
the level of the physical body, the whole person, and the whole person in a social
context. The five constructs of body structures/functions, activities, participation,
environmental factors, and personal factors are identified, and bidirectional ar-
rows represent the interactions among the different components, reflecting the
ongoing influence of environmental factors on body functions, activities, and
participation, and vice versa (WHO 2001; WHO 2002). To illustrate the model on
a basic level, we use the examples of a broken leg and stuttering in Table 1:
Table 1: Examples of a broken leg (A) and stuttering (B).

Contextual factors: Personal factors
  Example A: Woman (A), years old, store manager, good overall health, likes sports. Motivated for physical therapy.
  Example B: Man (B), years old, African descent, works in construction, introvert.

Level of the physical body: Body functions and structures
  Example A: The bone in the right upper leg is broken in places.
  Example B: Mild to severe stutter, severity increases with stress.

Level of the whole person: Activities
  Example A: This means she cannot walk, run, drive, ride her bike, play sports, jump, etc.
  Example B: This means he experiences trouble with speaking, especially with strangers or on the telephone.

Level of person in social context: Participation
  Example A: Because she cannot walk, drive, ride her bike, she is not able to go to work. Because she cannot run and jump, she cannot play tennis or go hiking. She experiences participation problems because of her broken leg.
  Example B: He wants to start his own contracting business, but his stutter makes him insecure. He will have to talk to a lot more people. He does not know if he will be able to secure clients.
These examples show the bidirectional influence between the various lev-
els of the ICF. The fact that example A has a broken bone hinders work and
hobbies for this particular client and, therefore, participation in society. A sup-
port network, on the other hand, might facilitate recovery. In example B, stut-
tering severity increases for person B in stressful situations. This influences his
choice to start a new business, a life event associated with many stressors. Fur-
thermore, cultural attitudes might negatively affect progress.
Around the world, different titles are used for comparable professions: In Europe, the title
‘Speech and Language Therapist’ is more common, while in North America ‘Speech-Language
Pathologist’ is the most common term. We wish to include all variations here.
However, when relating the definition above to L2 learning, one can only conclude
that this definition is very applicable to L2 learners. The aetiology of possible bar-
riers in activities and participation is of little importance, if we focus on intelligi-
bility, functioning and participation.
In the following sections, we address the aspects of the model that are most
relevant to L2 pronunciation. We will discuss the various aspects of the ICF and
describe its constructs and their use more closely. In Figure 2 different domains
and corresponding paragraphs are referenced.
[Figure 2: ICF domains and the corresponding paragraphs of this chapter]
Body Structures and Functions are at the base of spoken communication. This
is true for every individual. Body structures are defined as “anatomical parts of
the body, such as organs, limbs and their components” (WHO 2002) and are
less relevant in the context of L2 learning, as they are generally considered to
be intact. Body Functions, on the other hand, are defined as “the physiological
functions of body systems, including psychological functions” (WHO 2002).
The Practical Manual for using the ICF (WHO 2013) states that the production
comprehensibility² has been shown repeatedly (Crowther et al. 2015a, 2018). The
discrepancy between learners’ abilities in the classroom and their experiences in
daily life can be significant (O’Halloran and Larkins 2008). For example, being
proficient in a small classroom with well-known peers in a controlled exercise
that focuses on the production or perception of one sound in high-frequency
words (capacity) has little predictive value for proficiency in using that speech
sound in low-frequency words in a conversation with a stranger, or an authority
figure at work (performance). Activities range from basic to complex along a continuum,
with controlled and targeted tasks on one end, moving toward multidimensional
complex activities on the other end.
Participation constitutes the “involvement in a life situation”, which implies
a role in society and entails choice and judgement (O’Halloran and Larkins
2008). It therefore by definition deals with the ‘performance’ of the learner and always
incorporates all elements that are essential for successful communication.
Potential problems in speech sound production or perception that were identified
in ‘body functions’, which were possibly also present in the capacity and/or per-
formance of ‘activities,’ are now only a small fragment of the whole picture.
This also explains discrepancies between the client’s and the professional’s perspectives
in the assessment of proficiency. The level of capacity of ‘activities’ is
generally judged by the professional (and client), whereas the level of perfor-
mance can only be assessed by the person themselves (or a proxy). For example,
the professional can judge an individual’s intelligibility to be moderate to good
on a Likert-scale or score a speech sound as ‘correct’. The L2 professional as-
sesses speech in an ‘ideal’ situation with all the knowledge, skills, expertise and
experience they have. The L2 learner, on the other hand, may qualify his/her
own intelligibility as insufficient. When queried, the L2 learner will likely talk
about a communicative situation in which he/she was not understood or in
which a misunderstanding occurred because of his/her speech (intelligibility in
context). In summary, the distinction between capacity and performance pro-
vides the L2 professional with complementary information. Even when the client
is able to produce speech sounds correctly under certain circumstances (capac-
ity), this does not necessarily translate to the ability to use these sounds in real
life situations (performance). Furthermore, addressing ‘performance’ does justice
to the fact that it is overall intelligibility that is critical for communication, which
is not necessarily directly or unequivocally the result of the identified unacquired
patterns in perception or production, as identified in ‘body functions’ (Derwing
and Munro 2009; Munro and Derwing 2009). In this example, even if the learner
has difficulties with a particular segmental contrast both in class (capacity) and
in real life (performance), this contrast may have a very limited functional load and
may well be of limited influence on the learner’s overall intelligibility (Munro
and Derwing 2006; Suzukida and Saito 2021). This contrast is therefore much less
useful to work on if we focus on participation.
² Comprehensibility is defined by Derwing and Munro (2009) as the listener’s perception of
how easy or difficult it is to understand a given speech sample (Derwing and Munro 2009: 4).
The ICF is not the only model that aims to reveal interacting domains or fac-
tors. Parallels between the core principles of ICF and Dynamic System Theory
(DST), the science of complex systems, have been made for numerous areas in
healthcare (Andrews 1996; Beckman, Fernandez, and Coulter 1996; Fannin
2016; McDougall, Wright, and Rosenbaum 2010). In fact, George Engel (1977),
who is considered the founder of the principles of the ICF, has from the outset
established the relationship between the holistic biopsychosocial model and
systems theory. De Bot, Lowie, and Verspoor (2007) applied the principles of
DST to L2 and argue that a DST approach may help us to develop a more realis-
tic representation of L2 development than other linguistic theories. DST de-
scribes that cognitive, social, and environmental factors continuously interact,
resulting in the emergence of creative communicative behaviours. DST describes
language development as a process that takes place through interaction
between the individual and their environment. After all, the main purpose of
language is participating in social experiences. DST implies that focussing on
only one aspect of this process cannot but provide an oversimplification of reality.
Only when we consider the dynamic interaction of all factors are we
able to appreciate the actual complexity of the process (de Bot, Lowie, and Verspoor
2007; Verspoor 2013). DST and the ICF have in common that they reflect
multidimensionality, a holistic approach, non-linearity and perpetually chang-
ing human circumstances. ICF attempts to tease apart the multidimensionality
by the use of interacting domains, recognising that the whole is greater than
the sum of its parts (McDougall, Wright, and Rosenbaum 2010). Thus, we sug-
gest that ICF could be a way of translating DST principles into daily practice
and create a realistic representation of the life of an L2 learner. By doing so,
the gap between the theoretical and highly conceptual DST and the way this
translates into daily practice for L2 professionals could be addressed. Further
exploration of this application and empirical research is of course necessary.
The role of SLTs in the L2 field is controversial (Grant 2014; Muller, Ball, and
Guendouzi 2000; Schmidt and Sullivan 2003). The problematic role of commerce
in ‘accent modification’ and ‘accent reduction’, and the medicalisation of
normal language learning processes (Derwing and Munro 2009), have unfortunately
muddied the waters of L2 pronunciation instruction for SLTs. Several researchers
have already shared some valid ethical considerations on this matter
(Derwing et al. 2014a; Thomson and Foote 2019). This controversy has overshadowed the potential
of collaboration between SLTs and L2 trainers and the possibilities of
capitalising on each other’s strengths. The knowledge of SLTs on the impact of
communication difficulties on daily functioning, knowledge on phonetics and
phonology, therapeutic techniques and skills could be a very useful addition to
the research and daily practice concerning L2 learners. Applied linguists and L2
instructors, teachers and educators, on the other hand, have tremendous expertise
in and experience with L2 acquisition, L2 learning processes, L2 classroom
dynamics and didactics, theoretical frameworks, etc. A combination of these
strengths could benefit the assessment and training of pronunciation and in-
telligibility of L2 learners greatly. In essence, the cause of intelligibility prob-
lems is of secondary importance, if we focus on intelligibility, functioning and
participation of the L2 learner and look at expertise to achieve this.
By referring to the term ‘L2 professional’ throughout this chapter, we wish
to focus on collaboration and expertise and move away from (self-inflicted) pro-
fessional boundaries. From the perspective of the L2 learner, it does not matter
who can help them improve their intelligibility, as long as it helps them live a
fulfilling life in a society and it is done in an ethical way.
Furthermore, as the issue of immigration is a permanent fixture in current
affairs and the spotlight globally remains on integration and participation of im-
migrants in society, there continues to be a high demand for L2 instruction in
general and more specifically on pronunciation and intelligibility (Blake, Knee-
bone, and McLeod, 2017; European Commission 2021; Verbakel, van den Brink,
and Groot 2020). We simply cannot afford to shy away from interprofessional col-
laboration. As the ICF allows for the use of a standard set of internationally rec-
ognized terms, it is ideally suited for broad use. This facilitates cooperation with
other professionals greatly and provides a much broader view on functioning, in-
tegration, and participation in society. It has also been translated into numerous
languages and as a result offers a solid base for international collaboration, both
in research and in practice.
Table 2: Example questions from the LONT interview, organised by ICF domain (Blessenaar et al. 2018).

Domain: Example question
Body functions: Can you describe what bothers you in your pronunciation of Dutch?
Activities: How well can you understand people who speak Dutch?
Participation: When do you experience the most difficulties speaking Dutch?
Personal factors: How important is being intelligible in Dutch for you?
Environmental factors: Are you always well understood when you speak Dutch?
For information on the LONT assessment (the SLT assessment protocol for Dutch as L2, “Logopedisch
Onderzoeksprotocol NT2”, Blessenaar et al. 2018), please contact the authors:
ilvi.blessenaar@hu.nl, lizet.vanewijk@hu.nl.
In the third part, phonemes at the word level, prosody at the lexical level
and at the discourse level can be assessed, based on the results of the screening.
The LONT assessment can be conducted repeatedly to monitor progress (Blesse-
naar et al. 2018). Furthermore, it allows for dynamic assessment, using cues to
probe the clients’ response to instruction. There are no normative data, as the
assessment does not strive towards comparison between clients.
Secondly, by determining which relationships exist and how they interact,
the L2 professional and the L2 learner can identify the aspects of speech that are
most detrimental to intelligibility and thus set priorities in training for that spe-
cific L2 learner. Achievable goals are set together, to work towards intelligible
conversational speech within a person’s environment, with the possibility to
evaluate these goals in detail over time. This way the L2 professional will be able
to formulate a realistic prognosis that includes relevant barriers and facilitators
and give necessary recommendations. The ICF and, more specifically, the rela-
tionships between the domains are reflected in the goals that are formulated, the
priorities that must be set, and the training means that need to be chosen.
This means that, in short, training is designed to capitalize on strengths and ad-
dress weaknesses, to facilitate the individual’s activities and participation by as-
sisting the person to acquire new skills and strategies; and to modify contextual
factors to reduce barriers and enhance facilitators of successful communication
and participation (American Speech-Language-Hearing Association 2004).
Thirdly, L2 professionals are of course aware of the individuality of learners
without the use of the ICF. They understand that each L2 learner comes from a
different background, has had different experiences, and encounters different
communicative activities. The diversity among L2 learners is huge, ranging from exchange
students who stay in a country for a semester and spend their time with
other international students at university, to someone who moves across the
world for love for the rest of their life and learns to integrate into a new family,
to refugees hoping to return to their home when wars are over. All these circumstances
bring about very different starting points, influences, and motivations
in the process of learning a language. However, research shows that this
knowledge does not automatically translate into the way L2 professionals adapt
their choices or actions to best suit the learners’ needs (Cormack and Worrall
2008). The ICF could provide L2 professionals with a framework to translate this
knowledge and its implications into instruction, and force them to ensure that L2
instruction is directly related to real life.
To summarize, because the ICF considers functional implications of intelligi-
bility, it can contribute to an improvement of L2 learners’ communication in their
everyday environment. When attention is paid to the multifaceted character of
functioning, a more tailored and therefore effective training can be designed
(WHO 2001). The ICF provides additional areas of consideration to enable appro-
priate goal setting for each individual: it considers limitations and social factors,
ultimately to bring about change in the lives of people. After all, each L2 learner
comes from a different background, participates in a different environmental
context, and engages in different activities within those contexts (Threats 2008).
As for the L2 learner, Mahmout L. is a 28-year-old Syrian man looking for help
to improve his intelligibility in Dutch. He formulated his wishes for pronuncia-
tion improvement in the following way:
Dutch people often don’t understand me. I want to improve my speech, because I want to be
a teacher again.
4.1 Assessment
During the initial session, Mahmout was assessed using LONT (Blessenaar et al.
2018). The first part of this protocol consists of a screening of spontaneous speech
to determine which aspects of speech influence intelligibility the most (A). It also
contains an extensive topic list for an interview to map out environmental factors (EF),
personal factors (PF) and the possible challenges in relation to activities and par-
ticipation (A&P). The second part consists of an assessment of vowels, diphthongs,
consonants, clusters and of suprasegmentals, such as word stress, intonation, and
rhythm (BF).
LONT results
Mahmout is a 28-year-old former high school biology teacher (PF) born and
raised in Aleppo in Syria (PF). He arrived in the Netherlands in 2016, after he
fled Aleppo on his own, because he feared for his life and his family’s (EF). He
left his wife behind with the intention of applying for family reunification when
he arrived in Europe (EF). During this journey, he experienced several traumatic
incidents: near drowning, violence, fear of border patrol and starvation (PF). He
stayed at an immigration center for one year, but now has permanent resi-
dency and his wife joined him in the Netherlands in 2018 (EF). Now, he works
part-time as a computer consultant, while he tries to improve his Dutch to be
able to get into a teacher-training program (EF). Their financial and housing situ-
ation is unstable and precarious (EF). There are no relevant medical issues (PF).
Mahmout speaks Syrian Arabic at home and a lot with friends and fam-
ily (over the phone). He watches Syrian television and CNN and reads a lot of
English (EF). English was his second language and is better than his Dutch (PF). His
proficiency in Dutch is at a B1 level⁴ (Council of Europe 2001, 2020) (PF).
When asked ‘can you explain what made you decide to seek help?’ Mahmout
elaborated candidly and comprehensively and was perceived as an outgoing
person who was not afraid to speak Dutch (P). He mentioned several examples
of activities in his daily life during which he experienced limitations: recurring
miscommunications with strangers, acquaintances, and friends, as well as fre-
quent problems during phone calls and difficulties in group conversations. He
also reported finding some speech sounds in Dutch very difficult, because
they do not exist in Syrian Arabic (A). He is highly motivated and decided to seek
help himself (PF).
Mahmout also described (possible future) problems in his participation in
society. He missed a promotion at work because of his limited intelligibility and
fears he will not be accepted into teacher training next year. Mahmout eventually
wants to function as a Dutch-speaking professional. Mahmout describes
feeling limited in his social abilities because of his intelligibility. He would like
to make more meaningful connections to Dutch people (P).
⁴ The CEFR is an international standard for describing language ability. It describes language
ability on a 6-point scale, starting at A1 (Beginners) going up to C2 (Mastery). It defines 5 skills
on every level: Listening, Reading, Spoken Interaction, Spoken production, Writing. B1 is defined
as intermediate, independent user: Can understand the main points of clear standard
input on familiar matters regularly encountered. Can deal with most situations likely to arise
where the language is spoken. Can produce simple connected text on topics which are familiar
or of personal interest. Can describe experiences and events, dreams, hopes and ambitions
and briefly give reasons and explanations for opinions and plans. (Council of Europe. Council
for Cultural Co-operation. Education Committee. Modern Languages Division 2001; Council of
Europe 2020).
In his spontaneous speech, considerable influence from English and Arabic occurred
(BF). In addition, he persistently put the word stress on the first syllable,
and he showed inconsistent rhythm and intonation patterns. His intelligibility
was impacted by limited articulatory movement, a fast speech rate, and weak
general articulation skills (BF). He scored a 3 on a 5-point scale for overall
intelligibility, which corresponds to ‘moderately intelligible’ (A).
The formal assessment of segments showed significant segmental mistakes,
mostly on vowels and diphthongs, both on the word level and on the level of
spontaneous speech (BF). The Dutch diphthong /œy/ and vowel /ø/,
which do not exist in Arabic, were substituted by vowels and diphthongs that
do exist in his mother tongue.
For example:
– Dutch huis /hœys/ ‘house’ was pronounced [haʊs]
– Dutch deur /døːr/ ‘door’ was pronounced [dɔr]
Based on the ICF (Figure 3) and through co-creation, a training plan was made
together with Mahmout. This plan formulates goals, means, priorities and rec-
ommendations as a direct result of the relationships between the aspects within
the ICF and existing evidence in research. The goal below was formulated at
the level of participation:
In 4 months’ time, Mahmout is able to clearly convey a complex message in Dutch on the
phone and in conversations, without using English and he feels confident doing so.
This goal was formulated based on all the relevant information collected within
the ICF framework. The training plan lists the following priorities:
– contrast /œy/ - /ɑu/ and production of /œy/
– contrast /o/, /ø/- /ɔ/ and production of /o/, /ø/
– contrast /ɛɪ/ - /aːi/ and production of /ɛɪ/
– word stress
– improvement of general articulation skills
The choice of these priorities is research-based (Derwing and Munro 2005; Grant
2014; Levis 2016) and rests on five principles. First, we focused on both segmental
and suprasegmental aspects of Mahmout’s speech. Segmental and suprasegmental
errors contribute at least equally to intelligibility; moreover, the co-occurrence of both
types of errors can exacerbate intelligibility difficulties (Caspers 2009, 2010; Gordon
and Darcy 2016). Both a global and a segmental approach received attention, as
Mahmout’s general articulation skills were weak (Derwing, Munro, and
Wiebe 1998). Secondly, in order to improve production skills, perception exercises
were included for the selected contrasts (Derwing and Munro 2005; Lee,
Jang, and Plonsky 2015; Sakai and Moorman 2018). Thirdly, there was a focus on
form in the initial stages of addressing a contrast (Gordon and Darcy 2016;
Thomson and Derwing 2015), but this focus was quickly integrated with meaning
and context relevant to Mahmout. We provided authentic practice material to attain the
ultimate goal of intelligible spontaneous speech (Darcy 2018; Levis 2005). We
chose the above-stated contrasts because the analysis of the LONT assessment gave
us a clear picture of which features were most detrimental to Mahmout’s intelligibility.
Additionally, research on the frequency of Dutch segmentals
and on the segmentals most important for intelligibility and comprehensibility informed
these choices (Luyckx et al. 2007; Neri, Cucchiarini, and Strik 2006). For
example, improving vowels and diphthongs has a higher priority
than improving consonants for intelligibility in spontaneous Dutch speech (Neri,
Cucchiarini, and Strik 2006). The fourth and fifth principles concerned the role
of explicit corrective feedback (Kissling 2013; Lee and Lyster 2017; Saito and Lyster
2012) and self-monitoring (Pawlak and Szyszka 2018).
Mahmout’s self-awareness (PF) and high level of intrinsic
motivation (PF) can be considered facilitating factors in the prognosis. He is
also very invested in Dutch society (EF). His unstable economic situation (EF),
his currently limited exposure to Dutch (EF) (Gurer
2019), and the considerable amount of time he spends with a third language
(English) (EF) can be considered barriers in the prognosis. Additionally, the
presence of traumas (PF) can potentially create barriers (Schick et al. 2016).
The following recommendation was discussed with Mahmout: his contact
with (conversational) Dutch should be increased (Gurer 2019). This could be
achieved by signing up for a mentor program that matches volunteers with L2
learners to enhance their opportunities to practice conversational Dutch in a
daily setting. We also recommended that he (temporarily) limit his exposure to
English and, for example, watch Dutch television (Derwing 2018; Piske, MacKay,
and Flege 2001).
The training consisted of authentic exercises (task based, cf. Gordon 2021)
with a focus on intelligibility and on applying what was learned during instruc-
tion in daily life. He was urged to practice on a regular basis. During the course
of training, his expectations of qualifying to enter the teacher-training program should
be discussed and possibly be adjusted if he does not meet the criteria for C1
level.⁵
5 The CEFR is an international standard for describing language ability. It describes language
ability on a 6-point scale, starting at A1 (Beginners) and going up to C2 (Mastery). It defines 5 skills
at every level: Listening, Reading, Spoken Interaction, Spoken Production, Writing. C1 is defined
as advanced user: Can understand the majority of spoken language, even when less structured.
Can express him/herself fluently, including metaphorical language. Can produce detailed
descriptions about complex subjects, formulate specific points of view and round off with an
appropriate conclusion. Can use the language flexibly and effectively socially and professionally.
Can accurately articulate ideas and opinions and skillfully contribute to a conversation (Council
of Europe 2001, 2020).
By doing so, we worked towards the goal we set together with Mahmout.
Mahmout’s motivation only grew during the course of this intervention, and
he reported feeling increasingly confident speaking Dutch (PF). Overall, he
indicated he was much more aware of when and where he had to pay extra at-
tention to his intelligibility and how to actively influence it. He reported using
self-correction a lot more frequently in daily situations (A) and a second LONT
assessment after 16 weeks indicated that the number of segmental and supra-
segmental mistakes decreased significantly, on word level (BF) as well as in
spontaneous speech (A). At work, his superiors also noticed a clear improve-
ment in meetings and calls with clients: his intelligibility and comprehensibil-
ity increased (P). The fact that Mahmout had asked for feedback from his
colleagues turned out to provide a great new social opportunity to connect with
his co-workers (P). In addition, he gained new Dutch contacts through the men-
tor program (EF) and he indicated his time spent speaking Dutch drastically in-
creased (EF). He started watching Dutch singing competitions instead of English
ones and grew to be a huge fan of a famous Dutch soap (EF).
Looking back on the goal we set together:
In 4 months, Mahmout is able to clearly convey a complex message in Dutch on the phone
and in conversations without using English and he feels confident doing so.
Mahmout himself stated that he reached this goal and quickly formulated a
new one for himself:
“I want to apply the learned techniques in my daily life and further improve my intelligibil-
ity to become a biology teacher.”
5 Conclusion
This case demonstrates how the ICF was used as a tool to determine functional
goals for L2 intervention (Threats 2006). Using the model, we demonstrated
how goals could be set that are relevant and attainable for the individual,
Mahmout (Blake and McLeod 2019). He was able to set a realistic goal and significantly
improve his intelligibility in all aspects of his life. The concepts of
capacity (intelligibility in a simple context such as in structured classroom ex-
ercises), and performance (intelligibility in real life situations) provide insight
into the well-known conundrum in L2 instruction: transfer of knowledge and
skills to daily life.
We have touched upon the overlap between ICF and Dynamic Systems The-
ory and argued that the ICF model could be a way of manifesting DST principles
and translating theory into daily practice. In summary, the ICF could make a
useful addition to the tools L2 professionals have to consider the complex rela-
tionships between learner characteristics, circumstances, goals, attitudes, and
context. We have illustrated its use with a single case, supported by an exten-
sive and growing body of literature (cf. Blake and McLeod 2019). Of course, it
would be beneficial to increase the empirical knowledge on this application by
exploring multiple cases and expanding the research in L2 contexts. Addition-
ally, the ICF could be a catalyst for improved collaboration between different L2
professionals with complementary expertise. That may be one of the missing
pieces of the polymorphous puzzle called L2 pronunciation, which we all
need to complete.
References
American Speech-Language-Hearing Association. 2004. Preferred Practice Patterns for the
Profession of Speech-Language Pathology. doi:10.1044/policy.PP2004-00191.
Anderson-Hsieh, Janet & Kenneth Koehler. 1988. The effect of foreign accent and speaking
rate on native speaker comprehension. Language Learning 38(4). 561–613.
Andrews, James. 1996. Theory and practice in speech-language pathology: A review of
systemic principles. Seminars in Speech and Language 17(2). 97–106. doi:10.1055/
s-2008-1064090.
Baker, Elise, Karen Croot, Sharynne McLeod & Rhea Paul. 2001. Psycholinguistic models of
speech development and their application to clinical practice. Journal of Speech,
Language, and Hearing Research 44(3). 685–702.
Beckman, John F., Charles E. Fernandez & Ian D. Coulter. 1996. A systems model of health
care: A proposal. Manipulative & Physiological Therapeutics 19(3). 208–215.
Blake, Helen L., Laura Bennetts Kneebone & Sharynne McLeod. 2017. The impact of oral
English proficiency on humanitarian migrants’ experiences of settling in Australia.
International Journal of Bilingual Education and Bilingualism 22(6). 1–17. doi:10.1080/
13670050.2017.1294557.
Blake, Helen L. & Sharynne McLeod. 2019. Speech-language pathologists’ support for
multilingual speakers’ English intelligibility and participation informed by the ICF. Journal
of Communication Disorders 77. 56–70. doi:10.1016/j.jcomdis.2018.12.003.
Blessenaar, Ilvi, Emmy van Bommel, Marietta Aprea, Leonoor Oonk & Lizet van Ewijk. 2018.
Logopedisch onderzoeksprotocol NT2 [The SLT assessment protocol of Dutch as L2].
Utrecht: Hogeschool Utrecht.
Caspers, Johanneke. 2009. The perception of word stress in existing and non-existing Dutch
words by native speakers and second language learners. Linguistics in the Netherlands
26(1). 25–38. doi:10.1075/avt.26.04cas.
Caspers, Johanneke. 2010. The influence of erroneous stress position and segmental errors on
intelligibility, comprehensibility and foreign accent in Dutch as a second language.
Linguistics in the Netherlands 27. 17–29. doi:10.1075/avt.27.03cas.
Caspers, Johanneke & Katarzyna Horłoza. 2012. Intelligibility of non-natively produced Dutch
words: Interaction between segmental and suprasegmental errors. Phonetica 69(1–2).
94–107. doi:10.1159/000342622.
Cerniauskaite, Milda, Rui Quintas, Christine Boldt, Alberto Raggi, Alarcos Cieza, Jerome
Edmond Bickenbach & Matilde Leonardi. 2011. Systematic literature review on ICF from
2001 to 2009: Its use, implementation and operationalisation. Disability and
Rehabilitation 33(4). 281–309. doi:10.3109/09638288.2010.529235.
Cormack, Jane M. C. & Linda E. Worrall. 2008. The ICF body functions and structures related to
speech-language pathology. International Journal of Speech-Language Pathology
10(1–2). 9–17. doi:10.1080/14417040701759742.
Council of Europe. 2001. The Common European Framework in its political and educational
context 1.1 What is the Common European Framework? Strasbourg: Council of Europe
Publishing.
Council of Europe. 2020. Common European Framework of Reference for Languages: Learning,
Teaching, Assessment: Companion Volume. Strasbourg: Council of Europe Publishing.
Crowther, Dustin, Pavel Trofimovich, Talia Isaacs & Kazuya Saito. 2015a. Does a speaking task
affect second language comprehensibility? Modern Language Journal 99(1). 80–95.
doi:10.1111/modl.12185.
Crowther, Dustin, Pavel Trofimovich, Talia Isaacs & Kazuya Saito. 2018. Linguistic dimensions
of second language accentedness and comprehensibility vary across speaking
tasks. Studies in Second Language Acquisition 40(2). 443–457.
Crowther, Dustin, Pavel Trofimovich, Kazuya Saito & Talia Isaacs. 2015b. Second language
comprehensibility revisited: investigating the effects of learner background. TESOL
Quarterly 49(4). 814–837. doi:10.1002/tesq.203.
Darcy, Isabelle. 2018. Powerful and effective pronunciation instruction: how can we achieve
it? The CATESOL Journal 30(1). 13–45.
de Bot, Kees, Wander Lowie & Marjolijn Verspoor. 2007. A Dynamic Systems Theory approach
to second language acquisition. Bilingualism: Language and Cognition 10(1). 7–21.
doi:10.1017/S1366728906002732.
Derwing, Tracey M. 2003. What do ESL students say about their accents? Canadian Modern
Language Review 59(4). 547–565.
Derwing, Tracey M. 2017. The role of phonological awareness. In Peter Garrett & Josep M. Cots
(eds.), The Routledge Handbook of Language Awareness, 339–354. New York: Routledge.
doi:10.4324/9781315676494.
Derwing, Tracey M. 2018. The efficacy of pronunciation instruction. In Okim Kang, Ron
I. Thomson & John M. Murphy (eds.), The Routledge Handbook of Contemporary English
Pronunciation, 320–334. New York: Routledge.
Derwing, Tracey M., Helen Fraser, Okim Kang & Ron I. Thomson. 2014a. L2 Accent and ethics:
issues that merit attention. In Ahmar Mahboob & Leslie Barratt (eds.), Englishes in
Multilingual Contexts, 63–80. New York: Springer. doi:10.1007/978-94-017-8869-4_5.
Derwing, Tracey M. & Murray J. Munro. 2005. Second language accent and pronunciation
teaching: A research-based approach. TESOL Quarterly 39(3). 379–397.
Derwing, Tracey M. & Murray J. Munro. 2009. Putting accent in its place: Rethinking obstacles
to communication. Language Teaching 42(4). 476–490. doi:10.1017/
S026144480800551X.
Derwing, Tracey M., Murray J. Munro, Jennifer A. Foote, Erin Waugh & Jason Fleming. 2014b.
Opening the window on comprehensible pronunciation after 19 years: A workplace
training study. Language Learning 64(3). 526–548. doi:10.1111/lang.12053.
Derwing, Tracey M., Murray J. Munro & Grace Wiebe. 1998. Evidence in favor of a broad
framework for pronunciation instruction. Language Learning 48(3). 393–410.
Derwing, Tracey M., Marian J. Rossiter & Murray J. Munro. 2002. Teaching native speakers to
listen to foreign-accented speech. Journal of Multilingual and Multicultural Development
23(4). 245–259. doi:10.1080/01434630208666468.
Dragojevic, Marko & Howard Giles. 2016. I don’t like you because you’re hard to understand:
The role of processing fluency in the language attitudes process. Human Communication
Research 42(3). 396–420. doi:10.1111/hcre.12079.
Engel, George Lucas. 1977. The need for a new medical model: A challenge for biomedicine.
Science 196(4286). 129–136. doi:10.1126/science.847460.
European Commission. 2021. “Statistics on migration in Europe”. European Commission.
https://ec.europa.eu/info/strategy/priorities-2019-2024/promoting-our-european-way-
life/statistics-migration-europe_en. (accessed 25 May 2021).
Fannin, Danai Kasambira. 2016. The intersection of culture and ICF-CY personal and
environmental factors for alternative and augmentative communication. Perspectives of
the ASHA Special Interest Groups 12(1). 63–82.
Garcia, Linda J., Chantal Laroche & Jacques Barrette. 2002. Work integration issues go beyond
the nature of the communication disorder. Journal of Communication Disorders 35(2).
187–211.
Gordon, Joshua. 2021. Pronunciation and task-based instruction: Effects of a classroom
intervention. RELC Journal 52(1). 94–109. doi:10.1177/0033688220986919.
Gordon, Joshua & Isabelle Darcy. 2016. The development of comprehensible speech in L2
learners. Journal of Second Language Pronunciation 2(1). 56–92. doi:10.1075/
jslp.2.1.03gor.
Grant, Linda. 2014. Pronunciation myths: Applying Second Language Research to Classroom
Teaching. Ann Arbor: University of Michigan Press.
Gurer, Cuneyt. 2019. Refugee perspectives on integration in Germany. American Journal of
Qualitative Research 3(2). 52–70. doi:10.29333/ajqr/6433.
Gurzynski-Weiss, Laura, Avizia Yim Long & Megan Solon. 2017. TBLT and L2 pronunciation.
Studies in Second Language Acquisition 39(2). 213–224. doi:10.1017/
S0272263117000080.
Hahn, Laura D. 2004. Primary stress and intelligibility: Research to motivate the teaching of
suprasegmentals. TESOL Quarterly 38(2). 201–223. doi:10.2307/3588378.
Heerkens, Yvonne F. & Joost de Beer. 2007. International classification of functioning
disability and health: Gebruik van de ICF in de logopedie. Logopedie en Foniatrie 4.
112–119.
Howe, Tami J. 2008. The ICF Contextual Factors related to speech-language pathology.
International Journal of Speech-Language Pathology 10(1–2). 27–37. doi:10.1080/
14417040701774824.
Huber, Machteld, J. André Knottnerus, Lawrence Green, Henriëtte van der Horst, Alejandro
R. Jadad, Daan Kromhout, Brian Leonard, Kate Lorig, Maria Isabel Loureiro, Jos W. M. van
der Meer, Paul Schnabel, Richard Smith, Chris van Weel & Henk Smid. 2011. How should
we define health? BMJ (Online) 343(7817). 1–3. doi:10.1136/bmj.d4163.
Munro, Murray J. & Tracey M. Derwing. 1995. Foreign accent, comprehensibility, and
intelligibility in the speech of second language learners. Language Learning 45(1).
73–97.
Munro, Murray J. & Tracey M. Derwing. 2006. The functional load principle in
ESL pronunciation instruction: An exploratory study. System 34(4). 520–531.
doi:10.1016/j.system.2006.09.004.
Munro, Murray J. & Tracey M. Derwing. 2009. Putting accent in its place: rethinking obstacles
to communication. Language Teaching 42(4). 476–490. doi:10.1017/S026144480800551X.
Munro, Murray J. & Tracey M. Derwing. 2011. The foundations of accent and intelligibility in
pronunciation research. Language Teaching 44(3). 316–327. doi:10.1017/
S0261444811000103.
Neri, Ambra, Catia Cucchiarini & Helmer Strik. 2006. Selecting segmental errors in non-native
Dutch for optimal pronunciation training. IRAL – International Review of Applied
Linguistics in Language Teaching 44(4). 357–404. doi:10.1515/IRAL.2006.016.
O’Halloran, Robyn O. & Brigette Larkins. 2008. The ICF activities and participation related to
speech-language pathology. International Journal of Speech-Language Pathology
10(1–2). 18–26. doi:10.1080/14417040701772620.
Pawlak, Mirosław & Magdalena Szyszka. 2018. Researching pronunciation learning strategies:
An overview and a critical look. Studies in Second Language Learning and Teaching 8(2).
293–323. doi:10.14746/ssllt.2018.8.2.6.
Piske, Thorsten, Ian R. A. MacKay & James E. Flege. 2001. Factors affecting degree of foreign
accent in an L2: A review. Journal of Phonetics 29(2). 191–215. doi:10.1006/
jpho.2001.0134.
Saito, Kazuya & Roy Lyster. 2012. Investigating the pedagogical potential of recasts for L2
vowel acquisition. TESOL Quarterly 46(2). 387–398. doi:10.1002/tesq.25.
Sakai, Mari & Colleen Moorman. 2018. Can perception training improve the production
of second language phonemes? A meta-analytic review of 25 years of perception training
research. Applied Psycholinguistics 39(1). 187–224. doi:10.1017/S0142716417000418.
Schick, Matthis, Andre Zumwald, Bina Knöpfli, Angela Nickerson, Richard A. Bryant, Ulrich
Schnyder, Julia Müller & Naser Morina. 2016. Challenging future, challenging past: the
relationship of social integration and psychological impairment in traumatized refugees.
European Journal of Psychotraumatology 7(1). 28057. doi:10.3402/ejpt.v7.28057.
Schmidt, Anna Marie & Shannon Sullivan. 2003. Clinical training in foreign accent
modification: A national survey. Contemporary Issues in Communication Science and
Disorders 30(Fall). 125–135.
Suzukida, Yui & Kazuya Saito. 2021. Which segmental features matter for successful L2
comprehensibility? Revisiting and generalizing the pedagogical value of the functional
load principle. Language Teaching Research 25(3). 431–450. doi:10.1177/
1362168819858246.
Thomson, Ron I. & Tracey M. Derwing. 2015. The effectiveness of L2 pronunciation
instruction: A narrative review. Applied Linguistics 36(3). 326–344. doi:10.1093/applin/
amu076.
Thomson, Ron I. & Jennifer A. Foote. 2019. Pronunciation teaching: Whose ethical domain is it
anyways? In John Levis, Charles Nagle & Erin Todey (eds.), Proceedings of the 10th
Pronunciation in Second Language Learning and Teaching Conference, vol. 2018,
213–235. Ames, IA: Iowa State University.
1 Introduction
A long-time concern of second language pronunciation research is learners’
mixed success acquiring certain novel segments of the target language. While
some segments are acquired relatively easily and early, others are acquired
later, or in some cases, not at all (Archibald 2021; O’Brien 2021). One such case
is that of Francophone learners of English /h/, a segment that is frequently
https://doi.org/10.1515/9783110736120-009
230 Susan Jackson, Walcir Cardoso
deleted at all levels of proficiency, even when the other phonemes of English
have been mastered (see e.g., Janda and Auger 1992). As such, h-deletion (indi-
cated by a single quotation mark ‘, as in ‘owever, ‘istory instead of /h/owever
and /h/istory, respectively) is a recognizable feature of French-accented English.
Learners’ difficulty with /h/ is somewhat unusual in that it is neither
a problem of articulation (/h/-insertion is also common) nor necessarily
one of perception: their discrimination of [h]/Ø pairs (e.g., eat and
heat) has been shown to be weaker than that of other contrasts such as [i]/[ɪ] or [t]/[θ],
but still well above chance (e.g., LaCharité and Prévost 1999; Mielke 2008).
Yet, this phenomenon has not been well studied, with only a handful of excep-
tions (see e.g., Janda and Auger 1992; John 2006; LaCharité and Prévost 1999;
Mah 2011).
French has <h> orthographically (<h> represents the letter h as in hour and
hot), but it does not correspond to any phoneme that is overtly realized in the
language. In certain cases, it does have a phonological status as a phantom con-
sonant (Walker 2001) triggering liaison-blocking (i.e., h-aspiré), a phenomenon
that blocks across-word resyllabification of coda-onset sequences such as les hamacs
‘the hammocks’, pronounced [le.a.mak], not *[le.za.mak]. More commonly,
<h> is purely orthographic with no influence on neighboring sounds, so that a
phrase such as les habits ‘the clothes’ undergoes resyllabification and thus is pronounced
[le.za.bi], not *[le.a.bi]. Regardless, <h> is uniformly silent in French
and learners may transfer this knowledge over to their productions of English.
In English, on the other hand, the pronunciation of <h> varies: it is usually
pronounced at the beginning of words with the exception of a handful of loan-
words from French where it is silent, such as in hour or honour, as well as some
dialect-dependent deletions in words such as herb in American English. It is
also pronounced at the head of non-word-initial syllables with primary or sec-
ondary stress (e.g., inherent and alcohol), with certain exceptions in some dia-
lects, for example in the word Nottingham in many British varieties. However, it
is subject to categorical deletion at the head of weak syllables (e.g., vehicle)
and variable deletion in function words (e.g., hers, him, have) when not phrase-initial
or subject to focus. In all other positions, <h> is silent, including when
part of a consonant cluster, e.g., ghost and though. Considering the numerous instances
of <h> being silent or deleted (categorically or variably), a learner may
encounter it far more frequently in writing than they will hear it in speech.
Moreover, the environment in which it is deleted depends on syllable stress,
which is a particularly challenging feature of English phonology for Franco-
phone learners (Dupoux et al. 1997; Peperkamp, Vendelin, and Dupoux 2010).
This means that the contexts in which <h> should and should not be pronounced may
seem unpredictable to the learner. Therefore, there is not only an incongruent
Orthographic interference in the acquisition of English /h/ by Francophones 231
grapheme-to-phoneme mapping between French and English, but also an
inconsistent grapheme-to-phoneme correspondence (GPC) in English, which we
propose is a contributing factor to the difficulties Francophones have with English
/h/.
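The distribution of English <h> described above can be summarized as a small decision procedure. The following is only an illustrative sketch: the function name, the position labels, and the hand-supplied stress and function-word annotations are assumptions made for this example, and the dialect-dependent cases mentioned above (e.g., herb, Nottingham) are deliberately left out.

```python
from enum import Enum


class HStatus(Enum):
    PRONOUNCED = "pronounced"
    DELETED = "categorically deleted"
    VARIABLE = "variably deleted"
    SILENT = "silent"


# French loanwords named in the text in which word-initial <h> is silent.
SILENT_LOANWORDS = {"hour", "honour"}

# Hypothetical position labels used only in this sketch.
POSITIONS = {"word_initial", "syllable_initial", "other"}


def h_status(word, position, stressed=True, function_word=False,
             phrase_initial_or_focused=False):
    """Classify how <h> is realized in `word`, following the prose rules.

    position: "word_initial", "syllable_initial" (head of a
        non-word-initial syllable), or "other" (e.g., <gh>).
    stressed: whether the syllable headed by <h> carries primary or
        secondary stress (annotated by hand in this sketch).
    """
    if position not in POSITIONS:
        raise ValueError(f"unknown position: {position}")
    if position == "other":
        return HStatus.SILENT
    # Function words lose /h/ variably unless phrase-initial or focused.
    if function_word and not phrase_initial_or_focused:
        return HStatus.VARIABLE
    if position == "word_initial":
        if word in SILENT_LOANWORDS:
            return HStatus.SILENT
        return HStatus.PRONOUNCED
    # Head of a non-word-initial syllable: stress decides.
    return HStatus.PRONOUNCED if stressed else HStatus.DELETED


if __name__ == "__main__":
    print(h_status("inherent", "syllable_initial", stressed=True).value)
    print(h_status("vehicle", "syllable_initial", stressed=False).value)
    print(h_status("him", "word_initial", function_word=True).value)
```

Even in this simplified form, the outcome depends on stress and phrasal context, neither of which is visible in the spelling, which is the source of the unpredictability discussed above.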
The role that orthography plays when learning new words is one way in
which second language (L2) acquisition can be set apart from first language (L1)
acquisition. Unlike L1 learners, who are exposed to auditory input well before
they learn to read, L2 learners typically encounter the spoken and written forms
of words together, often in formal instruction through reading and writing. Even
before this, in a bilingual country such as Canada, children may become aware
of the written forms of words in the second language, widely available on prod-
uct packaging, for example, before they learn their pronunciation.
While there is considerable evidence that orthography is encoded as part of
a lexical entry and has an effect on speech processing in the L1 (e.g., Castles,
Wilson, and Coltheart 2011; Frost and Ziegler 2007; Saletta, Goffman, and Hogan
2016), research has also demonstrated its effect on L2 speech processing and
production (e.g., Bürki et al. 2019; Escudero 2015; Hayes-Harb, Nicol, and
Barker 2010; Shea 2017; Showalter and Hayes-Harb 2015; Rafat 2016). Therefore,
it is worthwhile looking at the effect of written input when investigating L2 pho-
nology, especially sounds in the second language that may pose particular
problems for learners. While Francophone learners of English are likely affected
by the mismatch between the grapheme-to-phoneme correspondence for <h> in
their L1 and in the target language, they may find this segment particularly
challenging due to the complexity around when it is pronounced and when it is
silent. The question we explore in this pilot study is whether Francophone
learners exploit English orthography during word learning and, if so, whether
the observed variability in the pronunciation of <h> is a contributing factor in
their difficulty encoding /h/ as part of a lexical representation.
2 Background
2.1 The effect of orthography on L2 speech processing
and lexical representations
Certain factors have been shown to influence the degree to which learners attend
to spelling during word learning. One is the transparency of the L1 orthographic
system, or orthographic depth, which can lead to either an over- or under-reliance
on orthography. Transparency is defined as the number of one-to-one or one-to-
many relationships between phonemes and graphemes. A language with a trans-
parent orthography has a larger number of one-to-one correspondences and is,
therefore, a reliable representation of a word’s phonological form, as is the case
with Spanish and German. An orthography with an abundance of one-to-many or
many-to-one relationships, as is the case with English, is considered opaque.
Learners whose L1 has an opaque orthography may experience less interference
from the L2 orthography during word learning simply because they are accus-
tomed to not relying on it. The inverse may also be true: L1 speakers of phonologi-
cally transparent orthographies may over-rely on the orthographic forms. In a
study by Erdener and Burnham (2005), Turkish (transparent) participants outperformed
Australian English (opaque) participants in their productions of
Spanish (transparent) words after training but performed less well on their productions
of Irish (opaque) words. The rationale is that their reliance on orthography
led to more confusion. French is considered to have an opaque orthography,
but unlike English, the opacity is not bidirectional: the pronunciation of a written
word is relatively predictable, while the spelling of an unknown word upon hearing
it is not predictable due to the frequent use of “silent” letters (Marjou 2019).
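The notion of transparency, and the difference between bidirectional and one-directional opacity, can be made concrete with a toy calculation. The sketch below is only an illustration under stated assumptions: the function name and the two miniature correspondence sets are invented for this example and greatly simplify real French and English spelling. It is not a measure used in the studies cited.

```python
from collections import defaultdict


def transparency(pairs):
    """Return (reading, spelling) consistency for grapheme-phoneme pairs.

    reading:  share of graphemes that map to exactly one phoneme
    spelling: share of phonemes that map to exactly one grapheme
    """
    g2p = defaultdict(set)  # grapheme -> phonemes it can represent
    p2g = defaultdict(set)  # phoneme -> graphemes that can spell it
    for grapheme, phoneme in pairs:
        g2p[grapheme].add(phoneme)
        p2g[phoneme].add(grapheme)
    reading = sum(len(ps) == 1 for ps in g2p.values()) / len(g2p)
    spelling = sum(len(gs) == 1 for gs in p2g.values()) / len(p2g)
    return reading, spelling


# Toy French-like set (simplified): several spellings share one vowel,
# but each spelling is read in only one way.
french_like = [("o", "o"), ("au", "o"), ("eau", "o"), ("ou", "u")]

# Toy English-like set (simplified): <c> has two readings, and both
# /k/ and /s/ have two spellings.
english_like = [("c", "k"), ("c", "s"), ("k", "k"), ("s", "s")]

if __name__ == "__main__":
    print("French-like: ", transparency(french_like))
    print("English-like:", transparency(english_like))
```

In the French-like set, reading consistency is perfect while spelling consistency is not, mirroring the one-directional opacity described above; the English-like set is inconsistent in both directions.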
Another factor which has been shown to have an influence is congruency of the
grapheme-phoneme relationship of a particular contrast between the L1 and
the L2. In a novel word learning task with L1 English speakers, Hayes-Harb,
Nicol, and Barker (2010) found that incongruencies in the GPC for which there
was no counterpart in English (for example, the spelling <faza> paired with
the auditory [fɑʃə]) led participants to perform more poorly on an auditory
word-picture matching task, demonstrating interference from their L1 ortho-
graphic conventions. However, if a particular contrastive pair shows a similar
correspondence across the L1 and L2, regardless of phonological similarity,
learners are able to make use of the spelling during word learning. For exam-
ple, Escudero, Simon, and Mulak (2014) found that their L1 Spanish partici-
pants who were exposed to both auditory and orthographic forms of Dutch
pseudo-words during training performed better on contrasts that were phono-
logically different but congruent in both languages (one-to-one match between
both), and worse on vowel pairs in which the GPCs in Dutch were incongruent
with those of their native language, Spanish. Escudero (2015), however,
found that there was no effect of orthographic transparency, and orthography
only helped learners as a redundant cue on perceptually easy contrasts.
A one-to-many correspondence may be the result of an allophonic alterna-
tion, and here too, L1 orthography has been found to cause interference in
word processing in the L2. Shea (2017) tested L1 English learners of Spanish on
their processing of intervocalic stop-approximant alternation (e.g., <nada>
‘nothing’, [naða]). The shared stop graphemes (<b, d, g>) correspond to one
phone in the L1 but two variants in the L2. In a lexical decision task with cross-
modal and within-modal priming, participants activated the L1 stop variant
more strongly than the L2 approximant allophone when primed by the written
form of a word such as cabello [kaβeʝo], but not when the prime was auditory.
In another study which examined allophonic alternations, Hayes-Harb,
Brown, and Smith (2018) found a similar effect in a production task where
written input interfered in L1 English speakers’ acquisition of German coda
devoicing in a novel word learning task that included minimal pairs such as
<trop>/<trob>. Learners in the with-spelling condition failed to neutralize the
coda voicing in their productions of words spelled with <b, d, g> word-finally,
while those not exposed to the spelling performed similarly to native speaker
controls. This effect persisted even after participants received explicit instruction
as to the allophonic contrast. Both these studies point to the persistent, strong
influence of L1 grapheme-to-phoneme relationships in L2 lexical representations.
234 Susan Jackson, Walcir Cardoso
The first research question we asked was whether Francophone learners of En-
glish attend to the spelling of a word during word learning. If they do not, there
should be no difference in their ability to encode /h/ as part of a newly learned
word whether presented with the spelling during word learning or the pronun-
ciation alone.
The second research question we asked was whether an inconsistency in
the GPC during word learning would affect the Francophone participants’ abil-
ity to encode /h/ as part of a newly learned word. If it does not, then results
should be similar between the consistent and inconsistent spelling conditions.
However, if it does, then we would expect lower accuracy rates for participants
who were exposed to inconsistent spelling during learning.
Orthographic interference in the acquisition of English /h/ by Francophones 235
3 Method
3.1 Participants
3.2 Materials
Consistent spelling houl [hul] oul [ul] mep [mɛp] tep [tɛp]
Inconsistent spelling houl [hul] oul [ul] mep [mɛp] tep [tɛp]
3.3 Procedure
The experiment was conducted online using gorilla.sc (Anwyl-Irvine et al. 2020),
a web-based experimental software. Participants were given a URL and a unique
access code to begin the experiment using their own computer and headphones.
They were offered a $5 Amazon electronic gift certificate for their time.
In the word learning phase, participants were presented with an image of a
novel object on-screen and simultaneously heard the audio of the label for that
object, a non-word conforming to English phonotactics. Depending on the ex-
perimental group in which they were placed, they either only heard the audio,
or they were also presented with the spelling of the label. In each block, four
words were presented in a random sequence with a three-second delay between
each. Participants were then told they would be tested on their memory of
these four words. All instructions were given in French to ensure comprehen-
sion, due to the variability in English proficiency, as indicated earlier.
On the next screen, participants saw the four objects randomly displayed in a
grid. One of the words was presented auditorily, with or without its spelling dis-
played depending on the experimental group, and participants were instructed to
click on the corresponding object (see Figure 1). If they answered correctly, they
were given feedback in the form of a green checkmark, and the next test word
from the set was presented with the same four images. If they selected the incor-
rect image, a red X appeared briefly, the image was removed from the grid, and
they could try again with the remaining three images. Images were removed until
the response was correct. Any incorrect responses resulted in the whole task being
repeated. Before moving on to the next block of four words, participants needed to
get all four correct on the first try. Ten blocks of four words were presented in this
manner. The training allowed for up to five attempts to score 100% for each block,
but only one participant required more than two rounds. Training took on average
20 minutes to complete.
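The block logic described above can be summarized in a short simulation. This is only a sketch under stated assumptions: the names `run_trial`, `run_block`, and the `choose` callback are hypothetical and are not taken from the Gorilla implementation used in the study, and the sketch assumes that all four words of a block are tested before a failed block is repeated.

```python
import random

MAX_ATTEMPTS = 5  # the training allowed up to five attempts per block


def run_trial(target, labels, choose):
    """Present one test word; wrong images are removed until the click is correct.

    Returns True only if the very first click was correct.
    `choose(target, remaining)` is a hypothetical stand-in for the learner.
    """
    remaining = list(labels)
    first_try = True
    while True:
        response = choose(target, remaining)
        if response == target:
            return first_try            # green checkmark
        remaining.remove(response)      # red X: image removed from the grid
        first_try = False


def run_block(labels, choose, rng=random):
    """Repeat a block until every word is correct on the first try.

    Returns the number of rounds needed, or None after MAX_ATTEMPTS rounds.
    """
    for attempt in range(1, MAX_ATTEMPTS + 1):
        order = list(labels)
        rng.shuffle(order)              # test words in random order
        results = [run_trial(t, labels, choose) for t in order]
        if all(results):
            return attempt
    return None


def perfect(target, remaining):
    """Hypothetical learner who always clicks the correct image."""
    return target


# Example labels only; the actual block composition is an assumption.
rounds = run_block(["houl", "oul", "mep", "tep"], perfect)
```

A learner who makes a single error in the first round needs a second round, which matches the report that only one participant required more than two.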
To maintain engagement throughout the word learning phase (see Bell 2018
for the rationale), participants were congratulated for completing each block and
they collected tokens: pieces of pie to complete a full pie in the first five blocks
(Figure 2), and penguins to collect a family of penguins for the second five blocks.
[Congratulations! 100%! You have obtained a piece of pie. Let’s try 4 new words. When you are ready, click on the button below.]
[Wonderful! You have the complete pie! We will proceed to the second part. When you are ready, click on the button below.]
Once the learning phase was completed, participants were asked to take a break
of 30 minutes during which a countdown timer appeared on the screen, and the
experiment was locked. In the main test that followed, images of objects were
presented one by one in sets of ten with either the correct audio or the minimal
pair counterpart. No spelling appeared on the screen during the test. Participants
were instructed to click on a green ‘thumbs up’ icon if they thought they heard
the correct label, or a red ‘thumbs down’ icon if they thought the label they
heard was incorrect. They completed four sets in all, totaling 40 trials. No feed-
back was given during the test, but they received a final score at the end. The
experiment took on average 60 minutes to complete, including the break.
3.4 Analysis
Inconsistent Spelling group were compared between words learned with silent
<h> versus those in which <h> was pronounced.
4 Results
The percent of correct responses on the word–picture matching test was calcu-
lated for each participant in each experimental group for matched and mis-
matched word–picture pairs separately. Group means and standard deviations
(SD, in parentheses) are presented in Table 2.
Table 2: Mean percent correct for the matched and mismatched word–picture pairs, by word
learning group (Learning Condition).
Targets Fillers
Consistent Spelling (n=) . (.) . (.) . (.) . (.)
Auditory (n=) . (.) . (.) . (.) . (.)
Inconsistent Spelling (n=) . (.) . (.) . (.) . (.)
As Table 2 illustrates, performance on the matched pairs was high for both the
targets and fillers across all three conditions, but poorer on the mismatched
items in each case. For mismatched targets (e.g., when they saw an image of a
houl and heard [ul] or vice versa), correct scores were near chance for the Audi-
tory group and well below chance for the Inconsistent Spelling group.
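The per-participant scoring described above can be sketched as follows. The trial format, the participant label, and the example responses are hypothetical and serve only to illustrate the calculation; they are not data from the study.

```python
from collections import defaultdict


def percent_correct(trials):
    """Return {(participant, pair_type): percent correct}.

    Each trial is (participant, is_match, said_yes), where `is_match` says
    whether the label heard matched the picture and `said_yes` whether the
    participant clicked 'thumbs up'. A response is correct when the two agree.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for participant, is_match, said_yes in trials:
        key = (participant, "match" if is_match else "mismatch")
        totals[key] += 1
        hits[key] += int(said_yes == is_match)
    return {key: 100.0 * hits[key] / totals[key] for key in totals}


# Made-up responses for one hypothetical participant.
example_trials = [
    ("P01", True, True),    # matched pair accepted: correct
    ("P01", True, True),    # matched pair accepted: correct
    ("P01", False, True),   # mismatched pair accepted: incorrect
    ("P01", False, False),  # mismatched pair rejected: correct
]
scores = percent_correct(example_trials)
```

Scoring matched and mismatched pairs separately matters here because a participant who accepts every label would score perfectly on matched pairs while scoring zero on mismatched ones.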
A Kruskal-Wallis H test was run to investigate the overall impact of learning
condition on the percent of correct responses. Distributions of test scores for target
pairs were not similar between groups, as assessed by visual inspection of a boxplot,
and the overall group difference was not statistically significant. Looking at matched and mismatched
targets separately, the same test revealed a statistically significant difference be-
tween learning condition groups for matched targets alone, H(2) = 9.319, p = .009,
but none for mismatched targets.
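For readers who wish to check the reported statistic, a minimal pure-Python sketch of the Kruskal-Wallis test is given here. With three groups, H is referred to a chi-square distribution with 2 degrees of freedom, whose upper-tail probability has the closed form exp(−H/2). The function names are hypothetical, and the sketch assumes that the scores are not all identical.

```python
import math
from collections import Counter


def average_ranks(values):
    """Rank values from 1 to N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H for a list of score lists."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    return h / (1 - ties / (n ** 3 - n))


def p_value_df2(h):
    """Upper-tail chi-square probability for 2 degrees of freedom."""
    return math.exp(-h / 2)


# The reported H(2) = 9.319 for matched targets.
p_matched = p_value_df2(9.319)
```

Rounded to three decimals, `p_matched` agrees with the reported p = .009.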
Multiple Mann-Whitney tests were then run to determine if there were signifi-
cant differences in test scores on targets between pairs of learning condition
groups. While scores were not significantly different between the Consistent Spell-
ing and Auditory groups nor the Inconsistent Spelling and Auditory Groups, scores
for the Consistent Spelling group (mean rank = 7.5) were statistically significantly
higher than those for the Inconsistent Spelling group (mean rank = 3.5),
U = 2.5, z = −2.128, p = .032. Analysing matched and mismatched targets separately
revealed only the matched target pairs between the Consistent Spelling and Incon-
sistent Spelling groups were significantly different, U = 0.000, z = −2.730, p = .008.
Nonetheless, a visual pattern in the data can be seen in Figure 3.
[Figure 3 (bar chart): percent correct (0–100) for Target Match and Target Mismatch pairs in the Consistent Spelling, Auditory, and Inconsistent Spelling conditions.]
Figure 3: Mean percent correct condition for target matched and mismatched word–picture
pairs by word learning condition. Bars indicate standard error.
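The pairwise Mann-Whitney comparisons reported in the Results can be illustrated with a small sketch. The function names and the example scores are hypothetical. The group size of five used in the consistency check is an assumption, chosen because it is the size under which the reported mean ranks (7.5 and 3.5) yield the reported U = 2.5.

```python
def mann_whitney_u(x, y):
    """Return (U_x, U_y) by counting pairwise wins; ties count one half."""
    u_x = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    return u_x, len(x) * len(y) - u_x


def u_from_mean_rank(mean_rank, n):
    """Recover U for one group from its mean rank and its size n."""
    return n * mean_rank - n * (n + 1) / 2


# Made-up percent-correct scores for two hypothetical groups of five.
group_a = [90, 85, 80, 95, 100]
group_b = [60, 70, 85, 65, 75]
u_a, u_b = mann_whitney_u(group_a, group_b)

# Consistency check on the reported mean ranks, assuming n = 5 per group.
u_low = u_from_mean_rank(3.5, 5)
u_high = u_from_mean_rank(7.5, 5)
```

Under that assumption, `u_low` equals 2.5, the smaller of the two U values, and the two values sum to 5 × 5 = 25, as they must for any pair of groups of that size.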
5 Discussion
This study was a preliminary investigation into whether the presence or absence
of a written form would affect Francophones’ encoding of English /h/
during word learning, and whether inconsistency in the grapheme-phoneme
correspondence had the effect of making accurate encoding more difficult.
The first research question we addressed was whether Francophone learn-
ers of English rely on the orthography during word learning. The difference in
response accuracy rates on targets between the Consistent and Inconsistent
conditions provides evidence that they do, as these scores should have been
similar if the presence or absence of <h> in the spelling was inconsequential.
This result is inconsistent with studies that demonstrate that learners whose
L1 orthographic system is opaque, such as French, rely less on the spelling (e.g.,
Erdener and Burnham 2005), but it does fit with the bidirectional nature of opac-
ity in languages such as French (i.e., spelling is more predictive of pronunciation
than pronunciation is of spelling). This suggests that Francophones do in fact
rely on the orthography when learning the pronunciation of a word if given the
opportunity. As anecdotal evidence, observe the statement by one participant
after the experiment: “I gave myself reference points with the image and the
word, but associating the words I heard and the image was much more difficult.”
It was possible that relying on the orthography to help establish a distinc-
tion would have led to higher scores in the Consistent Spelling condition over
the Auditory condition, but no significant difference was found. There was,
242 Susan Jackson, Walcir Cardoso
time, an incubation period of at least 12 hours during which the learner has slept
is needed for it to enter into lexical competition (Dumay and Gaskell 2007).
While the results in this study did not show an overall significant difference be-
tween the Spelling and Auditory conditions in word learning, low scores in the
inconsistent spelling condition highlighted a possible inhibitory effect on learners’
ability to encode /h/ as part of a word, especially given that it mirrored the
real-world variability of /h/ pronunciation. If this pattern is replicated in a
larger study, the question to ask is how such findings may be used to inform a
pedagogical approach to teaching this difficult segment: how can /h/ be pre-
sented to learners in a way that might help them establish more target-like rep-
resentations in their mental lexicon and be able to produce /h/ accurately?
One possibility is to set aside the spelling when teaching the pronunciation
of h-words and use pictures instead. The purpose would be to develop and rein-
force an association between the phonological form of a word and its meaning
without the interference of orthography. Learners could play word-picture
matching games such as pronunciation bingo or be asked to listen to a story or
song containing minimal pairs and choose the correct image from a worksheet.
Participating in picture identification exercises would not only highlight the
difference between a minimal pair, but also strengthen the association of /h/
with individual words. This would be especially important at the lower profi-
ciency levels, before the spelling has become part of the learner’s representa-
tion of a word through practice with reading and writing.
One issue pointed out by Trofimovich and John (2009) is that the number
of minimal pairs that can be formed from an /h/-initial word and its vowel-initial counterpart is small,
and many of them do not lend themselves easily to illustration (e.g., had-add).
Nonetheless, strengthening the representations of some words with /h/ onsets
may help learners to both notice when /h/ should be pronounced and to gener-
alize to other /h/-initial words. Pairing pronunciation with other channels of
sensory perception is also possible if pictures are not feasible, such as the use
of tactile or kinesthetic reinforcement (Celce-Murcia et al. 2010; Chan 2018).
Learners could use touch or gestures with /h/ words when learning new vocab-
ulary or when reciting rhymes and songs.
Finally, increasing the frequency of /h/ in oral input in the instructional set-
ting is another potentially helpful strategy. Aside from the handful of words
where initial <h> is silent, a great number of other potential /h/ tokens are deleted
in natural speech, or they occur in environments that hinder their perceptual
salience (Jackson and Cardoso 2017). However, /h/ is deleted less often in careful
speech, such as that used in the classroom, and this could be reinforced through
the addition of activities such as reading aloud to students (for the rationale, see
Collins et al. 2009).
Together, these approaches – using pictures, kinesthetic reinforcement, and
increasing the frequency of /h/ in oral input – may well help learners distinguish these
words and establish accurate representations. At later stages, it would then be
possible to explicitly teach words where /h/ is silent. In pronunciation materials
used in the ESL classroom, there is some focus on the instances where /h/ is silent
but much less attention is typically given to the wider variability in /h/ production. Therefore,
learners may also be taught the phonological contexts in which it is deleted.
6 Conclusion
The question addressed in this pilot study was whether the difficulty Franco-
phone learners have with English /h/ may be partially due to orthographic inter-
ference and its inconsistent grapheme-to-phoneme correspondences in English.
Although the sample size was small, the results do point to this being a contributing
factor. This issue may be compounded by the unpredictability of when /h/
should be pronounced and when it should not: it is either uniformly silent for
some words, or subject to rule-governed deletion in contexts that may not be re-
coverable for a Francophone learner (e.g., when at the head of a weak syllable).
The range of scores in the current study indicates that it would be worthwhile
investigating other variables to see which most strongly correlate with accurate ver-
sus inaccurate encoding of /h/. One of the more obvious is level of English profi-
ciency. Future research comparing learners of different levels of proficiency might
uncover an effect of experience with English as more advanced learners may have
trouble overcoming an entrenched pattern. Notably, the Inconsistent Spelling con-
dition contained all upper intermediate level speakers (a result of random group
assignment), and accuracy rates were lowest in this condition. Also interesting may
be individual ability to discriminate between /h/-initial words and their vowel-
initial counterparts. A typical reason given for why Francophones have such diffi-
culty with /h/ is its weak perceptual salience (e.g., Collins et al. 2009). However,
while they do not discriminate between h- and vowel initial pairs as well as Anglo-
phones, they have been shown to perform above chance on discrimination tasks
(e.g., Mah 2011; Mielke 2008). In addition, adding a production task could deter-
mine whether the scores on word learning correlate with accurate productions of
/h/-initial words. It may be the case that the inconsistencies in the grapheme-to-
Orthographic interference in the acquisition of English /h/ by Francophones 245
References
Anwyl-Irvine, Alexander L., Jessica Massonnié, Adam Flitton, Natasha Kirkham & Jo
K. Evershed. 2020. Gorilla in our midst: An online behavioural experiment builder.
Behavior Research Methods 52(1). 388–407.
Archibald, John. 2021. Ease and Difficulty in L2 Phonology: A Mini-Review. Frontiers in
Communication 6. https://doi.org/10.3389/fcomm.2021.626529
Bell, Kevin. 2018. Game on!: Gamification, Gameful Design, and the Rise of the Gamer
Educator. Baltimore: Johns Hopkins University Press.
Bürki, Audrey, Pauline Welby, Mélanie Clément & Elsa Spinelli. 2019. Orthography and second
language word learning: Moving beyond “friend or foe?” The Journal of the Acoustical
Society of America 145(4). EL265–EL271.
Castles, Anne, Katherine Wilson & Max Coltheart. 2011. Early orthographic influences on
phonemic awareness tasks: Evidence from a preschool training study. Journal of
Experimental Child Psychology 108(1). 203–210.
Celce-Murcia, Marianne, Donna M. Brinton, Janet M. Goodwin & Barry Griner. 2010. Teaching
Pronunciation: A Reference for Teachers of English to Speakers of Other Languages. 2nd
edn. Cambridge: Cambridge University Press.
Chan, M. J. 2018. Embodied Pronunciation Learning: Research and Practice. CATESOL Journal
30(1). 47–68.
Collins, Laura, Pavel Trofimovich, Joanna White, Walcir Cardoso & Marlise Horst. 2009. Some
input on the easy/difficult grammar question: An empirical study. The Modern Language
Journal 93(3). 336–353.
Cutler, Anne, Andrea Weber & Takashi Otake. 2006. Asymmetric mapping from phonetic to
lexical representations in second-language listening. Journal of Phonetics 34(2). 269–284.
Dumay, Nicolas & M. Gareth Gaskell. 2007. Sleep-associated changes in the mental
representation of spoken words. Psychological Science 18(1). 35–39.
Dupoux, Emmanuel, Christophe Pallier, Nuria Sebastian & Jacques Mehler. 1997. A
destressing “deafness” in French? Journal of Memory and Language 36(3). 406–421.
Erdener, V. Doǧu & Denis K. Burnham. 2005. The role of audiovisual speech and orthographic
information in nonnative speech production. Language Learning 55(2). 191–228.
Escudero, Paola. 2015. Orthography plays a limited role when learning the phonological forms
of new words: The case of Spanish and English learners of novel Dutch words. Applied
Psycholinguistics 36(1). 7–22.
Escudero, Paola, Rachel Hayes-Harb & Holger Mitterer. 2008. Novel second-language words
and asymmetric lexical access. Journal of Phonetics 36(2). 345–360.
Escudero, Paola, Ellen Simon & Karen Mulak. 2014. Learning words in a new language:
Orthography doesn’t always help. Bilingualism: Language and Cognition 17(2). 384–395.
Escudero, Paola & Karen Wanrooij. 2010. The effect of L1 orthography on non-native vowel
perception. Language and Speech 53(3). 343–365.
Frost, Ram & Johannes C. Ziegler. 2007. Speech and spelling interaction: The interdependence
of visual and auditory word recognition. In M. Gareth Gaskell (ed.), The Oxford Handbook
of Psycholinguistics, 107–118. Oxford: Oxford University Press.
Hayes-Harb, Rachel, Kelsey Brown & Bruce L. Smith. 2018. Orthographic input and the acquisition
of German final devoicing by native speakers of English. Language and Speech 61(4). 547–564.
Hayes-Harb, Rachel, Janet Nicol & Jason Barker. 2010. Learning the phonological forms of new
words: effects of orthographic and auditory input. Language and Speech 53(3). 367–381.
Horst, Jessica S. & Michael C. Hout. 2015. The Novel Object and Unusual Name (NOUN)
Database: A collection of novel images for use in experimental research. Behavior
Research Methods 48(4). 1393–1409.
Jackson, Susan & Walcir Cardoso. 2017. The acquisition of English /h/ by Francophones: Input
frequency and perceptual salience in a corpus study. In Jaime Demperio, Suzanne
Springer, & Beau Zuercher (eds.), Proceedings of the Meeting on English Language
Teaching. Québec: Université du Québec à Montréal Press.
Janda, Richard D. & Julie Auger. 1992. Quantitative evidence, qualitative hypercorrection,
sociolinguistic variables – And French speakers’ ‘eadhaches with English h/Ø. Language
& Communication 12(3–4). 195–236.
John, Paul. 2006. Variable h-epenthesis in the interlanguage of Francophone ESL learners.
Montreal, Canada: Concordia University MA thesis.
LaCharité, Darlene & Philippe Prévost. 1999. Le rôle de la langue maternelle et de
l’enseignement dans l’acquisition des segments de l’anglais langue seconde par des
apprenants francophones. Langues et linguistique 25. 81–109.
Mah, Jennifer. 2011. Segmental representations in interlanguage grammars: the case of
francophones and English /h/. Montreal, Canada: McGill University dissertation.
Marjou, Xavier. 2019. OTEANN: Estimating the Transparency of Orthographies with an Artificial
Neural Network. Retrieved from https://arxiv.org/abs/1912.13321v3
Mielke, Jeff. 2008. Interplay between perceptual salience and contrast: /h/ perceptibility in
Turkish, Arabic, English, and French. In Peter Avery, Elan Dresher & Keren Rice (eds.), Contrast
in Phonology: Theory, Perception, Acquisition, 173–192. Berlin, New York: Mouton de Gruyter.
O’Brien, Mary. 2021. Ease and Difficulty in L2 Pronunciation Teaching: A Mini-Review. Frontiers
in Communication 6. https://doi.org/10.3389/fcomm.2020.626985
Peperkamp, Sharon, Inga Vendelin & Emmanuel Dupoux. 2010. Perception of predictable
stress: A cross-linguistic investigation. Journal of Phonetics 38(3). 422–430.
Rafat, Yasaman. 2016. Orthography-induced transfer in the production of English-speaking
learners of Spanish. The Language Learning Journal 44(2). 197–213.
Saletta, Meredith, Lisa Goffman & Tiffany P. Hogan. 2016. Orthography and modality influence
speech production in adults and children. Journal of Speech, Language, and Hearing
Research 59(6). 1421–1435.
Shatzman, Keren B. & James M. McQueen. 2006. Segment duration as a cue to word
boundaries in spoken-word recognition. Perception & Psychophysics 68(1). 1–16.
Shea, Christine. 2017. L1 English/L2 Spanish: Orthography–phonology activation without
contrasts. Second Language Research 33(2). 207–232.
Showalter, Catherine E. & Rachel Hayes-Harb. 2015. Native English speakers learning
Arabic: The influence of novel orthographic information on second language phonological
acquisition. Applied Psycholinguistics 36(1). 23–42.
Statistics Canada. 2017. Focus on Geography Series, 2016 Census. Statistics Canada
Catalogue no. 98-404-X2016001. Ottawa, Ontario. Retrieved May 7th from Statistics
Canada: https://www12.statcan.gc.ca/census-recensement/2016/as-sa/fogs-spg/Facts-
cma-eng.cfm?LANG=Eng&GK=CMA&GC=485&TOPIC=5
Trofimovich, Pavel & Paul John. 2011. When ‘three’ equals ‘tree’: Examining the nature of
phonological entries in L2 lexicons of Quebec speakers of English. In Pavel Trofimovich &
Kim McDonough (eds.), Applying priming methods to L2 learning, teaching and research:
Insights from psycholinguistics, 105–129. Amsterdam: John Benjamins.
Walker, Douglas C. 2001. French Sound Structure (Vol. 1). Calgary: University of Calgary Press.
Weber, Andrea & Anne Cutler. 2004. Lexical competition in non-native spoken-word
recognition. Journal of Memory and Language 50(1). 1–25.
Auditory P F intermediate
P F beginner >
P M upper intermediate
P F intermediate –
P F elementary >
P F low intermediate –
1 Introduction
The importance of learning English skills has been a focus of education courses
around the world due to the globalization of economies. In order to communicate
with other people in English, there are many skills to be mastered: English gram-
mar, vocabulary, and syntax, which together constitute the basic knowledge of
English itself, but there is also socio-cultural understanding, listening, and
speaking, as well as non-verbal communication skills such as facial and manual
gestures (Acton 1984; Smotrova 2017). Unquestionably, pronunciation plays the
most crucial role in oral interaction, and pronunciation errors may lead to severe
breakdowns in communication; therefore, the teaching and learning of correct
Acknowledgments: This study was supported by a Grant-in-Aid for Scientific Research promoted
by JSPS (the Japan Society for the Promotion of Science; Grant No. 17K02951, 18K00787). VER-
SON2 and Nissho Co. helped with the development of the ICT materials.
https://doi.org/10.1515/9783110736120-010
250 Yuri Nishio, Akiyo Joto
they did not have the opportunity to compare their own pronunciation simulta-
neously unless they used a mirror to watch their mouth moving. We, therefore,
developed our ICT training with a self-video, and in our study, we examined how
the ICT training with a self-video could improve the learners’ pronunciation in
comparison with the ICT training without a self-video. The ICT material to be
learned should involve familiar lexical items, which would be retained more eas-
ily by the learners (Carley and Mees 2020). Building on this idea, we chose the
names of the letters in the English alphabet because it is introduced to English
beginners at quite an early stage, so they should know how to pronounce the
names of the letters of the alphabet. If they are unable to pronounce some of the
names of the letters of the alphabet, these will be considered as having become
fossilized. In addition, half of all English phonemes are included when the letters
of the alphabet are pronounced (e.g., /b/+/iː/ for B). It is assumed that if /biː/ for
B is pronounced correctly, words containing /biː/, such as beach /biːʧ/, can also
be pronounced correctly. Furthermore, none of the previous studies on English
phonetics have dealt with the sounds in the names of the letters of the alphabet.
Therefore, we investigated whether the ICT materials we developed using the al-
phabet could be effective in helping Japanese university students to improve
their English pronunciation of consonants and vowels. Our goal is to demon-
strate how these ICT materials can help both teachers and learners improve their
English and help them with their pronunciation.
Our research questions are as follows:
1. Can ICT self-training help participants improve their pronunciation of the
names of the letters of the alphabet?
2. Is ICT training with a self-video more beneficial to participants than ICT
training without a self-video?
3. What do participants think about the ICT materials provided?
2 Learning pronunciation
2.1 English pronunciation in Japanese education
Japanese education systems have changed drastically due to both historical and
economic reasons. Sasaki (2008) describes a 150-year history of school-based En-
glish education and assessment in Japan, going back to around 1860. Her study
shows how, prior to 1970, learning English was regarded as a unilateral means of
importing foreign culture and knowledge. However, from 1970 to 1990, English
education was influenced by rapid globalization, Japan’s economic growth, and
252 Yuri Nishio, Akiyo Joto
For Japanese learners of English, one of the reasons for the difficulties they experi-
ence with English pronunciation is that the English phonemes are very different
Improving fossilized English pronunciation by simultaneously 253
from those of Japanese. Lado (1957) developed the Contrastive Analysis Hypothesis
(CAH) to explain this; it holds that, by comparing a first language
(L1) with an L2, it is possible to predict which pronunciation features will be either
the easiest or the most difficult for the learner to master. Flege’s (1995) Speech
Learning Model (SLM) predicts that if an L2 learner perceives an L2 speech sound
to be similar to a known L1 speech sound, the two sounds will be combined and
assimilated. In contrast, if the L2 sound is perceived as new, then a new category
will be established with properties that may eventually match the properties of the
true L2 sound. Another model, the Perceptual Assimilation Model (PAM) (Best
1995; Best and Tyler 2007), holds that how well a non-native contrast is
discriminated depends on how its sounds are perceptually assimilated to native
phonological categories.
There have been several studies on how the Japanese perceive vowels that
are similar in English and Japanese. Shimizu (2016) describes the acoustic and
phonetic characteristics of Japanese (L1) and English (L2) vowels produced by
Japanese ESL learners and compares them with those of 11 native English
speakers. He focuses on the first (F1) and the second (F2) formants of vowels in
both the L1 and the L2 of Japanese ESL learners. The Japanese learners tended
to use their own vowel regions in the vocal tract to produce American English
(AE) vowels, which are similar to L1 sounds. Shimizu concludes that these results seem
to support the PAM (Best 1995; Best and Tyler 2007) in the way the learners acquire
their L2 vowels.
Oh et al. (2011) investigated the effect of age of acquisition on first and second
language vowel production by Native Japanese (NJ) adults and children as well as
by age-matched Native English (NE) adults and children. After living in the USA
for one year, the NJ children produced the English “new” vowels
/ɪ/, /ε/, /ɑ/, /ʌ/, and /ʊ/ in a native-like manner, but the NJ adults did not
reach accurate production.
Lambacher et al. (2005) examined whether a six-week identification training
would be effective in improving the identification and production of the Ameri-
can English (AE) mid and low vowels /æ/, /ɑ/, /ʌ/, /ɔ/, /ɝ/ by native Japanese.
The identification performance of the participants improved after identification
training with feedback, and the training also had a positive effect on their pro-
duction of the targeted AE vowels.
From these studies, as Oh et al. (2011) mentioned, it was evident that native
Japanese children acquired native-like vowels, but native Japanese adults did not
reach the native levels, although six weeks of identification training could have a
positive effect on their production (Lambacher et al. 2005). However, Japanese university
students in Japan used the same L1 vowel regions of the vocal tract to produce American
vowels (Shimizu 2016), so we can assume that English vowels are more challenging
to acquire because some of the phonemes are quite similar to the Japanese ones,
especially for Japanese adults living in Japan.
English consonants are also different from the Japanese ones. Riney and Anderson-Hsieh
(1993) mentioned that standard Tokyo Japanese includes the consonants
/p, t, k, b, d, g, ts, s, z, m, n, ɾ, h, j/, whereas American English has the
following consonants: /p, b, t, d, k, g, f, v, θ, ð, s, z, ʃ, ʒ, ʧ, ʤ, m, n, ŋ, l, r, j, w, ʍ, h/.
Comparing the two inventories, /f/, /v/, /θ/, /ð/, /ʃ/, /ʒ/, /ʧ/, /ʤ/, and /ʍ/ do
not exist among the Japanese consonants.
Regarding the perception of the English consonants, Yamada and Adachi
(1998) studied comprehensive data inquiring about which English phonemes
were difficult to identify. The participants listened to the words, which con-
sisted of the target consonant and vowel /iː/, and distinguished the correct
word. As the following results show, roughly half or fewer of the sounds
were correctly distinguished: /z/ showed an accuracy rate of 52% [misidentified
as /ð/ (23%), and as /ʤ/ (18%)]; /f/ presented an accuracy rate of 37% [misi-
dentified as /ð/ (26%), and as /s/ (20%)]; /θ/ was correct in 37% of the cases
[misidentified as /s/ (30%), and as /ʃ/ (17%)]; /ð/ had an accuracy of 34% [mis-
identified as /z/ (28%), /ʤ/ (12%), and /v/ (11%)]; /v/ was identified correctly
29% of the time [misidentified as /ð/ (25%), /z/ (13%), and /b/ (10%)]. The
results of the perception task suggest that these consonants were difficult to
distinguish because they have no equivalents in Japanese.
Regarding pronunciation, Yamada and Adachi (1999) explained which En-
glish consonants were mispronounced and substituted by Japanese phonemes,
for example, /s/ was substituted by /ʃ/; /f/ by /ɸ/; and /r/ and /l/ by /ɾ/. Joto
(2020) found that Japanese learners mispronounced the English fricatives /s/
and /ʃ/ as the Japanese fricative /ɕ/. Joto (2009) also investigated how native En-
glish speakers judged the English consonants pronounced by Japanese university
students based on their intelligibility rates. Those getting lower intelligibility
scores than the average (of 2.47, where 3 is the full mark) were /ʤ/ (major) 2.15;
/w/ (wet) 2.01; /ð/ (then) 1.92; /θ/ (thick); /w/ (womb) 1.78; /z/ (zee) 1.73; /j/ (yeast)
1.55; /w/ (wood) 1.57. The English phonemes which have a similar counterpart
in Japanese, namely /j/ and /w/, were particularly problematic; however,
even when the phonemes in Japanese did not have similar counterparts, the
English phonemes tended to be substituted by the Japanese ones.
Vance (1987) explained the articulatory differences between Japanese and
English, which include: (a) lip rounding, which is weaker in Japanese than in
English; (b) jaw position, which is more open in English than in Japanese; and
(c) a “tongue blade articulator” in Japanese versus a “tongue tip articulator” in
English.
The results from several studies show that, for Japanese learners of English,
not only English sounds that are similar to Japanese ones but also new sounds that
do not exist in the L1 system can be problematic.
What materials should be used for training in pronunciation? Familiar and com-
mon ways consist of having learners listen to individual phonemes in words or
minimal pairs showing the contrasts (Carley and Mees 2020). In our study, we
used the names of the letters in the English alphabet itself as the target for the
pronunciation training, so that A was learned as the diphthong /eɪ/, B as a
consonant+vowel /biː/, etc. The English alphabet is introduced during the early
stages of learning, and there are several studies showing that alphabet knowl-
edge of letterforms, e.g. the corresponding sound of the letter A in ‘apple’ is /æ/,
is essential for reading, spelling acquisition, and comprehension of L1 children
(Piasta and Wagner 2010). The teaching of a letter with its corresponding sounds
is called phonics, which is helpful in learning to read (Ehri 2013, 2020).
In Japan, the English alphabet is introduced in the first textbook for third-year
pupils in elementary schools. Teaching pronunciation using the alphabet can
contribute to accurate pronunciation because 24 phonemes, about half of the total
English phonemes, appear when the letters of the alphabet are pronounced (e.g., /eɪ/
for A, /b/+/iː/ for B). There are eight vowels that appear in the alphabet: /ɛ/ in
F, L, M, N, S, X; /ʌ/ in W; /iː/ in B, C, D, E, G, P, T, V, Z; /uː/ in Q, U, and W;
/eɪ/ in A, H, J, K; /aɪ/ in I, Y; /oʊ/ in O; /ɑɚ/ in R. The same applies to the
consonants of English: /b/ in B; /s/ in C, S, X; /d/ in D; /f/ in F; /ʤ/ in G, J;
/ʧ/ in H; /k/ in K, Q, X; /l/ in L; /m/ in M; /n/ in N; /p/ in P; /j/ in Q, U; /t/ in T;
/v/ in V; /w/ in Y; and /z/ in Z. However, Japanese learners tend to replace some of
the English sounds with similar Japanese ones (e.g., Z [ziː]→[ʥiː], A [eɪ]→[e]+[i]).
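The count of 24 phonemes can be checked with a short sketch. The letter-name transcriptions below are assembled from the vowel and consonant lists in this section and from the transcription of W as [dʌbɫjuː] given later in the chapter; the segmentation into units (for example, treating /ʤ/, /ʧ/, /eɪ/, and /ɑɚ/ as single phonemes and writing the dark /l/ simply as /l/) is an assumption made for illustration.

```python
# Letter names as phoneme sequences (segmentation is an illustrative assumption).
LETTER_NAMES = {
    "A": ["eɪ"], "B": ["b", "iː"], "C": ["s", "iː"], "D": ["d", "iː"],
    "E": ["iː"], "F": ["ɛ", "f"], "G": ["ʤ", "iː"], "H": ["eɪ", "ʧ"],
    "I": ["aɪ"], "J": ["ʤ", "eɪ"], "K": ["k", "eɪ"], "L": ["ɛ", "l"],
    "M": ["ɛ", "m"], "N": ["ɛ", "n"], "O": ["oʊ"], "P": ["p", "iː"],
    "Q": ["k", "j", "uː"], "R": ["ɑɚ"], "S": ["ɛ", "s"], "T": ["t", "iː"],
    "U": ["j", "uː"], "V": ["v", "iː"],
    "W": ["d", "ʌ", "b", "l", "j", "uː"],
    "X": ["ɛ", "k", "s"], "Y": ["w", "aɪ"], "Z": ["z", "iː"],
}

# The eight vowels listed in the text.
VOWELS = {"ɛ", "ʌ", "iː", "uː", "eɪ", "aɪ", "oʊ", "ɑɚ"}


def phoneme_inventory(names):
    """Return the sets of distinct vowels and consonants used in the letter names."""
    phonemes = {p for sequence in names.values() for p in sequence}
    return phonemes & VOWELS, phonemes - VOWELS


vowels, consonants = phoneme_inventory(LETTER_NAMES)
print(len(vowels), len(consonants), len(vowels) + len(consonants))
```

Under these assumptions, the sketch yields eight vowels and sixteen consonants, which matches the total of 24 phonemes stated above.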
Additionally, Japanese loan words are used for the letters of the alphabet as
follows: /eː/ as A; /biː/ as B; /ɕiː/ as C; /diː/ as D; /iː/ as E; /eɸ/ as F; /dʑi/ as G;
/eiʧ/ as H; /ai/ as I; /ʥeː/ as J; /keː/ as K; /eɾu/ as L; /emu/ as M; /enu/ as N; /oː/ as
O; /piː/ as P; /kjɯː/ as Q; /aːɾu/ as R; /esu/ as S; /tiː/ as T; /jɯː/ as U; /bɯi/ as V;
/dabuɾjɯː/ as W; /ekkɯsɯ/ as X; /wai/ as Y; /dzetto/ as Z. If the Japanese loan
word influences the pronunciation of the English alphabet, Japanese learners
of English will pronounce A as /eː/ instead of /eɪ/. If some of the names of the
letters of the alphabet are still pronounced like the Japanese sounds, these sounds
can be considered fossilized because the learners learned the alphabet a long
time ago.
Traditionally, there are two major approaches to teaching pronunciation: the in-
tuitive-imitative approach and the analytic-linguistic approach (Celce-Murcia
2001). The intuitive-imitative approach is based on the learner’s ability to imitate
sounds and speech. As one of the factors influencing pronunciation is oral-
mimicry (Purcell and Suter 1980; Thompson 1991), learners with a good ear for
mimicry can acquire the L2 sounds well. In this traditional way of teaching, teachers
tend to demonstrate how particular segments and suprasegmentals are produced without
any explicit instruction and have students listen to the sounds and repeat them.
Stevick (1978) mentioned that learners were able to copy new sound forms
easily, but three things could cause difficulties for learners in doing so. First,
the learners might overlook some features. In this case, the teacher helped
them by providing a suitable model that was appropriate to their level. Second,
the learners might sound bad to themselves although they were copying well.
Students were very sensitive about their pronunciation when demonstrating
foreign sounds, either in the classroom or in public, so they should be helped
to develop a more positive attitude. Third, learners could become anxious
about making the sounds. In this case, the teacher should not point out the
learners’ errors but should find ways to reduce their anxiety.
Several ICT training software applications, developed on the basis of second-language
speech processing research, are available on the Internet. These programs involve
auditory or visual input (pictures of a speaker or video clips in which target sounds
or words are pronounced), which can help learners improve their L2 pronunciation and
speech perception (Hardison 2010). Auditory-visual integration is considered crucial,
as illustrated by the McGurk effect (McGurk and MacDonald 1976), in which visual
information from the mouth and the sound together affect the decoding process. Various
types of electronic visual display, such as those for viewing amplitude and pitch and
for viewing and measuring duration and frequency range, have been helpful in improving
learners’ pronunciation (Lambacher 2010).
One of the studies relating to Japanese learners was that of Hazan et al.
(2006), which investigated the sensitivity of second language learners to the
phonetic information contained in visual cues when identifying a non-native
phonemic contrast. Spanish and Japanese learners of English were tested on
their perception of /b/-/p/ in three conditions: audio (A), visual (V), and audio-
visual (AV) modalities. The A condition involved listening to the target sounds,
the V condition involved watching video clips of a native speaker’s face, and
the AV condition had them combined. Although the Spanish students showed
better performance overall, both learner groups achieved higher scores in the
AV condition. The same experiment was conducted for /l/-/r/ by Korean and
Japanese learners. Overall, these results show the impact of the learner’s lan-
guage background, although correlations between scores for the auditory and
visual conditions suggest that increasing auditory proficiency in identifying a
non-native contrast is linked with increased proficiency in using visual cues to
the contrast.
Lambacher (2010) reported the use of a CALL tool that utilizes acoustic data
in real-time to help Japanese L2 learners improve their perception and production
3 Method
3.1 Participants
Grade rd year, th year of university rd year, th year of university
Age of studying Less than years old: , Less than years old: , – years
English – years old: , – years old: , – years old: , – years
old: , – years old: old:
Place of studying Cram school or English school: Cram school or English school:
English , Elementary school: , Elementary school:
Their TOEIC listening and reading scores ranged from 555 to 915, indicating
Common European Framework of Reference (CEFR) levels of B1 to B2 (TOEIC Official HP),
so their English levels were intermediate or close to advanced. They were divided
into an experimental group (EX) and a control group (CO) based on their demographic
variables and their TOEIC scores: the EX group: n = 10 (females = 9, male = 1), average
age = 21, and average TOEIC score = 731; the CO group: n = 10 (females = 7, males = 3),
average age = 21, and average TOEIC score = 772. The Kruskal-Wallis test was conducted
to ensure that both groups were at the same level of English proficiency (p > .427), and
it showed no significant difference between the groups. Most participants had
traveled to several countries, such as Hong Kong or Thailand, for a short time, from
three days to one week. They were exposed to English on a daily basis because they
were taking several English classes every day, such as English Communication, Reading,
Writing, or Discussion courses. Outside the curriculum, the university provided a
facility called the Global Plaza to encourage students to communicate freely with
foreign teachers. They were asked about their total time of contact with English per
week, including listening, reading, writing, and speaking. The Kruskal-Wallis test
showed that the differences between the groups in the duration of their contact with
English were not significant (listening: p = .967 > .05; reading: p = .539 > .05;
writing: p = .902 > .05; speaking: p = .427 > .05). Both groups were thus considered
to have similar backgrounds in English proficiency and experience.
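As an illustration of the kind of group comparison described above, the following is a minimal sketch of the Kruskal-Wallis H statistic. The individual scores are hypothetical (the chapter reports only group means), the sketch uses average ranks for ties without a tie correction, and the p-value helper relies on the chi-square approximation with one degree of freedom, so it applies to two groups only.

```python
import math


def average_ranks(values):
    """Rank values from 1 to N, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(*groups):
    """H statistic (no tie correction) for two or more independent groups."""
    pooled = [x for group in groups for x in group]
    n_total = len(pooled)
    ranks = average_ranks(pooled)
    weighted = 0.0
    start = 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        weighted += rank_sum ** 2 / len(group)
        start += len(group)
    return 12 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)


def p_value_two_groups(h):
    """Chi-square (df = 1) approximation; valid only when comparing two groups."""
    return math.erfc(math.sqrt(h / 2))


# Hypothetical TOEIC scores, invented for illustration only.
ex_scores = [555, 640, 690, 700, 735, 750, 760, 790, 820, 870]
co_scores = [600, 680, 720, 745, 770, 785, 800, 830, 860, 915]
h = kruskal_wallis_h(ex_scores, co_scores)
print(round(h, 3), round(p_value_two_groups(h), 3))
```

A p-value above .05 in such a comparison indicates only that no difference was detected between the groups; with ten participants per group, it does not by itself establish that the groups are equivalent.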
Seven students in the EX group and eight in the CO group had taken an English
phonetics course, knew how to pronounce the names of the letters of the alphabet,
and had received feedback on their pronunciation of the letter names from their
English teachers. Therefore, the two groups were regarded as comparable in terms of
the factors that Nation and Newton (2009) suggested would influence pronunciation:
proficiency level, English experience and attitudes, and conditions for teaching and
learning. This research was approved by the ethical board of the university where
the first author works, and all participants consented to participate in the experiment.
3.2 Materials
For the pre- and post-tests, both groups were asked to use their cellphones to record
videos of themselves pronouncing the names of the letters of the alphabet from A to Z.
The EX group and the CO group had two different platforms (see Appendix A).
Both platforms had the native speaker’s video clip seen from three directions
(the front, the side, and a focus on the mouth from the front) giving an explana-
tion of how to pronounce the sound (e.g., “B, B, try to pronounce B by breath-
ing out air on your hand”). The alphabet letter and the corresponding phonetic
symbol were displayed (e.g., B-b /bi:/). For the EX platform, a self-learning
video was shown next to the native speaker’s video. The learner could see his
or her face pronouncing the English simultaneously while watching the native
speaker’s pronunciation on the video and listening to his pronunciation. The
learner then tried to mimic the native speaker’s pronunciation.
The learners pressed each alphabet letter from A to Z once and then pressed
Review to review the material from A to Z again, without stopping.
3.2.3 Questionnaires
After both groups had completed their ICT training, the participants were asked
to fill in a paper-and-pencil type of questionnaire. The questions were as follows:
Q1: Was this PC program useful? Q2: Were the native speaker’s videos helpful?
Q3: Were the explanations of the pronunciation by the native speaker helpful?
Q4: Was your self-video helpful? Q5: Was your self-voice recording helpful? Q6:
Was the IPA (International Phonetic Alphabet) useful? Q7: Was this PC program
easy to use? Q8: Which of the contents were most useful? Choose the three items
that were the most helpful in improving your pronunciation and rank them (both
groups had the following options: ‘Native speaker’s video’, ‘Explanations on pro-
nunciation’, ‘Self voice pronouncing’ and ‘the IPA.’ The EX group only had ‘Your
self-video’ as an additional option).
3.3 Procedure
pausing for one second between the alphabet letters. Each cellphone had a
high-tech camera and a high-quality sound recording system, so the partici-
pants recorded their voices pronouncing the alphabet using the movie app in
their own camera and then sent the video clip to the author via e-mail or the
LINE app. After that, while sitting at a PC, the members of the two groups stud-
ied their specific materials on the ICT site individually for about 30 minutes.
After the ICT training, the CO and EX participants all recorded their pronuncia-
tions of the alphabet as a post-test. Finally, they answered the questionnaire
about the usefulness of and their satisfaction with the ICT material.
In order to allow the analyses of the sounds of the names of the letters of the
alphabet, two male native speakers of American English were asked to pro-
nounce the alphabet so that their productions could be compared with the Jap-
anese speakers’ pronunciation. The productions of the Japanese and American
speakers were digitally recorded and saved in a wave file format on a computer.
These speech materials were listened to by the two authors, who were trained to
transcribe the sounds in the IPA, and the inter-rater reliability between the two
authors was high (Cronbach’s coefficient alpha = .865). Additionally, we examined
the sounds using Praat.
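For readers unfamiliar with the reliability index mentioned above, the following is a minimal sketch of Cronbach's alpha computed over two raters. The ratings are hypothetical (1 = judged correct, 0 = judged incorrect), and treating each rater as an "item" is an assumption about how the coefficient was applied in this study.

```python
from statistics import variance


def cronbach_alpha(ratings_by_rater):
    """Cronbach's alpha, treating each rater as one item.

    ratings_by_rater: list of equally long lists, one list of scores per rater.
    """
    k = len(ratings_by_rater)
    sum_of_rater_variances = sum(variance(r) for r in ratings_by_rater)
    totals = [sum(scores) for scores in zip(*ratings_by_rater)]
    return k / (k - 1) * (1 - sum_of_rater_variances / variance(totals))


# Hypothetical judgments of eight productions by two raters.
rater_a = [1, 0, 1, 1, 0, 1, 0, 1]
rater_b = [1, 0, 1, 0, 0, 1, 0, 1]
print(round(cronbach_alpha([rater_a, rater_b]), 3))
```

Identical ratings from both raters give an alpha of 1, and the value falls as the raters disagree more often.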
4 Results
4.1 The effectiveness of the ICT training
The results of the pre- and post-tests for both the EX Group and the CO Group were
transcribed using the IPA (see Appendix B). The total number of samples was 1040
(26 letters of the alphabet for 20 participants for both pre- and post-tests). To examine
the differences between the pre- and post-tests of the EX and CO groups, a two-way
repeated-measures ANOVA was conducted (see Table 2) using SPSS 25. A statistically
significant difference between the pre- and post-tests was found following the ICT
training across both conditions, the EX group (with self-videos) and the CO group
(without self-videos) [F(1, 18) = 14.96, p < .001]. The effect size was 0.454, which
means it was very large. In terms of the differences between the EX group and the CO
group, the results showed no significant difference [F(1, 18) = .316, p = .581 > .05],
and the effect size was 0.017, indicating it was small. The interaction between the groups
and the tests was not statistically significant [F(1, 18) = 2.10, p = .164 > .05], but the
effect size was medium (0.105). These results show that the ICT training was effective
in improving the participants’ pronunciation of the alphabet, as shown by the
improvements between the pre- and post-tests. However, the results of the EX group
with its self-video training and the CO group without self-video training did not
differ statistically.
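The effect sizes can be related to the F ratios with a short sketch. It assumes that the reported effect sizes are partial eta squared (the η column of Table 2), which can be recovered from an F ratio and its degrees of freedom; the function name is ours.

```python
def partial_eta_squared(f_ratio, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f_ratio * df_effect) / (f_ratio * df_effect + df_error)


# F ratios reported in the text, each with df = (1, 18).
reported_f = {
    "Pre&Post": 14.96,
    "Group": 0.316,
    "Group x Pre&Post": 2.10,
}

for effect, f_ratio in reported_f.items():
    print(effect, round(partial_eta_squared(f_ratio, 1, 18), 3))
```

By the commonly used benchmarks for partial eta squared (roughly .01 small, .06 medium, .14 large), this calculation places the pre/post effect well above the "large" threshold and the group effect near "small".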
Appendix B, which lists the participants’ productions, shows differences in the
difficulty experienced by the EX and CO groups. Based on the percentages obtained in
the pre- and post-tests for both groups, the alphabet sounds fell into four categories
of correctness: 100%–80% (B, D, E, I, M, Q, S, U), 80%–50% (A, C, F, K, N, O, T),
50%–30% (L, P, X), and 30%–0% (R, V, W, Y, Z). In quantitative research, the mean scores
are examined to compare one condition with another. As Table 2 shows, the difference
between pre- and post-tests was statistically significant for both groups, though the
difference between ICT with and without a self-video was not significant. That means
the self-learning system, regardless of whether it includes a self-video or not,
could prove to be helpful in improving English alphabet pronunciation. However,
the names of some letters of the alphabet were quite well pronounced even
before the ICT training, such as B, D, E, I, M, Q, S, and U. The names of other letters
were found to be more difficult, and there were both similarities and differences
in the improvement of the EX and CO groups (see Appendix B).
Regarding improvement in the individual sounds, we investigated whether
a learner would improve more on a particular alphabet letter when using the
ICT material with a self-video, as the EX group did, or using the ICT material
without a self-video, as the CO group did. In the following section, we will ex-
amine which letters of the alphabet improved most in each of the groups.
Table 2: Means, Standard Deviation, and Two-Way ANOVA Statistics for Pre- and Post-tests
between EX and CO Groups.
M SD M SD Effect F ratio df η
Note: N = 10. ANOVA = analysis of variance, G = group, Pre&Post = pre- and post-tests
✶✶✶p < .001
A /eɪ/ eː eɪ
B /biː/ ― ― ― ― ―
C /siː/ ɕiː siː
F/ɛf/ eɸ ɛf
G /ʤiː/ ʥiː ʤiː
H /eɪʧ/ eiʨ eɪʧ
J /ʤeɪ/ ʥei ―
Table 3 (continued)
A /eɪ/ eː eɪ
B /biː/ vː bː
C /siː/ ɕiː ―
F/ɛf/ ɛf ɛf
G /ʤiː/ ʥiː ʤiː
H /eɪʧ/ eiʨ eɪʧ
J /ʤeɪ/ ʥei ʤeɪ
K [kʰeɪ] keː [kʰeɪ]
L /ɛl/ eɯ el
N /ɛn/ ― ― ― ― ―
O /oʊ/ oː oʊ
P [pʰiː] [piː] [pʰiː]
R /ɑɚ/ aːɯ ɑɚ
S /ɛs/ eɵ ɛs
Table 4 (continued)
Generally, some alphabet letters had a high score, which meant a high level of correct
pronunciation before the training with the ICT. These included B, D, E, I, M,
Q, S, and U, which contained the following vowels: /iː/ in B, D, E; /aɪ/ in I; /ɛ/
in M and S; and /uː/ in Q and U. For these vowels, the Japanese names of the alphabet
letters sound similar to the English ones. For example, the Japanese pronounce B as
/biː/, D as /diː/, E as /iː/, and I as /ai/.
The other alphabet letters containing /iː/, namely C, G, P, T, V, and Z, were not
correctly pronounced. The tense vowel /iː/ tends to be substituted by the Japanese
/iː/, but because the two sounds are quite similar, the pronunciation problems of
those alphabet letters were due not to the vowel but to the consonants they contained.
Furthermore, the English diphthong /eɪ/ is produced differently in Japanese learners’
pronunciation. The Japanese A, J, and K are pronounced /eː/, /ʥeː/, and /keː/,
respectively. In the pre-test, two learners in both the EX and CO groups did not
pronounce this diphthong correctly, but in the post-test, they pronounced /eɪ/
correctly. J, however, was still not pronounced appropriately: the vowel /eɪ/ was
pronounced accurately, but the consonant was not correctly produced. This problem
will be discussed in the next section.
The diphthong /aɪ/ was less difficult to pronounce. The letter I was pronounced
perfectly in both the pre- and post-tests by both the EX and the CO groups. Although
the /aɪ/ in Y was pronounced correctly, the consonant /w/ was not pronounced
accurately. The consonants /ʤ/, /w/, and /k/ will be discussed in the following section.
The diphthong /oʊ/ showed slightly different improvement rates: the im-
provement rate in the EX group was 100% (5 out of 5), but the CO showed an
improvement rate of 50% (2 out of 4). The participants in the EX group may have
learned its production after having seen their self-video pronouncing the /oʊ/,
which is more rounded than the Japanese /oː/.
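The improvement rates quoted in this section appear to be calculated over the learners who mispronounced a sound in the pre-test, not over the whole group. The sketch below makes that assumption explicit; the function name is ours.

```python
def improvement_rate(improved, mispronounced_in_pretest):
    """Percentage of pre-test mispronouncers who were correct in the post-test."""
    if mispronounced_in_pretest <= 0:
        raise ValueError("no pre-test errors, so no improvement rate is defined")
    return 100 * improved / mispronounced_in_pretest


# Counts reported for /oʊ/ in the letter O.
print("EX:", improvement_rate(5, 5))
print("CO:", improvement_rate(2, 4))
```

Because the denominators are small (five and four learners), a difference of one learner shifts these percentages considerably, so the gap between the groups should be read with caution.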
The short vowel /ɛ/ is found in the letters S and X. The former was pronounced
correctly in the pre- and post-tests by both the EX and the CO groups. The /ɛ/ in X
was also pronounced correctly, but the consonant /k/ was influenced by the Japanese
geminate /kk/, as seen in some productions of [ekks].
The most difficult letter was R /ɑɚ/, which showed a lot of variation in the
sounds used, such as [aːɾ], [aːɾɯ], [a˞ː], and [aː]. The Japanese sound inventory
has neither /r/ nor /l/, so Japanese learners tend to assimilate both to the Japanese
sound /ɾ/ and substitute it for them. Additionally, the English [ɑ] is an open back
unrounded vowel, but the Japanese counterpart is /a/, which is an open front unrounded
vowel. The English sound /r/ is produced by the tongue-tip curling slightly upward
toward the rear part of the alveolar ridge (Carley 2020; Carley and Mees 2020). The
Japanese participants tried to make the /r/ sound by curling up the tip of the tongue,
but their productions resulted in /aːɹ/, with the Japanese /aː/ followed by the
consonant /ɹ/ instead of the rhotic vowel /ɚ/, which differed from the English
diphthong /ɑɚ/. Generally speaking, the vowels in the names of the letters, except
for /ɑɚ/, were pronounced fairly well. Some errors were reduced following the ICT
training in both the EX and CO groups.
A variety of consonant sounds can be found in the names of the letters of the
alphabet: B /b/, C /s/, D /d/, F /f/, G and J /ʤ/, H /ʧ/, K /k/, L /l/, M /m/, N /n/,
P /p/, Q and U /j/, S /s/, T /t/, V /v/, Y /w/, and Z /z/. We will focus on the
problematic consonants: the stops, the fricatives, the affricates, and the
approximants, followed by the three-syllable alphabet letter W.
4.4.1 Stops
There are several contrasts between voiceless stops and voiced stops, such as
P /p/ and B /b/, and T /t/ and D /d/. Additionally, a voiceless stop should be aspirated
at the beginning of a word, and whether the stop consonants are aspirated or not is
a crucial aspect in the acquisition of L2 English. Therefore, the VOT (voice onset
time) was measured in Praat in order to check whether the aspiration in voiceless
stops was long enough. According to Lisker and Abramson (1964), voiceless unaspirated
stops present VOT values from 0 to 25 ms, while VOT values in voiceless aspirated stops
range from 60 to 100 ms. The pronunciation of the stop consonants /p/, /t/, /k/ tends
to be difficult for Japanese L2 learners because /p/, /t/, /k/ in the Japanese inventory
are not aspirated at the beginning of a Japanese word. Therefore, the participants
were expected to pronounce /p/, /t/, /k/ in English without any aspiration, even at the
beginning of the word. Table 5 shows that the VOT is shorter in the Japanese
counterparts of the English consonants. Among these consonants as pronounced by the
Japanese participants, /k/ showed a relatively good performance, whereas /p/ and /t/
were especially difficult. As for /p/, seven participants in the EX group and four in
the CO group did not pronounce it correctly in the pre-test (see Tables 3 and 4), but
after the training, in the post-test, 50% (2 of 4) in the CO group and 71% (5 of 7) in
the EX group improved. Regarding VOT, /p/ in the EX group became longer from the
pre-test to the post-test, from 24 ms to 64 ms (40 ms longer), and /p/ in the CO group
also improved, from 46 ms to 78 ms (32 ms longer). The alveolar /t/ showed the most
similar results in the EX and CO groups. For this consonant, 50% (3 of 6) in the EX
group improved, and 50% (1 of 2) in the CO group also showed improvement. The VOTs also
became longer (49 ms→74 ms in the EX group and 58 ms→84 ms in the CO group). From this
perspective, the training, both with and without a self-video, can contribute to
improvements in the aspiration of the stop consonants based on the VOTs, but only for
/p/ did the improvement rates suggest that the ICT with a self-video had a slight
advantage.
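The VOT figures can be read against the ranges cited from Lisker and Abramson (1964) with a small sketch. The category labels and the treatment of values between 25 and 60 ms as falling outside both ranges are our assumptions; the mean values are those reported in the paragraph above.

```python
def classify_vot(vot_ms):
    """Place a VOT value (ms) in the ranges cited from Lisker and Abramson (1964)."""
    if 0 <= vot_ms <= 25:
        return "unaspirated"
    if 60 <= vot_ms <= 100:
        return "aspirated"
    return "outside both ranges"


# Mean VOTs (ms) reported in the text: (pre-test, post-test).
mean_vot = {
    ("/p/", "EX"): (24, 64),
    ("/p/", "CO"): (46, 78),
    ("/t/", "EX"): (49, 74),
    ("/t/", "CO"): (58, 84),
}

for (consonant, group), (pre, post) in mean_vot.items():
    print(consonant, group, classify_vot(pre), "->", classify_vot(post),
          f"(+{post - pre} ms)")
```

On this reading, all four post-test means fall within the aspirated range, while only the EX group's pre-test mean for /p/ falls within the unaspirated range.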
The voiced stop consonants /b/, /d/ were pronounced relatively well. How-
ever, one instance of /b/ was pronounced with the upper teeth touching the
lower lip but with no friction: the place of articulation was the same as /v/, but
the manner was the same as that of a stop sound. Additionally, in the produc-
tion of /d/, the blade of the tongue reached a wide area of the alveolar ridge,
which is a similar pronunciation to that in Japanese. Further analyses are still
needed to clarify this.
Table 5: VOT (Voice Onset Time) in EX group, CO group, and English native speakers.
[Table values lost in extraction. Column groups: EX, CO, and Native, each reported as M and SD; rows: K /keɪ/, P /piː/, T /tiː/.]
4.4.2 Fricatives
4.4.3 Affricates
The affricates /ʧ/, in H, and /ʤ/, in G and J, were also problematic. The partic-
ipants in both groups pronounced the Japanese sounds [ʨ] for /ʧ/ and [ʥ], [ʣ]
for /ʤ/. Surprisingly, no one in either group pronounced them correctly in the
pre-test. After the training, the improvement rate for /ʧ/ in H was 30% in the EX
group and 10% in the CO group. Likewise, for /ʤ/ in G and J, the improvement
rate was 30% in the EX group and 10% in the CO group. This suggests that the EX
group was able to use the articulation information more and mimic the sounds
better than the CO group.
4.4.4 Approximants
Approximants /l/ in L and /w/ in Y were also very challenging phonemes. L should
be pronounced as [ɛɫ], but only a few participants, namely three in the CO group,
could pronounce the dark /l/, while the other participants could not pronounce
it at all. However, the participants were regarded as having proper pronunciation
if they pronounced /l/, and not the Japanese /ɾ/.
The /w/ in Y was also very difficult. Japanese does not have a /w/ sound,
and learners usually substitute it with the Japanese /ɰ/. The participants in the EX
group tried to round their lips, but the rounding was not sufficient. The sound did
not improve at all.
W [dʌbɫjuː] is the only letter name with three syllables. The Japanese W is pronounced
as [dabɯɾjɯ], and this pronunciation appeared three times in the pre-test in both
the EX and CO groups. The post-test for the EX group was even worse, as W was
pronounced incorrectly seven times. No progress was shown for either group. The
learners might not know that W is a three-syllable word; besides, the dark /l/ in
W [dʌbɫjuː] and the consonant cluster [bɫ] were difficult for the Japanese learners.
Table 6 shows the results of the questionnaire asking what the participants in
both groups thought about the ICT training program. Questions 1 through 8
were the same for both groups, with the exception of Q4, because the ICT train-
ing program for the CO group did not include a self-video. Regarding Q8, the
participants identified which content they thought was useful, choosing three
of the alternatives, but for the CO group, the item ‘your self-video’ was not in-
cluded. Although one participant in the EX group gave a slightly lower score
(the average score was 2.57 out of 5 for eight items) than the others, the other
participants from both groups gave positive answers for all the items. In gen-
eral, they found the ICT program useful, and the native speaker’s videos and
explanations were considered to be very helpful. The EX group also found the
self-video useful. The self-pronouncing voice was more beneficial for the CO
group than for the EX group. The IPA was less helpful than the other items. The
most useful items of the content were the explanations on how to pronounce,
the native speaker’s video, and, for the EX group, the self-video. The CO group
listed the native speaker’s videos, the explanation on how to pronounce, and
the self-pronouncing voice as most helpful.
Table 7 shows all suggestions made by the participants about improving
the ICT materials. Several comments suggested that the video clips of the alphabet
were a little long and that the training time could be shortened if the video clips
played automatically, without clicking. One participant would prefer to listen to a
female voice as well.
Table 8 shows all other comments from the participants. For both groups,
some participants were surprised that they did not know how to pronounce the
alphabet, and they found the W to be especially difficult. The explanation of
the pronunciation by the native speaker was regarded as very easy to under-
stand and useful. For the EX group, the self-video was very useful for compari-
son with the native video clips, while the CO group realized how important it
was to watch the native speaker’s mouth carefully and mimic the sounds. Their
comments indicated that their awareness of pronunciation would increase fol-
lowing the ICT training.
Table 6: Satisfaction and usefulness of ICT materials in both the EX and the CO groups.
[Table values lost in extraction. Columns: M and SD for each group. Surviving rows: Q2 “Were the native speaker’s videos helpful?”; Q3 “Was the explanation of the ways of pronunciations by the native speaker helpful?”; “The IPA”.]
EX Group
– I want to listen to a female’s voice as well as a male’s voice.
– It would be better if my pronunciation could be judged with a score.
– It would be better if learners could turn the microphone on and off more easily.
– Pauses between video clips would allow for more practice.
– If I could hear my own voice clearly and naturally, it would be better.
CO Group
– Every time I have to click to watch a video clip, it might be better if a video clip could be played back.
– The training was a bit too long.
– One set of the alphabet was a bit too long.
– It might be more effective if the explanation were shorter.
– It might be useful to learn not only the alphabet but also some vocabulary.
EX Group
– I was shocked by how unaware I was regarding the openness of the mouth and the placement of the tongue.
– I was able to understand how to pronounce English sounds accurately when I watched the movement of the mouth and the tongue of the native speaker from the front and the side angles.
– I did not understand all of the phonetic symbols, so it would be difficult to base my pronunciation exclusively on the reading of phonetic symbols.
– The explanations of the native speaker were easy to understand, such as “the movement of your mouth when you eat a very sour pickled plum [umeboshi].”
– This ICT program was very straightforward and useful. I could watch videos and listen to the sounds. I want to continue using it several times a day.
– “W” was more difficult than I expected. Hearing my own voice and comparing it with the native speaker’s voice was very useful.
– I was able to compare my video with the native speaker’s video simultaneously, and I tried very carefully to mimic the native speaker’s articulation. Then I found my pronunciation was getting better.
CO Group
– I felt that making sounds that do not exist in Japanese was challenging.
– I was able to learn how to pronounce English sounds, and this would be helpful in conversations in English.
– I have not practiced the pronunciation of the alphabet since I was an elementary student. At that time, I watched the native English teacher’s mouth carefully and tried to mimic his articulation. This helped me improve my pronunciation.
– I found that the English sounds could be improved if I became conscious of my pronunciation.
– I think watching the native speaker’s mouth very carefully is the key to improving pronunciation. Watching and mimicking – more than listening to the native speaker’s sounds – encourage me to practice pronunciation.
– The native speaker’s explanation was straightforward. This ICT program is very effective for practicing pronunciation. I wish I could have learned with this program when I was in elementary school.
– I was surprised I had not watched the IPA presented on the screen. I also found out how often my pronunciation was inaccurate. I gained a better understanding by watching the native speaker’s video, and I was able to improve my pronunciation greatly by opening my mouth wider.
– It was very easy to understand how to pronounce English alphabet because I was able to watch how to pronounce it from different angles.
5 Discussion
This paper aimed to investigate whether an ICT training system could help
Japanese learners of English improve their pronunciation of the names of the
letters of the English alphabet. The names of the letters of the alphabet con-
tain about half of all English phonemes, and they are introduced at the begin-
ning of the learning process, so they have been known for a long time. If some
of the phonemes still cannot be pronounced correctly, they are regarded as
having fossilized.
Based on the results from the pre- and post-tests, both the EX group, who
learned from the ICT with a self-video, and the CO group, who learned
without the self-video, improved their pronunciation. These results indicate that
the answer to Research Question 1 is positive, and the participants felt the ICT
was useful. As for Research Question 2, regarding whether the ICT training with or
without a self-video would be more beneficial to participants, the results did not
show a statistically significant difference. However, there were some differences
between the two groups. Among the vowels, /oʊ/, as in O, could not be pronounced
correctly by about half of the participants in both the EX and CO groups in
the pre-test. After the training, 100% of those learners in the EX group (5 of 5)
pronounced it correctly, and 50% of those in the CO group (2 of 4) improved their
pronunciation. The self-video may have helped the EX group to understand
the roundness of the lips better than the CO group. On the other hand, the
production of R /ɑɚ/, which was challenging, improved to 50% in both the EX and
CO groups. Lambacher et al. (2005) also investigated improvements in the
identification and production of /ɑ/ after a six-week identification training period,
though, according to Oh et al. (2011), native Japanese adults during a one-year
stay in America showed lower accuracy rates in the pronunciation of /ɑ/. In our
study, the unrounded open back vowel /ɑ/ was replaced by the Japanese unrounded
open vowel /a/, as in Shimizu’s (2016) study, as Japanese students
tend to substitute Japanese sounds for American English sounds, and this
seems to support the PAM (Best 1995; Best and Tyler 2007).
Regarding /iː/ and /ɪ/, these two vowels are also very challenging for Japanese
native speakers because Japanese contrasts /iː/ and /i/ by length rather than
by vowel quality (Heidlmayr, Ferragne, and Isel 2021). Heidlmayr, Ferragne, and
Isel investigated Japanese adults’ hearing abilities, testing them twice (one
week after they began living in Canada and one year later), but
their pronunciation did not become similar to that of native English speakers.
Shimizu (2016) mentions that the F1 and F2 formants of English /iː, ɪ, ɛ, æ/, as
pronounced by five male university students, were similar to those of a native
male English speaker, but the formants of six female university students were
still different from those of a native female English speaker. We could conclude
that the /iː/ in the productions of B, C, D, E, etc. by Japanese speakers was similar
to that of English speakers, based on the perceptions of the authors and the anal-
ysis of the F1 and F2 formants. However, we need further analyses to compare
the Japanese versions with their English counterparts. For now, we may conclude
that the EX intervention was able to help participants improve /oʊ/ in O, and
that both types of ICT training could help to improve /ɑɚ/ to some extent.
Regarding the consonants, the voiceless stops /p, t, k/ were improved by both
types of ICT training, though the production of /t/ in the EX group with a self-video
seems scarcely advantageous for highlighting aspiration. The fricatives /s, z, f, v/
were also difficult phonemes because these sounds are similar to the Japanese
phonemes /ɕ, ʣ, ʥ, ɸ, b/, respectively, so they were easily substituted. This can
be explained by the SLM (Flege 1995). After the training, /s/ and /f/ improved,
but /z/ and /v/ did not. The EX group benefited more than the CO group in the
production of the four fricatives. The most difficult were the affricates /ʧ/ and /ʤ/,
whose articulation should be done with rounded lips and intense friction. For
this information about articulation, the ICT with a self-video can be extremely
useful for learners.
The approximants /l/ and /w/ were very challenging phonemes, especially
the production of the dark /l/, because learners tend not to learn the
distinction between the two kinds of /l/, the light /l/ and the dark /l/, at school, so
they do not know where the tongue should be positioned. The /w/ in Y was
replaced by the Japanese /ɰ/, which is not rounded. The three-syllable letter name W
was incredibly difficult, and even after the ICT training, no progress was shown.
Regarding all the above-mentioned consonants, Yamada and Adachi, in a
study on perception (1998) and another study (1999) on production, reported
that these consonants were problematic. This was also shown by Joto (2009),
who indicated that these phonemes had low intelligibility ratings. Even the chal-
lenging phonemes /ʧ/ and /ʤ/ showed progress when using the ICT with a self-
video, so we can suggest that for the sounds whose mouth movements are more
clearly visualized, the self-videos are the most powerful aid for learners. First,
they watch the native speaker’s video and understand how to pronounce the
sounds, and then they can compare their own production with the native
articulation by using the self-videos in real time. They can also gain awareness of
pronunciation by paying careful attention to the articulation. With regard to the
participants’ ad-hoc questionnaires, learners mentioned that they were satisfied
with both ICT training programs, and they commented that both the native vid-
eos and the explanations were very useful. Regarding the ICT with self-videos,
the students recognized that the self-videos were beneficial for improving their
pronunciation. As Purcell and Suter (1980) argued, concern for accuracy is one
of the important predictors of improvement in pronunciation.
The ICT training for both groups improved their pronunciation according to
the pre- and post-tests. Although the results for the EX and CO groups did not
differ in terms of improvement rates or number of improvements, we can con-
clude that the EX group benefitted more than the CO group because participants
in the EX group could see their self-videos and check their mouth movements,
simultaneously comparing them with the native speaker’s.
Regarding pedagogical implications, Nation and Newton (2009) suggested
that teachers could understand the influence of the L1 by becoming familiar with
the sound system of the learners’ first language and thus gaining ideas for creat-
ing the effort and attention needed to bring about the desired changes. As Couper
(2006) mentioned, appropriately focused instruction could lead to changes in
learners’ phonological interlanguage even when this might appear to have be-
come fossilized. We strongly suggest that teachers give instruction regarding pro-
nunciation systematically and regularly and hold that most fossilized phonemes
could be changed.
An additional way to use this ICT system is to apply bottom-up and top-
down training, as the system now provides evaluations on vowels, consonants,
syllables, rhythms, intonations, and individual sounds. Alphabet training in
this research is regarded as a top-down type of training. Learners can practice
at the Alphabet site and learn how to pronounce the alphabet, where a letter
name contains, for example, a consonant plus a vowel (B /biː/) or just a vowel
(A /eɪ/). If they
find that some phonemes are difficult to pronounce or if they do not recognize
how to pronounce them, they can access the Vowels and Consonants sites to
make sure they can make these sounds correctly. Learners can access each site,
moving back and forth, and then increase their practice on their own. In con-
trast, for the bottom-up training, the learners access the Vowels and Conso-
nants sites first to learn the segments and then access the other site to learn to
put into practice a variety of words, phrases, and sentences, such as the alpha-
bet, rhythms, intonations, and sound changes. These features of ICT training
can improve the learners’ self-learning autonomy.
Several developments are needed in this ICT self-learning system to improve
the training of vowels because fewer explanations are available for the vowels
than for the consonants. Furthermore, the native speaker’s video cannot
show the inside of the mouth and how the tongue is moving up and down or
forward and back. As one participant commented, “It would be better if I could
see the inside of the native speaker’s mouth.” Pennington and Rogerson-Revell
(2019) reviewed the technologies currently available for teaching pronunciation,
focusing on feedback about their usefulness and limitations. Based on these
results, we acknowledge that the ICT alphabet training site does not provide real-
time feedback, so we have been developing a feedback system in which the utter-
ances in the intonation or rhythm training sites within this self-learning training
system are immediately transcribed into text. If the effectiveness of the real-time
feedback system is proven, we will also put the feedback system on the alphabet
training site.
6 Conclusions
Although the alphabet is introduced in the early stages of learning, students
seldom really learn the specific sounds of the names of the letters. In our study,
after practicing for short periods, such as 30 minutes, the participants showed
an improvement in their pronunciation. This training provides insights into
helping learners to recognize how to articulate sounds. Noticing and recognizing
how to articulate is essential, as is emphasized by Purcell and Suter (1980). The
participants’ pronunciation of the consonants improved through the ICT
self-learning system, using a self-video.
We also suggest this system can be used by teachers as well as learners. In
particular, elementary school teachers who teach English pronunciation to
young learners will be able to prevent them from developing fossilized sounds
that are affected by their native Japanese.
Note:
URL for the ICT system:
https://npl-mock.glexa.net/intonation
Appendix A
The ICT platform for the EX group
Alphabet Native Pronunciation Numbers (%) Pronunciation Numbers (%) Pronunciation Numbers (%) Pronunciation Numbers (%)
A eɪ eɪ eɪ eɪ eɪ
eː eː eː eː
F ɛf ɛf ɛf ɛf ɛf
ɛf✶ ɛf✶ ɛf✶ ɛf✶
eɸ eɸ eɸ eɸ
Note: ɛf✶ indicates /f/ with no friction; k✶eː and k✶eɪ indicate that the VOT of /k/ is less than 40 ms.
EX-Pre EX-Post CO-Pre CO-Post
Alphabet Native Pronunciation Numbers (%) Pronunciation Numbers (%) Pronunciation Numbers (%) Pronunciation Numbers (%)
O oʊ oʊ oʊ oʊ oʊ
oː oː oː
R ɑɚ ɑɚ ɑɚ ɑɚ ɑɚ
S ɛs ɛs ɛs ɛs ɛs
eθ eθ
Note: p✶iː indicates that the VOT of /p/ is less than 25 ms. a˞ː✶ indicates rhotic /a/. t✶iː indicates that the VOT of /t/ is less than 35 ms. v✶iː indicates
/v/ with no friction, which sounds like /b/. In W, there is a variety of wrong sounds, especially /ð/ and /v/ with no friction.
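The starred stops in the notes above follow a simple threshold rule on voice onset time (VOT). As a minimal sketch, assuming VOT has already been measured in milliseconds, the rule could be expressed as follows. Only the three thresholds (25, 35, and 40 ms) come from the notes; the function names and the sample measurements are hypothetical and are not data from the study.

```python
# Thresholds in ms are taken from the appendix notes; all names are illustrative.
VOT_THRESHOLDS_MS = {"p": 25.0, "t": 35.0, "k": 40.0}


def is_short_vot(stop: str, vot_ms: float) -> bool:
    """Return True when the measured VOT is below the threshold for this stop."""
    return vot_ms < VOT_THRESHOLDS_MS[stop]


def mark_token(stop: str, rest: str, vot_ms: float) -> str:
    """Build the table-style transcription, starring a stop with a short VOT."""
    star = "✶" if is_short_vot(stop, vot_ms) else ""
    return f"{stop}{star}{rest}"


if __name__ == "__main__":
    # Hypothetical measurements for the letter names P, T, and K.
    samples = [("p", "iː", 18.0), ("t", "iː", 52.0), ("k", "eɪ", 39.5)]
    for stop, rest, vot in samples:
        print(mark_token(stop, rest, vot))
```

A value exactly at the threshold is not starred here, because the notes describe the starred forms as having a VOT of "less than" the stated value.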
References
Acton, William. 1984. Changing fossilized pronunciation. TESOL Quarterly 18(1). 71–85.
Best, Catherine T. 1995. A direct realist view of cross-language speech perception. In Winifred
Strange (ed.), Speech Perception and Linguistic Experience: Issues in Cross-language
Research, 171–204. Timonium, MD: York Press.
Best, Catherine T. & Michael D. Tyler. 2007. Non-native and second-language speech
perception: Commonalities and complementarities. In Ocke-Schwen Bohn & Murray
J. Munro (eds.), Language Experience in Second Language Speech Learning: In honor of
James Emil Flege, 13–34. Amsterdam: John Benjamins.
Carley, Paul & Inger M. Mees. 2020. American English Phonetics and Pronunciation Practice.
New York: Routledge.
Celce-Murcia, Marianne. 2001. Teaching English as a Second or Foreign Language, 3rd ed.
Boston: Heinle & Heinle Publisher.
Couper, Graeme. 2006. The short and long-term effects of pronunciation instruction. Prospect
21(1). 46–66.
Ehri, Linnea C. 2013. Orthographic mapping in the acquisition of sight word reading, spelling
memory, and vocabulary learning. Scientific Studies of Reading 18(1). 5–21.
https://doi.org/10.1080/10888438.2013.819356
Ehri, Linnea C. 2020. The science of learning to read words: A case for systematic phonics
instruction. Reading Research Quarterly 55(S1). S45–S60.
Flege, James E. 1995. Second language speech learning: Theory, findings, and problems. In
Winifred Strange (ed.), Speech Perception and Linguistic Experience: Issues in Cross-language
Research, 233–277. Timonium, MD: York Press.
Gass, Susan M. & Larry Selinker (eds). 1992. Language Transfer in Language Learning:
Revised edition. Amsterdam: John Benjamins.
Han, ZhaoHong & Terence Odlin (eds). 2005. Studies of Fossilization in Second Language
Acquisition. Clevedon: Multilingual Matters. https://doi.org/10.21832/9781853598371
Hardison, Debra M. 2010. Visual and auditory input in second-language speech processing.
Language Teaching 43(1). 84–95. https://doi.org/10.1017/S0261444809990176
Hazan, Valerie, Anke Sennema, Andrew Faulkner, Marta Ortega-Llebaria, Midori Iba, &
Hyunsong Chung. 2006. The use of visual cues in the perception of non-native consonant
contrasts. The Journal of the Acoustical Society of America 119(3). 1740–1751.
https://doi.org/10.1121/1.2166611
Heidlmayr, Karin, Emmanuel Ferragne & Frederic Isel. 2021. Neuroplasticity in the
phonological system: The PMN and the N400 as markers for the perception of non-native
phonemic contrasts by late second language learners. Neuropsychologia 156. 107831.
https://doi.org/10.1016/j.neuropsychologia.2021.107831
Jarosz, Anna. 2019. English Pronunciation in L2 Instruction: The case of Secondary School
Learners. Cham: Springer.
Joto, Akiyo. 2009. komyunikeshon noryoku wo koryo shita nihongobogowasha no eigoonsei ni
kansuru hokatsutekikenkyu [A comprehensive study of English sounds produced by
native speakers of Japanese from the perspective of communicative ability]
Kagakukenkyuhi Hojokin Kenkyuseika Hokokusho [Kaken Research Report].
Joto, Akiyo. 2020. Intelligibility and acoustic features of the English fricatives /s/ and /ʃ/
produced by native speakers of Japanese. Nihon Gengo Onsei Gakkai [JALS Japan
Association of Language and Speech] 2. 39–54.
Joto, Akiyo, Misuzu Miyake & Yuri Nishio. 2017. Shogakko eigokatsudo ni shisuru
hatsuonshido manyuaru no sakuseini mukete: Eigohatsuon shido no jittaichosa to
kyokashobunseki wo motoni [Toward the development of a teacher’s manual for teaching
English pronunciation in elementary school English activities: based on a questionnaire
survey of English sound education and an analysis of English textbooks]. JACET Chugoku-
Shikoku Chapter Research Bulletin 14. 143–160.
Kachru, Braj B. 1985. Standards, codification and sociolinguistic realism: The English
language in the Outer Circle. In Randolph Quirk & Henry George Widdowson (eds.),
English in the World: Teaching and Learning the Language and Literature, 11–30.
Cambridge: Cambridge University Press.
Kokusaikoryukikin. 1989. Kyoshiyo nihongohandobukku hatsuon kaiteiban [Japanese
handbook for teachers, pronunciation, revised]. Tokyo: Bonjinsha.
Lado, Robert. 1957. Linguistics Across Cultures. Ann Arbor: University of Michigan Press.
Lambacher, Stephen G. 2010. A CALL tool for improving second language acquisition of
English consonants by Japanese learners. Computer Assisted Language Learning 12(2).
137–156. https://doi.org/10.1076/call.12.2.137.5722
Lambacher, Stephen G., William L. Martens, Kazuhiko Kakehi, Chandrajith A. Marasinghe &
Garry Molholt. 2005. The effects of identification training on the identification and
production of American English vowels by native speakers of Japanese. Applied
Psycholinguistics 26(2). 227–247.
Levis, John M. 2018. Intelligibility, Oral Communication, and the Teaching of Pronunciation.
Cambridge: Cambridge University Press.
Lisker, Leigh & Arthur S. Abramson. 1964. A cross-language study of voicing in initial stops:
Acoustic measurements. Word 20(3). 384–422.
McGurk, Harry & John MacDonald. 1976. Hearing lips and seeing voices. Nature 264. 746–748.
Major, Roy C. 1987. Foreign accent: recent research and theory. International Review of
Applied Linguistics in Language Teaching 25(3). 185–202.
MEXT. 2015. Heisei 26 nendo, shogakko gaikokugokatsudo jisshijokyochosa no kekka no gaiyo
[The survey of elementary English education, 2014]. https://www.mext.go.jp/
a_menu/kokusai/gaikokugo/1362148.htm
MEXT. 2017a. Shin kyoshoku katei koa karikyuramu [New teacher training course core
curriculum]. https://www.mext.go.jp/component/b_menu/shingi/toushin/__icsFiles/
afieldfile/2017/11/27/1398442_1_3.pdf
MEXT. 2017b. Shogakko shin gakushushidoyoryo kaisetsu gaikokugo katsudo gaikokugo hen
[Course of Study guides, Foreign language activities]. https://www.mext.go.jp/content/
20201029-mxt_kyoiku01-100002607_11.pdf
Nation, I. S. P. & Jonathan Newton. 2009. Teaching ESL/EFL Listening and Speaking. New York:
Routledge.
Oh, Grace E., Susan Guion-Anderson, Katsura Aoyama, James E. Flege, Reiko Akahane-Yamada
& Tsuneo Yamada. 2011. A one-year longitudinal study of English and Japanese vowel
production by Japanese adults and children in an English speaking setting. Journal of
Phonetics 39(2). 1–25. https://doi.org/10.1016/j.wocn.2011.01.002
Pennington, Martha C. & Pamela Rogerson-Revell. 2019. English Pronunciation Teaching and
Research. London: Palgrave Macmillan.
Piasta, Shayne B. & Richard K. Wagner. 2010. Learning letter names and sounds: Effects of
instruction, letter type, and phonological processing skill. Journal of Experimental Child
Psychology 105(4). 324–344. https://doi.org/10.1016/j.jecp.2009.12.008.
Purcell, Edward D. & Richard W. Suter. 1980. Predictors of pronunciation accuracy: a
reexamination. Language Learning 30(2). 271–287.
Riney, Tim & Janet Anderson-Hsieh. 1993. Japanese pronunciation of English. JALT Journal 15(1).
21–36.
Sasaki, Miyuki. 2008. The 150-year history of English language assessment in Japanese
education. Language Testing 25(1). 63–83. https://doi.org/10.1177/0265532207083745.
Selinker, Larry. 1972. Interlanguage. International Review of Applied Linguistics 10. 209–231.
https://doi.org/10.1515/iral.1972.10.1-4.209
Shimizu, Katsumasa. 2016. Nihonjin gakushusha ni yoru eigoboin no shutoku ni tsuiteno
kosatsu [A study on the acquisition of English vowels by Japanese ESL Learners]. JACET
Chubu Journal 14. 51–62.
Smotrova, Tetyana. 2017. Making pronunciation visible: Gesture in teaching pronunciation.
TESOL Quarterly 51(1). 59–89.
Stevick, Earl W. 1978. Toward a practical philosophy of pronunciation: Another view. TESOL
Quarterly 12(2). 145–150.
Thompson, Irene. 1991. Foreign accents revisited: The English pronunciation of Russian
immigrants. Language Learning 41(2). 177–204.
Vance, Timothy J. 1987. An Introduction to Japanese Phonology. New York: State University of
New York Press.
Yamada, Tsuneo & Takahiro Adachi. 1998. Eigo risuningu kagakuteki jotatsuho [Scientific
ways to improve your skill of English]. Tokyo: Kodansha.
Yamada, Tsuneo & Takahiro Adachi. 1999. Eigo supikingu kagakutekijotatsuho [Scientific ways
to improve your speaking skill of English]. Tokyo: Kodansha.
Zhang, Runhan & Zhou-min Yuan. 2020. Examining the effects of explicit pronunciation
instruction on the development of L2 pronunciation. Studies in Second Language
Acquisition 42(4). 905–918. https://doi.org/10.1017/s0272263120000121
Natallia Liakina, Denis Liakin
Speech technologies and pronunciation
training: What is the potential for efficient
corrective feedback?
Abstract: In this paper, we will first examine different types of implicit and ex-
plicit corrective feedback (CF) that automatic speech recognition (ASR)-based
applications can provide and discuss their impact on the acquisition of L2 pro-
nunciation in light of SLA findings. Second, we will report the results of our
action research on the use of three different ASR-based tools in two university-
level French pronunciation courses, with specific reference to learners’ percep-
tions of the utility of different types of automatic corrective feedback provided
by these tools. To conclude, we will offer avenues of discussion and practical
suggestions for the effective and sensible integration of ASR-based applications
in the teaching and learning of L2 pronunciation, in and beyond the classroom.
1 Introduction
Intelligible speech is integral to L2 acquisition and use, and is essential for ef-
fective communication (Arteaga 2000; Levis and McCrocklin 2018; Morin 2007;
Thomson and Derwing 2014). Although students frequently express the need or
desire to improve their pronunciation, teachers often neglect it in favor of the
development of other skills in the traditional language classroom (Isaacs 2009;
Lang et al. 2012; Lebel 2011; Saito 2012) and, at the same time, it is rare for stu-
dents to receive sufficient instruction and feedback on pronunciation from their
teacher due to the lack of time and/or appropriate resources and training (Col-
lins and Muñoz 2016; Cucchiarini and Strik 2013; Morin 2007; Neri, Cucchiarini,
and Strik 2002). Input alone (exposure inside and outside the classroom) is in-
sufficient for pronunciation advancement (Elliott 1995; Flege 1981; Fortune and
Tedick 2015; Han and Odlin 2006; Kennedy 2011; Solon 2016), so learners need
to have extensive opportunities for output during classroom interactions or
https://doi.org/10.1515/9783110736120-011
Lyster 2018). A key point is that CF cannot be useful if learners have not en-
gaged in any initial explicit learning opportunities, or if they are not provided
with clear information on erroneous utterances beyond information about cor-
rectness (Hattie and Timperley 2007).
While researchers, teachers, and learners consider CF a crucial component
of L2 pronunciation teaching and learning, the frequency of CF episodes
targeting pronunciation in L2 classrooms is very low and represents only 22.4%
of teacher-learner interactions (Brown 2016). Since opportunities for pronuncia-
tion training and immediate personalized corrective feedback are limited in the
traditional classroom setting, can the use of new speech technologies be a via-
ble solution to provide the learners with effective pronunciation practice with
meaningful feedback?
A CAPT system may include either audio or visual feedback, which pinpoints
pronunciation errors while students make an unlimited number of practice
attempts in the target language in the absence of teacher involvement. In a CAPT system,
ASR technology automatically transcribes students’ voice recordings into writ-
ten words. The speech visualization technology integrated into the system visu-
ally shows the deviation of students’ pronunciation from that of native model
speakers, and this feedback can provide learners with particular metacognitive
strategies to facilitate mastery of the target language sound patterns (Tsai
2019).
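To make the transcription step concrete, the following is a minimal sketch of how a written ASR transcript could be compared with the target sentence to list the words the recognizer did and did not match. It is not the method of any system cited here: it assumes the transcript is already available as text, uses Python's standard `difflib` module for the alignment, and the function name and the French example sentence are hypothetical.

```python
from difflib import SequenceMatcher


def compare_to_target(target: str, transcript: str):
    """Split the target words into those the recognizer matched and those it did not."""
    target_words = target.lower().split()
    heard_words = transcript.lower().split()
    matcher = SequenceMatcher(a=target_words, b=heard_words, autojunk=False)
    matched, flagged = [], []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag == "equal":
            matched.extend(target_words[i1:i2])
        elif tag in ("replace", "delete"):
            # The target word was transcribed as something else or not at all.
            flagged.extend(target_words[i1:i2])
        # "insert" means extra words were heard; no target word is flagged.
    return matched, flagged


if __name__ == "__main__":
    # Hypothetical case: /y/ produced as /u/ leads to "vous" and "roue".
    ok, to_review = compare_to_target("Tu as vu la rue", "tu as vous la roue")
    print("matched:", ok)
    print("to review:", to_review)
```

A word-level comparison of this kind only shows which words were not recognized; it does not explain what the learner should change, which is the limitation of transcription-only feedback that the studies reviewed in this chapter discuss.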
Speech recognition software varies greatly in terms of its validity, reliability,
and the quality of the feedback provided to the users; therefore, further research
is necessary to determine if speech recognition software actively supports L2 pro-
nunciation development (Bajorek 2017; Liakin, Cardoso, and Liakina 2017b; Mroz
2018).
3.1 TTS
There exists very little research on the effects of the use of TTS as an L2
pedagogical tool. Liakin, Cardoso, and Liakina (2017a) investigated the acquisition
in production of French liaison, i.e., the pronunciation of a latent word-final
consonant, in a mobile TTS-based learning environment. The study used a pre/post/
delayed-posttest design with two experimental groups (a TTS group and a French
instructor-supervised group) and a control group. The results indicated that the
two groups that received instruction, namely the TTS and teacher-led groups,
outperformed the control group in liaison production. This study supports
TTS’s ability to aid in pronunciation learning.
The results obtained by Bione and Cardoso (2020) suggest that synthetic
voices have the potential to deliver intelligible and comprehensible input, simi-
lar to human speech. Their study evaluated a modern English TTS system in an
EFL context in Brazil in terms of its speech quality, ability to be understood by
L2 users, and potential for focus on specific language forms in comparison with
a native English speaker. The results of the study indicate that
the TTS and human voices were perceived similarly in terms of
comprehensibility, while ratings for naturalness were unfavorable for the synthesized
voice. For text comprehension, dictation, and aural identification tasks, partici-
pants performed relatively similarly in response to both voices.
3.2 ASR
The majority of studies that have investigated the effects of ASR on the acquisi-
tion of L2 pronunciation have shown that, despite many limitations, this technol-
ogy has the potential to be effective. In the context of pronunciation teaching,
researchers suggest two possible applications for ASR: (1) to teach the pronuncia-
tion of a foreign language; and (2) to assess students’ oral production. A series of
studies show that computer-assisted pronunciation instruction using ASR can be
effective in the acquisition of L2 phonological features (Bodnar et al. 2016; Cuc-
chiarini, Neri, and Strik 2009; Garcia, Nickolai, and Jones 2020; Liakin, Cardoso,
and Liakina 2015, 2017b; McCrocklin 2016; Mroz 2018, 2020; Mushangwe 2015;
Neri et al. 2008; Penning de Vries et al. 2014; Seferoglu 2005; Strik et al. 2009,
2012, among others).
Liakin, Cardoso, and Liakina (2015) investigated the effects of mobile ASR-
based learning on the acquisition of the problematic French vowel /y/ in pro-
duction and perception. The study consisted of three groups of learners: one
received instruction via ASR, the other via a French instructor, and the third
acted as the control group. Their findings indicated that the group that received
ASR-based instruction improved significantly in /y/ production from pretest to
posttest, in comparison with the two other groups.
An experimental study by Mroz (2020) aimed to determine the impact of
mobile-based ASR in Gmail on the intelligibility and proficiency of Intermediate
learners of French as a foreign language, and whether any individual factors
influenced learning outcomes. The results of this study showed that ASR users
significantly outperformed non-ASR users on intelligibility, particularly when
exposed to instruction on spelling-to-sound patterns, and demonstrated the
most significant growth in proficiency.
In Garcia, Nickolai, and Jones (2020), the authors presented a 15-week
classroom study measuring the student outcomes of instructor-led pronuncia-
tion lessons versus entirely ASR-based pronunciation training in lower-level
Spanish courses. The study found that both instructor-led and ASR-based
instruction techniques yielded statistically significant gains in pronunciation
ratings. ASR seems to outperform traditional instruction when targeting specific
phonemes, especially in the short term, while the instructor-led group, which
received explicit instruction on pronunciation, saw longer-term gains regarding
comprehensibility. The data suggest that ASR-based instruction shows promise
to improve certain aspects of L2 pronunciation.
While more and more studies investigate the usefulness of speech technologies
to develop different L2 skills, a limited number of researchers investigated the
efficiency of different types of automated, immediate CF that can be provided
to learners using ASR and TTS-based tools for L2 pronunciation practice. The
following studies guided us in designing our action research, which will be pre-
sented in the following section.
Cucchiarini, Neri, and Strik (2009) conducted an experiment with a
group of 30 adult immigrants who were divided into three groups that used:
(1) an ASR-based Computer Assisted Pronunciation Training (CAPT) system
developed specifically for Dutch L2 learners and providing CF on a limited number of
problematic Dutch sounds; (2) a CAPT system with no CF; or (3) regular
teacher-fronted classroom instruction with no CAPT system. The ASR-based
feedback consisted of the transcription of the utterance produced by the learners
with mispronounced phonemes identified in red, a smiley, and a comment
indicating that there was an error. The system also allowed the learners to listen
and to compare their pronunciation with the model. In order to regulate the
level of anxiety, a maximum of three errors was signaled for each recording.
According to the results, the group with ASR-based CF outperformed the two other
groups on the production of the targeted sounds; however, the difference in
improvement among the three groups was not statistically significant for the phonemes
not targeted by the automatic feedback. While this study demonstrated the
positive impact of the ASR-based CF on the production of the sounds targeted by
the training, the researchers concluded that limiting CF to a restricted number
of problematic sounds is not an effective strategy for achieving significant overall
learning effects and improvements in pronunciation quality.
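The feedback rule described above (flag mispronounced phonemes in the transcription, but signal no more than three errors per recording) can be illustrated with a short sketch. This is not the implementation of the Dutch CAPT system: it assumes that a per-phoneme correct/incorrect judgment is already available, the class and function names are hypothetical, square brackets stand in for the red highlighting, and the comment strings are invented for illustration.

```python
from dataclasses import dataclass

MAX_ERRORS_SHOWN = 3  # the cap reported for each recording


@dataclass
class PhonemeResult:
    symbol: str    # phoneme as transcribed
    correct: bool  # hypothetical judgment supplied by the recognizer


def build_feedback(results, max_errors=MAX_ERRORS_SHOWN):
    """Return the marked transcription and a short comment for the learner."""
    shown = 0
    parts = []
    for result in results:
        if not result.correct and shown < max_errors:
            parts.append(f"[{result.symbol}]")  # brackets replace red highlighting
            shown += 1
        else:
            parts.append(result.symbol)  # correct, or an error beyond the cap
    comment = "There was an error." if shown else "Well done."
    return " ".join(parts), comment


if __name__ == "__main__":
    # Hypothetical recording with four mispronounced phonemes.
    recording = [
        PhonemeResult("ɣ", False),
        PhonemeResult("uː", True),
        PhonemeResult("d", False),
        PhonemeResult("ə", False),
        PhonemeResult("x", False),
    ]
    marked, comment = build_feedback(recording)
    print(marked)
    print(comment)
```

In this sketch the first three errors in the utterance are the ones shown; a real system could instead prioritize errors by severity, which the description above does not specify.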
To better understand CF in an ASR system, Wang and Young (2014) re-
searched the effects of two different types of immediate automated CF provided
through a pedagogical ASR-based intelligent computer-assisted speaking learn-
ing (iCASL) system for autonomous practice of English pronunciation. The partic-
ipants – 38 adult ESL learners from Taiwan – were divided into an experimental
and a control group and had to complete weekly reading activities independently
during an eight-week period. While the control group received only implicit CF
that consisted of a speaking score and a waveform diagram, the experimental
group benefited from additional explicit targeted CF, including a corrective com-
ment, a list of words pronounced correctly and with errors, and recasts of the
learners’ utterances. Finally, the learners had access to audio recordings with
full sentences and single-word forms that could be played at a natural and slow
pace. According to the results, 94% of control group participants reported being
confused and not being able to interpret the overall assessment scores and the
waveforms. Therefore, it is not surprising that only the experimental group’s par-
ticipants, exposed to both implicit and explicit multi-modal targeted CF, attained
significant improvement rates in pronunciation.
Bajorek (2017) considers, among other fundamental points, the importance
of L2 pronunciation and how targeted feedback on spoken production can support
language learners. Her findings indicate that the software programs reviewed
(Rosetta Stone, Duolingo, Babbel, and Mango Languages) provide insufficient
feedback to learners about their speech and, thus, have unrealized potential.
The author recommends that learners be provided, wherever possible, with tar-
geted feedback so that they can act on this information and improve their
speech via explicit instruction. Accordingly, ASR can be helpful for learners in
providing immediate targeted feedback, but this capability must be explained
through explicit instructions rather than being used as an unexplained assess-
ment tool.
To conclude, ASR is a very promising technology that should allow stu-
dents to get immediate feedback on their pronunciation, thus making them
more independent in learning this aspect. However, as presented in Liakin, Car-
doso, and Liakina (2017b), many participants of their two studies experienced a
great deal of frustration when they were unable to understand why the applica-
tions could not understand them and how they could correct themselves:
“Sometimes I didn’t know what to change, so I just said the same thing over and over.”
“[. . .] I didn’t even know what I was doing wrong.”
“[. . .] when I was getting to the thirteenth, the fourteenth [try] and I’m just like ‘I don’t
know how you want me to say it!’”
There is also little research on learners’ perceptions of the use of ASR and TTS for pronunciation learning in French as a second/foreign language in general and, more specifically, on the immediate automated feedback they receive.
This study adopts an action research approach to examine students’ perceptions of speech technology as a pronunciation-learning tool and of the immediate feedback it provides. It aims to explore the use of three different types of ASR- and TTS-supported tools that allow learners to practice pronunciation and receive instantaneous, automatic feedback, not only on structured read-and-repeat tasks but also on a broader range of communicative tasks designed to transfer the
new skills acquired in a controlled environment into more spontaneous oral
communication contexts. Our goal was to explore the potential of different ASR
and TTS applications and the types of automated corrective feedback they offer
at different stages of the pronunciation learning process, as suggested by SLA
and CALL research and second language pedagogy.
4.2 Method
4.2.1 Participants
The pedagogical design of the study was based on the framework for effective
pronunciation teaching in communicative contexts (Celce-Murcia et al. 2010),
which includes the following steps:
– listening, in the form of perceptual teaching aimed at developing phonological awareness and offering perception and discrimination activities;
– repetition/imitation in the form of guided practice;
– communication in the form of the reuse of spontaneous production in vari-
ous contexts such as speech acts, presentations, etc. (Celce-Murcia et al.
2010: 45).
The development of pedagogical tasks was also guided by SLA research find-
ings that suggest the efficiency of explicit learning (Derwing and Munro 2015;
Ellis 1994), of focus-on-form (Long 2000) integrated into a thematic framework
and based on the communicative and task-based methods (Elliott 1997; Gatbon-
ton and Segalowitz 2005; Trofimovich and Gatbonton 2006; Yule, Powers, and
Macdonald 1992) and of the noticing hypothesis (Schmidt 1994, 1995).
Four pedagogical sequences were developed by the teacher-researchers to
allow the students to practice the phonemes targeted by the course curriculum
outside of the classroom. The tasks were grouped in four 1.5-hour assignments
that students needed to complete outside of regular contact hours as homework.
Each assignment included the following activities in context, each focusing on specific segmental and suprasegmental elements as well as on the vocabulary and formulaic expressions of a theme (e.g. silent and pronounced final consonants, rounded vowels /œ/-/ø/, qualitative adjectives and description of a person, nasal vowels, enumeration intonation and food):
– a review of the articulation, the grapheme-phoneme correspondences and
the pronunciation rules;
– auditory discrimination and grapheme-phoneme correspondence exercises with automatic correction;
– reading tasks with a focus on targeted phonemes;
– communicative tasks.
In order to achieve these pedagogical goals, three different ASR and TTS-based
tools were integrated: iSpraak, a teaching tool designed for pronunciation train-
ing in a great variety of languages; Pronunciator, a multi-language learning
platform; and Speech to Text Translator TTS app for mobile devices, a free dicta-
tion tool.
Speech technologies and pronunciation training 297
4.2.4 Instruments
At the end of the first week (first use) and at the end of the fourth week (last use), the participants were invited to respond to a survey questionnaire involving a set of six statements regarding their perceptions of the use of each speech technology as a learning tool for pronunciation training and of the immediate automated corrective feedback provided by each of the three tools. A five-point Likert scale measured the degree to which students disagreed or agreed with each statement: (1) strongly disagree, (2) disagree, (3) neutral, (4) agree, and (5) strongly agree.
In order to better understand the quantitative results, participants were in-
vited to express their opinion on each statement of the questionnaire at the end
of the experiment. For each tool used in the study, the statements asked partici-
pants if: (a) the tool increased their motivation to learn about French pronuncia-
tion; (b) the tool allowed them to become aware of some of their pronunciation
problems; (c) the tool allowed them to evaluate their own pronunciation (to de-
cide whether their pronunciation was correct or incorrect); (d) the tool is user-
friendly; (e) the immediate feedback was helpful; and, finally, (f) they thought
this is a great tool to learn and practice pronunciation.
To guarantee confidentiality and to avoid factors that could affect data col-
lection or interpretation of the statements, the survey was administered at
home, without the presence of the teacher, and using English, the language of
instruction at the university where the study took place.
The data from the survey questionnaire were analyzed using descriptive
statistics, in which we established the mean values and associated standard de-
viations for each item under consideration.
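As an illustration of the descriptive statistics described above, the following minimal sketch computes a mean and a standard deviation for Likert ratings grouped by tool and statement. The tool labels and ratings are invented for the example and are not the study's data, and the use of the sample standard deviation (rather than the population one) is an assumption, since the text does not specify which was used.

```python
from statistics import mean, stdev

# Hypothetical ratings (1 = strongly disagree ... 5 = strongly agree).
# These values are invented for illustration; they are NOT the study's data.
responses = {
    ("Tool A", "The immediate feedback was helpful."): [4, 5, 3, 4, 5],
    ("Tool B", "The immediate feedback was helpful."): [3, 3, 4, 2, 3],
}


def describe(ratings):
    """Return (mean, sample standard deviation), rounded to two decimals."""
    return round(mean(ratings), 2), round(stdev(ratings), 2)


# One (mean, SD) pair per tool and statement.
summary = {key: describe(ratings) for key, ratings in responses.items()}

for (tool, statement), (m, sd) in summary.items():
    print(f"{tool} | {statement} | M = {m:.2f}, SD = {sd:.2f}")
```

Running the same summary on the ratings collected after the first and after the last use would give the two sets of values needed to compare perceptions over time.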
4.3 Results
The data compiled by means of the survey questionnaire were analyzed via a
simple mean calculation with associated standard deviation (descriptive statis-
tics). Means were used to measure the students’ ratings of the statements
adopted in the study.
Quantitative results
For our first research question, What are students’ perceptions of the immediate
automated corrective feedback provided by speech technology?, we analyzed survey statements 1–3: (1) “the immediate feedback was helpful”; (2) “the tool allowed me to evaluate my own pronunciation”; (3) “the tool allowed me to
become aware of some of my pronunciation problems.” Table 1 illustrates the
results for each of these three items collected after the first and the last use of
the tools.
Table 1: Statements 1–3, rated for each tool after the first and the last use (numerical values missing from the source text).
1. The immediate feedback was helpful.
2. The tool allowed me to evaluate my own pronunciation.
3. The tool allowed me to become aware of some of my pronunciation problems.
Although the students’ perceptions were positive for all three statements, the re-
sults of the responses to the questions regarding the usefulness of corrective
feedback clearly indicate that the iSpraak application, which offered a combina-
tion of implicit and explicit corrective cues, comments, and targeted feedback,
was the most highly rated. The Pronunciator platform received the lowest scores:
the speech recognition was not always reliable because of some technical issues
with ASR, and the scores and general appreciation messages that students re-
ceived were inconsistent and difficult to interpret.
What we found very interesting was the change in perception of the Speech
to Text Translator TTS app between the first and the last use of the tool, particu-
larly the appreciation of the implicit feedback that the app offered in the form
Qualitative results
All the participants were invited to express their opinion [1] on each statement
from the questionnaire at the end of the experiment.
As for the helpfulness of immediate feedback (Statement 1), on the positive side,
all learners (n=57) appreciated the variety and complementarity of CF types
that can be exemplified with the following excerpts: “It is always nice to have
immediate feedback to be able to correct you mistakes.” – “All feedbacks are
helpful.” – “I can find quickly my pronunciation errors from these tools.” These
responses are likely due to the fact that immediate feedback allowed them to
see their progress, in addition to receiving messages of encouragement in Pro-
nunciator, as illustrated by one of the participants’ comments: “My favorite tool
for feedback is Pronunciator because it encourages me to keep working on pronunciation.” At the same time, the participants appreciated seeing their mistakes immediately in Speech to Text Translator TTS and iSpraak: “It is very
useful to know exactly what it is that you are mispronouncing.”, allowing them to
“correct mistakes before [they] remember things wrong.”
The less positive comments on the helpfulness of the immediate feedback
concerned the perceived inaccuracy of the tool and the inability to self-correct
after completing the tasks. In terms of the assessment of their pronunciation,
several participants stated that sometimes iSpraak was a source of slight disap-
pointment since they often obtained an almost perfect score even if they mis-
pronounced several words. As for the helpfulness of the binary feedback
provided by Pronunciator, it was reported that “ . . . it could have been better if
it provided you with more specific feedback (like iSpraak), like telling you ex-
actly which words were mispronounced.”
As for Statement 2 (The tool allowed me to evaluate my own pronunciation),
the learners appreciated all the tools: “I found all of the tools helpful in showing
me what the standard pronunciation should be, and in catching my mistakes.”
They additionally liked the possibility to see their pronunciation score in Pro-
nunciator and iSpraak: “It gives me scores that I can evaluate myself.”
[1] We have chosen to keep the participants’ comments in their original and unmodified form.
They also appreciated the fact that immediate feedback allows them to vi-
sualize what has been said, to listen to, and compare their pronunciation
against a model in Pronunciator, thus contributing to their practice and learn-
ing experience, as highlighted by a participant: “It is a helpful tool to aware of
my pronunciation problems. It let me compare between my pronunciation and
right pronunciation.”
However, as can be observed in Table 1, they had some difficulties with
Speech to Text Translator TTS at the beginning, since there were ads in the app.
At the same time, some students expressed a lack of confidence and mentioned
the fact that not all tools provided explicit feedback on the errors: “I wish it told
me what was wrong with my pronunciation but I realize that is hard for an appli-
cation to do.”
Finally, the learners appreciated the fact that immediate feedback allowed
them to become aware of their mistakes and the quality of their pronunciation
(Statement 3), thus helping them to identify the elements pronounced incorrectly
and to know exactly what to correct: “I do really recommend the apps iSpraak
and Pronunciator. For the first one, you can know what kind of pronunciation you
should stress on and for the second one, you could imitate what you have heard.” –
“iSpraak and Speech-to-text TranslatorTTS made me realize most what I was doing
wrong.” – “ . . . very useful to know what I said wrong.”
Students particularly liked iSpraak. This can be exemplified with the fol-
lowing excerpt: “I like the system of this tool. First, I listen to the sound clip to be
familiar the right pronunciation and rhythm of the sentence. And then, I record
my pronunciation. Also, they give me quick feedbacks that I can aware where I
need to fix.”
Despite these findings, we should note that some participants felt frustrated, especially those who had pronunciation problems and would have preferred to have the chance to listen to their own recordings.
These results suggest that the type of corrective feedback impacts one’s
learning experience and that tools that provide targeted explicit and implicit
feedback are perceived as more useful. It could also be concluded that learners
need more coaching in spelling decoding and correction strategies if the correc-
tive feedback they receive is implicit. Scores with or without a general comment
are less useful and less appreciated.
Quantitative results
For our second research question, How do learners perceive the use of speech
technology as a learning tool for pronunciation training?, we analyzed the survey
statements 4–6 from Table 2: (4) “I think this is a great tool to learn and practice
pronunciation”; (5) “the tool increased my motivation to learn about French pro-
nunciation”; and (6) “the tool is user-friendly.”
Table 2: Statements 4–6, rated for each tool after the first and the last use (numerical values missing from the source text).
4. I think this is a great tool to learn and practice pronunciation.
5. The tool increased my motivation to learn about French pronunciation.
6. The tool is user-friendly.
Similar to the case of the three previous statements, all statements for the second
research question indicated that the students’ perceptions were positive; the only
exception was for Speech to Text Translator TTS after the first week. Here, again,
the assessment was much more favorable at the end of the intervention since the
students were able to learn how to use the tool.
Qualitative results
For Statement 4 (I think this is a great tool to learn and practice pronunciation),
the learners appreciated all three tools and how they complemented each
other: “I think they’re all great tools for different reasons, and I enjoyed the vari-
ety.” – “Each tool offered something different in terms of pronunciation help [. . .]” –
“I would use three all again [. . .] for pronunciation.” Students also appreciated
the possibility of unlimited practice.
Even though these tools might be a source of frustration for some students,
especially for learners with many pronunciation problems, they seemed to ap-
preciate that learning can be done autonomously, not in front of the group, mo-
tivating the practice: “It is certainly very useful for practicing. It can get a bit
frustrating at times, but it also encourages to keep doing it to get better. It’s like a
very personal challenge.”
As for the motivation statement (The tool increased my motivation to learn
about French pronunciation), students were unanimous and positive: “The styles
of the tools are various, but all tools inspire my motivation to learn about French
pronunciation equally.” – “It made me realize many errors in my knowledge of pro-
nunciation, and it made want to fix them.” – “Most of these tools actually motivate
to do better in French language, because when I am corrected, it shows me what I
need to work on.”
A small number of students mentioned a few drawbacks such as the limited
number of activities and the time to be accustomed to the tools: “I feel that
using the tools helped me to want to learn more, but they took a while to get ac-
customed to.”
Finally, for the statement The tool is user-friendly, although students found
them easy to use (“They are easy to use. Interface is simple and to the point.”),
we can observe that students had some difficulties with Speech to Text Transla-
tor TTS at the beginning (“ . . . has weird layout. Too many ads.”), but at the
end they changed their perception significantly. We want to stress that it is crucial to train learners well, both technically and pedagogically, to give them time to master the tool, and to consider the potential negative influence of pop-ups and advertisements in apps, as was the case with Speech to Text Translator TTS. This is often an issue with applications that offer free access
and are not pedagogically conceived.
In sum, the qualitative analysis allowed us to identify the perceived bene-
fits and drawbacks of speech technologies as pronunciation practice tools and
their potential for effective corrective feedback illustrated in Table 3. These fac-
tors can either enhance or, on the contrary, have a negative effect on a learning
experience and should be considered when a pedagogical decision is made
about the integration of ASR and TTS-based tools into the curriculum.
Table 3: Perceived benefits and drawbacks.
Benefits: awareness of their mistakes and of the quality of their pronunciation
Drawbacks: unusual layout of the tool
the results of our study suggest that targeted feedback provided in the form of enhanced visual prompts and followed by audio recasts is useful, especially during
guided practice activities when the learners are still acquiring knowledge about
new pronunciation features. This is important at this stage because learners need
a solid scaffolding to support their learning and to prepare them to engage in
more complex oral tasks.
As for the prompts that were given implicitly via written transcription of the
utterances in iSpraak and Speech to Text Translator TTS, it was interesting to ob-
serve how the learners’ perceptions changed over time. As noted by many re-
searchers, CF is beneficial only when it is built on the previous knowledge and if
it can be understood by the learners based on what they already know (see
Ammar and Spada 2006; DeKeyser 2007; Lyster, Saito, and Sato 2013). Also, CF is
more efficient when the learners “have enough phonetic knowledge, conversa-
tional experience, and perceptual awareness of target sounds” (Saito 2021: 422).
At the beginning of the experiment, our participants, all near-beginners, had very limited knowledge and experience. They were therefore not ready or well equipped to decode the implicit feedback, to reflect on their successes, or to identify the gaps in their pronunciation that needed to be addressed, and they did not perceive the ASR-based dictation app as a useful tool for learning pronunciation. After four weeks of intensive training, including explicit instruction and extensive practice of grapheme-phoneme correspondence, which is very challenging for learners of French, the appreciation of the implicit CF in the form of prompts was as positive as that of the targeted explicit feedback with audio recasts. These findings suggest that the integration of ASR-based dictation tools should be carefully prepared and accompanied by appropriate scaffolding, in the form of explicit teaching and training on how to fix pronunciation errors based on the transcription of the utterance provided by the app, in order to avoid learners’ frustration and loss of motivation (Liakin, Cardoso, and Liakina 2017b). Finally, in terms of the place of pronunciation activities supported by ASR and TTS dictation tools, it appears that they are
suitable for semi-guided activities and more communicative tasks that are
part of spontaneous practice.
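The kind of decoding that learners must perform on a dictation transcript can be sketched in a few lines of code. The example below is a hypothetical illustration, not a description of how iSpraak or Speech to Text Translator TTS work internally: it aligns the sentence a learner intended to say with the text a recognizer returned and lists the words that differ. The function name and the sample sentences are assumptions introduced for the example.

```python
from difflib import SequenceMatcher


def mismatched_words(target: str, transcript: str) -> list[tuple[str, str]]:
    """Pair each stretch of target words with what the recognizer wrote instead."""
    expected = target.lower().split()
    heard = transcript.lower().split()
    matcher = SequenceMatcher(a=expected, b=heard, autojunk=False)
    differences = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Keep only the stretches where target and transcript disagree.
            differences.append((" ".join(expected[i1:i2]), " ".join(heard[j1:j2])))
    return differences


# Hypothetical case: the learner aims for "tu" and "rue",
# but the recognizer transcribes "tout" and "roue".
print(mismatched_words("tu as vu la rue", "tout as vu la roue"))
```

A list of word pairs such as this one only shows where the transcript diverged from the target; working out which sound caused the divergence still depends on the learner's knowledge of grapheme-phoneme correspondences.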
As for binary feedback, consisting of an overall assessment (score and gen-
eral appreciation message) introduced during the final phase of the sequences
for communicative practice in context, the appreciation of CF remained neutral
during the whole length of the experiment. Such CF was considered a signal
that something went wrong and a prompt to try again, but not as a sufficient
teaching technique to support learning and correct pronunciation. However, it
is important to draw attention to the implicit combination of auditory feedback
available in Pronunciator when the learners, on top of the binary feedback,
listened to their own recorded utterances and compared them to the model re-
cordings. Many participants found it extremely helpful and expressed the wish
to have the same feature available for all the tools and pronunciation activities,
which could be one of the criteria for teachers who choose ASR-TTS tools for
their students or learning app developers working in the field of CAPT.
In sum, most of the participants found all the apps and the combination of
different types of CF very useful and complementary, which supports the recom-
mendations of many researchers to orchestrate different CF techniques (Ammar
and Spada 2006; Hattie and Timperley 2007; Lyster and Ranta 1997).
In terms of pedagogy, the design of the technology-mediated tasks was in-
spired by the framework for effective pronunciation teaching, which allowed
progressive learning with multiple outcomes, from explicit teaching of pronunciation rules to the practice of grapheme-phoneme correspondence; from perception to production; and from controlled to spontaneous processing and output
(Celce-Murcia et al. 2010). The explicit instruction and oral feedback provided
during class time enabled the learners to identify and understand errors, and
then correct themselves based on different types of automated ASR-based feedback. This supports the claims made by many researchers that explicit phonetic knowledge is necessary for pronunciation-focused CF to be effective and to lead to significant learning gains in pronunciation.
Overall, the results of this study suggest that ASR-based automated immediate feedback, provided in a variety of formats, has a positive impact on pronunciation learning when it is combined with training on learning strategies, explicit
instruction on articulation techniques, and spelling-to-sound patterns integrated
into level-appropriate realistic tasks. Altogether, this encourages meaningful pro-
nunciation practice in context and empowers learners to become more motivated
and more autonomous in their pronunciation practice outside of the classroom
(Dickerson 2015; Liakin, Cardoso, and Liakina 2017b; McCrocklin 2016).
6 Concluding remarks
The first goal of this study was to review the different types of speech technolo-
gies to better understand how they could be used for pronunciation instruction,
with specific attention on how ASR- and TTS-based applications may offer an
array of opportunities for immediate automated corrective feedback. The second
goal of this study was to better understand learners’ perceptions of the utility of
the different types of corrective feedback provided by the abovementioned appli-
cations (i.e., iSpraak, Speech to Text Translator TTS, Pronunciator). With regard to
References
Ammar, Ahlem & Nina Spada. 2006. One size fits all? Recasts, prompts and L2 learning.
Studies in Second Language Acquisition 28(4). 543–574.
Arteaga, Deborah L. 2000. Articulatory phonetics in the first-year Spanish classroom. The
Modern Language Journal 84(3). 339–354.
Bajorek, Joan. 2017. L2 Pronunciation in CALL: The unrealized potential of Rosetta Stone, Duolingo,
Babbel, and Mango Languages. Issues and Trends in Educational Technology 5(2). 24–51.
Baker, Amanda & Michael Burri. 2016. Feedback on second language pronunciation: A case
study of EAP teacher’s beliefs and practices. Australian Journal of Teacher Education 41(6).
1–19. doi:10.14221/ajte.2016v41n6.1 (accessed 16 June 2021).
Bione, Tiago & Walcir Cardoso. 2020. Synthetic voices in the foreign language context.
Language Learning & Technology 24(1). 169–186.
Blake, Robert J. 2013. Brave New Digital Classroom: Technology and Foreign Language
Learning. Washington: Georgetown University Press.
Bodnar, Stephen, Catia Cucchiarini, Helmer Strik & Roeland van Hout. 2016. Evaluating the
motivational impact of CALL systems: Current practices and future directions. Computer
Assisted Language Learning 29(1). 186–212.
Brown, Dan. 2016. The type and linguistic foci of oral corrective feedback in the L2 classroom:
A meta-analysis. Language Teaching Research 20(4). 436–458.
Celce-Murcia, Marianne, Donna M. Brinton, Janet M. Goodwin & Barry Griner. 2010. Teaching
Pronunciation: A Course Book and Reference Guide, 2nd edn. Cambridge: Cambridge
University Press.
Chapelle, Carol. 2001. Computer Applications in Second Language Acquisition: Foundations
for Teaching, Testing, and Research. Cambridge: Cambridge University Press.
Chapelle, Carol & Joan Jamieson. 2008. Tips for Teachers: Computer-assisted Language
Learning. New York: Pearson Longman.
Collins, Laura & Carmen Muñoz. 2016. The foreign language classroom: Current perspectives
and future considerations. The Modern Language Journal 100(S1). 133–147.
https://doi.org/10.1111/modl.12305 (accessed 16 June 2021).
Council of Europe. 2001. Common European Framework of Reference for Languages: Learning,
Teaching, Assessment. Cambridge, U.K.: Press Syndicate of the University of Cambridge.
Crompton, Peter & Sherwin Rodrigues. 2001. The role and nature of feedback on students
learning grammar: A small scale study on the use of feedback in call in language
learning. Proceedings of the Workshop on Computer Assisted Language Learning,
Artificial Intelligence in Education Conference, 70–82.
Cucchiarini, Catia, Ambra Neri & Helmer Strik. 2009. Oral proficiency training in Dutch L2: The
contribution of ASR-based corrective feedback. Speech Communication 51(10). 853–863.
Cucchiarini, Catia & Helmer Strik. 2013. Second language learners’ spoken discourse: Practice
and corrective feedback through Automatic Speech Recognition. In Hwee Ling Lim & Fay
Sudweeks (eds.), Innovative Methods and Technologies for Electronic Discourse Analysis,
169–189. Hershey: Information Science Reference.
DeKeyser, Robert. 2007. Skill Acquisition Theory. In Bill VanPatten & Jessica Williams (eds.),
Theories in Second Language Acquisition: An Introduction, 97–113. Mahwah, NJ:
Lawrence Erlbaum Associates Publishers.
Derwing, Tracey M. 2010. Utopian goals for pronunciation teaching. In John Levis & Kimberly
LeVelle (eds.), Proceedings of the 1st Pronunciation in Second Language Learning and
Teaching Conference, Ames, USA, 2009, 17–19. Ames, IA: Iowa State University.
Derwing, Tracey M. & Murray J. Munro. 2015. Pronunciation Fundamentals: Evidence-based
Perspectives for L2 Teaching and Research. Amsterdam: John Benjamins.
Derwing, Tracey M., Murray J. Munro & Grace Wiebe. 1998. Evidence in Favor of a Broad
Framework for Pronunciation Instruction. Language Learning 48(3). 393–410.
doi:10.1111/0023-8333.00047 (accessed 16 June 2021).
Derwing, Tracey M. & Marian J. Rossiter. 2003. The Effects of Pronunciation Instruction on the
Accuracy, Fluency, and Complexity of L2 Accented Speech. Applied Language Learning 13(1).
1–17.
Dickerson, Wayne. 2015. Using orthography to teach pronunciation. In Marnie Reed & John
Levis (eds.), The Handbook of English Pronunciation, 488–503. Chichester: Wiley
Blackwell.
Elliott, A. Raymond. 1995. Foreign language phonology: field independence, attitude, and the
success of formal instruction in Spanish pronunciation. The Modern Language Journal 79(4).
530–542.
Elliott, A. Raymond. 1997. On the teaching and acquisition of pronunciation within a
communicative approach. Hispania 80(1). 95–108.
Ellis, Rod. 1994. The Study of Second Language Acquisition, 2nd edn. Oxford: Oxford University
Press.
Ellis, Rod. 2012. Language Teaching Research and Language Pedagogy. Oxford:
Wiley–Blackwell.
Ellis, Rod & Young Sheen. 2006. Reexamining the role of recasts in second language
acquisition. Studies in Second Language Acquisition 28(4). 575–600. doi:10.1017/
S027226310606027X (accessed 16 June 2021).
Flege, James E. 1981. The phonological basis of foreign accent: A hypothesis. TESOL Quarterly
15(4). 443–455.
Fortune, Tara W. & Diane J. Tedick. 2015. Oral proficiency assessment of English-proficient K-8
Spanish immersion students. Modern Language Journal 99(4). 637–655.
Garcia, Christina, Dan Nickolai & Lillian Jones. 2020. Traditional versus ASR-based
pronunciation instruction: An empirical study. Calico Journal 37(3). 213–232.
Gass, Susan M., Jennifer Behney & Luke Plonsky. 2013. Second Language Acquisition: An
Introductory Course, 4th edn. New York: Routledge.
Gatbonton, Elizabeth & Norman Segalowitz. 2005. Rethinking communicative language
teaching: A focus on access to fluency. The Canadian Modern Language Review 61(3).
325–353. http://dx.doi.org/10.3138/cmlr.61.3.325 (accessed 16 June 2021).
Golonka, Ewa M., Anita R. Bowles, Victor M. Frank, Dorna L. Richardson & Suzanne Freynik.
2014. Technologies for foreign language learning: A review of technology types and their
effectiveness. Computer Assisted Language Learning 27(1). 70–105. doi:10.1080/
09588221.2012.700315 (accessed 16 June 2021).
Gooch, Debbie, Paul Thompson, Hannah M. Nash, Margaret J. Snowling & Charles Hulme.
2016. The development of executive function and language skills in the early school
years. Journal of Child Psychology and Psychiatry 57(2). 180–187.
Han, ZhaoHong & Terence Odlin. 2006. Studies of Fossilization in Second Language
Acquisition. Bristol: Multilingual Matters.
Hattie, John & Helen Timperley. 2007. The power of feedback. Review of Educational Research
77(1). 81–112.
Hincks, Rebecca. 2003. Speech Technologies for Pronunciation Feedback and Evaluation.
ReCALL: The Journal of EUROCALL 15(1). 3–20. doi:10.1017/S0958344003000211
(accessed 16 June 2021).
Isaacs, Talia. 2009. Integrating form and meaning in L2 pronunciation instruction. TESL
Canada Journal 27(1). 1–12.
Kartushina, Natalia, Alexis Hervais-Adelman, Ulrich Hans Frauenfelder & Narly Golestani.
2016. Mutual influences between native and non-native vowels in production: Evidence
from short-term visual articulatory feedback training. Journal of Phonetics 57. 21–39.
Kennedy, Sara. 2011. Le développement de la parole L2 d’étudiants universitaires non-natifs.
Paper presented at the Journée d’étude sur la phonétique des langues secondes,
Université du Québec à Montréal, 1 April.
Kennedy, Sara, Josée Blanchet & Pavel Trofimovich. 2014. Learner pronunciation, awareness,
and instruction in French as a second language. Foreign Language Annals 47(1). 76–96.
Lang, Yong, Lin Wang, Lianxia Shen & Yinying Wang. 2012. An integrated approach to the
teaching and learning of zh. Electronic Journal of Foreign Language Teaching 9(2).
215–232.
Lebel, Jean-Guy. 2011. Nécessité de la correction phonétique en FLE. Paper presented at the
Journée d’étude sur la phonétique des langues secondes, Université du Québec à
Montréal, 1 April.
Lee, Andrew H. & Roy Lyster. 2016. Effects of different types of corrective feedback on
receptive skills in a second language: A speech perception training study. Language
Learning 66(4). 809–833.
Lee, Andrew H. & Roy Lyster. 2017. Can corrective feedback on second language speech
perception errors affect production accuracy? Applied Psycholinguistics 38(2). 371–393.
Lee, Junkyu, Juhyun Jang & Luke Plonsky. 2015. The Effectiveness of Second Language
Pronunciation Instruction: A Meta-Analysis. Applied Linguistics 36(3). 345–366.
Levis, John & Shannon McCrocklin. 2018. Reflective and effective teaching of pronunciation. In
Akram Faravani, Mitra Zeraatpishe, Hamid Reza Kargozari & Maryam Azarnoosh (eds.),
Issues in Syllabus Design, 77–89. Rotterdam: Sense Publishers.
Li, Shaofeng, Yan Zhu & Rod Ellis. 2016. The effects of the timing of corrective feedback on the
acquisition of a new linguistic structure. Modern Language Journal 100(1). 276–295.
Liakin, Denis, Walcir Cardoso & Natallia Liakina. 2015. Learning L2 pronunciation with a
mobile speech recognizer: French /y/. CALICO Journal 32(1). 1–25.
Liakin, Denis, Walcir Cardoso & Natallia Liakina. 2017a. The pedagogical use of mobile speech
synthesis (TTS): Focus on French liaison. Computer Assisted Language Learning 30(3).
348–365.
Liakin, Denis, Walcir Cardoso & Natallia Liakina. 2017b. Mobilizing instruction in a second-
language context: Learners’ perceptions of two speech technologies. Languages 2(3).
1–21.
Loewen, Shawn & Jenefer Philp. 2006. Recasts in the adult L2 classroom: Characteristics,
explicitness and effectiveness. Modern Language Journal 90(4). 536–556.
Long, Michael H. 2000. Focus on form in task-based language teaching. In Richard D. Lambert
& Elana Shohamy (eds.), Language Policy and Pedagogy: Essays in Honor of A. Ronald
Walton, 179–192. Amsterdam: John Benjamins Publishing Company.
Lord, Gillian. 2005. (How) can we teach foreign language pronunciation? On the effects of a
Spanish phonetics course. Hispania 88(3). 557. doi:10.2307/20063159 (accessed 16 June
2021).
Lyster, Roy & Leila Ranta. 1997. Corrective feedback and learner uptake: Negotiation of form in
communicative classrooms. Studies in Second Language Acquisition 19(1). 37–66.
Lyster, Roy, Kazuya Saito & Masatoshi Sato. 2013. Oral corrective feedback in second
language classrooms. Language Teaching 46(1). 1–40.
McCrocklin, Shannon. 2014. The potential of Automatic Speech Recognition for fostering
pronunciation learners’ autonomy. Ames: Iowa State University dissertation.
McCrocklin, Shannon. 2016. Pronunciation learner autonomy: The potential of Automatic
Speech Recognition. System 57. 25–42.
Morin, Regina. 2007. A neglected aspect of the standards: Preparing foreign language
Spanish teachers to teach pronunciation. Foreign Language Annals 40(2). 342–360.
Morton, Hazel & Mervyn Jack. 2010. Speech interactive computer-assisted language learning:
a cross-cultural evaluation. Computer Assisted Language Learning 23(4). 295–319.
Mroz, Aurore. 2018. Seeing how people hear you: French learners experiencing intelligibility
through automatic speech recognition. Foreign Language Annals 51(3). 617–637.
Mroz, Aurore. 2020. Aiming for advanced intelligibility and proficiency using mobile ASR.
Journal of Second Language Pronunciation 6(1). 12–38.
Mushangwe, Herbert. 2015. Using voice recognition software in learning of Chinese as a
foreign language pronunciation. The Journal of Language Teaching and Learning 5(1).
52–67.
Neri, Ambra, Catia Cucchiarini & Helmer Strik. 2002. Feedback in Computer Assisted
Pronunciation Training: when technology meets pedagogy. Proceedings of the 10th
International CALL Conference, 179–188. Antwerp: University of Antwerp.
Speech technologies and pronunciation training 311
Neri, Ambra, Ornella Mich, Matteo Gerosa & Diego Giuliani. 2008. The effectiveness of
computer assisted pronunciation training for foreign language learning by children.
Computer Assisted Language Learning 21(5). 393–408.
Nicholas, Howard, Patsy M. Lightbown & Nina Spada. 2001. Recasts as feedback to language
learners. Language Learning 51(4). 719–758.
Penning de Vries, Bart, Catia Cucchiarini, Stephen Bodnar, Helmer Strik & Roeland van Hout.
2014. Spoken grammar practice and feedback in an ASR-based CALL system. Computer
Assisted Language Learning 28(6). 550–576.
Ranta, Leila & Roy Lyster. 2007. A cognitive approach to improving immersion students’ oral
language abilities: The awareness-practice-feedback sequence. In Robert DeKeyser (ed.),
Practice in a Second Language: Perspectives from Applied Linguistics and Cognitive
Psychology, 141–160. New York: Cambridge University Press.
Ranta, Leila & Roy Lyster. 2018. Form-focused instruction. In Peter Garrett & Josep M. Cots
(eds.), The Routledge Handbook of Language Awareness, 40–56. New York: Routledge.
Saito, Kazuya. 2012. Effects of instruction on L2 pronunciation development: A synthesis of 15
quasi-experimental intervention studies. TESOL Quarterly 46(4). 842–854.
Saito, Kazuya. 2021. Effects of corrective feedback on second language pronunciation
development. In H. Nassaji & E. Kartchava (eds.), The Cambridge Handbook of Corrective
Feedback in Second Language Learning and Teaching, 407–428. Cambridge: Cambridge
University Press.
Saito, Kazuya & Roy Lyster. 2012a. Effects of form-focused instruction and corrective
feedback on L2 pronunciation development of /r/ by Japanese learners of English.
Language Learning 62(2). 595–633.
Saito, Kazuya & Roy Lyster. 2012b. Investigating the pedagogical potential of recasts for L2
vowel acquisition. TESOL Quarterly 46(2). 387–398.
Saito, Kazuya & Luke Plonsky. 2019. Effects of second language pronunciation teaching revisited:
A proposed measurement framework and meta-analysis. Language Learning 69(2).
652–708.
Schmidt, Richard. 1994. Deconstructing consciousness in search of useful definitions for
Applied Linguistics. Consciousness in second language learning 11. 237–326.
Schmidt, Richard. 1995. Attention and Awareness in Foreign Language Learning. Honolulu:
University of Hawaii at Manoa.
Seferoglu, Gölge. 2005. Improving students’ pronunciation through accent reduction software.
British Journal of Educational Technology 36(2). 303–316.
Solon, Megan. 2016. Do Learners Lighten Up? Phonetic and Allophonic Acquisition of
Spanish /l/ by English-Speaking Learners. Studies in Second Language Acquisition 39(4).
1–32.
Strik, Helmer, Jozef Colpaert, Joost van Doremalen & Catia Cucchiarini. 2012. The DISCO ASR-
based CALL system: Practicing L2 oral skills and beyond. Proceedings of the Conference
on International Language Resources and Evaluation (LREC 2012). 2702–2707.
Strik, Helmer, Khiet Phuong Truong, Febe de Wet & Catia Cucchiarini. 2009. Comparing
different approaches for automatic pronunciation error detection. Speech Communication
51(10). 845–852.
Thomson, Ron I. 2011. Computer Assisted Pronunciation Training: Targeting second language
vowel perception improves pronunciation. CALICO Journal 28(3). 744–765.
Thomson, Ron I. & Tracey M. Derwing. 2014. The effectiveness of L2 pronunciation instruction:
A narrative review. Applied Linguistics 36(3). 326–344.
Trofimovich, Pavel & Elizabeth Gatbonton. 2006. Repetition and Focus on Form in processing
L2 Spanish words: Implications for pronunciation instruction. The Modern Language
Journal 90(4). 519–535.
Tsai, Pi-hua. 2019. Beyond self-directed computer-assisted pronunciation learning: A
qualitative investigation of a collaborative approach. Computer Assisted Language
Learning 32(7). 713–744.
Tsutsui, Michio. 2004. Multimedia as a means to enhance feedback. Computer Assisted
Language Learning 17(3–4). 377–402. doi:10.1080/0958822042000319638 (accessed
16 June 2021).
Wang, Yi Hsuan & Shelley C. Young. 2014. A study of the design and implementation of the
ASR-based iCASL System with corrective feedback to facilitate English learning.
Educational Technology & Society 17(2). 219–233.
Wang, Yi Hsuan & Shelley C. Young. 2015. Effectiveness of feedback for enhancing English
pronunciation in an ASR‐based CALL system. Journal of Computer Assisted Learning 31(6).
493–504.
Wiggins, Grant. 2012. Seven keys for effective feedback. Feedback for Learning 70(1). 10–16.
Yule, George, Maggie Powers & Doris Macdonald. 1992. The variable effects of some task-
based learning procedures on L2 communicative effectiveness. Language Learning 42(2).
249–277.
Part IV: Pronunciation in the laboratory: High
variability phonetic training
Ellen Simon, Bastien De Clercq, Pauline Degrave,
Quentin Decourcelle
On the robustness of high variability
phonetic training effects: A study on the
perception of non-native Dutch contrasts
by French-speaking learners
Abstract: There is growing evidence in the literature for the positive effect of
high variability phonetic training (HVPT) on the perception of non-native con-
trasts. In the present study, we aim to examine the robustness of perceptual
training effects. We define robustness along three dimensions: (1) the generaliz-
ability of the training to novel tokens and talkers, (2) the long-term retention
effects of the training, and (3) the effect of training in non-optimal listening
conditions, i.e., with noise added to the signal.
The participants are 48 adult L1 French learners of Dutch in Belgium, 27 of
whom are enrolled in secondary education, while the others are university stu-
dents (N=21). Participants are assigned to an experimental (N=27) or a control
group (N=21). Both groups take a pre-test, post-test and delayed post-test, each of which
consists of a lexical identification task with and without noise. The experimental
group is trained on five Dutch sound contrasts in five multimodal HVPT ses-
sions, consisting of perceptual identification tasks with feedback and metalin-
guistic information.
The results show a nuanced picture: overall, where training effects are
found, learners are able to generalize these to novel tokens and talkers, thus
confirming the effectiveness of HVPT for pronunciation training. However, the
results also reveal considerable variability in the effectiveness of HVPT along
most robustness variables, which can to a large extent be attributed to the moderating
variables we examined, namely the type of learner (secondary education
vs. university) and the type of sound contrast.
Acknowledgements: We wish to thank Hubert Naets from the Research Centre CENTAL (UCLou-
vain) for his invaluable help with the development of the online environment for the experiment.
https://doi.org/10.1515/9783110736120-012
1 Introduction
It is generally acknowledged that listening to a non-native language is difficult,
especially with respect to the perception of non-native contrasts which do not
occur in the native language (see for instance Williams and Escudero 2014 for an
overview). Driven by the large number of language learners who report having
difficulty with non-native speech perception, a productive research line has
emerged which addresses the impact of phonetic training on L2 perception.
These studies generally report positive effects of phonetic training, showing that
phonetic training can help improve learners’ non-native perception (see Lee,
Jang, and Plonsky 2015 and Sakai and Moorman 2018, for overviews of experi-
mental studies on this topic). A well-known training approach that has been re-
ported to be effective is called High Variability Phonetic Training (HVPT, see
Thomson 2018 for a detailed overview of training studies within this framework).
In this approach, first developed by Logan, Lively, and Pisoni (1991), learners are
exposed to auditory stimuli from multiple speakers and in multiple contexts
(e.g., target vowels flanked by different consonants). The idea is that the variability
in the realization of a particular phoneme, due to differences in, for instance,
vocal tract size, dialect and speaking rate, will help the learner to build a more
robust phonological category. This will in turn enhance perception and word rec-
ognition across different contexts (Logan, Lively, and Pisoni 1991: 4–5).
HVPT was originally purely auditory-based, exposing learners to a large num-
ber of stimuli containing the target L2 sounds. However, it can be combined with
providing learners with explicit metalinguistic information (e.g., a comparison
with the native language) as well as with articulatory-based instruction. Articula-
tory-based instruction focuses on the position of the articulators during the pro-
duction of L2 vowels and consonants and compares it to the articulatory setting
during corresponding L1 sounds (Saito and Plonsky 2019). As Saito and Plonsky
(2019: 662) note, this approach makes use of visual materials, such as diagrams
and animations. The idea behind it is that learners base their sound representa-
tions on the articulatory gestures made for the production of the speech sounds.
In Best’s Perceptual Assimilation Model (PAM), listeners are hypothesized to “ex-
tract invariants about articulatory gestures from the speech signal, rather than
forming categories from acoustic-phonetic cues” (Best and Tyler 2007: 24). If
learners thus receive information on the position of tongue, lips and velum, this
will help them to build representations for L2 sounds. Hazan et al. (2005) report
on an HVPT study, in which they compared the effectiveness of auditory training
with that of audio-visual perceptual training on perception and production. L1
Japanese speakers were trained on the L2 English contrasts /v/-/b/-/p/ and /l/-/r/.
The results revealed that learners’ perception improved in both conditions, but
Most studies using HVPT include novel tokens and tokens produced by novel
talkers in the posttest or set up a separate posttest called the ‘generalization test’.
An example of a study that successfully applied HVPT and reported generaliza-
tion to novel tokens and talkers is the study by Bradlow et al. (1997) on a percep-
tual training programme for L1 Japanese learners of L2 English. The training
focused only on the contrast between /r/ and /l/ and consisted of 45 sessions
over a period of 3–4 weeks. The results showed substantial gains in identifica-
tion, from 65% in the pretest to 81% in the posttest and a similarly high percent-
age in the generalization tests with novel words and a novel speaker.
A subset of training studies using HVPT also look at the long-term effects of
the training by including a delayed posttest or retention test several weeks or
months after the end of the training. An early study by Pisoni, Lively, and
Logan (1994) tested the long-term effects of training Japanese listeners on the
identification of English /r/ and /l/ using Logan, Lively and Pisoni’s (1991)
HVPT. They found that accuracy decreased only by 2% from the posttest at the
end of the training to a posttest three months later and no significant decrease
in accuracy was observed for the tests of generalization. Six months after
training, participants still obtained higher scores than at pretest level. Similarly,
Wang and Munro (2004) trained native speakers of Mandarin and Cantonese
on three English vowel contrasts in a programme consisting of 2–3 training
sessions of 50–60 minutes per week over a period of two months. The pro-
gramme had a positive effect on learners’ performance on an identification task
in a posttest as well as in a retention test three months after the programme
had ended. Nishi and Kewley-Port (2007) also found a long-term retention effect
of perceptual training in native Japanese listeners trained on American English
vowels. Both listeners trained on nine vowels and listeners trained on a subset
of three difficult vowels showed improved perception of novel tokens and novel
talkers in a generalization task and in a delayed posttest after three months. In
a study by Rato (2014), native speakers of European Portuguese were similarly
trained on six difficult English vowels in three HVPT training sessions. Partici-
pants who were trained on the vowel contrasts performed significantly better in
a posttest, including a generalization test with novel tokens and talkers (for
two of the three contrasts), and in a delayed posttest after two months. It
should be noted that not all studies report large gains. Aliaga-García and Mora
(2009), for instance, found only a small effect on L2 perception of an HVPT programme
targeting two consonant and two vowel contrasts in English which are
known to be problematic for native speakers of Catalan.
In sum, the results suggest that the HVPT paradigm can lead to changes in
listeners’ perception which are long-lasting (or at least still observable after six
months, as in Pisoni, Lively, and Logan’s 1994 study). A possible explanation
for the positive effects of training may be that the training triggers listeners to
subconsciously pay attention to the relevant acoustic cues which were under-
used before training took place, and that this newly acquired sensitivity to rele-
vant cues may be permanent.
It is well known that non-native perception is seriously challenged when the lis-
tening conditions are not optimal, as in the case of background noise or distor-
tions of the signal through, for instance, a bad telephone connection. Indeed, as
Cutler et al. (2004: 3668) point out: “As non-native listeners, we are all too famil-
iar with the phenomenon that listening to non-native language seems dispropor-
tionately difficult under disadvantageous listening conditions, such as against a
noisy background.” Research confirms that non-native listeners have difficulty
with speech recognition when noise has been added to the signal (see the
special issue edited by Garcia-Lecumberri, Cooke, and Cutler (2010) on this topic).
Mattys et al. (2012) make a distinction between speech degradation with and
without energetic masking: the former occurs when there is physical overlap be-
tween the target signal and a nontarget signal, such as background noise. The
target signal itself, however, is intact. The latter occurs when speech is filtered,
for instance in the case of telephone transmission, when the lower frequencies
are not transmitted. A number of studies report on perception experiments which
have tried to mimic a context of listening in a noisy environment by adding noise
to the stimuli, thus creating speech degradation with energetic masking. An ex-
ample is the study by Lengeris and Nicolaidis (2015): they set up a programme to
The notion of second language difficulty further accounts for the many ways in
which HVPT can affect the three core dimensions of the language acquisition
process, i.e. its route, rate and final level of attainment (Ellis 2015). Language
learning difficulty itself has been conceptualized as a multifaceted notion, as a
series of moderating variables influencing the effectiveness of any instructional
treatment. In their taxonomy of L2 difficulty, Housen and Simoens (2016) distin-
guish between learner-related, feature-related, and context-related difficulty.
Learner-related difficulty is described as the “encounter of language features
with the language learner’s individual capacities and abilities” (Housen and
Simoens 2016: 167). From the perspective of phonological acquisition, such diffi-
culty may for example arise as a function of a learner’s phonological awareness
(Anthony and Francis 2005), but also as a function of individual differences,
such as age of learning, motivation and learning styles or strategies (Archibald
2021; Dörnyei 2009; Moyer 1999). These factors are well known in the second lan-
guage literature, but have more rarely been examined in relation to the impact of
HVPT (see also Thomson 2018).
Feature-related difficulty, then, can refer to the inherent cognitive require-
ments posed by a language feature independent of the above-mentioned learner-
related features (Housen and Simoens 2016) or, in phonological terms, can also
be understood to refer to markedness or frequency in the input (Archibald 2021),
or to phonological features which pose inherent articulatory difficulties and may
for example arise late in L1 acquisition. Finally, context-related difficulty stems
from learning conditions (e.g., instructed vs. naturalistic) (Housen and Simoens
2016) or the nature of phonological instruction (e.g., implicit vs. explicit, see Pel-
tekov 2020).
Crucially, language learning difficulty can also arise at the interface of
these three sources of difficulty. Such is the case when HVPT manipulates
input frequency through repeated exposure in an instructional setting. From a
contrastive perspective, the interface between language and learner-related fac-
tors can also be understood in terms of the relation between the learner’s L1
and L2 phonology, as formalized by Flege’s Speech Learning Model (Flege 1995;
Flege and Bohn 2021) and Best’s Perceptual Assimilation Model (Best 1994; Best
and Tyler 2007). These models start from the idea that listeners interpret non-
native sounds in terms of the phonetic categories of their native language. The
English contrast between /i/ and /ɪ/, for instance, is hard for native speakers
of (Brazilian and European) Portuguese to perceive, as they categorize both in
terms of their native category /i/, and the recognition of words differing in
these sounds, such as ‘sit’ versus ‘seat’, is often problematic (Lima Jr. this volume;
Rato 2014). Similarly, native speakers of Dutch tend to have difficulty perceiving
the contrast between English /ɛ/ and /æ/, since Dutch has only one
vowel in that area of the acoustic vowel space, which is transcribed as /ɛ/, but
has various phonetic realizations depending on the regional accent (Escudero,
Simon, and Mitterer 2008). Given the different correspondences between native
and target language phonemes and the range of (spectral and durational)
acoustic cues which may signal contrasts, it is plausible that HVPT is
more effective for some target features than for others.
On the basis of the literature reviewed above, we can conclude that, overall, pre-
vious studies examining the effect of high variability phonetic training on the
perception of difficult L2 contrasts suggest that training leads to gains in the per-
ception of target contrasts and that the learning that has taken place can be gen-
eralized to novel contexts and novel talkers. In addition, there is evidence that
these gains may last until well after the end of the training session. There is also
some evidence that phonetic training in optimal listening conditions may en-
hance the perception of L2 speech with background noise, though more research
on this context is needed. However, previous training studies have relied on con-
siderably different methodologies, using treatments of different (or unreported)
length or frequency and differing in the type of instruction and feedback that is
provided. Studies also often focus on one particular issue, such as the type of
training, the long-term retention of the training effect or the addition of noise to
the stimuli. Moreover, a large number of studies include training on one L2 con-
trast only (see e.g., the studies on English /r/-/l/ discussed above).
In the present study, we aim to examine the robustness of HVPT by including
a number of variables that may influence the effects of training in one and the
same study design. Specifically, we examine (1) the generalizability of the per-
ceptual training to novel tokens and novel talkers, (2) the long-term retention
effects of the training, and (3) the effect of training in quiet on perception in a
noisy environment. All of these factors can provide us with information on the
robustness of the training effects: specifically, training effects are more robust
if they can be generalized to novel tokens and novel talkers; they are more ro-
bust if they have long-term effects; and finally, they are more robust if they ex-
tend to the perception of L2 sounds in adverse listening conditions, such as a
noisy background. By examining the effect of novel tokens and novel talkers,
the effect of leaving time between the training and a perception test and the
effect of the training on stimuli in quiet and with noise added to the signal, we
can contribute to the existing body of research on phonetic training by showing
the robustness of HVPT effects in the context of our study.
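The three robustness dimensions can be made concrete with a small worked example. The sketch below is illustrative only: the percent-correct figures are invented placeholders, not results from this study, and the condition labels and the function name are assumptions introduced for the illustration.

```python
# Illustrative sketch: the percent-correct scores are hypothetical placeholders,
# NOT data from the study. Condition labels are assumptions for the example.
hypothetical_scores = {
    "trained_items_quiet": {"pre": 60, "post": 78, "delayed": 74},
    "novel_items_talkers_quiet": {"pre": 58, "post": 72, "delayed": 70},
    "novel_items_talkers_noise": {"pre": 45, "post": 55, "delayed": 52},
}


def robustness_summary(scores):
    """Return immediate and retained gains (in percentage points) per condition."""
    summary = {}
    for condition, s in scores.items():
        summary[condition] = {
            # Gain directly after training: post-test minus pre-test.
            "immediate_gain": s["post"] - s["pre"],
            # Gain still present at the delayed post-test.
            "retained_gain": s["delayed"] - s["pre"],
        }
    return summary


for condition, gains in robustness_summary(hypothetical_scores).items():
    print(condition, gains)
```

On this reading, a training effect would count as more robust the more of these gains remain positive for novel tokens and talkers, at the delayed test, and in noise. Any real analysis would additionally compare the gains with those of a control group.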
In addition, we explore the moderating role of learner- and target-related fac-
tors on the robustness of the training effects. The effect of learner profile is ex-
plored by comparing training effects in two groups of French-speaking learners
of Dutch. In contrast to the bulk of training studies which focus on L2 English
(Sakai and Moorman 2018), the current study is set in Belgium and focuses on L2
Dutch vowel and consonant contrasts which do not occur in the learners’ native
language (see Section 3.2 on the Method). French is the only official
language in the French-speaking part of Belgium, the Walloon region, whereas
in Flanders the official language is Dutch (Hamers and Blanc 2000). Most Wal-
loons do not hear or speak Dutch on a daily basis and the same holds true for the
Flemish, who generally do not use French in everyday life. The exception is Brus-
sels, which is officially bilingual and where there is individual bilingualism in
part of the population. As such, Dutch is taught as a foreign language in second-
ary and tertiary education in Wallonia. All schools in Wallonia have to offer at
least one foreign language from 5th grade onwards (pupils aged 10–11). They
have the choice between English, Dutch and German, and can also offer Spanish.
Most pupils in the first year of secondary school take English as their foreign lan-
guage, followed by Dutch and German (Mettewie 2021). In tertiary education,
Dutch language and literature programmes are offered to students majoring in
linguistics, applied linguistics and literature, as well as, in some universities, to
students majoring in other programmes (e.g. law or economics) but with a minor
in Dutch. In the current study, we explore the potential difference in training
effects between two groups of Dutch language learners with different profiles: a
group of secondary school pupils for whom Dutch is a compulsory subject and a
group of university students enrolled in a Dutch language programme.
The effect of target feature is examined by including five contrasts in the
training. We have selected five Dutch contrasts which have been reported to be
difficult for native speakers of French (see Section 3.2 for details). As a result,
the analysis will allow us to compare the robustness of the training across tar-
get features.
In the next section (Section 2), we formulate the research questions and hy-
potheses, followed by information on the methodology (Section 3). The results
are presented and discussed in Sections 4 and 5, respectively.
2 Research questions and hypotheses
RQ1: How robust are HVPT effects on the perception of Dutch contrasts for
French-speaking learners of Dutch in Belgium?
RQ1a: Do HVPT training effects extend to novel tokens and novel talkers?
RQ1b: Is there long-term retention of HVPT training on perceptual identifi-
cation, i.e., are benefits observable in a delayed posttest?
RQ1c: Do training effects of HVPT in quiet extend to phoneme identification
in adverse listening conditions?
RQ2: To what extent is the robustness of HVPT moderated by the type of learner?
RQ3: To what extent is the robustness of HVPT moderated by the type of lan-
guage feature?
On the basis of the literature reviewed in Section 1, we hypothesize that the re-
sults will lead to positive responses to RQ1a and RQ1b. In general, previous re-
search examining the effects of HVPT reports that learners benefit from the
training and that they can generalize the acquired knowledge of or sensitivity
to the use of relevant cues to novel tokens and novel talkers (e.g., Bradlow
et al. 1997; Nishi and Kewley-Port 2007; Pisoni, Lively, and Logan 1994). Since
the HVPT framework explicitly uses multiple talkers and multiple contexts in
order for learners to develop stable phonetic categories for L2 sounds, the gen-
eralizability of the training to novel contexts and talkers is expected within the
framework. Long-term retention of training was observed in earlier studies (e.g.,
Nishi and Kewley-Port 2007; Wang and Munro 2004) and we expect to find it in
the current study as well, though we also note that long-term effects may de-
pend on the length and duration of the training sessions.
We also hypothesize a positive response to RQ1c, although research on the
effects of training in quiet on perception in noise is limited (but see Lengeris
and Nicolaidis 2015, discussed in Section 1). We predict that performance on
perceptual identification of stimuli in noise will be lower than of stimuli in
quiet, but that training will have an effect on both groups of stimuli.
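Presenting stimuli in noise amounts to energetic masking at a chosen signal-to-noise ratio (SNR). A minimal sketch of that mixing step follows, assuming white Gaussian noise, an arbitrary SNR value and a synthetic sine tone standing in for a speech stimulus. None of these choices is taken from the study, and the function names are hypothetical.

```python
import math
import random


def mean_power(samples):
    """Mean squared amplitude of a sequence of samples."""
    return sum(v * v for v in samples) / len(samples)


def mix_at_snr(signal, snr_db, seed=0):
    """Add white Gaussian noise to `signal` at the requested SNR (in dB).

    Returns (mixed, scaled_noise). The noise type is an assumption made
    for this sketch, not the study's procedure.
    """
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    # Choose the noise power so that 10 * log10(P_signal / P_noise) == snr_db.
    target_noise_power = mean_power(signal) / (10 ** (snr_db / 10))
    scale = math.sqrt(target_noise_power / mean_power(noise))
    scaled_noise = [scale * n for n in noise]
    mixed = [s + n for s, n in zip(signal, scaled_noise)]
    return mixed, scaled_noise


def measured_snr_db(signal, noise):
    """SNR in dB computed from the clean signal and the added noise."""
    return 10 * math.log10(mean_power(signal) / mean_power(noise))


# Stand-in stimulus: one second of a 440 Hz tone sampled at 16 kHz.
tone = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
mixed, noise = mix_at_snr(tone, snr_db=5.0)
print(round(measured_snr_db(tone, noise), 3))
```

Because the clean signal is left untouched and the noise is simply overlaid, this corresponds to degradation with energetic masking in the sense of Mattys et al. (2012), as opposed to filtering the signal itself.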
The extent to which we can provide positive responses to RQ1a-c will pro-
vide us with insight into the overall robustness of the training, thereby answer-
ing the main research question (RQ1).
With respect to RQ2, we explore the question of which type of learner benefits
most from phonetic training using two educational profiles: younger secondary
school pupils with lower intramural exposure to Dutch and an older group of stu-
dents enrolled in a Dutch programme at university. As factors such as proficiency,
age and language exposure have been underresearched as independent variables
in HVPT studies (see Thomson 2018), these two profiles represent an explorative
dimension of the study. A study by Alshangiti and Evans (2014) compared the ef-
fect of HVPT in Arabic learners of English with a higher or lower proficiency level
in English and found mixed results. They observed that high proficiency learners
benefited more from training than low proficiency learners on the perception of
speech in noise (measured through a verbal repetition task of stimuli presented in
noise), but that low proficiency learners showed greater improvement in vowel
identification (measured through a closed-set identification task). The authors hy-
pothesize that the greater improvement in vowel identification in the low profi-
ciency learners may be because these learners had more room to improve than the
high proficiency learners. On the basis of these findings, it is difficult to formulate
hypotheses about the effects of proficiency level. Unlike Alshangiti and Evans’
(2014) study, our study does not include training in noise (only pre- and post-test
items were presented in noise, see 3.2.3) and we may hence hypothesize that the
lower proficiency learners in our study would benefit more from training than the
high proficiency learners. Alternatively, the fact that the university students have
chosen Dutch as one of their (minor or major) subjects may reflect a more positive
attitude towards Dutch compared to the secondary school pupils for whom Dutch
is a compulsory subject. As a result, we may also expect larger gains in L2 Dutch
perception for the university students compared to the secondary school pupils.
Since motivation or attitude was not measured and controlled for in our study,
this prediction will necessarily remain speculative.
Finally, in response to RQ3, we predict that the perceptual training may be
beneficial for the perceptual identification of all five contrasts. However, we
also predict that we will observe some differences in the level of difficulty of
the five contrasts in the pretest and that these differences may affect the magni-
tude of the training effects.
3 Methodology
3.1 Participants
The study was conducted in Wallonia, i.e., the French-speaking part of Bel-
gium, with a sample of 48 participants: 27 were pupils in a secondary school
and 21 were students at a university.
The secondary school pupils were attending general education at the time of testing.
They were enrolled in the fourth, fifth or sixth year of secondary education. They
typically have four hours of Dutch classes a week and are able to understand
Dutch at an A2 level of the Common European Framework of Reference for Languages
(CEFR). The university students were recruited from a class on Dutch language
and grammar given at UCLouvain, a French-speaking university in the
French-speaking part of Belgium. They were 1st or 2nd year students enrolled in
an introductory Dutch proficiency course with a B1 entry requirement, either as
part of a programme of Linguistics and Literature, or as part of an optional Dutch
module in a Law major. They had a minimum of nine hours of Dutch-spoken
classes per week, including courses on Dutch language proficiency, grammar
and literature.
To select the participants, an initial sample of secondary pupils (N=31) and
university students (N=33) completed a background questionnaire enquiring into
personal information (age, gender, study orientation, place of birth and of resi-
dence, nationality, hearing problems) and their language background (mother
tongue(s), exposure to Dutch, knowledge of other languages). They also com-
pleted the listening comprehension component of Dialang (Lancaster University
n.d.). From this sample, we selected only French-speaking participants who did
not have Dutch as one of their mother tongues and who reported no hearing
problems. The selected participants were divided into two groups of similar size,
i.e., a training group and a control group, each with the same proportion of university
and secondary school participants. Since some students dropped out of university,
changed schools or study programmes or did not take part in some parts
of the experiment due to internet connectivity issues, the final sample consists of
27 respondents in the training group (15 secondary pupils and 12 university stu-
dents) and 21 participants in the control group (12 secondary pupils and 9 univer-
sity students). The respondents were not paid for their participation. Table 1
gives an overview of some general characteristics of the different groups.
The Dutch learning experience of the samples of secondary pupils and of uni-
versity students can be contrasted at several levels.
First, the secondary pupils show a lower mean Dutch level than the univer-
sity students. The results obtained for the listening comprehension part of the
326 Ellen Simon et al.
Dialang test (Lancaster University n.d.) situate the secondary pupils between A1
and B1 (C1 for one pupil) and the university students between B2 and C2. Mean
self-reported proficiency levels are also lower for the secondary pupils (control =
2.25, s.d. = 0.83; training = 2.06, s.d. = 1.05) than for the university students (con-
trol = 2.62, s.d. = 0.92; training = 3.08, s.d. = 0.99). The self-reported proficiency
was measured on a five-point Likert scale, with one standing for ‘very low’ and
five for ‘very high’.
Secondly, their exposure to Dutch is different: the university students take
a weekly minimum of nine hours of Dutch classes with a B1 entry requirement,
whereas the secondary pupils are taught four hours of Dutch per week with an
A2 entry requirement. Concerning their contact with Dutch outside school or
university, the selected pupils did not frequently engage in extracurricular L2
Dutch activities, such as watching Dutch-spoken television or media. For in-
stance, only one pupil watched Dutch-spoken television at least once a week. A
vast majority of the pupils reported engaging in these activities less than once
a month. In contrast, the majority of the university respondents watch Dutch
films or television more often (at least once a week).
Thirdly, even though data on the participants’ motivation and attitudes
were not gathered, the university students, who have chosen to study Dutch as
a major or minor at university, are more likely to have a higher motivation and a
more positive attitude towards learning Dutch than secondary pupils, whose motivation
is often low in French-speaking schools in Wallonia (Mettewie 2015).
3.2 Materials
Five target contrasts were selected, including five vowels and three consonants:
(1) /i/ vs /ɪ/, (2) /ɑ/ vs /aː/, (3) /ə/ vs ø, (4) /x/ vs /k/, (5) /h/ vs ø. The contrasts
were selected on the basis of a handbook for French learners of Dutch which
discusses the most problematic Dutch sounds for native speakers of French
(Hiligsmann and Rasier 2007).
The contrasts /i/-/ɪ/, /ɑ/-/aː/ and /x/-/k/ are difficult to perceive for native
speakers of French, as French has only one of the two members of the contrast,
respectively /i/, /a/ and /k/. Typically, French learners of Dutch produce both
members of the Dutch contrast as the closest French sound, thereby failing to
distinguish between minimal pairs such as zit /ɪ/ -ziet /i/ (‘sits’-‘sees’), man /ɑ/
-maan /aː/ (‘man’-‘moon’) and lag /x/ -lak /k/ (‘laugh’-‘varnish’). Dutch /h/
does not have a direct counterpart in French and, in contrast to the previously
mentioned contrasts, is often not realised by French learners of Dutch (or real-
ised as a glottal stop). These learners would then fail to make a contrast be-
tween the members of minimal pairs such as hals /h/ -als ø (‘neck’-‘if’). Finally,
while the central vowel /ə/ exists in French, it is generally silent in word-final
position (with the exception of monosyllabic function words such as je, le or
ne). In Dutch, by contrast, the presence of the vowel plays an important gram-
matical role in the formation of the past tense (hij werkt-hij werkte; ‘he works’,
‘he worked’) and the declension of adjectives (een oud huis-het oude huis; ‘an
old house’-‘the old house’).
For each contrast, eight monosyllabic Dutch minimal pairs were selected
for the pretest and training.¹ Half of this list was also used for the posttests,
alongside four additionally recorded monosyllabic minimal pairs for each con-
trast (see below).
Stimuli were recorded by six native speakers of Dutch (three female and
three male), two of whom were used for the posttests only (see below). All were
working in the Dutch section of an Applied Linguistics department at Ghent
University, a Dutch-speaking university in Flanders. They all used Dutch on a
daily basis in a professional setting, including in teaching. They were raised
monolingually in childhood, though later in life they had all learnt additional
languages, including English, French and German. One speaker used both
Dutch and French at home at the time of the recording. They grew up in East-
Flanders (N=3), West-Flanders (N=2) or Flemish Brabant (N=1), but all spoke
Standard Dutch without a detectable regional accent. Their ages ranged from
30 to 62 (M=41). The speakers were instructed to read the list of stimuli in the
carrier phrase Ik heb X gezegd: ‘I have said X’. They were asked to read at a
comfortable pace using a normal, falling intonation pattern, to repeat the sen-
tence whenever they hesitated and to take as many breaks as they needed. The
recordings took about 30 minutes per person. They were made in a sound-
attenuated booth with the audio software Reaper, using a Renkforce CU-4
microphone (4 speakers) or in a quiet room with a Marantz solid state re-
corder PMD620 and a Sony ECM-MS907 microphone (2 speakers).
¹ Due to the nature of the contrast, the /ə/ vs ø pairs always involved one monosyllabic
stimulus and one disyllabic stimulus (e.g. hij maakt /maːkt/ - hij maakte /maːktə/; ‘he makes’-‘he
made’).
3.2.2 Training
Five training sessions were developed around the five target contrasts. The first
four training sessions were divided into two parts focusing on different contrasts
and with an optional break between the two parts. As shown in Table 2, sessions
3B, 4 and 5 include repetitions of previously presented contrasts, although with
different training materials (text, stimuli and visuals) to present the same infor-
mation. In the fifth and final session all contrasts were briefly repeated, so that
by the end of the training each contrast had been included in three training ses-
sions. Table 2 presents the target contrasts in each of the training sessions.
Table 2: Target contrasts in training sessions 1 to 3.
Session 1: A. /ɪ/-/i/; B. /ɑ/-/aː/
Session 2: A. /x/-/k/; B. /h/-ø
Session 3: A. /ə/-ø; B. Repetition vowels: /ɪ/-/i/, /ɑ/-/aː/
5. Audiovisual input: Two videos showing the lower part of the face of a
speaker, illustrating mouth/lip movements during articulation.
6. Articulatory information: Information on the articulatory setting, explained
in words and accompanied by auditory example stimuli, waveforms (e.g., to
illustrate length differences) or cross-sections illustrating tongue position.
7. HVPT: Forced-choice identification task with feedback.
In addition, for two contrasts, /x/-/k/ and /h/- ø, information on spelling was
provided. This was done for /x/-/k/, because in Dutch both consonants can be
represented by multiple graphemes, namely <g> and <ch> for /x/ and <c> and <k>
for /k/. For /h/, it was mentioned that in Dutch, /h/ is always produced when it
is present in spelling (as <h>), which contrasts with silent <h> in French. For
the other contrasts, Dutch spelling transparently corresponds to each pho-
neme’s pronunciation in the target stimuli, so additional spelling information
was not deemed necessary.
Each session or session part ended with a forced-choice identification task
with feedback. Participants were told that they would hear different native
speakers pronounce Dutch words, corresponding to minimal pairs. After a
sound file was played, participants were instructed to click on the correspond-
ing word from a minimal pair, with a green check or a red cross appearing
after, respectively, a correct and an incorrect response. Upon selecting an incor-
rect response, the stimulus was played again and participants had to select the
correct answer in order to proceed to the following stimulus (Figure 1). Select-
ing the correct button at that point does not necessarily imply that participants
had perceived the contrast, but participants did receive visual feedback and ad-
ditional audio-exposure to the stimulus. The stimuli in the training sessions,
including the forced-choice identification tasks, were all produced by the same
four native speakers of Dutch.
Figure 1: Illustration of forced-choice identification task (‘Click on the word that you heard’),
with the response options knap and knaap.
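The trial logic of the identification task with feedback can be sketched as follows. This is a minimal illustration and not the software used in the study: the names `run_trial` and `respond` and the keys of the returned dictionary are hypothetical, and the sketch assumes that the stimulus is replayed after every incorrect response.

```python
from typing import Callable, Dict, Tuple


def run_trial(pair: Tuple[str, str], target: str,
              respond: Callable[[Tuple[str, str], int], str]) -> Dict[str, object]:
    """Run one forced-choice identification trial with feedback.

    `respond(pair, attempt)` stands in for the participant's click and must
    return one of the two words in `pair`.
    """
    if target not in pair:
        raise ValueError("target must be one member of the minimal pair")
    plays = 1                      # the stimulus is played once at the start
    attempt = 1
    answer = respond(pair, attempt)
    first_correct = answer == target
    feedback = "green check" if first_correct else "red cross"
    # Assumption: after each incorrect response the stimulus is replayed, and
    # the participant only proceeds once the correct word has been selected.
    while answer != target:
        plays += 1
        attempt += 1
        answer = respond(pair, attempt)
    return {"first_correct": first_correct, "feedback": feedback, "plays": plays}


# Hypothetical participant who first picks the wrong word, then corrects it.
def wrong_then_right(pair: Tuple[str, str], attempt: int) -> str:
    return "knap" if attempt == 1 else "knaap"


result = run_trial(("knap", "knaap"), "knaap", wrong_then_right)
print(result)
```

Only the first response is treated as informative about perception in this sketch; the forced second response adds exposure to the stimulus but says little about whether the contrast was heard.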
3.2.3 Pretest
At the time of the pretest, all participants completed an informed consent form
and an online background questionnaire, enquiring into their language learner
profile and their exposure to and knowledge of Dutch. As mentioned above, at
this stage participants also completed the listening comprehension component
of Dialang (Lancaster University n.d.).
Next, participants completed an online auditory forced-choice identification
task, which was designed to be similar to the HVPT component of the training.
As in the training, participants were instructed that they would hear different na-
tive speakers pronounce Dutch words, corresponding to minimal pairs. However,
during the pretest, participants did not receive feedback and immediately pro-
ceeded to the following stimulus upon selecting their answer.
The identification task consisted of two parts. In the first part, 160 stimuli
were presented in random order without noise, with a break after 80 stimuli. In
the second part, the same 160 stimuli were presented in random order with
noise, half with signal-to-noise ratio 0 (SNR 0) and half with SNR 8, with a
break after 80 stimuli. Noise was applied to the stimuli in Praat (Boersma and
Weenink 2019) using a script by McCloy (2013). In each part, each target feature
of the 5 target contrasts appeared 16 times and was produced by 4 native speak-
ers in a balanced design. Initial instructions were presented in French, but
switched to Dutch once the first part of the identification task started. The order
in which the minimal pairs were presented in the response categories was kept
constant. The stimuli used in the identification task were the same as those
used in the training, albeit with the addition of the two noise conditions.
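The two noise conditions can be illustrated by scaling a noise signal relative to the power of the speech signal. The sketch below is not the Praat script by McCloy (2013) used in the study. It assumes that SNR is expressed in dB and defined on mean power, and it uses a sine tone and white Gaussian noise purely as stand-ins for a stimulus and a noise source; all function names are hypothetical.

```python
import math
import random
from typing import List


def mean_power(samples: List[float]) -> float:
    """Mean squared amplitude of a signal."""
    return sum(s * s for s in samples) / len(samples)


def mix_at_snr(signal: List[float], noise: List[float], snr_db: float) -> List[float]:
    """Add `noise` to `signal` after scaling it to the requested SNR (in dB)."""
    if len(signal) != len(noise):
        raise ValueError("signal and noise must have the same length")
    # SNR_dB = 10 * log10(P_signal / P_noise), solved for the noise power.
    target_noise_power = mean_power(signal) / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / mean_power(noise))
    return [s + gain * n for s, n in zip(signal, noise)]


def measured_snr_db(signal: List[float], mixed: List[float]) -> float:
    """Recover the SNR of a mixture when the clean signal is known."""
    residual = [m - s for s, m in zip(signal, mixed)]
    return 10 * math.log10(mean_power(signal) / mean_power(residual))


rng = random.Random(0)
sample_rate = 16000
tone = [0.5 * math.sin(2 * math.pi * 220 * t / sample_rate) for t in range(sample_rate)]
noise = [rng.gauss(0.0, 1.0) for _ in range(sample_rate)]

for snr in (0, 8):   # the two noise conditions named in the text
    mixed = mix_at_snr(tone, noise, snr)
    print(snr, round(measured_snr_db(tone, mixed), 2))
```

At SNR 0 the noise carries the same mean power as the signal, which is why it is the more adverse of the two conditions.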
Each part was preceded by a short training phase, designed to accustom
the participants to the testing format using a contrast (/y/ vs /u/) which did not
feature in the test itself and which is phonemic in both French and Dutch.
3.3 Procedure
To facilitate data collection during the COVID-19 pandemic, all parts of the
study were implemented on a website which participants could access
from home. Participants completed each part individually on a smartphone,
tablet or computer, using headphones. During their Dutch class, participants
completed the consent form and filled in the background questionnaire. One
week later, the participants were asked to complete the listening component of
Dialang. Participants were subsequently selected for further participation in the
study and assigned to the control or training groups.
Next, participants started the experimental phase, consisting of the pretest,
the five trainings, the posttest and the delayed posttest. The secondary pupils
completed the pretest and the first training during their Dutch class. The remaining
training sessions and the posttests were completed online at home, because secondary
education had by then transitioned to remote teaching as a result of the COVID-19
pandemic. The transition also forced the second training to be postponed by one
week. The university students, who attended online
classes from the beginning of the academic year, completed the entire experi-
mental phase at home, after their Dutch grammar class. Teaching staff and re-
searchers were available online in case of problems.
The pretest, trainings and first posttest were completed between October 2020
and December 2020. There was an interval of three to four days between each
training, as well as between training 5 and posttest 1. Posttest 2 was administered
one month after posttest 1. The pretest and posttests lasted about 15 minutes each,
and the training sessions each had a duration of about 20 minutes.
Data were included in the statistical analysis only if they came from participants
who had taken part in all pre- and posttests and, for the training group,
had completed all 5 training sessions. Some participants experienced technical
difficulties which forced them to restart the pre- or posttests, or to stop a test
prematurely. In these cases, each participant’s responses were reviewed individ-
ually and complete datasets were prioritised. For example, if a participant re-
started a test after 20 trials and proceeded to fully complete the test afterwards,
the first attempt was discarded and the second was maintained. Considering the
high number of observations per participant, incomplete attempts were also in-
cluded as long as they included more than two thirds of the observations. In
total, there was an average of 317.71 out of 320 observations per participant, per
test (standard deviation = 10.95).
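The figure of 320 observations per test and the two-thirds inclusion rule can be checked with a short calculation. The sketch below derives the counts from the design described in Section 3.2.3; the constant names and the helper `attempt_is_usable` are hypothetical, and the reading of "more than two thirds" as a strict inequality is an assumption.

```python
import math

N_CONTRASTS = 5             # target contrasts (Section 3.2)
FEATURES_PER_CONTRAST = 2   # the two members of each contrast
PRESENTATIONS = 16          # times each target feature appears per part
PARTS = 2                   # part 1 without noise, part 2 with noise

stimuli_per_part = N_CONTRASTS * FEATURES_PER_CONTRAST * PRESENTATIONS
observations_per_test = stimuli_per_part * PARTS

# In the noise part, half of the stimuli use SNR 0 and half use SNR 8.
per_snr_level = stimuli_per_part // 2

# Smallest whole number of observations strictly above two thirds of the total.
min_observations = math.floor(observations_per_test * 2 / 3) + 1


def attempt_is_usable(n_observations: int) -> bool:
    """Apply the two-thirds rule with integer arithmetic to avoid rounding issues."""
    return 3 * n_observations > 2 * observations_per_test


print(stimuli_per_part, observations_per_test, per_snr_level, min_observations)
# prints: 160 320 80 214
```

Under these assumptions, an incomplete attempt would need at least 214 of the 320 responses to be retained.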
The data from the lexical identification task were analysed using a mixed-
effects logistic regression model under the generalized linear mixed models
framework, using the lme4 (Bates et al. 2015) package in R (R Core Team
2020). The following explanatory variables were maintained for inclusion in the
statistical model:
– Participant ID (random factor)
– Timing: pretest, posttest 1, posttest 2
– Group: control, training
– Noise: NA, SNR 8, SNR 0
– New stimulus: no, yes
– New speaker: no, yes
– Profile: secondary school, university
– Target contrast: /ɑ/ vs /aː/, /x/ vs /k/, /h/ vs ø, /i/ vs /ɪ/, /ə/ vs ø
Independent variables were dummy coded, with the exception of the Target
contrast variable, which was effects coded.
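The difference between the two coding schemes can be made concrete with a small sketch. It does not reproduce the contrast matrices used in the R analysis: the functions are hypothetical, and the choice of reference level for each variable is an assumption, since the text does not state it.

```python
from typing import Dict, List


def dummy_code(levels: List[str], reference: str) -> Dict[str, List[int]]:
    """Dummy (treatment) coding: the reference level is coded as all zeros."""
    columns = [lv for lv in levels if lv != reference]
    return {lv: [1 if lv == col else 0 for col in columns] for lv in levels}


def effects_code(levels: List[str], reference: str) -> Dict[str, List[int]]:
    """Effects (sum-to-zero) coding: the reference level is coded as all -1."""
    columns = [lv for lv in levels if lv != reference]
    return {
        lv: [-1] * len(columns) if lv == reference
        else [1 if lv == col else 0 for col in columns]
        for lv in levels
    }


timing = ["pretest", "posttest 1", "posttest 2"]
contrasts = ["/ɑ/ vs /aː/", "/x/ vs /k/", "/h/ vs ø", "/i/ vs /ɪ/", "/ə/ vs ø"]

# Assumed reference levels, chosen for illustration only.
timing_codes = dummy_code(timing, reference="pretest")
contrast_codes = effects_code(contrasts, reference="/ə/ vs ø")

for level, row in timing_codes.items():
    print(level, row)
for level, row in contrast_codes.items():
    print(level, row)
```

With dummy coding, each coefficient compares one level with the reference level. With effects coding, every column sums to zero across levels, so each coefficient expresses a deviation from the unweighted mean of the levels rather than from a single baseline contrast.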
The model presented in Section 4.2 was built on the basis of the theoretical
predictions of the study and includes Participant ID as a random factor, Timing,
Group, Participant profile, Target contrast and Noise as main effects and a set
number of two-, three- and four-way interactions. The variables New stimulus
and New speaker were only introduced in three-way interactions with Timing
and Group, since new stimuli or speakers were only introduced in the posttests
(see Table 3).
Due to the high number of variables and interactions, our discussion will
focus on those coefficients that are statistically significant. Higher-order inter-
actions exploring the moderating effects of learner profile and target contrast
will be discussed first, whereas lower-order interactions will only be explored
where higher-order interactions are not statistically significant.
Post-hoc comparisons for fixed effects were carried out using the emmeans
package (Lenth 2020), with pairwise or control vs. treatment comparisons for
significant main effects or interactions. Post-hoc tests were always carried out
for pretest vs. posttest 1 and pretest vs. posttest 2, with the exception of the var-
iables New stimulus and New speaker, for which no pretest data were available.
For these variables, new stimuli or speakers were always compared to familiar
stimuli or speakers, for posttests 1 and 2 separately. P-values were adjusted for
multiple comparisons using emmeans’s dunnettx method.
4 Results
4.1 Descriptive statistics
The different numbers of responses per category in Tables 4 and 5 are the result of the con-
nectivity issues described in Section 3.4.
training group, with the strongest gains observed for the high-noise condition
(posttest 1 = .057; posttest 2 = .047) and the lowest gains for the low-noise condition
(posttest 1 = .025; posttest 2 = .011). Differences between pre- and posttests
never exceed .015 in the control group, with the only exception being an
increase in scores of .035 from pretest to posttest 1 in the high noise condition.
As a whole, these results suggest that training effects can be observed in both
posttests and can be extended to new stimuli and new speakers. The presence of
noise impacts participant performance in general, but the training group still
seems to benefit from instruction in all three noise conditions.
[Table 4: Descriptive statistics for the control and training groups by Timing, New speaker (no, yes), New stimulus (no, yes) and Noise (no noise and the two SNR levels); the numerical cell values are not recoverable.]
A closer look at the moderating variables Profile and Target contrast in rela-
tion to Timing and Group (Table 5) reveals much more limited and less stable
gains in the training group for the secondary school pupils (posttest 1 = .022;
posttest 2 = .006) than for the university participants (posttest 1 = .056; posttest
2 = .055). The control group shows negligible increases for both learner profiles,
with the exception of posttest 1 for the university participants, where the in-
crease of .022 is similar to that of the secondary school training group.
Finally, a comparison of the various target contrasts reveals important dif-
ferences, with contrasts such as /i/ vs /ɪ/ (control = .623; training = .602) being
perceived much closer to chance level than contrasts such as /ə/ vs ø (control =
.855; training = .858) in the pretest. Gains between pre- and posttests are simi-
larly variable. The strongest gains in the training group are observed for /i/ vs /ɪ/
(posttest 1 = .081; posttest 2 = .060), for which the pretest scores were also the
lowest. At the same time, other contrasts for which the training group ob-
tained lower scores in the pretest did not see similar gains. For example, gains
for /ɑ/ vs /aː/, for which the second lowest scores were observed in the pretest,
were limited to .018 (posttest 1) and .009 (posttest 2). In contrast, some sounds
which caused few problems for the training group in the pretest still saw more
important gains. While participants in the training group identified the difference
between /x/ vs /k/ relatively well in the pretest, their performance still increased
comparatively substantially at posttest 1 (.041) and posttest 2 (.044).
[Table 5: Descriptive statistics for the control and training groups by Profile (secondary school, university) and Target contrast (/ɑ/ vs /aː/, /x/ vs /k/, /h/ vs ø, /i/ vs /ɪ/, /ə/ vs ø); the numerical cell values are not recoverable.]
The full output of the statistical analysis can be obtained by contacting the authors.
4.2.3 Noise
As for the training’s effects across different levels of noise, a statistically signifi-
cant effect was found for the interaction between Profile, Timing, Group and
Noise. More specifically, post-hoc tests revealed that, at university level only, the
training group’s scores increased in the no-noise and high-noise conditions from
pretest to posttest 1 (No noise: Est. = .722, z = 6.385, p < .001; SNR 0: Est. = .456,
z = 3.685, p = .005) and from pretest to posttest 2 (No noise: Est. = .683, z = 6.096,
p < .001; SNR 0: Est. = .517, z = 4.125, p < .001), while the university students’
performance did not differ significantly between pre- and posttests in the SNR
8 condition (p > .05). No statistically significant training effects were observed for
the control group or the secondary school participants in any of the noise condi-
tions (p > .05).
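To give the estimates above a more intuitive reading, they can be converted to odds ratios. The sketch below assumes that the reported estimates are differences on the log-odds (logit) scale, posttest minus pretest, which is the usual scale for contrasts from a logistic mixed model but is not stated explicitly in the text. The function names and the pretest accuracy of .70 in the last line are hypothetical.

```python
import math

# Estimates reported for the university training group
# (assumed to be differences in log-odds: posttest minus pretest).
estimates = {
    ("no noise", "posttest 1"): 0.722,
    ("SNR 0", "posttest 1"): 0.456,
    ("no noise", "posttest 2"): 0.683,
    ("SNR 0", "posttest 2"): 0.517,
}


def odds_ratio(log_odds_difference: float) -> float:
    """Exponentiate a log-odds difference to obtain an odds ratio."""
    return math.exp(log_odds_difference)


def shifted_probability(p_before: float, log_odds_difference: float) -> float:
    """Probability obtained after shifting the log-odds of `p_before`."""
    logit = math.log(p_before / (1 - p_before)) + log_odds_difference
    return 1 / (1 + math.exp(-logit))


for (noise, test), est in estimates.items():
    print(f"{noise}, {test}: odds ratio = {odds_ratio(est):.2f}")

# Illustration only: .70 is a made-up pretest accuracy, not a value from the study.
print(round(shifted_probability(0.70, 0.722), 3))
```

Under this assumption, an estimate of .722 corresponds to roughly a doubling of the odds of a correct response, although the change in the proportion of correct responses depends on the starting accuracy.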
For our second moderator effect, Target contrast, the interaction between Tar-
get contrast, Timing, Group and Noise was revealed to be statistically significant.
Post-hoc tests will be discussed per target contrast. For the contrast /ɑ/ vs /aː/, no
statistically significant gains were observed from the pretest to the posttest in any
of the noise conditions for the control and training groups (p > .05). Training ef-
fects for /x/ vs /k/ were limited to the training group in the highest noise condition
only, both for pretest versus posttest 1 (Est. = .628, z = 3.884, p = .005) and for
pretest versus posttest 2 (Est. = .613, z = 3.774, p = .008).
5 Discussion
In this study, we set out to investigate how robust HVPT effects are on the per-
ception of Dutch contrasts for French-speaking learners of Dutch in Belgium
and to what extent the robustness of HVPT is moderated by learner-related fac-
tors and by the type of language feature. Robustness was measured along three
dimensions: (1) the generalizability of training effects to novel tokens and novel
talkers, (2) the duration of the impact of training, and (3) the effect of training
on listening in non-optimal conditions, i.e., with background noise. The results
show a nuanced picture: they reveal considerable variability in the effective-
ness of HVPT along most robustness variables, which can to a large extent be
attributed to the moderating variables that were investigated. We briefly dis-
cuss the most important findings that lead to this conclusion.
First, contrary to our hypothesis, training effects were observed for some con-
trasts only. In other words, the target contrast itself was revealed to play an impor-
tant role in the effectiveness of HVPT, as gains were not observed for every single
contrast, nor did gains manifest themselves in the same way when improvements
were observed. As noted in the Introduction, the earliest studies on HVPT focused
on only one consonant contrast (e.g. Bradlow et al. 1997; Pisoni, Lively, and Logan
1994). Studies that did look into more contrasts also found different results for spe-
cific contrasts (e.g. Rato 2014, who found generalization effects for four out of six
vowels). Interestingly, in the present study, we observed variability both for con-
trasts on which participants scored relatively low in the pretest and those which
posed fewer problems in the pretest. For instance, the lowest pretest scores were
observed for /ɑ/ vs /aː/ and for /i/ vs /ɪ/, but participants’ performance only in-
creased in a statistically significant way for the /i/ vs /ɪ/ contrast. We do not readily
have an explanation for this. Further research into the exact perceptual mapping
of Dutch vowels onto French ones by L2 learners may be needed to help explain
this result. Participants were also often able to improve their performance on those
contrasts where pretest scores were relatively high (e.g. /x/ vs /k/ and /h/ vs ø),
albeit not for every noise condition.
Secondly, we only found training effects in the university group, but not in
the secondary school group. Since in most training studies the participants are
university students (e.g. Bradlow et al. 1997; Nishi and Kewley-Port 2007; Wang
and Munro 2004), we cannot readily compare this observation with most earlier
studies. One exception is Shinohara and Iverson’s (2021) study, which com-
pared perceptual training effects of English /l/-/r/ in Japanese adults, adoles-
cents and children. In contrast to the current study, their results revealed
higher gains from perceptual training in the adolescent group than in the adult
group. Crucially, their study included a wider variety of training tasks targeting
a single contrast only. In the current study, the lack of clear training
effects in the secondary school group may point to a lack of robustness of HVPT
across learner profiles, but the conditions of the training, which was organized
entirely online, may also have impacted the participants’ performance,
especially for the secondary school pupils (see below). However, for
those contrasts where gains were observed, these generally applied to novel
and familiar stimuli (with the exception of two contrasts) and to novel and fa-
miliar speakers alike, as predicted in our hypotheses. This is in line with earlier
studies using the HVPT design, which generally report generalization across
stimuli and talkers. In general, the strongest training effects were observed for
the university group, across all three robustness indicators (generalizability to
novel stimuli and novel speakers, long-term effects and effects in quiet and
high-noise conditions), with the exception of the low-noise condition. This sug-
gests that, to the extent that HVPT effects can be observed, they appear to be
quite robust.
Thirdly, the generalizability of training effects to different noise conditions
was revealed to be one of the main sources of variability in the data, with training
effects for some target contrasts, such as /ə/ vs ø, only being observed in high-
noise conditions. This implies that, while the high pretest scores for this contrast
may not have characterized this as a high-priority target for pronunciation train-
ing, learners still experienced benefits in more adverse listening conditions.
In sum, the results suggest that considering training effectiveness from
multiple angles yields a nuanced picture of its robustness. However, the results
should also be interpreted with caution. One limitation of the study was that it
was carried out in a context where the control over experimental conditions
was limited. As participants took part in the study at home, it was impossible to
verify, for instance, the absence of background noise or whether they followed
the instructions closely (e.g. wore headphones). This may not be problematic,
as the results can be taken to be representative of a particular type of online,
individual training, but it is important to bear in mind when interpreting the
results. In addition, we realized that the materials used in the HVPT training
would have been more familiar to the university students than to the secondary
school pupils. From their linguistics courses, the university students would
have been familiar with explanations on vowel duration, waveforms or cross-
sections of the articulators. The reliance on metalinguistic knowledge typically
used in HVPT studies, which as noted tend to focus on the university population,
may thus help to explain the absence of gains in the secondary school
group.
As a final point in the discussion, leading to suggestions for future re-
search, we would like to bring up the issue of the ecological validity of HVPT
training studies, including the study we are reporting on in this chapter. Two
properties of the current study in particular raise doubts about the usefulness
of the training outside of scientific research.
First, training effects, even when statistically significant, were generally
limited, and a number of contrasts posed relatively few problems to the learn-
ers. The focus on the perception of isolated words may help to explain the high
scores for some contrasts and more natural contexts and production tasks may
paint a different picture. This relates to the more general question of how large
gains should be for a training to be useful in a real learning context. Conceiv-
ably, the answer to that question is not straightforward, but depends on the
pronunciation targets set by learners and teachers, as well as on the amount of
time available to the learners.
Secondly, the participants undoubtedly experienced weariness from extended
remote learning, which may have impacted their motivation when participating in
the study. For some participants, this may have been compounded by the repeti-
tiveness of parts of the training and tests (esp. the forced-choice tasks and the noise
conditions), but also by occasional technical difficulties, which were reported more
often in the secondary school group. While an exploration of the effects of partici-
pant motivation on HVPT effectiveness (esp. if pronunciation instruction is to be
integrated in regular foreign language classrooms) is undoubtedly an interesting av-
enue of research to pursue, it also means the results of the present study need to be
interpreted with necessary caution.
We therefore echo Wang and Munro (2004) when we point out that we do
not claim to have developed a pronunciation training programme that is suit-
able for Dutch as a Foreign Language teachers, even if the design of the train-
ing was directly inspired by existing pronunciation manuals for L2 Dutch. As
Wang and Munro (2004: 551) note, the development of such a software training
package would require collaboration with pedagogical specialists as well as
with technical experts on, for instance, user-friendly interfaces. Future projects
may well aim at such collaborations, given the limited materials that are now
available for Dutch pronunciation training for French learners.
6 Conclusion
On the whole, the current study makes a nuanced contribution to a growing
body of evidence revealing HVPT to be an effective paradigm for pronunciation
training. The focus on a variety of robustness indicators suggests that, where
training effects are found, learners are able to generalize these to new contexts
and new speakers.
Since one of the main factors found to affect the success of the training was
the participants’ profile, here operationalized mainly in terms of age and educa-
tional level, future research ought to consider how the design of HVPT can be
tailored to different learner profiles, including a focus on younger learners. In-
structed foreign language learning often starts in secondary schools, when
learners are aged between 12 and 18, or earlier. As such, it would be worthwhile
for future pronunciation training studies to shift the focus from university stu-
dents to younger children and adolescents.
Importantly, this study also set out to explore the effectiveness of pronuncia-
tion training in a hitherto underexplored setting, namely Dutch as a second lan-
guage in a French-speaking community. By focusing on a foreign-language-
learning context which is very widespread in French-speaking Belgium and to a
significant extent embedded in the secondary school curriculum, the study also
contributes empirical evidence which, we hope, will support the development of
pedagogical materials in Dutch language education in French-speaking Belgium.
References
Aliaga-García, Cristina & Joan C. Mora. 2009. Assessing the effects of phonetic training on L2
sound perception and production. In Michael A. Watkins, Andreia S. Rauber & Barbara O.
Baptista (eds.), Recent Research in Second Language Phonetics/Phonology: Perception
and Production, 2–31. Newcastle upon Tyne, UK: Cambridge Scholars Publishing.
Alshangiti, Wafaa & Bronwen G. Evans. 2014. Investigating the domain-specificity of phonetic
training for second-language learning: Comparing the effects of production and
perception training on the acquisition of English vowels by Arabic learners of English.
In Susanne Fuchs, Martine Grice, Anne Hermes, Leonardo Lancia & Doris Mücke (eds.),
Proceedings of the 10th International Seminar on Speech Production, Cologne, Germany,
5–8 May 2014. https://www.researchgate.net/publication/262635724 (accessed
09 June 2021).
Anthony, Jason L. & David J. Francis. 2005. Development of phonological awareness. Current
Directions in Psychological Science 14(5). 255–259. https://doi.org/10.1111/j.0963-
7214.2005.00376.x (accessed 14 June 2021).
Archibald, John. 2021. Ease and difficulty in L2 phonology: A mini-review. Frontiers in
Communication 6(18). 626529.
Best, Catherine T. 1994. The emergence of native-language phonological influences in
infants: A perceptual assimilation model. In Judith C. Goodman & Howard C. Nusbaum
(eds.), The Development of Speech Perception: The Transition from Speech Sounds to
Spoken Words, 167–224. Cambridge, MA: The MIT Press.
Best, Catherine T. & Michael D. Tyler. 2007. Nonnative and second-language speech
perception: Commonalities and complementarities. In Murray J. Munro & Ocke-Schwen
Bohn (eds.), Language Experience in Second Language Speech Learning: In Honor of
James Emil Flege, 13–34. Amsterdam: John Benjamins.
Boersma, Paul & David Weenink. 2019. Praat: doing phonetics by computer [Computer
program]. Version 6.1, retrieved from http://www.praat.org/ (accessed 14 June 2021).
Bradlow, Ann R., David B. Pisoni, Reiko Akahane-Yamada & Yoh’ichi Tohkura. 1997. Training
Japanese listeners to identify English /r/ and /l/: IV. Some effects of perceptual learning
on speech production. Journal of the Acoustical Society of America 101(4). 2299–2310.
Cutler, Anne, Andrea Weber, Roel Smits & Nicole Cooper. 2004. Patterns of English phoneme
confusions by native and non-native listeners. Journal of the Acoustical Society of
America 116(6). 3668–3678.
Dörnyei, Zoltán. 2009. Individual differences: Interplay of learner characteristics and learning
environment. Language Learning 59(s1). 230–248.
Bates, Douglas, Martin Maechler, Ben Bolker & Steve Walker. 2015. Fitting linear mixed-effects
models using lme4. Journal of Statistical Software 67(1). 1–48. doi:10.18637/jss.v067.i01.
Ellis, Rod. 2015. Understanding Second Language Acquisition, 2nd edn. Oxford: Oxford
University Press.
Escudero, Paola, Ellen Simon & Holger Mitterer. 2008. The perception of English front vowels
by North Holland and Flemish listeners: Acoustic similarity predicts and explains cross-
linguistic and L2 perception. Journal of Phonetics 40(2). 280–288.
Flege, James E. 1995. Second-language speech learning: Theory, findings, and problems. In
Winifred Strange (ed.), Speech Perception and Linguistic Experience: Issues in Cross-
language Research, 229–273. Timonium, MD: York Press.
Flege, James E. & Ocke-Schwen Bohn. 2021. The revised Speech Learning Model (SLM-r). In
Ratree Wayland (ed.), Second Language Speech Learning: Theoretical and Empirical
Progress, 3–83. Cambridge: Cambridge University Press.
Garcia-Lecumberri, Maria Luisa, Martin Cooke & Anne Cutler. 2010. Non-native speech
perception in adverse conditions: a review. Speech Communication 52(11). 864–886.
Hamers, Josiane F. & Michel A. H. Blanc. 2000. Bilinguality and Bilingualism. Cambridge:
Cambridge University Press.
Hazan, Valerie, Anke Sennema, Midori Iba & Andrew Faulkner. 2005. Effect of audiovisual
perceptual training on the perception and production of consonants by Japanese learners
of English. Speech Communication 47(3). 360–378.
Hiligsmann, Philippe & Laurent Rasier. 2007. Uitspraakleer Nederlands voor Franstaligen
[Dutch pronunciation for French speakers]. Waterloo: Wolters Plantyn.
Housen, Alex & Hannelore Simoens. 2016. Introduction: Cognitive perspectives on difficulty
and complexity in L2 acquisition. Studies in Second Language Acquisition 38(2). 163–175.
Lancaster University. (n.d.) Dialang. https://dialangweb.lancaster.ac.uk (accessed
25 May 2021).
Lee, Junkyu, Juhyun Jang & Luke Plonsky. 2015. The effectiveness of second language
pronunciation instruction: A meta-analysis. Applied Linguistics 36(3). 345–366.
Lengeris, Angelos & Katerina Nicolaidis. 2015. Effect of phonetic training on the perception of
English consonants by Greek speakers in quiet and noise. Proceedings of Meetings on
Acoustics (POMA), 22. 060002.
Leong, Christine Xiang Ru, Jessica M. Price, Nicola J. Pitchford & Walter J. van Heuven. 2018.
High variability phonetic training in adaptive adverse conditions is rapid, effective, and
sustained. PLoS ONE 13(10). e0204888. https://doi.org/10.1371/journal.pone.0204888
(accessed 04 June 2021).
Lima Jr., Ronaldo. 2019. A dynamic account of the development of English (L2) vowels by
Brazilian learners through communicative teaching and through explicit instruction, see
Chapter 6, this volume.
Logan, John S., Scott E. Lively & David B. Pisoni. 1991. Training Japanese listeners to identify
English /r/ and /l/: A first report. Journal of the Acoustical Society of America 89(2).
874–886.
Mattys, Sven, Matthew H. Davis, Ann R. Bradlow & Sophie K. Scott. 2012. Speech recognition
in adverse conditions: A review. Language and Cognitive Processes 27(7/8). 953–978.
McCloy, Daniel. 2013. Mix speech with noise [Praat script]. https://github.com/drammock
(accessed 07 June 2021).
Mettewie, Laurence. 2015. Apprendre la langue de “l’Autre” en Belgique: la dimension
affective. Le Langage et l’Homme 50(2). 23–42.
Mettewie, Laurence. 2021. Wordt Nederlands een verplicht vak in Wallonië? [Will Dutch
become a compulsory subject in Wallonia?]. Neerlandia 124(1). 30–31. https://www.anv.
nl/tijdschrift/inhoudsopgaven/2020-1/wordt-nederlands-een-verplicht-vak-in-wallonie/
(accessed 04 June 2021).
Moyer, Alene. 1999. Ultimate attainment in L2 phonology: The critical factors of age,
motivation, and instruction. Studies in Second Language Acquisition 21(1). 81–108.
Nishi, Kanae & Diane Kewley-Port. 2007. Training Japanese listeners to perceive American
English vowels: Influence of training sets. Journal of Speech, Language and Hearing
Research 50(6). 1496–1509.
Peltekov, Peter. 2020. The effectiveness of implicit and explicit instruction on German L2
learners’ pronunciation. Die Unterrichtspraxis/Teaching German 53(1). 1–22.
Pisoni, David B., Scott E. Lively & John S. Logan. 1994. Perceptual learning of nonnative
speech contrasts: Implications for theories of speech perception. In Judith C. Goodman &
Howard C. Nusbaum (eds.), The Development of Speech Perception: The Transition from
Speech Sounds to Spoken Words, 121–166. Cambridge, MA: The MIT Press.
Rato, Anabela. 2014. Effects of perceptual training on the identification of English vowels by
native speakers of European Portuguese. Concordia Working Papers in Applied
Linguistics 5. 529–546.
R Core Team. 2020. R: A language and environment for statistical computing. R Foundation for
Statistical Computing, Vienna, Austria. https://www.R-project.org/ (accessed
07 June 2021).
Lenth, Russell V. 2020. emmeans: Estimated Marginal Means, aka Least-Squares Means. R
package version 1.5.3. https://CRAN.R-project.org/package=emmeans (accessed
07 June 2021).
Saito, Kazuya & Luke Plonsky. 2019. Effects of second language pronunciation teaching
revisited: A proposed measurement framework and meta-analysis. Language Learning
69(3). 652–708.
Sakai, Mari & Colleen Moorman. 2018. Can perception training improve the production
of second language phonemes? A meta-analytic review of 25 years of perception training
research. Applied Psycholinguistics 39(1). 187–224.
Shinohara, Yasuaki & Paul Iverson. 2021. The effect of age on English /r/-/l/ perceptual
training outcomes for Japanese speakers. Journal of Phonetics 89. 101108. https://doi.
org/10.1016/j.wocn.2021.101108 (accessed 02 May 2022).
Thomson, Ron I. 2018. High Variability [Pronunciation] Training (HVPT): A proven technique
about which every language teacher and learner ought to know. Journal of Second
Language Pronunciation 4(2). 208–231.
Wang, Xinchun & Murray J. Munro. 2004. Computer-based training for learning English vowel
contrasts. System 32(4). 539–552.
Williams, Daniel & Paola Escudero. 2014. Native and non-native speech perception. Acoustics
Australia 42(2). 79–83.
Pollianna Milan, Denise Cristina Kluge
Effects of perceptual training
in the perception and production of
heterotonics by Brazilian learners of Spanish
Abstract: In this longitudinal study, we investigated the effectiveness of perceptual
training in the perception and production of heterotonics by Brazilian learners of Spanish.
Twenty-six participants were divided into four groups: those with less academic exposure to
Spanish, called the ‘basic’ group, subdivided into a basic group with training and a basic
group without training; and those with more academic exposure to Spanish, called the
‘intermediate’ group, likewise subdivided into an intermediate group with training and an
intermediate group without training. All participants took a pre-test, a post-test, a
generalization test and a delayed post-test (between 42 and 58 days after the training
sessions), for both perception and production. The participants who trained were compared
with those who did not, in order to find out whether training improved their performance in
the tests. In all the perception tests, the participants had to identify the stressed
syllable of the heterotonic words and of the distractors. The results showed a positive
effect of perceptual training on the perception and production of heterotonics by the group
that trained and had less academic experience. As this study follows the principles of
Complex Systems, the analysis of the results needed to include not only intergroup
comparisons but also individual comparisons of the participants. In the individual analyses,
we concluded that the learners who benefited most from training were those with less
academic experience and more difficulties at the beginning of the study, i.e., in the
pre-tests.
https://doi.org/10.1515/9783110736120-013
1 Introduction
The main objective of this longitudinal¹ study is to investigate the effects of perceptual
training on the development of heterotonics² of Spanish by Brazilian
speakers. In addition, we seek to analyze if such effects will occur both in per-
ception and in production and whether or not they will be long-lasting. To our
knowledge, there is no previous research on perceptual training of phonetic/
phonological features of Spanish with Brazilian learners, especially regarding
stress assignment, i.e., at the suprasegmental level. For this reason, this is an
original study intended to provide new insights into the field. We believe that a
language is developed according to the principles of Complex Systems; there-
fore, we propose a new approach to the analysis of the results, i.e., on a more
individual basis.
As perceptual training is based on learning a language through use and,
consequently, through repetition, we begin this chapter by pondering over a
quote by Morin (1990: 112): “Combine the cause and the effect, and the effect
will return to the cause, by retroaction, and the product will also be the producer.”
[“Juntai a causa e o efeito, e o efeito voltará sobre a causa, por retroação, o produto
será também o produtor.”] This statement summarizes, from our
point of view, how a foreign language is used: when exposed to an unfamiliar
target language, learners acquire the status of producers of that language, and
are no longer mere observers (i.e., a product of it). In other words, they become
individuals who will use such language to communicate. The cause (learning a
language) and the effect (using the language learned) blend and complement
each other, because every time individuals use a language, they also develop it,
in a cyclical cause-effect and effect-cause relationship. Thus, depending on
how individuals use a language, categories are stored in their cognitive system;
this way, they can be reused whenever necessary (Bybee 2010). This also means
that the more this target language is used, the stronger this cause-and-effect
and effect-and-cause relationship becomes.
Another point to ponder is that a language is developed in particular con-
texts that cannot be overlooked. Consequently, there is a dynamic interaction
that involves adaptations (also at a personal level) in the teaching-learning pro-
cess, as postulated by Larsen-Freeman and Cameron (2008: 34): “Every change
¹ We consider this a longitudinal study due not only to the training sessions but also to
all the tests involved, which were administered over approximately four months.
² Heterotonics are words from two similar languages, with similar or identical spelling, but
stress on a different syllable.
³ The term ‘input’ is used by linguists to designate learners’ exposure to the language that
they intend to develop. Research on foreign language development has been attempting to
explain how learners process the input that they receive (Rast 2011). However, according to
Ellis (1985), not all input is processed by speakers, either because they did not understand
part of it or because they did not pay attention to it.
2 Methodology
Our study used a corpus of 115 heterotonic words,⁴ i.e., those that, in a comparison between
two languages (in this case, between Brazilian Portuguese⁵ and Spanish), differ in the
position of the stressed syllable. In some cases, for example, the word is paroxytonic in
Portuguese (as in atmosfera⁶) but proparoxytonic in Spanish (as in atmósfera; atmosphere in
English). There are, however, a large number of words that follow this pattern: both words
are paroxytonic and end in /ia/. The difference is that in Portuguese the vowel sequence
/i-a/ is split across syllables, forming a hiatus in which /i/ carries the stress, whereas
in Spanish this vowel sequence is a diphthong and stress falls on the previous syllable, for
example, in the Spanish words a-ne-mia [anaemia], bi-ga-mia [bigamy], fo-bia [phobia] and
or-to-pe-dia [orthopedics]. An exception to this group is the Portuguese word po-lí-cia
[police], a paroxytone that ends in a diphthong, whose Spanish counterpart po-li-cí-a is a
paroxytone that ends in a hiatus. The 115 heterotonics were distributed across the tests
(pre-test, post-test, delayed post-test and generalization) and the two training sessions.
The tests also had 30 distractor words, selected from among those that are usually well
known to speakers of Spanish as a second language, since they are used in everyday life
(also in Portuguese) and frequently appear in textbooks, such as cultura [culture] and
salida [exit].
⁴ When preparing the list of heterotonic words arising from the stress contrast between
Brazilian Portuguese and Spanish, we found 155 examples; however, the corpus was reduced to
115 items because 40 of them had to be discarded. One reason was that some of the words
could be pronounced in more than one way, one of which was the same as in Portuguese. For
example, the stress of the Spanish word for penalty can be assigned to either one of two
syllables: pénalti or penalti (the first stress pattern also occurs in Brazilian
Portuguese). For the list of heterotonics that were discarded and the respective reasons,
see Milan (2019) (the stressed syllable of each word was underlined for easier
identification).
⁵ In this chapter, all mentions of Portuguese refer exclusively to Brazilian Portuguese.
The corpus for the perception tests was recorded by eight speakers (a charac-
teristic of training methods with high variability) whose mother tongue was
Spanish: four of them were Mexican and their phrases were used in the pre-test,
post-test and delayed post-test, and in the two training sessions. Another four
speakers (two Hondurans and two Cubans) were recorded and their phrases were
used in the perception generalization test. The recordings with the speakers con-
sisted of reading aloud phrases that were displayed on a computer screen, e.g.,
“Yo dije atmósfera” [I said atmosphere]. To facilitate editing, we chose to insert
the words that we needed to create the perceptual tests in the carrier sentence
“Yo dije ______” [I said ______]. After the test items had been recorded by the
speakers, the perception tests were created in the TP software⁷ (Rauber et al.
2013) and validated by four speakers of Spanish as their mother tongue (other
than the speakers who had recorded the words) to find out if there was any mis-
take in the creation of the tests before they were administered.
All the learners who participated in this study spoke Brazilian Portuguese as their mother
tongue and studied Spanish as a foreign language. They were enrolled in an undergraduate
degree in Spanish at the Federal University of Paraná and attended classes on a regular
basis. The 26 participants attended two different undergraduate courses: (i) 17 of them were
attending the course ‘Spanish Language 1’; they had had 90 hours of exposure to Spanish at
the beginning of the tests and 180 hours by the end of data collection. This group was
referred to as the basic group; (ii) nine of them were attending the course ‘Spanish
Language 3’; they had had 270 hours of exposure to Spanish at the beginning of the tests and
360 hours by the end of data collection. This second group was referred to as the
intermediate group. The 14 informants (10 from the basic group and four from the
intermediate group) who participated in the perceptual training sessions were randomly
selected. Therefore, there were four groups: a basic group with training (10 participants);
a basic group without training (seven participants); an intermediate group with training
(four participants); and an intermediate group without training (five participants). All
informants signed an informed consent form in which they confirmed their agreement to
participate in the research. They were aware that there would be no financial compensation⁸
and that they would not be identified. Because this research focuses on Spanish word stress,
we checked whether the participants had had classes on Spanish word stress placement before
and/or during data collection. All learners had had an expository lesson and had done
exercises on Spanish word stress assignment, which means that they were expected to know how
to pronounce heterotonics. In addition, the participants who had been having classes for a
longer time, i.e., those from the intermediate group, had been taught the stress of 58
heterotonics that were used in the tests. In other words, this group was more familiar with
such heterotonics.
⁶ All stressed syllables of the example words were underlined for easier identification
during reading.
⁷ The TP software, which was developed for the design and application of perceptual training
and testing, is available free of charge at <www.worken.com.br>.
2.1 Procedures
Our perceptual training study followed the standard of testing that is common to
this type of research. Before starting the training sessions, all informants took the
production pre-test and then the perception pre-test, which respectively assessed
the pronunciation and the perception of heterotonics. The pre-tests were used to
assess whether the participants knew, produced and perceived the heterotonics
adequately before our intervention, and also to keep track, from
the beginning, of the heterotonics whose stressed syllable they failed to produce
and/or perceive according to expectations.
Next, 14 of the informants underwent the two perceptual training sessions (explained later
in this section) so that they could be compared with the other 12 informants, who did not
receive perceptual training. The following step was to repeat the pre-tests after training,
which is why these tests are called post-tests (for both production and perception).
Together with the post-tests, the production and perception generalization tests were
administered to all participants; these tests contained new heterotonics that had not
appeared in the other tests or in the training sessions. Finally, between 42 and 58 days⁹
after the last perceptual training session, the production and perception delayed post-tests
(identical to the pre-tests) were administered to find out whether the informants had
retained, in the long term, what they may have learned in the training sessions. The study
was conducted¹⁰ between August 23 and November 24, 2017. In the analysis of results, the
data collected in each of the tests were compared.
⁸ In Brazil, researchers are not allowed to give any financial compensation to people who
participate in scientific/academic research.
For the production tests (the pre-test, the post-test and the delayed post-test
were the same), the informants were taken individually to a soundproof room for
recording, on scheduled days. They read carrier sentences inserted in PowerPoint
slides and displayed on a computer screen. These sentences contained heterotonics
and distractors and had the same format as the phrases read by the
speakers of the perception tests (which were, then, edited because in the percep-
tion tests only the target words were used). In total, each participant read 40 het-
erotonics and 20 distractors inserted in carrier sentences in each test.
After each production test, the participants took¹¹ the respective perception test. On the
day the perception tests were administered, the entire class was taken to the computer lab
and each participant took the test on an individual computer. The perception tests (the
pre-test, post-test and delayed post-test were identical) contained the same 40 heterotonics
and the same 20 distractors used in the production tests. The difference was that instead of
pronouncing the sentences, the participants listened (in the TP software) to the four
Mexican speakers pronouncing the target words in isolation, and they were expected to click
on the stressed syllable in each word that they heard.¹² The participants were told that
they could hear each word up to 10 times before answering by clicking on the ‘repeat’
button.
⁹ Data collection took place on different days because not all groups of students were able
to take the tests in the same weeks, as recess and exams had already been scheduled on the
university calendar.
¹⁰ The research calendar can be seen in detail in Milan (2019).
¹¹ On average, the perception tests were carried out 20 days after their respective
production tests.
¹² Before starting the perceptual tests, the participants were instructed to answer the
question “what is the number of the strong (stressed) syllable of the word that you heard?”.
There were four answer options, from button one to button four. After that, the participants
were given an example: if they heard the word ‘árboles’ [trees], they should mentally divide
that word into syllables, ár-bo-les, and then click on the button corresponding to the
stressed syllable. They should bear in mind that, for this test, syllables had to be counted
from the beginning of the word to the end, that is, the first syllable was ‘ár’, the second
was ‘bo’ and the third was ‘les’.
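The answer key for this task (the number of the stressed syllable, counted from the beginning of the word) can be sketched in code. The following is only an illustration and not the procedure implemented in the TP software. It assumes that each word is supplied already divided into syllables with hyphens, as in the examples in this chapter, and it applies the standard Spanish orthographic stress conventions, which the chapter does not spell out. The function name and the input format are hypothetical.

```python
ACCENTED_VOWELS = set("áéíóú")


def stressed_syllable(syllabified_word: str) -> int:
    """Return the 1-based position of the stressed syllable.

    Assumption: the word arrives already split into syllables with
    hyphens (e.g. "ár-bo-les"); no automatic syllabification is done.
    """
    syllables = syllabified_word.lower().split("-")
    # 1. A written accent marks the stressed syllable directly.
    for position, syllable in enumerate(syllables, start=1):
        if any(ch in ACCENTED_VOWELS for ch in syllable):
            return position
    # 2. Monosyllables have only one candidate.
    if len(syllables) == 1:
        return 1
    # 3. No written accent: words ending in a vowel, n or s are stressed
    #    on the penultimate syllable; all other words on the last one.
    if syllables[-1][-1] in "aeiouns":
        return len(syllables) - 1
    return len(syllables)


def is_correct(clicked_button: int, syllabified_word: str) -> bool:
    """Score one response: does the clicked button match the stressed syllable?"""
    return clicked_button == stressed_syllable(syllabified_word)


if __name__ == "__main__":
    for word in ["ár-bo-les", "at-mós-fe-ra", "a-ne-mia", "po-li-cí-a"]:
        print(word, stressed_syllable(word))
```

For the instruction example, a participant who hears árboles and clicks button one would be scored as correct under this sketch.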
The two training sessions (administered after the pre-tests) took place on
different days and involved only some of the participants. The participants who did
not take part in the training sessions received unrelated input. The training sets were
composed of 56 heterotonics that had not been included in any of the produc-
tion and perception tests: 29 were used in the first training session and 27, in
the second. Precisely because the sets were created for training purposes, they
contained no distractors. The two sessions were also set up in the TP software
but, unlike the perceptual tests, the training sessions provided an answer (im-
mediate feedback) to each choice of stressed syllable made by the participants.
This means that the software pointed out whether the chosen answer was cor-
rect or incorrect. Whenever it was incorrect, the software clearly indicated the
mistake and immediately showed what the correct syllable was. The partici-
pants were supposed to hear the stimulus again and then click on the correct
answer, as pointed out by the software, so that they could move on to the next
stimulus. The generalization tests (which contained 19 new heterotonics and 10
new distractors) were administered together with the post-tests, i.e., the production
generalization test was randomly combined with the production post-test. The same
procedure was applied to the perception generalization test
(in this case, with new speakers, two Hondurans and two Cubans).
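The item counts reported in this section can be cross-checked with a few lines of arithmetic. The sketch below uses only figures stated in the chapter (155 candidate words, 40 discarded, and the numbers of heterotonics and distractors per test or training session); the variable names and the grouping into dictionaries are illustrative choices, not part of the original study materials.

```python
# Counts taken from the Methodology section of this chapter.
CANDIDATES_FOUND = 155
DISCARDED = 40

heterotonics = {
    "pre-test / post-test / delayed post-test": 40,
    "training session 1": 29,
    "training session 2": 27,
    "generalization tests": 19,
}
distractors = {
    "pre-test / post-test / delayed post-test": 20,
    "generalization tests": 10,
}

corpus_size = CANDIDATES_FOUND - DISCARDED
total_heterotonics = sum(heterotonics.values())
total_distractors = sum(distractors.values())
training_items = (
    heterotonics["training session 1"] + heterotonics["training session 2"]
)

print("corpus size:", corpus_size)
print("heterotonics allocated:", total_heterotonics)
print("distractors allocated:", total_distractors)
print("training items:", training_items)
```

Under these figures the allocation accounts for the whole corpus, with no heterotonic shared between the tests, the training sets and the generalization tests.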
¹³ The four groups are: those with less academic exposure to Spanish, called the ‘basic’
group, divided between the basic group with and without training; and those who had more
academic exposure to Spanish, called ‘intermediate’, divided between the intermediate group
with and without training.
¹⁴ The data were analyzed using non-parametric statistical tests, with a significance level
of p ≤ 0.05; in addition, for each group, we report the average percentage of correct
answers and the standard deviation. For the Post Hoc tests, we applied the Bonferroni
correction, with a threshold of p ≤ 0.008. Importantly, we only report the values of the
tests that were statistically significant.
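The Bonferroni-corrected threshold of p ≤ 0.008 is consistent with dividing the 0.05 significance level by the six pairwise comparisons that are possible among four groups. That reading is an assumption, since the chapter does not state how many comparisons were used. A minimal sketch of the arithmetic, with hypothetical variable and function names:

```python
from itertools import combinations

GROUPS = [
    "basic with training",
    "basic without training",
    "intermediate with training",
    "intermediate without training",
]
ALPHA = 0.05

# Assumption: every pair of groups is compared once in the post hoc step.
pairs = list(combinations(GROUPS, 2))
threshold = ALPHA / len(pairs)


def is_significant(p_value: float, cutoff: float = threshold) -> bool:
    """Apply the corrected cutoff to a single pairwise p-value."""
    return p_value <= cutoff


print(len(pairs), "pairwise comparisons")
print("corrected threshold:", round(threshold, 3))
print("p = 0.004 significant?", is_significant(0.004))
```

With this cutoff, a pairwise result of p = 0.004 counts as significant, whereas a value such as p = 0.011 does not.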
¹⁵ Every participant whose rate of correct answers was the same as the average rate of the
group, or within 10% above or below that value, was considered to be within the average.
The others were considered to be outliers.
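This classification rule can be illustrated with a short sketch. It assumes that "10% above or below" means ten percentage points around the group mean, which is one possible reading of the rule; the participant labels and scores are invented for illustration and are not data from the study.

```python
def classify_participants(scores, tolerance=10.0):
    """Label each participant relative to the group mean.

    scores: mapping of participant label -> percentage of correct answers.
    tolerance: assumed to be percentage points around the group mean.
    """
    group_mean = sum(scores.values()) / len(scores)
    return {
        participant: (
            "within average"
            if abs(score - group_mean) <= tolerance
            else "outlier"
        )
        for participant, score in scores.items()
    }


if __name__ == "__main__":
    # Hypothetical scores (not from the study); the group mean is 60.
    example_scores = {"P1": 40.0, "P2": 50.0, "P3": 90.0}
    print(classify_participants(example_scores))
```

If the rule were instead read as a relative margin (10% of the mean), the tolerance would have to be computed from the group mean rather than passed as a fixed number of points.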
¹⁶ The other two people who judged the words produced in the tests that raised doubts were
two native speakers of Spanish: a Madrid native who was also a linguist and a Guatemalan who
was a post-graduate student at the Federal University of Paraná.
Figure 1 shows the data (in percentage of correct answers and the respective
standard deviation) resulting from the three production tests (pre-test, post-test
and delayed post-test) of the four groups of informants. In the pre-test, the two
intermediate groups performed better than the two basic groups, showing that
they had knowledge about the pronunciation of heterotonics before the training
sessions, as expected. This is because, in addition to being at a more advanced
level of academic exposure to Spanish, they had had classes on heterotonics, as
explained in the Methodology section. However, the difference in the rate of correct
answers was only significant¹⁷ between the two groups that had had no
training: the rate of 30% of correct answers in the basic group without training
was significantly lower than the rate of 78% of correct answers in the intermedi-
ate group without training. In inferential terms, the other groups did not differ in
the percentage of correct answers, although we found that there was a difference
that needs to be considered throughout this research, since the basic group that
underwent training answered 47% of the items correctly in the pre-test while the
intermediate group that received training had a rate of 69% of correct answers.
Figure 1: Inter-group percentage of correct answers and standard deviation (SD) in the three
production tests. Source: The authors (2021).
Legend: Basic with training; Basic without training; Intermediate with training; Intermediate without training.
In the post-test (central columns of Figure 1), the two intermediate groups (even the one
that did not participate in the training sessions) were equally correct in 90% of the
productions, although there was greater variability in the group that had undergone training
(standard deviation of 12%) than in the group that had not had training (standard deviation
of 9%). There were significant differences in the correct answers of the four groups;¹⁸
however, the Mann Whitney Post Hoc test did not show where they occurred. Based on the
significance levels of the Post Hoc test, we can affirm that the p-value tended to approach
significance when the basic group without training, which had 55% of correct answers in the
post-test, was compared to the other three groups. This result indicates that this group may
have had significantly fewer correct answers than the other three.
¹⁷ The value of the Kruskal Wallis test was χ² = 10.63, p = 0.014. The value of the Mann
Whitney test was U = 0.00, p = 0.004 (with Bonferroni correction, the significance threshold
considered was p ≤ 0.008).
¹⁸ The value of the Kruskal Wallis test was χ² = 10.85, p = 0.013.
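The Mann Whitney value of U = 0.00 reported for the production pre-test corresponds to complete separation between two groups: every score in one group is lower than every score in the other. The sketch below computes the U statistic directly from its definition to make this concrete. The scores are invented for illustration (only the group sizes, seven and five, follow the chapter), and the function is a hand-written helper, not the statistical package used by the authors.

```python
def mann_whitney_u(sample_a, sample_b):
    """Return the Mann-Whitney U statistic (the smaller of U_a and U_b).

    U_a counts the pairs (a, b) in which a > b; ties count as 0.5.
    """
    u_a = sum(
        1.0 if a > b else 0.5 if a == b else 0.0
        for a in sample_a
        for b in sample_b
    )
    u_b = len(sample_a) * len(sample_b) - u_a
    return min(u_a, u_b)


if __name__ == "__main__":
    # Hypothetical percentages of correct answers (not data from the study).
    basic_without_training = [15, 20, 25, 30, 35, 40, 45]
    intermediate_without_training = [65, 70, 80, 85, 90]
    print(mann_whitney_u(basic_without_training, intermediate_without_training))
```

Because no score in the first list exceeds any score in the second, the statistic is zero; any overlap between the two groups would raise it.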
The percentage of correct answers of the groups in the delayed post-test (last
four columns of Figure 1) followed the trend of the post-test, with a small im-
provement in the percentage of correct answers for all groups. This result shows
that, if heterotonics were developed and/or improved during training, such
knowledge was retained in the long term.¹⁹ This time, the rate of correct answers
of the basic group with training (89%) was significantly²⁰ higher than that of the
basic group without training (58%). The statistical results indicate a possible pos-
itive effect of perceptual training for the group of participants at the basic level.
Next, we will analyze the performance of the groups as regards perception. The
four groups fared better in perception than in production. In the perception pre-
test, the basic group without training had the lowest percentage of correct an-
swers (65%), as shown in the first four columns of Figure 2. Still, it was a high
percentage when compared to the correct answers in production, since this same
group had correctly answered less than half of the items in the production pre-
test (30%). When comparing correct answers between the four groups in the per-
ception pre-test, the Kruskal-Wallis test pointed out that there were no significant
differences. However, it should be noted that, in descriptive terms, there was a
15% difference in the rate of correct answers between the two basic groups.
Figure 2: Inter-group percentage of correct answers and standard deviation (SD) in the three
perception tests. Source: The authors (2021).
Legend: Basic with training; Basic without training; Intermediate with training; Intermediate without training.
In the perception post-test, the rate of correct answers remained high and simi-
lar to that of the pre-test. However, the basic group without training decreased
the percentage of correct answers from 65% in the pre-test to 60% in the post-test.
In the post-test comparison of the four groups, there was again no
significant difference in the number of correct answers.
The correct answers in the perception delayed post-test followed the trend
of the other tests, i.e., it was easier for the learners to perceive the stressed
syllables of the heterotonics than to produce them properly. Although there were
significant differences²¹ in the percentage of correct answers among the four
groups in the delayed post-test, the Mann Whitney Post Hoc test did not show
where these differences occurred. However, once again,
there was a tendency for the p-value to approach significance when the basic
group without training (61% of correct answers) was compared to the other
three groups whose rate of correct answers was above 80%.
In the comparisons among the four groups of this study, both in the produc-
tion and in the perception of heterotonics, the group that showed more difficulty
was the basic group without training. When the basic group with training was
compared to the two intermediate groups, the former always had a lower percentage
of correct answers than the others, which shows that greater academic
exposure to Spanish, together with explicit instruction on heterotonics, was a factor
that influenced the results of this study. To further investigate the topic, we looked at
the performance of each group in the tests and particularly in the generalization
test, in the intra-group analysis. In addition, after discussing the intra-group
analysis for both production and perception, we will focus on the individual
analysis, highlighting the participants who always scored above or below the
group average as well as the individual percentage variation among the tests.
We regrouped the data to observe the performance of each group in each of the
three production tests (pre-test, post-test and delayed post-test); for this
comparison, we added the data from the production generalization test. Figure 3
shows that the informants from the four groups, even those who did not
participate in the training sessions, improved from one test to the next, which
may suggest that exposure to the tests alone favored this outcome.
The basic group with training correctly produced 47% of the heterotonics
before the training sessions. After having been trained, they increased the rate
of correct answers to 81% in the post-test. Also, there was a lower rate of
Basic with training Basic without training Intermediate with training Intermediate without training
Figure 3: Intra-group percentage of correct answers and standard deviation (SD) in the four
production tests. Source: The authors (2021).
The value of the Friedman test was χ² = 19.15, p < 0.001. The value of Wilcoxon's post hoc
test was Z = −2.80, p = 0.005 in the comparison between the pre-test and the post-test;
Z = −2.53, p = 0.011 in the comparison between the post-test and the delayed post-test;
and Z = −2.80, p = 0.005 in the comparison between the pre-test and the delayed post-test.
The value of the Friedman test was χ² = 27.00, p < 0.001. The value of Wilcoxon's post
hoc test was Z = −2.80, p = 0.005 in the comparison between the post-test and the
generalization test, and Z = −2.81, p = 0.005 in the comparison between the delayed
post-test and the generalization test.
The value of the Friedman test was χ² = 8.07, p = 0.018.
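As a rough check on the p-values in these notes, the chi-square tail probability can be computed directly. The sketch below assumes that the Friedman statistic is referred to a chi-square distribution with k − 1 degrees of freedom, where k is the number of tests compared (k = 3 for the pre-test, post-test and delayed post-test; k = 4 when the generalization test is added). The notes do not state the degrees of freedom, so this is an assumption, and the function name is illustrative.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable (closed forms for df = 2 and 3)."""
    if df == 2:
        return math.exp(-x / 2)
    if df == 3:
        return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    raise ValueError("only df = 2 and df = 3 are covered in this sketch")

# Statistics taken from the notes; df is an assumption (number of tests minus one).
for chi2, df in [(19.15, 2), (27.00, 3), (8.07, 2)]:
    print(f"chi2 = {chi2:5.2f}, df = {df}: p = {chi2_sf(chi2, df):.5f}")
```

Under these assumptions, χ² = 8.07 gives p ≈ 0.018, which agrees with the reported value, while the other two statistics give probabilities below 0.001, the conventional way of reporting a value that rounds to 0.000.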
Effects of perceptual training in the perception and production of heterotonics 359
where they occurred. However, when the rate of correct answers of the pre-test
was compared with that of the other two tests, the p-value tended to approach
significance, which may indicate that they actually answered fewer items
correctly in the pre-test than in the others. Notably, this group scored below
the average of the other three groups in this study: their rate in the delayed
post-test was 58%. In the generalization test, this group scored 36%, an outcome
closer to that of the pre-test than to those of the post-test and the delayed
post-test. However, the p-value was not significant and was similar in all
comparisons with the generalization test.
The intermediate group with training also showed an increase in the percentage
of correct answers from the pre-test (69%) to the post-test (90%) and the
delayed post-test (96%). Although the rate of correct answers differed
significantly among the three tests, Wilcoxon's post hoc test did not show where
the differences occurred. The p-value came closest to significance when the
pre-test was compared to the other two tests. Thus, descriptively, this group had
fewer correct answers in the pre-test and showed a small difference (only 6
percentage points) in performance from the post-test to the delayed post-test. As
has been frequent in these results, a limitation of this research is the small
number of participants; the intermediate group with training, in particular, had
only four. For this reason, the statistical tests often did not indicate where
the differences in the percentage of correct answers lay. This also occurred in
the generalization test, in which the rate of 82% of correct answers was
significantly different from the rates of the other three tests; however,
Wilcoxon's post hoc test did not show where these differences were.
Descriptively, the rate of correct answers in the generalization test was closer
to those of the post-test and the delayed post-test, which may indicate a
positive generalization to the new heterotonics by the intermediate group with
training.
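The effect of group size on the post hoc comparisons can be made concrete. The sketch below computes the most extreme result a Wilcoxon signed-rank comparison can produce for a group of n participants, that is, the case in which every participant changes in the same direction. It assumes the normal approximation without continuity correction and no ties or zero differences; the authors' exact procedure is not stated, so this is only an illustration, and the function name is hypothetical.

```python
import math

def wilcoxon_best_case(n):
    """Most extreme Z and two-sided p for n pairs that all change in the same direction."""
    mean_w = n * (n + 1) / 4                           # expected rank sum under the null
    sd_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)   # its standard deviation (no ties)
    z = (0 - mean_w) / sd_w                            # the smaller rank sum is 0 here
    p = math.erfc(abs(z) / math.sqrt(2))               # two-sided normal tail probability
    return z, p

# Group sizes given in the text: participants 1-10, 11-17, 18-21 and 22-26.
for n in (10, 7, 4, 5):
    z, p = wilcoxon_best_case(n)
    print(f"n = {n:2d}: Z = {z:.2f}, p = {p:.3f}")
```

With ten participants the best case is Z ≈ −2.80 (p ≈ 0.005), the same values reported for the Wilcoxon comparisons above. With four participants the best case is Z ≈ −1.83 (p ≈ 0.07), so under these assumptions no pairwise comparison in the intermediate group with training could fall below 0.05, whatever the size of the improvement.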
The intermediate group without training, represented in the last three columns
of Figure 3, also improved from one test to the next. Although the rate of
correct answers differed significantly among the three tests, Wilcoxon's post hoc
test did not show where the differences were. The same happened when the
generalization test was compared to the other three tests. Descriptively, the
p-value tended to approach significance when the rate of correct answers in the
pre-test (78%) was compared to those of the post-test (90%), the delayed
post-test (94%) and the generalization test (83%). This finding suggests an
improvement in the subsequent tests. However, this group, like the intermediate
group with training, entered this study with considerable previous knowledge of
the production of heterotonics. Before we present the intra-group results of
perception, we will report how each individual fared in the production of
heterotonics.
Figure 4 shows the percentage of correct answers for each of the 26 participants in
this study in the pre-test, post-test and delayed post-test. It should be noted that
(i) numbers 1 to 10 represent the informants of the basic group with training; (ii) 11
to 17, the basic group without training; (iii) 18 to 21, the intermediate group with
training; and (iv) 22 to 26, the intermediate group without training.
Figure 4: Percentage of correct answers by each participant in the three production tests.
Source: The authors (2021).
What this figure shows is that there was, especially in the pre-test (lighter
gray line), high variability in the responses of individuals who belonged to the
same group. This result confirms the hypothesis that, although these informants
had been placed in the same class of Spanish as a foreign language at university
according to their level of knowledge, they performed differently. Participants 1
and 2, for example, had rates of correct answers of 20% and 68%, respectively, in
the pre-test. This discrepancy among individuals in the same group was also found
at the intermediate level: in the pre-test, informant 19 answered 95% of the
items correctly, while informant 21, from the same group, had a rate of 30%.
Importantly, Figure 4 also shows the downward curve of correct answers formed by
numbers 11 to 17, which refer to the basic group without training. This group, as
shown in the
results, has always had fewer correct answers than the other three groups.
However, that does not mean that individual performance was always lower; on the
contrary, some informants in the basic group without training performed more like
the more experienced learners. For example, informant 17 had 90% of correct
answers, i.e., a score that approached the rates of correct answers of both the
basic group with training and the two intermediate groups. In summary,
individuals 4, 9 and 17 always scored above the average of their respective
groups, and individuals 5, 10, 16, 21 and 26 always scored below it, in the
production tests. Next, we will discuss how the learners fared in terms of
percentage variation in production.
For the sake of space, we will only report the results of the three informants
who most increased their rate of correct answers between the pre-test and the
delayed post-test, namely learners 5 (467%), 1 (290%) and 10 (289%), and of the
three learners who had the lowest rate of increase between the same tests:
informants 23 (6%), 19 (5%) and 16 (−17%). This is what makes this type of
analysis particularly interesting: participant 5, who was always below the
average of the basic group with training in the individual analysis, was
precisely the one who improved the most from the pre-test to the delayed
post-test (the rate of appropriate responses rose from 15% to 85%, a relative
increase of 467%). The same occurred with informants 10 and 21, who had
percentage variations of 289% and 183%, respectively, between the same tests and
who, in the previous individual analysis, tended to lower the average of their
group. This indicates that these learners were the ones who benefited most from
perceptual training, despite their difficulty in the tests that preceded the
training itself.
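The percentage variation reported here is a relative change, which differs from the change in percentage points. A minimal sketch of the arithmetic, using participant 5's rates from the text (the function names are illustrative, not the authors'):

```python
def percent_change(before, after):
    """Relative change from `before` to `after`, in percent of the starting value."""
    return (after - before) / before * 100

def point_change(before, after):
    """Absolute change in percentage points."""
    return after - before

# Participant 5, production: 15% correct in the pre-test, 85% in the delayed post-test.
print(round(percent_change(15, 85)))   # 467
print(point_change(15, 85))            # 70
```

Because the starting value is the denominator, the measure grows quickly when the pre-test rate is low, which is one reason the largest variations belong to participants who began with few correct answers.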
On the other hand, the same cannot be said about informant 16, who belongs to
the basic group without training. This participant always scored just below the
group average and had one of the lowest rates of correct answers in all tests and
in all groups. Several factors may be at play; for example, this informant did
not participate in the training sessions (unlike individuals 5, 10 and 21), so
the production tests may not have made sense to this person. Individuals 23 and
24, who also did not participate in the training sessions, are likewise among
those with the smallest increases in the percentage of correct answers (6% and
8%, respectively). In their case, however, they had already given 90% and 93% of
adequate responses, respectively, in the production pre-test, i.e., they had
answered nearly the whole test correctly. There was therefore little room for
further improvement.
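The limited room for improvement can be quantified. Assuming only that a rate of correct answers cannot exceed 100%, the sketch below gives the largest relative increase available from a given pre-test rate (the function name is illustrative, and the rates in the text are rounded):

```python
def max_percent_increase(pre_rate):
    """Largest possible relative increase (in %) for a rate that cannot exceed 100%."""
    return (100 - pre_rate) / pre_rate * 100

# Pre-test rates mentioned in the text: participant 5 (15%), participants 23 and 24 (90%, 93%).
for pre in (15, 90, 93):
    print(f"pre-test {pre}%: at most +{max_percent_increase(pre):.1f}%")
```

A participant who starts at about 90% can gain at most around 11%, whereas one who starts at 15% could in principle gain more than 500%, so the two kinds of participants are not directly comparable on this measure.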
362 Pollianna Milan, Denise Cristina Kluge
Basic with training Basic without training Intermediate with training Intermediate without training
Figure 5: Intra-group percentage of correct answers and standard deviation (SD) in the four
perception tests. Source: The authors (2021).
For the first time in this study, the comparison of the tests showed a
reduction in the percentage of correct answers between the pre-test and the
post-test for one of the groups (the basic group without training): it dropped
from 65% to 60% and then, in the delayed post-test, increased by only one
percentage point, as shown in Figure 5. These rates were not significantly
different; however, there were significant differences when the three tests were
compared with the generalization test (63%). Nonetheless, the post hoc test did
not show where they occurred.
The intermediate group with training did not present significant differences
in correct answers among the three perception tests, but when the generalization
test was added to the comparisons, there were significant differences in the rate
of correct answers among the four tests, although, again, we could not determine
where they occurred. It is likely that they occurred between the generalization
test (95%), the post-test (91%) and the pre-test (88%), since the delayed
post-test had an average rate of correct answers equal to that of the
generalization test. On this reading, the training sessions had a positive
effect, since this group performed better in the delayed post-test and in the
generalization test than in the pre-test and the post-test.
The intermediate group without training had an outcome similar to that of
the intermediate group with training. No differences were found in correct
answers among the three main tests, but in the comparison with the generalization
test there were significant differences, although the post hoc test did not show
where they occurred. The p-value tended to approach significance when the
generalization test was compared to the other three tests. Descriptively, the
rate of correct answers for generalization (91%) was higher than that of the
pre-test but lower than those of the post-test (93%) and the delayed post-test
(94%). Although this group had not been trained, its performance in the
perception tests improved, as did that of the trained group at the same level of
academic exposure. This outcome may suggest that exposure to the tests alone
(without the training sessions) helped these learners identify the stressed
syllables of the Spanish heterotonics. Below are the perception results of each
informant.
Figure 6: Percentage of correct answers by each participant in the three perception tests.
Source: The authors (2021).
In the individual performance of perception, in the basic group with training, in-
formants 2, 6 and 9 always had a rate of correct answers above that of the group
average, but only the rate of informant 9 was above average both in production
and in perception. Informants 2 and 6, who did not score above average in pro-
duction, had an easier time in perception, which shows that those who perceive
heterotonics will not always produce them properly. The opposite also hap-
pened: informant 1, whose rate of correct answers was below the average in per-
ception, followed the average of the group in production, i.e., perceiving a
phonological aspect inappropriately does not always result in inappropriate pro-
duction. In summary, for perception, the informants who scored above the aver-
ages of their groups were 2, 6, 9, 11 and 17, while those who scored below the
average were 1, 16 and 21.
In the analysis of percentage variation, the three informants who most
increased their rate of correct answers between the pre-test and the delayed
post-test for perception were 1 (74%), 22 (45%) and 10 (34%). Those with the
smallest variation were 13 (−7%), 3 (−9%) and 15 (−56%). Notably, participants 1,
16 and 21 were the ones whose rate of correct answers was always below the
average of their respective groups in perception. However, two of them were among
those who most increased the percentage of correct answers from the pre-test to
the delayed post-test. Participant 1 was the one who benefited the most in
perception, increasing the percentage of correct answers between the tests in
question by 74%. As this informant was part of the basic group with training, the
results for perception indicate that this learner benefited from the training
sessions. The same happened to learner 21, who ranked fourth among those who
benefited the most, as the rate of correct answers increased by 26% between the
pre-test and the delayed post-test.
Participant 16, on the other hand, who was from the basic group without
training and had a rate of correct answers below the group average, was not among
those who most increased the percentage of correct answers: the increase from the
pre-test to the delayed post-test was only 3%. That is, without training and with
exposure to the tests alone, this learner barely progressed from one test to the
next. The participants who least increased the percentage of correct answers from
the pre-test to the delayed post-test fit into one of three situations: (i) those
who already had a high rate of correct answers and could therefore benefit from
the study only to a small extent, which is the case of participants 2, 6, 9, 11,
17, 20 and 24; (ii) those who were trained but did not benefit from the training
sessions (participant 3), because they had fewer correct answers at the end of
the study than at the beginning; and (iii) participants such as 13 and 15, whose
answers were also less accurate in the last test than in the first and who were
part of the basic group without training, i.e., the lack of training may have led
to a worse performance from one test to the other.
This type of analysis showed us that a more individualized approach allows
us to better understand the impact of perceptual training according to the pro-
file of each participant. This understanding is in line with the statement by
Larsen-Freeman (2018), which summarizes why it is so important to look at in-
dividual development, as students not only start from different points when
they first engage in a task, but they also make their own developmental path.
4 Conclusions
The inter-group analysis showed that perceptual training of heterotonics in
Spanish, as contrasted with Brazilian Portuguese, had positive effects for the
basic group, i.e., the one with less academic exposure. The basic group with
training retained a significantly higher rate of correct answers than the basic
group without training. This was found, for example, in the delayed production
post-test, in which the basic group with training (average: 89%) had a
significantly higher rate of correct answers than the basic group without
training (average: 58%). No such inferential difference was found between the two
groups with more academic exposure: at no point in this study did the
intermediate groups with and without training differ significantly. In the two
intermediate groups, training had no measurable positive effect, since both
groups were already familiar with the heterotonics, as previously mentioned.
In the intra-group analysis, the basic group with training again showed
significant differences in the rate of correct answers among the tests performed,
both in production and in perception. However, the members of the group were not
able to generalize, in inferential terms, what they had learned to the new
heterotonics presented in the generalization production test. In perception, on
the other hand, this same group was able to generalize to heterotonics not seen
in the other tests and also to new speakers (no longer Mexicans, but Cubans and
Hondurans). These results reinforce the positive effect of training for the basic
group.
In the individual analysis, however, we found that having more or less
academic exposure to Spanish was not necessarily a factor that determined the
effect of training. This is illustrated by participant 21, from the intermediate
group with training, for whom training had a positive effect both in production
and in perception. From the production pre-test to the delayed post-test, this
learner increased the number of correct answers by 183%; in perception, this
increase was 26% (the fourth highest rate among the 26 participants in the
perception test). When we first observed which participants tended to lower the
group average and then analyzed how these participants performed in the
percentage variation between tests, we found that the individuals with more
difficulty were exactly those who benefited the most from training. In this
study, this pattern held for learners with more difficulty who belonged not only
to the basic group with training but also to the intermediate groups with and
without training. This means that even with more academic exposure to Spanish,
those who still had difficulty with the production and the perception of
heterotonics benefited from training, a fact that
had been masked in the group analysis. Where there was less academic
experience and no perceptual training, that is, among learners from the basic
group who had difficulty and did not receive support to overcome it, the opposite
occurred: the informants who answered the tests more easily were the ones who
tended to increase the percentage of correct answers from one test to another.
According to Complex Systems Theory, this result shows that, to develop a
language and/or learn something new, learners need to try different ways to
start, and they ultimately settle on one form or another only after enough
interaction. This process unfolds differently for each individual, even among
students who belong to the same group of learners and have been exposed to the
same number of hours of the language being developed. This reinforces that a
learner's performance cannot be generalized to the group, nor can the group's
results be generalized to a particular individual (Lowie and Verspoor 2015).
References
Amorin, Vitor. 2016. O Ensino de Matemática Financeira: Do Livro Didático ao Mundo Real
[Teaching financial mathematics: from the textbook to the real world]. Rio de Janeiro:
Sociedade Brasileira de Matemática.
Boggiss, George Joseph, Luiz Geraldo Mendonça, Luiz Alfredo Gaspar & Marcos Heringer.
2012. Matemática Financeira [Financial mathematics]. Rio de Janeiro: Editora FGV.
Bybee, Joan. 2010. Language, Usage and Cognition. Cambridge: Cambridge University Press.
Ellis, Rod. 1985. Understanding Second Language Acquisition. Oxford: Oxford University
Press.
Goldstone, Robert & Lisa Byrge. 2005. Perceptual learning. In Mohan Matthen (ed.), The
Oxford Handbook of Philosophy of Perception, 1–16. New York: Oxford University Press.
Henshaw, Florencia. 2011. Effects of feedback timing in SLA: A computer-assisted study on the
Spanish subjunctive. In Cristina Sanz & Ronald Leow (eds.), Implicit and Explicit
Language Learning: Conditions, Processes, and Knowledge in SLA and Bilingualism,
85–99. Washington: Georgetown University Press.
Iezzi, Gelson, Samuel Hazzan & David Degenszajn. 2004. Fundamentos de Matemática
Elementar: Matemática Comercial, Financeira, Estatística [Fundamentals of elementary
mathematics: business, financial, and statistical mathematics]. São Paulo: Editora Atual.
Larsen-Freeman, Diane. 2018. Task repetition or task iteration. In Martin Bygate (ed.),
Learning Language through Task Repetition, 311–330. Amsterdam: John Benjamins
Publishing Company.
Larsen-Freeman, Diane & Lynne Cameron. 2008. Complex Systems and Applied Linguistics.
Oxford: Oxford University Press.
Lima Júnior, Ronaldo Mangueira. 2016. A necessidade de dados individuais e longitudinais
para análise do desenvolvimento fonológico de L2 como sistema complexo [The need of
Acknowledgments: The authors wish to thank Owen Ward for assistance with gathering litera-
ture for this review.
https://doi.org/10.1515/9783110736120-014
370 Anabela Rato, Diana Oliveira
1 Introduction
Research on second language perceptual training has contributed to the under-
standing of three major processes involved in speech learning: perceptual plas-
ticity, modality transfer, and robustness of learning.
Over the last 40 years (i.e., since Pisoni et al. 1982; McClaskey, Pisoni, and
Carrell 1983; Strange and Dittmann 1984), phonetic training studies have shown
that speech perception remains malleable over the life span: perceptual
reattunement of already formed phonemic categories and the establishment of new
L2 categories have proved possible at every age studied so far. Pertinently, Bohn
(2018) points to an age gap in cross-language research on perceptual plasticity,
which has not tested learning mechanisms and processes in older adults (over the
age of 40).
Although perceptual learning has thus been tested mainly in younger adults,
the findings of training studies provide evidence of the plasticity of the
perceptual system that makes L2 speech learning possible in adulthood, as assumed
by the two most widely cited theoretical models of non-native speech learning:
Flege's (1995) Speech Learning Model (SLM), revised as SLM-r (Flege and Bohn
2021), and Best's (1995) Perceptual Assimilation Model (PAM), extended as PAM-L2
(Best and Tyler 2007).
Phonetic training research has also contributed to the discussion on the
interaction between the two speech modalities by examining the relation between
perception and production performance, viz. by assessing the transfer of the
effects of perceptual training to production and vice versa. Specifically, the
findings of a
meta-analysis of 18 perceptual training studies that tested for effects in produc-
tion (Sakai and Moorman 2018) indicate that the two speech modalities are con-
nected. The findings showed that perceptual training leads to medium-sized
gains in perception and small improvements in production. The results of a cor-
relational analysis suggested a non-significant small to medium relationship
between perception and production gains. More recent studies have examined
the modality transfer effect of both perception and production training to deter-
mine which training type transfers most effectively to the other modality. Stud-
ies such as those by Aliaga-Garcia (2017), Herd, Jongman, and Sereno (2013),
and Sakai (2016) have shown that both perception and production training
gains may transfer to opposite modalities. Aliaga-Garcia (2017) and Herd, Jong-
man, and Sereno’s (2013) findings suggest that both training types transfer to
the other modality, but how well perception or production transfers depends
on the relationship between the sounds being trained. Sakai (2016), however,
found that perception-only training led to large gains in perception but to no
significant improvements in production and production-only training led to
Assessing the robustness of L2 perceptual training 371
Pretest → Training → Posttest → Delayed posttest(s)
The present systematic review includes perceptual training studies that have
tested for robustness of speech learning by administering testing tasks to assess
generalization and/or retention. Despite four decades of research on L2 speech
learning, the findings about the efficacy of training in promoting generalization of
learning and long-term modification of learners' perceptual performance are
somewhat scattered. Therefore, this review aims to provide a succinct overview of
perceptual training studies to answer the following research questions:
1. How often are both measures of generalization and retention of learning
adopted in perceptual training studies of L2 speech?
2. How effective are L2 perceptual training studies in promoting robust speech
learning?
In sum, the goal is to assess the carryover and long-term effects of perceptual
training on L2 speech learning in a population of adult L2 learners, by address-
ing the PICO components (i.e., Population, Intervention, Comparison(s) and
Outcome) of a systematic review (Higgins et al. 2019).
In section 2, we describe the method, including the literature search and the
coding, and tabulate information from each study included in the review. Sec-
tion 3 comprises the descriptive results concerning participant demographics,
scope of training (i.e., target segmental or suprasegmental structures), perceptual
training features, assessment of generalization and testing of retention of learn-
ing. We then report trends in the data, but do not conduct any statistical meta-
analysis.
2 Method
2.1 Literature search
For the purpose of this review, we included experimental research that met the
following six eligibility criteria: studies that (1) were published between 1980
Due to COVID-19 restrictions, the university library's scan-and-deliver service was
suspended from March 2020, and Morosan and Jamieson's (1989) study, which is available
only in print, could not be obtained.
Table 1: Summary of studies assessing generalization and/or retention of learning, 1982–2020.
Godfroid, Lin and Ryu ()         English                    Mandarin   tone     yes   yes
Iverson and Evans ()             Spanish, German            English    vowels   yes   yes
Lee and Lyster ()                Korean                     English    vowels   yes   yes
Pruitt, Jenkins and Strange ()   English, Japanese, Hindi   Hindi      stops    yes   no
Wang and Munro ()                Mandarin, Cantonese        English    vowels   yes   yes
2.2 Coding
The coding of the studies consisted of two phases. First, a preliminary set of
variables was identified, related to participants, target structure, perceptual
training and assessment of training, specifically generalization and retention of
speech learning. A coding scheme was then developed and piloted on a sample of
papers from the 27 studies. Both researchers coded all studies to ensure that
relevant information was not missed. The coding was discussed among the authors
of the study, unclear codes were revised and discrepancies were resolved by
agreement. The codes for each category of the coding scheme for this literature
review synthesis are presented in Table 2.
Table 2
Variables                      Codes
Participants
  Age
  First language
  Target language
  Learning context             L2 context; FL context
  Target language proficiency  naïve; beginner; intermediate; advanced
Study
  Sample size
  N° of groups
  Type of target structure     segment; suprasegment; feature
  Target structure             stops; fricatives; liquids; vowels; stress; tone; syllable
  N° of target structures
Training
  Training paradigm            HVPT; LVPT; both
  Tasks                        ID; DISC; both
  Number of sessions
  Length of session
  Length of training
  N° of tokens per session
Stimuli
  Quality                      natural; synthesized; both
  Type                         real words; pseudowords; both
  Presentation                 visual-only; audio-only; audiovisual; audio and audiovisual;
                               visual, audio, audiovisual
Retention
  Control group                yes; no
  Tasks                        ID; DISC; both
  Comparison                   pretest and delayed posttest; posttest and delayed posttest; both
  Retention of learning        yes; no
  Ret in all conditions        yes; no
  Ret of generalization        yes; no; n/a
  Time after posttest
Generalization
  Control group                yes; no
  Tasks                        ID; DISC; both
  N° of gen tests
  Gen to new tasks             yes; no; n/a
  Gen to new stimuli           yes; no; n/a
  Gen to new talkers           yes; no; n/a
  Gen to new contexts          yes; no; n/a
  Gen to other conditions      yes; no; n/a
  Gen in all conditions        yes; no; n/a
three studies (e.g., Wang 2013) and, finally, one paper investigated L2 Arabic
(Burnham 2013).
Regarding the learning context, ten studies (37%) focused on participants
who learned the TL in a foreign language context (i.e., a classroom context in an
environment where the TL is not the societal language) (e.g., Okuno and Hardison
2016; Wang et al. 1999), ten experiments recruited learners who acquired it
in a second language setting (i.e., a naturalistic target language environment)
(e.g., Nishi and Kewley-Port 2007; Wang and Munro 2004), one study included
participants who learned the TL in both contexts (Iverson and Evans 2009), and
six studies did not provide specific information about the language learning
environment.
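The counts in this paragraph can be cross-checked against the total of 27 studies. The short sketch below uses only the counts stated above; the dictionary keys are shorthand labels introduced here for illustration.

```python
# Counts of studies by learning context, as reported in the paragraph above.
context_counts = {
    "foreign language": 10,
    "second language": 10,
    "both contexts": 1,
    "not reported": 6,
}

total = sum(context_counts.values())   # should equal the 27 studies reviewed
for context, n in context_counts.items():
    print(f"{context:17s} {n:2d}  {n / total:.0%}")
```

The four categories add up to 27, and ten of 27 studies correspond to the 37% given in the text.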
Proficiency in the L2 ranged from initial to advanced levels, and some studies
focused on participants with little or no knowledge of the target language, with
the following distribution: naïve listeners, 6 studies; beginners, 5 studies;
intermediate learners, 6 studies; advanced learners, 3 studies. Three papers did
not mention their participants' proficiency and four studies tested learners with
several proficiency levels (e.g., Lee and Lyster 2016; Okuno and Hardison 2016).
Some approaches to assessing proficiency are institutional, with grouping based
on the participants' curricular or course levels (e.g., Burnham 2013; Cebrian and
Carlet 2014; Fouz-González and Mompean 2020) or on length of language experience
in FL and/or L2 settings (e.g., Cheng et al. 2019; Okuno and Hardison 2016).
Others are impressionistic, with grouping based on subjective self-assessment
descriptors (e.g., Lee and Lyster 2016). Few studies used standardized language
proficiency tests (e.g., Huensch and Tremblay 2015; Iverson and Evans 2009).
Sample sizes varied greatly (mean=48, SD=58): the largest share of studies (44%)
recruited 20 to 40 participants (e.g., Shport 2016; Wang 2013); 37% of the
experiments had a sample size ranging between 40 and 303 L2 learners (e.g.,
Fouz-González and Mompean 2020; Lee and Lyster 2016); and 19% tested samples
with fewer than 20 participants (e.g., Strange and Dittmann 1984; Wang et al.
1999). These descriptive statistics on sample size should be interpreted in light of
the number of groups in the experimental design. For example, the study
which recruited over 300 participants had five experimental groups, with an av-
erage sample size of 51 participants per group, and one control group (n=50)
(Godfroid, Lin and Ryu 2017). On average, studies reported findings with 15 par-
ticipants per group (including the control group, when there was one), ranging
from 3.2 to 50.5 (median=12).
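The per-group figures above follow from simple arithmetic. The short Python sketch below is only an illustration: the helper name `mean_group_size` is hypothetical, and the inputs are the figures reported above for the largest study (303 learners, five experimental groups, and one control group of 50).

```python
def mean_group_size(total_participants: int, n_groups: int) -> float:
    """Average number of participants per group (hypothetical helper)."""
    if n_groups <= 0:
        raise ValueError("n_groups must be positive")
    return total_participants / n_groups


# Figures reported in the text for the largest study.
total = 303          # all L2 learners recruited
control = 50         # size of the single control group
n_experimental = 5   # number of experimental groups

# Average over all six groups, control included.
per_group_all = mean_group_size(total, n_experimental + 1)

# Average over the five experimental groups only.
per_experimental = mean_group_size(total - control, n_experimental)

print(f"all groups: {per_group_all:.1f}")
print(f"experimental groups only: {per_experimental:.1f}")
```

Averaging over all six groups gives the upper bound of the reported per-group range, while averaging over the experimental groups alone gives a value that rounds to the 51 participants per group mentioned above.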
The vast majority of the studies (82%) tested L2 segments whereas only 15% tar-
geted suprasegments, and 3% examined syllable structure (viz. codas, Huensch
and Tremblay 2015). Among the 22 papers that focused on L2 phonemic catego-
ries, 36% examined vowels (e.g., Iverson and Evans 2009; Wang and Munro
2004), 27% tested liquids (e.g., Lively et al. 1994; Bradlow et al. 1997), and 18%
were dedicated to stops (e.g., Pruitt et al. 2006; Vlahou, Seitz, and Kopčo 2019).
Fricatives were the target category in only one study (Burnham 2013) and two
experiments tested several phonemes (e.g., Cebrian and Carlet 2014). As for
suprasegments, 75% of the four publications analyzed tonal structures (e.g.,
Wang et al. 1999) and a single study focused on pitch-accent patterns (Shport
2016). Irrespective of target structure type, 48% of all papers tested one to two
segments/suprasegments (e.g., Hardison 2003; McCrocklin 2012), 26% of the ex-
periments trained their participants on three to four structures (e.g., Lee and
Lyster 2016; Wang 2013) and seven articles (26%) implemented training on five
or more target phonological units (e.g., Iverson and Evans 2009 targeted 14 En-
glish vowels; Nishi and Kewley-Port 2007 compared training with a large set of
9 vowels and a subset of 3 English vowels). Stimuli containing the structures of
interest were naturally produced in most cases (85% of all 27 studies) and were
embedded in real words in 73% of the 22 experiments that provided information
on the stimulus type.
The majority of studies (70%) opted for the audio-only modality in their training
programs, with only 7% adopting the audiovisual format exclusively (e.g., Cheng
et al. 2019). The remaining experiments (22%) used both audio and audiovisual
stimuli presentation (e.g., Hardison 2003; Okuno and Hardison 2016). The pre-
dominant training paradigm (70%) was the high-variability phonetic training.
Among these 19 papers, two studies did not explicitly categorize their training
program, but they adopted variability of some sort (talkers’ voice and/or phonetic
context) and were, therefore, classified as HVPT experiments by the authors of
the present review (Okuno and Hardison 2016; Wang 2013). Identification tasks
were the preferred training task and were used in 85% of the studies. The remain-
ing four training experiments either used discrimination training tasks exclu-
sively (McCrocklin 2012; Strange and Dittmann 1984) or combined discrimination
and identification procedures (Cebrian and Carlet 2014; Fuhrmeister and Myers
2020). A similar trend was found for the test type before and after training: in
From the pool of 27 studies, fewer than half included both measures of robustness
of learning (generalization and retention). Eleven studies included generalization
and retention tests after training (e.g., Iverson and Evans 2009; Nishi
and Kewley-Port 2007), 14 experiments included only the testing of generaliza-
tion of improvement achieved during training (e.g., Cebrian and Carlet 2014;
Wang 2013) and two assessed the long-term effects of training exclusively
(Fuhrmeister and Myers 2020; McCrocklin 2012).
Twenty-five out of the 27 studies included in this analysis (93%) tested generaliza-
tion of the learning obtained via perceptual training to untrained conditions (stim-
uli, phonetic context, talker, task). As such, all percentages presented below,
referring to generalization measures, will consider n=25, unless otherwise stated.
Ninety-two percent of the experiments used identification tasks to assess transfer
of learning. Generalization of learning to new stimuli was tested by 84% of the 25
studies (e.g., Hardison 2003; Cebrian and Carlet 2014), 92% investigated transfer
of improvement to the perception of untrained talkers (i.e., new voices) (e.g.,
Cheng et al. 2019; Okuno and Hardison 2016), and six out of 25 papers (24%) dealt
with generalization to new phonetic contexts (e.g., Thomson 2012; Wang 2013). All
studies found evidence of generalization of learning, but only 68% reported that
effect for all conditions tested. For example, whereas Godfroid, Lin and Ryu
(2017) reported transfer of perceptual learning to untrained tasks, stimuli and
talkers, Shport (2016) found evidence of generalization to new stimuli but not to
novel voices and Lee and Lyster (2016) observed the opposite trend, i.e., transfer
to novel talkers but not to untrained stimuli. Additionally, 18 out of the 21 experi-
ments (86%) that assessed generalization to untrained tokens reported transfer
in that condition (e.g., Bradlow et al. 1997, 1999). Approximately the same
percentage of studies (87%) reported evidence of transfer of learning to novel talkers
(e.g., Cebrian and Carlet 2014), out of the 23 papers that tested generalization in
this condition. Three studies in which the training program used tasks different
from the tests also reported carryover effects of training. For example, Strange
and Dittmann (1984) reported that improvement in AX discrimination tasks gen-
eralized to categorical perception identification tasks of the same synthetic stim-
uli. Five of the six studies that investigated transfer of perceptual learning to new
phonetic contexts observed generalization. One study (Thomson 2012) reported
mixed findings, with transfer of vowel perception to only one of the three new
contexts examined. Another experimental condition investigated by one of the
studies was speech perception in rooms with different acoustics (Vlahou, Seitz,
and Kopčo 2019), which reported that one of the experimental groups (trained in
multiple-room reverberant environments) generalized improvement to an un-
trained room.
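The percentages in this section can be cross-checked against the counts given in the text. The Python sketch below is only an illustration under that assumption; the helper names `pct` and `count_from_pct` are hypothetical, and all inputs are figures stated above.

```python
def pct(count: int, total: int) -> int:
    """Express count/total as a whole-number percentage."""
    return round(100 * count / total)


def count_from_pct(percentage: int, total: int) -> int:
    """Recover the nearest whole number of studies behind a percentage."""
    return round(percentage * total / 100)


# (count, denominator) pairs stated in the text.
reported = {
    "tested generalization (of 27 studies)": (25, 27),
    "tested new phonetic contexts (of 25)": (6, 25),
    "found transfer to new stimuli (of 21 that tested it)": (18, 21),
}
for label, (k, n) in reported.items():
    print(f"{label}: {k}/{n} = {pct(k, n)}%")

# Percentages of the 25 generalization studies, converted back to counts.
print("tested new stimuli:", count_from_pct(84, 25))   # 84% of 25
print("tested new talkers:", count_from_pct(92, 25))   # 92% of 25
```

The recovered counts for new stimuli and new talkers agree with the denominators (21 and 23) used later in the same paragraph.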
Thirteen out of the 27 papers considered in this review (48%) tested retention of
learning. Thus, the information presented below, referring to retention measures,
will consider n=13. Seventy-seven per cent of these experiments used identifica-
tion tasks to assess performance some time after training was completed (e.g.,
Iverson and Evans 2009; Thomson 2012) and 85% tested retention using the
same type of task as in training. Thirty-one per cent of the experiments compared
performance in the delayed posttest with scores in the immediate posttest (e.g.,
Lee and Lyster 2016; McCrocklin 2012); 46% used pretest accuracy as the baseline
for comparison (e.g., Godfroid, Lin and Ryu 2017); two studies (15%) provided a
comparison between performance in the retention test and scores in both the pre-
test and the posttest (e.g., Iverson and Evans 2009), and one study did not in-
clude this information. The delayed posttest assessing retention of learning took
place between less than a day (e.g., Fuhrmeister, Schlemmer and Myers 2020)
and six months after training (Wang et al. 1999). Most studies (54%) tested re-
tention no longer than a month after the last training session (e.g., Godfroid,
Lin and Ryu 2017; Lee and Lyster 2016), in four experiments (31%) the delayed
posttest occurred three months after training was over (e.g., Nishi and Kewley-Port
2007; Wang and Munro 2004), in one study four months afterwards
(Iverson and Evans 2009) and in another one six months after training completion
(Wang et al. 1999). Only two studies measured retention of learning at two
subsequent points in time after the posttest. For example, Lively et al. (1994) tested
retention 3 and 6 months after training was over. All 13 papers found evidence of
retention of learning. However, learning was retained in all conditions tested in only
four experiments (31%) (e.g., Fouz-González and Mompean 2020). Eight of the 13
studies considered in this section provided information on the retention of
generalized learning and seven (87.5%) observed that generalization effects
were retained up to the moment of the delayed posttest (e.g., Bradlow et al.
1999; Lively et al. 1994).
4 Discussion
4.1 Participants
Although the body of research on perceptual training conducted over the last
decades has helped support the claim of life-long perceptual plasticity,
i.e., that L2 speech learning is possible at all ages, there is a gap in the learners’
age groups studied (Bohn 2018). Specifically, training studies have not investigated
perceptual learning in groups of mature adults. Due to the lack of standard data
reporting practices, it was not possible to calculate the average age of L2 partic-
ipants. However, a close examination of the measures provided (e.g., the age
range in 13 studies) showed that only one study included L2 learners older
than 40. Future research should cover wider age ranges, including older
learners, to further test the claim of life-long perceptual learning
mechanisms (Derwing et al. 2014; Bohn 2018).
As expected, English was the target language of most studies. This trend,
also observed in previous reviews (e.g., Lee, Jang and Plonsky 2015; Thomson
and Derwing 2015; Sakai and Moorman 2018), is explained not only by the status
of English as an international language but also by the restriction of the
present analysis to studies written in English. It also reinforces the need to
include languages other than English so that the study of other language-
pairings can further the examination of cross-linguistic influence and other lan-
guage-specific patterns in L2 speech learning. With the exception of one study
(McCrocklin 2012), training research controlled for the learners’ L1, which al-
lows the analysis of L1 and L2 interaction. Ten of the studies were conducted in
a foreign language context, in which learners are exposed to the target lan-
guage in a formal classroom setting, and another 10 in a second language envi-
ronment, in which naturalistic exposure to the target language may also occur
outside of the classroom. The different learning contexts may imply differences
in target language input (i.e., amount of exposure) and output (i.e., frequency of
use), which need to be accounted for both during the training program and in
the time elapsing between the immediate posttest and the delayed posttest(s).
This is particularly relevant for interpreting findings on the retention of
improvement achieved during training and for understanding whether there was a
change in any external factors pertaining to language experience, such as
amount of TL input and use and context of learning, or in affective variables
such as motivation. In justifying the decision not to collect data six months
after training was over, Pereira (2014: 186) explains the risk of
biased findings: “any testing 6 months later would carry the risk of confounding
the results if students had not continued having the same amount of input
because they had either dropped out or failed a module taught in English re-
sulting in having less English input for some months”.
Regarding language proficiency levels, the reviewed research included learners
ranging from participants with little or no knowledge of the TL to advanced
learners, and only three studies did not report any proficiency indicator. However, as described,
training studies use a variety of measures that range from institutional to impres-
sionistic practices. This range of proficiency measures shows a general lack of
standardization in the reporting of participant language proficiency levels, as
previously noted for L2 studies (Thomas 1994, 2006). The sample sizes ranged
from 8 to 303 participants, with the largest share of studies (44%) involving
20–40 learners. However, the average sample size was 15 participants per
group (ranging from 3 to 51). The difficulty of recruiting and retaining
participants in a longitudinal study, which involves not only testing at different
times (two or more, if delayed posttests are included) but also training over several
sessions, is acknowledged by researchers in the field of phonetic training studies.
Participant attrition, in particular, is often reported (e.g., Fouz-González and
Mompean 2020; Lively et al. 1994) and its resulting smaller sample size is fre-
quently recognized as a limitation of training studies. However, as also recom-
mended by authors of previous reviews (e.g., Sakai and Moorman 2018), an effort
must be made to increase sample sizes to conduct robust statistical analyses and
be able to generalize the findings. For example, to motivate participants to com-
plete all the phases of the training study, incremental compensation in the form
of participation fees or course credit could be provided.
phonological units (e.g., Cebrian and Carlet 2014 attested generalization of learn-
ing of the target stops and fricative consonants but not for the labiodental /v/).
Sakai and Moorman (2018) reported that only 7 out of 30 perception studies
in their meta-analysis included the testing of retention of L2 speech learning,
and thus we were expecting to find the same trend. However, though the re-
ported low number of training studies that include assessment of long-term ef-
fects was confirmed in this review, the proportion is higher with 13 of 27 studies
including the testing of retention of improvement. The assessment of
generalization of learning is more frequent, being reported in 25 studies. Two possible
reasons may explain the lower number of perceptual studies that include
delayed posttest(s). On the one hand, there is the challenge of retaining participants
over an extended period of time that can range from one week to six months (or
longer) after training. For example, several studies include participants who
are undergraduate or graduate students (e.g., Burnham 2013; Motohashi-Saigo
and Hardison 2009) who may be no longer available after a certain period of
time, particularly if the study timeline does not coincide with the academic
yearly timetable. On the other hand, there is a methodological concern with the
control of the amount of TL input that participants are exposed to outside of the
experiment in the time gap between the posttest and the delayed posttest(s). The
thirteen perceptual training studies that included delayed posttests found evi-
dence of retention of learning. However, only four experiments reported positive
long-term effects in all conditions tested (e.g., Fouz-González and Mompean
2020). The other studies reported partial or mixed findings. For example, in Nishi
and Kewley-Port’s (2007) study, only the experimental group trained with the
full set of target segments retained learning after three months in the
generalization to new voices and real words, and no effect was observed in the subset group
of trainees. Generalization effects were retained up to the moment of the delayed
posttest in seven studies. Lively et al.’s (1994) findings show the same
generalization tendency (transfer of learning to new words produced by a familiar talker to a
greater extent than generalization to a new talker) three and six months after
training was over. Further research that includes the assessment of generalization
in the delayed posttest(s) could provide meaningful information regarding L2
speech learning development.
Although the scope of this review was not the analysis of transfer of percep-
tual improvement to production (see Sakai and Moorman’s 2018 meta-analytic
review), we observed that less than a third (22%) of the 27 studies assessed the
relation between the two speech modalities, and only one study (Bradlow et al.
1999) included the three measures of robustness of learning.
To understand the three major processes involved in second language
speech learning – perceptual plasticity, modality transfer, and robustness of
5 Conclusion
To examine the use of measures of robustness of L2 speech learning, 27 studies
were gathered for this literature review. Fewer than half of the studies (n=11)
included both generalization and retention testing. Fourteen experiments tested
for generalization of learning exclusively and two assessed retention of
learning only. Transfer of learning to new experimental conditions is thus more
frequently tested than the long-term effects of training, which highlights the need
to further investigate the effects of phonetic training programs, including
delayed testing at more than one point in time after training is over. This
would require, nonetheless, a thorough account of potential changes in the par-
ticipants’ learning experience and context in the time elapsing between posttest
and the delayed posttests.
The findings of the present narrative review show that all studies that
tested for carryover effects of training found evidence of generalization of im-
provement, and most of the experiments (17 out of 25) reported transfer of
learning for all conditions tested. The same trend was observed for
retention: all studies that tested for it reported positive lasting
effects of perceptual training. However, fewer than half of the experiments (4 out
of 13) reported retention of improvement in all conditions.
In order to be able to conduct an exhaustive literature search and uphold
the quality and validity of perceptual training research, we decided to only re-
trieve peer-reviewed journal publications. However, this decision may have im-
pacted the results of our review, which seems to reflect a publication bias.
Thornton and Lee (2000) explain that this occurs when research which have
Funding
This work was supported by the Victoria College Research Award (Fall 2020),
University of Toronto.
References
Aliaga-Garcia, Cristina. 2017. The effect of auditory and articulatory phonetic training on the
perception and production of L2 vowels by Catalan-Spanish learners of English.
Barcelona: Universitat de Barcelona dissertation.
Beddor, Patrice & Terry Gottfried. 1995. Methodological issues in cross-language speech
perception research with adults. In Winifred Strange (ed.), Speech Perception and
Linguistic Experience: Issues in Cross-Language Research, 207–232. Timonium, MD: York
Press.
Best, Catherine. 1995. A Direct Realist view of cross-language speech perception. In Winifred
Strange (ed.), Speech Perception and Linguistic Experience: Issues in Cross-Language
Research, 171–204. Timonium, MD: York Press.
Best, Catherine & Michael Tyler. 2007. Nonnative and second-language speech perception:
Commonalities and complementarities. In Ocke-Schwen Bohn and Murray Munro (eds.),
Language Experience in Second Language Speech Learning – In honor of James Emil
Flege, 13–34. Amsterdam: John Benjamins Publishing Company.
Bohn, Ocke-Schwen. 2000. Linguistic relativity in speech perception: An overview of the
influence of language experience on the perception of speech sounds from infancy to
adulthood. In Susanne Niemeier & René Dirven (eds.), Evidence for Linguistic Relativity,
1–28. Amsterdam: John Benjamins Publishing Company.
Bohn, Ocke-Schwen. 2018. Cross-language and second language speech perception. In
Eva M. Fernández & Helen Smith Cairns (eds.), The Handbook of Psycholinguistics,
213–239. New Jersey, USA: Wiley.
Bradlow, Ann R., Reiko Akahane-Yamada, David B. Pisoni & Yoh’ichi Tohkura. 1999. Training
Japanese listeners to identify English /r/ and /l/: Long-term retention of learning in
perception and production. Perception and Psychophysics 61(5). 977–985.
https://doi.org/10.3758/BF03206911
Bradlow, Ann R., David B. Pisoni, Reiko Akahane-Yamada & Yoh’ichi Tohkura. 1997. Training
Japanese listeners to identify English /r/ and /l/: IV. Some effects of perceptual learning
on speech production. Journal of the Acoustical Society of America 101(4). 2299–2310.
Burnham, Kevin R. 2013. Phonetic Training in the Foreign Language Curriculum. Applied
Language Learning 23–24. 63–74.
Cebrian, Juli & Angélica Carlet. 2014. Second-language learners’ identification of target-
language phonemes: A short-term phonetic training study. Canadian Modern Language
Review 70(4). 474–499. https://doi.org/10.3138/cmlr.2318.
Cheng, Bing, Xiaojuan Zhang, Siying Fan & Yang Zhang. 2019. The role of temporal acoustic
exaggeration in High Variability Phonetic Training: A behavioral and ERP study. Frontiers
in Psychology 10. 1178. https://doi.org/10.3389/fpsyg.2019.01178.
Derwing, Tracey M., Murray J. Munro, Jennifer A. Foote, Erin Waugh & Jason Fleming. 2014.
Opening the window on comprehensible pronunciation after 19 years: A workplace
training study. Language Learning 64(3). 526–548.
Flege, James. 1995. Second language speech learning: Theory, findings and problems. In
Winifred Strange (ed.), Speech Perception and Linguistic Experience: Issues in Cross-
Language Research, 233–277. Timonium, MD: York Press.
Flege, James & Ocke-Schwen Bohn. 2021. The Revised Speech Learning Model (SLM-r). In
Ratree Wayland (ed.), Second Language Speech Learning: Theoretical and Empirical
Progress, 3–83. Cambridge: Cambridge University Press.
Fouz-González, Jonás & Jose A. Mompean. 2020. Exploring the potential of phonetic symbols
and keywords as labels for perceptual training. Studies in Second Language Acquisition
43(2). 1–32. https://doi.org/10.1017/S0272263120000455
Fuhrmeister, Pamela & Emily B. Myers. 2020. Desirable and undesirable difficulties:
Influences of variability, training schedule, and aptitude on nonnative phonetic learning.
Attention, Perception, and Psychophysics 82(4). 2049–2065.
https://doi.org/10.3758/s13414-019-01925-y
Fuhrmeister, Pamela, Brianna Schlemmer & Emily B. Myers. 2020. Adults show initial
advantages over children in learning difficult nonnative speech sounds. Journal of
Speech, Language, and Hearing Research 63(8). 2667–2679.
https://doi.org/10.1044/2020_JSLHR-19-00358
Godfroid, Aline, Chin-Hsi Lin & Catherine Ryu. 2017. Hearing and Seeing Tone Through Color:
An Efficacy Study of Web-Based, Multimodal Chinese Tone Perception Training. Language
Learning 67(4). 819–857. https://doi.org/10.1111/lang.12246
Hardison, Debra M. 2003. Acquisition of second-language speech: Effects of visual cues,
context, and talker variability. Applied Psycholinguistics 24(4). 495–522.
https://doi.org/10.1017/S0142716403000250
Herd, Wendy, Allard Jongman & Joan A. Sereno. 2013. Perceptual and production training of
intervocalic /d, ɾ, r/ in American English learners of Spanish. The Journal of the Acoustical
Society of America 133(6). 4247–4255. https://doi.org/10.1121/1.4802902
Higgins, Julian P. T., James Thomas, Jacqueline Chandler, Miranda Cumpston, Tianjing Li,
Matthew J. Page & Vivian A. Welch (eds.). 2019. Cochrane Handbook for Systematic
Reviews of Interventions, 2nd edn. Chichester (UK): John Wiley & Sons.
Huensch, Amanda & Annie Tremblay. 2015. Effects of perceptual phonetic training on the
perception and production of second language syllable structure. Journal of Phonetics 52.
105–120. https://doi.org/10.1016/j.wocn.2015.06.007
Iverson, Paul & Bronwen G. Evans. 2009. Learning English vowels with different first-language
vowel systems II: Auditory training for native Spanish and German speakers. The Journal
of the Acoustical Society of America 126(2). 866–877. https://doi.org/10.1121/1.3148196
Jamieson, Donald G. & David E. Morosan. 1989. Training new, nonnative speech contrasts: A
comparison of the prototype and perceptual fading techniques. Canadian Journal of
Psychology – Revue Canadienne de Psychologie 43(1). 88–96.
https://doi.org/10.1037/h0084209
Lee, Andrew H. & Roy Lyster. 2016. Effects of different types of corrective feedback on
receptive skills in a second language: A speech perception training study. Language
Learning 66(4). 809–833. https://doi.org/10.1111/lang.12167
Lee, Junkyu, Juhyun Jang & Luke Plonsky. 2015. The effectiveness of second language
pronunciation instruction: A meta-analysis. Applied Linguistics 36(3). 345–366.
Lively, Scott E., David B. Pisoni, Reiko Akahane-Yamada, Yoh’ichi Tohkura & Tsuneo Yamada.
1994. Training Japanese listeners to identify English /r/ and /l/. III. Long‐term retention
of new phonetic categories. The Journal of the Acoustical Society of America 96(4).
2076–2087. https://doi.org/10.1121/1.410149
Logan, John S. & John Pruitt. 1995. Methodological issues in training listeners to perceive non-
native phonemes. In Winifred Strange (ed.), Speech Perception and Linguistic Experience:
Issues in Cross-Language Research, 351–378. Timonium, MD: York Press.
McClaskey, Cynthia, David B. Pisoni & Thomas Carrell. 1983. Transfer of training of a new
linguistic contrast in voicing. Perception and Psychophysics 34(4). 323–330.
McCrocklin, Shannon. 2012. Effect of Audio vs. Video on Aural Discrimination of Vowels.
Teaching English as a Second or Foreign Language – The Electronic Journal for English as
a Second Language (TESL-EJL) 16(2). 1–16.
Motohashi-Saigo, Miki & Debra M. Hardison. 2009. Acquisition of L2 Japanese geminates:
Training with waveform displays. Language Learning & Technology 13(2). 29–47.
Nishi, Kanae & Diane Kewley-Port. 2007. Training Japanese listeners to perceive American
English vowels: Influence of training sets. Journal of Speech, Language, and Hearing
Research 50(6). 1496–1509. https://doi.org/10.1044/1092-4388(2007/103)
Okuno, Tomoko & Debra M. Hardison. 2016. Perception-production link in L2 Japanese vowel
duration: Training with technology. Language Learning & Technology 20(2). 61–80.
Pereira, Yasna I. 2014. Perception and production of English vowels by Chilean learners of
English: Effect of auditory and visual modalities on phonetic training. London: University
College London dissertation.
Pisoni, David B., Richard N. Aslin, Alan J. Perey & Beth L. Hennessy. 1982. Some effects of
laboratory training on identification and discrimination of voicing contrasts in stop
consonants. Journal of Experimental Psychology: Human Perception and Performance
8(2). 297–314. https://doi.org/10.1037/0096-1523.8.2.297
Pruitt, John S., James J. Jenkins & Winifred Strange. 2006. Training the perception of Hindi
dental and retroflex stops by native speakers of American English and Japanese. The
Journal of the Acoustical Society of America 119(3). 1684–1696.
https://doi.org/10.1121/1.2161427
Rosenblum, Lawrence D. 2005. The primacy of multimodal speech perception. In David
B. Pisoni & Robert E. Remez (eds.), The Handbook of speech perception, 51–78. Malden,
MA: Blackwell.
Rosenblum, Lawrence D. 2008. Speech perception as a multimodal phenomenon. Current
Directions in Psychological Science 17(6). 405–409. https://doi.org/10.1111/j.1467-8721.2008.00615.x
Sakai, Mari. 2016. (Dis)connecting perception and production: Training adult speakers of
Spanish on the English /i/-/ɪ/ distinction. Washington DC: Georgetown University
dissertation. https://repository.library.georgetown.edu/handle/10822/1042879
Sakai, Mari & Colleen Moorman. 2018. Can perception training improve the production
of second language phonemes? A meta-analytic review of 25 years of perception training
research. Applied Psycholinguistics 39(1). 187–224.
Shport, Irina A. 2016. Training English listeners to identify pitch-accent patterns in Tokyo
Japanese. Studies in Second Language Acquisition 38(4). 739–769.
https://doi.org/10.1017/S027226311500039X
Strange, Winifred. 1995. Cross-language studies of speech perception: A historical review. In
Winifred Strange (ed.), Speech Perception and Linguistic Experience: Issues in Cross-
Language Research, 3–45. Timonium, MD: York Press.
Strange, Winifred & Sibylla Dittmann. 1984. Effects of discrimination training on the
perception of /r-l/ by Japanese adults learning English. Perception and Psychophysics
36(2). 131–145. https://doi.org/10.3758/BF03202673
https://doi.org/10.1515/9783110736120-015
400 Tracey M. Derwing
What should the focus of pronunciation instruction be? We have already
established that aside from Indigenous contexts where target-like productions matter,
An overview of pronunciation teaching and training 403
‘sobbing’) and several in final position (‘rip’ vs. ‘rib’). These two consonants are
considered to be high FL, whereas consonants such as the interdentals, /θ/
and /ð/, are low FL even though when speakers substitute /s/ or /t/ for the for-
mer and /z/ or /d/ for the latter, the substitution is noticeable. However, in
most instances, a mispronunciation of either interdental has little or no conse-
quence for either intelligibility or comprehensibility. Once teachers have re-
viewed the high FL segmentals that learners seem to have difficulty with, they
should determine whether the problem is with perception, production, or both.
Administering a simple perception test will show the teacher whether the learn-
ers can discriminate between two segments. Some preliminary explanations
about segmental production can happen in class, but students can also be re-
ferred to technological aids to help them work on their perception at home.
Tools such as englishaccentcoach.com (Thomson 2022) can give learners an
opportunity to easily focus on sounds that present them with difficulties. Research
has repeatedly shown, in more than 30 studies, that High Variability Phonetic
Training (HVPT), which is essentially what englishaccentcoach.com offers, has
a positive impact on perception and, in some cases, leads to improved
production as well (Thomson 2018).
It is often the case that shared problems with pronunciation are supraseg-
mental in nature, in which case the teacher can access resources to support the
learners in class. Numerous techniques can help with global improvements,
such as shadowing and mirroring. Meyers (n.d.) advocates inviting
students to choose a proficient and easy-to-understand speaker as a model and
to break down what the model does with his/her voice, body
movements and gestures. She suggests asking students first to consider what the
speaker’s intended purpose is in a given communication and then to analyze
the speech further from there. After a period of 2–3 weeks, students can make a
final video in which they mirror their model for their fellow students, who can
provide feedback (for more information, see pronunciationforteachers.com
under the ‘Teaching’ tab). Another technique that focuses on suprasegmentals
is shadowing, which involves repeating a sample of speech at a very short
delay. This can also be done in class (often elements of sitcoms can be acted
out this way). Foote and McDonough (2017) have demonstrated, however, that
shadowing also lends itself to homework using recorded dialogues. In their
study, both comprehensibility and fluency were significantly enhanced as a re-
sult of shadowing.
A somewhat surprising technique, having students speak their L1 while imitating
the accent of speakers of the L2 they wish to learn, is effective in enhancing their
pronunciation of that L2 (Rojczyk 2015). Having students speak in their L1 takes
away any pressure to find suitable vocabulary or grammar, and allows them to focus on those
aspects of the L2 that they notice when a speaker of the L2 uses the students’ L1.
Rojczyk asked ten Polish students of English to imitate an English accent while
speaking in Polish. The researcher was particularly interested in the voice onset
times of stops, and determined that imitating an English accent resulted in signif-
icantly more English-like voice onset times. A similar study was conducted in
Barcelona by Everitt (2015). She compared three groups of Spanish/Catalan learn-
ers of English. One group received standard pronunciation instruction in English,
a second group spoke in their L1 with an English accent and a third group served
as a control. The researcher hypothesized that the second group would be able to
produce far more output than the other groups because they were not constrained
by their lexical and grammatical knowledge of English. A post-test revealed that both
groups who received an intervention had superior perception and production in
English to the control group, but that the L1 imitation group performed better
than the group who had traditional instruction. There is a caveat with this tech-
nique in that some learners may refuse to imitate another accent. In some cul-
tures, imitation is viewed as disrespectful whereas in others it is interpreted as a
bit of fun.
Many researchers and practitioners consider fluency, or the flow of language
in the absence of disruptive pauses, repetitions and repairs, to be a component
of pronunciation. One global measure of fluency is speech rate (typically sylla-
bles per second). Munro and Derwing (1998: 165) conducted a study in which
they asked Mandarin speakers to read a passage at a normal, comfortable pace,
and then to read the same passage at a rate “half as fast as normal”. In practice, the
speakers slowed their speech to only about 75% of their normal rate, but the
slowed readings were still clearly slower than the initial ones. These passages were
then randomized and played to listeners who rated them for comprehensibility.
The speakers were rated as slightly less comprehensible in the slowed condition.
In a second experiment, the authors took the same normally-produced passages
and both slowed them and sped them up by 10 percent using computer software
without interfering with pitch. Slowing the Mandarin speakers’ speech rate had a
negative effect on listeners’ perception of their speech. Many researchers have
investigated fluency since then, with the consensus that fluent speech is easier
for listeners to follow. Several recent studies have involved approaches to en-
hancing fluency that lend themselves well to both general second language and
pronunciation-specific classrooms.
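The rate manipulations described above can be made concrete with a little arithmetic. The sketch below is purely illustrative: the passage length and reading time are invented figures rather than data from Munro and Derwing (1998), the function names are hypothetical, and it assumes that a change of 10 percent means multiplying the speech rate by 0.9 or 1.1.

```python
def speech_rate(syllables: int, seconds: float) -> float:
    """Return speech rate in syllables per second."""
    if seconds <= 0:
        raise ValueError("duration must be positive")
    return syllables / seconds


def rescaled_duration(seconds: float, rate_factor: float) -> float:
    """Return the duration of a passage after its speech rate is
    multiplied by rate_factor (0.9 = 10% slower, 1.1 = 10% faster)."""
    if rate_factor <= 0:
        raise ValueError("rate_factor must be positive")
    return seconds / rate_factor


if __name__ == "__main__":
    # Invented example: a 180-syllable passage read aloud in 60 seconds.
    syllables, seconds = 180, 60.0
    normal = speech_rate(syllables, seconds)

    # A reading at 75% of the normal rate takes longer for the same text.
    slowed_seconds = rescaled_duration(seconds, 0.75)
    print(f"normal rate: {normal:.2f} syllables/s over {seconds:.1f} s")
    print(f"75% rate:    {speech_rate(syllables, slowed_seconds):.2f} "
          f"syllables/s over {slowed_seconds:.1f} s")

    # Software rescaling by 10 percent in either direction (assumed to act on rate).
    for factor in (0.9, 1.1):
        new_seconds = rescaled_duration(seconds, factor)
        print(f"rate x {factor}: {speech_rate(syllables, new_seconds):.2f} "
              f"syllables/s over {new_seconds:.1f} s")
```

With these invented figures, the normal reading runs at 3 syllables per second, and a reading at 75% of that rate stretches the same passage from 60 to 80 seconds.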
Galante and Thomson (2017) introduced drama activities into a language
class, while a comparable control group undertook normal communicative
classroom presentations. Pre- and post-instruction rating tests determined that
the drama group made significant gains in fluency whereas the control group
point to the interplay among different aspects of language learning, but also to
the importance of including pronunciation while focusing on other aspects.
2.4 Feedback
Language students often profess to want more feedback than their teachers give
them, and some teachers are reluctant to provide negative feedback because
they worry that they will hurt their students’ feelings. It is true that if teachers
were to correct every aspect of a learner’s pronunciation that differs from a tar-
get, in some cases the amount of feedback would be overwhelming. This is
where the intelligibility/comprehensibility rubric comes in. Pronunciations that
do not interfere with understanding are not important and feedback is unneces-
sary. But features of speech that cause difficulty for listeners warrant explicit
feedback from the teacher. However, the teacher is not the only person in the
classroom who can provide useful feedback, and indeed, learners should be
helping each other with their productions. Martin and Sippel (2021) conducted
an innovative study in which four groups of first year learners of German partici-
pated (most of the learners were monolingual English speakers). One group was
a control; the other groups all received some pronunciation instruction, following
which one group received feedback from an instructor, another group
provided feedback to their classmates, and the last group received feedback from
their peers. The authors chose both segmental and suprasegmental targets as the
objects of instruction; the German phoneme /ts/ is often problematic because it
is written with the letter z, leading many learners to mispronounce it. Word stress in
English-German cognates was the suprasegmental focus of instruction. A pretest
was administered in Week 1 of the study, and the peer feedback givers and re-
ceivers also had some explicit instruction on the nature of corrective feedback. In
Week 2 the learners in the experimental groups focused on the pronunciation of
the two targets, and in Weeks 3 and 4 they made recordings. Weeks 4 and 5 were
used for feedback on the recordings (the receivers of feedback were able to then
re-record) and a post-test was administered in Week 6. The pre- and post-tests
both consisted of individual words and sentences. Five native German speakers
rated the productions from all four groups for comprehensibility. All three inter-
vention groups improved significantly compared to the control group, but inter-
estingly, the group that outperformed the others was the group who provided
feedback to their peers, followed by the group who received feedback from a
teacher. The students who received feedback from their peers were in third place
but still well ahead of the control group. The authors point out that the provision
of feedback in classrooms does not have to be left solely to the instructor, and
that putting students in the position of having to provide feedback raises their
phonological awareness of their own productions.
3 Going forward
Clearly, the study of L2 pronunciation has come a long way in the last twenty
years, most especially in terms of empirical research. In previous eras, many
astute insights were made by expert practitioners, such as David Abercrombie
(1949), who maintained that most learners need only comfortable intelligibility
as their goal. These insights were all but lost, however, when new approaches
to L2 instruction became popular, such as Communicative Language Teaching.
Had there been a solid body of research, rather than personal observations,
perhaps pronunciation would not have sunk into such a state of obscurity for
so long. We can hope that the current revival of interest in pronunciation is
maintained for years to come. There are a lot of questions to be addressed!
The radical expansion in the ownership and use of smartphones and other
technology suggests that far more attention could be paid to digital gaming and
other forms of pronunciation apps. A quick Google search for “apps for learning
English pronunciation” turned up an astonishing eighty million results. But how
many of these apps were developed with pronunciation experts at the helm?
Very few indeed. I have watched one researcher, Ron Thomson, take an idea
from beginning to (well, there is no end). His doctoral dissertation was an early
version of englishaccentcoach.com, but he required considerable funding and
technical assistance to expand that program into a platform that could be used
by learners from all over the world. Fortunately, he was able to secure funding
from two government departments, but the app now needs expensive updating
and the kind of technological expertise that a full-time professor of applied linguistics
does not have. This is a priority for him, so he has obtained more funding,
but what I conclude from seeing how many hours over the years go into a
project like this is that we need far more collaboration with scholars from other
fields. Researchers in the Netherlands have developed an Automatic Speech Rec-
ognition (ASR) program designed to help L2 learners of Dutch, but unlike most
ASR programs, the developers took into account the most frequent and problem-
atic errors identified by teachers of Dutch (O’Brien et al. 2018). This program is
embedded in electronic language courseware available to Dutch learners. It is in-
credibly advanced compared to programs for English pronunciation but it was
developed by a team of linguists, applied linguists, engineers and computing sci-
entists. To make true progress, we need more collaboration across the board.
References
Abercrombie, David. 1949. Teaching pronunciation. ELT Journal 3(5). 113–122.
Bird, Sonya. 2020. Pronunciation among adult Indigenous language learners: The case of
SENĆOTEN /t’/. Journal of Second Language Pronunciation 6(2). 148–179.
Bongaerts, Theo, Susan Mennen & Frans van der Slik. 2000. Authenticity of pronunciation in
naturalistic second language acquisition: The case of very advanced late learners of
Dutch as a second language. Studia Linguistica 54(2). 298–308.
Bongaerts, Theo, Chantal van Summeren, Brigitte Planken & Erik Schils. 1997. Age and
ultimate attainment in the pronunciation of a foreign language. Studies in Second
Language Acquisition 19(4). 447–465.
Bueno Alastuey, Maria Camino. 2010. Synchronous voice computer-mediated communication:
Effects on pronunciation. CALICO Journal 28(1). 1–20.
Dahm, Maria & Lynda Yates. 2013. English for the workplace: Doing patient-centred care in
medical communication. TESL Canada Journal 30 [special issue 7]. 21–33.
De Bot, Kees, Wander Lowie & Marjolijn Verspoor. 2007. A dynamic systems theory approach
to second language acquisition. Bilingualism: Language and Cognition 10(1). 7–21.
Derwing, Tracey M. 2003. What do ESL students say about their accents? Canadian Modern
Language Review 59(4). 547–567.
Derwing, Tracey M. (in press). Lessons learned from teaching teachers to teach pronunciation.
In Veronica Sardegna & Anna Jarosz (eds.), English pronunciation teaching: Theory,
practice and research findings. Bristol: Multilingual Matters.
Derwing, Tracey M., Helen Fraser, Okim Kang & Ronald I. Thomson. 2014a. L2 accent and
ethics: Issues that merit attention. In Ahmar Mahboob & Leslie Barratt (eds.), Englishes
in multilingual contexts, 63–80. Berlin: Springer.
Derwing, Tracey M. & Murray J. Munro. 2015. Pronunciation Fundamentals: Evidence-based
Perspectives for L2 Teaching and Research. Amsterdam: John Benjamins.
Derwing, Tracey M., Murray J. Munro, Jennifer A. Foote, Erin Waugh & Jason Fleming. 2014b.
Opening the window on comprehensible pronunciation after 19 years: A workplace
training study. Language Learning 64(3). 526–548.
Derwing, Tracey M., Murray J. Munro & Ronald I. Thomson. 2008. A longitudinal study of ESL
learners’ fluency and comprehensibility development. Applied Linguistics 29(3). 359–380.
Derwing, Tracey M., Marian J. Rossiter & Murray J. Munro. 2002. Teaching native speakers to
listen to foreign-accented speech. Journal of Multilingual and Multicultural
Development 23(4). 245–259.
Derwing, Tracey M., Erin Waugh & Murray J. Munro. 2021. Pragmatically speaking: Preparing
adult ESL students for the workplace. Applied Pragmatics 3(2). 107–135.
Everitt, Charlotte. 2015. Accent imitation on the L1 as a task to improve L2 pronunciation.
Barcelona: Universitat de Barcelona thesis.
Flege, James E. 1995. Second language speech learning: Theory, findings and problems. In
Winifred Strange (ed.), Speech Perception and Linguistic Experience: Issues in Cross-
language Research, 233–277. Timonium (Maryland): York Press.
Flege, James E. & Ocke-Schwen Bohn. 2021. The Revised Speech Learning Model (SLM-r). In
Ratree Wayland (ed.), Second Language Speech Learning: Theoretical and Empirical
Progress, 3–83. Cambridge: Cambridge University Press.
Flege, James E., Murray J. Munro & Ian R. A. MacKay. 1995. Factors affecting strength of
perceived foreign accent in a second language. Journal of the Acoustical Society of
America 97(5). 3125–3134.
Foote, Jennifer A., Amy Holtby & Tracey M. Derwing. 2011. Survey of pronunciation teaching in
adult ESL programs in Canada, 2010. TESL Canada Journal 29(1). 1–22.
Foote, Jennifer A. & Kim McDonough. 2017. Using shadowing with mobile technology to
improve L2 pronunciation. Journal of Second Language Pronunciation 3(1). 34–56.
Galante, Angelica & Ron I. Thomson. 2017. The effectiveness of drama as an instructional
approach for the development of second language oral fluency, comprehensibility, and
accentedness. TESOL Quarterly 51(1). 115–142.
Gatbonton, Elizabeth, Pavel Trofimovich & Michael Magid. 2005. Learners’ ethnic group
affiliation and L2 pronunciation accuracy: A sociolinguistic investigation. TESOL Quarterly
39(3). 489–511.
Grimshaw, Jennica & Walcir Cardoso. 2018. Activate space rats! Fluency development in a
mobile game-assisted environment. Language Learning & Technology 22(3). 159–175.
Huensch, Amanda. 2019. Pronunciation in foreign language classrooms: Instructors’ training,
classroom practices, and beliefs. Language Teaching Research 23(6). 745–764.
Kang, Okim & Meghan Moran. 2014. Functional loads of pronunciation features in nonnative
speakers’ oral assessment. TESOL Quarterly 48(1). 176–187.
Kissau, Scott. 2006. Gender differences in motivation to learn French. The Canadian Modern
Language Review 62(3). 401–422.
LeVelle, Kimberly & John Levis. 2014. Understanding the impact of social factors on L2
pronunciation: Insights from learners. In John M. Levis & Alene Moyer (eds.), Social
Dynamics in Second Language Accent, 97–118. Berlin: de Gruyter.
Martin, Ines A. & Lieselotte Sippel. 2021. Is giving better than receiving? The effects of peer
and teacher feedback on L2 pronunciation skills. Journal of Second Language
Pronunciation 7(1). 62–88.
Marx, Nicole. 2002. Never quite a ‘native speaker’: accent and identity in the L2 – and the L1.
Canadian Modern Language Review 59(2). 264–281.
Meyers, Colleen. nd. Mirroring. https://Pronunciationforteachers.com (accessed 12 June 2022).
Munro, Murray J. & Tracey M. Derwing. 1998. The effects of speech rate on the comprehensibility
of native and foreign accented speech. Language Learning 48(2). 159–182.
Munro, Murray J. & Tracey M. Derwing. 2006. The functional load principle in ESL
pronunciation instruction: An exploratory study. System 34(4). 520–531.
O’Brien, Mary G., Tracey M. Derwing, Catia Cucchiarini, Deborah M. Hardison, Hans Mixdorff,
Ronald I. Thomson, Helmut Strik, John M. Levis, Murray J. Munro, Jennifer A. Foote &
Greta M. Levis. 2018. Directions for the future of technology in pronunciation research
and teaching. Journal of Second Language Pronunciation 4(2). 182–206.
Piller, Ingrid. 2002. Passing for a native speaker: Identity and success in second language
learning. Journal of Sociolinguistics 6(2). 179–206.
Rojczyk, Arkadiusz. 2015. Using FL accent imitation in L1 in foreign-language speech research.
In Ewa Waniek-Klimczak & Miroslaw Pawlak (eds.), Teaching and Researching the
Pronunciation of English, 223–233. Cham, Switzerland: Springer.
Ruivivar, June & Laura Collins. 2019. Nonnative accent and the perceived grammaticality of spoken
grammar forms. Journal of Second Language Pronunciation 5(2). 269–293.
Subtirelu, Nicholas. 2013. What (do) learners want (?): A re-examination of the issue of learner
preferences regarding the use of ‘native’ speaker norms in English language teaching.
Language Awareness 22(3). 270–291.
Thomson, Ronald I. 2018. High Variability [Pronunciation] Training (HVPT): A proven technique
about which every language teacher and learner ought to know. Journal of Second
Language Pronunciation 4(2). 207–230.
Thomson, Ronald I. 2022. English accent coach [online game]. Retrieved from
www.englishaccentcoach.com (accessed 5 December 2021).
Thomson, Ronald I. & Tracey M. Derwing. 2015. The effectiveness of L2 pronunciation
instruction: A narrative review. Applied Linguistics 36(3). 326–344.
Timmis, Ivor. 2002. Native speaker norms and International English: A classroom view. ELT
Journal 56(3). 240–249.
Varonis, Evangeline & Susan Gass. 1982. The comprehensibility of nonnative speech. Studies
in Second Language Acquisition 4(2). 114–146.
Werker, Janet F. & Richard C. Tees. 2002. Cross-language speech perception: Evidence for
perceptual reorganization during the first year of life. Infant Behavior and Development
25(1). 121–133.
Yates, Lynda. 2022. Workplace communication. In Tracey M. Derwing, Murray J. Munro &
Ronald I. Thomson (eds.), The Routledge Handbook of Second Language Acquisition and
Speaking, 359–371. Abingdon, UK: Routledge.
Index
Accented speech 85, 87–103, 141, 209, 256, 401
Accentedness 2, 4, 85–95, 99–103, 108, 369
Acoustic-orthography interface 41
Argentinian speaker 4, 85–86, 89, 91, 93–95, 97, 99
Aspiration 19, 21, 265, 269, 276
Assessment 4, 85, 90, 92, 95, 96, 99, 102, 103, 108, 204, 205, 211–214, 217–220, 251, 279, 288, 294, 297, 300, 302, 305, 353, 354, 369, 371, 373, 374, 378, 382, 384, 385, 389, 390, 392
Automatic speech recognition (ASR) 6, 287, 290, 294, 408

Belgian speaker 4, 85–86, 89, 91, 93–95, 97, 315, 321–322, 324, 338, 341
Body functions 200–203, 204–209, 212
Brazilian learner 3–4, 7, 148, 150–151, 162, 345–347
Brazilian listener 114–116, 137
Brazilian Portuguese 13–14, 24–25, 36, 107, 109–110, 120, 141, 349–350
Brazilian speaker 320
Brazilian teacher 5

Carryover effect 374, 385, 389, 391
Case study 6, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35
Chinese speaker 4, 85–86, 89, 91, 93–95, 97, 99
Classroom-based study 88
Collaboration 211–212, 221, 341, 408–409
Common European Framework of Reference (CEFR) 89, 261, 324
Complex System 210, 345–348, 352, 367
Comprehensibility. See also Perceived L2 comprehensibility 2–5, 34–35, 85–103, 107–119, 128–141, 170–172, 205–206, 209, 218, 220, 288, 291–292, 399–407
Computer-assisted pronunciation training (CAPT) 290
Consensus building 174
Consonants 4–6, 13, 15–17, 28, 30–31, 35, 41, 44–46, 48–59, 64–78, 90, 95, 140, 171, 179–180, 209, 214, 218, 235, 249–256, 259, 267–270, 276–278, 316, 319, 326, 328–329, 390, 403–404
Contextual factors 180, 200–203, 206, 208, 213
Corrective feedback 6, 218, 287–288, 290, 293, 295, 297–301, 303–304, 306–307, 407

Discrimination task 41, 50, 52–55, 58–59, 62, 64–69, 72
Dutch. See also Protocol of Dutch as L2 7, 212, 214, 216–219, 220, 233, 293, 315, 320–341, 401, 408
Dutch Association 202
Dynamic system 7–8, 14, 110, 147–150, 161, 164
Dynamic System Theory 108–109, 210, 220, 399

Effect of task 41–79
English /h/ 229, 231, 233–235, 237, 239, 241, 243–245, 247
English consonant 45, 65, 180, 254, 259, 269, 319
English learners 13, 113, 230, 233
English vowel 4–5, 49, 65, 147–150, 153, 162, 164, 253, 317–318, 383
English-speaking consultation (ESC) 5, 168
Exemplar Model 4, 14, 17–18, 20–21, 31–32
Explicit instruction 72, 147, 149, 162, 163, 167–170, 173, 183, 191, 233, 257, 259, 289, 292, 294, 305–306, 407

Fluency 35, 85–90, 95–103, 139, 175–176, 179, 189, 404–406
Fossilization 8, 249, 255
French learner 234, 315, 326–327, 341
French pronunciation 287, 298, 302–303, 307
https://doi.org/10.1515/9783110736120-016
Generalizability 7, 242, 315, 317, 321–323, 338–339
Generalization test 317–318, 345, 349, 350–352, 354, 357–359, 362–363, 371–373, 376
Graduate students 50, 147, 167–168, 170–173, 177, 189, 390
Grapheme-to-phoneme correspondence (GPC) 229, 231
Greek 4, 41, 44–45, 50, 71, 319

Haitian speaker 4, 107, 109, 113, 115, 117–119
Heterotonic words 7, 345–367
High variability 3, 7, 315–341, 345, 349, 371, 383, 388, 404

ICF 5, 8, 198–221
ICF model 5, 8, 197–221
Identification task. See also Phoneme identification task 41, 50, 52–55, 58–59, 62, 64, 66–70, 73, 75, 315, 317, 323, 329–330, 332–333
Immediate feedback 289, 294–295, 298–301, 352
Improvement rate 265–271, 277
Information and Communication Technology (ICT) 6, 249–279
Information communication technology (ICT) training 250–252, 258, 262–268, 272, 275–277
Intelligibility 2–5, 7, 34–36, 72, 85–87, 107–141, 167, 171, 179, 197, 199, 203–220, 250, 254, 276, 288, 289, 292, 369, 399–408
Intelligible. See also Intelligibility, Comprehensibility, Accentedness 2, 5, 86–87, 110–111, 119, 140, 169, 170–172, 179, 181, 182, 212–213, 217–218, 287, 291, 399–400
International graduate students 167, 170–173

Japanese language 250, 251–253, 254, 267–274, 275, 278
Japanese learner 249, 253, 275
Japanese phonemes 250, 254, 276
Japanese speaker 6, 85, 92, 95, 96, 98–100, 103, 172, 251, 254–260, 263

L2 acquisition. See also Second language acquisition 1, 3, 5, 7–8, 44, 51, 71, 211, 231, 287
L2 development. See also Second language development 3, 141, 149, 161, 164, 210
L2 learning 1–2, 4, 18, 22, 43, 71, 150, 169, 197–199, 202–203, 207, 211, 255
L2 phonology 17–18, 20, 31, 41, 68, 231, 320
L2 pronunciation instruction 1, 2, 181, 211, 360, 371, 399
L2 pronunciation teaching 2–3, 5, 8, 35, 41, 49, 85, 87, 108, 109–110, 141, 290
L2 speech perception 2, 4, 41, 43, 48–49, 51, 101, 103
L2 speech perception and production 43
L2 teaching 3–7, 141
Language teacher training 167
Learner autonomy 287, 290
Learner profile 321, 322, 330, 333, 335, 339, 341
Lexicogrammar 85–86, 95–96, 98, 101, 103, 175, 177, 179
Linguistic factor 4, 44–45, 85–86, 88–90, 95, 100–103, 204
Longitudinal study 4–5, 107–109, 112, 120, 123, 128, 132, 137, 140–141, 147, 149, 150–151, 164, 345–346, 352, 388
Long-term effect 7, 317, 321, 323, 339, 372, 374, 384, 390–392

Meta-analysis 2, 370, 372, 390, 392

Native Language Magnet Model 1, 4, 41, 68
Needs analysis 174–175, 177, 189, 191
New sound 1, 43, 69, 255
Non-native contrasts 41, 315–316

Oral communication 140, 167–175, 177, 179, 181, 189–191, 295
Orthography 3, 13–18, 24, 28–30, 35–36, 41, 229, 231–235, 241–245
Participation 50, 172–173, 199–209, 211–215, 325, 331, 388, 391
Pedagogical implications 4–5, 71, 85, 101, 243, 277, 303
Perceived L2 comprehensibility. See also Comprehensibility 88
Perceptual Assimilation Model-L2 (PAM-L2) 1, 370
Perceptual learning 370–371, 385–387
Perceptual plasticity 370, 386, 390
Perceptual training 7, 315–317, 321–339, 345–367, 369–392
Phoneme identification task. See also Identification task 41, 50, 52–55, 58–59, 62, 64–70
Phonetic training 1, 3, 7, 315–341, 370–375, 383, 388, 391, 404
Phonetic variability 13
Phonological encoding 234
Plural formation 3, 13–14, 17, 19–36
Poland speaker 4, 85–86, 89, 91, 93–95, 97, 99
Post-test 265, 291–292, 317–319, 327, 330–338, 345, 352, 357, 359, 363, 371–374, 380, 386–388, 390–391
Pre-test 269, 292, 317, 324, 327, 330, 331–339, 356, 360–361, 371–373, 380, 386, 407
Production training 1, 370
Pronunciation instruction 1–3, 5, 71, 150, 161, 163–164, 167, 169–170, 178–181, 183, 191, 199, 209–211, 257, 290, 292, 306, 340, 371, 399, 400, 402–403, 405–407
Pronunciation research 3, 102, 209, 229, 399, 410
Pronunciation training 3, 6, 71, 169, 177, 180, 183, 187, 197, 219, 255, 257, 258, 280, 287–307
Protocol of Dutch as L2. See also Dutch and Dutch Association 212

Retention 7, 315, 317, 320–323, 365, 369, 371–380, 384–387, 390–392
Revised Speech Learning Model 1, 4, 14, 20, 22, 43
Robustness 7, 17, 31, 315, 317, 319, 321–341

Second language acquisition. See also L2 acquisition 250, 287, 307, 399
Second language development. See also L2 development 1, 147
Segment 19, 21, 71, 100, 152, 186, 229, 231, 243
Self-video 249, 251, 262, 264–265, 268–269, 271–278
Sound system 1, 43, 48, 69, 70, 257, 277
Spanish language 109, 232–233, 292, 321, 349–350, 355, 357, 360, 364, 366
Spanish learners 7, 258, 345–366, 405
Spanish speakers 4, 85–86, 88–89, 91, 378
Speech and Language Therapists 202
Speech rate 85–86, 90, 95–101, 180, 189, 206, 217, 405
Speech recognition 6, 287, 290–291, 294, 297, 299, 318, 408

Task effect. See also Effect of task 41, 49, 64, 68–69
Teacher-training 167, 216, 250, 252
Technological Pedagogical Content Knowledge (TPACK) 5, 8, 167, 169, 183–184, 186–187, 191
Text-to-speech (TTS) 290, 297
Transfer 15–17, 28, 31, 68, 85, 176, 204, 207, 220, 229–230, 242, 295, 370–373, 375, 385, 389–391

Word dictation task 41, 51–52
Word frequency 41, 44–45, 49, 67–68
Word learning 229, 231–234, 237–244
Word length 4, 41, 43–79
Word-picture matching task 6, 229, 233