Speech Intelligibility Problems of Sudanese Learners of English

Speech intelligibility problems of
Sudanese learners of English
An experimental approach
Published by
LOT
Trans 10
3512 JK Utrecht
The Netherlands
phone: +31 30 253 6006
e-mail: lot@uu.nl
http://www.lotschool.nl
Cover illustration: An overlay of the IPA vowel chart of British English
(Received Pronunciation, from: Roach, Hartman and Setter 2006) and of
Modern Standard Arabic (from: Thelwall 1990)
ISBN: 978-94-6093-057-7
NUR 616
Copyright © 2011: Ezzeldin Mahmoud Tajeldin Ali. All rights reserved.

DOCTORAL THESIS
to obtain
the degree of Doctor at Leiden University,
by authority of the Rector Magnificus, Prof. mr. P.F. van der Heijden,
in accordance with the decision of the Doctorate Board,
to be defended on 19 April 2011
at 16:15
by
EZZELDIN MAHMOUD TAJELDIN ALI
born in Showak, Sudan
in 1971
Doctoral committee
Supervisor: Prof.dr. Vincent J. van Heuven
Other members: Dr. Rias Z. van den Doel (Utrecht University)
Prof.dr. Colin J. Ewen
Dr. Maarten G. Kossmann
Prof.dr. Marc van Oostendorp
Dr. Dick Smakman
Contents
Acknowledgments
Chapter One: Introduction
1.1 Introducing the topic of this study
1.2 Statement of topic area
1.3 The significance of the study
1.4 The objectives of the study
1.5 Questions raised by the research
1.6 Experimental design and testing methods
1.6.1 Means of data collection
1.6.2 Speaker and listener groups
1.6.3 Intelligibility tests
1.6.3.1 Perception tests
1.6.3.2 Production tests
1.6.3.3 Selection procedure of a model Sudanese EFL learner
1.6.3.4 Written questionnaires
1.7 Chapterization
Chapter Two: Linguistic background and related literature
2.1 Contrastive analysis
2.1.1 Introduction
2.1.2 Acoustic and perceptual characteristics of vowels
2.1.2.1 English and Arabic vowels
2.1.2.2 Length feature
2.1.2.3 English and Arabic vowel formants
2.1.2.4 Predictions of learning problems
2.1.3 The consonants of English and Arabic
2.1.3.1 Phonetic symbols of English and Arabic
2.1.4 English and Arabic syllable structure
2.1.4.1 Consonant clusters in English
2.1.4.2 Sequential Constraints in clusters
2.1.5 Markedness Differential Hypothesis
2.1.6 Conclusion
2.2 Background and contribution of related studies
2.2.1 Language and Speech
2.2.2 Accent
2.2.2.1 Received Pronunciation (RP)
2.2.2.2 Feasibility of RP
2.2.2.3 Foreign accents and errors
2.2.3 Speech perception and production
2.2.4 Speech intelligibility
2.2.5 Tests of speech intelligibility
2.2.5.1 The Modified Rhyme Test
2.2.5.2 Feasibility of the MRT
2.2.5.3 Speech Perception in noise test: SPIN-test
2.2.6 Confusion matrices
2.2.7 Contribution of previous studies
2.2.7.1 Learning problems of English vowels
2.2.7.2 Learning problems of English consonants
2.2.7.3 Consonant clusters
2.2.7.3.1 Learning problems of English cluster consonants
2.2.7.3.2 Phonotactic constraints across languages
2.2.7.3.3 Sonority Sequencing Principle
2.2.8 The effect of explicit knowledge
2.2.9 Miscellaneous issues
2.2.10 Summary
Chapter Three: Intelligibility of RP English to Sudanese listeners
3.1 Introduction
3.2 Method
3.2.1 Intelligibility tests used
3.2.2 Participants
3.2.2.1 Sudanese listeners of English
3.2.2.2 Native speaker of RP English
3.2.3 Overall structure of the test battery
3.2.3.1 Tests materials
3.2.3.2 Test procedure
3.4 Overall results
3.4.1 Vowels
3.4.2 Discussion
3.4.3 Onset and coda consonants
3.4.4 Discussion
3.4.5 Onset and coda consonant clusters
3.4.6 Discussion
3.4.7 Sentence (SPIN) test
3.4.8 Discussion
3.4.9 General conclusions
Chapter Four: Intelligibility of Sudanese English to Dutch listeners
4.1 Introduction
4.2 Objective
4.3 Participants
4.3.1 Sudanese speakers (university EFL learners)
4.3.2 Native speakers of English
4.3.3 Dutch listeners of English
4.3.3.1 Learning problems of English speech sounds
4.3.3.2 Motivation to test Dutch listeners of English
4.4 Intelligibility tests used
4.5 Test battery
4.5.1 Material and overall structure
4.5.2 Recordings
4.5.3 Perception test procedure
4.6.1 Vowels
4.6.1.1 Results
4.6.1.2 Discussion and conclusions
4.6.2 Consonants
4.6.2.1 Results
4.6.3 Consonant clusters
4.6.3.1 Results
4.6.3.2 Discussion
4.6.4 Results and discussion of Speech Perception in Noise test (SPIN)
4.6.5 Correlations
4.6.6 Conclusions
Chapter Five: Intelligibility of Sudanese English to British and American listeners
5.1 Introduction
5.2 Objective
5.3 Method
5.3.1 Intelligibility tests used
5.3.2.1 Sudanese speakers of English
5.3.2.2 Selection of a representative Sudanese EFL speaker
5.3.2.3 Native speaker of English
5.3.2.4 Native listeners of English: British and American listeners
5.4 Overall structure of the test battery
5.4.1 Tests materials
5.4.2 Test procedure
5.5.1 Vowels
5.5.1.1 Results
5.5.1.2 Discussion and conclusion
5.5.2 Consonants
5.5.2.1 Results
5.5.3 Consonant clusters
5.5.3.1 Results
5.5.4 SPIN sentences
5.5.4.1 Results
5.6 Correlations
5.7 Conclusions
Chapter Six: Acoustic analysis of Sudanese-English vowels
6.2 Method
6.2.1 Material
6.2.2 Speakers
6.3 Procedure
6.3.1 Formant measurements
6.3.2 Vowel normalization
6.3.3 Duration Measurement
6.5.1 Vowels
6.5.1.1 Vowel space
6.5.1.2 Discussion
6.5.1.3 Results and discussion of vowel duration
6.5.1.4 Automatic classification of L1 and L2 vowels
6.5.1.5 Conclusions
Chapter Seven: Acoustic analysis of English obstruents
7.2 Objective
7.3 Methods
7.3.1 Material
7.3.2.1 Sudanese EFL learners
7.3.2.2 Native speakers of RP English
7.4 Procedure
7.4.1 Test battery
7.4.2 Praat
7.5.1 English plosives
7.5.2 Acoustic features of English plosives
7.5.3 Spectral preparation
7.5.4 Voice onset time
7.5.5 Preceding vowel duration
7.5.6 Duration of consonants
7.5.7 Peak intensity
7.5.8 Centre of gravity
7.5.9 Conclusions
Chapter Eight: Acoustic analysis of English consonant clusters
8.2 Objective
8.3 Participants
8.3.1 Sudanese EFL learners
8.3.2 Native speakers of RP English
8.4 Methods
8.4.1 Material
8.4.2 Test battery
8.4.3 Praat
8.5 Results of Cluster production
8.5.1 Onset clusters
8.5.2 Coda clusters
8.6 Discussion and conclusions
Chapter Nine: Intelligibility assessment: written questionnaires
9.2 Objective
9.3 Subjects
9.4 The construction of the students and teacher questionnaires
9.4.1 Test content
9.4.2 Format and structure
9.4.3 Test procedure/apparatus
9.5 The scoring procedure
9.6.1 Results of the student questionnaire
9.6.1.1 General matters
9.6.1.2 Perception of English speech sounds
9.6.1.3 Production of English speech sounds
9.6.2 Results of teacher questionnaires
9.6.2.1 General matters
9.6.2.2 Perception of English speech sounds
9.6.2.3 Production of English speech sounds
9.7 Correlation between student and teacher judgments
9.8 Discussion and conclusions
Chapter Ten: Conclusion
10.2 Summary
10.3 Conclusion
10.3.1 Nature of speech intelligibility problems
10.3.2 Intelligibility of the Sudanese EFL learners to native speech
10.3.3 The most difficult sounds
10.3.4 Linguistic causes of intelligibility problems
10.3.4.1 L1 and L2 inventory differences
10.3.4.2 Lack of explicit knowledge aggravates intelligibility problems
10.3.4.3 Procedure of error analysis
10.3.5 Pedagogical implications of Error Analysis
10.3.6 Findings of the acoustic analysis of English speech sounds
10.3.6.1 Acoustic analysis of English vowels
10.3.6.2 Acoustic analysis of English consonants
10.3.6.3 Acoustic analysis of consonant clusters
10.3.6.4 Findings of the written questionnaires
10.3.7 Recommendations
10.3.7.1 Focus on speech sound production in isolation and in context
10.3.7.2 EFL teachers need a specific assistance
10.3.7.3 Experimental approach to problem solving
10.3.7.4 Use language lab to teach a foreign language
10.3.8 Suggestions for further studies
References
Samenvatting (summary in Dutch)
Summary in English
Appendices (numbered separately by chapter)
3.1 Stimuli of identification test of English vowels
3.2.a Stimuli of identification test of English onset consonants
3.2.b Stimuli of Identification test of English coda consonants
3.3 Stimuli of Identification test of English clusters
3.4 Stimuli of Identification test of English SPIN sentences
3.5 Answer sheet of the identification test of English vowels
3.6 Answer sheet of the identification test of English consonants
3.7 Answer sheet of the identification test of English consonant clusters
3.8 Answer sheet of the identification test of English SPIN sentences
4.1 Answer sheet of the identification test of English vowels
4.2 Answer sheet of the identification test of English consonants
4.3 Answer sheet of the identification test of English consonant clusters
4.4 Answer sheet of the identification test of English SPIN sentences
6.1 Table of English vowel duration produced by Sudanese EFL learners
6.2 Table of English vowel duration of native speakers of RP English
7.1 VOT values of English stops produced by Sudanese speakers
7.2 COG values of English obstruents by Sudanese speakers
7.3 Preceding vowel durations of Sudanese EFL learners and native speakers of RP English
7.4 Mean durations of English consonants of Sudanese EFL learners and native speakers of RP English
7.5 Relative intensity rates of English consonants
9.2.a Student paper-and-pencil questionnaire
9.2.b Teacher paper-and-pencil questionnaire
Curriculum vitae

Acknowledgments
I am grateful to the Leiden University Fund (LUF) and the Leiden University
Centre for Linguistics (LUCL). Their commitment to solving many of the practical
problems was an invaluable support for this study.
I wish to thank many of my professional colleagues: Willemijn Heeren, Jurriaan
Witteman, Franciscka Scholtz, Rongjia Cui, and Yiya Chen for their encouragement and
their patience in answering many questions. Special thanks are also extended to Richard Todd,
Kristen De Joseph, Elinor Croxall and Mohammed Alsulami for their help and
participation in recordings, experiments and proofreading.
In the first years of my stay, I enjoyed sharing coffee and chat with Rob Goedemans,
Maarten Hijzelendoorn, Jurgen van Oostenrijk, Ellen van Zanten, Chaoju Tang, Sandra
Barasa, Vincent van Heuven, Jos Pacilly and others. Special thanks are also due to the
staff members of the Sudan Embassy in the Netherlands, and many Dutch, foreign and
Sudanese friends in Leiden. They were friends indeed when help was needed.
Thanks go to the library staff at Leiden University for being so helpful
in every respect.
I would like to express my gratitude to the staff members at the Ministry of Higher
Education in Sudan and at Gadarif University for the scholarship I received during my
stay in the Netherlands, and to the contact persons at the Ministry of Higher Education
for their patience in listening to my telephone calls.
Thanks go to my students and professional colleagues at Gadarif University in Sudan
and to SULTI staff members, teachers of English at Gadarif and Showak secondary
schools and the Department of Linguistics at Khartoum University for their
participation in experiments.
Special thanks go to my family for encouragement.

Chapter One
Introduction
1.1 Introducing the topic of this study
Instructors of English as a foreign language (EFL) aim to help their students achieve
successful communication in the language. The students may use English to
perform various communicative tasks such as assignments, debates and
examinations, as part of their daily work during a semester. They also need
English to engage in more complex communicative activities in real life, such as
job interviews and academic and professional pursuits. It is therefore
fair to conclude that achieving effective communication is a complex task for EFL
learners, one that requires mastering many language skills, such as listening, reading,
writing and speaking, as well as pronunciation and comprehension abilities. In this
study, however, pronunciation, comprehension and listening abilities will receive more
attention than the other skills, since they bear most directly on the issue under study.
Learners need to produce accurate speech sounds as well as to show a high level of
comprehension when they are involved in interactions. However, for various reasons,
learners of English as a foreign language have problems making themselves intelligible.
They fail either to understand the message conveyed by speech or to pronounce
English intelligibly. Many language studies are now attempting to investigate this type
of speech learning problem.
As elsewhere in the world, the study of speech intelligibility problems in English has
recently emerged in Sudan as a rapidly growing field of inquiry that extends across
different disciplines of language teaching. Researchers and language teachers have
approached many English language issues such as syntax, lexis, comprehension, reading
and other skills (see e.g. Towards a functional approach to the English research on the writing
skills in Sudan – Abdalla 2005, 2001 and Vocabulary learning strategies: A case study of
Sudanese learners of English – Ahmed 1988). Their accounts indicate that much effort has
been expended in these areas. However, relatively little empirical investigation has been
done on English speech perception and production problems in Sudan, except for a
few studies and textbooks that provide reviews of a more impressionistic kind
(English pronunciation for Arabic speakers – Mitchell and El Hassan 1993 and Errors in
English among Arabic speakers: Analysis and Remedy – Kharma and Hajjaj 1989).
Impressionistic views such as these inform the scholarly reader about the topic
under investigation only in descriptive terms and provide virtually no practically oriented
reference methodology. Therefore, they do not contribute effectively to the solution of
the problem. Similarly, examining issues such as pronunciation or
intelligibility problems using only written tests, which ask candidates overt questions
about the types of problems they face in recognizing and pronouncing, for example,
English phonemes or words, will mostly yield inadequate findings.
Shortcomings like these are often a by-product of inappropriate research
methods and poorly formulated goals, which may affect the type of data
obtained. The teaching methods used can also contribute to the
problem, particularly when they fail to provide precise descriptions of the nature of the
learning problem. In fact, training in empirical research is necessary to help
candidates acquire new research skills. For example, the involvement of technology
in language research helps researchers to expand their studies to include new contexts
of learning and methods of data collection (e.g. in the field of phonetics, speech
processing software such as Praat, see Boersma and Weenink 1996, is an invaluable
tool). Such tools enable researchers to account for issues such as the acoustic correlates of
the speech sounds of the second language (L2). Thus, the success of language research
depends mainly on an appropriate data collection procedure; the nature
of the topic determines the type of procedure to be used.
This study attempts to implement an experimental approach to the investigation of the
speech intelligibility problems of Sudanese EFL learners. It focuses on a two-sided
topic, which involves receptive and productive intelligibility, whereby participants are
placed in communication acts. The interlocutors must fulfil two interdependent
requirements: first, to be clearly understood by producing accurate speech sounds and,
second, to show a high capacity for speech comprehension whenever they are
involved in daily-life interactions (Carrell and Tiffany 1960). Moreover, the study
seeks to determine the extent to which linguistic factors can impede the
achievement of intelligible speech by Sudanese university EFL learners. The most
prominent recent study in this area is Communication problems facing Arab
learners of English (Rababah 2005).
More specifically, since it has been argued that phonemes form the basic sound units
of speech, the study focuses on segmental analysis of the English speech sounds
spoken by Sudanese EFL learners. Recently, comprehensive literature surveys have
been carried out on segmental analysis, e.g. assessing constraints on second-language
segmental production and perception (e.g. Flege 2003). In L2 pedagogy, intelligibility
and speech perception are issues that motivate many investigations targeting the segmental
difficulties experienced by ESL/EFL learners.[3] Segmental analysis
approaches the measurement of intelligibility in phonological and phonetic terms that
closely relate to differences between the sound systems of the learners’ L1 and L2. This
supports the argument that phonemic differences may well represent the most difficult
learning problems experienced by EFL learners: many related studies reveal the
influence of phonemic variation across languages (do Val Barros 2003).
[3] A distinction is often made between English as a second language (ESL) and English as a
foreign language (EFL). In the former case English is the dominant language in the learning
environment, e.g. when an immigrant has to learn the language in England. In the latter case the
new language is learnt in the learners’ country of origin, typically as part of the school curriculum.
Different considerations necessitate the use of segmental analysis in this context. Firstly,
there are many differences between the phonemic systems of English and Arabic. The
Arabic vowel system distinguishes only three vowels, viz. /a, u, i/. These vowels are
mostly unwritten (or marked by diacritics on consonant symbols) and represent short
vowels. They are not part of the Arabic alphabet or of ordinary spelling; the vowels are
inferred from context. The vowels perform a morpho-phonemic function in Arabic
word formation (Hayat 2005, Alan 1997). They also function to mark inflectional
categories such as tense, gender and number, which reveals the nature of the Arabic
non-concatenative morphological system underlying deep phoneme regularities
(Kenstowicz 1994). The situation is different in English, which has a large number of vowels
of a more complicated nature, comprising pure vowels (or monophthongs) as well as
diphthongs (such as /eɪ/, /aɪ/, etc.), all of which may occur in accented as well as
unaccented syllables.
Secondly, there are acoustic differences between the English and Arabic vowel
systems. Vowel length is an important temporal feature distinguishing between vowels
in the two languages. In Arabic, length signals a short/long distinction (length in
relation to vowels is like gemination in relation to consonants). In English, some vowels
are long, e.g. /iː/ in seed and /ɑː/ in car, whilst others are short, e.g. /ɪ/ in fit and /e/ in bed
(Mitchel 2004), but the long and short vowels are also distinguished by differences in
phonetic vowel quality (determined by degree of mouth opening, constriction place
along the front-back dimension and degree of lip rounding). This reinforces the
argument that, phonologically, the durational differences in Arabic vowels are
independent of vowel quality, whilst in English they are not: durational differences do
not necessarily stand in a systematically orthogonal relation to quality differences (De Jong
2004). Similar phonemic and acoustic differences exist between the English and Arabic
consonant inventories. Arabic has complicated phonological features such as emphasis
and gemination, which may not correspond to those of English.
Thirdly, phonological awareness of a second or a foreign language is necessary for the
achievement of successful communication. That is, in learning a second language,
learners must have thorough phonological knowledge of the language. For instance, it
is often helpful to learn the speech sounds of a language as isolated sound units and to
know how such sounds may change in different contexts. It is also important to know
which acoustic features are relevant to listeners in perceiving the sounds
of speech. These phonological principles must be fully understood by anyone interested
in learning a second/foreign language. Many linguists maintain that awareness of these
principles helps learners avoid deviations from the native norm of L2 phonemes, which may
affect both the perception and the production of English speech. Moreover, well-learned
speech sounds constitute the foundation of intelligible speech.
The theme sketched above has recently motivated researchers of ESL/EFL (e.g.
Strange, Bohn, Trent and Nishi 2004, Wang and Van Heuven 2006) to conduct
experimental analyses of the English vowel system as spoken and perceived by native
and non-native speakers. In the current study, an experimental analysis is
conducted that targets both receptive and productive speech intelligibility. The analysis covers
perception tasks, which address receptive intelligibility. It also covers important properties,
such as the graphical representation of the vowel space and the temporal structure of
English vowels and consonants as produced by Sudanese speakers, which bear on
productive intelligibility. Ultimately, the investigation of intelligibility
problems targets RP speech sounds as they are perceived and/or produced by
Sudanese university EFL learners.
1.2 Statement of topic area
The topic of this research is the speech intelligibility problems that Sudanese
EFL learners face at university. The investigation attempts to account for the extent to
which linguistic factors can impede receptive and productive intelligibility of the
English speech sounds. Linguistic factors herein refer to (i) L1 interference and (ii)
awareness of English speech sounds. Concerning the first factor, L1 interference, the
purpose is to examine to what extent the learners’ mother tongue obstructs their
learning of the English speech sounds. This is because Sudanese EFL learners arguably
experience difficulty in identifying and producing the English speech sounds (i.e.
vowels, consonants and clusters) due to transfer from their L1. These problems have been
addressed before, but from an impressionistic point of view. One of these studies
suggests that Sudanese learners of English have difficulties perceiving and producing
the English vowels (Mohammed 1991). Another study (Bobda 2000) more specifically
claims that the production of the English vowels /«, «Ö, ¡/ poses a problem for Sudanese
EFL learners due to their Arabic linguistic background.
In a wider context, related studies report that Arabic-speaking learners of English
generally have problems with the English vowels (Munro 1993, Brett 2004). Moreover,
do Val Barros (2003) found that Arab learners also have difficulty in pronouncing the
English consonants /p, v, s, θ, ð, z, d, ʒ, ŋ/. Additional problems manifest themselves
when Arabic-speaking learners are exposed to English onset clusters as in special, flow,
please, or coda clusters as in next, film (Carlisle 2001). Investigation of the
phonemic differences between languages is therefore necessary, as these differences have
negative effects on the learning of L2 speech. Many studies of non-native speech discuss
the risk of reduced intelligibility, which arises from phonemic differences,
particularly when actual practice of the second/foreign language is infrequent. The
ultimate result of such differences is that L2 learners fail to realise that two sounds in
the L2 are the manifestation of different categories of speech sounds. This mostly
happens when two sounds are distinct on the phonetic surface of the target
language (L2) but are close to a single category in the learner’s source language (L1)
(Flege 1976, 1995).
Lack of explicit knowledge of the English speech sounds is the second factor
that is argued to cause intelligibility problems for Sudanese learners of English. It is
assumed that the learners’ explicit knowledge of English speech is insufficient, which
delays their recognition and production of these sounds. Explicit knowledge, in this
study, covers the articulatory and auditory awareness of English required to recognize
and produce English speech sounds. Articulatory knowledge involves learning to
produce new sounds, which implies that the learners are unfamiliar with such speech
sounds. The learners therefore need to develop articulatory habits, which can be
acquired through more exercises or exposure. Awareness of this aspect of knowledge
enables Sudanese EFL learners to understand, choose and use L2 sounds efficiently in
interactions. The learners also need to know the correct distribution of the
English speech sounds, in isolated words or in connected speech, and their correct
order in syllables and words, when they hear these sounds spoken or in recordings. The
learners may also need to know the perceptual representations of speech sound patterns,
which are built from auditory mapping information. This is because the way specific
L2 speech sounds are perceived can influence the identification of these sounds.
Perceptual confusion patterns may be indicative of the structure of the perceptual space
and of the strategies used by L2 listeners. Furthermore, the learners in this study need to
have some background in the acoustic and temporal cues of English, in so far as they
are important for phonemic distinctions. As the related literature shows, most of the
perception and production errors in English speech sounds result from the lack of
these aspects of knowledge (e.g. Mohammed 1991).
To obtain reasonably reliable evidence about the speech intelligibility
problems, an experimental approach will be adopted. For receptive intelligibility
measurements, I will implement auditory discrimination methods such as the Modified
Rhyme Test (MRT), which treats isolated stimuli (vowels, consonants and clusters)
read in a fixed carrier phrase (Say … again), and the SPIN (Speech Perception in Noise,
see § 2.2.5.3) test, with the target words embedded in meaningful sentences. The MRT
tests the existence of categorical distinctions on the part of the listener; i.e. it is a
segmental intelligibility measurement (Flege 1976). When EFL materials are presented
to native listeners, the tests show whether the L2 production of the learners contains
the contrasts of interest, e.g. rake vs. lake. Conversely, when native English materials are
presented to L2 listeners, the tests show whether or not the EFL learners know how to
make the relevant perceptual distinctions in the target language.
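As an illustration of how responses from an identification test of this kind might be scored, the following minimal Python sketch tallies intended and perceived items into confusion counts and computes a percent-correct score. The function names and the four trials are hypothetical and serve only to show the idea; they are not the scoring procedure or the data of this study.

```python
from collections import Counter


def confusion_counts(trials):
    """Count (intended, perceived) pairs from identification trials."""
    return Counter((intended, perceived) for intended, perceived in trials)


def percent_correct(trials):
    """Percentage of trials in which the listener chose the intended item."""
    if not trials:
        raise ValueError("no trials to score")
    hits = sum(1 for intended, perceived in trials if intended == perceived)
    return 100.0 * hits / len(trials)


# Hypothetical responses: (word the speaker intended, word the listener ticked)
trials = [
    ("rake", "rake"),
    ("rake", "lake"),
    ("lake", "lake"),
    ("lake", "lake"),
]

matrix = confusion_counts(trials)
score = percent_correct(trials)
print(matrix[("rake", "lake")], score)
```

The off-diagonal cells of such a table (here, rake heard as lake) indicate which contrasts are lost, while the overall percentage gives a single intelligibility figure per speaker or listener group.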
The study will also attempt to measure the acoustic correlates of the English speech
sounds produced by the Sudanese learners of English. Acoustic correlates include (i)
duration and phonetic quality (position in the F1-by-F2 formant space) for vowels, (ii)
duration, voice onset time (VOT), centre of gravity and intensity for consonants and (iii)
duration of constituent sounds in consonant clusters. The aim of the measurements is
to find out how differences in spectral and temporal properties between Sudanese
EFL learners’ L1 (Arabic) and English can affect intelligibility. Finally, paper-and-pencil
questionnaires will be distributed to test Sudanese students’ and their instructors’
opinions on what difficulties they experience in learning English as a foreign language.
These intuitions may help in understanding the problems uncovered by the functional
tests.
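To make one of these correlates concrete, the sketch below computes a spectral centre of gravity as the mean frequency weighted by spectral magnitude raised to a power. It is a minimal illustration in plain Python, assuming a short, already-segmented signal and a weighting exponent of 2; the function name, the exponent and the synthetic test tone are assumptions made here, and the code does not reproduce the measurement settings used in this study.

```python
import cmath
import math


def centre_of_gravity(samples, sample_rate, power=2.0):
    """Return the spectral centre of gravity (Hz) of a short signal.

    Computed as the mean frequency weighted by |X(f)|**power, using a
    plain DFT over the non-negative frequencies.
    """
    n = len(samples)
    if n == 0:
        raise ValueError("empty signal")
    weighted_sum = 0.0
    total_weight = 0.0
    for k in range(n // 2 + 1):
        # k-th DFT coefficient of the signal
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(samples)
        )
        weight = abs(coeff) ** power
        weighted_sum += (k * sample_rate / n) * weight
        total_weight += weight
    if total_weight == 0.0:
        raise ValueError("signal has no spectral energy")
    return weighted_sum / total_weight


# Synthetic check (assumed values): a 1000 Hz sine sampled at 8000 Hz
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * i / sample_rate) for i in range(64)]
cog = centre_of_gravity(tone, sample_rate)
```

For real recordings the plain DFT would normally be replaced by an FFT, and the result depends on windowing and filtering choices that this sketch leaves out.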
1.3 The significance of the study
The study uses an experimental approach to examine segmental intelligibility problems
in English speech. One benefit of this approach is that it enables the researcher to
obtain an understanding of the learners’ abilities and weaknesses in English speech
perception and production. Such insight is more difficult to develop in the context of
an interview, or from a written text, neither of which yields information that treats the
issue in depth. For example, data on pronunciation problems collected by means of an
interview can only describe the investigated area. That is, they
hardly give real insight into the subject, unlike an experiment, which, with the aid of
technology, may provide more realistic and accurate feedback.
In fact, few studies in the Sudanese context have approached the issue of speech
intelligibility problems experimentally, as this study will do (see § 1.1). Most
investigations focus on English learning problems such as reading, writing, syntax,
listening skills, and so on. The few studies that do treat problems of English
pronunciation or perception among Sudanese EFL learners do so in descriptive
terms.
Secondly, the involvement of native and non-native listeners/speakers as participants is
beneficial to the study. Native listeners/speakers of English represent control groups
whose judgments and observations can be used reliably to identify the receptive and
productive intelligibility problems that the learners have with English speech sounds. In this
study, Dutch students will also be involved because they are highly successful listeners/
speakers of English (see Singleton and Lengyel 1992). The involvement of non-native
listeners in the experiments will broaden the scope to include assessments by non-native
speakers of English, which may make a useful contribution.
Thirdly, multiple methods are used in the investigation to corroborate the data sources,
which increases the reliability of this research. The use of several data sources and different
methods constitutes a form of triangulation, i.e. the use of a variety of data-collection
methods, as practised in the social sciences. The idea behind triangulation is that agreement
among different data sources supports a more reliable interpretation of the data.
1.4 The objectives of the study
This study aims to devote greater attention to the speech intelligibility problems
experienced by Sudanese learners of English. Very little effort has been devoted to this
type of language problem. Even the specialized workshops on speech intelligibility
problems (workshops dealing with EFL problems in the Sudanese context)
have provided little information, and their findings give an insufficient account of the
problems concerned. Moreover, in terms of methods, these studies use data
obtained by means of interviews, which yield descriptions and impressions of the
research problems rather than results derived from experiments. In general, the
current research attempts a further investigation of the impediments to speech
intelligibility among Sudanese EFL learners. The study will act as a pioneer project in
the sense that it provides experimental evidence on the issue concerned and may
thus serve as a blueprint for future attempts to solve problems of this type.
Thus, the research specifies two goals:
(i) To identify the linguistic causes of intelligibility problems manifest among
Sudanese EFL learners as perceived by native listeners of English.
(ii) To test, by experimental means, the intelligibility of English vowels, single consonants
and consonant clusters as perceived and produced by Sudanese students of English.
1.5 Questions raised by the research
1. To what extent are Sudanese university EFL learners intelligible to native listeners
of English?
2. Are English vowels the most difficult to pronounce as opposed to consonants or
consonant clusters?
3. Which English speech sounds produced by Sudanese EFL learners do native
listeners find most difficult to recognize within each of the categories vowels,
consonants and consonant clusters?
4. What is the precise nature of the speech intelligibility problems observed among
the Sudanese learners of English?
5. What are the linguistic causes of such problems? More specifically,
(i) Do the inventory differences between the learners’ L1 and the target language
present a major cause of these problems?
(ii) Does insufficient explicit knowledge of the English sound system on the part of
EFL learners aggravate their intelligibility problems?
1.6 Experimental design and testing methods
This part provides a short description, which serves as a bird’s eye view of the research
methods and experimental design adopted in this study. However, more specific
information on the experimental design will be provided later in the separate chapters.
1.6.1 Means of data collection
Different means of data collection are adopted in this study, including perception
tests, production tests and written questionnaires. All the tests target English vowels,
consonants and clusters, as well as words embedded in high-probability
sentences (SPIN).
1.6.2 Speaker and listener groups
This study targets three groups of participants with different linguistic
backgrounds. The Sudanese university EFL learners represent the test group, which
participated in all the experiments as listeners and/or speakers of English. Similarly,
native speakers of RP English were involved in the experiments as listeners/speakers of
RP English (model groups). American and Dutch groups of subjects participated in the
experiments as listeners only. More importantly, the participants varied in
terms of nationality, linguistic distance and the number of times they took the tests.
None of the individual subjects involved in these experiments participated in any of
the perception tests, production tests or questionnaires more than once. Furthermore,
the listeners recruited for the perception tests included native listeners (students and
professors) of British and American English, some of whom took the tests in The
Netherlands, while others took them online from a distance, e.g. from Britain or
America. This variation in the recruitment criteria will contribute to the reliability of the
results.
1.6.3 Intelligibility tests
1.6.3.1 Perception tests
To measure the intelligibility of the subjects, the Modified Rhyme Test (MRT)
was used in all the perception tests. The MRT is considered to be the most accurate and
reliable measure of segmental speech intelligibility (see Logan, Greene and Pisoni
1989). It measures segmental intelligibility through a word identification task
employing a set of four-alternative forced-choice test items.
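As a rough illustration of how responses to four-alternative forced-choice items of this kind might be scored, the sketch below computes the percentage of correctly identified items together with a score corrected for guessing. The function names, the example words and the guessing correction are assumptions made for illustration only; they are not taken from the test materials or the scoring procedure of this study.

```python
def percent_correct(responses, targets):
    """Percentage of items on which the chosen word equals the target word."""
    if len(responses) != len(targets) or not targets:
        raise ValueError("responses and targets must be non-empty and equally long")
    hits = sum(r == t for r, t in zip(responses, targets))
    return 100.0 * hits / len(targets)


def chance_corrected(percent, n_alternatives=4):
    """Rescale a percentage so that pure guessing maps to 0 and a perfect score to 100."""
    chance = 100.0 / n_alternatives  # 25% for four alternatives
    return 100.0 * (percent - chance) / (100.0 - chance)


# Hypothetical listener: four items, three of them identified correctly.
targets = ["bad", "pat", "seat", "pull"]
responses = ["bed", "pat", "seat", "pull"]

score = percent_correct(responses, targets)
print(score, round(chance_corrected(score), 1))
```

With four alternatives a listener who merely guesses is expected to score about 25%, which is why the raw percentage is sometimes rescaled in this way before groups are compared.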
1.6.3.2 Production tests
The production experiments serve to establish the acoustic correlates of English speech
sounds spoken by Sudanese EFL learners. The tests seek insight into the phonetic and
acoustic differences between the learners’ L1 and L2 in areas such as vowel duration,
voice onset time (VOT), centre of gravity and the effect of the preceding vowel. Importantly, before
doing the production tests, the learners received a short training. Firstly, they were asked to
read three lists of key words of English including vowels, single consonants and clusters.
The aim of the key words was to guide the learners to the correct pronunciation of the
target phonemes. Secondly, the learners were instructed to pay special attention to
different types of vowels (lax vowels, tense vowels, diphthongs), to the contrast
between voiced and voiceless consonants, and to initial and coda clusters (see separate
chapters).
1.6.3.3 Selection procedure of a model Sudanese EFL learner
For the selection of a representative speaker from among a total of 11
Sudanese EFL university learners and one native speaker of RP English, a sound quality
test was used. The purpose of the test was to identify within the peer group the learner
with the most representative (i.e. average) performance. This was done by asking 20
listeners to listen to recorded material (vowels, consonants and clusters of English read
by the Sudanese EFL speakers in context: Say ….again) and to assess the sound quality
(i.e. recording quality) and pronunciation of the speakers. These judges were asked to
click on one of four marks: 0, 40, 50 or 100 (see answer sheets in
Appendices 3.1-4.4). Some of the speakers fragmented the sentences into three parts:
Say … xxx … again. Therefore, the pauses between Say and the target and between the
target and again were cut out to make the test faster and the speech more intelligible.
Background noise was also added to the recordings of the native speaker of RP English
so that they would sound similar to those of the Sudanese EFL learners. To locate the
most representative speaker, the mean rating of each individual speaker was computed and I then
selected the speaker whose evaluation was closest to the group mean.
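The selection step described above can be sketched as follows. This is a minimal illustration under stated assumptions: the speaker labels and the marks are invented, and the function name most_representative is hypothetical; the sketch only shows the idea of picking the speaker whose mean rating lies closest to the group mean.

```python
from statistics import mean


def most_representative(ratings):
    """Return the speaker whose mean rating lies closest to the group mean.

    `ratings` maps a speaker label to the list of marks given by the listeners.
    """
    speaker_means = {spk: mean(marks) for spk, marks in ratings.items()}
    group_mean = mean(speaker_means.values())
    return min(speaker_means, key=lambda spk: abs(speaker_means[spk] - group_mean))


# Hypothetical marks on the 0/40/50/100 scale, four listeners per speaker.
ratings = {
    "S01": [40, 50, 50, 40],
    "S02": [100, 100, 50, 100],
    "S03": [50, 50, 100, 50],
}

print(most_representative(ratings))
```

In this invented data set the speaker means are 45, 87.5 and 62.5, so the group mean is 65 and the third speaker is the one selected.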
1.6.3.4 Written questionnaires
The third important means of data collection were written questionnaires that invited
Sudanese EFL learners and their teachers to voice their subjective opinion as to what
difficulties they experienced in correctly producing and perceiving English sounds and
sound combinations. The availability of this type of data may afford a better
understanding of the topic under investigation. It is also a
relaxed technique of data collection, which offers the subjects an opportunity to think
before writing down their answers.
1.7 Chapterization
The remainder of this study consists of nine chapters arranged as follows:
Chapter 2 includes two sections. Section one provides a contrastive analysis of the
English and Arabic phoneme inventories, describing differences and similarities
between the two systems. Section two provides a linguistic
background and reviews the contributions of relevant literature.
Chapter 3 investigates the receptive speech intelligibility problems of Sudanese EFL
learners. It explores the extent to which these learners are able to identify the speech
sounds of native English, and tries to identify possible linguistic causes of these
receptive problems.
Chapter 4 investigates the productive speech intelligibility problems of the Sudanese
EFL learners. It attempts to find out what linguistic causes compromise the
intelligibility of the learners to Dutch listeners of English.
Chapter 5 treats the speech intelligibility problems of Sudanese EFL learners when their
speech production is presented to native listeners of English (British and Americans).
The chapter aims to establish the extent to which Sudanese EFL speech is less
intelligible to native English listeners than native English speech.
Chapter 6 reports an acoustic analysis of the English vowels spoken by Sudanese
learners and native speakers of English, discussing the acoustical differences that exist
between the two data sets. The differences found may provide insight into the speech
properties in the Sudanese speakers’ accent that are most detrimental to their
intelligibility for native English listeners.
Chapter 7 provides an acoustic analysis of the English consonants spoken by Sudanese
learners of English, contrasting the results with control data produced by native English
speakers.
Chapter 8 performs an acoustic analysis of the English consonant clusters produced by
Sudanese learners of English, discussing the findings in relation to control data
produced by native English speakers.
Chapter 9 discusses impressions and assessments of Sudanese students and teachers of
English collected by means of written questionnaires, in an attempt to establish the
extent to which the students and teachers are aware of the existence of receptive and
productive intelligibility problems with Sudanese EFL and their causes.
Chapter 10 presents a summary of the research and its findings. It discusses the
implications of the findings for current views on the role of native-language
interference in second-language acquisition, and makes recommendations for future
research.
Chapter Two
Linguistic background and literature

2.1 Contrastive analysis
2.1.1 Introduction
The difficulty of learning the phonological categories of a target language has received
much discussion in second-language studies. Brière (1966) attributes the learning
problems of phonological categories to the competing phonemic categories of L1 and
L2 systems, the allophonic features of the phonemes and the distribution of these
categories within their respective systems. Therefore, the presence or absence of these
features plays an important role in the learning of L2 speech sounds. That is, the higher
the degree of similarity that exists between the phonological systems of the source (L1)
and the target (L2) language, the easier it is for the second or foreign language speaker
to learn the phonological categories. In this sense, the notion of a phonological
system refers not only to the sounds of a language but also to the
combination of distinctive and non-distinctive features that may cause interference.2 L1
interference affects the learning of L2 speech sounds in two ways. The learners tend to
pick up only the distinctive features and to ignore the redundant ones. They also tend to
interpret the target sounds in terms of the features of their L1 sound system. However,
in another account of interference it is argued that it is easier for second-language
speakers to learn an entirely new phoneme that is absent from their mother tongue than
to learn a sound that partially resembles an L1 sound.
2 Interference is a language phenomenon that refers to the transfer of L1 rules to the learning of
L2. In theories of second-language learning (Flege 1995), a sort of language filter operates in
the learning process of a second/foreign language, whereby the norms of L1 may facilitate learning
or inhibit it. In the case of similarities, L1 norms facilitate the learning of L2 through positive
transfer. However, a negative effect often takes place, and this is normally associated with the
differences between L1 and L2. This negative transfer is also called interference (Miller 1981).
Native speakers can identify foreign accents that appear in the speech produced by L2 speakers. Therefore,
pronunciation errors of second-language learners are not just random attempts to
produce unfamiliar sounds but rather reflect the sound inventory, rules of combining sounds, and
the stress and intonation patterns of their native languages (Ohata 2007).
All in all, learning problems of L2
phonemes occur when a second-language speaker starts from the assumption that L2
speech sounds are the same as those of his/her L1. In this sense, the learners start by
using their L1 perceptual strategies in recognizing or producing the new language
sounds. Contrastive analysis is a branch of linguistics that seeks to identify the types of
phonological errors that EFL/ESL speakers make when perceiving and pronouncing a
second or foreign language. Moreover, contrastive analysis makes predictions with
regard to a hierarchy of difficulties, which is based on the new phonemes, new
allophones and new sequences, i.e., those aspects that stand out as the distinctive
properties of the target language (Brière 1966, Hoffer 1970). The phonetics of a
language should also be considered since it causes many of the difficulties facing the
ESL/EFL learners. Contrastive analysis in this section aims to make predictions about
the types of errors that might be a true reflection of learning problems. It attempts to
show the degree of dissimilarity between the sound inventory of English as the target
language, and of Arabic, which is the first language of Sudanese EFL learners.
Importantly, the discussion of Arabic considers both Modern Standard Arabic (MSA)
and Sudanese Colloquial Arabic (SCA): it starts with MSA and then moves on to
SCA, discussing the areas in which the two differ. In the present research it would
seem impossible to make a unique choice between the two varieties, i.e. MSA and SCA,
as the EFL learner’s native language background, and it is precisely for this reason that
I will assume that both varieties have to be considered together when accounting for
learning problems experienced by Sudanese students of English. Several considerations
have led me to this decision.
Firstly, MSA forms the common base from which the phonemes of Arabic dialects
stem. Secondly, as part of the educated class, the learners’ everyday communication is
not totally free from MSA. The learners’ shifts to MSA may arguably influence their
colloquial Arabic, serving to reduce the differences between Sudanese Colloquial
Arabic and Modern Standard Arabic. Thirdly, the context of Arabic linguistics is characterised
by what is known as diglossia, i.e. a language situation in which two varieties
of a language are used side by side. The two varieties at issue are MSA and the spoken
vernaculars. The vernaculars, which are used in everyday communication across the Arab
world, are characterized as more mutable and flexible forms of language than MSA.
This language reality reinforces the argument that the existence of two varieties side by
side serves to narrow the distance between these varieties. The reality of Sudanese
Arabic supports these arguments, as its sound system shows only limited change relative to MSA.
Its vowel inventory developed /e/ and /o/. As for consonants, /ð, z/ are merged into
/z/ and /θ, s/ into /s/, whilst /q/ is pronounced /g/. Furthermore, Sudanese Arabic
permits no diphthongs or consonant clusters at all (details in §§ 2.1.2.1, 2.1.3, 2.1.4). So,
it is possible to argue that all other vowels and consonants do not differ from MSA.
According to Ryding (2005), the Arabic language context does not show a sharp
division between the written and spoken forms of Arabic varieties across the Arab
world as might be the case in some other languages. There is a continuum of
language ordering, which runs from high to low. Thus, MSA takes the highest position,
followed by formal (a spoken standard form, see Long 1996) and colloquial varieties.
But this reality is conditioned by several factors such as the speakers’ academic
background and the use of Modern Standard Arabic as a means of communication on
television and other public media everywhere in the Arab world. Importantly, the use of
Standard Arabic and colloquial varieties side by side can eliminate dialects.
To the best of my knowledge, there is no comprehensive experimental work involving
Arabic phonetic and phonological properties as part of a contrastive analysis of L1 and
L2. Such a contrastive analysis would be very time consuming and it would have to be
done (at least) twice, i.e. for both MSA and for the learners’ local vernacular. In the
context of the present dissertation I have given priority to establishing the difficulties in
speech production and perception of Sudanese learners of English. The issue of
accounting for the difficulties experienced by these learners in terms of native language
interference was therefore dealt with not so much on the basis of L1 data collected by
myself, but on existing sources in the literature. When referring to the sound structure
of MSA, I will be able to rely on the results of experimental studies to some extent. In
the case of Sudanese Colloquial Arabic, however, my information will be limited to
impressionistic descriptions such as Dickins (2007) or even unpublished conference
papers such as Abdalla (2001).
2.1.2 Acoustic and perceptual characteristics of vowels
Vowels are characterized by a free passage of the air stream. It is possible to describe
and feel the movement and posture of the tongue in relation to the relatively passive surfaces of the
vocal tract; however, no closure or stricture occurs when vowels are produced.
Importantly, both auditory and articulatory means are needed for the
perception and description of vowels. In this context, the ear is all-important for this
task since speech can be seen as a matter of input and output.
2.1.2.1 English and Arabic vowels
Arabic is a language which has a small inventory of vowel sounds. Its vowel system is a
classical triangular system that maintains the Proto-Semitic vocalism represented as
open, close back, close front: /a, u, i/ (see Figure 2.1), each of which may be short or
long (geminated) (Kaye 1997). These three vowels are often described as diacritics,
which refer to special unwritten marking interpreted as short /a, u, i/.3 Munro (1993)
reports a similar description, according to which Standard Arabic has three basic short
vowels /i, u, a/, of which /i/ is realized as /ɪ/, /u/ as /ʊ/ and /a/ as /æ/, but he adds
that there are five long vowels realized as /iː, eː, aː, uː, oː/. Arabic has only two
diphthongs /au, eɪ/ (Hayat 2005). However, Mitchell (2004) states that the diphthongal
feature is absent from the Arabic speech sound system. There is variation in
vowels across Arabic dialects. According to Dickins (2007) the Sudanese vowel
inventory contains five short vowels /i, u, a, e, o/ and five long vowels /iː, uː, aː, eː, oː/,
which uncontroversially form an extension of the short vowels (see also Munro 1993,
Raimy 1997). However, in Sudanese Arabic, /e/ is also realized as a reduced form of
/eɪ/, whilst /o/ is a reduced form of /aʊ/ and often realized as /u/. Moreover, in
Sudanese urban Arabic there is alternation between /i/ and /j/ on the one hand, and
between /u/ and /w/ on the other, depending on the position of /j/ or /w/ in the
syllable. Since no vowels are possible in initial position in Arabic, the alternation is
analysed as an underlying phoneme /j/ which is realized as /i/ in nucleus position but
remains a consonant /j/ in marginal position. Similarly, /w/ is realized as /u/ in the
nucleus and as a consonant /w/ in peripheral position. However, Sudanese Arabic /j/
can often be represented better by /iː/ than by /i/, whilst /w/ can be represented by
/uː/ rather than /u/ in nucleus position. This is clear in some Arabic words
such as /iːd/ ‘an annual occasion (festival in Arabic culture)’ versus /id/ ‘water well’
and /guːl/ (‘say’ in Sudanese Arabic) versus /qul/ (‘say’ in MSA) and so on.
3 In the Arabic script, the harakat (diacritic marks) are special unwritten marks (they are not part
of the Arabic alphabet or ordinary spelling, but are understood from context) which represent the short
vowel sounds /a, i, u/. The literal meaning of harakat is ‘movements’, e.g., in the context of
moving airwaves that we produce while pronouncing vowels. Diacritic marks stand for English
lax vowels /a, i, u/ (Chomsky and Halle 1968, Hayat 2005, Alan 1997). This characteristic affects
the ability of Arab learners of English to extract and process the English vowels, which form part
of English words. That is, the Arabic orthography of the daily newspapers does not use diacritics.
Native speakers of Arabic focus only on consonants, the structure of which encodes the roots
with general semantic value. This process cannot be applied to vowels in English (Fender 2008).
Figure 2.1. Arabic vowels as described in classical triangular Proto-Semitic (after Kaye 1997).
The figure stands for the original Arabic vowel system which forms the base of Modern Standard
Arabic and other Arabic dialects.
Importantly, Arabic vowels play an essential role in syllable
and word formation: they do not bear lexical meaning as consonants do, but they act as
connectors in word structure. This means that, in word structure, vowels form
constituent morphemes sprinkled through the word rather than occurring as
continuous segments. This characteristic is clear in word families such as (darasa ‘he
studied’) and (hamala ‘he carried’) where the a-vowels are inflectional affixes. It is worth
noting that, in these families of semantically related words, the only constant formal
property is that each stem has three consonants in a fixed order (drs and hml,
respectively). Vowel-consonant interspersion, in a way, reveals deep regularities in the
nature of Arabic vowels and how/where they work. It also reveals that the distribution
of consonants versus vowels in Arabic is determined by the CV template that
characterizes the morphological category a given word belongs to; i.e. it marks
inflectional categories such as tense in verbs and number in nominals, etc.
(Kenstowicz 1994, Frisch 1996, Nwesri, Tahaghoghi and Scholer 2006). In English, the
consonant versus vowels distribution is lexically contrastive (cf. VCC art, CVC rat and
CVCC taunt).
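The constant three-consonant stem in the word families mentioned above can be made visible with a very small sketch. It assumes a simplified Latin transliteration in which only a, i and u are vowels, and the function name consonantal_root is hypothetical; long vowels, glides and the emphatic consonants would need a more careful treatment than this.

```python
# Assumption: in this simplified transliteration only a, i and u are vowels.
VOWELS = set("aiu")


def consonantal_root(word):
    """Strip the vowels from a transliterated word, keeping the consonants in order."""
    return "".join(ch for ch in word.lower() if ch not in VOWELS)


for form in ("darasa", "hamala"):
    print(form, "->", consonantal_root(form))
```

Applied to the two forms cited in the text, the sketch leaves drs and hml, i.e. the consonantal stems around which the vowels are interspersed.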
On the other hand, the English vowel system is complex (see Figure 2.2 below). It
consists of nineteen (or even twenty) vowel phonemes. These include eleven (twelve if
/ə/ is accepted as a separate phoneme) pure vowels (or: monophthongs) and eight
diphthongs in stressed position, which can be categorized in different terms. Vowel
production involves the position of the lips, the position of the tongue, the part of the tongue used
and the degree of raising. With respect to tongue position in the mouth, there are three
distinctions in RP English, i.e. front vowels /iː, ɪ, e, æ/, central vowels /ɜː, ʌ/ and back
vowels /uː, ʊ, ɔː, ɒ, ɑː/. The back vowels, with the exception of /ɑː/, are rounded while the
front and central vowels are unrounded. In terms of the degree of tongue height, /iː, ɪ, uː, ʊ/
are high vowels, /e, ɜː, ɔː/ are mid vowels and /æ, ɑː, ʌ, ɒ/ are low vowels. The RP vowels are divided into
tense/long and lax/short vowels (force of articulation). This contrast is primarily one
of vowel quality; the difference in duration is only a secondary cue to the tense/lax
distinction. The tense vowels occur in both closed and open syllables whereas the lax
vowels may only occur in closed syllables. Tense vowels are accompanied by ‘ː’ as a
length mark, such as in /uː, ɜː/. Importantly, the distinction between English short and
long vowels depends upon three oppositions, which make the task more complex (see
Dretzke 1998). Another important feature of English vowels is that the members of a tense/lax
pair, e.g. /uː, ʊ/, can often be distinguished by quality alone as in foot/boot, or by quality and
quantity as in good/food, while the quality of /ʊ/ has to be kept quite distinct from that
of a reduced form of /uː/. This feature can cause learning problems for ESL/EFL
learners whose native languages have a small vowel inventory. Additionally, English has
sequences of vowels known as diphthongs. These are /eɪ, əʊ, aɪ, aʊ, ɔɪ,
ɪə, eə, ʊə/. Diphthongs are vowel sounds that have a glide within the syllable. The first
element in English diphthongs is called the starting point and the second is the one towards
which the glide is made. The diphthongs mentioned above are illustrated by words such
as laid, load, lied, loud, Lloyd, leered, laird and lured, respectively. The centring diphthongs
/ɪə/, /eə/ and /ʊə/ are a prominent characteristic of British English (Mitchell 2004). A
number of generalizations apply to RP diphthongs. Their length is equivalent to that of
long vowels and they are susceptible to regional variation (Cruttenden 2008).
Figure 2.2. English pure (or: monophthongal) vowels (after Roach, Hartman and Setter 2006).
Vowels are described in relation to tongue and lip positions. High vowels are at the top of
the chart, mid vowels in the middle and low vowels in the lowest part of the chart.
The horizontal dimension captures the front (left) to back (right) distinction.
2.1.2.2 Length feature
Vowel length is an important temporal cue, which classifies vowels into short
and long tokens. In Arabic, all three vowels /a, u, i/ are subject to a short/long
distinction. Similarly, English possesses a short/long contrast; however, in English,
vowel duration is influenced by the following consonant and other environmental
features (Mitchell 2004). In Arabic, short and long vowels are clearly different from
each other. Long vowels tend to be twice as long as the short ones. In this sense, there
is a possibility that the short/long vowels of Arabic across dialects correspond to
the equivalent English short/long vowels (Munro 1993, Mitleb 1981). However, Arabic
listeners/speakers may attend to more than just acoustic vowel duration to distinguish
between short and long vowels (Tsukada 2009). Arguably, Sudanese Arabic applies a
similar duration strategy in distinguishing between short and long vowels.
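The statement that long vowels tend to be about twice as long as short ones can be expressed as a simple duration ratio. The sketch below is only an illustration: the durations are invented values in milliseconds, not measurements from this study or from the sources cited, and the function name long_short_ratio is hypothetical.

```python
from statistics import mean


def long_short_ratio(short_ms, long_ms):
    """Ratio of the mean duration of long tokens to the mean duration of short tokens."""
    return mean(long_ms) / mean(short_ms)


# Hypothetical durations (ms) of short /a/ and long /aː/ tokens.
short_a = [80, 90, 85]
long_a = [170, 175, 165]

ratio = long_short_ratio(short_a, long_a)
print(ratio)
```

A ratio close to 2 would be in line with the description given above, whereas a ratio close to 1 would indicate that duration alone does not separate the two categories.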
2.1.2.3 English and Arabic vowel formants
In terms of spectral properties, Arabic and English vowels differ in several
respects. The back vowels of the two languages differ in the direction and extent of
F1 and F2 movement. The Arabic back vowels have higher F2 frequencies
than their English counterparts.4 Similarly, the English /æ/ produced by Arab learners tends to
be closer to Arabic /a/; however, the front vowels /ɪ~iː/ show no serious spectral
problems. In general, Arabic effects on L2 vowel production are pervasive in all vowels
(Munro 1993). Specifically, Sudanese Arabic shows differences in vowel spectral
properties (i.e., L1 and L2 formant values). That is, the Sudanese Arabic long vowels /iː,
aː, uː/ show relatively lower F1 and F2 values compared with their English counterparts
(Elobeid and Maaly 1996).
4 For an explanation of vowel formants (resonances) see § 6.3. Here it suffices to know that the
lowest resonance frequency, F1, reflects the degree of mouth opening; the second lowest resonance
frequency, F2, is related to vowel backness and lip rounding.
Figures 2.3-4 provide a comprehensive survey of the F1, F2 and F3 values of Arabic and
English vowels. In general, the formants of Sudanese vowels tend to have lower values
compared with their English counterparts (cf. Figures 2.3 and 2.4). However, all
Sudanese Arabic vowels show formant directions relatively similar to those of English.
This information suggests that the durations of the Sudanese Arabic vowels may
correspond in some way to English vowel durations. It also implies that the
Sudanese EFL learners may not have problems producing the English durations of short
and long vowels.
Figure 2.3. F1, F2 and F3 values (plotted vertically, in Hz) of Sudanese Arabic vowels (after
Alghamdi 1998).
Figure 2.4. F1, F2 and F3 values (plotted vertically, in Hz) of English vowels (after Deterding
1997).
2.1.2.4 Predictions of learning problems of English vowels
Linguists believe that the learning problems with L2 phonemes experienced by a
second-language learner can be predicted to some extent from differences in phonemes and
allophones, the absence of a sound, the distribution of these sounds within the syllable and the
functional load of these sound units in the two languages.
This section provides linguistic information about the similarities and differences that
exist between English and Arabic language sound systems. The section will attempt to
survey the types of learning errors which may occur due to phonetic and phonological
differences between English and the learners’ L1 (Arabic) using the data of the related
studies.
Table 2.1 below provides some patterns of phonemes which exist in the English vowel
inventory but which may or may not exist in the Arabic inventory. This information is
useful in making predictions of the learning problems which Sudanese learners of
English are assumed to face.
Table 2.1 Some predictions of learning problems of English vowels. It provides accounts for the
sort of errors assumed to be made by Sudanese EFL learners.
Vowel           Learning problem or error
/æ/             Different from that of Arabic; often realized as /aː/.
/ɑ/             Absent from the Arabic inventory; may be confused with other English
                vowels. It may be confused with Sudanese /aː/ and /o/.
/ʌ/             Distinctive for English and totally absent from Arabic. Learners may
                find it difficult to learn this sound or may confuse it with English vowels
                such as /ɔ, ɔː/ or Arabic /a/.
/ɜː/            Absent from Arabic. It is expected to be replaced by English sounds
                like /e/ or /ɔː/.
/eɪ/            Standard Arabic has /eɪ/. The vowel inventory of Sudanese Arabic
                has /e/ but it does not have /eɪ/. So, this diphthong might be
                reduced to /e/ or confused with /ɜː/.
/uː/            Different from Arabic /u/. It may be substituted for tense vowels
                like /ɔː/ or /uː/.
/ɑː/            Almost absent from Arabic as there is no qualitative opposition to it.
                It may be difficult to recognize or pronounce.
/ɔː/            Different from Arabic. It may be difficult to recognize or pronounce
                and is often substituted for English /ɔ/. Learners may also
                substitute it for Arabic /ɔː/, which should be articulated with a more open
                mouth. In RP the vowel is slightly higher while the lips are closely
                rounded.
/ɔː, ʊ/         Absent from Arabic. It may be difficult to recognize or pronounce.
                Learners may find it difficult to distinguish between these vowels,
                particularly in words such as taught, saw and ought, which share /ɔː/.
/e/             Absent from the Arabic vowel inventory. It may be difficult to
                recognize or pronounce.
/ʊ, uː, ɪ, iː/  Similar to Arabic high front /i/ and back /u/. However, learners
                may substitute these vowels due to cross-language differences.
                These English vowels often require quality, quantity or both.
/e, æ, ʌ/       The combination of these short vowels can cause perception or
                pronunciation problems, which lie in the establishment of the
                qualitative opposition between /e~æ/ in bed ~ bad, /æ~ʌ/ in pat ~ putt.
2.1.3 The consonants of English and Arabic
The first language of the subjects is Arabic, a language with at least 28 consonantal
sounds. These are the obstruents /b, d, t, k, f, z, s, m, n, ð, θ, dʒ, ʃ/, approximants /w, l,
j/, trill /r/ and the back consonants glottal /ʔ, h/, uvular /ɣ, x, q/ and pharyngeal /ħ, ʕ/,
plus the emphatic stops and fricatives /tˤ, dˤ, ðˤ, sˤ/ (Huthaily 2003, Kaye 1997, Laufer
1988). English, the target language, has 24 consonants: /b, p, d, t, g, k, dʒ, tʃ, v, f, ð, θ, z, s,
ʒ, ʃ, m, n, ŋ, l, w, j, h/ and an approximant /r/. In principle, some similarities
exist between English and Arabic consonants, in a wide range that includes obstruents,
nasals and approximants (see Suhana 2001). However, some consonants have specific
characteristics that mark them as unique due to categorical phonemic differences.
Arabic has a considerable number of plosives at different places of articulation. It has
both voiced /b, d/ and voiceless stops /t, k/. However, unlike English, Arabic lacks
/p/ and /g/. In some Arabic dialects such as Iraqi and
Lebanese there is a voiceless /p/, probably due to the influence of Persian (Kaye 1997).
Along the same line, the phonemic system of Sudanese Arabic (SA) (see Figure 2.1
below) has /g/ instead of the uvular /q/. In fact, /g/ is often used by Bedouins in the
place of /q/, which suggests that the latter is the original phoneme (Karouri 1996).
Arabic has a large number of fricative sounds, including four pairs that show a voicing
contrast and three voiceless fricatives with no voiced counterpart. In terms of
articulation, the fricative pair /ð, θ/ are dental in English whilst in Arabic they are
rather inter-dental sounds. These fricatives are absent from many of the languages of the
world, which makes them a major source of mispronunciation for ESL/EFL
learners. This also applies to the consonant inventory of Sudanese Colloquial Arabic:
i.e. the inter-dental fricatives do not exist in the Sudanese IPA phonetic chart. They
have merged with the apico-dental fricatives /s/ and /z/.
The Arabic voiced palatal approximant /j/ and voiced labial-velar approximant /w/ are
found in many languages of the world (often called semi-vowels – see Arabic vowels
above). More importantly, /w/ has two interpretations in Sudanese Arabic. In the
phonetic literature, it is formally described as a labial-velar. On this view, the
description of /w/ as a bilabial glide reflects a phonetic realization in which labiality is the
primary articulation and velarity is a concomitant secondary feature which forms a
natural corollary of this labiality. Other linguists, however, claim the opposite. According
to Dickens (2007), the phonemic system of Sudanese Arabic makes this dual
interpretation more plausible. That is, Sudanese Arabic has /w/ articulated both as
a post-dorso-velar, in terms of standard articulation, and as a palatal-velar, in terms of
the functionalist analysis.
Sudanese Arabic also contains /x/, a sound produced in the same place in the mouth as
English /g/ but with a fricative sound. It is usually transliterated as <kh> and
corresponds to the final sound in Scottish loch ‘lake’ or German lach ‘laugh’. One more
distinctive Arabic consonant is /q/. It is called a uvular because the tongue touches the
uvula. Another consonant is pronounced even farther back, the tongue touching the
back wall of the throat (pharynx) just enough to produce a hissing sound like //. This
consonant forms one of the most distinctive Arabic sounds. The glottal Arabic sound
/ʔ/ is classified as a stop consonant. However, speakers of Arabic as a second or foreign
language often consider it a vowel sound. In this respect, it is important to
note that no syllable or word in Arabic starts with a vowel. Therefore, if an
Arabic word is heard to begin with a vowel, it actually begins with a glottal stop /ʔ/
(hamza). Non-native speakers tend to classify many Arabic words, such as umma /umma/
‘nation’ and usbuu /usbuː/ ‘a week’, as beginning with a vowel. This is probably because they
are not familiar with this type of phoneme, which forms a real stop in Arabic (see
Ryding 2005).
The voiceless affricate /tʃ/ is absent from the Arabic consonant inventory; the only
post-alveolar affricate that exists in Arabic is the voiced /dʒ/. Some Arabic dialects, e.g.
Iraqi and Lebanese, do have /tʃ/, but Sudanese Arabic does not.
The development of /tʃ/ in some Arabic dialects is probably due to the influence of Persian,
which possesses this sound. The English consonant inventory, on the other hand,
includes both the voiced and the voiceless post-alveolar affricates /dʒ, tʃ/.
English shows allophonic (non-contrastive) differences which in Arabic constitute
separate phonemes. In English the phones [p, pʰ, t, tʰ, k, kʰ] are interpreted as only
three phonemes, because each phoneme is associated with a predictable set of phonetic
segments: /p/ is realized as [p] after a voiceless alveolar fricative, as in
spell, and as [pʰ] elsewhere. Thus, English has aspirated and plain allophones of three
phonemes /p, t, k/ (Carr 1999). The situation is different in Arabic, where
/b, t, d, k, q, ʔ/ and the emphatics /tˤ, dˤ/, which together represent the plosive
sounds, are treated as clear-cut phonemes. Moreover, only two voiceless stops /tʰ, kʰ/
are aspirated (Odisho 2005). More specifically, Sudanese Arabic has /D, V, F, M, V, F/
missing, /S/ is pronounced as/I/ and often /!/ is changed to /CÖ/. Therefore, drawing
clear-cut boundaries between similar consonants and those that possess phonological
specificity is often difficult. Allophonic features such as these can lead to perceptual or
productive problems with English speech sounds, particularly across varieties of
dialects. Other things to be considered are differences in the place of articulation,
context and acoustic features of the phonemes of L1 and L2 that play a major role in
showing the identity of a consonant in a language. These factors combined are expected
to add to the intricacy of speech perception and production problems that Sudanese
EFL learners face when learning English consonants.
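The allophonic rule for English voiceless stops described above can be sketched in a few lines of Python. This is only an illustration of the simplified rule as stated in the text (plain after /s/, aspirated elsewhere); the function name `realize` and the representation of a word as a list of IPA symbols are assumptions made for this sketch, not part of the study.

```python
# Simplified rule from the text: /p, t, k/ are plain after /s/, aspirated elsewhere.
# The name `realize` and the list-of-symbols input are illustrative assumptions.
VOICELESS_STOPS = {"p", "t", "k"}


def realize(phonemes):
    """Map a list of phoneme symbols to a list of surface allophones."""
    surface = []
    for i, seg in enumerate(phonemes):
        after_s = i > 0 and phonemes[i - 1] == "s"
        if seg in VOICELESS_STOPS and not after_s:
            surface.append(seg + "ʰ")  # aspirated allophone
        else:
            surface.append(seg)        # plain allophone, or any other segment
    return surface


print(realize(["s", "p", "e", "l"]))  # spell
print(realize(["p", "ɪ", "l"]))       # pill
```

Because the aspirated and plain variants never contrast in the same environment, a single phoneme symbol is enough in the input; the rule supplies the phonetic detail.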
It is important in this context to discuss the effect of Arabic emphatics on the
surrounding vowels. The Arabic emphatics /tˤ, dˤ, ðˤ, sˤ/ represent a set of complex
phonemes that are produced with a primary coronal articulation accompanied by the
withdrawal of the tongue body into the pharynx. In terms of place of articulation,
emphatic sounds are similar to their non-emphatic counterparts /t, d, ð, s/. However,
with emphatics the tongue covers the area extending from their main place of
articulation to a portion of the palate opposite the tongue, which is raised towards the
palate. This articulatory feature distinguishes emphatics from their non-emphatic
counterparts. This is probably the reason why Arab grammarians refer to
these phonemes as muṭbaqah (‘covered’).
In terms of distribution, emphasis is only a feature of a CV syllable (but not VC),
which represents its minimum span, e.g. /tˤɪb/ ‘medicine’, /dˤɪd/ ‘against’, etc. It
is also possible to find more than one emphatic in words containing two or more
syllables.
More specifically, Sudanese colloquial Arabic adopts a similar emphatic system.
However, it has the additional emphatic forms /rˤ, lˤ, gˤ/. Moreover, in a number of words the
emphatic consonant is replaced by its non-emphatic counterpart, particularly in the case
of /sˤ/. In fact, according to Watson (2002), some dialects of Sudanese Arabic have
totally lost emphatic /sˤ/. My personal observation is that the loss of emphatic sounds
in Sudanese Arabic dialects is neither regular nor complete. For instance, whilst Central
Arabic (in the area around Khartoum) tends to be just less pharyngealized, the Buttana
dialect (a large region in Eastern Sudan) has lost emphatic /sˤ/ almost completely. However,
in words such as /raːs/ ‘a head’ and /rʊsuːm/ ‘fees’, native speakers of Sudanese colloquial
Arabic change /s/ to /sˤ/. This new use of emphatic sounds is common among
Baggara Arabs in Western Sudan (Shuwa Arabs).
Phonetic influence of emphatics: One important issue to be discussed in this context is the
phonetic effect of emphatics. Traditional analysis of Arabic provides little acoustical
information on emphatics. However, it accounted for the contrast between emphatic
and plain forms in terms of the orthographical system of Arabic. According to this
analysis, the only evidence of contrast between plain and emphatic consonants is that
only forms with emphatic graphs in the spelling of Arabic can be considered emphatics.
For example, plain and emphatic forms /t, tˤ/, as in /tɪːn/ ‘figs’ and /tˤɪːn/ ‘clay’, and /d,
dˤ/ as in /raːkɪd/ ‘still, e.g. still water’, and /raːkɪdˤ/ ‘runner, jogger’ respectively,
constitute evidence of contrast. On the other hand, consonants like /n, k, m/, etc. are not
considered emphatics since they do not have emphatic counterparts in Arabic orthography
(Lehn 1963). In synchronic descriptions, however, one generally finds no claims
to the effect that the emphatic consonants per se are acoustically different from their
plain counterparts. The emphatic~plain contrast is apparent only from the effect of the
contrast on the adjacent vowels. Vowels following emphatics show a raised
F1 and a lowered F2 in comparison to their counterparts in a plain environment
(Obrecht 1968, Ahmed 1984, Newman and Verhoeven 2002, Watson 2002). In a
recent study, Jongman, Herd and Al-Masri (2007) reported that the effect of emphatics
on following vowels is clear in all Arabic dialects, and that the F2 lowering in short
vowels appears to be stronger than in long vowels. Moreover, compared to high vowels
such as /i, u/ the F2 lowering effect is stronger for low vowels, such that it results in a
different vowel quality for /a/: in an emphatic environment, the following vowel /a/ is
heard as /o/. 5
Arguably, Sudanese Arabic emphatics will affect the learning of English vowels. The
learners may transfer the emphatic feature when learning English /t/ in words such as talk,
taught, (lap)top, tall, tough, etc., pronouncing it as emphatic [tˤ]. The reason for this is that
in the learners’ L1, most CVC syllables with back vowels /o, aː, oː, u/ begin with /tˤ/.
On the strength of this assumption, I expect that word categories such as the above will
have different F2 and F1 and thereby different vowel qualities than similar words not
beginning with /t/. I will highlight some more phonological contrasts in the next
section.
5 Although the literature mentions effects of the plain~emphatic contrast on following vowels
only, it seems to me that the effect is more or less symmetrical and should also affect the F1 and
F2 of preceding vowels. It is precisely for this reason that the contrast can also be perceived in
pre-pausal position, i.e., when no vowel follows the consonant.
2.1.3.1 Phonetic symbols of English and Arabic
Figures 2.5-6 provide a description of the articulatory systems of English and Arabic.
They provide information on the distribution, place and manner of articulation of
speech sounds in each language.
Figure 2.5 presents the consonants of Sudanese Arabic (SA). It provides background
about the number and characteristics of sounds and how they differ from Standard
Arabic, whilst Figure 2.6 illustrates the English consonants in terms of number and
distribution. These charts allow the readers to compare the phonetic and phonological
distribution of the speech sounds in English and Sudanese Arabic.
[Figure 2.5: chart of the Sudanese Arabic consonants by place of articulation (labial to
glottal) and manner of articulation (stop, fricative, nasal, liquid, glide).]
Figure 2.5. Phonetic representation of the Sudanese Arabic phonemic system (after Dickens 2007).
Place of articulation
Manner of articulation  Bilabial  Labio-dental  Dental  Alveolar  Alveo-palatal  Palatal  Velar  Glottal
Stop                    p b                             t d                               k g    ʔ
Fricative                         f v           θ ð     s z       ʃ ʒ                            h
Affricate                                                         tʃ dʒ
Nasal                   m                               n                                 ŋ
Lateral approximant                                     l
Retroflex approximant                                   r
Glide                   w                                                        j
Figure 2.6 English consonant sounds (after Roach, Hartman and Setter 2006).
2.1.3.2 Prediction of learning problems
This section provides linguistic information about the similarities and differences that
exist between the English and Arabic sound systems. Phonemes that exist in the
English consonant inventory may or may not exist in the Arabic inventory. This
information enables the researcher to make predictions of learning problems.
Table 2.2. Some predictions about the learning problems of English consonants. It provides
accounts for the sort of errors assumed to be made by Sudanese EFL learners.
Consonant           Learning problem or error
/p, t, k, b, d, g/  Arabic has /b, d, t, k/ similar to English, but it does not have /p/ or /g/.
                    Sudanese Arabic has a voiced /g/ similar to that of English. Learners
                    are expected to have problems with the English voiceless /p/. Moreover,
                    there is an expected difference in VOT between the English and
                    Arabic plosives.
/v/                 Absent in Arabic, but voiceless /f/ exists. Learners are expected to have
                    a problem learning /v/.
/ð, z/              Although these consonants uncontroversially exist in the Arabic inventory
                    as interdental and alveolar voiced fricatives, in Sudanese Arabic /ð/ has
                    merged with /z/. Learners may experience difficulty in learning English /ð/.
/θ, s/              These sounds exist in the Arabic inventory as interdental and alveolar
                    voiceless fricatives. In Sudanese Arabic /θ/ has merged with /s/. Learners
                    may have problems distinguishing between /θ/ and /s/.
/n, m, ŋ/           Arabic has nasals similar to those of English; however, the English /ŋ/
                    is absent from the Arabic inventory.
/tʃ/                The English affricate /tʃ/ is not part of the Arabic consonant inventory.
                    However, Arabic speakers are not expected to have learning problems
                    with this sound. The voiceless /tʃ/ exists in some Arabic dialects; it is
                    borrowed from Persian.
/dʒ/                This English voiced affricate has an equivalent in MSA. However, the
                    Sudanese Arabic inventory has /ʒ/ (jim).
/w/                 Similar to Arabic /w/; however, there is a slight difference in the manner
                    of articulation. Moreover, in Sudanese Arabic this phoneme is classified
                    both as a bilabial and as a post-dorso-velar consonant. Therefore, learners
                    may substitute it for English sounds such as /r/ and /l/.
2.1.4 English and Arabic syllable structure
English and Arabic have different rules of syllable and word construction. English
syllable structure is flexible. It permits a wide range of syllable patterns such as CV,
CVC, VC, CVCC, CCV, CCVC, CCVCC, and so on. The syllable structure of Modern
Standard Arabic is considerably more restricted. It allows: (i) a light or open syllable
which includes CV and CVV, in words such as /maː/ ‘not’, /fiː/ ‘in’ and (ii) a closed
syllable CVC as in /mɪn/ ‘from’ and (iii) super-heavy syllables which include the
following types: CVC1C1 and CVVC1C1 (with geminate coda clusters), CVC1C2 and
CVVC1C2 (with non-geminate coda clusters) as well as CVCCV and CVVCV (Mitchell
2004). Importantly, the CV and CVC syllable patterns frequently occur in Arabic
prepositions, whilst the super-heavy types prevail in nouns, verbs and derivations from
these lexical categories. The Arabic CVCC syllable type is only allowed after a pause
(which is orthographically indicated by a diacritic mark called ‘sukun’). Sudanese Arabic
and MSA have similar syllable patterns. However, the final consonant cluster in the
CVCC type is frequently split in Sudanese Arabic by vowel epenthesis.
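The syllable templates discussed above can be made concrete by reducing a transcription to its CV skeleton. The sketch below is only an illustration: the function name `cv_skeleton`, the small vowel set, and the convention that each segment is one character (with the length mark ː counted as a second V slot) are assumptions made here, not part of the study.

```python
# Illustrative sketch: reduce a broad transcription to its CV skeleton.
# Assumptions: one character per segment; a length mark adds a second V slot.
VOWELS = set("aeiouɪʊ")
LENGTH_MARK = "ː"


def cv_skeleton(segments):
    """Return the CV pattern (e.g. 'CVC') of a transcription string."""
    pattern = []
    for seg in segments:
        if seg in VOWELS or seg == LENGTH_MARK:
            pattern.append("V")
        else:
            pattern.append("C")
    return "".join(pattern)


for word in ["maː", "fiː", "mɪn", "kalb"]:
    print(word, cv_skeleton(word))
```

With this convention the examples cited in the text come out as CVV (/maː/, /fiː/) and CVC (/mɪn/), while a form ending in a two-consonant coda is classified as CVCC.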
The VC syllable type is a common feature of English word structure where many words
start with a vowel such as in echo, inch, ebb. As explained above, no syllable/word in
Arabic begins with a vowel. Also, English permits consonant clusters in the onset of
syllables while Arabic does not (see details next section). Clearly, then, the constraints
on syllable structure differ substantially between English and Arabic. The study of
these differences may provide the information needed to understand the problems
that Sudanese EFL learners have with English consonant clusters.
2.1.4.1 Consonant clusters in English
Consonant clusters are a feature of many of the languages of the world. In the 486
language sample in the World Atlas of Linguistic Structures no fewer than 425 (87%) have
clusters (Comrie, Dryer, Haspelmath and Gil 2005: feature/map 12). McLeod, Doorn
and Reed (2001) and Ramsaran (1999) state that in their sample of 104 languages that
have clusters, 39 percent have word-initial clusters, only 13 percent have final clusters,
while the remaining 48 percent have both. In English, only one third of the
monosyllabic words begin with consonant clusters, whereas the predominance of
clusters is found in word-final position. This dominance is explained by the phonemes
/s, z, t, d/, which can be appended as suffixes. When such morph-phonemes are discarded,
the incidence of consonant clusters declines to only 19%. Consonant clusters are
sequences of two or three consonants that come together in a word, without being
separated by a vowel. In English, the groups /spl/ and /ts/ are consonant clusters in
the word splits. Some linguists argue that the term can properly be applied only to those
consonant clusters that occur within one syllable. Others contend that consonant
clusters are more usefully defined when they may occur across syllable boundaries. The
longest consonant clusters in the word extra, given the conservative definition, would
be /kst/ and /str/, while the latter, more liberal view allows /kstr/. In English, the
longest possible initial cluster is CCC, as in split, whilst the longest possible final cluster
is CCCC, as in twelfths, but in practice the probability of finding final clusters longer
than three is extremely small (Ramsaran 1999).
As explained above, Modern Arabic dialects have simple syllable shapes such as CV,
CVC and CVV. The occurrence of syllables with clusters such as CVCC is largely
restricted to Modern Standard Arabic. On the dialectal level, consonant clusters are rare
in both initial and coda positions. Therefore, many Arabic dialects apply syllable repair
strategies, which are largely controlled by the sonority properties of the individual
consonants. For instance, Sudanese Arabic (SA) adopts a strategy by which a CVCC
cluster is broken up by the insertion of /i/ or /a/. Thus, /himl/ ‘a load’ becomes
/himil/ and /kalb/ ‘a dog’ becomes /kalib/. This syllabification process is very common
in both geminate and non-geminate clusters (Broselow 1992, Raimy 1997).
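The repair strategy for final CVCC clusters can be sketched as a simple string operation. This is a minimal illustration under stated assumptions: the function name `repair_final_cluster` is hypothetical, each segment is one character, the epenthetic vowel defaults to /i/ (the text also mentions /a/), and no attempt is made to model the sonority conditions that govern the process.

```python
# Minimal sketch of final-cluster repair by vowel epenthesis.
# Assumptions: one character per segment; the default epenthetic vowel is /i/.
NON_CONSONANTS = set("aeiouɪʊː")  # vowels plus the length mark


def repair_final_cluster(word, epenthetic="i"):
    """Insert an epenthetic vowel between two word-final consonants."""
    ends_in_cluster = (
        len(word) >= 2
        and word[-1] not in NON_CONSONANTS
        and word[-2] not in NON_CONSONANTS
    )
    if ends_in_cluster:
        return word[:-1] + epenthetic + word[-1]
    return word  # nothing to repair


for form in ["himl", "kalb", "min"]:
    print(form, "->", repair_final_cluster(form))
```

The same operation applied to an English word-final cluster would yield the kind of epenthesis error that the text predicts for Sudanese EFL learners.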
Another repair strategy requires some syllables to begin with a vowel. When the
passive-marking prefix /n/ is added to the (active) verb katl, ‘he killed’, resyllabification
and vowel epenthesis are required, as in in-katal ‘he was killed’. This means that SA
obeys syllable constraints that require repair strategies when an underlying form cannot
be syllabified to obtain sufficient ‘syllabic harmony’ (Kenstowicz 1994, Raimy 1997).
Finally, it is possible to argue that such vowel epenthesis strategies can affect the
pronunciation of English consonant clusters by Sudanese EFL learners.
2.1.4.2 Sequential constraints in clusters
The structure of consonant clusters is often highly complicated. It seems safe to say
that there are no two languages in the world with the same inventory of clusters.
English clusters are closely tied to the syllabic system of words, in which the syllable
is always composed of a vowel sound plus 0, 1, 2, or 3 onset and/or coda consonants
that form the consonant clusters. Moreover, English clusters are not formed in an
arbitrary way, although there is no clear rule for their formation. Be that as it may,
some researchers have provided rules based on their experience and empirical
work. For example, some clusters are sequenced as (i) /s/ + /p, t, k, f, m, n, w, l, j/ (in
this case /s/ is pre-initial) or (ii) pre-initial plus initial plus post-initial, e.g. /spr, skr/.
In the phonological sequential constraints of English, a word can start with only certain
segments. For example, if a word begins with three consonants, then the sequential
constraint must be /s + {pj, pl, pr, tj, tr, kj, kr, kw}/; any other word-initial combination
of three consonants is unacceptable even if /s/ precedes a perfectly legal two-member
cluster, e.g. /*stw, *skl, *sfj/ (Hyman 1975). 6
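The sequential constraint on three-consonant onsets can be written down as a membership test. The sketch below encodes only the set given in the text, /s/ followed by one of pj, pl, pr, tj, tr, kj, kr, kw, using standard IPA symbols; the function name `is_legal_ccc_onset` is an assumption made for illustration.

```python
# Sketch of the three-consonant onset constraint cited from Hyman (1975):
# /s/ followed by one of the two-member clusters listed in the text.
LEGAL_AFTER_S = {"pj", "pl", "pr", "tj", "tr", "kj", "kr", "kw"}


def is_legal_ccc_onset(onset):
    """Return True if a three-segment onset satisfies the constraint."""
    return len(onset) == 3 and onset[0] == "s" and onset[1:] in LEGAL_AFTER_S


for onset in ["spl", "str", "skw", "stw", "skl", "sfj"]:
    print(onset, is_legal_ccc_onset(onset))
```

Note that /tw/, /kl/ and /fj/ are themselves legal two-member onsets (twin, clean, few), yet the test rejects /stw, skl, sfj/, which is exactly the point made in the text.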
The theory of phonetic representations provides an account of these phenomena. An
utterance is a set of discrete segments which are complexes of phonetic parameters that
follow a set of phonetic combination principles (within the segment) and sequencing
principles (between successive segments). These principles function as phonetic
constraints, which govern the structure and sequential manner of segments. Among the
sequential constraints, there might be certain conditions that limit the maximal length
of consonant clusters. Such constraints vary from language to language and each
phonetic system within such a language has a specific set of representations that serve
to account for its segmental processes (Chomsky and Halle 1968). Further, Shibatani
(1973) states that phonetically universal and idiosyncratic constraints play an important
role in the arrangement of segments and features. For example, Japanese has a
constraint which permits only syllabic nasals to occur in word-final position, whilst
German and Dutch have a constraint which permits only unvoiced but not voiced
obstruents to occur word-finally. Arguably, Sudanese Arabic allows no word to end in
consonant clusters. This circumstance might lead one to predict that vowel epenthesis
may be found in word-final clusters in EFL spoken by SA learners.
6 /j/ in an English onset cluster may only occur before /uː/ as in spurious, stew, skewer, furious.
/*skl/ occurs in sclerosis but is ruled out since this is a (Greek) loan word.
2.1.4.3 Predictions of learning problems
This section provides linguistic information about similarities and differences that exist
between consonant sounds in English and Arabic. It provides patterns of the clusters
which exist in English but which may or may not exist in Arabic. This information
helps to predict learning problems.
All English initial and coda clusters may represent learning problems for Sudanese EFL
learners as these types of sequenced consonants are absent from the Arabic inventory
(see literature – next section). Some of the more obvious learning problems involving
consonant clusters are exemplified in Table 2.3.
Table 2.3 Predictions of learning problems of English consonant clusters. It provides accounts
for the sort of errors assumed to be made by Sudanese EFL learners.
Consonant cluster              Learning problems or errors
/s + {pl, pr, tl, tr, kl}/     Absent from the Arabic inventory. Learners may find it
                               difficult to produce these clusters. They may insert an
                               epenthetic vowel between the cluster members.
/{p, t, k} + {l, r}/           Absent from the Arabic inventory. Learners are expected to
                               alter the positions of /l/ and /r/ in these English clusters.
/nt, st, lt, lk, ld, ts, dz/   Similar patterns exist in Arabic but not in the Sudanese
                               inventory. Learners are expected to mispronounce (split up)
                               or misidentify coda clusters like these.
2.1.5 Markedness Differential Hypothesis
Another principle of error prediction is the Markedness Differential Hypothesis (MDH).
Linguists originally proposed this hypothesis for aspects of language other than phonology;
however, they now also apply it to phonology. The MDH is based on the distinction
between marked and unmarked forms of a language. An unmarked form is one which is
more common, and more frequently used in the languages of the world, than a marked
form. According to Selinker (2008), second-language studies apply the MDH to predict
that a speaker of a language with a more distinctive contrast than that which occurs in
the target language will need less effort and time to learn the new contrast than a
speaker whose L1 has less marked forms than the L2. Phonologists have proposed that
contrastive voicing is unmarked in word onsets but marked at the end of words.
English has a voicing contrast in both word-initial and in word-final obstruents, and is
therefore a marked language in this respect. German, on the other hand, uses the
unmarked system, by which the voicing contrast does not occur in word-final position.
As a consequence of this, English speakers with voicing contrast in both initial and final
syllable position, e.g. tab vs. tap, are believed to have no problems producing German
words which do not have a voicing contrast in final position. Conversely, German
learners of English will have a problem pronouncing the voiced~voiceless contrast in
word-final position in English.
This section applies the markedness principle to other phonological aspects of L1
and L2 in order to locate contrasting properties. With regard to syllable structure, the
aim is to establish how a speaker whose L1 has, e.g., a syllable structure
that only permits CV will have difficulty adapting to an L2 syllable structure with more
complex syllable types such as CCV, CVC, CCCVC, and so on. Tables 2.4, 2.5 and 2.6
show examples of phonological features that are marked in English, Arabic, or in both.
Table 2.4. Linguistic facts on which the proposed markedness hypothesis bases its predictions. This
table shows the degree of marked and unmarked vowels in English and Arabic. Dialectal
variation is considered when some features exist in formal Arabic but not across dialects.
1. Vowels
Description                 Languages                               Frequency
A language maintains a      English                                 Most frequent
complex vowel system
A language maintains a      English and Arabic                      Frequent in both
short/long vowel contrast                                           languages
A language maintains more   English (Arabic has only two            Frequent in English.
diphthongal categories      diphthongs /eɪ, aʊ/. In Sudanese these  Rare in Arabic
                            diphthongs are rendered as /e, o/).
Table 2.5. Linguistic facts on which the proposed markedness hypothesis bases its predictions. This
table shows the degree of marked and unmarked consonants in English and Arabic. Dialectal
variation is considered when some features exist in formal Arabic but not across dialects.
2. Consonants
A language maintains        English: voiceless stops /p, t, k/ are  Frequent in
aspiration                  aspirated when they are syllable        English
                            initial, in words such as pot, cat, car,
                            but unaspirated after /s/ in words like
                            spew, stew, skip.
                            Arabic: /t, k/ are aspirated when they
                            appear in the beginning of a stressed
                            syllable and are released in word final
                            position.
A language with more        English and Arabic                      Frequent in
fricative sounds                                                    both languages
A language maintains        English and Arabic                      Frequent in
voicing in initial, medial                                          both languages
and coda positions
A language maintains        English: aspirated [pʰ, tʰ, kʰ]
an allophonic feature       allophones of /p, t, k/
Table 2.6. Linguistic facts on which the proposed markedness hypothesis bases its predictions.
This table shows the degree of marked and unmarked cluster consonants in English and Arabic.
Dialectal variation is considered when some features exist in formal Arabic but not across
dialects.
3. Consonant clusters
A language maintains        English and Arabic    Frequent in both languages. CVCC
CV and CVC, CVCC                                  frequent in Standard Arabic but
syllable structure                                not in Sudanese dialects.
A language maintains a      English               Frequent in English only
complex syllable
structure CVCC,
CCVC, CCCVC, or VC
2.1.6 Conclusion
The description and categorization of the vowels, consonants and consonant clusters of
English and Arabic provided above is necessary for a good understanding of the nature
of the phonological categories of the two languages. The comparison of the two languages
highlights some major similarities between the two sound systems as well as
major differences.
2.2 Background and contribution of related studies
This part provides a short overview of the section.
It reviews the literature on impediments to speech intelligibility, a problem that is
argued to be experienced by Sudanese university students specializing in English. It
describes the contribution of previous studies to this topic, accounting for the effect of
both the learner’s L1 transfer and the lack of phonological awareness of English on the
occurrence of intelligibility problems. The section also discusses the methods and
tests these studies used and their adequacy. The main concern is with the segmental
analysis of the vowels, single and cluster consonants of English, which form the basic
sound knowledge of speech. Therefore, all related literature that deals with speech
intelligibility, speech perception, pronunciation problems of vowels, single and cluster
consonants of English, will be the subject of the survey. Moreover, previous research
on speech problems of Arabic-speaking students of English forms a primary source of
information to account for the perception and production problems in English among
Sudanese learners. However, other ESL/EFL literature is also useful as a second source,
which views the topic in a broader sense, accounting for the speech problems which are
faced by non-native speakers of English from different linguistic backgrounds.
2.2.1 Language and speech
Speech refers to the expressive utterances used by human beings to communicate their
ideas. Speech is transmitted by a set of sounds that are produced by varying the
strictures along the path of the air flowing from the lungs. Such sounds, formed by the
human voice, play an effective role in that they allow the voice to bear a message through
variations in timbre (Lafon 1966). There is a difference between speech and language. As voice,
speech has characteristics that may imply certain messages. For example, it is possible
to identify a person by his/her voice as either sharp, low or loud, etc., whilst other
elements, such as the sequences of phonemes that are used to differentiate between the
words in the lexicon, are qualities of language. Thus, a language is a system of conventional
signals used for communication in a society. Such a pattern of conventions
consists of distinctive sound units such as phonemes, a vocabulary system and the
association of meaning with words. When a person performs a speech task in interactions,
all these linguistic elements are involved in the achievement of verbal communication.
Speech is composed of small units known as phonemes or segments. However, there
are differences between these sound units. A phoneme is the smallest contrastive
linguistic unit of sound that distinguishes meaning. For example, it is possible to
identify a phoneme by means of commutation, i.e., the use of minimal pairs such as
pill, bill, till, fill, etc., to show that words differ in respect of one sound (Carr
1999, Cruttenden 2008, Massaro 1975). Segments, on the other hand, represent the
major elements of speech. Moreover, a segment is a linear unit typically anchored in a
short stretch of speech by a set of phonetic features. If phonemes serve the contrastive
aspect of a language sound on syllable and word level, segments do the same job but in
the wider context of a stretch of speech.
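The commutation test mentioned above (pill, bill, till, fill) amounts to checking that two words of equal length differ in exactly one segment. The sketch below illustrates this; the function name `is_minimal_pair` and the broad transcriptions in the small lexicon are assumptions made for the example.

```python
# Sketch of the commutation test: two words form a minimal pair when their
# transcriptions have the same length and differ in exactly one segment.
# The transcriptions below are illustrative broad transcriptions.
def is_minimal_pair(a, b):
    """Return True if segment lists a and b differ in exactly one position."""
    if len(a) != len(b):
        return False
    return sum(x != y for x, y in zip(a, b)) == 1


LEXICON = {
    "pill": ["p", "ɪ", "l"],
    "bill": ["b", "ɪ", "l"],
    "till": ["t", "ɪ", "l"],
    "fill": ["f", "ɪ", "l"],
    "spill": ["s", "p", "ɪ", "l"],
}

print(is_minimal_pair(LEXICON["pill"], LEXICON["bill"]))
print(is_minimal_pair(LEXICON["pill"], LEXICON["spill"]))
```

Representing each word as a list of segments, rather than a spelling, keeps digraphs and affricates such as /tʃ/ from being counted as two differences.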
Therefore, segments are subject to more phonetic and acoustic features, e.g., the
duration of the vowel segments in words such as mitt/meat can differ from that of bid/
bead for a number of elements. Thus, as major sound units of spoken words, segments
have some articulatory features in common with each other but undergo variations due
to environmental differences, e.g., the preceding and the following vowels (Lass 1996
and Laver 2002). Therefore, the relationship between segments and phonemes can be
interpreted as a matter of realization where segments are most commonly represented
in different phonemic environments. Several adjacent sounds in connected speech may
carry information on the same phoneme, and there is an overlapping in so far as one
and the same sound segment carries information on several adjacent segments (Fant
1973, Gilbers 1992).
2.2.2 Accent
In linguistics, accent refers to the way of pronunciation that distinguishes a speaker as
belonging to a particular language environment. An accent forms a distinctive feature of
non-native speakers of a language who acquire that language at a later age (usually
above 13 or 14 years of age). This is because when learning a second language, speakers
carry the perceptual and productive traits of their L1. An accent level varies from
person to person and this depends on three factors: age, exposure and the L1
articulation system. Age is a significant determinant of the degree of a foreign accent.
That is, a second language can be spoken without an accent, but only if this language
has been acquired before a critical age limit, which varies between 6 and 12 years
(Long 1990). Next, the native language background of the speaker, the quality of the
teacher of the new language and the amount of exposure to and interaction with native
speakers combined, develop the learner’s mastery of speech and so reduce the amount
of accent (e.g. Arslan and Hansen 1996, Van den Doel 2006, Wang 2007). Lastly, the
involvement of the L1 articulation system also affects the perception and production of
speech. For instance, L1 articulatory transfer by Arabic speakers of English triggers
errors such as substitution of /ɑː/ for /æ/ in words like add, bat and dad, the confusion
of /ð/ as in there with /θ/ as in three and the use of /s/ for /z/.
Furthermore, there are two types of accent. First, a foreign accent refers to speech
produced by non-native speakers of a language in which these speakers involve their L1
perceptual and productive phonemic strategies in the learning of L2. For example, if a
person has difficulty pronouncing some of the sounds of a second language he is
learning, he may substitute similar sounds that occur in his L1. The speech such
a learner produces sounds wrong or ‘foreign’ to native speakers of the target language.
The other kind of accent is simply the way a group of people speak their native
language. This is determined by who they are, where they live and to what social groups
they belong. People who live in close contact grow to share an accent, which will differ
from the way other groups in other places speak. For example, someone who lives in
the United States will have a native accent that is different from that of a British
English speaker.
2.2.2.1 Received Pronunciation (RP)
The origin of the term RP (‘received’ pronunciation) has been subject to controversy,
but A.J. Ellis’ On Early English Pronunciation and John Walker’s Critical Pronouncing Dictionary
and Expositor of the English Language are among the sources that contributed to its appearance.
Here, received means ‘generally accepted by the best society’. Received
Pronunciation is used by the educated class, in formal affairs, and it used to be the
language variety deemed suited for radio and television. In this way, RP is not a regional
accent but is recognizable as being the standard or neutral accent (Cruttenden 2008,
Roach 2004). It is a form of English now used in Britain which dominates the areas
around London and the two historic towns of Oxford and Cambridge. In the past, RP was
the English pronunciation best represented in the BBC, courts, films, theatre,
television programmes, etc. Bernard Shaw’s plays Arms and the Man and Pygmalion
represent a real reflection of the RP accent, forming in this way a linguistic reference
for language scholars who seek evidence for the inextricable link between accents and
social class. However, the RP accent is now known and accepted on radio and
television. It is also described in books on phonetics and is taught to L2 learners of
English.
2.2.2.2 Feasibility of RP
Received Pronunciation (RP) has recently become the target of a great deal of criticism
as elitist and limited to certain speech communities, but the reality of language use
falsifies such criticisms. As some studies currently show, RP is placed higher on a scale
of perceived attractiveness compared with other varieties as an accent that is widely
preferred and commonly used by the speech community in formal situations. Previous
studies describe RP sounds as more mutually intelligible. In a related activity in which
participants listened to accent samples in order to judge which one they preferred as
a suitable model, responses to both the RP and South East accent were 100% positive,
while other accents such as Devon, Belfast, Shield and Pontypridd were rejected. The
latter strain the listeners, sound elliptic, cause comprehension problems to listeners and
form a gross deviation from the standard sounds encountered in normal listening. RP,
on the other hand, is an accent that forms an understandable and usable model for
non-native speakers in everyday communication (Ramsaran 1999, Trudgill and Hannah
2005). The use of RP enables speakers to overcome comprehension difficulties that
they may otherwise encounter when involved in speech in which regional accents form
a language reality; this fact indicates that, linguistically, RP is a genuinely regionless
model that is known and easily understood all over England and elsewhere (Collins and
Mees 1981). In the communities where English is used as a second language, such as
India, Nigeria and so forth, the RP accent is no longer the major model. Such
communities have developed their own local English accents, which fit non-native
environments. Although this idea sounds practical, it is often neither safe nor fair,
simply because the use of RP can at least be desirable in establishing certain minimum
standards for the achievement of mutual intelligibility. Its phonemic system is capable
of conveying a message efficiently from a native English listener’s standpoint, given
that the listener has time to ‘tune in’ to the speaker’s pronunciation in a given context.
It retains the accentual characteristics of English while it is possible to reduce the
segmental inventory of English and retain a good level of intelligibility.
For instance, a speaker can reduce the vowel system to a central pair /Ö, /. This
change makes it difficult to understand the message. Distribution of post-vocalic /r/ in
words such as farm, heard, bird, etc., is observed in some English dialects; however, RP
does not permit this phenomenon. RP includes accentual features that represent crucial
elements to natural forms of English and exhibit a considerable homogeneity in their
consonant systems. Thus, in turn it prevents any further simplifications keeping these
features shared with the natural system (Cruttenden 2008). The vowel system (see
Figure 2.1 below) has no regional variation; it has a variation of other types, though. In
particular, there is a variation between conservative and advanced RP. This largely
reflects the linguistic change that has occurred in RP with advanced pronunciations
typical of younger speakers. For instance, the RP vowel system no longer shows a
distinction between /ɔə/ as in sore and /ɔː/ as in saw, etc. There is also a widespread
loss of /ʊə/ and its merger with /ɔː/. Thus, some words such as sure are pronounced as
/ʃɔː/ like shore. In the majority of accents now the phoneme /uː/ is commonly used in
words like suit, resume and enthusiasm, etc. In RP (and in Popular London) both /uː/ and
/juː/ are heard in words like hue, due, Tuesday, etc. However, the tendency to omit /j/ is
stronger among younger speakers (Cruttenden 2008). The phoneme /uː/ is retained in
words like Susan and super (Trudgill and Hannah 2002). No changes to monophthongs
are classed as almost complete, but the loss of schwa in the diphthong /eə/ results in
the monophthong /ɛː/ in words like share, pear, though some older speakers use the
diphthongal pronunciation (Cruttenden 2008). In RP, words like pip and peep differ in
length. If you say the two words, you will probably find that tense peep is
longer than lax pip. The long (tense) vowels are indicated by ‘ː’, so the long counterpart
of RP /ɪ/ is /iː/ and so on, see Figure 2.1 above.
2.2.2.3 Foreign accents and errors
It is well known that speakers substitute speech sounds from their L1 for those of their
L2 in the attempt to communicate, which results in producing accented speech.
Normally this occurs due to the absence of one or more L2 sound features from the
speaker’s L1. It also occurs when L2 knowledge is lacking, which makes speakers resort
to a repair strategy to adapt to new features. Arabic speakers of English often make
production errors that indicate an Arabic accent. For example, they apply vowel
epenthesis before (‘prothesis’) or inside (‘anaptyxis’) English consonant clusters as a
repair strategy. Thus, words like special and speak are pronounced as /ɪspeʃl/ and /ɪspiːk/ by
Arabic learners of English (Patil 2006). This is quite similar to what has been reported
for Spanish learners of English (Lado 1957, Hyman 1975) as well as Brazilian learners
of English (Bond 2001). Examples of anaptyxis are found in fly and drain, which are
pronounced as /fɪlaɪ/ and /dɪreɪn/ by Arab learners of English, e.g., Sudanese and
Egyptians. Moreover, Korean speakers of English insert a vowel more often in stop+C
clusters rather than in strident+C or sonorant+C clusters. Interestingly, it has been
found that stop+C clusters reveal an asymmetry between voiced and voiceless stops.
For instance, Korean speakers insert a vowel more frequently in voiced stop+C clusters
than in voiceless ones. The same pattern was observed in Mandarin Chinese and
Cantonese speakers’ production of English consonant clusters where various clusters
they produced were illegal in English (Kwon 2005).
Speech error phenomena such as these motivate a fundamental distinction in
second language studies. The errors made by L2 learners in this context are
probably due to language use and knowledge of the habits by which these words should
be pronounced. Therefore, a distinction is necessary between the effect of linguistic competence,
which represents the underlying system of a language (implicit knowledge), and
linguistic performance, which represents the strategies that speakers follow in
producing and perceiving L2 speech (explicit knowledge).
Errors like these help make predictions about how speakers/listeners of one language
will reproduce speech sounds of another language. They also reflect the psychological
reality of phonological descriptions. In some cases, native listeners may have difficulty
understanding English spoken in a manner different from their own (foreign
accent). Accent modification training would help eliminate the problem, or at least help
develop a new accent that would improve communication ability. Not all sound
substitutions and omissions are speech errors. Instead, they may be a feature
of a dialect or accent. For example, speakers of African American Vernacular English
(AAVE) may use /d/ for /ð/, e.g. /dɪs/ for /ðɪs/ this. This is not a speech sound
disorder, but rather one of the phonological features of AAVE.
2.2.3 Speech perception and production
Linguistically, speech perception and production mutually support each other: i.e., the
occurrence of the first relates to occurrence of the second (Gilbert 1995). Perception
represents the power supply of speech production. Kuhl (1994) defines speech
perception as a process that involves the employment of cognitive, motor and sensory
skills to hear and understand speech. Kuhl explains that a child perceives speech by
forming mental conceptual maps of the speech it hears in its environment. Such
conceptual maps are stored in the brain; they later constitute the specifics of speech
perception and serve as blueprints that the child uses to produce speech.
Therefore, the process of speech perception is not immediate, but an output of long-
term operations that accumulate over time. It is a series of organized events that
involve the establishment and storage of information over time. During subsequent
stages, information is developed, transformed, reduced, elaborated, stored, recovered
and used to make different types of decisions. Speech production, on the other hand, is
a process that requires the brain to transmit a message to the speech organs, and these
in turn produce the patterns of speech sounds on demand (Cruttenden 2008, Crystal
1999). Moreover, phonetically, in producing speech sounds, an air pressure difference
in appropriate locations is an important requirement in the mechanism of speech sound
production. Air pressure serves three functions here: (i) to create
turbulence in air passing through a narrow channel, e.g. in pronouncing sounds like /s, θ/, (ii)
to build up air pressure behind a total blockage of the vocal tract to make a sort of
burst, which is needed in the production of plosives such as /p, d/, and (iii) to sustain
sufficient air flow through the glottis to permit the vocal folds to vibrate, thereby
producing the glottal pulse train that excites the resonance cavities in the mouth, throat
and nose. The production of speech sounds involves several elements such as the place
or manner of articulation and the difference in the air pressure and its direction. The
latter mechanism explains why some sounds are classified as ingressive or egressive. It
also draws attention to the cross-linguistic differences of the articulation of the speech
sounds of languages (Gussenhoven and Jacobs 1998).
More importantly, there is a kind of interdependency between speech perception and
production. The learning of L2 sounds is often influenced by the articulatory properties
of the L1 (Canepari 2005, Cruttenden 2008, Derrick 2005, Flege 1995, Groenen,
Maassen and Crul 1996, Johnson and Elissa 1989). This suggests the possibility that
speech perception and production are interdependent abilities that correspond to each
other. It also means that the articulatory errors of the speech sounds have perceptual
bases (negative transfer), a process that largely depends on how accurate stored
cognitive knowledge is. That is, any phonetic realization involving the perception and
production of L2 phonemes correlates to the internal phonological system and its
phonetic realization in the L1 (Flege 1995). In this way, a perceived difference may lead
to a produced difference. This assumption provokes the question as to whether the
relation between speech perception and production can contribute to speech
intelligibility or not. In previous studies, results on the effect of segmental errors
of L2 speakers of English show a strong relationship between segmental errors and
degradation of intelligibility. Arslan and Hansen (1996) found that the involvement of
the L1 articulation system causes Arabic speakers of English to substitute L1 /a/ for the
English /æ/ phoneme in words like add, bat and dad. Arabic-speaking students of
English have pronunciation problems due to the interference of the perceptual and
productive strategies of their mother tongue (Amayreh and Dyson 1998, Cruttenden
2008, Flege 1981, Munro 1993). Problems such as these often increase when the L1 and
L2 phonemic inventories involved show no equivalents of speech sounds. A similar
example is the production errors with English /r, l/ made by Japanese learners. It
suggests that the chances of success are greater for learners whose native language has
equivalents for the English sounds /r, l/ than for those learners whose mother tongue
does not include such a contrastive pair. There is an equivalence classification
mechanism that permits the brain to generate new sound categories different from those
of the L2. In addition, previous studies that investigated the relationship between
perception and production have arrived at different conclusions. Bradlow, Pisoni,
Akahane-Yamada and Tohkura (1997) investigated the effects of training in /r/~/l/
perceptual identification on /r/~/l/ production by adult Japanese speakers. They
found significant improvements in pronunciation after perceptual training without any
explicit production training. This result shows a close link between perception and
production.
Researchers regard the relationship between the perception and production of speech
sounds as an important issue because it brings them to an understanding of the
mental processes involved in the learning of L2 speech sounds. It also provides them
with insight into the types and the nature of speech perception and production
problems that ESL/EFL speakers face.
2.2.4 Speech intelligibility
Speech intelligibility is described as the collection of properties that permit a native
listener of a language (e.g. English) to correctly identify linguistic units such as
phonemes, syllables, morphemes and words, in the order they were produced by the
speaker of the utterance. The more intelligible a speaker or a spoken utterance is, the
higher the percentage of units (words) that are correctly recognized, the smaller the
number of transpositions in the reported serial order among the units and the faster the
native listener performs the recognition task. In this sense, there is a difference between
intelligibility and comprehensibility. Comprehensibility determines the comprehension
of the spoken message. Listeners comprehend a spoken utterance if they get the
meaning (or gist) of the spoken utterance; it is the result of the process that is also
termed ‘speech understanding’ (Van Bezooijen and Van Heuven 1997; Van Heuven
2008).
Certain failures of communication, ranging from frustrations among speech
participants in a casual chat to serious misunderstandings in business meetings and
aircraft accidents, are seen as consequences of poor speech intelligibility. Moreover,
linguists think that the concept of intelligibility often exceeds its elementary definition
as the recognition of words and utterances or the extent to which a speaker’s utterance is
actually understood and emphasized. The notion of intelligibility to them requires the
consideration of global elements such as context and the inter-cultural backgrounds of the
interlocutors of English as a Lingua Franca/International Language. Patil (2006) and
Nair-Venugopal (2003) explain that this new status is not without support from the
reality of English in the outer circle where English is spoken as a second language and
the expanding circles where English is spoken as a foreign language. There is a need to
understand each other in contexts where participants communicate for a business
purpose, or general interaction, etc.
Fraser (2005) claims that most of the impediments to speech intelligibility are
attributable to segmental factors and that more than 50% of speech intelligibility is
accounted for on the basis of sound (rather than morphological or syntactic deviations).
Similarly, Jenkins (2000) stated that while the syntactic level plays a salient role in
comprehensibility in EFL interactions, pronunciation forms the most prominent single
element of intelligible speech. 7 Furthermore, the measurement of speech intelligibility
based on linguistic elements necessitates the use of native speakers of the target
language as a standard reference/model.
7 Earlier, Van Heuven (1986) reasoned that faulty syntax and morphology can only compromise a
speaker’s intelligibility if words can be recognized. After all, if the words are pronounced so
poorly that they cannot be recognized, it will not be possible to establish any order (i.e. syntax)
among them.
2.2.5 Tests of speech intelligibility
2.2.5.1 The Modified Rhyme Test
Speech intelligibility tests have a long history. With regard to the data obtained using
such tests, only a few tests have proved to be effective. The Modified Rhyme Test (MRT)
is one of the worldwide-standardized measurements of segmental intelligibility of
speech. The MRT forms an extension of two earlier attempts. These are the PB and RT
tests. The PB test refers to the Phonetically Balanced word lists that were compiled at
Harvard University during the Second World War. The lists were composed of
monosyllabic quartets, which were chosen in a way that gives an approximation of the
relative frequency of phoneme occurrence in the language. Each PB list consisted of 50
monosyllabic words, which is enough to adequately approximate the relative frequency
of phoneme occurrence in English. One of the features of the PB list is that the relative
difficulty of the stimuli is constrained so that the stimuli that are always missed or
always correct are removed, leaving only those items that provided useful information.
The PB test was developed to compare phonetic discrimination and for overall
recognition accuracy. The test material targets both vowels and consonants. 8 The PB
word list provided a considerable contribution to speech intelligibility research, but
further requirements were needed to make the test more adequate and economical. These
requirements gave birth to the Rhyme Test. The RT presents the stimulus in stem form,
e.g., [-ot, -ag], etc., and the listener is required to supply the missing letter
while s/he is listening to the items spoken. However, the RT has some drawbacks,
since it focuses only on initial consonants, while non-initial ones like /ŋ, ʒ/ are excluded.
8 Phonetically Balanced word lists (PB lists) have been used widely since the Second World War
in statistical intelligibility testing. The words in each list are presented in a new, random order
each time the list is used, and each item is spoken in the same carrier phrase. The PB intelligibility
test requires more training than other statistical tests, and is particularly sensitive to variation in
signal-to-noise ratio. In other words, a relatively small change in S/N causes a large change in the
intelligibility score. Moreover, phonemically, PB presents a balanced tool for the measurement of
speech intelligibility. It has stimuli lists which are composed of monosyllabic CVC words that
have been selected in such a way that the lists reflect the statistical distribution of the phonemes
in that dialect. Because of the limited size of typical PB word lists, repetition of the list is very
likely to lead to the listener learning of the list. This problem can be overcome by only presenting
the list once, or by training the subjects first so that the effects of learning have leveled out
before the actual tests. Once the list is learned, the PB word list is equivalent to a limited
response set, i.e. effectively a multiple-choice test. (Hudgins et al. 1947).
Such weaknesses of the RT have given rise to the Modified Rhyme Test (MRT). The
MRT forms the most accurate and reliable measure of intelligibility (Logan, Greene and
Pisoni 1989). Speech intelligibility measures involve word identification tasks in a closed
set of six items. The test has a list of 300 words, consisting of a representative sample
of stimulus words arranged into rows. Each row encompasses six words with the same
rhyme. The rhyme functions give an economic value, which serves to reduce the
speaker’s vocal effort. Furthermore, the methods and materials of the MRT require that
both the speaker and the listener be trained. That is, the administrator instructs the
speaker to read the words using a carrier phrase, while s/he takes notes regarding
feedback about the loudness, clarity and the rate of the speakers’ performance
throughout the reading of the list. These notes will be used in the stage of analysis.
Listeners, on the other hand, first hear the words and then start
responding. The time limits of the test are measured from the time when the button is
pushed until the end of the word presentation. The score is the number of items
correctly responded to. Test items normally target single and multi-phonemes or words;
these refer to vowels, single and cluster consonants of English. The formal assessments
interpret the responses as either intelligible or unintelligible; put in figures, a score of
(close to) 100% is interpreted as completely intelligible performance (Lafon 1966).
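The scoring rule described above can be illustrated with a minimal sketch in Python. It counts the items a listener identified correctly in a closed-set task and expresses the count as a percentage. The function name score_mrt, the row of six rhyming words and the listener responses are hypothetical illustrations and are not taken from the studies cited.

```python
def score_mrt(trials):
    """Score a closed-set identification test.

    trials: list of (target_word, chosen_word) pairs, one per test item.
    Returns (number_correct, percent_correct).
    """
    n_correct = sum(1 for target, chosen in trials if target == chosen)
    percent = 100.0 * n_correct / len(trials) if trials else 0.0
    return n_correct, percent


# Hypothetical row of six alternatives that share a rhyme-like frame.
row = ["bat", "bad", "back", "bass", "ban", "bath"]

# Invented responses: the listener always picks one word from the row.
trials = [("bat", "bat"), ("bad", "bat"), ("back", "back"), ("ban", "ban")]
assert all(chosen in row for _, chosen in trials)

n_correct, percent = score_mrt(trials)
print(n_correct, percent)
```

A score close to 100% would then be read as (nearly) fully intelligible performance, in line with the interpretation given above.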
2.2.5.2 Feasibility of the MRT
Many approaches that have been designed for the measurement of intelligibility do not
give an adequate account and have many drawbacks. Examples are the use of
comprehension questions (Anderson and Koehler 1988) and picture selection in
response to a stimulus (Smith and Bisazza 1982), etc. Yet, they offer something
valuable and their drawbacks motivate questions. Consider, for instance, the
comprehension question test that draws conclusions about the listener’s efficient
comprehension from the scores provided; the assessment will be a reasonable one since
listeners respond correctly. However, a comprehension test of this type cannot account
for how well listeners’ responses correlate with speakers’ intentions. Speech
intelligibility is a complex phenomenon, which is influenced by several variables. Firstly,
audibility alone is not a sufficient condition for good speech intelligibility, just as
adding more light to a blurred text does not make it more legible. Similarly, the addition
of more sound intensity to speech that is surrounded by reverberation, echoes or
distortion does not make it more intelligible. In standardized speech intelligibility
testing, the talker-to-listener transmission path is measured under three assumptions. It is
assumed that the talker speaks without accent or speech impediments, that the speech
is in a normal form with normal emphasis of words, and that the listener
possesses normal hearing abilities. Moreover, in both cases the actual performance will
vary, especially if the assumptions made about the talker and listener cannot be met in
practice. Secondly, the transmission of the voice signal also affects speech intelligibility
from talker to listener. Such factors can spoil the integrity of a voice signal while it is on
its way to the listener. A poor signal-to-noise ratio, for example, masks the voice signal.
Reverberation, i.e., echoes in rooms, is a special kind of noise that causes smearing or
blurring of the sounds and makes the speech less audible and difficult to understand.
Fant (1973) states that an intelligibility test has to account for the phonemic distances
which exist between speech sounds. An intelligibility test also regards the specific
conditions under which a test has taken place. Thus, it enables the experimenter to have
precise feedback. In this issue, the rhyme test has an advantage of minimizing the rate
of contextual confusion. Many advantages make the MRT practical. A confusion matrix
of phonemes can be calculated from the scores of the tests. That is, the actual rate of
intelligibility is simply the number of words correctly responded to. Naive listeners can
participate more than once without being exposed to any training, which is a very
distinctive feature compared with other types of tests. Reliable results can be obtained
even with a small number of subjects, which usually ranges from 10 to 20. More
importantly, the results of recent research have proven the MRT to be an excellent
measure of segmental intelligibility of natural speech.
Several studies have expanded upon the paradigm of the MRT word list. Importantly,
some researchers reduce the number of response alternatives in the test from
six to four items. This helps the listeners avoid becoming
confused by a large number of choice items; hence, they make a smaller number of
perception errors (Wang 2007).
2.2.5.3 Speech Perception in Noise test: SPIN-test
Assessment of the performance of listeners with normal hearing abilities is a
complicated task. This is because everyday communication covers a vast scope of
spoken material, and may take place in different contextual environments. These sorts
of interrelated processes make it impossible to sample all types of speech events in a
single test. One possible way to determine the types of sensory and cognitive processes
involved in the reception of speech materials is to design tests that evaluate the extent to
which candidates show weakness in the utilization of these processes. Two basic types
of operation are involved in the understanding of sentences. The first is the reception
and initial processing of acoustic information through the auditory system (‘bottom-up
information’), and the second is the use of linguistic knowledge stored in the brain,
which includes phonological, syntactic, morphological and semantic knowledge (‘top-down
information’). Therefore, all tests of speech perception targeting sentences draw on
both types of operation. The SPIN test is one of the perception tests that follow this design.
The SPIN (Speech Perception in Noise) test is a speech perception test that is based on
simple English sentences of two types: high- and low-probability
sentences. The words at the end of high-probability sentences are
predictable from the body of the sentence, e.g., spread some butter on your bread. On the
other hand, the words at the end of low-probability sentences cannot be predicted from
the sentence, e.g., Mary could discuss the tack. The function of the SPIN test is the assessment of
listeners’ ability to understand everyday speech by combining bottom-up and top-down
information. Normally, words are more intelligible in sentence context than in isolation,
as many studies have revealed. The sentence context decreases the probability of errors
by the listeners (Kalikow, Stevens and Elliot 1977, Miller 1981). This is because
sentences impose constraints on the set of alternative words, which will increase
intelligibility. Measurement is based on a recognition task of twenty-five words
embedded in meaningful and highly predictable sentences, as in she wore her broken arm in
a sling (target word underlined). Listeners only write down the final word that they think
they heard in each sentence. This part of the SPIN test has proved to be efficient at
assessing speech recognition abilities (Rhebergen and Versfeld 2005). Although
listeners’ performance is primarily quantified in terms of numbers of whole words
correctly recognized, partially correct answers are also important since they give
information about the perception of phonemes in onset, nucleus and coda position.
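The two levels of scoring mentioned above, whole words and partially correct answers, can be sketched as follows in Python. The sketch assumes that each target word and each response has already been transcribed and split into onset, nucleus and coda; this representation, the function name score_final_words and the example items are hypothetical and serve illustration only.

```python
POSITIONS = ("onset", "nucleus", "coda")


def score_final_words(items):
    """Score sentence-final target words.

    items: list of (target, response) pairs; each member is a tuple
    (onset, nucleus, coda) of transcribed segments.
    Returns the proportion of whole words correct and, per syllable
    position, the proportion of responses matching the target.
    """
    n = len(items)
    whole = sum(1 for target, response in items if target == response)
    per_position = {}
    for index, name in enumerate(POSITIONS):
        hits = sum(1 for target, response in items
                   if target[index] == response[index])
        per_position[name] = hits / n
    return whole / n, per_position


# Invented data: 'bread' heard correctly, 'sling' with a simplified
# onset, 'tack' with a wrong coda.
items = [
    (("br", "e", "d"), ("br", "e", "d")),
    (("sl", "ɪ", "ŋ"), ("s", "ɪ", "ŋ")),
    (("t", "æ", "k"), ("t", "æ", "p")),
]
whole_score, position_scores = score_final_words(items)
print(whole_score, position_scores)
```

In this invented example only one of three words is fully correct, yet every nucleus is perceived correctly, which is the kind of information that partially correct answers add to the whole-word score.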
2.2.6 Confusion matrices
The term ‘confusion matrix’ refers to a visualization tool typically used in supervised
learning. Each column of the matrix displays the instances in a predicted class, while
each row represents the instances in an actual class. The value of a confusion matrix is
that it makes it easy to judge whether the system is confusing two classes, i.e. commonly mislabeling
one as another. Later, the raw data will be analyzed in terms of phonetic classes of
perceptual or phonological features to give values about what is confused and what is
not. For instance, one can examine consonant confusion across manners of articulation
or analyze the data in terms of voicing. Benki (2003) and Nielsen (2004) analyzed
confusion data for voicing, place of articulation and manner of articulation in syllables
and phonemes. 9 Bosman (1989) states that the interpretation of consonant confusion is
usually based on the features shared by the confused phonemes. Phonemes that have a
feature in common are more susceptible to confusion than phonemes that differ with
respect to this feature (all else being equal). Vowel perception is largely determined by
the first two vowel formants, F1 and F2; the vowel space is determined by the position
of the tongue-hump [front~back] and the degree of constriction [close/high~
open/low]. Typically, listeners most frequently confuse vowels that are adjacent
in the F1-by-F2 vowel space.
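As a minimal sketch of how such a matrix is built and then analyzed by feature, the following Python fragment tabulates stimulus-response pairs with rows for the actual (spoken) phoneme and columns for the perceived phoneme, and then counts the confusions that cross the voicing boundary. The four-consonant label set, the response data and the function names are hypothetical and are not drawn from the studies cited.

```python
from collections import Counter


def confusion_matrix(pairs, labels):
    """Rows: phoneme actually spoken. Columns: phoneme reported."""
    counts = Counter(pairs)
    return [[counts[(spoken, heard)] for heard in labels]
            for spoken in labels]


def voicing_confusions(pairs, voiced):
    """Count responses whose voicing differs from that of the stimulus."""
    return sum(1 for spoken, heard in pairs
               if (spoken in voiced) != (heard in voiced))


labels = ["p", "b", "t", "d"]
voiced = {"b", "d"}

# Invented (spoken, heard) pairs from an identification task.
pairs = [("p", "p"), ("p", "b"), ("b", "b"), ("t", "t"),
         ("t", "d"), ("d", "d"), ("p", "p"), ("d", "t")]

matrix = confusion_matrix(pairs, labels)
n_voicing = voicing_confusions(pairs, voiced)

for label, row in zip(labels, matrix):
    print(label, row)
print("voicing confusions:", n_voicing)
```

Correct identifications fall on the diagonal of the matrix; off-diagonal cells show which phonemes are confused with which. The same pairs can be re-analyzed for place or manner of articulation by swapping in a different feature set.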
Several factors lead to confusability of segments in L1 with those of L2. First, and
foremost, incorrect perception in the L2 is caused by the degree of similarity between
the L2 sound and the nearest sound category in the listener’s native language. But there
are other factors to be considered as well. Environmental factors such as lighting, viewing angle
and distance between the speaker and the listener clearly affect the quality of
the optical information provided and the ability of the listener to use this
information in lip-reading. A third factor is the interaction of different linguistic levels such as
semantic, syntactic, lexical and phonological constraints as potential sources of
disambiguation of a spoken utterance. Such factors facilitate the performance of the
listeners/speakers under consideration, acting as a combination for maximum benefit
(Lachs 1999). Many researchers present stimulus words embedded in test items as part
of semantically and syntactically meaningful sentences and compare the listener’s
performance on the same (or similar) words presented in isolation or in meaningless
contexts, as a way of determining the listener’s ability to use contextual information in
the speech recognition process (see e.g. Nielsen 2004, Wang 2007).
9
The classical reference on confusion studies is Miller and Nicely (1955) in their ground-breaking
work on the role of distinctive features in the perception of consonants.
2.2.7 Contribution of previous studies
2.2.7.1 Learning problems of English vowels
Sudanese EFL learners are expected to make different types of English vowel
production errors, e.g., in words such as bait, and, ask, let, fate, make, lace, poor, peat, put, pot,
putt, bit, fear, bet, stay, etc. Mohammed (1991) described pronunciation errors made by
Sudanese EFL learners as the result of inter-linguistic transfer and ineffective teaching.
Al-Arishi (1992) and Bobda (2000) found that the English NURSE vowel /ɜː/ is
rendered in Sudan as /ʌ/, or as /ɔ/ if /ɜː/ is represented orthographically as <or> in
words like work, worth, word, etc. Here the absence of /ɔ/ in the Sudanese-Arabic vowel
inventory and the misleading spelling of English conspire to produce the incorrect
vowel substitution pattern observed. In related L2 production of English vowels,
similar errors were reported in several studies of Arabic-speaking groups. Brett (2004)
found that Arabic speakers of English face serious difficulties in distinguishing between
English vowels such as /ɔ/, /ɔː/, /əʊ/ as in cot, caught, and coat, all of which are often
pronounced as /ɔː/ or undergo substitutions. Altaha (1995) also reported that Arabic
learners of English produce the English front vowel /e/ as /ɪ/ so that words such as set
and sit are both pronounced as /sɪt/.
More importantly, English vowel production problems are detected even among ESL
learners who come from language backgrounds linguistically related to English.
German learners of English have difficulties differentiating between /æ/ and /e/ in bat
vs. bet, on the one hand, and between /ʌ/ and /ɔ/ as in duck and dock on the other
(Steinlen 2002). Some errors due to orthographical influence involving the production
of the English /e/ (in words like red, bed, dead) were detected among Italian speakers of
English, where /e/ was pronounced as /eɪ/ (Piske et al. 2002).
The literature has revealed that English vowel production is also influenced by
differences in temporal cues. In English, incorrect vowel duration compromises
intelligibility (Jenkins 2000, Walker 2001). In the production of the English vowels,
Arab learners of English showed an exaggeration of duration differences between short
(lax) and long (tense) vowels. Specifically, Arabic ESL speakers produced the English
/i~ɪ, eɪ~e, u~ʊ/ tense and lax vowel pairs with duration ratios of 2.6:1, 2.6:1 and 2.5:1,
respectively. In contrast to this, native English control speakers produced lower
duration ratios of only 2.2:1 in all three vowel pairs. Moreover, the Arab groups
produced the native-like ordering of vowel duration for front vowels, but the order
among the back vowels differed due to transfer of L1 (Mitleb 1981, Munro 1993). That
is, the learners used their L1 productive strategies to produce English vowels. It is
possible to conclude that L2 learners of English need to be aware that the English
short vowels are not as short as those of their L1 (and that the long vowels are not as long as
those of Arabic). Linguistic theories describe ESL/EFL learners’ incorrect
pronunciation as the result of neurological development that occurs in the brain due to a
process of normal maturation at puberty. After this period the speech production and
perception systems become specialized for the processing of only L1 sounds. The
specific native-language prototypes interfere with the L2 learner’s perception of some
L2 contrasts by acting as a perceptual magnet, which pulls L2 vowels towards the L1
prototypes. Thus, L2 vowel sounds which are located near L1 vowel prototypes are
discriminated less readily than vowels that are not located near L1 prototypes. It has
been assumed that the phonetic ‘prototype’ for each sound category exists in memory
and plays a unique role in speech perception and production (Iverson and Kuhl 1995).
However, Flege (1976) found that the incorrect conceptual representations of English
sounds adopted by such learners are strongly responsible for speech production
problems. That is, in Flege’s Speech Learning Model (SLM), it has been hypothesized
that without accurate perceptual targets to guide sensorimotor learning of sounds,
production of the L2 sounds will be inaccurate. This is because learners of the L2 may
fail to perceive L2 sounds which are affected by the L1 (Flege 1995). The lack of
knowledge of the English vowels was also reported to contribute to English
pronunciation problems. Research results of some Sudanese secondary school learners
of English recently showed that phonological awareness is urgently needed for
intelligible speech. The results revealed that the subject group exposed to
pronunciation knowledge achieved better results than those who received no training
(Al Dawla 2005, Mohammed 1991). Similar problems with the production of the
English speech sounds are widely spread among Arabic speaking learners of English.
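The size of the durational exaggeration mentioned above can be made concrete with a small worked calculation. The ratios are the ones cited in the text (2.6:1, 2.6:1 and 2.5:1 for the learners against 2.2:1 for the native controls); the function name `exaggeration` and the pair labels are hypothetical and serve only this sketch.

```python
# Tense:lax duration ratios as cited in the text above.
learner_ratios = {"i~ɪ": 2.6, "eɪ~e": 2.6, "u~ʊ": 2.5}
native_ratio = 2.2  # reported as the same value for all three pairs


def exaggeration(learner, native):
    """Percentage by which the learner ratio exceeds the native ratio."""
    return (learner / native - 1) * 100


if __name__ == "__main__":
    for pair, ratio in learner_ratios.items():
        print(f"{pair}: {exaggeration(ratio, native_ratio):.0f}% larger than native")
```

On these figures the learners' tense/lax contrast is roughly 14 to 18 per cent larger than that of the native speakers.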
Similar problems manifest themselves in the perception of the English vowels when
Sudanese EFL learners are exposed to English. The learners have problems
discriminating between /e/ and /eɪ/ in words like let, shade, make, rate, etc. Moreover,
the English tense and lax vowels /ɪ, iː, ʊ, uː/ are frequently substituted for each other in words such as
beat/bit, sit/seat. Listeners also fail to deal with the vowels in words such as pot, put, pert, cut, etc.
Very little has been written about the English vowel perception problems that Sudanese
university EFL learners face. However, in related studies, Huthaily (2003) reports that
Arabic native speakers misperceive /ɪ, e, ɔ, æ, ɔː/ due to the unfamiliarity of Arab
speakers with such a large number of vowels as those of English. Brett (2004) reports
that perception problems of English vowels experienced by Arabic EFL learners
probably occur due to the fact that their L1 (Arabic) lacks central vowels.
2.2.7.2 Learning problems of English consonants
Although Sudanese Arabic has many consonants that resemble those of English,
Sudanese EFL learners have difficulty understanding and pronouncing some English
consonants. In this sense, Sudanese EFL learners arguably fail to discriminate between
English fricatives such as /θ, s, ð, z/ in words like thin/sin and then, there/zero, zeal, etc.
The voiced labiodental /v/ is often replaced by /f/ or /b/ as in words like very/berry
and volleyball/bolleyball. Previous studies of Arabic speakers learning English report
similar pronunciation problems. The English consonants /p, v, s, θ, ð, z, dʒ, ŋ/ are
reported to be difficult to produce for Arabic speakers (Al-Arishi 1992, Altaha 1995, do
Val Barros 2003, Jesry 2005, Ruhaif 2007). Moreover, do Val Barros (2003) states that
such types of pronunciation difficulties occur after puberty and are caused by the
interference of productive strategies of the mother tongue. Do Val Barros (2003)
explains that the English /ŋ/ accounts for the highest percentage of pronunciation errors
made by such subjects. This is most probably because sound pairs such as /n~ŋ/,
/b~p/, /v~f/ are allophones of one phoneme in the Arabic language, whilst they
represent separate phonemes in English. In attempting to account for these types of
problems, Rababah (2003) states that pronunciation errors of Arab EFL students are
attributable to deficiencies in linguistic competence and to the differences that exist
between the English and Arabic pronunciation systems, resulting in communication
breakdown. Similarly, Patil (2000) explains that divergences like consonant devoicing
(mug pronounced as muck) cause communication to break down because they damage
essential phonological features, which play a significant role in intelligibility. The
replacement of English voiceless /p/ by voiced /b/ is common, which is attributable to
interference of the speaker’s L1. American listeners have difficulty recognizing English
stops /p, b/ which are produced by Saudi speakers as /b/, due to VOT differences
between the L1 and L2 inventories. In other previous studies, the vowel context effect
of VOT is assumed to derive from aerodynamic properties of the human speech
production mechanism. This effect is expected to manifest itself also in Arabic where
the VOT in /NV/, /MN/ was longer by about 10 ms, e.g. in Lebanese Arabic. The study
reported a small mean difference (52 ms) for /NV/, /MN/ in Arabic against (51 ms) in
English which is likely to be due to underestimation of the real VOT difference
between Arabic and English. This is because of the difference in vowel context and
because the subjects used to estimate the Arabic phonetic norm were speakers of
English L2 (and may therefore have produced Arabic stops that resembled English
stops in terms of VOT). Fortunately, neither the confounding factor of vowel context
nor the subjects’ L2 experience weakens the assumption that voiceless stops in
Arabic and English differ in terms of VOT, a difference which requires native speakers of
Arabic to produce voiceless stops with longer VOT values in English than in Arabic.
However, the confounding of vowel context does undermine the validity of the finding
that Arabic subjects shortened VOT when switching from Arabic to English. Most or
all of the observed ‘shortening’ of VOT, which averaged about 14 ms, was likely due to
the difference in vowel context in the Arabic and English speech material (Flege 1976).
This type of interference may occur on the level of phonetic implementation of a
certain phonemic feature, i.e. similar phonemes in different languages may have
different implementations, which cannot be easily grasped by EFL/ESL learners (Flege
and Port 1981, Rasmussen 2007). This phenomenon may support the assumption that
similarities of sound structure between two languages facilitate the learning of an L2.
However, other studies have proven the opposite in a learning situation where both L1
and L2 contain similar phones. That is, the learning of these sounds turns out to be
more difficult than learning new contrasting phonemes that are completely absent in
the L1. In other words, it is more difficult to acquire a sound in the target language
which is relatively similar to the native language than one which is substantially
different. Although no coherent explanation for this phenomenon has been
forthcoming, there is substantial literature documenting that similarities between the
native language and the target language can cause problems in L2 acquisition (Flege and
Port 1981, Eckman, Elreyes and Iverson 2003).
On the other hand, Sudanese EFL learners also have problems in understanding
English speech sounds. To my knowledge, very few reports have been provided about
these learners; however, arguably, there are interchangeable substitutions of the English
consonants /s/ for /θ/, e.g. in words such as sick/thick and sink/think, and /ð/ for /z/
in words like then/zen. The recognition of English consonants such as /tʃ, dʒ, f, v/ also
proves to be difficult. Literature shows that Arabic EFL learners experience similar
perception problems with English consonants (Rasmussen 2007). The English
approximants /r, l, w/ also present perception problems for the learners. The sound
/w/ is often heard as /r, l/ as in rent/lent/went. For instance, a word like went is realized
as rent, which is probably due to similarity in the manner of articulation between these
approximants. This type of substitution error reveals a kind of linguistic development
where there is a phonological rule merging /r/ with /w/. It supports the view that
two different phonological representations are often possible for the same sound
(Hyman 1975). In terms of phonetics, the Arabic /r/ is an alveolar trill whilst the
English /r/ is a retroflex frictionless continuant, a voiced alveolar or post-alveolar
approximant, which is incorrectly produced by Arabic learners of English who treat it
as a counterpart to the /r/ of their mother tongue. Probably because the Arabic phoneme /r/ is
pronounced with more physiological effort than that of English, such a manner of
articulation results in an incorrect perceptual representation of the English /r/, which is
pronounced with less force (Khattab 2002).
2.2.7.3 Consonant clusters
2.2.7.3.1 Learning problems of English cluster consonants
English consonant clusters are expected to cause problems for Sudanese EFL learners.
There are arguments that the learners have problems understanding initial and coda
clusters in words like flow, clock, special, twelve, glass, string, proper, ground. Insertion of an
epenthetic vowel before or between the cluster members generally occurs. In the
literature, insertion of a vowel sound between the cluster members by Arab EFL
learners is reported in words such as cream /kiriːm/, text /tekist/, etc. (Patil 2006, Carlisle
2001). Similarly, the English affricate /dʒ/ is often split by /i/, e.g., a word like bridge is
pronounced as /bridiʒ/ (Rababah 2003).10 The insertion of /ɪ/ between the
members of English onset clusters of the type /s/ + /t, p, k, l, w, n, m/ is
intended to facilitate the production of English consonant clusters. This is because clusters
such as /pl, pr, gr, sp, θw/, etc., or three-segment initial clusters like /spr, skr, str, spl/ are
totally absent from the Sudanese colloquial Arabic inventory (Kaye 1997). Arguably,
similar learning problems arise in the perception of English clusters, where Sudanese
EFL listeners misperceive English cluster consonants. In previous studies, Arab
listeners of English used their L1 phonotactic constraints to identify English clusters
even when these phonotactics did not facilitate the perception of the target language.
An English cluster like /nt/ is heard as /ŋk/, /pl/ as /bl/, /pr/ as /pl/, /dr/ as /gr/,
and /θr/ as /tr/.
2.2.7.3.2 Phonotactic constraints across languages
Linguists believe that the sound sequences of languages are controlled by phonotactic
constraints that are encoded in the processing system of such languages. This principle
gives each language its own sound sequences, describing which sounds should take up
the initial position in a syllable and which ones occupy final positions. The types of
representations that are used in the processing system of language to encode constraints
are the subject of an important area of debate in second-language studies. In English,
the sound /ŋ/ is not permitted to appear in all positions. It can
appear in syllable-final position, but there it cannot be preceded by long
vowels or diphthongs (Goldrick 2004). Other phonotactic constraints on English
syllable structure are that /tʃ, dʒ, ð, z/ do not cluster in onsets and /l, r, w/ only occur
alone or as non-initial elements in onset clusters. Moreover, /r, h, j, w/ do not occur in
final position in RP and Australian English, although /r/ can occur in final position in
rhotic dialects such as American English.
10 The pronunciation of village as /vilig/ has also been reported.
Similar sequencing constraints also apply to English consonant clusters; they determine the
sound sequences that can appear in a syllable and the positions in the syllable where
particular sounds can occur (onset or coda). Thus, sequencing constraints govern which
sound classes should appear adjacent to each other and they aid the identification of
word boundaries. Differences of L2 phonotactic constraints often motivate perceptual
and phonological problems among L2 speakers, as previous studies show. Seo (2003)
reported that segment positional restrictions motivate phonological alternations on
similar consonant clusters, which result in poor speech perception. An account of
speech perception of some cross-linguistic patterning provides correct predictions that
homorganic C+liquid sequences are more likely to undergo phonological change than
heterorganic C+liquid sequences in a given language. Findings of cross-language
investigations of 31 languages from different language families show that nasal+liquid,
obstruent+liquid clusters (or sonorant+sonorant and obstruent+sonorant sequences)
of homorganic sequences like /nt, lt/ are more vulnerable to phonological change
than those of heterorganic sequences /pr, br, pl, kr/ (onsets) and /lp, rk/ (codas).
Compared with heterorganic consonants, homorganic consonants have an additional
shared acoustic property, e.g., vowel formant transitions for the same place of
articulation, assuming that they are adjacent to a vowel. Thus, the two sounds in a
homorganic C+liquid sequence can be considered as being phonetically more similar to
each other than those in a heterorganic C+liquid sequence are. Moreover, phonological
change can also occur due to the absence of contexts with appropriate phonetic cues:
e.g., velar-to-alveolar shift is interpreted as a repair strategy. According to Kawasaki
(1982) and Ohala (1992, 1993), if two sounds in a sequence are acoustically and
auditorily similar, the degree of distinctiveness of the two sounds would be diminished
and thus they would be subject to modification. Vowel epenthesis is one of the repair
strategies that occur due to phonotactic differences between L1 and L2. A good
example of this phenomenon is the production of some English
consonant clusters by Iraqi and Egyptian speakers. Both dialects have
syllable-structure conditions that disallow consonant clusters in word-initial position. Yet
speakers of each dialect modify English words with initial consonant clusters in a
different manner. Egyptian speakers will pronounce /fləʊ/ flow as [fɪləʊ] whereas Iraqi
speakers will pronounce it as [ɪfləʊ]. Both pronunciations can be attributed to rules of
epenthesis in the native language that bring underlying syllable structures into
conformity with surface structure restrictions. In a word such as flow, the first
consonant is extra-syllabic (unassociated with a nucleus) and a vowel must be inserted
to which the consonant is resyllabified according to convention before it reaches the
surface structure. The Egyptian rule of anaptyxis inserts a vowel to the right of the
extra-syllabic consonant to which it resyllabifies, forming a CV syllable. In contrast, the
Iraqi rule of prothesis inserts a vowel to the left of the extra-syllabic consonant to
which it resyllabifies, forming a VC syllable. If the preference for the CV syllable had
been powerful, Iraqi speakers might have been expected to pronounce words such as
flow as [filo] at least some of the time because such a strategy would have created a CV
syllable independent of L1 transfer. However, such pronunciation was not evident for
Iraqi speakers. The native English speakers also resyllabified Arabic words to conform to
English syllable structure conditions (Carlisle 2001).
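The two repair strategies described above can be sketched in a few lines. This is a minimal illustration that assumes a two-consonant onset and a simplified broad transcription of flow as "fl" + "o"; the function names and the default epenthetic vowel "i" are assumptions of the sketch, not a rendering of the rules as formalized by Carlisle (2001).

```python
def anaptyxis(onset, rest, vowel="i"):
    """Insert the epenthetic vowel to the right of the first (extra-syllabic)
    consonant, giving an initial CV syllable (the Egyptian pattern)."""
    return onset[0] + vowel + onset[1:] + rest


def prothesis(onset, rest, vowel="i"):
    """Insert the epenthetic vowel to the left of the first (extra-syllabic)
    consonant, giving an initial VC syllable (the Iraqi pattern)."""
    return vowel + onset + rest


if __name__ == "__main__":
    # 'flow' in a simplified transcription: onset cluster "fl" + rhyme "o".
    print(anaptyxis("fl", "o"))  # filo
    print(prothesis("fl", "o"))  # iflo
```

Both outputs remove the word-initial cluster; they differ only in whether the stranded consonant becomes the onset of a new CV syllable or the coda of a new VC syllable.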
2.2.7.3.3 Sonority Sequencing Principle
One principle that has been established to treat consonant cluster sequencing
more adequately is the Sonority Sequencing Principle. Phoneticians postulate various
phonetic features to characterize sonority. One feature is that the position of a segment
in a syllable is determined by its sonority. The most sonorous segments form the
peak/nucleus of the syllable, whereas the others are arranged around the syllable
nucleus according to their degree of sonority. In other words, there is a downward cline
towards the syllable margins, which starts from the peak of sonority. Thus, vowels are
the most sonorous sounds, followed in decreasing order by liquids, nasals, fricatives
and stops as in the following words: trip, drip, ripe, come and please (Clements 1990,
Gierut 1999, Gierut and Champion 2001, Ladefoged 1993). Sonority plays a prominent
role in accounting for phonotactic patterns across languages. However, this does not
mean it can account for every phonotactic matter or pattern since many constraints
have very little to do with syllable structure and lie, in this way, totally outside the
domain of sonority theory. For instance, there is a common constraint which requires
that obstruent clusters agree in voicing, and it operates not only within syllables but also
across syllable boundaries in many languages (e.g., French, Russian, Catalan) showing
its entire independence of syllabification. One constraint, which often overrides the
syllable contact principle, is the prohibition of a complex syllable onset. If a cluster is
composed of a sonorant plus obstruent or ends in one of a small set of obstruent
clusters, it is well-formed and requires no epenthetic vowel. In other cases, an
epenthetic vowel appears between its two cluster members (Clements 1990).
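The sonority cline described above can be checked mechanically. The following sketch assumes the five-step scale given in the text (vowels, liquids, nasals, fricatives, stops) and a deliberately small, illustrative segment inventory; it treats every symbol as one segment, so long vowels and diphthongs are outside its scope. The names `SONORITY` and `obeys_ssp` are hypothetical.

```python
# Sonority ranks following the scale in the text:
# stops (1) < fricatives (2) < nasals (3) < liquids (4) < vowels (5).
# The inventory is a small illustrative subset, not a full phoneme set.
SONORITY = {}
for rank, segments in enumerate(["ptkbdg", "fvszθð", "mn", "lr", "aeiouɪ"], start=1):
    for seg in segments:
        SONORITY[seg] = rank


def obeys_ssp(syllable):
    """True if sonority rises strictly to a single peak and then falls strictly."""
    ranks = [SONORITY[seg] for seg in syllable]
    peak = ranks.index(max(ranks))
    rising = all(a < b for a, b in zip(ranks[:peak], ranks[1:peak + 1]))
    falling = all(a > b for a, b in zip(ranks[peak:], ranks[peak + 1:]))
    return rising and falling


if __name__ == "__main__":
    for syllable in ["trɪp", "drɪp", "kam", "strɪp", "rtɪp"]:
        print(syllable, obeys_ssp(syllable))
```

Under this scale trɪp and drɪp are well-formed, whereas strɪp is flagged because /s/ is more sonorous than the following stop. English nevertheless permits such /s/ + stop onsets, which illustrates the point made above that sonority alone cannot account for every phonotactic pattern.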
2.2.8 The effect of explicit knowledge
Explicit knowledge refers to language rules and vocabulary items that second/foreign
language learners acquire through instruction (teaching). The learners will be able to
reflect this knowledge directly in their actual use of the target language (Krashen 1985,
Ellis 1994). Thus, the concept of explicit knowledge implies two considerations
involving second/foreign language learning. Firstly, the learners’ explicit knowledge
develops due to the learning experiences in which they acquire explanations of the ways
the target language functions. Secondly, compared to implicit knowledge, explicit
knowledge is an essential element for language acquisition, particularly for adult learners,
when the task of acquisition demands paying attention (Schmidt 1992).11 Most probably,
these considerations represent part of the reasons why linguists have focused on explicit
knowledge when designing pedagogical materials for second/foreign-language teaching.
In designing these materials, some linguists focused on ‘form’, which can include
grammar points, vocabulary items, a language function or pronunciation (Ellis 1994).
According to Venkatagiri and Levis (2007), explicit explanations of structural properties
of the target language pay off in all of these areas, but most of all in the area of
pronunciation. Explicit knowledge of phonology should therefore play an important
role in improving the pronunciation accuracy of learners. In other words, conscious
knowledge of L2 speech sounds can help learners to achieve correct perception and
production of a second language.
Specifically, I argue that insufficient explicit knowledge of English speech sounds
affects the performance and intelligibility of Sudanese EFL learners. It is assumed that
the learners need to master English language knowledge learnt/taught in a course. This
knowledge refers to the actual input that the learners understand and can manipulate to
choose, articulate or interpret English speech sounds in individual words and
connected speech. For instance, they need to learn how to distinguish between
phonemes and allophones in English. Most EFL learners do not understand the
relationship between allophones and phonemes of English and what distinguishes them
from those in their native language. This is because each language has its own set of
phonemes that are recognizable either on the articulatory level, by the presence of a set
of relevant organic features, or on the acoustic level, by the presence of a set of
distinctive sound features. Thus, phonemic features give a language its identity and
form a basis upon which this language is built, but at the same time trigger learning
problems for L2 learners.
Related studies describe attempts at teaching explicit knowledge of (aspects of) the
sound structure of the target language to foreign language learners. In the Sudanese
context, it was shown that teaching explicit knowledge improved the quality of the
learners’ English pronunciation (Al Dawla 2005, Fahal 2004). Earlier, Munro (1993)
found that the production of English vowels by Arabic speakers improved with
increased training, i.e. through increased knowledge of the target sound system.
Sufficient evidence of the importance of explicit knowledge to ESL/EFL learners is
manifest in different learning situations. There are some difficulties in learning the
phonemes, e.g., the difference between /p/ and /b/ is a subtle one, but it makes a
significant difference in meaning (Dahlquist 2002, David, Shirley and Dickson 1999).
Along the same lines, by mastering explicit phonemic knowledge, the learners will be able
to judge or differentiate between acceptable phonemic sequences and unacceptable
ones; e.g., in English an /st/ cluster is acceptable, while /sf/ is not. In a wider context,
the variation of phonemes from word to word and from speaker to speaker makes the
learning of phonemes more complicated. However, if L2 learners have background
knowledge of this variation, they will achieve intelligible speech. According to Carr
(1999), the acquisition of native-like English pronunciation is a difficult task that
requires much effort, especially for learners past the age of puberty, but is a very
important element in avoiding frustration among the speech participants. This means the
learners need to explicitly know more about the phonemes, i.e., they need to focus on
sound units (Gussenhoven and Broeders 1976).
11 According to Krashen (1985) and Bjarkman and Hammond (1989), implicit knowledge refers
to the tacit or subconscious knowledge which is developed and stored in the form of
generalizations during the learning of the target language. Linguists claim that a newborn baby
starts language acquisition from a genetically determined zero stage, proceeding forward to a
complete state of language knowledge using its subconscious (i.e. implicit) knowledge. Implicit
knowledge forms the available knowledge that learners need in order to acquire a second
language. If learners are at stage ‘i’ of language development, for example, they can acquire i+1 if
they comprehend an input item including i+1.
2.2.9 Miscellaneous issues
Orthographical issues. The phonemic system of a language is related to its writing system.
Therefore, some reference should be made to the spelling of the language, which gives
guidelines on the pronunciation and perception of speech.
Quite apart from the difference in symbols, the difference between the English and
Arabic writing systems results in speech intelligibility problems. English has a complex
orthography, whilst the Arabic orthography is phonemic, such that one letter represents
one sound. These differences cause Sudanese learners of English to have pronunciation
problems. Historically, in most languages, members of the speech community learned
orthography from their elders. Even if there was a time when the
relationship between letters and sounds was clear and direct to those who first created the written forms,
it did not remain so as time passed. This is simply because later
generations no longer understood this relation, and consequently problems arose in
pronouncing words. The physical preservation of written forms resulted in the rise of
conservative practices in orthography by virtue of which the graphic form remains
unchanged, while the spoken form undergoes modification. The English word knight,
for example, is of Germanic origin and is a cognate of German Knecht. English
conservative orthography writes it as knight but pronounces it like night /naɪt/. There
are many English words with similar spellings that have come to be pronounced
differently: e.g., plough, through, rough or roll, doll, home, come, etc. Thus, English
orthography is inadequate in comparison to orthographic systems of other languages. In
addition to the complex nature of English spelling inherited from the past, there are
idiosyncrasies in spelling that make it tricky to use. Idiosyncrasies refer to the large
number of consonant and vowel sounds varying from one dialect to another but which
give poor links to letters. Such relations make prediction of pronunciation difficult to
ESL/EFL learners. For instance, some learners have difficulty figuring out what
phoneme the digraph ‘th’ represents in words such as thin and then. This is because ‘th’
has two perceptual representations in English: the voiceless and the voiced dental fricatives /θ, ð/
(Heffner 1975). English is a language which has borrowed words from various
languages, such as Latin, Greek, Arabic and Russian. This feature makes English
pronunciation problematic, particularly for non-native speakers, since the relation
between letters and sounds in many of these borrowed words is not clear. Consider, for
example, words such as tchotchkes, chemical, alcohol, gnocchi, in which some letters are
written but not pronounced. Learners of English who come from a linguistic
background of simple orthography systems need to make much effort to learn how to
pronounce such words. Moreover, some borrowed words retain their spelling and
pronunciation of origin. In the 14th century, there was a tendency,
motivated by an enthusiasm for neoclassical style, to adapt the
spellings of words: a word like nacioun changed its spelling
to nation, while ‘gg’, which denotes ‘jh’, was replaced by ‘dg’ in word-final
position. Furthermore, different spellings for the same sound exist simply because
the sounds were once pronounced differently, e.g., ‘ee, ea’. By the time these sounds had
merged, the spelling of the words containing them had stabilized, so the double spellings were
preserved. Long vowel sounds underwent a shift from a continental pronunciation that
is more like Spanish or French to the current one, after which the vowels took two
forms, short and long, as in ship/sheep, etc.
Teaching background. English pronunciation receives little space in the syllabus taught at
the primary, secondary and tertiary levels in Sudan. Arguably, very few pronunciation
lessons are ever interspersed between the syllabus items, which represent inappropriate
and often insufficient phonological and phonetic components for EFL learners. There
are very few lessons that treat the basics of English speech articulation in high schools,
whereas only two or three courses are taken by university students of English that
present issues such as descriptive phonetics and listening comprehension skills.
The problems of teaching EFL in the Sudanese context are attributable to many factors.
According to Mitchell and El Hassan (1993) there are no practical teacher’s books to be
used during the teaching of English. This means that teachers teach the language
depending on their own experience, which is not always scientific.
Moreover, results of a related study (Fareh 2010) revealed that the teaching of EFL in
Sudan and other Arab countries is a challenge for a number of
reasons. The textbooks used do not consider many of the essential educational
requirements such as the learners’ level of English, attitudes, interests, etc. Their
contents are not authentic, and they are presented at a high level of language that
demands much from the learners. Moreover, the content of these courses does not
meet the needs of the learners and is often too extensive to finish within a term or
semester. Furthermore, the teaching strategies are typically teacher-centered, so that
the learners have few opportunities to practise language skills in the target language.
This situation is exacerbated by the use of inappropriate methods of language
instruction, e.g., English pronunciation and listening skills are not always
taught with the help of language labs.
Arguably, assessment of EFL in Sudan largely focuses on the learners’ writing and
reading abilities while listening and speaking, including pronunciation, receive little
attention from assessors. Consequently, the learners do not show much development in
the learning of these skills.
2.2.10 Summary
This section provides a summary of chapter two. The chapter reviewed the testing
methods used in the measurement of speech intelligibility in second or foreign language
studies. It also reviewed the contributions of previous literature on speech intelligibility
problems that are faced by Sudanese EFL learners.
1. Several methods and tests deal with speech problems; however, previous studies
show that the Modified Rhyme Test and SPIN represent a highly adequate
approach to speech intelligibility measurement.
2. There is a wide range of phonetic and phonological differences between English,
which represents the target language, and Arabic, which represents learners’ L1.
These differences are worthy of study and are assumed to form a potential source
of learning difficulties for Sudanese EFL learners.
3. Vowels represent the most difficult area of English sounds for Sudanese EFL
learners to understand and produce. Previous studies refer to L1 effects, wrong
implementation or lack of knowledge of English phonetics and phonology.
4. English consonants are less problematic for the learners; however, learners have
difficulty identifying and pronouncing some English consonants such as /s, θ/, /z,
ð/, /ʃ, ʒ/, /tʃ, dʒ/, /ŋ/ and /p, v/.
5. The learners face more problems in their attempts to pronounce onset and coda
consonant clusters. Coda clusters are more difficult to understand than initial
clusters. Consonant clusters such as those that occur in English do not exist in the
Arabic language. Therefore, adult L2 learners are equipped with their L1
phonotactic constraints and have to deal with the mismatch that exists between L1
and L2.
6. Related studies have provided few accounts of the phonetic and acoustic correlates
of the learning problems experienced by Sudanese EFL learners. Therefore, much
more profound investigation is necessary to provide a clearer picture.
7. Arabic learners of English often perceive the phonological principles of English;
however, they fail to implement them and this is attributable to the paucity of L2
knowledge.
8. Studies of English as a second or foreign language use native speakers of English
as control groups/model speakers for comparative purposes. Error analysis is based
on the differences which exist between learners’ performance and that of native
speakers. Differences are highly predictive of difficulties experienced by the
learners, whilst similarities imply fewer problems manifested in the learning of L2
speech sounds. Several types of errors have been detected in related studies, which
include substitutions, conflations, confusion, developmental interlingual errors and
insertion/deletion.
9. The perception and pronunciation problems of Sudanese EFL learners of English
have received little attention in previous studies. The school and university
syllabuses give insufficient space to the teaching of these aspects of knowledge,
whilst the way these skills are taught is inadequate and traditional.
10. The related literature shows that most English pronunciation and perception
errors are due to the following: (i) the intricate nature of the English vowels, (ii)
unfamiliarity of ESL/EFL speakers with large numbers of vowel sounds, (iii)
incorrect perceptual representations of English vowels and (iv) by-product of
ineffective teaching and lack of exposure to L2.
Chapter Three
Intelligibility of RP English
to Sudanese listeners
3.1 Introduction
This chapter aims to present experimental evidence for the causes of speech
intelligibility problems which Sudanese university EFL learners face. The investigation
attempts to account for the linguistic factors that are assumed responsible for these
problems. In a recognition task of L2 speech, for example, the learners’ L1 represents
one of the linguistic factors affecting the learning process. That is, ESL/EFL learners
are sensitive to the speech sounds of their mother tongue, most of which are easily
intelligible to them. This means they do not have problems identifying sounds in their
own language. However, problems arise when the learners are involved in perception
tasks using second or foreign language speech. These problems form one of the urgent
ESL/EFL issues which require measuring the learners’ receptive intelligibility. The
measurement of receptive intelligibility addresses the listener’s ability to recognize the
acoustic waveform produced by the speakers as a string of meaningful units (words) (see
Kent, Dembowski and Lass 1996). Among the different types of instrumental analysis
which treat speech recognition, segmental intelligibility measurement can be considered
an advantageous method. Therefore, this study is done on the basis of segmental
analysis of vowels, single consonants and consonant clusters of English. It targets the
types of identification errors made by the Sudanese listeners in the native speech,
accounting for issues like how vowels, consonants and clusters of English manifest
themselves as learning problems. Specifically, it is assumed that many reasons are
responsible for the intelligibility problems among Sudanese EFL learners. In an EFL/ESL
context, previous studies revealed that differences in phonetic and phonological
implementation in a learner’s mother-tongue often result in misperception of the
speech sounds of L2. According to Fokes, Bond and Steinberg (1985) Arab listeners of
English are inconsistent in identifying aspirated and unaspirated voiceless stops. They
have more difficulty with the labial than the alveolar categories. The identification
problem is attributed to the effect of the place of articulation of the stops and the
identity of the vowels. Moreover, voicing decisions at the labial place of articulation are
more difficult than at the alveolar place for all subjects. Acoustically, intelligibility
problems faced by EFL/ESL learners often occur due to the influence of consonants
on vowels as an example of the ways in which speech sounds interact in different
phonetic environments. Therefore, listeners need to know that in some environments,
the English vowel /iː/ as in beat, bead should not be realized precisely the same as /iː/ in
peat, keep which often reduces the intelligibility of a foreign learner of English (Allen
and Miller 1999). Problems such as these also require drawing attention to the learners’
explicit knowledge of English speech sounds. Many error analyses of L2 speech
sounds point out that learners’ misperceptions of L2 pronunciation are the result of
partial learning, orthographical differences, and so on, which supports the hypothesis
that when L2 norms are lacking, learners usually fall back on habits of their mother-
tongue. This chapter attempts to examine experimentally the negative effect of two
linguistic elements on speech intelligibility of Sudanese EFL learners: (i) transfer of the
learners’ L1 (Arabic) and (ii) lack of explicit knowledge of English speech sounds.
Finally, this theme is discussed in four sections, each of which is integrated with the
others so as to provide coherence between them.
3.2 Method
3.2.1 Intelligibility tests used
Intelligible speech is defined as speech that is understood by native speakers (Munro et
al. 2006). This means that speech intelligibility is principally a hearer-based construct
that depends on interaction in an appropriate context involving the comprehension of
the message between the listener and the speaker. It is also possible to refer to speech
intelligibility as any successful communication that involves both native and non-native
speakers of English, because the final goal of such speech is understandability. Since
listeners of this study are expected to have an incorrect conception of English speech
sounds, focus will be on examining vowels, consonants and consonant clusters, in part,
because they form the basic sound knowledge of the English language, the mastery of
which is required for perfect learning of speech. And second, because the assessment of
whether speech is intelligible or not is attributed to segmental factors. It has been
claimed that more than 50% of speech intelligibility is accounted for on the basis of
speech sounds (Pascoe 2005).
The Modified Rhyme Test (MRT) was used in the experiments. The MRT is considered
to be a highly accurate and reliable measure of intelligibility (Logan, Greene and Pisoni
1989). Speech intelligibility measures involve word identification tasks in a closed set of
four alternatives, where the listeners are asked to select the response they think the
speaker intended. The score is the number of correctly responded-to items. Test items
normally target phonemes, multi-phonemes, or words. Phonemes refer to vowels and
single consonants, whilst multi-phonemes refer to consonant clusters. The formal
assessments of phonemes and multi-phonemes interpret the responses as either
intelligible or unintelligible; a score of (close to) 100% is interpreted as completely
intelligible performance (Lafon 1966). Word intelligibility, on the other hand, was
determined on the basis of final words embedded in short redundant SPIN sentences.
SPIN is an abbreviation of ‘Speech Perception in Noise’ Test (Kalikow, Stevens and
Elliott 1977, Wang 2007, Wang and Van Heuven 2007). The test asks listeners to
recognise 25 keywords embedded in meaningful and highly predictable sentences, as in
She wore her broken arm in a sling (keyword underlined). Listeners only write down the
final word that they think they heard in each sentence. This part of the SPIN test
proved to be efficient at assessing speech recognition abilities (Rhebergen and Versfeld
2005). Although the listeners’ performance is primarily quantified in terms of number
of whole words correctly recognized, partially correct answers are also important since
they give information about the perception of specific phonemes in onset, nucleus and
coda position.
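The closed-set scoring described above can be illustrated with a minimal sketch. It assumes one ticked alternative per trial and an answer key of intended words; the function name and the four example trials are hypothetical and are not taken from the test battery.

```python
def percent_correct(responses, answer_key):
    """Return the percentage of trials in which the chosen alternative
    matches the word the speaker intended."""
    if len(responses) != len(answer_key):
        raise ValueError("exactly one response per trial is required")
    n_correct = sum(r == k for r, k in zip(responses, answer_key))
    return 100.0 * n_correct / len(answer_key)


# Hypothetical answer key and one listener's ticked alternatives.
answer_key = ["pit", "bead", "sling", "cart"]
responses = ["pet", "bead", "sling", "cart"]

print(percent_correct(responses, answer_key))  # 75.0
```

A score close to 100 would correspond to what the text calls completely intelligible performance.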
3.2.2 Participants
3.2.2.1 Sudanese listeners of English
The subjects of the study were ten Sudanese university English students in the
Department of English at El Gadarif University in the Sudan. The subjects involved in
these experiments specialized in English language teaching (Teaching English as a
Foreign Language, TEFL). They had studied for six semesters when they participated in
the listening test. During the period of study, which extends for four years, students
attended three courses in the field of pronunciation; these are (i) an introduction to
phonetics, (ii) phonology and (iii) practical phonetics, delivered in three successive
semesters. They also attended two classes on English listening skills, which usually take
place in semesters one and three. English is treated as a foreign language (not a second
language), the learning of which starts in the fifth year of primary school and continues
at secondary schools for three years. English lessons obtained during these stages vary
between 5 and 6 hours per week; English is treated as a school course that provides
basic principles of the language in a traditional way of language teaching.
3.2.2.2 Native speaker of RP English
The test materials were produced by one male native speaker of RP English. The
speaker was asked to read the test material with an RP accent. He was advised to
read in a consistent manner.
3.2.3 Overall structure of the test battery
The experimental stimuli included four tests. These were (i) a vowel test, which was
composed of minimal quartets including short and long vowels as well as diphthongs,
(ii) single consonants in either onset or coda position and (iii) consonant clusters in
onset or coda position. These target sounds were embedded in meaningful C*VC*
words (where C* stands for one to three consonants). (iv) The fourth test comprised 25
sentences taken from the high-predictability set included in the SPIN (Speech
Perception in Noise) test (Kalikow et al. 1977). These are short everyday sentences in
which the sentence-final target word is made highly predictable from the earlier words
in the sentence, as in She wore her broken arm in a sling (target word underlined). Word
stimuli in the first three tests were embedded in a fixed carrier sentence Say…again,
which ensured a fixed intonation with a rise-fall accent on the target word. The vowel
and the single consonant tests contained items on each individual vowel or consonant
phoneme in the RP inventory. (Inadvertently, the vowel test did not include an item
targeting the vowel /əʊ/ as in boat.)
The consonant test targeted all the consonants in onset position and in coda position.
For the cluster test, the number of test items had to be limited as the total inventory of
onset and coda clusters is very large; including all the clusters would have been too
demanding on the listeners. Nine onset and eight coda clusters were selected that
present potential problems to Sudanese-Arabic learners of English (Allen 1997, Patil
2006). All items in the tests were chosen such that they occurred in dense lexical
neighbourhoods, i.e. there should be many words in English that differ from the test
item only in the target sounds. For instance, the vowel /ɪ/ was tested in the word pit,
since the /p_t/ consonant frame can be filled by many other vowels, as in peat, pet, pat,
pot, part, port, put, putt and pout. These so-called lexical neighbours, differing from the
target word in only the identity of the test sound, make up the pool of possible
distracters (alternatives) in the construction of the MRT test. When selecting the three
distracters needed for each test item, lexical neighbours that differ from the target in
only one distinctive feature were preferred. For the target pit, we selected
alternatives with vowels that differed from /ɪ/ in just one vowel feature, i.e. pet
(differing in height), put (differing in backness) and pot. The latter alternative differs
from the target in both height and backness; this solution was preferred over the one-
feature difference in peat (or Pete) as it was decided to exclude proper names and low-
frequency alternatives as much as possible, which may show a larger decrement in
recognition than high-frequency words. The full set of test items is included in the
Appendix.
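The selection principle for distracters can be sketched as a ranking of lexical neighbours by the number of vowel features in which they differ from the target. The three-way feature coding below (height, backness, tenseness) is a simplified assumption made for illustration only, and the function names are hypothetical; the sketch is not the procedure actually used to compile the Appendix.

```python
# Hypothetical (height, backness, tenseness) coding of the vowels in a few
# /p_t/ neighbours; a simplification, not the feature system of this study.
VOWEL_FEATURES = {
    "pit": ("high", "front", "lax"),
    "pet": ("mid", "front", "lax"),
    "put": ("high", "back", "lax"),
    "pot": ("low", "back", "lax"),
    "peat": ("high", "front", "tense"),
    "part": ("low", "back", "tense"),
}


def feature_distance(word_a, word_b):
    """Count the vowel features in which two words differ."""
    return sum(a != b for a, b in zip(VOWEL_FEATURES[word_a], VOWEL_FEATURES[word_b]))


def pick_distracters(target, excluded=(), how_many=3):
    """Return the neighbours closest to the target, skipping excluded words."""
    candidates = [w for w in VOWEL_FEATURES if w != target and w not in excluded]
    candidates.sort(key=lambda w: (feature_distance(target, w), w))
    return candidates[:how_many]


# "peat" is excluded here to mimic the decision to avoid proper names (Pete)
# and low-frequency alternatives.
print(pick_distracters("pit", excluded={"peat"}))  # ['pet', 'put', 'pot']
```

Under this assumed coding, pet and put are one feature away from pit, while pot is two features away and is only selected once the one-feature neighbour peat has been excluded.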
3.2.3.1 Test materials
The stimulus sentences were typed on sheets of paper (one sheet for each test) and
then read by a male native speaker of RP English. Recordings took place in a sound-
treated room. The speaker’s voice was digitally recorded (44.1 kHz, 16 bits) through a
high-quality swan-neck Sennheiser HSP4 microphone. The speaker was instructed to
inhale before uttering the next sentence so that each utterance would have
approximately the same loudness, intonation and temporal organisation. The target
words were excerpted from their spoken context using the high-resolution digital
waveform editor in the Praat speech processing software (Boersma and Weenink 1996).
Target words were cut at zero-crossings to avoid clicks at onset and offset. Target
words and SPIN sentences were then recorded onto Audio CD in seven tracks. The
first track contained two practice trials for the vowel test and was followed by track 2,
which contained the 19 test vowel items. Tracks 3 and 4 contained the practice and test
trials for the single consonant tests and tracks 5 and 6 contained the cluster items.
Track 7 comprised the 25 SPIN sentences with no practice items. In the single
consonant and cluster tests trials targeting onsets preceded the items targeting codas.
Other than that, the order of the trials within each part of the test battery was random.
Trials were separated by a 5-second silent interval. After every tenth trial, a short beep
was recorded, to help the listeners keep track on their answer sheets.
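The idea of cutting at zero-crossings can be illustrated with a short sketch. The excerpts in this study were made by hand in Praat; the code below is not Praat's procedure but a plain-Python illustration under the assumption that the waveform is available as a list of sample values. The function names and the sample values are hypothetical.

```python
def nearest_zero_crossing(samples, index):
    """Return the sample index nearest to `index` at which the sign of the
    waveform changes relative to the preceding sample."""
    crossings = [
        i for i in range(1, len(samples))
        if (samples[i - 1] < 0) != (samples[i] < 0)
    ]
    if not crossings:
        raise ValueError("signal has no zero-crossing")
    return min(crossings, key=lambda i: abs(i - index))


def cut_at_zero_crossings(samples, start, end):
    """Excerpt samples[start:end] after moving both edges to the nearest
    zero-crossing, so that the excerpt does not begin or end mid-swing."""
    new_start = nearest_zero_crossing(samples, start)
    new_end = nearest_zero_crossing(samples, end)
    return samples[new_start:new_end]


# Hypothetical waveform fragment.
wave = [3, 1, -2, -4, -1, 2, 5, 1, -3]
print(cut_at_zero_crossings(wave, 1, 7))  # [-2, -4, -1, 2, 5, 1]
```

Starting and ending an excerpt where the signal passes through zero avoids an abrupt jump in amplitude, which is what is heard as a click.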
3.2.3.2 Test procedure
The stimuli were presented over loudspeakers in a small classroom that seated ten
listeners. Subjects were given standardized written instructions and received a set of
answer sheets that listed four alternatives for each test item. They were instructed for
each trial to decide which of the four possibilities listed on their answer sheet they had
just heard on the CD. They had to tick exactly one box for each trial and were told to
gamble in case of doubt. Alternatives were listed in conventional English orthography.
In the final test (SPIN), subjects were instructed to write down only the last word of
each sentence that was presented to them. There were short breaks between tests and
between presenting the practice items and test trials. Subjects could ask for clarification
during these breaks in case the written instructions were not clear to them.
I will now present the results of the test battery in four sections, one for each test. Each
section will first outline the structural differences between the sounds in the source
language (Sudanese Arabic, SA) and in the target language (RP English). Such
comparisons may help understand why certain English sounds are difficult for Sudanese
learners and others are not.
3.4 Overall results
In this part, I present the results and the discussion for each of the four tests
separately: vowels, consonants, clusters and SPIN sentences of English.
3.4.1 Vowels
Figure 3.1 presents the percentage of vowels correctly identified by the Sudanese-
Arabic university students broken down by target vowel. As is shown by Figure 3.1, the
listeners overall correctly identify no more than 47.8 percent of the English vowel
tokens spoken by the native speaker. However, responses to individual vowels differ
widely, with percentages anywhere between 0 and 100. In detail, there is a complete
failure in the recognition of the short vowel /ʌ/ and the long vowel /ɑː/. These are
followed by a high rate of misperception of the lax English vowels /ɪ/ and /ʊ, e, ɔ/.
Similarly, tense vowels /ɜː, uː/ and diphthongs like /eə, ʊə, eɪ, aɪ, ɪə, ɑʊ/ also proved to
be problematic. However, listeners show no errors in perceiving the two vowels /ɔɪ, ɔː/,
while few errors are made in the perception of the short vowel /æ/. The low
percentage reveals that listeners find the perception of the English vowels difficult for
different reasons, which will be discussed later.
Figure 3.1 Mean percentage of native RP English vowels correctly identified by Sudanese
listeners broken down by target vowel. Error bars represent ±1 Standard Error.
Table 3.1 shows the results in more detail. This is a confusion matrix with the stimulus
sounds (‘target’) presented to the listeners listed in the rows and the responded vowels
(‘Perceived RP vowels’) listed in the columns. Correct responses are listed in the cells
along the main diagonal of the matrix (indicated in bold print), while incorrect
responses (so-called confusions) are located in off-diagonal cells. Confusions that occur
in 30 percent of the cases or more have been highlighted in the matrix (grey-shaded
cells). These cells identify types of errors that point to specific difficulties on the part of
the listeners.
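How such a confusion matrix is tallied, and how cells at or above the 30 percent criterion are flagged, can be sketched as follows. The targets "A", "B" and "C" and the response counts are invented for illustration and do not correspond to any row of Table 3.1; the function names are likewise hypothetical.

```python
from collections import Counter, defaultdict


def build_confusion_matrix(trials):
    """Count responses per target; `trials` holds (target, response) pairs."""
    matrix = defaultdict(Counter)
    for target, response in trials:
        matrix[target][response] += 1
    return matrix


def frequent_confusions(matrix, threshold=0.30):
    """List off-diagonal cells whose share of the row reaches the threshold."""
    flagged = []
    for target, row in matrix.items():
        total = sum(row.values())
        for response, count in row.items():
            if response != target and count / total >= threshold:
                flagged.append((target, response, count))
    return sorted(flagged)


# Invented responses of ten listeners to two made-up targets.
trials = [("A", "A")] * 6 + [("A", "B")] * 3 + [("A", "C")] + [("B", "B")] * 10
matrix = build_confusion_matrix(trials)

print(matrix["A"]["A"])             # 6 correct responses on the diagonal
print(frequent_confusions(matrix))  # [('A', 'B', 3)]
```

With ten listeners per target, a cell is flagged as soon as three or more listeners give the same incorrect response.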
Table 3.1 Confusion matrix of 20 English vowels and diphthongs (in the rows) perceived by ten
Sudanese-Arabic listeners (in the columns). Correct responses are on the main diagonal, indicated
in bold face. Confusions (≥ 30%) are in grey-shaded cells. The vowel /əʊ/ should have been
presented but was not.
Perceived RP vowels
Target
ʌ ɜː ɑː æ ɑʊ aɪ e eə eɪ ɪ iː ɪə ɔ ɔː əʊ ɔɪ ʊ uː ʊə
ʌ 0 1 9
ɜː 4 1 2 3
ɑː 0 1 9
æ 9 1
ɑʊ 5 1 4
aɪ 3 5 2
e 3 5 1 1
eə 2 6 2
eɪ 1 1 8
ɪ 5 2 3
iː 1 4 5
ɪə 7 3
ɔ 3 7
ɔː 10
əʊ
ɔɪ 10
ʊ 2 8
uː 4 6
ʊə 3 7
3.4.2 Discussion
The perception of the English vowels forms a serious problem for the Sudanese Arabic
listeners of this study. The listeners frequently confused the low central short vowel /ʌ/
with the peripheral low and back short vowel /ɔ/, whilst the half-open vowel /ɜː/ was
identified as /ɔː/ because their L1 (Arabic) inventory lacks central vowels (Brett 2004).
It is most likely that linguistic differences between the listeners’ L1 and L2 have a
negative transfer effect (mapping model, cf. Kuhl 2000) on the listeners’ perception process.
That is, listeners are not familiar with the types of vowels needed in English because
they are not distinguished in the Arabic phoneme system. Therefore, they tend to
mentally equate L2 vowel sounds to their L1, so that the non-native listener fails to hear
a contrast between two sounds that native listeners of the target language
make.
In a similar case, Tomokiyo, Black and Lenzo (2003) reported difficulty in achieving
inter-coder agreement between Arabic and English vowels. Especially the presence of
an /e/ or /o/ vowel proved difficult for the Arabic listeners to identify with a great deal
of consistency. Tomokiyo et al. attribute this to the influence of Modern Standard Arabic
(MSA), where formal marking (i.e. in the writing system) indicates the existence of only
/a, i, u/. More importantly, duration often has a negative influence on the recognition
of English vowels. This appears in several cases where the Sudanese listeners conflated
/ʊ/ with /uː/ and /ɪ/ with /iː/ and confused /ɑː/ with /ɔː/. This type of error
motivates the hypothesis that duration is an important acoustic cue used in cross-
linguistic speech perception (Hillenbrand and Clark 2000). According to Hillenbrand
and Clark, when vowel duration is shortened, /æ/ tends to be heard as /ɛ/ and /ɑ/
as /ɔ/, whilst the lengthened /ɛ/ tends to shift to /ʌ/ and /ʌ/ to /ɑ/ or /ɔ/, a change
process which leads to confusion. However, Hillenbrand and Clark observed only
minor changes in the perception of /ʊ, uː/ and /ɪ, iː/; for these vowels the effect of
incorrect duration was negligible. This account implies that the duration cue does not
have serious effects on all short/long vowel contrasts. A more specific case was reported
by Munro (1993), who found that the English vowels interpreted by Arabic groups
(including Sudanese) manifested the same ordering of vowel duration differences for
front vowels, but a different ordering for back vowels. This means that if the English vowels
perceived or produced by Arabic speakers tend to be longer/shorter, it is probably not
because their L1 is a quantity language in which length is an intrinsic element that
requires vowels to be realized as either short or long (geminated). Rather, it is because
similar cues are used in English. These data raise the prediction that English tense-lax
vowels are close to Arabic long/short vowels in terms of quality and duration.
Moreover, it is possible to account for such perception errors as inadequate knowledge
of English vowels, which prompts listeners to conflate, guess, or fall back on their L1
norms (Flege and Font 1980, Fokes, Bond and Steinberg 1985, Walker 2001). It is also
probable that because Sudanese listeners descend from a language background with a
small number of vowels, they find the perception of the English vowels difficult.
According to Cruttenden (2008) this is most predictable in those areas where vowels
are close together in the vowel space, so that confusions are possible within these areas:
/ɪ, iː/, /ʊ, uː/, /e, æ/, and /ʌ, ɔː, ɒ, ɑː/.
In conclusion of this section, it should be noted that there is also confusion in the
group of diphthongs. The diphthong /ɑʊ/ is misidentified as /əʊ/, /ɪə/ as /eə/ and
/aɪ/ as /eɪ/. Misidentification of such English vowels can be attributed to the fact that
the two diphthongs in each confused pair share at least one sub-phone; a feature which serves to
complicate the perception task for listeners.
3.4.3 Onset and coda consonants
Figure 3.2 shows the results of the perception test of the ten Sudanese listeners for the
English single consonants, presented to them in the onset of syllables (upper panel) or
in the coda (bottom panel).
Figure 3.2 Mean percentage of correctly recognized consonants by ten Sudanese listeners, broken
down by 24 target consonants. Lower and upper panels present the results for coda and onset
consonants, respectively. Error bars are ± 2 Standard Errors.
Figure 3.2 shows that the overall identification of the onset consonants is better than
that of coda consonants, with means of 95% against 75% correct. In onsets, listeners
show near-perfect perception of stops /b, t, d, k/ and the fricatives /f, v, s, ʃ/ as well as
/m, n, h, j/. However, a few errors were made in the identification of voiceless labial
/p/ and voiced velar /g/. Listeners also substituted /g/ for /k/, which are produced at
the same place of articulation (velar) and /dʒ/ for /d/. Other errors occurred in the
recognition of the voiceless fricative /θ/ and the voiced /z/. Here listeners confused
the voiced /z/ with voiceless /s/, /θ/ with /s/ and /d/ with /ð/ whilst /p/ was
perceived as /tʃ/. An interesting finding is that listeners were observed to frequently
perceive the retroflex /r/ as /w/.
Table 3.2 presents the Sudanese listeners’ perception of English onset consonants in
more detail. The diagonal line running across the table displays the correct scores of
perception while the scores scattered around it represent the problem areas.
Table 3.2 Confusion matrix of 22 English onset consonants (in the rows) as perceived by ten
Sudanese-Arabic listeners (in the columns). Further, see Table 3.1.
Perceived RP consonants
Target
b tʃ d ð f g h dʒ k l m n p r s ʃ t θ v w j z
b 10
tʃ 9 1
d 10
ð 2 8
f 10
g 9 1
h 10
dʒ 10
k 10
l 10
m 10
n 10
p 1 9
r 6 4
s 10
ʃ 10
t 10
θ 1 1 8
v 10
w 1 9
j 10
z 1 9
Compared with onset consonants, results in Figure 3.2 show that more errors are made
by the listeners in the perception of coda consonants; the overall mean percentage of
correctly identified consonants is poorer for codas than for onsets. A confusion of 90%
was made in the recognition of the voiceless stop /p/ as /d/, /k/ and /n/. Listeners
also made errors in the perception of /g/; i.e., they confused /g/ with /k/ and /g/ with
/n/. Conversely, they confused /k/ with /g/ and /k/ with /t/, whilst /t/ was
misidentified as /d/ or /k/. Nasal codas proved to be a problematic area of perception
where the confusion rate ranged between 50% and 60%. For example, listeners
frequently confused /ŋ/ and /m/ with /n/. On the other hand, labio-dental /f/ was
confused with /v/ and /v/ with /z/. Listeners show very few errors in identifying /b/,
/s/ and /tʃ/, while they made no errors in the perception of /l/, /ʃ/ and /dʒ/. The
confusion matrix of coda consonant perceptions is presented in Table 3.3. In the table
the plosives /p, t, d, k, g/ appear more problematic, whilst /l, ʃ, dʒ/ were perfectly
perceived.
Table 3.3 Confusion matrix of 19 English coda consonants (in the rows) perceived by ten
Sudanese-Arabic listeners (in the columns).
Target
b tʃ d dʒ f g k l m n ŋ p s ʃ t θ v z ð
b 8 1 1
tʃ 9 1
d 6 1 3
dʒ 10
f 6 4
g 5 3 2
k 1 6 2
l 10
m 6 4
n 7
ŋ 1 3 5
p 3 5 1 1
s 9 1
ʃ 10
t 2 1 5
θ 4 6
v 7 3
z 6 4
ð 1 9
3.4.4 Discussion
One of the findings is that the Sudanese listeners confused English /r/ and /w/. This
finding supports the claim that the learners’ production of L1 sounds often influences
the way they perceive an L2 counterpart. That is, the /r~w/ glide confusion is very
likely because the English /r/, which is not a trill but a frictionless continuant, is
mistaken for the nearest vowel-like sound in Arabic, which would be /w/. There are
strong indications that /w/ is perceptually close to English /r/. There is a sound
change in progress in which young speakers of English now pronounce onset /r/ as
/w/ (see Watt, Docherty and Foulkes 2003). In the majority of English accents /r/ is
articulated as a voiced alveolar or post-alveolar approximant. The retroflex variant of
/r/ is distinguished by a particularly low F3 that is close to F2, while energy above F3 is
normally weak due to the existence of two anterior constrictions in the vocal tract, one
made by the tip or blade of the tongue and the other by the narrowed lips. The Arabic
/r/, on the other hand, is normally a tap or an alveolar trill that requires vibration of the
tongue against the ridge. Allophonic variation is mainly concerned with the distinction
between single and geminate /r/ in intervocalic position, whereby single /r/ is
produced as a tap and geminates as trills (as they are in Spanish). Because of these
phonemic and acoustic features, the substitution of /w/ for /r/ can occasionally occur
(Khattab 2002). This conclusion suggests that such a type of problem occurs due to the
learners’ lack of knowledge and to insufficient practice of the English /r/ as a post-
alveolar approximant.
On the other hand, the replacement of /g/ by /k/, /z/ by /s/, /f/ by /v/, and /θ/ by
/s/ shows a systematic pattern of errors. The first two errors are a shift of voiced to
voiceless sounds. These cases are produced at the same place of articulation; the sounds
/g/ and /k/ are velar, while /z/ and /s/ are alveolar. It is most probable that the
perception errors /g, k/ and /z, s/ are the result of the effect of similarity of the place
of articulation. However, it is possible to suggest that errors such as these can occur
due to a violation of the norm of the voiced/voiceless feature; i.e. these sounds are
probably substituted because the voicing feature is not distinguished, or resists learning.
However, Flege and Font (1981) attribute this type of error in English stops to the
place of articulation rather than to voicing. Additionally, the confusion of /θ/ for /s/ is
probably caused by interference of the perceptual strategies of the listener’s L1 where
the English (inter)dental /θ/ was mistaken for the nearest Arabic sound, which is the
(alveolar) dental /s/. The substitution of /ð/ for /z/ and /θ/ for /s/ is often attributed
to the L1 effect. That is, in the consonant inventory of Sudanese and other Arabic
dialects, the interdental /θ, ð/ merged with the apico-dental (often labelled as alveolar
or sibilant) /s, z/ (Corriente 1978, Dickins 2007, Karouri 1996, Watson 2002). Thus, an
Arabic word like /hæːða/ ‘this’ is pronounced as [hæːza], whilst /θæːbit/ ‘firm’ is
pronounced as [sæːbit], a problem which is reflected in the perception of L2 speech
sounds. The affricate /tʃ/ was also misperceived as /p/ because the articulation of the
two stops /t/ and /p/ involves a complete closure followed by a release. This makes
listeners think of affricates as stops with a slow fricative release. It is very common
among L2 interlocutors that when there is background noise or unfamiliarity with the
speaker’s accent, intelligibility is compromised (Ball and Rahilly 1999, Subramaniam and
Ramachandrainh 2006).
In comparison to the onset, the perception of the coda consonants proved to be
difficult for the Sudanese listeners. The listeners made more errors in the perception of
the voiceless stop /p/, which was substituted for /k/, /d/ and /n/. They also
substituted /t/ for /d/ and /k/. This can be attributed to several factors. First, the
sameness of the manner of articulation of such sounds; i.e., the sudden burst required
in producing /p, k/ and /t, k/ makes such phonemes sound similar. When not all acoustic
correlates of L2 are easy to pick up, listeners are forced to guess the identity of a
stop; consequently they will choose the nearest place of articulation, or sound features
that are relevant to the intelligibility of their native language, which compromises
recognition accuracy (Gimson 1989). Second, the differences that exist between Arabic
and English in both the phonetic detail specifying the voicing contrast and the stop
inventory add to the problems. In Arabic, the voiceless stops are aspirated, while there is
pre-voicing for syllable-initial stops. English stops, on the other hand, exhibit a voicing
contrast at all points of articulation: bilabial, alveolar and velar. These differences
function as sufficient cues for the distinction between the stops. Regardless of such
differences, in perceiving the English stops, particularly in cases like /p, d/ and /t, d/,
the Sudanese listeners use the acoustic correlates of Arabic stops instead, which triggers
the confusion. This type of error of English stops is described as a wrong
approximation of the length of the vowel duration that should precede or follow such
stops. To avoid these problems, Arabic speakers learning English need to modify
their L1 correlates of voiced and voiceless stops towards the English
norm (Fokes et al. 1985, Khattab 2000). They need to use a longer VOT value for
initial voiceless plosives and to lengthen the vowel preceding the syllable-final voiced
stops/obstruents. Other perception errors are that the Sudanese listeners confused the
voiceless coda consonants with their voiced counterparts as in /s~z/ and /f~v/ as a
result of the similarity in the place of articulation, whilst the confusion of /n, ŋ, m/ is
due to nasality. Many types of errors of perception are the result of similarity of the
place and manner of articulation, on both onset and coda level. The absence of some
phonemes like /v, ŋ, p/ from the Arabic inventory adds to the perception problems of
listeners.
3.4.5 Onset and coda consonant clusters
Figure 3.3 shows means (and standard error) for a group of ten Sudanese listeners in
the perception of English consonant clusters. As the figure shows, in contrast to vowels,
consonant clusters yield fewer errors of perception. Furthermore, the performance of
the listeners for onset clusters is better than for coda clusters; the overall correct scores
being 75 and 71%, respectively.
Listeners misrecognized /dr/ as /gr/, which is more frequent than /dr/ as /kl/, and
these are followed by the misidentification of /sl/ as /sn/. They are also observed to
interchangeably make errors in perceiving /spl/ as /spr/, /kl/ as /gr/ and /spr/ as /pr/
or /skw/. However, there are no errors in the perception of the initial clusters /gl/,
/pl/ and /sw/. On the other hand, final clusters are more prone to misperception. That
is, the rates of perception errors shown in Figure 3.3 indicate that the most perception
errors are manifest on the coda level; and these are the substitution of /bd/ for /ld/,
/st/ for /sk/, /nz/ for /mz/ and /nz/ for /dz/. Listeners also made errors in identifying
/lm/, /ts/, /nt/ and /mp/, but fewer errors were observed in recognizing the item /gz/,
whilst /ŋk/ was correctly recognized. More details are shown in Tables 3.4 and 3.5
below. They provide a clearer picture of the correct and confused consonant clusters.
The correct scores of perception appear on the diagonal line running across the table in
bold face, while the cells scattering around represent the confusion areas.
Figure 3.3 Mean percentage of English onset and coda clusters correctly identified by ten
Sudanese listeners. Error bars are ± 2 Standard Errors.
Table 3.4 Confusion matrix of nine English onset consonant clusters (targets, in the rows)
perceived by ten Sudanese-Arabic listeners (responses, in the columns).
Target Perceived RP consonant clusters

Onset dr gl kl pl sl spl spr st sw gr kr kw pr sk skw sm sn
dr 5 4 1
gl 10
kl 5 2 3
pl 10
sl 6 1 3
spl 8 2
spr 3 5 1 1
st 9 1
sw 10
Table 3.5 Confusion matrix of eight English coda consonant clusters (targets, in the rows)
perceived by ten Sudanese-Arabic listeners (responses, in the columns).
Target Perceived RP consonant clusters

Coda bd gz lm ŋk nt nz st ts dʒ dz lb ld lk ls lt mp mz sk zd
bd 4 6
gz 8 1 1
lm 8 2
ŋk 10
nt 8 1 1
nz 6 1 3
st 6 4
ts 7 2 1
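The way such a confusion matrix is read (correct responses on the diagonal, confusions in the off-diagonal cells) can be sketched in a few lines of Python. This is only an illustration, not the scoring procedure used in the study: the function names are hypothetical, only three onset targets are included, the cluster labels are written in plain transcription, and the assignment of counts to response cells is an assumption based on the running text.

```python
# Illustrative subset of an onset-cluster confusion matrix: each target
# cluster maps to the responses given by ten listeners. The cell positions
# are an assumption inferred from the running text.
confusions = {
    "gl": {"gl": 10},
    "spl": {"spl": 8, "spr": 2},
    "spr": {"spl": 3, "spr": 5, "pr": 1, "skw": 1},
}


def percent_correct(matrix):
    """Percentage of all responses that lie on the diagonal (target == response)."""
    total = sum(sum(row.values()) for row in matrix.values())
    correct = sum(row.get(target, 0) for target, row in matrix.items())
    return 100.0 * correct / total


def top_confusions(matrix):
    """Off-diagonal cells as (target, response, count), most frequent first."""
    cells = [
        (target, response, count)
        for target, row in matrix.items()
        for response, count in row.items()
        if response != target
    ]
    return sorted(cells, key=lambda cell: -cell[2])


print(percent_correct(confusions))
print(top_confusions(confusions))
```

Keeping each row as a sparse mapping mirrors the printed tables, in which empty cells stand for zero responses.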
3.4.6 Discussion
The plosive+liquid replacement of /dr/ by /gr/ and the fricative+plosive /st/ by /sk/
can be accounted for as an alveolar-to-velar shift within the same manner of
articulation. The misperception of /kl/ as /gr/ (velar+liquid) is attributable to the
factor of velarity in the first cluster members and to the manners of articulation in the
second. Generally speaking, these types of perception errors support the linguistic
hypothesis that the perception of L2 sounds is often influenced by the perceptual and
articulatory properties of L1 (Canepari 2005, Cruttenden 2008), where listeners often
resort to the nearest corresponding sound. Moreover, such a type of perception error
where a voiced obstruent precedes the voiced liquid /r/ often takes place due to
phonological alternations in similar consonant clusters, mostly in homorganic
C+liquid sequences. These phonological alternations usually occur when the speech
signal is not detected well due to the lack of experience with voicing leads in
phonetically voiced stops, or due to the absence of appropriate phonetic cues (Seo
2003). Similar interpretations apply to the misperception of the voiceless sibilant+
voiceless stop+liquid clusters /spl/ as /spr/ interchangeably and the misperception of
/spr/ as /pr/ and /skw/, where substitution errors of the third cluster member /l, r, w/
took place, respectively. However, this type of error points also to the influence of the
similarity of the manner of articulation shared by the approximants. On the other hand,
the confusion of the coda nasal+fricative clusters /nz/ as /mz/ is due to nasality, but the
confusion of nasal+plosive clusters /nz/ for /dz/ is probably due to the influence of
the place of articulation shared by such members. Additionally, listeners follow a repair
strategy in perceiving /bd/ as /ld/. They adopt the nearest speech sound that aids them
to understand a word/message; i.e. listeners transfer their L1 phonotactic constraints
when listening to English. 13 This strategy reflects the prominent role played by the
Sonority Sequencing Principle in accounting for phonotactic patterns across languages
(Carr 1999, Clements and Keyser 1988, Gierut 1999, Gierut and Champion 2001). Thus,
the nasal+liquid, obstruent+liquid clusters of homorganic sequences and similar
voiceless sibilant+voiceless plosives are more vulnerable to phonological change than
those in heterorganic sequences.
13 To achieve perceptible pronunciation or to facilitate perception and production of
3.4.7 Sentence (SPIN) test
The SPIN-test (Speech Perception in Noise test) targets word recognition at the
sentence level. It aims to examine the learners’ performance in speech perception by
including the effect of semantic context. In the SPIN test, listeners are exposed to a set
of 25 specific meaningful sentences. Their task is to write down the last word
embedded in each sentence. In this way, the final goal of such types of tests is to
provide a measure of the ability of a listener to understand speech in an everyday
listening situation.
Figure 3.4 provides the means of Sudanese listeners’ performance on the SPIN test.
Figure 3.4 Percentage of (parts of) English words correctly recognized by Sudanese-Arabic
listeners (further see text). Vertical axis: percent correct responses; categories: ons_cor,
nuc_cor, cod_cor, word_cor, word_comp.
Correct perception of complete keywords (‘word_cor’ in Figure 3.4) proved to be very
difficult for listeners; scores are around 30% correct. However, listeners often managed
to recognize some sounds in the words correctly. For instance, correct identification of
sounds in the onset position of syllables (‘ons_cor’) is at 70%, whilst vowels (‘nuc_cor’)
and coda consonants (‘cod_cor’) are around 45% correct. The mean of the component
identification (‘word_comp’) is about 50%. The observation that onsets were perceived
more accurately than the vowels and codas ties in with the more detailed results of the
MRT tests. Together, these results indicate that onset consonants, whether single or
clusters, were identified more successfully than vowels and codas.
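The relation between the component scores and the whole-word score can be made concrete with a minimal Python sketch. It rests on several assumptions that are not stated in the text: that each keyword is a monosyllable analysed as an (onset, nucleus, coda) triple, that a word counts as correct only when all three parts match, and that ‘word_comp’ is the unweighted mean of the three component percentages. The function names and the response data are hypothetical.

```python
def score_response(target, response):
    """Compare (onset, nucleus, coda) triples; return per-part and whole-word hits."""
    ons, nuc, cod = (t == r for t, r in zip(target, response))
    return {
        "ons_cor": ons,
        "nuc_cor": nuc,
        "cod_cor": cod,
        "word_cor": ons and nuc and cod,  # assumption: all parts must match
    }


def summarize(pairs):
    """Percentage correct per category over a list of (target, response) pairs."""
    scores = [score_response(target, response) for target, response in pairs]
    n = len(scores)
    pct = {key: 100.0 * sum(s[key] for s in scores) / n for key in scores[0]}
    # Assumption: word_comp is the mean of the three component percentages.
    pct["word_comp"] = (pct["ons_cor"] + pct["nuc_cor"] + pct["cod_cor"]) / 3
    return pct


# Hypothetical responses of one listener to four keywords.
pairs = [
    (("sl", "ɪ", "ŋ"), ("sl", "ɪ", "ŋ")),
    (("b", "e", "d"), ("b", "e", "t")),
    (("k", "əʊ", "t"), ("k", "ɔː", "t")),
    (("p", "ɑː", "m"), ("b", "ɑː", "n")),
]
result = summarize(pairs)
print(result)
```

Under these assumptions the whole-word score can never exceed the lowest component score, which is in line with the pattern in Figure 3.4.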
3.4.8 Discussion
The Sudanese listeners performed poorly on simple and predictable English
sentences, reaching only about 30% correct word recognition. However, they performed
better on single and cluster consonants than on vowels, where their scores were
especially poor. These observations
provide empirical evidence that words and vowels are the most problematic aspects for
the listeners. It is possible to predict that vowel perception would be more of a
challenge for Sudanese-Arabic listeners of English than single and cluster consonants.
This prediction is based on the observation that the learners’ L1 (Arabic) has only five
or six vowels, which makes it difficult for such learners to correctly classify the vowels
in the much richer system of any variety of English (Cruttenden 2008). Moreover,
observations bear out the prediction that the large number of consonant sounds
existing in the listener’s L1 facilitated the perception task (positive transfer); i.e. listeners
are at least more familiar with consonants than vowels.
Table 3.6 presents a correlation matrix for the test results obtained separately for
vowels, single consonants, cluster consonants and SPIN sentences of ten Sudanese
listeners. Within the category of consonants and consonant clusters separate test
components are distinguished for target sounds in onset position, coda position and
averaged over both positions. The correlation coefficient computed is Pearson’s
product moment correlation r, which expresses the strength of the linear relationship
between two sets of scores. The value of r ranges between –1 and +1. If r is positive,
then higher scores on one variable (e.g. test score X) tend to go together with higher
scores on the second variable (test score Y). If r is negative, the relationship between
the two sets of scores is reversed, i.e., higher scores on test X go together with lower
scores on test Y (and vice versa). The absolute size of r expresses the strength of the
relationship. If r = 0 there is no relationship between the two variables at all; when
|r| = 1 the relationship is perfect, so that the score on Y can be predicted with certainty
from X (and vice versa). The best way to interpret intermediate r-coefficients is to
square the value of r. So, if test scores X and Y are correlated at r = .7, then test score
X can be predicted from test score Y with an accuracy of 49 percent (r² = .49) on a
scale between 0 and 100, i.e., between zero correlation and perfect correlation (see e.g.
Woods, Fletcher & Hughes 1986 for more discussion on how to interpret correlation
coefficients).
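As an illustration of the computation just described, the following Python sketch computes Pearson's r from its definition and squares it. The listener scores are hypothetical and serve only to show the mechanics; they are not data from this study, and the function name is an assumption.

```python
import math


def pearson_r(xs, ys):
    """Pearson's product moment correlation between two equally long score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable has no variance")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical percent-correct scores of ten listeners on two test components.
test_x = [55, 60, 48, 70, 65, 52, 58, 62, 45, 68]
test_y = [30, 35, 28, 40, 33, 27, 36, 31, 25, 38]

r = pearson_r(test_x, test_y)
print(f"r = {r:.3f}, r squared = {r * r:.3f}")
```

The squared value is the quantity referred to above when r = .7 is said to correspond to 49 percent.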
Table 3.6 Correlation coefficients of scores on vowels, single consonants (in onsets, codas and
both), cluster consonants (in onsets, codas and both) and SPIN sentences. R-values indicate the
linear relation between the listeners’ test scores for any pair of test components.
Items            Single Cons          Vow    Clusters             SPIN sentences
                 ons    coda   both          ons    coda   both   ons    vow    coda   word
coda             .591
both             .782   .965
vowels          –.682   .169   .353
onset clusters   .327   .208   .267   .312
coda clusters    .391   .057   .172   .164   .020
both clusters    .505   .152   .282   .297   .470   .873
ons_SPIN         .135   .057   .000   .507   .308   .073   .215
vow_SPIN         .209   .288   .154   .435   .343   .227   .033   .710
cod_SPIN         .288   .533   .505   .093   .330   .234   .367   .639   .597
word_SPIN        .194   .584   .514   .214   .381   .070   .124   .567   .700   .899
comp_SPIN        .000   .327   .253   .386   .370   .064   .237   .908   .845   .866   .822
Bolded |r| > .6: Correlation is significant at the 0.05 level (2-tailed).
Bolded |r| > .7: Correlation is significant at the 0.01 level (2-tailed).
Normally, one would expect listeners who are good at identifying one type of sound,
for instance vowels, to be also good at recognizing other sounds, such as consonants
and clusters. By extension of the same argument, listeners who are good at identifying
sounds (whether vowels or consonants) should also be good at recognizing words. I
would expect all correlation coefficients in Table 3.6 to be positive. Some positive
correlations, however, are fairly trivial and should not be considered. It is predictable,
for instance, that correct perception of either onset or coda consonants should
correlate strongly with the averaged score of onset and coda consonants, simply
because 50 percent of the average is determined by each of the component scores. I
will not discuss such part-whole relationships in the remainder of this section.
In spite of the above reasoning, a large number of negative correlations is observed in
the table. In all cases but one, these are insignificant and can therefore be considered
indicative of no correlation. The one remaining negative correlation, r = –.682 (p < .05),
is between vowels and onset consonants in MRT words, indicating that poorer
identification of vowels goes together with better results for onset consonants. This
result is unexpected and I find myself unable to provide even an ad-hoc explanation for
it. All other significant correlations are positive and are therefore potentially instructive.
Correlations that are both positive and significant are found only between variables
measured in words used in the SPIN-test. Observe, first of all, that the recognition of onset
consonants in SPIN words correlates with recognizing the vowels in such words (r
= .710, p < .01). This is what I would expect but, of course, it goes against the
inexplicable negative correlation reported earlier between onset consonants and vowels
in the MRT words. I also find that the recognition of onset consonants and that of coda
consonants are correlated in the SPIN-words. Interestingly, the chances of recognizing the entire
SPIN-word are better if the listener identifies the coda correctly (r = .899, p < .01) than
when he correctly identifies the vowel (r = .700, p < .01) or the onset consonant (r
= .567, not significant). This observation goes against the general claim that sounds
contribute less to word recognition as they occur later in the word (e.g. Marslen-Wilson
& Welsh 1978, Nooteboom 1981).
I should point out, finally, that it is strange to find no correlation between any of the
individual test components in the MRT word tests and the listeners’ performance on
the contexted SPIN word recognition test.
To sum up, the perception of the listeners in the SPIN materials is very poor at the
sentence level, but it provides feedback about which of the three types of English
phonemes is most problematic for Sudanese listeners. In this connection, the results of
the Sudanese listeners’ correct word identification in the SPIN-test are comparable to
those obtained for Mandarin Chinese listeners exposed to a similar SPIN test (Wang
2007). Similarity of performance between the two groups can be attributed to the fact
that both Chinese and Sudanese listeners speak English as a second/foreign language.
The listeners also come from linguistic backgrounds that are entirely unrelated to
English; Chinese is a Sino-Tibetan language, whilst Arabic is Semitic. In contrast,
the Dutch listeners in Wang (2007) had a high percentage of correctly recognized words,
due to more exposure to English than the non-Germanic groups. Furthermore, phonetically, the
Dutch L1 sounds are closer to the English targets than those of either Arabic or
Mandarin. Predictably, American listeners had the best performance on the SPIN test
simply because they are native speakers of English (Wang 2007).
3.4.9 General conclusions
Vowels proved to be a difficult area of perception for Sudanese listeners of English.
This is most likely because they are unfamiliar with a large number of different types of
vowel sounds present in the English language. Listeners found the perception of the
English diphthongs, central and back vowels the most problematic because such types
of vowels are absent from their L1.
Durational aspects do not show serious effects on the identification of English vowels
because there is some kind of correspondence between the listeners’ L1 (Arabic)
long/short vowel durations and those of the English tense-lax vowels. However, the
confusion within the tense-lax vowel pairs /ʊ, uː/ and less frequently /ɪ, iː/ indicates
interference of the subjects’ L1 and probably the lack of knowledge of English vowel
sounds.
With regard to the interdependency existing between the perception and production of
speech sounds, differences in the place and manner of articulation between the English
and Arabic phonetic systems require that the Sudanese listeners extend their L1
phoneme inventory towards that of the L2 to achieve a better performance in English speech.
The perception of the English single and cluster consonants is more difficult in the
coda than in the onset position. The listeners transfer their L1 phonotactic constraints
when listening to English consonant clusters. This mostly occurs with coda consonants
where the listeners fail to distinguish or implement certain phonetic features.
The conclusions drawn above provide cognitive insights that help in understanding the nature
and the causes of the speech perception problems that are experienced by Sudanese
listeners of English. Thus, they represent useful guidelines that can contribute to
addressing such problems in the learning and teaching of English in ESL/EFL contexts.
One important guideline is that pedagogical applications of speech perception research
should target the mastery of the basic principles of English phonology, phonetics and acoustic cues.
Many second/foreign language learners lacking such knowledge have difficulties
treating English speech issues, e.g., recognizing English vowels in different contexts, or
discriminating between quartets such as pit, pat, pot, put, etc. So, there is a need
sometimes for pupil involvement in group work for task-based learning, whereby some
pupils may have roles which require them to listen or speak quite a lot. Moreover, the
listeners’ L1 inventory has a negative effect on the process of the speech intelligibility.
This requires that it should be taken more seriously and more practically during the
learning/teaching tasks of English speech perception and production. The teachers, for
example, need to create an ‘English atmosphere’ in the classroom where more exposure
to native English speech is necessary to reduce the L1 effect.
Chapter Four
Intelligibility of Sudanese English for Dutch listeners
4.1 Introduction
Learning the speech of a second language can often be described as a process that depends on
phonological representations where a native source language L1 influences the target
language L2. A negative influence of an L1 with few vowel contrasts may interfere with
attempts on the part of ESL/EFL learners to distinguish between English minimal
pairs like bet/bait, cat/cart, din/den, sin/thin, half/halve, bed/bet, wit/wet, worse/worth, pea/bee,
peer/pair etc. In this task, learners exert an effort in producing the intended speech
sound correctly, although most of them fail. One reason why the learners face
problems such as these is the discrepancy that exists between the perceptual
representations of phonemes in L1 and L2. Previous studies revealed that Japanese EFL
learners have perception and production problems with the English /r~l/ contrast in
words like lot vs. rot (Lee 1969). In a more recent related study, Arabic learners of
English were shown to have difficulty distinguishing /ð, z, θ, s/ because English
fricatives are softer than their Arabic counterparts (Koeczynski and Mellani 1993).
Linguists are very much concerned with measuring these types of errors, which are
manifest in the performance of the second or foreign language learners. A test that
addresses speech production issues such as these (in words like lake vs. rake) and
accuracy of L2 sounds measures segmental intelligibility. When L2 speech sounds are
recognized correctly by native speakers this constitutes evidence that the L2 production
distinguishes the required categories. However, failure is also useful evidence, which
provides insight that helps to predict the nature and the causes of intelligibility
problems (Flege 1976). This study measures the segmental intelligibility of the speech
sounds produced by Sudanese university EFL learners (native speakers of English are
included in this study but as a control group only). It attempts to account for the extent
to which linguistic elements can impede the intelligibility of these speaker groups when
Dutch listeners of English assess them. The involvement of Dutch listeners of English
as a judgment group was intended to provide additional feedback on the quality of
Sudanese-Arabic accented English in an international context. The mere dichotomy of
native/non-native speaker has proven to be of limited value (Atechi 2006, Smith 1992).
By including non-native Dutch listeners of English, native RP speakers as control
groups and Sudanese EFL learners as the test group, the study is expected to provide
more evidence of intelligibility problems under investigation.
Arguably, Sudanese EFL learners typically make a wide variety of production errors in
their vowels, consonants and clusters of English. Substitutions of English vowels are
observed in words such as pot, put, pat, coat, palm, warm, flute, etc. It is assumed that these
types of errors occur because the speakers are not familiar with a large number of
vowels such as those of English. Similar errors are also observed in the performance of
these subjects producing English consonant clusters. For example, a vowel sound is
usually inserted before (prothesis) or between (anaptyxis) the members of English
clusters in words such as flow, sprint, special, and so on. Doing this, speakers attempt to
achieve perceptible pronunciation even though consonant clusters are absent from the
Arabic sound system. Differences of phonological representations between English and
the Sudanese learners’ L1 (Arabic) make the issue concerned more difficult.
This study reports the intelligibility of English speech sounds produced by Sudanese
EFL learners as opposed to those of native English speakers, assessed auditorily by
Dutch listeners of English.
4.2 Objective
The objective of the study is to find experimental evidence for the causes of speech
intelligibility problems experienced by Sudanese university speakers of English based
on the assessments of Dutch listeners of English. The data obtained can also help
understand and draw cognitive insights into the nature and causes of pronunciation
problems the learners face.
4.3 Participants
The participants involved in these experiments came from different linguistic
backgrounds. They include Sudanese university learners of English, British speakers of
English and Dutch students preparing for bachelor's or master's degrees in various fields.
In the following sections, I will provide more background information on the various
speaker or listener groups.
4.3.1 Sudanese speakers (university EFL learners)
These are ten Sudanese University students of English at Gadarif University in Sudan.
The subjects involved in these experiments specialize in English language teaching
(Teaching English as a Foreign Language, TEFL) and have already spent six semesters
of study. During the period of study, which extends for four years, the students attend
three courses in the field of pronunciation: (i) an introduction to phonetics, (ii)
phonology and (iii) practical phonetics, delivered in three consecutive semesters, besides
two classes in English listening skills that usually take place in semesters one and three.
The Arabic language is the mother-tongue for all the students, whilst English is treated
as a foreign language (not a second language) the learning of which starts at the basic
level in the fifth year and continues at secondary schools for three years. The English
lessons obtained at such stages vary between 5 and 6 hours per week. At primary and
secondary school the basic principles of English are taught in a traditional way.
One of the ten university students was involved in the perception tests as a speaker of
Sudanese accented English. This speaker was asked to read out a list of English
stimulus items which include vowels, single and cluster consonants, besides SPIN
sentences. The Sudanese model speaker was selected by means of a sound quality test
from among 11 Sudanese speakers of English. The sound quality test was
administered online and candidates of different nationalities were invited to listen to the
test and provide scores to each speaker by clicking on one of the grade options
provided. Assessment of the speakers’ sound quality depended on the computation of
the total mean of the results of each speaker. Finally, the speaker with the individual
mean closest to the grand mean was chosen as the representative subject.
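The selection rule described above (choose the speaker whose individual mean rating lies closest to the grand mean) can be sketched as follows. This is a minimal illustration under stated assumptions: the ratings are hypothetical, the function name is invented for the example, and the grand mean is taken here as the mean of the speakers' individual means, which may differ from the computation actually used.

```python
def select_model_speaker(ratings):
    """Return (speaker, grand_mean) for the speaker nearest the grand mean.

    `ratings` maps a speaker label to the list of scores given by the raters.
    """
    means = {speaker: sum(scores) / len(scores) for speaker, scores in ratings.items()}
    # Assumption: the grand mean is the mean of the individual speaker means.
    grand_mean = sum(means.values()) / len(means)
    chosen = min(means, key=lambda speaker: abs(means[speaker] - grand_mean))
    return chosen, grand_mean


# Hypothetical quality ratings for three candidate speakers.
ratings = {
    "S1": [2, 3, 4],
    "S2": [5, 5, 5],
    "S3": [7, 8, 9],
}
speaker, grand_mean = select_model_speaker(ratings)
print(speaker, round(grand_mean, 2))
```

Choosing the speaker nearest the grand mean, rather than the best-rated one, yields a speaker who is representative of the group instead of an outlier.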
4.3.2 Native speakers of English
The participant herein is the native speaker of English (RP accent) who was involved
earlier in the perception tests as a model speaker of English, as described in chapter
three. As explained in chapter three, this speaker was asked to read out stimulus items
which included vowels, single and cluster consonants of English, as well as SPIN
sentences.
4.3.3 Dutch listeners of English
Participants here included ten Dutch students who were preparing for bachelor and
master degrees in various fields of study at Leiden University. These participants took
part in the perception tests as listeners only (see § 4.3.3.2).
4.3.3.1 Learning problems of English speech sounds
Although English and Dutch are closely related languages, Dutch
listeners of English face a variety of learning problems with English vowels and
consonants.
Both English and Dutch have a large number of vowels. Moreover, both English and
Dutch vowels fall into three categories (i) checked vowels, (ii) free steady-state vowels
and (iii) diphthongs. However, there are also differences between the two vowel
systems and the associated phonotactic possibilities. For example, the Dutch vowel
inventory includes a set of combinations of free vowel+glide sequences that does not
exist in English.
As a case in point, Dutch listeners confuse English /æ/ and /e/ due to the
circumstance that Dutch has only /ɛ/ in this part of the vowel space, which is
positioned between the two English vowels /æ/ and /e/. 14 The major cause of these
perception errors is the influence of L1 vowel inventory (Cutler et al. 2005, Flege 1992).
The inability of Dutch listeners to distinguish between the English vowels /æ/ and /e/
in minimal pairs such as cattle ~ kettle is described as an impact of pseudo-homophones
that collapses minimal pairs such as the previously mentioned ones. Robust priming
effects were found for Dutch ESL listeners who respond faster to cattle after first
hearing kettle (and vice versa) but not for native English listeners (Cutler et al. 2005).
There are other vowels that may cause learning problems for Dutch learners of English.
According to Flege (1992) English /i/ and /u/ are classified as similar to Dutch /i/
and /u/. He adds that Dutch /i/ is lower than its English counterpart, yet this need not
cause serious learning problems. However, English /u/ appears to cause a learning
problem because some learners substitute it for Dutch /ʊ/ (see also Collins and Mees
1999, Wang and Van Heuven 2007). Additionally, the English central vowel /ʌ/ is
classified as a new vowel to Dutch learners of English since there is no phonetic
representation for it in the Dutch vowel inventory. So, it is expected to cause learning
problems for Dutch ESL learners. However, /ʌ/ has also been classified as a
similar phoneme that presents no learning problem, for two reasons. Firstly, the vowel
exists in the Dutch inventory but it goes unexploited by Dutch. Secondly, the
Dutch vowel /ɑ/ has acoustic values similar to those of American and British /ʌ/.
These are reasons why Dutch learners of English exert little effort identifying English
/ʌ/.
Dutch and English consonants are similar in most respects; however, there are also
differences. Some English fricatives pose perception and articulation problems for
Dutch learners. There is a problem with the articulation of /ð/ and /θ/, in that
members of pairs such as /θ~s/ and /ð~d/ are not clearly distinguished. Most Dutch
speakers of English have learnt some English in primary and secondary school so they
already know the /θ~s/ contrast. However, previous studies show that these speakers
have difficulty in distinguishing between the English fricatives /θ~s/ (Collins and Mees
1999). This is probably because the dental fricative /θ/ is absent from the Dutch
consonant inventory. Yet another learning problem with English consonants is the
substitution of /v/ for English /w/. This error is most likely due to an orthographical
effect: the sound written as w in Dutch would be a good approximation of English /v/
but it is not used in this way (Collins and Mees 1981).
On the other hand, the perception of English fricatives seems to be less problematic.
Heeren and Schouten (2008) reported that the identification and discrimination of
British-English /θ~s/ by Dutch listeners improved after training, which is consistent
with results from earlier training studies. That is, results show that trained listeners
performed better in the post-test than in the pre-test and in several respects they also
did better than the untrained control group. The improvement in their performance
excluded acquired similarity, but acquired distinctiveness was not found exclusively at
the phoneme boundary. Furthermore, control listeners, who received no training, also
improved by simply performing the tests twice in pre-test and post-test due to
experience of the control group in the design of a phoneme training study. Moreover,
Iverson et al. (2008) state that Dutch speakers use the phonetic categories of their L1
in perceiving and producing English /v/ and /w/. This means the learners incorporate
English /w/ with their L1 Dutch /ʋ/ category and English /v/ with their L1 Dutch
/v/ category. This learning strategy suggests the learning problems of these phonemes
can be attributed to perceptual interference. Furthermore, Iverson et al. (2008)
conclude that Dutch speakers are consistently accurate in identifying and producing
English /v/ and /w/ because of their experience with English and because of their
eagerness to learn new languages.
14 Arguably, Dutch students with Southern (Limburgian) accents would have difficulty learning
English /e/ due to the fact that their L1 has [æ] rather than [ɛ] as the realisation of the lax low front
vowel (Smakman, personal communication).
4.3.3.2 Motivation to test Dutch listeners of English
Intelligibility was evaluated auditorily by Dutch listeners of English due to several
considerations:
Dutch listeners and Sudanese learners of English were matched at important points
such as age and education level. Both groups of subjects were university students
preparing for a bachelor's degree, and were in a similar age bracket (around 19-25 years
old). These characteristics have an important influence on second language learning.
Other conditions related to language proficiency such as phonetic distinctions,
training of L2 speech and everyday exposure to English could affect intelligibility
(see Kluge et al. 2007, Scott 1999). Dutch listeners enjoyed a good command of
English both in read speech and spontaneous speech, a feature which enables non-native
listeners to make relatively effective judgments and fewer understanding errors.
Dutch listeners can be assumed to be unfamiliar with Sudanese-Arabic-accented
English. Thus, they can be labelled as naive listeners (Best and Tyler 2007), a
characteristic which is considered an effective determinant of speech intelligibility.
The involvement of Dutch listeners as non-native speakers of English in the
intelligibility assessment along with native listeners of English (the same test will be
done by native British and American listeners of English in a later chapter) was
intended to determine whether English with an unknown accent is a greater
handicap for non-native than for native listeners, even if the non-native listeners’ L1
is rather similar to the target language.
The participants were also selected on the basis of their language background, such
that all of them speak Dutch as their mother-tongue. There are close similarities
between the linguistic systems of Dutch and English, so that Dutch listeners should
be able to understand English better than most other non-native listeners. Inclusion
of Dutch listeners will allow testing whether there is a difference in intelligibility of
Sudanese-accented English between two groups of non-native listeners: (i) Dutch
listeners, who do not share the L1 with the speakers, and (ii) Sudanese listeners,
who share the speakers’ L1.
4.4 Intelligibility tests used

that depends on interaction in an appropriate context involving the apprehension of the
message between the listener and the speaker. It is also possible to refer to speech
speakers of English because the final goal of such speech is understandability. Since
listeners of this study are expected to have an incorrect conception of English speech
sounds, this study will focus on examining vowels, consonants and consonant clusters.
This is because they form the basic sound knowledge of the English language and because
the assessment of whether speech is intelligible or not is mainly attributed to segmental
factors; more than 50% of speech intelligibility is accounted for on the basis of speech
sounds (Pascoe 2005). Moreover, pronunciation includes all a learner needs to do to be
intelligible (Fraser 2005).
to yield a highly accurate and reliable measure of intelligibility (Logan, Greene and
Pisoni 1989). Speech intelligibility measures involve word identification tasks in a closed
set of four alternatives from which the listeners are asked to select the one they think
the speaker intended. The score is the number of correctly responded-to items. Test
items normally target phonemes, multi-phonemes or words. Phonemes refer to vowels
and single consonants, whilst multi-phonemes refer to consonant clusters. The formal
assessment of phonemes and multi-phonemes scores the responses as either intelligible
or unintelligible; put in figures, a score of (close to) 100% is interpreted as completely
intelligible performance (Lafon 1966). Word intelligibility, on the other hand, was
determined by the recognition of final words embedded in short redundant SPIN
sentences. SPIN is an acronym for the ‘Speech Perception in Noise’ test (Kalikow,
Stevens and Elliott 1977, Wang and van Heuven 2003, Wang 2007). The test asks
listeners to recognise 25 short meaningful and highly predictable everyday sentences
and write down only the final word embedded in each sentence, as in She wore her broken
arm in a sling (target word underlined). This part of the SPIN test proved to be efficient
at assessing speech recognition abilities (Rhebergen and Versfeld 2005). Although the
listeners’ performance is primarily quantified in terms of number of whole words
correctly recognized, partially correct answers are also important since they give
information about the perception of phonemes in onset, nucleus and coda position.
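The component scoring described above can be illustrated with a minimal Python sketch. It assumes that each target and response has already been transcribed and split by hand into onset, nucleus and coda; the class `Syllable`, the function names and the four responses to the target sling are hypothetical and are not taken from the test data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Syllable:
    """A monosyllabic word split into its three components (assumed given)."""
    onset: str
    nucleus: str
    coda: str


def score_spin_item(target: Syllable, response: Syllable) -> dict:
    """Score one response: each component separately, plus the whole word."""
    parts = {
        "onset": target.onset == response.onset,
        "nucleus": target.nucleus == response.nucleus,
        "coda": target.coda == response.coda,
    }
    # The whole word counts as correct only if all three components match.
    parts["word"] = all(parts.values())
    return parts


def percent_correct(scores: list, key: str) -> float:
    """Percentage of items for which the given component was correct."""
    return 100.0 * sum(s[key] for s in scores) / len(scores)


# Hypothetical responses to the target word "sling".
target = Syllable("sl", "ɪ", "ŋ")
responses = [
    Syllable("sl", "ɪ", "ŋ"),  # fully correct
    Syllable("sw", "ɪ", "ŋ"),  # onset error only
    Syllable("sl", "ɪ", "m"),  # coda error only
    Syllable("s", "ɪ", "ŋ"),   # onset error only
]
scores = [score_spin_item(target, r) for r in responses]

for key in ("word", "onset", "nucleus", "coda"):
    print(key, percent_correct(scores, key))
```

In this invented example only one of the four whole-word responses is correct, yet the nucleus is recovered every time, which is the kind of information the partially correct answers are meant to provide.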
4.5 Test battery
4.5.1 Material and overall structure
The experimental stimuli include four tests. These are (i) a vowel test, which is
words (where C* stands for one to three consonants). Word stimuli in the first three
tests were embedded in a fixed carrier sentence Say…again, which ensured a fixed
intonation with a rise-fall accent on the target word. The vowel and the single
consonant tests contained items on each individual vowel or consonant phoneme in the
RP inventory. Moreover, the consonant test targeted all the consonants in onset
position and in coda position. For the cluster test, the number of test items had to be
limited as the total inventory of onset and coda clusters is very large; including all the
clusters would have been too demanding on the subjects. Nine onset and eight coda
clusters were selected that represent problems to Sudanese-Arabic learners of English
(Kaye 1997, Patil 2006). All items in the tests were chosen such that they occurred in
dense lexical neighbourhoods, i.e. there should be many words in English that differ
from the test item only in the target sounds. For instance, the vowel /ɪ/ was tested in
the word pit, since the /p_t/ consonant frame can be filled in by many other vowels, as
in peat, pet, pat, pot, part, port, put, putt and pout. These so-called lexical neighbours,
differing from the target word in only the identity of the test sound, make up the pool
of possible distracters (alternatives) in the construction of the MRT test. When
selecting the three distracters needed for each test item, I preferably selected lexical
neighbours that differ from the target in only one distinctive feature. For the target pit,
alternatives with vowels that differed from /ɪ/ in just one vowel feature were selected,
i.e. pet (differing in height), put (differing in backness) and pot. The latter alternative
differs from the target in both height and backness; this is preferred to the one-feature
difference in peat (or Pete) as it was decided to exclude proper names and low-frequency
alternatives as much as possible. The full set of test items is included in Appendices 4.1-
4.4.
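The feature-counting step behind the choice of distracters can be sketched in Python as follows. The feature values in `FEATURES` are a simplified assumption made for this illustration (three features, coarse values), and the function names are hypothetical. The sketch only counts feature differences; it does not model the additional preference for frequent words and against proper names described above.

```python
# Simplified, assumed feature specifications for the vowels of five
# words sharing the /p_t/ frame.
FEATURES = {
    "pit":  {"height": "high", "backness": "front", "length": "short"},
    "pet":  {"height": "mid",  "backness": "front", "length": "short"},
    "put":  {"height": "high", "backness": "back",  "length": "short"},
    "pot":  {"height": "low",  "backness": "back",  "length": "short"},
    "peat": {"height": "high", "backness": "front", "length": "long"},
}


def differing_features(word_a, word_b):
    """Names of the features on which the vowels of two words differ."""
    a, b = FEATURES[word_a], FEATURES[word_b]
    return sorted(name for name in a if a[name] != b[name])


def rank_neighbours(target):
    """Lexical neighbours ordered by number of differing features."""
    others = [w for w in FEATURES if w != target]
    return sorted(others, key=lambda w: (len(differing_features(target, w)), w))


for word in rank_neighbours("pit"):
    print(word, differing_features("pit", word))
```

Under these assumed features, pet and put each differ from pit in one feature and pot in two, in line with the description in the text; peat also differs in one feature but was set aside in the actual test for the reasons given above.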
4.5.2 Recordings
The stimulus sentences were typed on paper sheets (one sheet for each test), and then
read by a male Sudanese EFL learner and a native speaker of RP English. Recordings took
place in a sound-treated room. Each speaker’s voice was digitally recorded (44.1 kHz, 16
bits) through a high-quality swan-neck Sennheiser HSP4 microphone. The speakers
were instructed to inhale before uttering the next sentence. The target words were
excerpted from their spoken context using the high-resolution digital waveform editor
contained in the Praat speech processing software (Boersma and Weenink 1996). Target
words were cut at zero-crossings to avoid clicks at onset and offset. Target words and
SPIN sentences were then recorded onto Audio CD in seven tracks. The first track
contained two practice trials for the vowel test and was followed by track 2, which
contained the 19 test vowel items. Tracks 3 and 4 contained the practice and test trials
for the single consonant tests and tracks 5 and 6 contained the cluster items. Track 7
comprised the 25 SPIN sentences with no practice items. In the single consonant and
cluster tests, trials targeting onsets preceded the items targeting codas. Other than that,
the order of the trials within each part of the test battery was random. Trials were
separated by a 5-second silent interval. After every tenth trial, a short beep was
recorded, to help the listeners keep track on their answer sheets.
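Cutting at zero-crossings can be sketched as follows. This is not Praat's own routine but a minimal illustration of the idea, assuming the waveform is available as a plain list of sample values; the function name and the sample values are hypothetical.

```python
def nearest_zero_crossing(samples, index):
    """Return the sample index closest to `index` at which the waveform
    is zero or changes sign relative to the previous sample.

    If the signal never crosses zero, `index` is returned unchanged.
    """
    def is_crossing(i):
        return samples[i] == 0 or (i > 0 and samples[i - 1] * samples[i] < 0)

    # Search outwards from the requested position, earlier index first.
    for offset in range(len(samples)):
        for i in (index - offset, index + offset):
            if 0 <= i < len(samples) and is_crossing(i):
                return i
    return index


# Hypothetical waveform fragment: sign changes occur at indices 3 and 6.
wave = [3, 5, 2, -1, -4, -2, 1, 4]
start = nearest_zero_crossing(wave, 1)
end = nearest_zero_crossing(wave, 5)
print(start, end, wave[start:end])
```

Moving both cut points to a sign change means the excerpt begins and ends near zero amplitude, which is what prevents an audible click at the edges.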
4.5.3 Perception test procedure
listeners. The listeners were given standardized written instructions and received a set
of answer sheets that listed four alternatives for each test item. They were instructed for
In the final test (SPIN), listeners were instructed to write down only the last word of
I will now present the results of the test battery in four sections, one for each test. Each
section will first outline the structural differences between the sounds in the source
language, Sudanese Arabic (SA), and in the target language, RP English. Such com-
parisons may help understand why certain English sounds are difficult for Sudanese
learners and others are not.
4.6 Overall results
4.6.1 Vowels
4.6.1.1 Results
Results in Figure 4.1 include the means of the perception test of English vowels
responded to by ten Dutch listeners. Dutch listeners had low scores in perceiving /3,
#7, G/ produced by Sudanese speakers of English, whilst they totally misidentified the
short vowels /G, n/ and the diphthong /G+/. However, they made few errors in
recognizing the vowels /+Ö, «Ö, 7, WÖ, 7/ and even fewer errors were made in the
perception of /#Ö/ and /C+/. These types of perception errors, which cover all short and
long vowels as well as diphthongs, indicate that English vowels spoken by Sudanese
university students of English are less intelligible to Dutch listeners. Several factors may
cause these perception problems, which will be discussed later. It is noteworthy that
Dutch listeners identify /+, nÖ, n+, +/ with no errors.
On the other hand, means in Figure 4.1 show that Dutch listeners have a higher
identification rate of the English vowels spoken by native speakers of English than that
of Sudanese EFL learners; overall perception rate is 88% against 50% when the
listeners were exposed to Sudanese speakers. More specifically, Figure 4.1 shows that
English vowels /¡, «Ö, n, +Ö, #Ö, G+, C+, +, nÖ, n+/ were perfectly perceived and few errors
were made in the recognition of /7, WÖ, 7, G, #7/, whilst the front short vowel /G/ was
hardly recognized. Interestingly, the correct scores of the listeners at issue are strikingly
different; their perception is low with Sudanese speakers and high with the native
speakers of English. The error patterns of the listeners with the two speaker groups
present interesting parallel cases.
Figure 4.1 Mean percent correct identification of English vowels by ten Dutch listeners. The
vowels were spoken by a Sudanese (‘non-native’) and a native RP speaker of English.
Tables 4.1 and 4.2 are confusion matrices. They provide a numerical account of the
correct scores and the confusions made by Dutch listeners when they heard English
vowels spoken by Sudanese EFL learners and the native English speakers, respectively.
The tables show the correct scores along the diagonal in the tables with the problematic
vowels in the off-diagonal cells. Table 4.1 includes the perception data of vowels
spoken by Sudanese speakers, whilst Table 4.2 includes the data of vowels spoken by
native speakers of English. Missing responses occurred with three stimulus vowels /G+,
¡,G/ (20%, 10%, 10%, respectively). These have been omitted from Table 4.1 to make
it easy to read. In Table 4.1 the vowels /+Ö, 7, WÖ, G, «Ö, ', 3, 7/ form the most
problematic areas, whilst in Table 4.2 /7, WÖ, G, G, 3/ were highly confused vowels. The
tables also show that listeners misidentify the vowel /#7/ as /7/ with both speakers (in
70% of the responses to the Sudanese token and in just 10% of those to the native RP token).
Table 4.1. Confusion matrix of 18 English vowels and diphthongs spoken by a Sudanese EFL
speaker (in the rows) and perceived by ten Dutch listeners (in the columns). Correct responses
are on the main diagonal, indicated in bold face. (Confusions ≥ 3 are indicated in grey-shaded
cells). The vowel /7/ should have been presented but was not. Three responses are missing (see
text).
Perceived RP vowels
Target
¡ «Ö #Ö 3 #7 C+ G G G+ + KÖ + n nÖ n+ 7 WÖ 7 7
¡ 9
«Ö 5 4 1
#Ö 9 1
3 4 3 1 2
#7 3 7
C+ 7 2 1
G 0 7 3
G 6 3
G+ 5 0 3
+ 10
KÖ 4 6
+ 10
n 2 0 8
nÖ 10
n+ 10
7 4 6
WÖ 5 5
7 4 6
Table 4.2 Confusion matrix of 18 English vowels and diphthongs spoken by a native speaker of
RP English (in the rows) and perceived by ten Dutch listeners (in the columns). Correct
responses are on the main diagonal, indicated in bold face. (Confusions ≥ 3 are indicated in grey-shaded
cells). The vowel /7/ should have been presented but was not.
Perceived RP vowels
Target
¡ «Ö #Ö 3 #7 C+ G G G+ + +Ö + n nÖ n+ 7 WÖ 7 7
¡ 10
«Ö 10
#Ö 10
3 10
#7 9 1
C+ 10
G 3 7
G 2 8
G+ 10
+ 10
+Ö 10
+ 10
n 10
nÖ 10
n+ 10
7 7 3
WÖ 3 7
7 3 7
4.6.1.2 Discussion and conclusions
More perception errors of English vowels were made by Dutch listeners when they
heard Sudanese EFL learners. There were interchangeable substitutions of the English
vowels /7~WÖ, +~+Ö, 3~G/, which may be attributed to the influence of the listeners’ L1
vowel inventory. Collins and Mees (1981) confirmed that the English tense and lax
pairs /WÖ~7/ and /3~G/ are the most difficult vowel sounds for Dutch listeners/
speakers to produce/emulate. Confusions of these English vowel pairs frequently occur
because there are no similar vowel sounds in their L1. Interestingly, Wang (2007)
reported similar results where Dutch listeners repeatedly confuse /+~+Ö, 7~WÖ, 3~G/
when they listen to Chinese speakers of English due to the lack of a clear category
boundary between /3/ and /G/ and because of the differences that exist between the
speakers’ L1 and L2. Interestingly, the Dutch listeners repeated similar perception
errors: /7~WÖ, 3~«, G~«Ö,#7~7/ when responding to the native RP speaker, although,
obviously, the number of perceptual confusions in the latter case was much smaller. It
would appear, therefore, that the difference in intelligibility of the native RP and
Sudanese-Arabic speakers for Dutch listeners is the joint product of an incorrect
representation of the English vowel system both on the part of the Sudanese speakers
and of the Dutch listeners.
The tense vs. lax perception errors such as /+Ö~+, WÖ~7/ are probably caused by the
duration difference between English and Arabic. However, acoustically this claim
seems to be less probable. This is because the long/short vowels of the Sudanese
speakers’ L1 (Arabic) show correspondence to the English tense-lax vowels. Therefore,
it is possible to classify such types of errors as by-products of the incorrect English
vowels produced by Sudanese speakers, which probably resulted from the wrong
realization or implementation of the English vowels. The wrong realization of English
vowels can be attributed to interference of the Sudanese speakers’ L1 (Munro 1993,
Munro, Derwing and Morton 2006). In a related study, Bobda (2000) found that
Sudanese speakers render English vowels /«Ö/ to /« or G/ and /G+/ to /G/ due to their
L1 linguistic background. Actually, the incorrect production of the central and back
English vowels represents frequent types of errors among Arabic speaking groups,
which probably occurs due to the total absence of these types of vowels from Arabic
vowel inventory (see Brett 2004). These findings indicate that Sudanese speakers of
English have difficulty learning central and back English vowels.
Furthermore, in terms of acoustics, learning problems of English vowel pairs or triplets

/+Ö, +/, /WÖ~7/, /3, ¡, G/ and /nÖ, b, #Ö/ have been found to be the result of the closeness
of such vowels in the vowel area. This means the closer the vowels within the vowel
space are to one another, the more vulnerable they are to confusion. These confusions
have been observed to take place only among EFL learners who descend from language
backgrounds with a small number of vowels (Cruttenden 2008). This diagnosis seems to
fit the Sudanese case, whose L1 contains a small number of vowels. It is also likely that
the misperceptions of English vowels spoken by Sudanese speakers in this study are due
to the unfamiliarity of Dutch listeners with Sudanese-accented Arabic English, since the
lack of close familiarity with the speakers’ habits affects the intelligibility process (Ball and
Rahilly 1999).
However, comparatively, the findings reveal an advantage for the native speakers of
English who are clearly more intelligible to Dutch listeners than the Sudanese speakers.
The close relationship between the English and Dutch vowel inventories, which
correspond to some extent in terms of the number of vowels and their phonetic features, may partly
explain the difference (Wang and Van Heuven 2004). In addition to this linguistic advantage,
however, Dutch listeners have had more exposure to English speech in everyday
life, which represents a kind of systematic practice of English. This circumstance
enables the listeners to overcome many of the learning difficulties that might be
experienced by non-native speakers of English lacking exposure to English.
4.6.2 Consonants
4.6.2.1 Results
Figure 4.2 presents the correct identification of English consonants in a perception test
done by ten Dutch listeners.
Figure 4.2 Correctly identified English consonants in a perception test done by ten Dutch
listeners. The results are shown separately for consonants produced by the Sudanese-accented
and the native RP speaker.
Generally, listeners’ performance on the consonants is better than on the vowels; the
mean vowel intelligibility of the Sudanese EFL tokens and that of the native speaker of
English is 50 and 88%, respectively. For the consonants, correct identification scores
are 78% and 81% for consonants spoken by Sudanese and 100% and 99% for
consonants spoken by native speakers of English, in onset and coda positions,
respectively. Dutch listeners, therefore, made more perception errors when they
responded to English consonants spoken by Sudanese speakers. In onset position,
frequent substitution errors were obtained in consonant pairs /d~t/, /ð~z/, /θ~s/,
/g~k/ and /n~l/. Fewer errors were made in the perception of /d, b, f, v, w, tʃ, n/,
where /d/ was misperceived as /p/, /b/ as /v/ or /f/ interchangeably, /f/ as /v/ or
/w/, /p/ was misperceived as /tʃ/ and /n/ as /l/. However, the listeners performed
better on coda consonants. The most frequent error patterns for codas are the
substitution of the obstruent pairs /ð~z/, /k~g/, /θ~s/, /t~d/, /s~z/ and /ŋ~n/.
Although the error rates are low, they are systematic and revealing: listeners often
repeated the same types of perception errors in both onset and coda positions,
particularly with Sudanese speakers. On the other hand, Figure 4.2 shows that Dutch
listeners had nearly perfect perception of the English consonants spoken by native
speakers, particularly for onset consonants. The only errors occurred with /ð/, which
was replaced by /d/ in 10 percent of the responses. However, the listeners made more errors in
coda consonants. The nasal /n/ was misidentified as /ŋ/ and /m/, /ð/ was replaced by
/t/, /k/ by /g/ and less frequently /z/ was replaced by /s/. These results indicate that
Dutch listeners found the native speakers of English more intelligible than Sudanese
speakers.
Tables 4.3-4.6 present a numerical account of the confusion structure in the perception
of the English consonants. Tables 4.3 and 4.4 show the identification by ten Dutch
listeners of the English consonants read by Sudanese speakers in onset and coda
positions, respectively. Tables 4.5 and 4.6 display the responses of the same listeners in
the same perception test but with the items spoken by the native speakers of English.
The correct identification appears along the diagonal running across the table, while the
incorrect scores are in the off-diagonal cells. With the native speaker, listeners
made more perception errors with coda consonants than with onset consonants. With the
Sudanese speaker the pattern was reversed: Dutch listeners found the English onset
consonants (78% correct) more difficult than the coda consonants (81% correct).
Table 4.3 Confusion matrix of English onset consonants spoken by a Sudanese EFL speaker (in
the rows) and perceived by ten Dutch listeners (in the columns). Correct responses are on the
main diagonal, indicated in bold face. Confusions appear in off-diagonal cells (confusions ≥ 3 are
indicated in shaded cells).
Target

b d ð dʒ f g h k l m n p r s ʃ t θ tʃ v w z
b 9 1
d 0 1 9
ð 3 7
dʒ 10
f 1 9
g 5 4 1
h 10
k 10
l 10
m 10
n 4 6
p 8 2
r 10
s 10
ʃ 10
t 10
θ 7 3
tʃ 10
v 2 1 5 2
w 10
z 10
Table 4.4 Confusion matrix of English coda consonants spoken by Sudanese EFL learners (in the
rows) and perceived by ten Dutch listeners (in the columns). Correct responses are on the main
diagonal, indicated in bold face. Confusions appear in off-diagonal cells (confusions ≥ 3 are
indicated in shaded cells).
Target
b d ð dʒ f g k l m n ŋ p s ʃ t θ tʃ v z
b 9 1
d 10
ð 5 5
dʒ 10
f 10
g 8 2
k 2 3 4 1
l 1 9
m 1 9
n 10
ŋ 1 9
p 2 8
s 10
ʃ 10
t 3 7
θ 6 3 1
tʃ 10
v 10
z 5 5
Table 4.5 Confusion matrix of English onset consonants spoken by a native speaker of RP
English (in the rows) and perceived by ten Dutch listeners (in the columns). Correct responses
are on the main diagonal, indicated in bold face. Confusions appear in off-diagonal cells.
Target
b d ð dʒ f g h k l m n p r s ʃ θ v w z
b 10
d 10
ð 1 9
dʒ 10
f 10
g 10
h 10
k 10
l 10
m 10
n 10
p 10
r 10
s 10
ʃ 10
θ 10
v 10
w 10
z 10
Table 4.6 Confusion matrix of English coda consonants spoken by a native speaker of RP
English (in the rows) and perceived by ten Dutch listeners (in the columns). Correct responses
are on the main diagonal, indicated in bold face. Confusions appear in off-diagonal cells
(confusions ≥ 3 are indicated in shaded cells).
Target
b d ð dʒ f g k l m n ŋ p s ʃ t θ tʃ v z
b 10
d 10
ð 8 2
dʒ 9 1
f 10
g 10
k 1 9
l 1 9
m 9 1
n 3 7
ŋ 10
p 1 9
s 10
ʃ 10
t 10
θ 8 2
tʃ 10
v 10
z 1 9
The replacement errors /t~d, k~g, f~v, n~l, z~s/ indicate similarity in the place of
articulation between such tokens. The errors might be caused by the unfamiliarity of
Dutch listeners with Sudanese English. It is also possible to refer these types of English
consonant perception errors to different voicing contrasts utilized in their production-
Arabic consonant inventory; i.e. absence of energy required for English voiceless
consonants. The latter reasoning applies particularly to the misperceptions of the
English /t/ as /d/, /ð/ as /z/ and /θ/ as /s/, where the speakers’ L1 (Arabic) transfer
acts as a barrier that blocks the acquisition of the L2 consonants and passes only Arabic
speech sounds, i.e. the L1 filter effect. Previous studies revealed that many Arabic
speakers of English have difficulty producing /θ, ð, s, z/ due to L1 interference (Altaha
1995, Rababah 2003, do Val Barros 2003). A good example of L1 interference has
been observed in a recent study which revealed that the boundaries between the fricative
and dental fricative pairs /t~d, ð~z, θ~s/ in Sudanese colloquial Arabic have almost
become blurred (Dickins 2007). This is most probably the reason why Dutch listeners
substitute these fricatives. Interestingly, this conclusion accounts for the repetition of
the same error patterns made by Dutch listeners for the onset and coda consonants that
were produced by the Sudanese speakers. More interestingly, these error patterns did
not occur at all when the English consonants were produced by the native speaker of
English.
The misperception of /f/ as /w/ can be explained in orthographic terms assuming that
Dutch /w/ is treated as /v/ because of the absence of an energy contrast – not
consistent or totally absent – in Dutch /f/ or /v/. Phonologically, the interchangeable
substitutions of /f/ for /v/ are seen as the result of an L1 filter in the Dutch listeners’
perceptual and productive sound inventory. This is because the contrast between these
labiodentals is not a matter of a fortis/lenis (voiced~voiceless) property (Collins and Mees
1981). The misidentification of English /n/ as /m/, when pronounced by native
speakers of English in word-final position, can be attributed to the wrong realization of
these phonemes by Dutch listeners. The reason for this is that Dutch and English
nasals have different phonetic characteristics in word-final position, i.e., Dutch nasals
occurring in word-final position retain their voicing feature but English final nasals do not or
only partially (see Tucker and Warner 2010). Interestingly, these perception errors do
not occur when the same English target sounds were read by Sudanese speakers –
which may be due to an interlanguage benefit between Dutch listeners and Sudanese
speakers.
Linguistically, English and Dutch show contrasts in phonological representations of the

same phonemes, which motivate the assumption that Dutch listeners perceive English
nasals with their L1 categories. The misidentification of RP /ð/ as /d/ or /t/ is due to an
interlanguage effect (using a language system which is neither the L1 nor the L2). This
is because the Dutch consonant system suggests that /ð/ has the status of a plosive
in the listeners’ interlanguage (Gussenhoven and Broeders 1976). In conclusion, the
performance of the Dutch listeners on consonants is better than on vowels with both
Sudanese and native speakers of English. This suggests that English consonants are
more intelligible to Dutch listeners than vowels.
4.6.3 Consonant clusters
4.6.3.1 Results
Figures 4.3 and 4.4 present the correctly identified English consonant clusters spoken
by Sudanese and native speakers of English in a perception test done by ten Dutch
listeners.
Figure 4.3 Correctly identified English onset consonant clusters by ten Dutch listeners spoken by
a Sudanese and a native RP speaker of English.
As the results in Figures 4.3 and 4.4 show, Dutch listeners achieved better performance
on English cluster consonants than on vowels and single consonants. Their performance
was even better when they were exposed to the native speakers of English than to
the Sudanese speakers. For the Sudanese and the native speaker, respectively, the overall
means are 50% and 88% for vowels, 78% and 99% for onset consonants, 81% and 99% for
coda consonants, 86% and 100% for onset clusters, and 81% and 91% for coda clusters.
Onset clusters spoken by the native speaker were perfectly
identified by the listeners except an incidental error rate of 10% made in the perception
of /spr/. However, more substitution errors, /dr/ for /br/ and /gr/, /gl/ for /kl/, /pl/
for /fl/, /sl/ for /sn/, and /spl/ for /skw/, were made by the listeners when they heard
the same onset clusters spoken by Sudanese speakers. These findings indicate that the
onset consonant clusters read by the native speakers of English are more intelligible to
Dutch listeners than those of the Sudanese speakers are. They also show that the
misperception within the cluster pairs /kl~gl/, /sl~sn/ and /pl~fl/ is revealing and
more systematic than /dr~br, gr/ on the one hand, and /spl~skw/ on the other.
Figure 4.4 Correctly identified English coda consonant clusters by ten Dutch listeners spoken by
a Sudanese and a native speaker of RP English.
Additionally, Tables 4.7-4.10 present confusion matrices for the cluster consonants in
both onset and coda positions. Tables 4.7 and 4.8 show the identification by ten Dutch
listeners of English cluster consonants spoken by Sudanese speakers in onset and
coda positions, respectively, while Tables 4.9 and 4.10 display the responses of the
same listeners for the control items spoken by the native speaker of English. The
correct identification appears in the cells along the main diagonal in each table; errors
(confusions) are in the off-diagonal cells.
In Tables 4.7 and 4.9 there are relatively few errors in the perception of the onset
clusters, whether spoken by the Sudanese or the native speaker of English. Just one
single misperception occurred with the native speaker: /spr/ was perceived as /spl/.
However, a few more errors were made by Dutch listeners when the English clusters
were produced by the Sudanese speakers, as Table 4.7 shows. These clusters often
contain /g/ as the first element of either the stimulus or the response.
On the other hand, Tables 4.8 and 4.10 show that Dutch listeners made more
errors in the perception of the coda clusters produced by Sudanese EFL
learners and native speakers of English. The data also show similar patterns of
perception errors for coda clusters, irrespective of the speaker type. The listeners
substituted /nz/ for /mz/ and less frequently /bd/ for /lt/ or /ld/. The listeners were
also observed to mistake /ŋk/ for /nd/, /st/ for /sk/, /nt/ for /mp/ and /nz/ for /ts/
or /dʒ/. However, the error rates of Dutch listeners in the cluster items spoken by
Sudanese EFL learners are high, particularly for coda clusters. These findings reveal
that there is a positive relation between the listeners’ performance in single and cluster
consonants rather than between the clusters and vowels (see further § 4.6.5).
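Table 4.7 Confusion matrix of English onset consonant clusters spoken by a Sudanese speaker of
English (in the rows) and perceived by ten Dutch listeners (in the columns). Correct responses
are on the main diagonal, indicated in bold face. Confusions appear in off-diagonal cells
(confusions ≥ 3 are indicated in shaded cells).
Perceived RP consonant clusters

Target
dr gl kl pl sl spl spr sw bl br fl gr skw sn
dr 8 1 1
gl 7 3
kl 3 7
pl 9 1
sl 9 1
spl 9 1
spr 10
sw 10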
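Table 4.8 Confusion matrix of English coda consonant clusters spoken by a Sudanese speaker of
English (in the rows) and perceived by ten Dutch listeners (in the columns). Correct responses
are on the main diagonal, indicated in bold face. Confusions appear in off-diagonal cells
(confusions ≥ 3 are indicated in shaded cells).

Target
bd gz lm ŋk nt nz st ts ld dθ lt mp mz nd sk
bd 1 6 2 1
gz 10
lm 10
ŋk 7 3
nt 9 1
nz 8 1 1
st 7 3
ts 8 2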
Table 4.9 Confusion matrix of English onset consonant clusters spoken by a native speaker of
RP English (in the rows) and perceived by ten Dutch listeners (in the columns). Correct
responses are on the main diagonal, indicated in bold face.

Target
dr gl kl pl sl spl spr sw
dr 10
gl 10
kl 10
pl 10
sl 10
spl 10
spr 1 9
sw 10
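As an illustration of how a percent-correct score and the list of frequent confusions follow from a confusion matrix, the sketch below uses the counts of Table 4.9 (native speaker, onset clusters), where the only confusion is a single /spl/ response to the target /spr/. The dictionary layout and the function names are assumptions made for this example; the cluster labels are written in plain IPA.

```python
# Counts from Table 4.9: target cluster -> {perceived cluster: responses}.
TABLE_4_9 = {
    "dr": {"dr": 10},
    "gl": {"gl": 10},
    "kl": {"kl": 10},
    "pl": {"pl": 10},
    "sl": {"sl": 10},
    "spl": {"spl": 10},
    "spr": {"spl": 1, "spr": 9},
    "sw": {"sw": 10},
}


def percent_correct(matrix):
    """Share of responses on the main diagonal, as a percentage."""
    correct = sum(row.get(target, 0) for target, row in matrix.items())
    total = sum(sum(row.values()) for row in matrix.values())
    return 100.0 * correct / total


def confusions_at_least(matrix, threshold):
    """Off-diagonal cells with at least `threshold` responses."""
    return [
        (target, perceived, count)
        for target, row in matrix.items()
        for perceived, count in row.items()
        if perceived != target and count >= threshold
    ]


print(percent_correct(TABLE_4_9))
print(confusions_at_least(TABLE_4_9, 3))  # cells that would be shaded
print(confusions_at_least(TABLE_4_9, 1))  # every confusion
```

The threshold of 3 mirrors the shading criterion used in the confusion matrices of this chapter.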
Table 4.10 Confusion matrix of English coda consonant clusters spoken by a native speaker of
RP English (in the rows) and perceived by ten Dutch listeners (in the columns). Correct
responses are on the main diagonal, indicated in bold face. Confusions appear in off-diagonal
cells (confusions ≥ 3 are indicated in shaded cells).

Target
bd gz lm ŋk nt nz st ts fs lt mz ld
bd 6 1 3
gz 10
lm 10
ŋk 10
nt 10
nz 6 4
st 10
ts 9 1
4.6.3.2 Discussion
Errors made by Dutch listeners in the perception of the velar /kl~gl/ and alveolar /sl~sn/
initial cluster members spoken by Sudanese subjects probably occur due to
insufficiency or absence of the energy required for voiceless sounds. Moreover, the error
patterns /dr~br/ and /spr~spl/ can be attributed to the similarity in the manner of
articulation, or to voicing, whilst the /pl~fl/ misperception can be seen as being due to
labiality shared by the first cluster members. Additionally, the listeners’ errors in the
initial member of coda clusters /nz~mz/ are most likely caused by nasality. Dutch
listeners were observed to repeat similar types of errors with both Sudanese and native
speakers, which indicates that these error patterns have to do with the fact that the
listeners are Dutch. Despite the fact that English and Dutch have different phonological
systems, they are closely related to one another. Consequently the Dutch listeners
showed better performance on English clusters produced by native speakers than on
their counterparts produced by Sudanese speakers. Wang (2007) reported similar
conclusions and argued that Dutch listeners achieved better performance on the
English clusters produced by Americans due to the rather close linguistic similarity that
exists between the Dutch listeners’ L1 and the L2.
4.6.4 Results and discussion of Speech Perception in Noise test (SPIN)
Figure 4.5 presents the scores of ten Dutch listeners obtained on the SPIN test, the
items of which were read by both Sudanese (left-hand part of figure) and native
speakers of English (right-hand part).
Figure 4.5 Mean percentage of correctly recognised words (CW) by ten Dutch listeners obtained
in a SPIN test. Also, percentages of correctly identified word components (onset, vocalic nucleus,
coda) are indicated. Items were read by one Sudanese EFL learner (left-hand part of figure) and
one native speaker of English (right-hand part).
As Figure 4.5 shows, Dutch listeners had poor perception of keywords in simple and
predictable English sentences, reaching only 27% correct when the sentences were spoken by the
Sudanese speaker. However, the listeners had a better performance of 70% on the same
materials read by the native speaker. Similarly, they had lower scores on onset, nucleus
and coda positions in the SPIN items read by the Sudanese speaker (68%, 51% and
42%, respectively) against higher scores (96%, 91% and 76%) when the same SPIN
items were read by the native speaker. These results indicate that the SPIN sentences of
native speakers are more intelligible to Dutch listeners than the sentences of Sudanese
speakers. The results also reveal that Dutch listeners of English managed to recognize
many sounds in the words correctly even if they failed to recognize a keyword in its
entirety. The onsets were perceived more accurately than the vowels and the codas,
which observation ties in with the results of the MRT tests.
Moreover, the findings indicate that onset consonants, whether single or in clusters,
were identified more successfully than vowels and codas. This implies that the listeners’
performance is always better when they hear native speakers.
4.6.5 Correlations
Tables 4.11 and 4.12 present the correlations between the four parts of this study. These
parts include vowels, single and cluster consonants and words as whole units of English.
The tables show how the perception of English vowels and consonants is correlated in
the MRT items and with their counterparts (segments, clusters and whole words) in
the SPIN sentences.
Table 4.11 Correlation matrix of dependent variables (identification scores) for materials
produced by a Sudanese-Arabic EFL speaker.
vowels consonants cons_ons cons_cod clusters clust_ons clust_cod word_all word_ons word_nuc
consonants .154
cons_onset .336 .944
cons_coda –.329 .688
clusters -.489 –.329 –.498 .188
clust_onset –.630 –.385 –.551 .153 .951
clust_coda –.266 –.224 –.375 .205 .933 .777
word_all .359 –.060 –.006 –.179 –.297 –.270 –.292
word_onset .818 .309 .485 –.232 –.623 –.699 –.456 .569
word_nucleus .461 .217 .269 –.014 –.144 –.322 .080 .659 .604
word_coda .138 –.274 –.274 –.165 .239 .315 .122 .285 .263 –.137
Bolded r > .8: p ≤ 0.01 (2-tailed).
Bolded r < .7: p ≤ 0.05 (2-tailed).
Table 4.12 Correlation matrix of dependent variables (identification scores) for materials
produced by a native speaker of RP English.
vowels consonants cons_onset cons_coda clusters clust_onset clust_coda word_all word_onset word_nucleus
consonants –.186
cons_onset .171 –.021
cons_coda –.211 .982 –.207
clusters .278 –.078 –.444 .000
clust_onset a a a a a
clust_coda .278 –.078 –.444 .000 1.000 a
word_all –.454 –.186 –.032 –.176 –.284 a –.284
word_onset –.046 .003 .035 .000 –.035 a –.035 .379
word_nucleus –.420 –.175 –.212 –.131 –.230 a –.230 .945 .292
word_coda –.446 –.215 .014 –.216 –.087 a –.087 .939 .410 .811
a No r can be computed because at least one of the variables is constant (perfect scores only).
Bolded r > .8: p ≤ 0.01 (2-tailed).
The correlation coefficients computed between the vowel, single consonant, cluster
and SPIN scores provide statistical support for these findings. A positive relationship
exists between the correct identification of nuclei and of whole words: r = .659 (ins.)
and .945 (p < .01) for the materials produced by
Sudanese and native speakers, respectively. This relationship reveals that listeners
usually get a correct word score whenever a nucleus vowel is correctly recognized,
which in turn indicates that the perception of vowels is a decisive factor of word
predictability. However, there is a negative relationship between the coda consonants
and word codas spoken by both Sudanese and native speakers: r = –.165 and –.216,
respectively. This relation indicates that when Dutch listeners make perception errors
on consonant codas they also tend to make perception errors on word codas with both
speakers. It suggests single and cluster coda consonants are more difficult to perceive
than their onset counterparts. A weak positive correlation exists between vowels and
onset consonants: r = .336 and .171 for Sudanese and native speakers of English,
respectively. It gives rise to the prediction that the correct identification of English
vowels assumes correct identification of English onset consonants. However, these
weak positive or negative relations imply a rather unstable performance on the part of
the Dutch listeners when they are exposed to Sudanese speakers. This can be
interpreted as a by-product of incorrect English pronunciation of Sudanese speakers.
That is, incorrect pronunciation of some CVC stimuli changes their meaning, which
influences their predictability. Data of a similar test (Wang 2007) supports the claim
that correct identification of Dutch listeners responding to Chinese EFL speakers is
poorer (32%) than when responding to native speakers of English (67%). Dutch
listeners had a high percentage of correctly recognized words because they have had
more exposure to English than either the Sudanese or the Chinese listener groups.
Moreover, linguistically their L1 norm is
much closer to English than that of the Sudanese and Chinese listeners. The Chinese
and Sudanese-Arabic L1 linguistic systems are entirely unrelated to English, the former
being a Sino-Tibetan and the latter a Semitic language. Previous studies, which
measured the perceptual similarity between languages on the basis of their overall
sound structure, found that the mean distance of Dutch from English is 3.7 and that
the proximity of Dutch to English is based on known genetic and structural similarities.
According to the same study Arabic is 12.5 distance units away from English, which
labels it as the farthest language from English and Dutch compared to other languages
(Bradlow, Clopper and Smiljanic 2007). In conclusion, the findings reveal that the
perception of vowels and coda consonants is more difficult for Dutch listeners than
the perception of single and cluster consonants.
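The coefficients in Tables 4.11-12 are product-moment correlations between identification scores. As a minimal sketch of how such a coefficient can be obtained, the following Python fragment computes r for two score lists and the t statistic used for its two-tailed significance test. The function names and the score lists are hypothetical illustrations, not the thesis data or the software used for the analysis. A constant variable yields no coefficient, which corresponds to the cells marked a in Table 4.12.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equally long score lists."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two lists of equal length with at least 3 scores")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None  # a constant variable: r cannot be computed
    return sxy / math.sqrt(sxx * syy)


def t_from_r(r, n):
    """t statistic (df = n - 2) for the two-tailed significance test of r."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Hypothetical percent-correct scores for five listeners (not the thesis data).
nucleus_scores = [40, 55, 60, 70, 85]
word_scores = [20, 30, 28, 45, 50]
r = pearson_r(nucleus_scores, word_scores)
```

The resulting t value would be compared against the critical value for n - 2 degrees of freedom to decide whether r is significant at the chosen level.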
4.6.6 Conclusions
Dutch listeners made more perception errors on English central and back vowels read
by Sudanese speakers than on those read by the native speakers, probably because the
source vowels were produced incorrectly. These types of vowels are absent from the
Sudanese speakers' L1 (Arabic) vowel inventory.
Similar perception errors were experienced in perceiving English onset and coda
consonants produced by Sudanese speakers. The English fricatives /ð, z, θ, s/ proved to
be problematic for Dutch listeners. These types of perception errors can be interpreted
as a by-product of partial learning or insufficient practice.
Generally, fewer errors were made on the level of consonant clusters. Moreover, the
listeners made even fewer perception errors of English single and clustered consonants
when the material was read by a native speaker of English.
Dutch listeners found native speakers of English more intelligible than Sudanese
speakers, firstly, because English and Dutch are closely related languages with rather
similar sound systems. Secondly, Dutch listeners have regular exposure to the target
language, which facilitates learning of English. Thirdly, they are not familiar with
Sudanese-accented English.
Sudanese-Arabic speakers experience more difficulties in producing English vowels
than single and clustered consonants because they are not familiar with rich vowel
systems.
Chapter Five
Intelligibility of Sudanese English to British and American listeners
5.1 Introduction
Researchers need to test in greater detail the ways in which non-native speech of
English varies from that of the native speakers and to determine the extent to which
such variation can impede intelligibility between interlocutors. A task such
as this requires looking at the phonetic and phonological differences between L1 and L2
to find out which segmental variations are possible and how they can impede or
enhance speech intelligibility. This is often necessary since phonemic variation between
languages has negative effects on the learning of L2 speech; many studies of non-native
speech have indicated the potential for reduced comprehension, particularly when
actual practice of the second/foreign language is infrequent. According to Jenkins
(2000), (incorrect) habit formation is one of the major factors responsible for
intelligibility problems: the muscular habits that routinely operate to produce
the L1 speech sounds are automatically activated in L2 production. This process
requires non-native speakers to pay more attention in order to produce accurate speech.
However, as soon as these speakers release control to focus on the content of the
message, they produce erroneous pronunciation. This situation continues until
sufficient practice leads to the mastery of L2 sounds that are phonetically different
from those of the L1. However, incorrect speech habits are not the underlying cause of
the pronunciation problems in foreign-accented speech. The incorrect production of
L2 speech sounds occurs due to categorical differences between L1 and L2, where non-
native speakers use incorrect perceptual representations (normally L1 sounds) for the
production of L2 sounds (Flege 1976). Many L2 speakers of English fail to distinguish
between phonemic and allophonic sounds of English, or they conflate or confuse
certain speech sounds as a result of differences between L1 and L2. For example, Arabic
speakers of English conflate /b/ and /p/, because the latter has no phonological
representation in Arabic (Cruttenden 2008, Flege 1976). Similar problems occur among
Russian speakers, who confuse clear /l/ as in leaf, black and lose and dark /ɫ/ as in pool,
full and milk, which form contrastive phonemes in Russian, but are allophones in
English.
5.2 Objective
This study attempts to investigate segmental intelligibility problems that
Sudanese-Arabic EFL learners face. It reports an experimental analysis of the English speech
sounds including vowels, consonants and clusters to test how intelligible Sudanese EFL
learners are to British and American listeners. The experimental work uses the Modified
Rhyme Test (MRT) as well as the SPIN test as before (see chapters three and four) but
in the present chapter the test items were presented to native listeners of English.
The results of this chapter will provide useful feedback on how well Sudanese-Arabic
EFL learners (at university level) are understood by listeners in the target
population, i.e. by native listeners of English. In doing so, this study attempts to
account for issues such as which English speech sounds are problematic and which
linguistic elements cause such problems. The study thereby provides insights into the
nature and the causes of the error patterns detected in the investigated area.
5.3 Method
5.3.1 Intelligibility tests used

that depends on interaction in an appropriate context involving the comprehension of
the message between the listener and the speaker. It is also possible to refer to speech
speakers of English. Since the non-native listeners in this study are expected to have an
incorrect conception of English speech sounds, the focus will be on examining vowels,
consonants and consonant clusters. Priority is given to segmental properties,
firstly, because the vowels and consonants form the basic sounds of the English
language, the mastery of which is required for perfect learning of speech. Secondly, the
assessment of whether speech is intelligible or not is attributed to segmental factors,
since more than 50% of speech intelligibility is accounted for on the basis of speech
sounds (Pascoe 2005, Fraser 2005).
to be a highly accurate and reliable measure of intelligibility (Logan, Greene and Pisoni
1989) at the phoneme level. Speech intelligibility measures involve word identification
tasks in closed sets of four alternatives, where the listeners are asked to select the
response they think the speaker intended. The score is the number of correctly
responded-to items. Test items normally target phonemes, multi-phonemes, or words.
Phonemes refer to vowels and single consonants, whilst multi-phonemes refer to
consonant clusters. Phoneme and multi-phoneme responses are scored as either
intelligible or unintelligible. A score of (close to) 100% is interpreted as completely
intelligible performance (Lafon 1966).
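The closed-set scoring described above can be illustrated with a short Python sketch. It assumes that each test item has one intended target among its four alternatives and that the listener's choice is recorded per item; the item numbers, words and the function name are hypothetical and serve only as an illustration. Items without a response are counted as incorrect.

```python
def mrt_percent_correct(answer_key, responses):
    """Percentage of items for which the chosen alternative equals the target."""
    if not answer_key:
        raise ValueError("answer key must contain at least one item")
    correct = sum(
        1 for item, target in answer_key.items() if responses.get(item) == target
    )
    return 100.0 * correct / len(answer_key)


# Hypothetical items: the word the speaker intended, keyed by item number.
answer_key = {1: "pit", 2: "pet", 3: "put", 4: "pot"}
# Hypothetical listener choices from the four printed alternatives.
responses = {1: "pit", 2: "pit", 3: "put", 4: "pot"}

score = mrt_percent_correct(answer_key, responses)
```

With these made-up responses three of the four items are correct, so the score is 75%.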
Word intelligibility, on the other hand, was established by having listeners recognise 25
keywords, each one embedded in final position in a short everyday sentence taken from
the SPIN test (SPIN is an abbreviation of 'Speech Perception in Noise'; Kalikow,
Stevens and Elliott 1977, Wang and van Heuven 2003, Wang 2007). An example of a
SPIN item would be She wore her broken arm in a sling (keyword: sling). Listeners
write down the final word that they think they heard in each sentence. This part of the
SPIN test proved to be efficient at assessing speech recognition abilities (Rhebergen
and Versfeld 2005). Although the listeners’ performance is primarily quantified in terms
of number of whole words correctly recognized, partially correct answers are also
important since they give information about the perception of specific phonemes in
onset, nucleus and coda position.
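Partial credit of this kind can be sketched as follows in Python. The sketch assumes that target keywords and written responses have already been transcribed and segmented by hand into onset, nucleus and coda; the ASCII transcriptions, the data and the function names are hypothetical, and this is not the scoring procedure actually used in the study.

```python
def score_keyword(target, response):
    """Compare two (onset, nucleus, coda) tuples position by position."""
    onset, nucleus, coda = (t == r for t, r in zip(target, response))
    return {
        "onset": onset,
        "nucleus": nucleus,
        "coda": coda,
        "word": onset and nucleus and coda,  # whole word correct only if all match
    }


def position_percentages(pairs):
    """Percent correct per position over a list of (target, response) pairs."""
    totals = {"onset": 0, "nucleus": 0, "coda": 0, "word": 0}
    for target, response in pairs:
        for key, ok in score_keyword(target, response).items():
            totals[key] += ok
    return {key: 100.0 * count / len(pairs) for key, count in totals.items()}


# Hypothetical responses to the keyword "sling", in ad hoc ASCII transcription.
pairs = [
    (("sl", "I", "N"), ("sl", "I", "N")),  # fully correct
    (("sl", "I", "N"), ("st", "I", "N")),  # onset misheard, rest correct
]
percentages = position_percentages(pairs)
```

In this made-up example the whole-word score is 50%, while the nucleus and coda scores are 100%, which shows how partially correct answers still carry information about individual positions.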
5.3.2 Participants
5.3.2.1 Sudanese speakers of English
The study participants were ten Sudanese university students in the Department of
English at Gadarif University in the Sudan. The learners involved in these experiments
specialized in English language teaching (TEFL). They had studied for six semesters
when they participated in the test. During the period of study, which extends over four
years, students attended three courses in the field of pronunciation; these are (i) an
introduction to phonetics, (ii) phonology and (iii) practical phonetics, delivered in three
subsequent semesters. They also attended two classes on English listening skills, which
usually took place in semesters one and three. English is treated as a foreign language
(not a second language), the learning of which starts in the fifth year of primary school
and continues at secondary schools for three years. English lessons obtained during
these stages vary between 5 and 6 hours per week; English is treated as a school subject
that provides basic principles of the language in a traditional way of language teaching.
5.3.2.2 Selection of a representative Sudanese EFL speaker
A Sudanese model speaker was selected from among 11 Sudanese speakers of English by
means of a quality sound test. The quality sound test was administered through the
internet. Listeners of different nationalities were invited to listen to the
recordings of the 11 speakers and then to assess the sound quality of each speaker by
clicking on one of the grade options provided. A mean judgment score was computed for
each speaker, and the speaker whose mean judgment score was closest to the grand mean
was chosen as the representative learner.
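The selection rule can be sketched in a few lines of Python. The speaker labels and ratings below are hypothetical, and the grand mean is taken here as the mean of the speaker means, which is an assumption about how the grand mean was defined.

```python
def pick_representative(ratings):
    """Return the speaker whose mean rating lies closest to the grand mean."""
    means = {speaker: sum(r) / len(r) for speaker, r in ratings.items()}
    grand_mean = sum(means.values()) / len(means)
    chosen = min(means, key=lambda speaker: abs(means[speaker] - grand_mean))
    return chosen, means, grand_mean


# Hypothetical judgment scores given by two raters to three speakers.
ratings = {
    "speaker_A": [2, 2],
    "speaker_B": [5, 5],
    "speaker_C": [8, 9],
}
chosen, means, grand_mean = pick_representative(ratings)
```

Choosing the speaker nearest the grand mean, rather than the best-rated one, yields a learner who is typical of the group instead of exceptional.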
5.3.2.3 Native speaker of English
In the control part of the study a single male native speaker of English (RP accent) was
used as a model speaker of English. He was asked to read out stimulus items which
include vowels, single and cluster consonants of English, as well as the SPIN sentences.
5.3.2.4 Native listeners of English: British and American listeners
The group of native English listeners comprised ten British and ten American speakers
of English preparing for BA or MA degrees in various academic disciplines at Leiden
University. Listeners were recruited by means of online or poster invitation. The
subjects were asked to fill in a short questionnaire before they started the
perception test. In the questionnaire, they provided information about their
nationality (British or American) and their linguistic background. None of the
listeners spoke Arabic, the first language of the Sudanese speakers involved in the
experiments, and all respondents declared that they were unfamiliar with English
spoken with a Sudanese-Arabic accent. All respondents used their first language on a
daily basis within their expatriate communities. Some subjects, friends or family of
some of the Leiden-based students, did the experiments online in their home country.
5.4 Overall structure of the test battery
The experimental stimuli included four tests. These were (i) a vowel test, which was
words (where C* stands for one to three consonants). The fourth test comprised 25
sentences taken from the high-predictability set included in the SPIN (Speech
Perception in Noise) test (Kalikow, Stevens and Elliott 1977, also see above). Word
stimuli in the first three tests were embedded in a fixed carrier sentence [Say…again],
which ensured a fixed intonation with a rise-fall accent on the target word. The vowel
and the single consonant tests contained items on each individual vowel or consonant
phoneme in the RP inventory. 15 Moreover, the consonant test targeted all the
consonants in onset position and in coda position. For the cluster test, the number of
test items had to be limited as the total inventory of onset and coda clusters is very
large; including all the clusters would have been too demanding on the listeners. Nine
onset and eight coda clusters were selected that present problems to Sudanese-Arabic
learners of English (Allen 1997, Patil 2006). All items in the tests were chosen
such that they occurred in dense lexical neighbourhoods, i.e. there should be many
words in English that differ from the test item only in the target sounds. For instance,
the vowel /ɪ/ was tested in the word pit, since the /p_t/ consonant frame can be filled
in by many other vowels, as in peat, pet, pat, pot, part, port, put, putt and pout. These
so-called lexical neighbours, differing from the target word in only the identity of the
test sound, make up the pool of possible distracters (alternatives) in the construction
of the MRT test. When selecting the three distracters needed for each of the test items,
lexical neighbours that differ from the target in only one distinctive feature were
preferably selected. For the target pit, we selected two alternatives with vowels that
differed from /ɪ/ in just one vowel feature, i.e. pet (differing in height) and put
(differing in backness), as well as pot. The latter alternative differs from the target
in both height and backness; we preferred this to the one-feature difference in peat (or
Pete) as we decided to exclude proper names and low-frequency alternatives as much as
possible. The full set of test items is included in Appendix 4.2.
15 Inadvertently, the vowel test did not include an item targeting the vowel /əʊ/ as in boat.
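The preference for minimally different distracters can be sketched as a ranking by feature distance. In the Python fragment below the three-way feature coding (height, backness, tenseness) is a deliberately simplified, hypothetical classification chosen to match the pit example, and the exclusion set stands in for the decision to avoid proper names and low-frequency words; none of this is the procedure actually implemented for the test.

```python
# Hypothetical, simplified vowel features per word: (height, backness, tenseness).
FEATURES = {
    "pit": ("high", "front", "lax"),
    "peat": ("high", "front", "tense"),
    "pet": ("mid", "front", "lax"),
    "put": ("high", "back", "lax"),
    "pot": ("low", "back", "lax"),
}


def feature_distance(word_a, word_b):
    """Number of vowel features in which two lexical neighbours differ."""
    return sum(1 for fa, fb in zip(FEATURES[word_a], FEATURES[word_b]) if fa != fb)


def pick_distracters(target, neighbours, excluded=(), k=3):
    """Pick the k neighbours closest to the target, skipping excluded words."""
    candidates = [w for w in neighbours if w != target and w not in excluded]
    ranked = sorted(candidates, key=lambda w: (feature_distance(target, w), w))
    return ranked[:k]


# "peat" is excluded here because of its proper-name homophone Pete.
distracters = pick_distracters("pit", ["peat", "pet", "pot", "put"], excluded={"peat"})
```

With this coding the sketch returns pet and put (one feature away) followed by pot (two features away), mirroring the choice described for the target pit.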
5.4.1 Tests materials
The stimulus sentences were typed on sheets of paper (one sheet for each test), and
then read by 11 male Sudanese EFL learners (see above) and one native speaker of RP
English. One representative Sudanese speaker was selected from the larger group of 11
by means of a quality sound test (see § 5.3.2.2). The native speaker of English was a
British male candidate who was selected as a model speaker of RP English (see §
3.2.2.2). Recordings took place in a sound-treated room. The speakers' voices were
digitally recorded (44.1 kHz, 16 bits) through a high-quality swan-neck Sennheiser
HSP4 microphone. The speakers were instructed to inhale before uttering the next
sentence so that a clear recording was achieved. The target words were excerpted from
their spoken context using the high-resolution digital waveform editor included in the
Praat speech processing software (Boersma and Weenink 1996). Target words were cut
at zero-crossings to avoid clicks at onset and offset. Target words and SPIN sentences
were then recorded onto Audio CD in seven tracks. The first track contained two
practice trials for the vowel test and was followed by track 2, which contained the 19
vowel test items. Tracks 3 and 4 contained the practice and test trials for the single
consonant tests and tracks 5 and 6 contained the cluster items. Track 7 comprised the
25 SPIN sentences with no practice items. In the single consonant and cluster tests,
trials targeting onsets preceded the items targeting codas. Other than that, the order of
the trials within each part of the test battery was random. Trials were separated by a 5-
second silent interval. After every tenth trial a short beep was recorded, to help the
listeners keep track on their answer sheets.
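Cutting at zero-crossings means moving each cut point to a nearby position where the waveform changes sign, so that the excerpt does not start or end with an abrupt amplitude jump. The following Python sketch illustrates the idea on a plain list of sample values. It is not Praat's implementation; the function names and the sample values are hypothetical.

```python
def zero_crossings(samples):
    """Indices i where the signal changes sign between samples[i-1] and samples[i]."""
    return [
        i
        for i in range(1, len(samples))
        if (samples[i - 1] < 0 <= samples[i]) or (samples[i - 1] >= 0 > samples[i])
    ]


def snap_to_zero_crossing(samples, index):
    """Move a proposed cut point to the nearest zero crossing, if there is one."""
    crossings = zero_crossings(samples)
    if not crossings:
        return index  # no sign change anywhere: keep the original cut point
    return min(crossings, key=lambda i: abs(i - index))


# Hypothetical waveform fragment with sign changes before indices 2 and 4.
samples = [0.5, 0.2, -0.1, -0.4, 0.3, 0.6]
cut_start = snap_to_zero_crossing(samples, 0)
cut_end = snap_to_zero_crossing(samples, 5)
```

For this fragment the proposed cuts at samples 0 and 5 are moved to samples 2 and 4, respectively.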
5.4.2 Test procedure
listeners. Subjects were given standardized written instructions and received a set of
answer sheets that listed four alternatives for each test item. They were instructed for
In the final test (SPIN), subjects were instructed to write down only the last word of
5.5 Overall results
5.5.1 Vowels
This section will present the results of the test battery in four sections, one for each test.
Each section will first outline the structural differences between the sounds in the
source language, Sudanese Arabic (SA) and in the target language, RP English. Such
comparisons may help understand why certain English sounds are difficult for
Sudanese learners and others are not.
5.5.1.1 Results
Figure 5.1 presents the total mean correct identification scores obtained by the two
groups of native listeners of English, i.e. ten British and ten American listeners, on the
vowel part of the MRT tests.
Figure 5.1 Correct responses (%) to English vowel tokens of ten British and ten American
listeners. The error bars include ±2 Standard Errors of the mean. The vowels were produced by
one Sudanese and one native speaker of British English.
As Figure 5.1 shows, vowel identification scores for the native listeners (British and
American) are higher when they were exposed to English vowel tokens produced by
the native speaker and lower when the same vowel tokens were read by the designated
Sudanese speaker. Overall mean correct identification is 67% (Sudanese speaker) and
93% (native speaker) for the British listeners, against 65% and 91%, respectively, for
the American listeners. A repeated measures analysis of variance (RM-ANOVA) with
native language of the speaker (native, foreign) as a within-subject factor and
nationality of the listener (British, American) as a between-subjects factor shows that
only the effect of speaker type is significant, F(1, 18) = 152.3 (p < .001). The effect
of listener nationality and the listener × speaker interaction are not significant,
F(1, 18) < 1 for both.
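For a design with one two-level within-subject factor and two equally large listener groups, the F ratios for the within-subject effect and for the interaction can be derived from each listener's difference score (native minus Sudanese speaker). The Python sketch below illustrates this identity under those assumptions; the function name and the difference scores are hypothetical, and the sketch is not the statistical software used for the reported analysis. With ten listeners per group the degrees of freedom come out as (1, 18), as in the text.

```python
def mean(values):
    return sum(values) / len(values)


def mixed_anova_from_differences(diffs_a, diffs_b):
    """F ratios for a 2 (group) x 2 (speaker) mixed design with equal group sizes.

    diffs_a, diffs_b: per-listener score differences (native - Sudanese speaker)
    for the two listener groups.
    """
    if len(diffs_a) != len(diffs_b) or len(diffs_a) < 2:
        raise ValueError("this sketch assumes two equally large groups (n >= 2)")
    n = len(diffs_a)
    total = 2 * n
    mean_a, mean_b = mean(diffs_a), mean(diffs_b)
    grand_mean = (mean_a + mean_b) / 2
    # Pooled within-group variance of the difference scores (df = total - 2).
    ss_error = sum((d - mean_a) ** 2 for d in diffs_a) + sum(
        (d - mean_b) ** 2 for d in diffs_b
    )
    pooled_var = ss_error / (total - 2)
    f_speaker = total * grand_mean ** 2 / pooled_var
    f_interaction = n * (mean_a - mean_b) ** 2 / (2 * pooled_var)
    return f_speaker, f_interaction, (1, total - 2)


# Hypothetical difference scores (percentage points) for two listeners per group.
f_speaker, f_interaction, df = mixed_anova_from_differences([30, 25], [20, 35])
```

The interaction F equals the squared independent-samples t statistic on the difference scores, so an interaction F below 1 indicates that the two groups profited about equally from the native speaker.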
The confusion matrices in Tables 5.1-2 present details about the listeners' performance
on the vowel identification task. It is obvious from the tables that the listeners found
the English vowels produced by the Sudanese speaker more difficult than those read
by the native speaker. Table 5.1 shows that the British listeners totally misperceived
the English front mid close /e/ as /ɪ/ or, less often, as /iː/. The English open /æ/
also proved to be difficult for the listeners. It was frequently misheard as /ʌ/ and less
frequently as /ʊ/. Another type of frequent perception error was the confusion of the
English tense /iː/ with its lax counterpart /ɪ/. Moreover, the English tense /iː/ was
replaced by /æ/ or /e/ but less often. Perception errors involving the central and back
English vowels included the replacement of the English /ɔ/ by /ʊ/ and less often by
/ʌ/ or /æ/, whilst the back low /ɑː/ was substituted for /ɜː/. Other miscellaneous
errors were the misperception of /ɔ/ as /ʌ/ or /æ/ and /ɔː/ as /ɑː/. Similar perception
error patterns were found for the American listeners exposed to the same English
vowel tokens spoken by the Sudanese speaker (see Table 5.2). Interestingly, most of
these errors have to do with the central and back vowels, which implies a systematic
relation with the production of the English source vowels. This relationship will be
discussed later. On the other hand, no serious problems were found when the English
vowels were read by the native speaker. However, the English lax-tense pairs /ʊ~uː/,
/ɪ~iː/ were often confused by both British and American listeners.
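A confusion matrix of this kind can be built by tallying (target, response) pairs, after which the frequent confusions can be listed by applying a proportion threshold per target. The Python sketch below assumes a 30% threshold, uses ad hoc ASCII vowel labels and made-up trials, and is only an illustration of the bookkeeping, not the procedure used to produce the tables.

```python
from collections import Counter, defaultdict


def confusion_matrix(trials):
    """Tally responses per target from a list of (target, response) pairs."""
    matrix = defaultdict(Counter)
    for target, response in trials:
        matrix[target][response] += 1
    return matrix


def frequent_confusions(matrix, threshold=0.30):
    """List (target, response, count) for errors at or above the threshold."""
    flagged = []
    for target, row in matrix.items():
        total = sum(row.values())
        for response, count in row.items():
            if response != target and count / total >= threshold:
                flagged.append((target, response, count))
    return sorted(flagged)


# Hypothetical trials: ten listeners hear target "e"; nine respond "I", one "i:".
trials = [("e", "I")] * 9 + [("e", "i:")] + [("I", "I")] * 10
matrix = confusion_matrix(trials)
flagged = frequent_confusions(matrix)
```

In this made-up example only the confusion of "e" with "I" reaches the threshold; the single "i:" response (10%) does not.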
Table 5.1 Confusion matrix of English vowels and diphthongs produced by a Sudanese EFL
learner (in the rows) and perceived by ten British listeners (in the columns). Correct responses are
on the main diagonal, indicated in bold face. Confusions (≥ 30%) are in grey-shaded cells. The
vowel /əʊ/ should have been presented but was not.
Perceived RP vowels
Target
ɜː ʌ ɑː æ ɑʊ aɪ e eə eɪ ɪ iː ɪə ɔ ɔː ɔɪ ʊ uː ʊə əʊ
ɜː 6 1 2 1
ʌ 9 1
ɑː 3 7
æ 5 3 2
ɑʊ 9 1
aɪ 10
e 0 9 1
eə 2 7 1
eɪ 1 3 6
ɪ 10
iː 1 1 5 3
ɪə 10
ɔ 2 1 0 7
ɔː 1 9
ɔɪ 1 1 8
ʊ 1 9
uː 1 1 8
ʊə 2 1 7
Table 5.2 Confusion matrix of English vowels and diphthongs produced by a Sudanese EFL
learner (in the rows) and responded to by ten American listeners (in the columns). Correct
responses are on the main diagonal, indicated in bold face. Confusions (≥ 30%) are in
grey-shaded cells. The vowel /əʊ/ should have been presented but was not.
Perceived RP vowels
Target
ɜː ʌ ɑː æ ɑʊ aɪ e eə eɪ ɪ iː ɪə ɔ ɔː ɔɪ ʊ uː ʊə
ɜː 5 1 4
ʌ 6 4
ɑː 1 8 1
æ 7 1 1 1
ɑʊ 10
aɪ 9 1
e 1 9
eə 10
eɪ 4 2 4
ɪ 10
iː 1 5 4
ɪə 1 1 8
ɔ 3 1 6
ɔː 10
ɔɪ 10
ʊ 9 1
uː 1 5 4
ʊə 1 9
5.5.1.2 Discussion and conclusion
Most likely, many of the errors made by the British and American listeners when
identifying English vowels produced by Sudanese speakers have linguistic causes. The
replacement of the English /e/ by /ɪ/ can be attributed to two factors. Firstly, it is
probably triggered by an L1 effect which permits only vowel sounds available in the
Arabic vowel repertoire, viz. /i, a, u/, while it blocks /e/, since the latter is not part of
the Arabic vowel system (see Kopczski and Mellani 1993). This assumption is less
probable, however, since previous studies have shown that Arabic speakers developed
/e/ (Munro 1993, Dickins 2007). 16 Secondly, a replacement error of this type can most
probably be attributed to spelling/graphical differences between English and Arabic,
where the Sudanese-Arabic speakers pronounce English /e/ in the way it is spelt, as a
transfer of the Arabic spelling system, which maintains a direct letter-sound relation.
This means that each vowel or consonant letter of Arabic has one sound, which
corresponds to its spelling, and there are no mute letters. Therefore, the English vowel
/e/ in words such as enter, envelope, wet, let, etc., is often mispronounced as /ɪ/ by the
Sudanese speakers, which forms the major cause of confusion in this context. It is also
possible to describe this phenomenon as an interlingual error, which results from faulty
or partial learning of the L2 rule. 17
16 Sudanese Arabic also developed monophthongs. These include /e/, which historically
descends from the diphthong /aj/ as in /ajn/ 'an eye', which coalesced (merged) in dialects such
as Cairene and Central Sudanese. In Arabic varieties spoken in large parts of the Levant these
Similarly, the misperception of /æ/ as /ʌ/ or /ʊ/ is due to an incorrectly produced
English vowel. That is, Arabic speakers almost always have problems with the
pronunciation of the front open /æ/. They tend to pronounce the English /æ/ in the
same way they produce their L1 vowel back open lengthened /a/; i.e., in Sudanese and
Cairene Arabic /a/ is pronounced as in [bæːb] 'door' (Kaye 1997). It is likely that this
is the reason why native Arabic speakers are advised to keep the English short vowel
/æ/ fully front (Cruttenden 2008). Along similar lines Bobda (2000) concluded that
Sudanese speakers of English fluctuate between /ʌ, ɜː, ʊ/ due to interference from their
Arabic L1 background. The confusion of lax-tense /ɪ~iː/ by the British and American
listeners can also be attributed to an incorrect vowel production that probably resulted
from the wrong implementation of English vowel categories. It is less probable that
these substitution errors are the result of incorrect vowel length in the learners' L2.
This is because a vowel distinction in both the English and the Arabic vowel system is
based on short/long contrasts. However, Munro (1993) reported that the English vowels
spoken by Sudanese Arabic EFL learners are influenced by their L1 (Arabic) vowel
system, which has a short/long vowel contrast that is solely based on quantity. Thus, a
substantial number of subjects pronounced English tense-lax vowels in terms of Arabic
long/short vowel categories. However, this assumption seems to be weak because the
short/long contrast is also used in the English tense-lax vowel distinction. These types
of errors often happen when the speakers have had relatively little exposure to English
speech.
5.5.2 Consonants
5.5.2.1 Results
Figure 5.2 presents the mean percentage of consonants correctly identified by two
groups of native listeners of English, i.e. ten British and ten American listeners. Again,
the MRT items were spoken by a designated representative Sudanese learner of English
and by a native speaker of RP English.
vowels are realized as /Gu/ or /nt/. In Sanani and a number of Peninsula dialects, the diphthongs
are maintained in all phonological contexts. Moreover, among some Cairene speakers the
monophthongs are shortened in closed syllables to give short /G/ or /n/, hence they are not
considered to be separate vowels (Watson 2002).
17 Actually, the English pronunciation preferences often do not pay attention to the relation
between sounds and letters, as equally as it considers social conventions, then, a sort of balance
would occur. This feature makes English pronunciation a problematical area particularly for
non-native speakers because the relation between letters and sounds is not clear (Wells 1999).
Figure 5.2. Mean correct identification of English onset and coda consonants by 10 British and
10 American listeners of English. The error bars include ±2 Standard Errors of the mean. The
consonants were produced by one Sudanese and one native speaker of British English.
As Figure 5.2 shows, the British and American listeners identified the English
consonants very accurately. The overall percentage of correctly identified consonants
is 85.0% (British) and 84.8% (American) when the consonants were produced by the
Sudanese speaker, and 99.0% and 99.2% when they were spoken by the native speaker
of English. The RM-ANOVA shows that the effect of speaker type is highly
significant, F(1, 18) = 94.5 (p < .001). The British listeners showed slightly better
understanding of the English consonants read by the Sudanese speaker, but the
difference is not significant, F(1, 18) < 1. Furthermore, the level of performance on the
consonants read by the native speaker is almost the same for the two listener
groups, so that the speaker × listener interaction is not significant either, F(1, 18) < 1.
This is probably because both listener groups are native speakers of English. However,
a few English onset and coda consonants were misperceived (see Tables 5.3-5.6).
Table 5.3 Confusion matrix of English onset consonants produced by a Sudanese EFL speaker
(targets, in the rows) and responded to by ten British listeners (in the columns). Correct
responses are on the main diagonal, indicated in bold face. Confusions (≥ 30%) are in shaded
cells.
Target Perceived RP consonants
b tʃ d t ð f g h j k l m n p r s ʃ θ v w z
b 10
tʃ 10
d 2 5 3
t 4 6
ð 0 10
f 10
g 8 2
h 1 8 1
j 10
k 10
l 10
m 10
n 9 1
p 1 1 8
r 1 9
s 10
ʃ 10
θ 5 5
v 2 7 1
w 10
z 10
Table 5.4 Confusion matrix of English onset consonants produced by a Sudanese EFL speaker
(targets, in the rows) and responded to by ten American listeners (in the columns). Correct
responses are on the main diagonal, indicated in bold face. Confusions (≥ 30%) are in shaded
cells.
Target
b tʃ d t ð f g h dʒ k l m n p r s ʃ θ v w z
b 10
tʃ 10
d 0 10
t 5 5
ð 0 10
f 10
g 10
h 1 9
dʒ 10
k 10
l 10
m 10
n 10
p 10
r 10
s 10
ʃ 10
θ 7 3
v 9 1
w 10
z 10
Table 5.5 Confusion matrix of English coda consonants produced by a Sudanese EFL speaker
(targets, in the rows) and responded to by ten British listeners (in the columns). Correct
responses are on the main diagonal, indicated in bold face. Confusions (≥ 30%) are in shaded
cells.
Target
b tʃ d dʒ ð f g k l m n ŋ p s ʃ t θ v z
b 9 1
tʃ 10
d 10
dʒ 10
ð 7 3
f 10
g 10
k 10
l 8 1 1
m 10
n 10
ŋ 3 7
p 1 9
s 10
ʃ 1 9
t 10
θ 1 5 4
v 1 9
z 6 4
Table 5.6 Confusion matrix of English coda consonants produced by a Sudanese EFL speaker
(targets, in the rows) and responded to by ten American listeners (in the columns). Correct
responses are on the main diagonal, indicated in bold face. Confusions (≥ 30%) are in shaded
cells.
Target
b tʃ d dʒ ð f g k l m n ŋ p s ʃ t θ v z
b 10
tʃ 10
d 10
dʒ 10
ð 5 5
f 8 2
g 10
k 2 4 4
l 10
m 10
n 10
ŋ 1 9
p 3 7
s 10
ʃ 1 9
t 10
θ 4 1 2 3
v 1 9
z 4 6
As for the onset consonants produced by the Sudanese EFL speaker, both British and
American listeners totally misidentified /&/ as /\/, while frequent misperceptions of
/6/ as /U/ and /F/ as /V/ were also observed. It is worth mentioning that the American
listeners always misperceived /F/ as /V/. These are probably the most serious errors
experienced by the listeners involving the English consonants read by Sudanese
speakers. Similar error patterns of the dental fricative consonants of English were made
in the coda consonants read by the Sudanese speakers. These included the replacement
of /&/ by /\/, /6/ by /U/, /\/ was replaced by /U or 6/ whilst /6/ was replaced /U or
&/ and there was /0~P/ confusion, for both listener groups. Miscellaneous other
confusions such as /M~I/ and /H~X/ were found for the American listeners only.
The British and American listeners suffered from several other confusions, which
included /X~R, 5~V5, R~M/ in coda position. The error frequency obtained for the
fricative consonants is higher for onsets but lower for the coda position.
In contrast to the above, the listeners showed nearly perfect perception of all English
onset and coda consonants when these were articulated by the native speaker. As for
the onset consonants, the British listeners misperceived /ð/ as /θ/ and /θ/ as /s/,
whilst the American listeners showed perfect perception. As for coda consonants, the
most prominent type of error was an interchangeable (symmetrical) confusion of
/m~n/ by the British listeners, which showed up as an asymmetrical substitution of
/ŋ/ for /n/ in the responses of the American listeners.
The conflation of /ð/ with /z/ and of /θ/ with /s/ in the consonants read by the Sudanese
speaker can be attributed to incorrectly produced English consonants. This conflation
resulted from interference from the speaker’s L1, Sudanese colloquial Arabic (in formal
Arabic these sounds are pronounced correctly) (Mohammed 1991). In the Sudanese consonant
inventory the interdentals /θ, ð/ have merged with the apico-dental (often labelled as
alveolar or sibilant) /s, z/ (Dickins 2007, Watson 2002, Corriente 1978). Thus, Arabic
words like /hæða/ ‘this’ are pronounced as /hæza/, whilst /θæbit/ ‘firm’ is pronounced
as /sæbit/, which influenced the production of the English dental and alveolar
fricatives. Actually, in a number of Arabic dialects, the line separating dental
continuants from sibilant (hissing) sounds is becoming blurred. That is, the consonant
chart of Central Arabic (CA) and Modern Standard Arabic (MSA) contains three subsets
grouped as stops /t, d, tˤ/, sibilants /s, z, sˤ/ and interdentals /θ, ð, ðˤ/. The
distinction between sibilants and interdentals has been lost at the colloquial dialect
level, but not in formal Arabic. However, the loss of such boundaries is compensated for
by four distinctive feature combinations for the two remaining subsets, which include
voiced-plain, voiceless-plain, voiceless-emphatic and voiced-emphatic consonants
(Schmidt 1987, Watson 2002, Dickins 2007). This change, therefore, has side-effects
involving the perception of L2 dental fricatives. According to Kaczwski and Mellani
(1993), to avoid these types of confusions, Arabic-speaking learners of English (from
different colloquial dialects) need to reorganize the distinctive features that separate
interdentals from alveolars, relative to those of Arabic. Furthermore, the difficulty
with English /θ, ð/ does not always lie in their articulation, since most EFL learners
can produce them correctly in isolation. However, the problem is aggravated when such
dentals are combined with /s/ and /z/, particularly for speakers of languages which
contain no dental fricatives. All of /s, z/ and /θ, ð/ are produced close to the upper
incisors, so that learners need to practise drills containing combinations of such
sounds (Cruttenden 2008).
In terms of international English intelligibility, the confusion of the English dental
fricatives /θ, ð/ with /s, z/ represents a problem for second language
learners irrespective of the learner’s L1. This is probably because /θ, ð/ are relatively
infrequent phonemes in the sound patterns of the world’s languages. However, this
explanation is not fully consistent, since the same substitutions were also observed
among EFL/ESL speakers from language backgrounds with similar dental
fricatives, e.g. Arabic. Intelligibility problems of this kind probably arise in
interactions between non-native and native speakers of English due to factors like (i) the
number of minimal pairs whose distinction depends on contrasts between such
phonemes and (ii) the potential frequency of such pairs in interactions. This claim
motivates the prediction that, in an error hierarchy, contrasts between phonemes such as
/θ~s, ð~z/ may carry a high functional load because of their rare occurrence in many
languages, which in turn leads to intelligibility problems. Thus, the fact that these
phonemes are difficult to learn, being both rare and highly marked sounds across
languages, plays a major role in making them a prominent source of speech
intelligibility problems (see Jenkins 2000, Seidlhofer 2005, Van den Doel 2006).
Other substitution errors involving the English coda consonants /k~g/ read by the
Sudanese speaker are likely due to the lack of a clear voicing feature separating voiced
from voiceless stops, which are distinguished across very narrow voice onset time (VOT)
boundaries. Moreover, in the Arabic inventory the VOT values of final plosives are
normally low or absent, which makes the voicing distinction between such pairs blurred.
Consequently, the native listeners made incorrect judgments of the English velar
consonants. The misrecognition of English /l/ as /n/ is attributable to similarity of
the place of articulation. However, it is most probably due to the effect of pre-pausal
features that affect a wide range of modern Arabic dialects, including Central Sudanese
dialects.
Other perception errors like /p~f, v~b/ can be attributed to the labiality shared by
bilabial stops and labio-dental fricatives, or to voicing. Background noise or
unfamiliarity with the speaker’s accent often reduces intelligibility between
interlocutors (Ball and Rahilly 1999). On the other hand, the native listeners do not
show serious perception errors on the English consonants read by the British speaker,
which is most likely due to the similarity of their linguistic backgrounds. In other
words, both British and American listeners benefited from the linguistic background
they share with the native speaker.
5.5.3 Consonant clusters
5.5.3.1 Results
Figure 5.3 presents the mean percentage of correctly identified consonant clusters in
the responses by ten British and ten American listeners of English. The clusters were
read by one Sudanese and one British speaker of English.
As Figure 5.3 shows, both British and American listeners achieved a less than optimal
identification of the English clusters read by the Sudanese speaker: correct
identification is 84 and 88% for British and American listeners, respectively. Their
performance is near-ceiling with the same consonant clusters read by the native speaker:
the overall mean scores are 98% and 96%, respectively. The overall effect of speaker type
(native, non-native) is highly significant as shown by an RM-ANOVA, F(1, 18) = 24.8
(p < .001). The results also seem to indicate that the Sudanese speaker was more
intelligible to the American than to the British listeners, but the RM-ANOVA shows that
neither the main effect of listener nationality, F(1, 18) < 1, nor the speaker × listener
type interaction, F(1, 18) = 1.8 (p = .198), reaches significance.
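The design behind these F ratios has speaker type as a within-listener factor and listener nationality as a between-listener factor. For such a 2 × 2 design with equal group sizes the three F ratios can be obtained from each listener's mean score and difference score, as in the following minimal sketch. The function names and the input layout are assumptions for illustration; this is not the software procedure used in the study.

```python
from statistics import mean


def _one_way_f(groups):
    """F ratio of a one-way ANOVA over a list of groups of numbers."""
    values = [v for g in groups for v in g]
    grand = mean(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


def mixed_anova_2x2(group_a, group_b):
    """F ratios for a 2 (listener group) x 2 (speaker type) mixed design.

    Each group is a list of (score_nonnative, score_native) pairs, one pair
    per listener. Equal group sizes are assumed.
    """
    if len(group_a) != len(group_b):
        raise ValueError("this sketch assumes equal group sizes")
    groups = [group_a, group_b]
    # Between-listener effect: one-way ANOVA on each listener's mean score.
    subject_means = [[(x + y) / 2 for x, y in g] for g in groups]
    # Within-listener effects: work on each listener's difference score.
    diffs = [[x - y for x, y in g] for g in groups]
    n_total = len(group_a) + len(group_b)
    df_error = n_total - 2
    pooled_var = sum((d - mean(g)) ** 2 for g in diffs for d in g) / df_error
    grand_diff = mean([d for g in diffs for d in g])
    return {
        "listener": _one_way_f(subject_means),
        "speaker": n_total * grand_diff ** 2 / pooled_var,
        "interaction": _one_way_f(diffs),
        "df": (1, df_error),
    }
```

With ten listeners per group the error term has 18 degrees of freedom, which matches the F(1, 18) values reported in the text.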
Figure 5.3. Percentage of correctly identified English onset and coda clusters by 10 British and 10
American listeners of English (vertical axis: correct cluster identification, %). The error bars
include ±2 Standard Errors of the mean. The consonant clusters were produced by one Sudanese and
one native speaker of British English.
Tables 5.7 to 5.10 present the confusion matrices of the British and American listeners’
perception results of English onset and coda clusters, produced by the Sudanese
speaker. As Tables 5.7-8 show, few errors were made in the perception of the English
onset clusters, such as the replacement of /kl/ by /gl/ by both groups of listeners.
Moreover, the British listeners replaced /pl/ by /fl/ whilst the American listeners
replaced /pl/ by /dr/ and /sw/ by /sp/. Generally, Tables 5.7-8 do not show any serious
difference in the error rates between the two listener groups. This is probably so
because onset clusters are generally easier to identify.
However, the British and American listeners made more perception errors in the coda
clusters, as Tables 5.9-10 show. The British listeners misidentified /st/ as /sk/, /nt/ as
/ŋk/ or /mp/, and /ŋk/ as /nd/. Other miscellaneous errors which showed no regular
pattern are the misperception of /nz/ as /dz/, or of /ts, st/ as /kd/. On the other hand,
fewer errors were made by the American listeners in the perception of English coda
clusters produced by the Sudanese speaker. These included the replacement of /st/ by
/sk/ and /ŋk/ by /nd/. This finding reveals that perception errors on the English
consonant clusters by the British and American listeners are more frequent in the
coda clusters.
The error rate is smaller when the English consonant clusters were read by the native
speaker in both onset and coda clusters. As the results show, the perception by both
listener groups is nearly perfect. The British listeners misperceived /st/ as /sk/ whilst
the American listeners replaced /sl/ by /sp/. The nasal cluster /nz/ was also
mistaken for /mz/ by both listener groups.
Table 5.7 Confusion matrix of English onset clusters produced by one Sudanese speaker (targets,
in the rows) and responded to by ten British listeners (in the columns). Correct responses are on
the main diagonal, indicated in bold face.
Perceived RP cluster consonants

Target
dr gl kl pl sl spl spr sw bl fl
dr 10
gl 9 1
kl 1 2 7
pl 9 1
sl 10
spl 10
spr 10
sw 10
Table 5.8 Confusion matrix of English onset clusters produced by one Sudanese speaker (in the
rows) and responded to by ten American listeners (in the columns). Correct responses are on the
main diagonal, indicated in bold face.

Target
dr gl kl pl sl spl spr sw bl sp gr
dr 9
gl 8 2
kl 2 7 1
pl 1 9
sl 10
spl 10
spr 10
sw 9 1 1
Table 5.9 Confusion matrix of English coda clusters produced by one Sudanese speaker (targets,
in the rows) and responded to by ten British listeners (in the columns). Correct responses are on
the main diagonal, indicated in bold face. Confusions (≥ 30%) are in shaded cells.
Target
bd gz lm ŋk nt nz st ts bl dz kd ld lk lt lθ nd sk
bd 6 2 1 1
gz 9 1
lm 9 1
ŋk 7 3
nt 9 1
nz 9 1
st 5 1 4
ts 10
Table 5.10 Confusion matrix of English coda clusters produced by one Sudanese speaker (targets,
in the rows) and responded to by ten American listeners (in the columns). Correct responses are
on the main diagonal, indicated in bold face. Confusions (≥ 30%) are in shaded cells.

Target
bd gz lm ŋk nt nz st ts sk ls lt lθ mp dz nd lf lk ld
bd 4 5 1
gz 10
lm 6 1 2 1
ŋk 6 1 3
nt 2 5 1 2
nz 7 2 1
st 3 7
ts 10
The replacement of the onset cluster /kl/ by /gl/ is the most frequent perception error
pattern. It most likely arises because the voiced and voiceless categories are not
clearly distinguished, being separated by very narrow VOT boundaries. In the Sudanese L1
(Arabic) inventory, the distinction between stops such as these uses VOT and
aspiration features, but these are implemented differently than in English. While both
English and Arabic fall into the two-category group of languages in terms of the
number of stop categories they contain, the two languages differ in their VOT patterns.
Arabic follows a binary system of presence or absence of glottal pulsing during the
closure period of the stop, while in English there need not be any vocal cord vibration
during the production of either member of the pair /k, g/ (Kattab 2000). Consequently,
the EFL speaker incorrectly produced the English velar consonants, which
prevented the listeners from using the right acoustic feature, VOT/aspiration, to
distinguish the initial consonant in /kl~gl/. In the coda position, the misidentification
of the second consonant in clusters as in /st~sk/ and /nt~ŋk/ probably occurred
because the second cluster members share the same manner of articulation
(plosives). In addition, perception errors such as the misidentification of /mp, ŋk/ as
/nd/ reflect the effect of plosive release: i.e. weakly exploded stop consonants are
often vulnerable to confusion.
Other miscellaneous errors such as /pl~fl/, /pl~dr/, /nz~dz/ and /st~kd/ in both
onset and coda positions, which do not show a clear pattern, can possibly be
understood as the result of differences in phonotactic restrictions between English and
Arabic. Many findings in the field of non-native speech perception have shown that the
perception of speech segments is determined by two factors: language-specific and
language-universal constraints. That is, phonotactic restrictions in each language
determine the sound sequences in a syllable where particular sounds can appear in the
onset/coda position.
Interestingly, the findings suggest that the Sudanese speaker is more intelligible to
American listeners than to their British counterparts, although the difference is not
statistically significant (for the ANOVA see the results above). This may imply that the
American listeners are more familiar with this foreign accent, or with foreign accents
in general, than the British listeners are. On the other hand, the British listeners
benefited from the fact that they spoke the same variety (British English) as the native
speaker, since the British listeners obtained higher scores on the native speaker’s
materials than the American listeners did.
5.5.4 SPIN sentences
5.5.4.1 Results
Figure 5.4 presents the mean correct scores on the SPIN test obtained by ten British
and ten American listeners. The sentences were read by one Sudanese and one British
speaker of English. Error bars (± 2 standard errors, SE) are also shown. The figure also
shows the correct identification scores on components of the SPIN keywords. Separate
scores were computed for the onsets, vocalic nuclei and codas of the SPIN keywords.
Also, a composite score was computed by taking the mean of these three component
scores. Note that the composite score can never be lower than the word-recognition
score, and is normally higher: for a keyword to be counted as correctly recognized, all
components had to be identified correctly by the listener. I will present and
statistically analyse only the word-recognition scores. The component scores will be
analysed in a later section when I will make an attempt to predict word recognition from
the component scores.
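The relation between the component scores, the composite score and the word-recognition score can be made concrete with a small sketch. The function name `score_keywords` and the response data are hypothetical and serve only to illustrate the scoring rule described above.

```python
def score_keywords(responses):
    """Score SPIN keywords from per-keyword component judgments.

    responses: list of (onset_ok, nucleus_ok, coda_ok) booleans, one tuple
    per keyword. Returns percentages.
    """
    if not responses:
        raise ValueError("at least one keyword response is required")
    n = len(responses)
    onset = 100 * sum(r[0] for r in responses) / n
    nucleus = 100 * sum(r[1] for r in responses) / n
    coda = 100 * sum(r[2] for r in responses) / n
    # A keyword counts as recognized only if all three components are correct.
    word = 100 * sum(all(r) for r in responses) / n
    return {
        "onset": onset,
        "nucleus": nucleus,
        "coda": coda,
        "composite": (onset + nucleus + coda) / 3,
        "word": word,
    }


# Hypothetical responses to four keywords, for illustration only.
demo = [
    (True, True, True),
    (True, True, False),
    (False, True, True),
    (True, False, False),
]

if __name__ == "__main__":
    print(score_keywords(demo))
```

Because a word is only counted when every component is correct, the word score is bounded by the lowest component score, whereas the composite is their average.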
Figure 5.4 Mean correct recognition of keywords and components thereof by ten British and ten
American listeners of SPIN sentences produced by one Sudanese (top panel) and one British
speaker (bottom panel) of English. Error bars are ± 2 SE.
As Figure 5.4 shows, the performance of the British and American listeners is nearly
perfect on the SPIN sentences produced by the native RP speaker, with overall mean
values of 93 and 95%, respectively (right-most bar in each cluster). However, lower
word-recognition rates were obtained when the same sentences were read by the
Sudanese speaker of English: the overall means drop to 65 and 69% for the British and
American listeners, respectively. Moreover, in comparison to the British listeners, the
American listeners obtained higher scores on the SPIN sentences irrespective
of the speaker’s accent. The main effect of speaker type (Sudanese EFL versus native
British) was highly significant by an RM-ANOVA, F(1, 18) = 239.9 (p < .001). The
effect of listener type (American versus British), however, is a trend at best, F(1, 18) =
3.3 (p = .085). The speaker × listener interaction is not significant, F(1, 18) < 1.
Figure 5.4 also provides details on the listeners’ performance in the perception of the
SPIN keyword components produced by the Sudanese and the British speaker. The
correct identification of onset consonants in the keywords is 85% for the British against
93% for the American listeners when the consonants were read by the Sudanese speaker.
However, the listeners responded perfectly to the same consonants spoken by the British
speaker; the mean correct score is 100% for both listener groups. The effect of speaker
type is highly significant, F(1, 18) = 90.8 (p < .001), and so are the main effect of
listener group and the speaker × listener interaction, F(1, 18) = 7.5 (p = .013) for both.
The results for the vowel nuclei show a small difference in perception between the
British and American listeners; the mean correct identification scores here are 76
against 84% when the items were read by the designated Sudanese EFL speaker, and 97
and 100% when the items were read by the native speaker, F(1, 18) = 136.2 (p < .001)
for the speaker effect and F(1, 18) = 10.4 (p = .005) for the main effect of listener
nationality. However, the interaction between speaker and listener groups is a trend at
best, F(1, 18) = 3.0 (p = .099).
On the other hand, performance on the coda consonants proved to be the poorest of
all and the British listeners had higher scores than the Americans when the sentences
were read by the British speaker; the mean scores are 97 against 96%. However, both
listener types showed a lower score when the same coda consonants were read by the
Sudanese speaker; the mean scores are 69 against 75%, respectively. Again, the effect of
speaker type was highly significant, F(1, 18) = 191.2 (p < .001), whereas the effect of
listener group was not, F(1, 18) = 1.3 (p = .271). The interaction between speaker type
and listener group just fails to reach significance, F(1, 18) = 4.3 (p = .053).
Both British and American listeners obtained excellent recognition scores on simple
and predictable English sentences produced by the native RP speaker. However, the
American listeners performed slightly better than their British counterparts, regardless
of whether the materials were spoken by the Sudanese or the native RP speaker of
English; the mean recognition scores are 69 and 95% for the American listeners and 65
and 93% for the British listeners, respectively. The listeners’ performance is always
better when they hear the native speaker. Interestingly, the American listeners tend to
have better scores irrespective of speaker type. Possibly, the SPIN sentences, which were
developed in the USA, refer to American rather than to British everyday situations. The
coda consonants proved to be a difficult area in which the listeners showed a low
performance, in comparison to the onset consonants and nucleus vowels. The correlation
figures below may provide more insight.
5.6 Correlations
Tables 5.11-12 present correlation matrices for vowels, single and cluster consonants
and the component scores on the SPIN keywords: i.e. onsets, nuclei and codas, the mean
of these three components, and the recognition scores on the
entire keyword in the SPIN sentences. The correlation coefficients were computed for
the mean percent correct scores of British (upper part of tables) and American (lower
part of tables) native listeners, separately for the non-native (Table 5.11) and native
speaker (Table 5.12). The tables present linear product-moment correlation coefficients
(r) between the listeners’ perception scores for all tests and test components in the
battery.
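For readers who want to see what the product-moment coefficient involves, a minimal sketch follows. The function name `pearson_r` is an assumption; returning `None` for a constant variable mirrors the situation in which a coefficient cannot be computed because all listeners obtained the same score.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Linear product-moment correlation between two equally long score lists.

    Returns None when either variable is constant, because the coefficient
    is undefined in that case (its denominator is zero).
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 scores")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)
```

Each coefficient in the tables corresponds to one such call on the scores of the ten listeners in a group for two tests or test components.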
Table 5.11 Correlation matrix for scores on vowels, single consonants, cluster consonants and
(components of the) SPIN test (onset correct, nucleus correct, coda correct, mean of onset +
nucleus + coda, whole word correct) read by one Sudanese speaker of English.
                           SPIN sentences                        MRT
Listeners                  onset    nuc.     coda     mean     vowels   cons     cluster
British   nuclei           .339**
          codas            .375**   .552**
          mean             .655**   .776**   .895**
          vowels           –.163**  .000**   .099**   .006**
          consonants       –.151**  –.111**  –.386**  –.312**  –.104**
          clusters         –.199**  .000**   .326**   .128**   .347**   –.107**
          words correct    .380**   .448**   .802**   .745**   –.075**  –.491**  .421**
American  nuclei           .692**
          codas            .445**   .573**
          mean             .815**   .905**   .810**
          vowels           –.411**  –.553**  –.362**  –.528**
          consonants       –.320**  –.490**  .222**   –.232**  .192**
          clusters         –.128**  –.226**  .044**   –.124**  –.198**  .456**
          words correct    .549**   .597**   .670**   .719**   –.097**  .175**   –.343**
*. Correlation is significant at the 0.05 level (2-tailed).
**. Correlation is significant at the 0.01 level (2-tailed).
The correlations computed on the SPIN results differ with
respect to listener and speaker nationality backgrounds (for an explanation of the
concept of the correlation coefficient, see chapter three). With regard to the SPIN test
components read by the designated Sudanese EFL learner, the correlation between the
onset consonants and nucleus vowels yielded a positive r = .692 (p < .05) for the
American listeners, whilst it is positive but non-significant, r = .339, for the British
listeners. These figures imply that correct perception of the vowel nucleus is predictive
of correct perception of the onset, particularly for the American listeners. Moreover,
the coda consonant component correlates positively with the onset and nucleus
components, at r = .375 and .552 for the British listeners and at r = .445 and .573 for
the American listeners, respectively. These relations indicate that both British and
American listeners identify the onset consonants well whenever they succeed in
identifying nucleus vowels and coda consonants (and vice versa).
On the other hand, we find no useful correlation between vowels, consonants and
clusters and their SPIN component counterparts, a null effect that we did not expect.
There is a weak correlation between SPIN coda consonants and consonants at r = .222.
This indicates that vowels and consonants have a negative association with the SPIN
components, except the coda consonants, which have a positive relationship to
consonants. Similarly, English vowels heard by American listeners showed a relatively
high positive correlation with coda consonants (although the correlation is not
significant) at r = .625. It is possible to attribute the absence of correlation between
the SPIN components and their MRT counterparts (vowels, consonants and clusters) to
the learners’ paucity of exposure to English, which leads to less consistent performance.
Table 5.12 Correlation matrix for scores on vowels, single consonants, cluster consonants and
SPIN test (onset correct, nucleus correct, coda correct, mean of onset + nucleus + coda, whole
word correct) read by one British speaker of English.
                           SPIN sentences                        MRT
Listeners                  onset    nuc.     coda     mean     vowels   cons     cluster
British   nuclei           .a
          codas            .a       .068**
          mean             .a       .663**   .792**
          vowels           .a       –.583**  –.181**  –.493**
          consonants       .a       –.167**  .272**   .102**   –.111**
          clusters         .a       –.356**  –.509**  –.600**  .059**   .089**
          words correct    .a       –.089**  –.582**  –.491**  .386**   –.356**  –.048**
American  nuclei           .a       .a
          codas            .a       .a
          mean             .a       .a       1.000**
          vowels           .a       .a       .625**   .625**
          consonants       .a       .a       –.218**  –.218**  –.307**
          clusters         .a       .a       –.196**  –.196**  –.276**  .385**
          words correct    .a       .a       .667**   .667**   .742**   –.327**  –.294**
*. Correlation is significant at the 0.05 level (2-tailed).
**. Correlation is significant at the 0.01 level (2-tailed).
a. Correlation cannot be computed because at least one of the variables is constant (at 100%
correct).
Individual vowels, consonants and clusters show some weak correlations. Clusters
showed a positive (but statistically non-significant) correlation with consonants at
r = .385. Moreover, clusters correlate positively with vowels (r = .347) and with coda
consonants (r = .326). This shows that cluster scores are to some extent predictive
of the correct perception of vowels and coda consonants read by the Sudanese
speaker and responded to by British listeners.
5.7 Conclusions
Errors made by British and American listeners in the perception of the English front,
central and back vowels produced by the Sudanese speaker were largely due to the fact
that the speaker’s native language, Sudanese Arabic, distinguishes merely three vowel
qualities. These English vowels are not part of the speakers’ L1 vowel inventory so they
represent a learning difficulty. Moreover, the learners’ limited knowledge of English
sound-letter correspondences often leads to the misperception of
these vowels. Such perception errors often take place due to partial learning or
insufficient practice.
Frequent confusions were made by the British and American listeners of English in the
perception of the English fricative pairs /s, θ/ and /z, ð/ read by the Sudanese
speaker due to interference from the Sudanese-Arabic source consonant system. The
errors in this context are caused by the filter effect of the speaker’s L1 Sudanese
Arabic (SA) consonant inventory, in which these English consonant contrasts are not
made.
British and American listeners showed no serious perception problem with English
speech sounds which were produced by the native control speaker.
The results also reflect the effect of the linguistic backgrounds of speech participants
on intelligibility. That is, native listeners are better equipped to interpret the speech
of a native talker. On the other hand, non-native talkers may produce an L2 speech sound
with an articulatory basis that is typical of their L1 rather than of the target
language, which leads to misinterpretation of such a sound. This means that ESL/EFL
listeners from the same native language background as the talkers will be more likely to
access the correct phonemic category than listeners who do not share the talkers’
native language.
Vowels and coda consonants (rather than consonant clusters and single initial
consonants) of English proved to be the most problematic area in the perception of
Sudanese-Arabic accented English for native (British and American) listeners.
Chapter Six
Acoustic analysis of English vowels

6.1 Introduction
Producing the English vowels is one of the most challenging tasks for Sudanese
university EFL learners. Such learners arguably have difficulties, e.g., distinguishing
between English vowels like /e/ and /ɜː/ in words like gale ~ girl and /ɑː, æ, ʌ, ɒ/ in
words like cart, cat, cut, cot. Cross-linguistic studies have shown that segmental errors
like these frequently occur in ESL/EFL due to differences between L1 and L2 (Flege 1995,
Gilbert 1984). Many learners whose L1 lacks contrastive sounds of L2 tend to replace
L2 sounds by the nearest sound available in their L1. The vowel /u/, for
example, may be realized with significantly higher F2 values in English than in French
due to the absence of a /y/ category in English. This is probably why English /u/
substituted in French tous /tu/ is perceived as /y/ by native French listeners (Flege
1976). Findings such as these suggest that language-specific differences are responsible
for learning difficulties of L2 speech sounds. The lack of L2 knowledge may also
contribute to production problems of English vowels by ESL/EFL learners. This has
to do with the explicit knowledge acquired by the L2 learners through the pronunciation
lessons taught. Most ESL/EFL classes focus on teaching language aspects such as
syntax, vocabulary and morphology to help learners to grasp the structure of English
sentences. However, learning to produce correct pronunciation is not given much
attention in these syllabuses. Although a few lessons treat phoneme articulation in a
broader sense, the accompanying exercises do not address any specific pronunciation
difficulties. In these lessons, teachers ask the learners to pronounce repeatedly a set of
minimal pairs, etc. The learners react to such pronunciation tasks reluctantly and this is
probably why the lessons are less effective. In the Sudanese context, for example, EFL
learners receive lessons for the development of listening skills, in which tape
recordings are played. Most other communication skills are practised inside the classroom
only. Therefore, the learners do not get sufficient opportunities to practise skills
needed in real life.
To account for the processes involved in cross-language speech production like these
and to predict difficulties experienced by adult second or foreign language (L2) learners,
the spectral and temporal patterns of L2 speech sounds produced by these learners
should be examined. Instrumental studies have focused on aspects like formant frequency
(in Hz) in the production of L2 vowels by L2 learners. Their focus was initially limited
to areas of difference where a vowel in L1 has no counterpart in L2. However, other
studies went further and examined even the production of L2 vowels that have a
phonological counterpart in L1, seeking to achieve several goals. Firstly, by examining
the production patterns they wish to obtain conceptual and productive insights into the
mechanisms that second (or foreign) language learners adopt in order to deal with
the target English phonological system. Secondly, they aim to establish insights into the
extent and nature of the similarities and differences between the phonetic inventories
of the learners’ native language (L1) and those of the target L2 (Flege 1976). In the
present study, the phonetic and acoustic distance that exists between English (L2) and
the Sudanese EFL learners’ (L1) Arabic forms a major factor in L2 production, which
motivated the present investigation. That is, differences in spectral properties (i.e., F1
and F2 formant values) between English and Arabic represent one example, where the
Sudanese Arabic long vowels /iː/, /aː/ and /uː/ showed relatively lower F1 and F2
values compared with English (Elobeid and Maaly 1996). This property may influence
the learners’ articulation of L2 through interference of L1 perceptual vowel
representations. Similar problems might arise in the production of L2 due to
differences in vowel space and temporal cues between the L1 and L2. However, if
some vowels in L1 show correspondence to others in L2, this should also be
considered.
The English vowel problems sketched above have recently motivated researchers of
ESL/EFL (e.g. Strange, Bohn, Trent and Nishi 2004, Wang and Van Heuven 2006) to
conduct experimental analyses of the (English) vowel system. In the current study, a
similar acoustic analysis will be reported. Specifically, I have studied the location of
Sudanese-Arabic accented English vowels in the acoustic vowel space (defined by the
first and second formant frequencies) as well as their durational properties.
6.2 Methods
6.2.1 Material
Recordings were made on a laptop computer using Adobe Audition software. The
subjects were seated in a quiet room with their lips a few centimetres away from a
head-mounted close-talking microphone. They were asked to read a list of mono-
syllabic English words which included all the target English vowels. These words were
embedded in a carrier sentence (Say …again). The carrier sentence was intended to help
the subjects to speak at a constant rate. The list of items (including keywords) can be
found in Appendix 3.1. The subjects were encouraged to give their best possible
production of such words. If the experimenter suspected that an error in the
production was simply a reading error, rather than a genuine indication of the subject’s
inability to pronounce a certain word, the subject was asked to repeat the word. The
recorded material was then submitted to acoustic analysis using Praat software
(Boersma and Weenink 1996).
6.2.2 Speakers
Ten Sudanese native Arabic speakers preparing for a bachelor’s degree in English
language teaching were recruited primarily from the student population at Gadarif
University. In selecting the participants, semi-final learners who had reached a
considerable level of English were preferred. This is because they were expected to
achieve better performance. In practice, these students use English only inside the
classroom and in other academic activities such as debates, discussions, etc. For the
control group of native speakers, the data published by Deterding (1997) was used,
which provides measurements of English vowels recorded by five male and five female
BBC broadcasters. The data is found in a directory that contains ten files in Excel
format. Each file contains the measurements of the first three formants of the eleven
monophthongal vowels of RP. Importantly, the words containing the target vowels
were not spoken in sentences but in isolation.
6.3 Procedure
6.3.1 Formant measurements
When studying the details of vowel production, the customary procedure is to measure
the lowest two resonance frequencies of the vocal tract, denoted as the first and second
formants (F1, F2), respectively. F1 and F2 can be related to vowel quality in a fairly
straightforward fashion (e.g. Delattre, Liberman and Cooper 1955). F1 corresponds
closely to the degree of mouth opening (close versus open vowels) whilst F2 is a
correlate of vowel backness. The task of formant measurement was done in a number
of steps. Firstly, I roughly estimated where the formants were by looking at the
spectrogram of the stimuli, particularly the target vowels. Formant tracks were
automatically computed for the lowest three formants (F1, F2, and F3) in the frequency
range between 0 and 3200 Hz and superposed onto the spectrogram. Whenever there
was a visual mismatch between the formant tracks and the spectrogram, the model
order (number of formants required) and/or the frequency range of the Linear
Predictive Coding (LPC) analysis was changed, until a satisfactory match was obtained.
Then segmentation points were set in a text grid at the onset and offset of the target
vowel while the number of formants to be extracted (two or three) and frequency cut
off (in Hz) were noted on a separate tier. Using a script, the duration and the formant
frequencies were extracted from the recordings off-line. Formant values were extracted
at the temporal midpoint of the target vowel. 18 The data were then further analysed
with SPSS statistical software.
Then, in order to make acoustic distances between vowels in the formant space
optimally correspond to auditory distances, formant values were rescaled from hertz to
Barks (using the conversion formula advocated by Traunmüller 1990). 19
18 I gratefully acknowledge the help of Ing. Jos J.A. Pacilly, senior technician at the LUCL
Phonetics Laboratory, in writing the necessary Praat scripts.
19 The Bark scale is a psycho-acoustical transformation proposed by Zwicker (1961). It is a
frequency scale based on the critical bands of hearing, which also underlie models of loudness
perception. The scale ranges from 1 to 24, corresponding to the first 24 critical bands of hearing.
The successive band edges (in Hz) lie at 20, 100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270,
1480, 1720, 2000, 2320, 2700, 3150, 3700, 4400, 5300, 6400, 7700, 9500, 12000 and 15500 Hz.
According to Smith and Abel (1999), Bark units represent samplings of a continuous variation in
the frequency response of the ear to a sinusoid or narrow-band noise.
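The hertz-to-Bark rescaling can be illustrated with a minimal sketch. It uses the formula published by Traunmüller (1990), z = 26.81 f / (1960 + f) - 0.53, together with his corrections at the extremes of the scale. Whether the corrections were applied in the present study is not stated, so their inclusion here is an assumption, as is the function name.

```python
def hz_to_bark(f_hz: float) -> float:
    """Convert a frequency in hertz to Bark (Traunmüller 1990)."""
    z = 26.81 * f_hz / (1960.0 + f_hz) - 0.53
    # Corrections proposed by Traunmüller for the ends of the scale
    # (assumed here; they do not affect typical F1/F2 values).
    if z < 2.0:
        z += 0.15 * (2.0 - z)
    elif z > 20.1:
        z += 0.22 * (z - 20.1)
    return z


# Illustrative (hypothetical) formant values in Hz for one vowel token.
f1_bark = hz_to_bark(300.0)
f2_bark = hz_to_bark(2300.0)
```

Because the transformation is monotonic, the ordering of vowels along F1 and F2 is preserved; only the spacing between them changes.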
6.3.2 Vowel normalization
A z-normalization procedure was applied to the Bark-transformed F1 and F2 values of
the Sudanese and native speakers of English. Vowel normalization is a statistical
operation developed to compensate for speaker-specific differences in vocal-tract size
which in turn result in different formant resonances (Brett 2004). Vowel normalization
is crucial in order to compare the vowel realizations across different speakers in
linguistically meaningful ways. Normally, comparison includes formants, durations and
vowel classification. In the current study, normalization is used to preserve
phonological distinctions among English vowels produced by British and Sudanese
speakers. The z-transformation involves subtracting the individual speaker’s mean F1
(and mean F2) from the raw formant values of F1 (or F2) and subsequently dividing the
difference by the speaker’s standard deviation (of F1 and F2, respectively) (Wang and
Van Heuven 2006, Adank, Smits and Van Hout 2004). After normalization, z-
transformed values of F1 below 0 correspond to high (close) vowels, whilst values
above 0 correspond to low (open) vowels. Similarly, positive z-values for F2 stand for
front vowels, whilst negative z-values of F2 refer to back vowels. In the results below
(Figures 6.3-4), F1 is plotted along the vertical axis (high F1 at the bottom, low at the
top) and F2 along the horizontal axis (high F2 to the left, low F2 to the right). This
configuration of the axes yields a representation that closely resembles a traditional
articulatory vowel chart.
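The z-transformation described above can be sketched as follows. The sketch assumes that the sample standard deviation (n - 1 in the denominator) is used; the speaker labels and values are hypothetical. The same function applies equally to F1, F2 or duration values.

```python
from statistics import mean, stdev


def z_normalize_by_speaker(tokens):
    """tokens: list of (speaker, value) pairs.

    Returns the z-scores in the same order, computed with each
    speaker's own mean and (sample) standard deviation.
    """
    by_speaker = {}
    for speaker, value in tokens:
        by_speaker.setdefault(speaker, []).append(value)
    stats = {s: (mean(v), stdev(v)) for s, v in by_speaker.items()}
    return [(value - stats[s][0]) / stats[s][1] for s, value in tokens]


# Hypothetical Bark-transformed F1 values for two speakers.
f1_tokens = [("S1", 3.0), ("S1", 5.0), ("S1", 7.0),
             ("S2", 4.0), ("S2", 6.0), ("S2", 8.0)]
f1_z = z_normalize_by_speaker(f1_tokens)
```

After the transformation each speaker's values have a mean of 0 and a standard deviation of 1, so a negative F1 z-score marks a token that is closer than that speaker's average vowel.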
6.3.3 Duration measurement
The measurement of duration is a complicated task, because delimiting sound units in the
acoustic signal requires segmenting utterances, and different articulatory and auditory
impressions of the sounds can make it hard to decide where the boundaries lie. Even when it
can be done, the duration values obtained might not correspond to linguistic judgments of
length, e.g. in short and long English vowels like those in beat, bit, etc. Absolute vowel
duration values are often undesirable, since the duration of vowels will vary considerably
depending on the context, for instance how fast or slowly the sentence containing the vowel
is pronounced, whether the vowel is followed by a voiced or voiceless consonant, and so
on. Therefore, the z-normalization procedure was also applied to the duration of the
Sudanese speakers’ English vowels. This precaution was taken for two reasons. First,
the EFL speakers’ slow speaking rate may affect the absolute values of the English
vowel durations. Secondly, the English vowel durations are expected to be influenced
by the Sudanese speakers’ L1 (Arabic) vowel inventory, in which tense and lax counterparts
are contrasted through a difference in quantity rather than in quality as in English
(Algamdi 1998, Munro 1993, Kopczwski and Mellani 1993). Durations were z-normalized by
subtracting from each individual vowel token the speaker’s mean vowel duration and dividing
the result by the speaker’s standard deviation. As a result, the speaker’s mean vowel
duration changed to 0 and the new standard deviation changed to 1. Any z-duration shorter
than the speaker’s mean duration will have a negative value, any duration longer than the
mean will be positive.
6.4 Overall results
6.4.1 Vowels
6.4.1.1 Vowel space
Figures 6.3-4 below present acoustic vowel charts of eleven English vowels produced
by Sudanese and British speakers, respectively. As a correlate of vowel height F1 (in
Barks) is plotted vertically against F2 (in Barks), which is plotted horizontally (from
right to left) as a correlate of vowel backness. Each point in the graph represents the
centroid (mean F1-F2 coordinates) in the acoustic vowel space of one vowel type,
measured at the temporal midpoint of the ten tokens produced by the Sudanese
speakers (or by a variable number in the L1 control data). In the graphs long (tense)
and short (lax) English vowels are indicated separately. The short vowels are the corner
points of the polygon with the grey shading.
Figure 6.3 Mean positions in the vowel space of English vowel tokens produced by Sudanese
speakers. Long tense vowels are linked by the unshaded polygon, whilst the short lax vowels are
shown in the shaded polygon. F1 (Bark) is plotted vertically and F2 (Bark) horizontally.
Figure 6.4 Mean positions in the vowel space of English vowel tokens produced by British
speakers. Long tense vowels are linked by the unshaded polygon, whilst the short lax vowels are
shown in the shaded polygon. F1 (Bark) is plotted vertically and F2 (Bark) horizontally.
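The centroids plotted in such vowel charts are simply the mean F1 and F2 values per vowel type. A minimal sketch of that computation is given below; the token values and the function name are hypothetical and serve only as an illustration.

```python
from collections import defaultdict


def vowel_centroids(tokens):
    """tokens: iterable of (vowel, f1, f2) in Bark.

    Returns {vowel: (mean_f1, mean_f2)}, i.e. one centroid per vowel type.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for vowel, f1, f2 in tokens:
        acc = sums[vowel]
        acc[0] += f1
        acc[1] += f2
        acc[2] += 1
    return {v: (s1 / n, s2 / n) for v, (s1, s2, n) in sums.items()}


# Hypothetical tokens (vowel label, F1 in Bark, F2 in Bark).
example_tokens = [
    ("iː", 3.0, 13.0), ("iː", 3.4, 13.4),
    ("ɪ", 4.0, 12.0), ("ɪ", 4.4, 12.6),
]
centroids = vowel_centroids(example_tokens)
```

To reproduce the orientation of a traditional vowel chart, both axes would be reversed when plotting, with high F1 at the bottom and high F2 on the left.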
It is apparent from the results that the English vowel space of the Sudanese speakers
differs from that of the native speakers. The short and long English vowels of the
Sudanese speakers occupy closely similar (though not identical) positions in the vowel
area, whilst their British equivalents are clearly distinct. This implies that the
Sudanese speakers produce the short and long English vowels in much the same way, so that
the members of each pair correspond closely in their acoustic output. In the vowel space,
the high front vowel /iː/ is situated closer to the lower front /ɪ/. Similarly, the rounded
back /ɔː/ and /ɒ/ appear closer to each other, but in the case of the native speakers these
pairs are clearly separate, i.e. /ɔː/ is located high back, whilst /ɒ/ tends to be low back
in the vowel area. This suggests that the vowels produced by Sudanese learners do not
conform to native English patterns. Similarly, the English long vowel /uː/ of the Sudanese
speakers is produced further back than that of the British speakers. A further difference
is that several Sudanese English vowels do not show a clear learning pattern, i.e., do not
resemble those of the target language. As Figure 6.1 shows, /e/ is less open and is in fact
quite close to /ɪ/. The short open /æ/ is quite near /ʌ/ and /ɑː/, unlike that of the native
speakers. These types of pronunciation problems may be due to different factors.
6.4.1.2 Discussion
The statistical analysis of the acoustic output reveals that the English vowels spoken by
the Sudanese speakers and by their British counterparts are dispersed over different
contrastive categories. Generally, this suggests that Sudanese EFL learners have
problems in implementing native English norms. In detail, one of the most interesting
findings is that the members of the English tense-lax vowel pairs /uː~ʊ/ and /ɔː~ɒ/
are very close to one another in the vowel space. This pattern of error reveals a clear
effect of the speakers’ L1 vowel system; i.e. the English tense/lax vowels were
pronounced according to the subjects’ L1 productive strategy (Mitleb 1981). On the
other hand, the English tense vowel /iː/ shows no serious production problems,
probably because it is similar to the Arabic /iː/ (see Munro 1993). The misclassification
of /e/ as /ɪ/ (Figure 6.1) indicates no distinct learning of this vowel. One possible
explanation is that the English /e/ has no equivalent in Arabic, so that Arab students tend
to replace it by /ɪ/ or /ʌ/ (Kopczwski and Mellani 1993). However, this claim sounds
less plausible, since previous studies have shown that Sudanese Arabic has /e/ (Munro
1993, Dickins 2007). 20 Therefore, this type of error most probably stems from
spelling/graphical differences that exist between English and Arabic, where the
Sudanese-Arabic speakers pronounce English /e/ in the same way it is spelt. As a result, the
English vowel /e/ in words such as enter, envelope, wet, let, etc., is frequently
mispronounced as /ɪ/ by the Sudanese speakers. The major cause of this confusion is
probably partial learning of the English front vowels. Moreover, this type of error is
also attributable to transfer of the Arabic spelling system, which maintains a direct
letter-to-sound correspondence. This means that each vowel or consonant letter of Arabic
has one sound, which corresponds to its spelling, and there are no silent (unpronounced)
letters.
The fluctuation of the English front low short vowel /æ/, which is physically shown in
a mid position between // and /ʌ/, points to the lack of this vowel type in the
learners’ L1 vowel inventory (Brett 2004). This type of problem may exist due to
differences of vowel realization between English and Arabic. In a related study of
Arabic vowels, the Sudanese informants tended to produce Arabic vowels, e.g., /aː/
(typically sounds like /æ/) with rising tones (Algamdi 1998). Arguably, this is one of
the reasons why Arabic speakers are frequently advised to keep the English /æ/ fully
front to avoid confusion with /ʌ/ (Cruttenden 2008).
The smaller number of vowel contrasts in Arabic makes the learning of English vowels
difficult. Arabic and English show similar simple syllable nuclei in that both show
phonetically short and long vowel patterns. But because Arabic has fewer contrasts, the
range of allophonic variation of each vowel phoneme is greater than that of English; e.g.,
Arabic /a/ has allophones within the area bounded by /ɛ, æ, ɑ, ʌ/. Thus, English contrasts
such as bet-bat, cat-cot, cot-cut, cot-caught trigger difficulty (Lehn and Slager 1983). All in all,
error patterns such as these are often accounted for on the basis of differences of
formant values that exist between L1 and L2, as previous studies have shown. These
differences result in incorrect articulation of L2 vowels (Liberman et al. 1957, Scholes
and Robert 1968).
20 Sudanese Arabic also developed monophthongs. These include /e/, which historically descends
from the diphthong /aj/ as in /ajn/ ‘an eye’, which coalesced (merged) in dialects such as
Cairene and Central Sudanese. In Arabic varieties spoken in large parts of the Levant these
vowels are realized as /Gu/ or /nt/. In Sanani and a number of Peninsula dialects, the diphthongs
are maintained in all phonological contexts. Moreover, among some Cairene speakers the
monophthongs are shortened in closed syllables to give short /e/ or /o/, hence they are not
considered to be separate vowels (Watson 2002).
6.4.1.3 Results and discussion of vowel duration
Figure 6.5 presents the mean durations of English vowel tokens of Sudanese university
students and native speakers of English. Duration values are arranged in descending
order from left to right. Durations are measured in milliseconds. In the figure, the
native speakers’ vowel durations appeared longer than their Sudanese counterparts
because they were spoken in isolation.
Figure 6.5 Mean duration (s) of English vowels produced by Sudanese (square markers) and
native (circles) speakers of English, broken down by vowel type.
Z-normalization was used to get more insightful vowel duration values (see
Section 6.3.3 above). The computation of correlation coefficients revealed a strong
positive relationship between the Sudanese speakers’ mean vowel durations and those
of the native speakers (r = .943, p < .01). Moreover, the mean duration values of the
pure English vowels produced by Sudanese speakers are as follows: /ɪ/ 59 ms, /iː/ 145
ms, /e/ 69 ms, /ɒ/ 108 ms, /ɔː/ 199 ms, /ʊ/ 90 ms, /uː/ 159 ms, /æ/ 150 ms, /ʌ/ 81
ms, /ɜː/ 109 ms and /ɑː/ 211 ms (see Appendix 6.1 for individual vowel durations and
mean norm vowel durations). This correlation implies that the English vowel
durations of Sudanese speakers correspond relatively well to English vowel duration
norms (see Catford 2001, Jacewicz, Fox and Salmons 2006). In other words, the
tense/long English vowel durations of Sudanese learners correspond to the longest
native RP durations whilst the lax/short ones correspond to the shortest durations.
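The correlation between the two sets of mean durations can be computed as in the following sketch. The Sudanese means are those listed above. The native values are hypothetical placeholders, since the native means are given only in Appendix 6.1, so the resulting coefficient is not meant to reproduce the reported r = .943.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Mean durations (ms) of the Sudanese speakers, in the order
# ɪ, iː, e, ɒ, ɔː, ʊ, uː, æ, ʌ, ɜː, ɑː (values from the text above).
sudanese_ms = [59, 145, 69, 108, 199, 90, 159, 150, 81, 109, 211]
# HYPOTHETICAL native means (ms), for illustration only.
native_ms = [140, 290, 170, 180, 330, 150, 300, 230, 150, 310, 340]

r = pearson_r(sudanese_ms, native_ms)
```

A coefficient close to 1 indicates that vowels that are long for one group are also long for the other, irrespective of any overall difference in absolute duration.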
The observed correspondence fits the assumption that the Arabic tense-lax vowel
categories resemble those of English in terms of quality and duration. However, the
resemblance is not perfect since each of the two languages possesses distinctive
acoustic features (see Elobeid and Maaly 1996). In other previous studies, Sudanese
speakers showed English vowel duration ordering similar to that of the native speakers,
in particular in tense/lax vowel pairs; however, in terms of vowel quality (location in the
F1-by-F2 space) they are insufficiently distinct from one another. This is likely because
the Sudanese learners incorrectly interpret English tense/lax vowels in terms of Arabic
temporal properties (Mitleb 1984, Munro 1993). Actually, in terms of acoustic cues, the
Arabic long/short vowel distinction can best be described as a tense-lax contrast based
on quantity (Alghamdi 1998, Flege and Port 1981, Hassan 2003, Koeczynski and
Mellani 1993, Walkers 2001). 21 On the other hand, in English, the distinction between
the tense-lax vowel pairs is primarily a qualitative difference perceived by the native
speakers (Carrs 1999, Catford 2001, Cunningham-Anderson 2003). Thus, cross-
linguistic differences such as these potentially lead to difficulty for ESL/EFL learners.
The results also imply that the Sudanese speakers are aware of the English long/short
vowel contrast but they have difficulty implementing the exact acoustic norms of the
English vowels. Moreover, the poor performance in this area could be attributed to the
speakers’ relatively little exposure to English vowel sounds.
21 Vowel quantity is defined as that phonological distinction of a vowel relative to one or more
other vowels of similar timbre in the language. Contrasts in vowel quantity are often acoustically
realized by the duration of vowels where a long vowel quantity has a duration that extends to twice
that of a short vowel. The greater duration associated with a long vowel quantity also allows the
possibility for a more extreme articulation than a corresponding short vowel quantity.
Consequently, the vowel spectrum, in particular the first and second formant frequencies, and
therefore perceived timbre, may also be affected by vowel quantity (Takayuki et al. 1999).
6.4.1.4 Automatic classification of L1 and L2 vowels
Since there are only perception data at this moment on the English vowel tokens of a
single (representative) Sudanese-Arabic EFL speaker, I would like to make an educated
guess of how native English listeners would identify all the Sudanese L2 English vowel
tokens collected in this study (or how Sudanese L2 listeners would identify the L1
English vowels produced by many different RP speakers). In order to do so, Linear
Discriminant Analysis (LDA) will be used. LDA (Klecka 1980, Strange, Bohn, Trent
and Nishi 2004) is an automatic classification technique that can be trained to optimally
classify the vowel tokens in this study in terms of the English vowel categories. In the
training stage of the analysis, exemplars of L1 tokens of English were fed to the
algorithm, in terms of F1 and F2 (Bark transformed and subsequently z-normalised
within speakers) as well as vowel duration (z-transformed). As the results will show,
the algorithm, once trained on the native English vowel data, achieved a good
classification of the native English vowel tokens (76% correct identification; chance
would be 9% correct, i.e. 1 in 11). Then the same algorithm (optimized for L1 English
vowel categories) was used to classify the Sudanese L2 English vowel tokens. In this
way, the LDA functions as a model of a typical native L1 listener on the assumption
that an L1 listener knows where the vowel tokens in his language are typically located
and how far individual vowel tokens may stray away from their prototypes (i.e.
centroids in the F1-by-F2 (-by-duration) space). I also repeated the process and
trained the model with Sudanese L2 English tokens; then it was examined how well the
LDA model identified the vowels spoken by Sudanese learners and by native speakers
of English.
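The train-on-L1, test-on-L2 logic can be illustrated with a minimal sketch. It is not the procedure actually run for this study: it assumes equal prior probabilities for all vowel categories, in which case LDA reduces to assigning each token to the class whose centroid is nearest in Mahalanobis distance under the pooled within-class covariance. All token values and function names are hypothetical.

```python
from collections import defaultdict


def _invert(m):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    n = len(m)
    a = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [x / p for x in a[col]]
        for r in range(n):
            if r != col:
                factor = a[r][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return [row[n:] for row in a]


def train_lda(samples):
    """samples: list of (label, [zF1, zF2, zDuration]).

    Returns the class centroids and the inverse pooled covariance matrix.
    """
    groups = defaultdict(list)
    for label, x in samples:
        groups[label].append(x)
    d = len(samples[0][1])
    means = {lab: [sum(x[k] for x in xs) / len(xs) for k in range(d)]
             for lab, xs in groups.items()}
    pooled = [[0.0] * d for _ in range(d)]
    for lab, xs in groups.items():
        for x in xs:
            dev = [x[k] - means[lab][k] for k in range(d)]
            for i in range(d):
                for j in range(d):
                    pooled[i][j] += dev[i] * dev[j]
    dof = len(samples) - len(groups)  # pooled degrees of freedom
    pooled = [[v / dof for v in row] for row in pooled]
    return means, _invert(pooled)


def classify(model, x):
    """Assign x to the class with the nearest centroid (Mahalanobis)."""
    means, inv = model

    def dist(mu):
        dev = [xi - mi for xi, mi in zip(x, mu)]
        k = range(len(dev))
        return sum(dev[i] * inv[i][j] * dev[j] for i in k for j in k)

    return min(means, key=lambda lab: dist(means[lab]))


# HYPOTHETICAL native (L1) training tokens: (vowel, [zF1, zF2, zDuration]).
l1_tokens = [
    ("iː", [-1.2, 1.3, 0.8]), ("iː", [-1.0, 1.1, 1.0]),
    ("iː", [-1.1, 1.4, 0.6]), ("iː", [-0.9, 1.2, 0.9]),
    ("ɪ", [-0.5, 0.8, -0.9]), ("ɪ", [-0.3, 0.6, -1.1]),
    ("ɪ", [-0.4, 0.9, -0.7]), ("ɪ", [-0.6, 0.7, -1.0]),
    ("ɑː", [1.3, -0.9, 1.1]), ("ɑː", [1.1, -1.1, 0.9]),
    ("ɑː", [1.2, -0.8, 1.2]), ("ɑː", [1.4, -1.0, 0.8]),
]
native_model = train_lda(l1_tokens)

# A hypothetical L2 token intended as /ɪ/ but produced close and fairly long.
l2_intended, l2_features = "ɪ", [-1.0, 1.2, 0.5]
l2_heard_as = classify(native_model, l2_features)
```

In this toy example the simulated native listener assigns the intended /ɪ/ to /iː/, which is the kind of tense-lax confusion the analysis is designed to detect.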
Tables 6.1-6.4 below show the results of the LDA in confusion matrices.
Table 6.1 Confusion matrix of Sudanese accented English vowels classified by Linear
Discriminant Analysis. The algorithm was trained and tested on RP vowels (76.4% correctly
classified vowel tokens). Correctly classified vowels are on the main diagonal (bolded).
Stimulus Identification by LDA
vowels iː ɪ e æ ʌ ɑː ɒ ɔː ʊ uː ɜː
iː 97.4 2.6
ɪ 2.8 92.2 2.8 2.7 2.7
e 10.5 66.2 7.5 .8 .8 14.3
æ 9.5 82.5 6.3 2.6
ʌ .9 2.7 72.6 18.1 2.7 .9 5.2
ɑː 2.1 17.8 66.7 14.4
ɒ 2.0 16.7 67.6 10.8 3.9
ɔː 2.0 92.8 6.1
ʊ 5.3 14.0 59.6 19.3 2.8
uː 6.3 5.1 25.3 63.3
ɜː 3.8 13.8 2.3 10.0 2.3 2.3 68.8
In the rows of the matrices, the vowel types are listed as intended by the speakers,
whilst in the columns the vowel types identified by the LDA are displayed as the most
likely category. As a result, the main diagonal in the matrix contains the correct
identifications, while confusions are found in the off-diagonal cells. I will first examine
Table 6.1, which contains the results of the LDA when trained and tested on L1
English vowels.
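The layout of such a confusion matrix, with intended vowels in the rows and classified vowels in the columns, can be sketched as follows. Cell values are row percentages, and the overall score is the percentage of all tokens on the main diagonal. The example labels are hypothetical.

```python
def confusion_matrix(intended, classified, categories):
    """Return (row-percentage matrix, overall percent correct).

    intended / classified: parallel lists of vowel labels.
    The matrix is a nested dict: matrix[intended_vowel][classified_vowel].
    """
    counts = {row: {col: 0 for col in categories} for row in categories}
    for target, response in zip(intended, classified):
        counts[target][response] += 1
    percent = {}
    for row in categories:
        total = sum(counts[row].values())
        percent[row] = {col: (100.0 * n / total if total else 0.0)
                        for col, n in counts[row].items()}
    correct = sum(counts[c][c] for c in categories)
    overall = 100.0 * correct / len(intended)
    return percent, overall


# Hypothetical outcome for six tokens of two vowel types.
intended = ["iː", "iː", "ɪ", "ɪ", "ɪ", "ɪ"]
classified = ["iː", "ɪ", "ɪ", "ɪ", "iː", "ɪ"]
matrix, percent_correct = confusion_matrix(intended, classified, ["iː", "ɪ"])
```

With eleven vowel categories, the chance level against which the overall score is judged is 100/11, i.e. about 9%.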
Table 6.1 shows that correct classification of vowel type ranges between 60% (for /ʊ/)
and 97% (for /iː/) with an average of 76.4%. The strongest confusion is found
between /uː/ and /ʊ/: the tense vowel is misclassified as its lax counterpart in 25% of
the cases and the lax member is confused with the tense member in 19%. Even though the
classification is imperfect (as would be the classification by human listeners), I may
now classify the Sudanese L2 tokens by applying the native classification schema. The
results are presented in Table 6.2.
Table 6.2 Confusion matrix of Sudanese-accented English vowels classified by Linear
Discriminant Analysis. The algorithm was trained on RP data but tested on Sudanese-Arabic
accented L2 vowels (42% correct vowel classification). Correctly classified vowels are on the
main diagonal (bolded). Confusions ≥ 30% are indicated in grey-shaded cells.
Stimulus Identification by LDA
vowels iː ɪ e æ ʌ ɑː ɒ ɔː ʊ uː ɜː
iː 90.9 9.1
ɪ 42.9 42.9 7.1 7.1
e 54.5 45.5
æ 8.3 50.0 33.3 8.3
ʌ 72.7 18.2 9.1
ɑː 9.1 63.6 27.3
ɒ 9.1 45.5 36.4 9.1
ɔː 9.1 82.8 .0 9.1
ʊ 7.1 14.3 7.1 64.3 7.1
uː 90.0 10.0 .0
ɜː 50.0 8.3 16.7 8.3 16.7
The performance of the LDA in Table 6.2 was poor (42% overall correct vowel
identification) in comparison to the previous one (76.4%). Similar types of errors were
repeated where /uː/ was almost always replaced by /ɔː/ and less often by /ʊ/, and /ɔː/
by /ɒ/. Other frequent errors were the misclassifications of /ɪ/ as /iː/, /e/ as /ɪ/, /æ/
as /ʌ/, /ɑː/ or /ɜː/ and finally /ɜː/ was misidentified as /e/ and less often as /ʌ/ and
/ɒ/. The last analysis is an LDA trained on L2 data and used to classify native English
vowels.
Table 6.3 Confusion matrix of Sudanese-accented English vowels classified by Linear
Discriminant Analysis. The algorithm was trained and tested on Sudanese-Arabic accented EFL
vowels. Correctly classified vowels are on the main diagonal (bolded). 54.7% of the vowel tokens
were correctly classified. Confusions ≥ 30% are indicated in grey-shaded cells.
Stimulus Identification by LDA
vowels iː ɪ e æ ʌ ɑː ɒ ɔː ʊ uː ɜː
iː 90.9 9.1
ɪ 57.1 28.6 7.1 7.1
e 45.5 45.5 9.1
æ 8.3 50.0 25.0 16.7
ʌ 9.1 36.4 36.4 9.1 9.1
ɑː 9.1 18.2 72.7
ɒ 9.1 9.1 45.5 36.4
ɔː 9.1 82.8 9.1
ʊ 7.1 28.6 57.1 7.1
uː 10.0 90.0
ɜː 16.7 16.7 8.3 8.3 50.0
Table 6.3 shows that many of the English vowels produced by the Sudanese speakers
were misclassified, with a mean of 54.7% correct and numerous confusions. For example,
/ɪ/ was misclassified as /iː/ (57% confusion), /ɒ/ as /ɔː/ or /ʊ/, /ʊ/ as /ɒ/ and /ʌ/ as
/ɑː/ or /ɔː/, and /e/ was misclassified as /ɪ/ (46%). The results also showed that /ɜː/ was
misclassified as /e, æ, ʌ, ɔː/. Interestingly, there were no serious errors made in the
classification of /iː/. There are other slight mispronunciations of English vowels made
by the subjects, which do not reflect a clear error pattern.
In Table 6.4 the classification was even poorer (48.7% correct) when the same English
vowel tokens were identified automatically in native listeners’ terms. For instance, /ɒ/
was misclassified as /ɔː/ and sometimes as /ʌ, ɑː, uː, ʊ/. Moreover, /ɔː/ was almost
always misclassified as /uː/ and less often as /ʊ/, whilst the members of the tense-lax
pair /uː~ʊ/ were misclassified as each other. Automatic identification also shows that
the lax vowel /ɪ/ is often replaced by /e/ or vice versa. Furthermore, the English vowel
tokens /e, ʌ, ɜː, æ, ɑː/ were interchangeably substituted for one another; however, the
English vowel pair /ɪ~iː/ was rarely confused.
Table 6.4 Confusion matrix of Sudanese-accented English vowels classified by Linear
Discriminant Analysis. LDA trained with L2 vowels but tested on L1 vowels. Correctly classified
vowels are on the main diagonal (bolded). 48.7% of the vowel tokens were correctly classified.
Confusions ≥ 30% are indicated in grey-shaded cells.
Stimulus Identification by LDA
vowels iː ɪ e æ ʌ ɑː ɒ ɔː ʊ uː ɜː
iː 95.6 4.4
ɪ 2.8 62.9 35.4 .9
e .8 25.6 12.3 .8 60.9
æ .8 79.4 4.0 5.6 10.3
ʌ 3.4 46.6 40.5 3.4 .9 2.7 3.4
ɑː 15.6 54.4 4.4 25.6
ɒ 3.9 7.8 10.8 63.7 3.9 9.8
ɔː 3.1 3.1 93.9
ʊ 2.8 3.5 70.2 24.6
uː 2.5 5.1 2.5 72.2 17.7
ɜː 2.3 13.8 3.8 3.8 3.8 3.8 70.0
In conclusion, the classification matrices show that the production of English vowels
proved to be problematic for the Sudanese speakers, whereas the results for the native
speakers revealed better performance, as Table 6.1 shows. These results allow the
prediction that the Sudanese speakers do not follow certain learning patterns, probably
because these types of vowels are lacking in the Arabic language. The data also bear out the
prediction that Sudanese listeners/speakers were more intelligible to each other than to
the native English speakers and vice versa, which reflects the inter-language
speech-intelligibility effect in which speech participants benefit if speakers and listeners
share the same native language. 22
22 Inter-language means using a language system, which is neither the L1, nor the L2. It is a third
language, with its own grammar, its own lexicon and so on. The rules used by the learner are to
be found in neither his own mother tongue, nor in the target language. In this context,
inter-language describes the possibility that, in interactions, listeners can explicitly categorize
unfamiliar speakers due to regional dialects/linguistic backgrounds (Van Heuven and Wang 2007).
Obviously, for English native listeners, the native speakers of English are most intelligible.
Similarly, the non-native listeners find the non-native speakers with the same linguistic
background more intelligible than the natives. This is called matched inter-language speech
intelligibility benefit. On the other hand, the type of degraded level of intelligibility that
occurs between native and non-native speech participants is referred to as mismatched
inter-language speech intelligibility benefit (Bent and Bradlow 2003).
6.5 Conclusions
The articulation of /e, ʌ, ɜː, ɒ, ɑː, uː, ɔː, ɪ, æ/ proved to be difficult, as the subjects
showed a poor performance. However, there are remarkably few errors made in the
pronunciation of the tense vowel /iː/. This is probably because the Sudanese speakers
have a similar equivalent for this vowel in their L1.
Unlike the native speakers’ vowels, Sudanese EFL learners’ vowels are mostly
distinguished with lower formant values (probably due to inventory differences
between L1 and L2). The speakers need to enhance their vowel inventory to produce
less foreign-accented English vowels.
The English short/long vowel durations of the Sudanese learners show similar
ordering to those of the native English speakers. However, some vowel durations are
slightly lengthened, probably due to the circumstance that the learners tend to produce
English vowels with their L1 productive strategies.
Both speaker types benefit from their national backgrounds (inter-language), as was
shown by the results of applying automatic vowel classification to native English and
Sudanese-accented EFL vowel tokens after the classification algorithm had been
trained with native and EFL data. If it is accepted that the automatic classification
procedure mimics the performance of a human (native or non-native) listener, the
results support the hypothesis that both the Sudanese and the British speakers manifest
a greater level of intelligibility when they are perceived by (simulated) listeners with the
same native-language background.
Several production problems of English vowels, such as the orthographically motivated
errors in which /e/ was mispronounced as /ɪ/ and the reduction of /eɪ/ and /ɜː/ to
/e/, took place due to the lack of L2 phonemic knowledge.
Production errors detected in this study followed different directions which suggest
that the Sudanese learners of English do not follow a clear learning pattern.
Chapter Seven
Acoustic analysis of English obstruents

7.1 Introduction
The primary focus of this chapter is on English consonants produced by Sudanese
university EFL learners. It attempts to delineate the acoustic correlates of these
consonants, accounting for the phonetic and phonological differences that exist
between them and those of the native speakers. Acoustic analysis in this section covers
(relative) intensity and temporal parameters such as consonant duration, peak intensity
and centre of gravity (COG). The learners’ L1 (Arabic) includes consonants which have
much in common with English; however, each language also possesses its own specific
distinctions. These specific distinctions are expected to influence the learners’
production, in one way or another. The aim of the acoustic analysis in this study was to
account for the extent to which differences between L1 and L2 can affect correct
production of plosives, fricatives and affricates spoken in Sudanese Arabic-accented
English. It has been assumed that correct production depends on phonological
representations that are related to specific phonetic contrasts that must be maintained
within a language (Maniwa, Jongman & Wade 2009). Most of the available previous
studies of English consonants spoken in Arabic-accented English provided
impressionistic descriptions only. A few studies approached the production problems of
English consonants on an experimental basis but gave no satisfactory account of the
subject at issue. These studies observed substitutions of /s~θ/ in words such as sick,
thick and sink, think and of /ð~z/ in words like then, zen, etc. (Rababah 2003, Patil 2006,
Jesry 2005). Some studies have approached the production of English consonants
experimentally, such as Altaha (1995), do Val Barros (2003), who reported the
replacement of /v/ by /f/ or /b/ as in words like very/berry and volley ball/bolley ball and
/s~θ/, /ð~z/, etc. The English affricate /dʒ/ is often split up by /i/, e.g., a word like
bridge is pronounced as /bridiʒ/, or realized as /g/ as in village /vilig/. The importance of
this chapter comes from the experimental conduct used to approach English consonants,
which helps to uncover more evidence on the production problems of English
consonants spoken by Sudanese university learners of English. English consonants
spoken with a Sudanese-Arabic accent may display more specific L1 interference effects
than those detected among other Arabic speaking groups. Little, if anything, is known
about the characteristics of consonant durations, preceding vowel duration and other
properties of English consonants spoken by Sudanese EFL learners. There is also little
information known about the extent to which such characteristics can compromise the
intelligibility of Sudanese Arabic-accented English.
7.2 Objective
The objective of the study is to find experimental evidence for the production
problems with English consonants spoken by Sudanese university EFL learners. It has
been argued that pronunciation difficulties arise due to differences between L1 and L2
speech sounds. These difficulties do not only result in pronunciation problems but also
they lead to the perception of unintended English speech sounds by the native English
listeners, causing intelligibility problems.
The data obtained can help understand which English consonants are the most difficult
to produce and what the causes of these difficulties are. Thus, it would be possible to
obtain cognitive insights into the L2 production problems and to utilize these insights
for pedagogical purposes.
7.3 Methods
7.3.1 Material
Stimuli comprised a list of CVC words included in a carrier phrase Say …again.
The consonants were plosives, fricatives and affricates produced by 11 Sudanese
university EFL learners (see Appendices 3.2a-b). The nasals and semivowels were
excluded as they were not expected to present production problems. Moreover, the
onset C1 could be each of the possible onset consonants specified above (i.e. excluding
/ŋ/ and /ʒ/ because they do not occur in initial position). Similarly, C2 components
could be each of the possible coda consonants, i.e., excluding the semivowels /h/, /j/,
/w/, /r/, which do not occur in coda position. Additionally, /ʒ/ was not tested in coda
position, even though it occurs in words such as beige or rouge. Such French loans are
too infrequent to warrant the inclusion of /ʒ/.
7.3.2 Participants
7.3.2.1 Sudanese EFL learners
Eleven male Sudanese Arabic speakers were recruited primarily from the student
population at Gadarif University. In selecting the subjects, I focused on semi-final
students who had reached a considerable level of English proficiency and, hence, from
whom a relatively good performance was expected. These students specialized in
English. In general, they used it inside the classroom and in other academic activities
such as debates, discussions, etc.
7.3.2.2 Native speakers of English
As there were no native speakers at hand, I took recourse to published native speakers’
data related to this study for comparison purposes.
Voice onset time (VOT) data of Docherty (1992) were used for comparison purposes.
Five male native speakers of Southern British English, aged between 18 and 22,
provided Docherty’s data. These were students preparing for a bachelor degree at
Edinburgh University but had been educated and brought up in South-East England.
None of these subjects had a regional accent and there were also no systematic
differences between them.
Centre of gravity (COG) and spectral SD values were taken from Maniwa et al. (2009). This
study included eight fricatives of English recorded by 20 male and female native
speakers of American English (aged 19-34). The fricatives were embedded in /ɑCɑ/
non-words. Each non-word was recorded in isolation in conversational and in clear
speaking style. Data on the preceding vowel duration were extracted from House
(1961), English consonant durations from Catford (1977) and peak intensity data from
Ball and Rahilly (1999).
7.4 Procedure
7.4.1 Test battery
Materials were recorded on a laptop computer using Adobe Audition. The subjects
were seated in a quiet room with their lips a few centimetres away from a head-
mounted close-talking microphone. They were asked to read a list of monosyllabic
English words which included all the target English consonants. These words were
embedded in a carrier sentence (Say … again). The carrier sentences were intended to
help the subjects to speak at a constant rate. Moreover, keywords were provided in the
list along with the target words as a guideline to help learners achieve correct
pronunciation (see Appendices 3.2a-b). The subjects were encouraged to give their best
possible production of the words. If the experimenter suspected that an error in the
production was simply a reading error, rather than a genuine indication of the subject’s
inability to pronounce a certain word, he asked the subject to repeat the word. The
recorded materials were then submitted to acoustic analysis using Praat software.
7.4.2 Praat
For speech analysis, the Praat speech-processing programme was used. Praat is an
open-source software tool, which is used for speech signal editing and labelling, as well
as for various acoustic (spectral, formant and duration) analyses and manipulations
(Boersma and Weenink 1996). It has the further advantage of being easily adaptable for
specific research purposes; results can also be exported to Excel-compatible
spreadsheets for offline statistical analysis.
7.5 Overall results
This section presents the acoustic characteristics of the English plosives, fricatives and
affricates produced by Sudanese EFL Learners. The measurements included the voice
onset time (VOT), duration of the preceding vowel in different consonant environ-
ments, consonant duration, peak intensity, centre of gravity (COG) and the standard
deviation (SD) of the spectrum (explained in § 7.5.8).
7.5.1 English plosives
7.5.2 Acoustic features of English plosives
English plosives differ from other consonants in their manner of articulation. That is to
say, they can be characterized acoustically by three main phases: (i) a closure leading
to silence (the silent interval), (ii) a release noise burst and (iii) a fast movement of the
articulators into or away from the vowel. In the first phase, a perceptible period of
silence appears throughout the whole spectrum. However, in the voiced plosives
/b, d, g/ there is usually only a near-absence of energy, since some low-frequency
energy is maintained during the closure. This low-frequency energy distinguishes the
voiced plosives from their voiceless counterparts by the presence of a voice bar, which
appears in the spectrum below 250 Hz. As a result, the closure release has a relatively
higher intensity in voiceless than in voiced stops because, at the moment of release,
intra-oral pressure is lower in voiced stops than in voiceless stops. Release is the
second phase. It causes a rapid escape of air, which in turn gives rise to random
pressure variations, i.e. a noise burst. According to Kent, Dembowski and Lass (1996)
there is a short period of varying constriction of the upper vocal folds, which occurs
immediately after release and which results in a post-release fricative-like periodic
sound. For the voiceless plosives /p, t, k/, there is usually a higher onset or offset in
fundamental frequency into the following and/or preceding vowel. Moreover, there is
likely to be a marked rising bend of F1 of the adjacent vowel in the case of /b, d, g/ that
is not as marked in the case of /p, t, k/. Furthermore, distinctions between different
plosives, i.e. bilabial, alveolar and velar stops, are indicated by the noise frequency of
the burst that appears at the onset of the release stage together with bends of F2 and F3,
which are also known as formant transitions. Such transitions move into the following
or preceding vowels. These stages are referred to as overlapping rather than discrete
and are not necessarily evident in any individual stop token. In the third phase, the
articulators move apart from each other, giving rise to turbulence due to airflow
through the glottis (aspiration). This airflow starts just prior to the onset of the vocal
fold vibrations. The next section provides details and illustrations about the plosives’
VOT.
7.5.3 Spectral preparation
Voice onset time (VOT) is a term which is widely used to describe the timing of voicing
in stops. It refers to the interval (in ms) which exists between the release of the stop
closure and the start of the voicing for a following voiced segment. The voice onset
time is used as an acoustic parameter to distinguish syllable-onset cognates in many
languages of the world (Docherty 1992). Figure 7.1 provides spectrographic illustrations
of the voice onset time of some English initial plosives: bab, dad and gag. In the pattern
shown, the voice onset time codes the voicing category. As the spectrogram shows,
there is no voicing during the closure of any of the three initial plosives (0 ms VOT).
Immediately after the silence (in Figure 7.1, this is shown as a white bar), there is a
burst of energy (a noise burst, in Figure 7.1, this is shown as a dark line between the
silence and the vowel bar) followed by voicing. The vowel sound appears as a wide
black bar, which normally follows the burst. The time from the burst to the beginning
of the following vowel is the voice onset time (VOT).
Failure of ESL/EFL speakers to produce plosives like /p, t, k/ with long-lag VOT
values that correspond to those of native speakers in the same phonetic context is
detectable by native speakers. Such differences in VOT values contribute to the
perception of a foreign accent and to intelligibility problems.
Figure 7.1 An illustration of the voice onset time (VOT) in native English plosives.
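The definition of VOT given above can be made concrete with a minimal sketch. The function names and the short-lag/long-lag boundary are illustrative assumptions, not part of the measurement procedure used in this study; the burst and voicing-onset times are taken to be hand-labelled time points (in seconds).

```python
def voice_onset_time_ms(burst_s, voicing_onset_s):
    """Return VOT in ms: time from stop release (burst) to onset of voicing.

    A negative value means that voicing starts before the release (prevoicing).
    """
    return (voicing_onset_s - burst_s) * 1000.0


def vot_category(vot_ms, long_lag_boundary_ms):
    """Label a VOT value; the short/long-lag boundary is supplied by the caller."""
    if vot_ms < 0:
        return "voicing lead"
    if vot_ms < long_lag_boundary_ms:
        return "short lag"
    return "long lag"


# Hypothetical labels: burst at 0.100 s, voicing onset at 0.160 s.
vot = voice_onset_time_ms(0.100, 0.160)
print(round(vot, 1), vot_category(vot, long_lag_boundary_ms=35.0))
```

The boundary is left as an explicit argument because the appropriate value depends on the language and on place of articulation.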
In this section, I present the voice onset time (VOT) of English plosives produced by
Sudanese EFL learners. Figure 7.2 shows the VOT of English initial and coda plosives
produced by Sudanese EFL learners. Figure 7.3 shows the voice onset time (VOT) of
English plosives which were produced by native speakers (data from Docherty 1992).
Both datasets were produced in a carrier phrase (Say … again).
7.5.4 Voice onset time
Figure 7.2 presents the Voice Onset Time (VOT, in ms) measured for the English
plosives produced by 11 Sudanese EFL learners for onset and coda consonants
separately and broken down further by place of articulation.
[Figure 7.2 plot: x-axis, target consonant; y-axis, voice onset time (ms).]
Figure 7.2 Mean Voice onset time (VOT, in ms) of English plosives produced by 11 Sudanese
learners of English. Onset and coda stops are plotted separately and broken down by place of
articulation. Error bars are ±2 standard errors of the mean.
Figure 7.3 presents VOT measurements obtained from a group of native English
control speakers (Docherty 1992). These measurements were done for plosives in onset
position only. For the sake of comparison I copied the corresponding EFL values into
Figure 7.3, which therefore partially repeats the EFL data in Figure 7.2.
Figure 7.3 shows that the English VOT of the Sudanese learners is very different from
the English norm. The difference is highly significant by a paired t-test, t(5) = 13.7 (p
< .001).
[Figure 7.3 plot: x-axis, target consonant; y-axis, voice onset time (ms).]
Figure 7.3 Mean Voice Onset Time (VOT, in ms) in initial plosives produced by British (native)
and Sudanese (non-native) speakers of English. Data of native speakers were extracted from
Docherty (1992). Sudanese EFL data are my own (see Figure 7.2).
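The paired t-test reported above compares native and learner means plosive by plosive; with six plosives there are five degrees of freedom, as in t(5). The computation can be sketched as follows. The VOT values in the example are hypothetical placeholders and are not the values plotted in Figure 7.3.

```python
import math
import statistics


def paired_t(sample_a, sample_b):
    """Return (t, df) for a paired t-test on two equal-length samples."""
    diffs = [a - b for a, b in zip(sample_a, sample_b)]
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / standard_error, n - 1


# Hypothetical mean VOTs (ms) for six plosives, for illustration only.
native = [15.0, 20.0, 27.0, 60.0, 70.0, 80.0]
learner = [5.0, 8.0, 12.0, 30.0, 40.0, 50.0]
t_value, df = paired_t(native, learner)
print(round(t_value, 2), df)
```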
The learners’ VOT values for both voiced and voiceless stops fall almost entirely within
the short-lag range of the continuum. This is quite different from the native speakers
(Figure 7.3), whose voice onset time falls within the short-lag range for the voiced
plosives, whilst that of the voiceless counterparts falls in the long-lag range. This
finding reveals systematic acoustic differences between English and the Sudanese
learners’ L1 (Arabic). The distribution of their VOT values shows a different
organization from that of English, which probably reflects different categorical
distinctions between English and
Arabic. Similar findings were reported by Flege and Port (1980), Fokes, Bond and
Steinberg (1985) and Khattab (2002). They demonstrated that the acoustic correlates of
voicing for onset stops in Arabic are the presence of glottal pulsing (pre-voicing/
phonation), which occurs during the closure interval for voiced stops (i.e. negative
VOT), and presence of a noise burst for unvoiced stops (i.e. short positive VOT). On
the other hand, in English, the voicing contrast is shown by the presence of silence (in
the oscillogram this appears as entirely a flat line) during the closure interval followed
by a short noise burst for voiced plosives (i.e. short positive VOT) and aspiration for
voiceless plosives (i.e. long positive VOT). One more aspect of difference is that the
learners produced relatively shorter VOT values for the English voiceless stops in the
coda position than in the onset, which shows a feature of L1 influence. That is, Arabic
speakers of English tend to show hardly any voicing contrast in coda plosives. However,
the voiced onset and coda plosives showed similar results: coda VOT values are shorter
than onset values, which prompts caution about such a finding. More interestingly, even
at the individual speaker level, the overall mean VOT values for the English plosives
tend to be shorter than those of the native speakers: 4, 6, 9, 11, 12, 5, 0, −5, −14, −4
and −55 ms for the eleven individual speakers (for individual mean voice onset time
values of each plosive see Appendix 7.1). This is due to the use of Arabic acoustic
correlates (Flege and Port 1981). These findings suggest that the VOT data of the Sudanese
learners are unstable, showing no length pattern similar to that of the native English
VOT norm. This implies that the learners have difficulty acquiring the English voicing
contrast properly, particularly for the coda consonants. They also suggest that the
learners do not adopt a clear learning pattern in the production of the English plosives,
due to partial learning. Flege (1976) concluded that VOT differences between Arabic
and English are neither due to the confounding factor of vowel context which requires
speakers to produce English stops with longer VOT values, nor to a lack of experience.
Flege described such differences as the result of wrong phonetic representations (these
are normally L1 sound categories) which L2 learners use as guides for the production of
L2 speech sound categories. Nevertheless, it appears that the learners often acquire
some English voicing contrast features correctly; i.e., the VOT values of some stops
increase as the articulation moves further back in the mouth. This property was fully
shown by the voiceless onset plosives /p, t, k/, whose voice onset time values increase
monotonically with place of articulation: 34, 42 and 53 ms, respectively. The voiceless
coda stops showed a similar rank order for /p/ and /t/ VOT values (0 and 3 ms,
respectively); however, the /k/ VOT value is 0. Similarly, the voiced coda plosives /b, d,
g/ also follow the predicted effect of place of articulation, since their VOT values are 8,
7 and 0 ms, respectively. These data strongly suggest that Sudanese EFL learners have
insufficient experience with English.
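As a minimal sketch, the eleven per-speaker means quoted above (the last four being negative, i.e. prevoiced) can be summarised in the same way as the error bars in the figures of this chapter, as a mean with a bar of ±2 standard errors. The function name is an assumption introduced for illustration, and the sample standard deviation (n − 1 in the denominator) is assumed.

```python
import math
import statistics

# Per-speaker overall mean VOT (ms), as listed in the text above.
speaker_vot_ms = [4, 6, 9, 11, 12, 5, 0, -5, -14, -4, -55]


def mean_with_error_bar(values, n_se=2.0):
    """Return (mean, lower, upper); the bar spans mean +/- n_se standard errors."""
    mean = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    return mean, mean - n_se * se, mean + n_se * se


mean, lower, upper = mean_with_error_bar(speaker_vot_ms)
print(f"mean = {mean:.1f} ms, bar = [{lower:.1f}, {upper:.1f}] ms")
```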
7.5.5 Preceding vowel duration
Figure 7.4 presents the duration (in ms) of the vowels preceding voiced versus voiceless
target consonants, as spoken by the Sudanese learners. The consonant types have been
broken down by position in the syllable (onset versus coda) and by manner of
articulation (plosive, fricative, affricate).
Figure 7.4 Mean duration of the English vowel preceding onset and coda plosives, fricatives and
affricates produced by Sudanese learners. Error bars are ±2 standard errors of the mean.
Figure 7.4 shows that the Sudanese learners’ English vowel durations preceding voiced
and voiceless plosives relatively correspond to the English norms for the preceding
vowel duration, but the difference between the voiced and voiceless members in each
pair never reaches significance. Even the largest difference found in any of the six pairs,
i.e., in coda affricates, falls short of significance, t(19) = 1.5 (p = .154, two-tailed). The
mean durations of the vowels are longer before voiced plosives and shorter before
voiceless stops in both onset and coda positions, a finding that concurs with
Cruttenden (2008) and Dretzke (1998) (see also Appendix 7.3). Similarly, the duration
values of the vowels preceding onset fricatives show a distinctive pattern. This is
because vowels preceding the voiced fricatives are longer than vowels before the
voiceless counterparts. However, fricatives and affricates show variation in the duration
of the preceding vowels. This appears clearly in the coda fricatives and onset affricates
where the duration values of the preceding vowel do not conform to the pattern that is
expected of the English voicing contrast. Affricates also show similar differences,
where the durations of the vowels preceding onset affricates violate the native English
norm, although coda affricates reflect the correct voicing contrast similar to that of the
native speakers. Unstable duration values such as these probably occur due to the
influence of the speaking style. One more interesting finding is that duration values are
nearly equal for vowels followed by alveolar and dental fricatives, particularly in coda
position. This might be due to an incorrect production of English fricatives, which
probably resulted from the voicing influence of the source consonants. That is, learners
tend to substitute their L1 counterpart fricatives, in which the boundaries between /s/
and /θ/ and between /z/ and /ð/ are blurred (see Dickins 2007, Watson 2002); a
further probable cause is that Arabic speakers of English often fail to implement a
proper voicing contrast in final English consonants. In both cases, the ultimate result
is an incorrect preceding vowel duration.
7.5.6 Duration of consonants
Figure 7.5 presents the mean duration values of the English onset and coda plosives,
fricatives and affricates produced by Sudanese learners.
Figure 7.5 Total mean duration values of the English onset and coda plosives, fricatives and
affricates produced by Sudanese learners. Error bars are ±2 standard errors of the mean.
Figure 7.5 shows that, in general, the English voiceless consonants produced by
Sudanese learners have longer duration values than their corresponding voiced
consonants in both onset and coda positions; coda affricates differ, though. Moreover,
onset consonants show longer duration values than coda consonants. These findings
reveal that the voicing contrast of such consonants shows a pattern that consistently
corresponds to results in the literature on native English consonant duration (Catford
1977, Cruttenden 2008, House 1961). Moreover, affricates show similar duration
features in onset position but differ in codas. However, the English consonant
durations of the Sudanese learners tend to be longer than the durations of the native
speakers. The overall mean durations of the individual consonant tokens produced by
the Sudanese EFL learners and those produced by native speakers concur with these
results (see Appendix 7.4).
Longer durations are an indication of unstable consonants, and can be attributed to
several factors. The influence of the learners’ speaking style, incorrect L2 perceptual
and productive categories and the lack of the learners’ explicit knowledge of English
consonants can contribute to longer durations. It is also possible to attribute the high
duration values to the typically Arabic use of articulatory emphasis, which functions
contrastively in the learners’ native language (see Kaye 1997).
7.5.7 Peak intensity
Intensity correlates with the (perceived) loudness of a sound. Intensity is the square of
the amplitude of the sound wave integrated over a moving average (time window) that
should be long enough to include at least two glottal pulses. It is determined by the size
of the variation of air pressure, and is conveniently expressed in decibels (dB)
(Ladefoged 2003). Voiced sounds have greater intensities at low frequencies (typically
below 1000 Hz) than voiceless sounds. This feature labels (low-frequency) intensity as a
relative cue that can be used for distinguishing between voiced and voiceless
consonants.
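The definition of intensity given above can be illustrated with a simplified sketch. It assumes that the samples are sound pressures in pascal and uses the conventional 20 µPa reference; the non-overlapping rectangular frames are a simplification introduced here and do not reproduce the windowing that Praat applies.

```python
import math


def intensity_db(samples, ref_pressure=2e-5):
    """Mean-square amplitude of a frame, expressed in dB re ref_pressure."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(mean_square / ref_pressure ** 2)


def peak_intensity_db(samples, frame_length, ref_pressure=2e-5):
    """Highest frame intensity over consecutive non-overlapping frames."""
    frames = [samples[i:i + frame_length]
              for i in range(0, len(samples) - frame_length + 1, frame_length)]
    return max(intensity_db(frame, ref_pressure) for frame in frames)


# Hypothetical signal: a quiet stretch followed by a louder stretch.
signal = [0.002] * 4 + [0.02] * 4
print(round(peak_intensity_db(signal, frame_length=4), 1))
```

In practice the frame should span at least two glottal pulses, as stated above, so that the measure does not fluctuate within a single pitch period.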
Figure 7.6 Mean peak intensity of the English onset and coda plosives, fricatives and affricates
produced by Sudanese learners. Error bars are ±2 standard errors of the mean.
Generally, the data of the English consonants’ intensity of the Sudanese learners show a
variation of intensity rates at both onset and coda positions (see Figure 7.6; for values
of all the consonants, see Appendix 7.5). The onset lenis plosives have greater intensity
(67 dB) than their fortis counterparts (64 dB). Similarly, the lenis coda plosives,
fricatives and affricates have marginally greater intensity than the fortis consonants: 57,
67 and 65 dB against 57, 66 and 63 dB, for plosive, fricative and affricate pairs,
respectively. This means that the onset plosives and all voiced and voiceless coda
consonants show relative but insignificant correspondence to English intensity where
the voiced sounds tend to have greater intensity than the voiceless sounds (Ladefoged
2003). However, the onset lenis fricatives have lower intensity (65 dB), and so do
affricates (60 dB), than their onset fortis counterparts (65 and 67 dB, respectively).
7.5.8 Centre of gravity
Acoustic correlates are used as parameters to measure issues such as the difference
between speech sounds and qualities of these sounds, etc. Formant values are used as
correlates to distinguish between different vowel sounds since they are linked to the
relative positions and movements of the tongue. However, formants are inappropriate
measures for most consonants. Instead, the spectral centre of gravity (COG) may be used
to capture information on the place of articulation of fricatives (and of frication noise
in general). Computationally, COG is the mean frequency calculated as the first spectral
moment, $\int f\,E(f)\,df \,/ \int E(f)\,df$, expressed numerically as
$\mathrm{COG} = \sum_i f_i E_i \,/ \sum_i E_i$, where $f$ and $f_i$ are frequencies in Hertz,
and $E(f)$ and $E_i$ the spectral power as a function of the frequency (see Figure 7.7). For
instance, if the frequency range is sampled between 0 and 10,240 Hz with, say, 1024
points separated by 10-Hz intervals, the COG is the weighted mean of these 1024
frequencies (at 5, 15, 25, 35, …, 10,235 Hz), where each frequency is weighted by its
intensity. If the emphasis is on the high end of the spectrum, as in /s/-like fricatives,
the COG will assume a relatively high value; if low frequencies are dominant, as for
velar and uvular fricatives, the COG will be found at a relatively low frequency. Figure
7.7 provides an illustration of a sound with low-frequency emphasis. It can be seen that
the COG, indicated by the dashed vertical line is at a rather low value, to the left of the
centre of the analysis bandwidth.
Figure 7.7 Illustration of the Centre of Gravity. COG is represented by the dashed line. It is the
mean of all frequencies within the analysis band, weighted by the acoustic energy at each
frequency (from Van Son and Pols 1999).
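The worked example above (1024 frequency points at 5, 15, …, 10,235 Hz) can be sketched in a few lines. This is a minimal illustration of the weighted mean only; the function name and the two example spectra are assumptions and do not represent the analysis settings used in Praat.

```python
def centre_of_gravity(freqs_hz, power):
    """Power-weighted mean frequency (first spectral moment)."""
    return sum(f * e for f, e in zip(freqs_hz, power)) / sum(power)


# 1024 bins of 10 Hz between 0 and 10,240 Hz, represented by their centres.
freqs = [5 + 10 * i for i in range(1024)]

flat = [1.0] * 1024                                    # equal power in every bin
low_emphasis = [1.0 / (1 + i) for i in range(1024)]    # hypothetical falling spectrum

print(centre_of_gravity(freqs, flat))
print(centre_of_gravity(freqs, low_emphasis))
```

For the flat spectrum the COG lies exactly in the centre of the analysis band (5,120 Hz); for the falling spectrum it lies to the left of the centre, as in Figure 7.7.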
The place of articulation of a fricative (or noise burst of the homorganic plosive or
affricate) defines the size of the resonating cavity beyond the constriction point. The
larger (especially longer) the cavity beyond the constriction point, the lower its
resonance frequency, and thereby the COG value. However, there is (at least) one
second parameter that is needed to define the gross shape of the friction spectrum.
Two different fricatives, for instance /f/ and /ʃ/, may have similar COG values but
differ in the distribution of intensity around the centre of gravity. Typically, /f/ has a
flat and level spectrum with intensity evenly distributed over all frequencies whereas /ʃ/
has its energy concentrated more closely around the COG. A measure of this
distribution is afforded by the standard deviation (SD) of the spectrum. If the spectrum
contains just one sine wave, the SD would be zero, indicating that the spectrum is
maximally compact: there is no energy in the spectrum at any frequencies other than at
the COG. If the spectrum is white noise, then there would be maximal dispersion of
energy over all available frequencies within the range analysed, and the SD would be the
analysis range divided by $\sqrt{12}$. In my analyses, the range extends between 0 and
11,025 Hz (i.e. the digital sampling frequency divided by 2, a result which is also called
the Nyquist frequency). An /f/-like noise spectrum would then have a spectral SD
approximating $11{,}025/\sqrt{12} \approx 3183$ Hz.
In terms of the earlier example, the spectral SD would be computed by taking the
difference between each of the 1,024 frequencies fi and the COG fx and then
determining the root-mean-square average of these differences after weighting each
individual difference by the intensity of the fi.
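The two limiting cases described above can be checked with a short sketch of the weighted root-mean-square computation. The grid of 2,205 bins of 5 Hz is a hypothetical discretisation of the 0 to 11,025 Hz band chosen for illustration; for a flat spectrum the result approaches the standard deviation of a uniform distribution, i.e. the range divided by the square root of 12.

```python
import math


def spectral_sd(freqs_hz, power):
    """Power-weighted standard deviation of frequency around the COG."""
    total = sum(power)
    cog = sum(f * e for f, e in zip(freqs_hz, power)) / total
    variance = sum(e * (f - cog) ** 2 for f, e in zip(freqs_hz, power)) / total
    return math.sqrt(variance)


# Hypothetical grid: 2205 bins of 5 Hz covering 0-11,025 Hz (bin centres).
freqs = [2.5 + 5 * i for i in range(2205)]

single_component = [0.0] * 2205
single_component[800] = 1.0      # all energy at one frequency
flat = [1.0] * 2205              # white-noise-like spectrum

print(spectral_sd(freqs, single_component))   # no spread around the COG
print(round(spectral_sd(freqs, flat)), round(11025 / math.sqrt(12)))
```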
In my recordings, COG and the Spectral SD were computed for the middle portions of
the friction sounds. The exact time points of the onset and offset of the noise bursts
for plosives, affricates and fricatives were marked in Praat Textgrids. An analysis
window was then automatically defined between 25 and 75 percent of the duration of
the friction portion of the target sound, such that the COG and Spectral SD were
measured for the central half of the friction portion, which can be assumed to be
relatively stable and optimally representative (see also Maniwa et al. 2009).
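The definition of the analysis window can be expressed as a small helper. This is a sketch in Python rather than the Praat script actually used; the function name is an assumption, and the onset and offset are taken to be the boundary times (in seconds) marked in the TextGrid.

```python
def central_window(onset_s, offset_s, start_fraction=0.25, end_fraction=0.75):
    """Return (start, end) of the window covering the central part of a segment."""
    duration = offset_s - onset_s
    return onset_s + start_fraction * duration, onset_s + end_fraction * duration


# Hypothetical friction portion from 1.000 s to 1.200 s.
start, end = central_window(1.000, 1.200)
print(round(start, 3), round(end, 3))
```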
In the remainder of this chapter I will concentrate on the COG and spectral SD of the
fricatives. Only for fricatives are there data available in the literature on native speakers
of English that can be compared with the results. No such data can be found for
plosives and affricates. The full data for all manner categories can be found in
Appendix 7.2.
Figure 7.8 presents the centre of gravity (COG) values and the spectral SD of the
English fricatives produced by Sudanese learners and native speakers of English. The
latter data were obtained by estimating the values of the measurement points at 25, 50
and 75% of the friction duration in Figure 2 in Maniwa et al. (2009: 3968). It is assumed
that the mean of the three COG and SD measurements in the central 50% of the
fricative duration is equivalent to a single COG and SD determination averaged over
the middle 50% of the duration of the fricative noise, as was done in my own analysis.23
The first thing which is observed in Figure 7.8 is that the Sudanese learners use the
two-dimensional friction space less effectively than the native speakers do. For one
thing, native /s/ has a COG over 7,000 Hz with a fairly narrow concentration of
energy, while the EFL counterpart has the COG at a substantially lower frequency
(approximately 5,000 Hz) and with a wider spread of energy. Overall, the Sudanese
speakers show roughly the same COG values for voiced and voiceless cognates
whereas the native speakers observe a large difference in COG such that voiced
fricatives have clearly lower values than their voiceless counterparts. This latter
difference is what should be expected given that the voiced fricatives have a lot of low
frequency energy as a result of vocal cord vibration. The results are compatible with my
earlier finding that Sudanese EFL speakers fail to make a proper distinction between
the voiced and voiceless fricatives. Interestingly, the /s~z/ pair does not suffer from this
shortcoming: even though the COG values of the EFL /s~z/ are much lower than
those in native English, the absolute difference between the cognates is of equal
magnitude. The relative location of /s/ versus /z/ in the native data differs radically
from that in the EFL data. In the EFL data there is a tendency for the COG and
spectral SD values to be strongly correlated, r = .837 (p = .019, two-tailed). The voice-
less sounds are always characterized by a higher COG and a larger spectral SD than
their voiced counterparts, which shows that vocal cord vibration is largely absent from
the voiced counterparts. This is in clear distinction to the native English data, where
COG and spectral SD are not correlated, r = .387 (p = .344, two-tailed, n.s.). The
spectral SD of native voiced fricatives is always larger than that of the voiceless
counterpart, while the COG is at lower values. This finding is compatible with the
presence versus absence of low-frequency energy in the voiced members, due to
voicing.
23 In fact, Maniwa et al. (2009) collected two sets of COG and SD measurements; one set was
defined on conversational speech, the second set was collected for optimally clear repetitions of
the target items. I assume that the speaking style of the recordings in my own materials is more
like clear speech than like conversational speech.
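The correlation coefficients reported above can be related to their significance levels through the t statistic with n − 2 degrees of freedom. The sketch below assumes n = 7 fricative means for the EFL data and n = 8 for the native data; these sample sizes are inferred from the number of fricatives in each dataset and are not stated explicitly in the text.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_from_r(r, n):
    """t statistic (df = n - 2) for testing a correlation against zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


print(round(t_from_r(0.837, 7), 2))   # EFL data, assumed n = 7
print(round(t_from_r(0.387, 8), 2))   # native data, assumed n = 8
```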
Figure 7.8 Centre of gravity (COG) and Spectral standard deviation values (in Hz) of the English
fricatives produced by Sudanese learners (top panel) and native speakers of English (bottom
panel).
In order to determine how well the fricatives are distinct in the EFL data I ran Linear
Discriminant Analyses (LDAs) on the fricative tokens, categorizing place of articulation
for voiced (three categories) and voiceless (four categories) fricatives separately, with
COG and spectral SD as predictors (see § 6.4.1.4 for an explanation of the procedure).
COG and spectral SD values were z-normalised within individual speakers (over
fricative tokens only) in order to abstract away from speaker-individual differences in
mean COG and spectral SD. The results of these two LDAs are shown in Table 7.1,
which is a confusion matrix of predicted and observed category membership (place of
articulation). The upper part of Table 7.1 shows the results for voiced fricatives, the
lower part deals with the voiceless counterparts. Overall correct assignment of place of
articulation amounted to 60% correct for the voiced fricatives (27 points better than
chance, which is 33%). Correct place assignment was 59% for the voiceless
fricatives, which is more than twice as good as chance (= 25%). For both the voiced
and the voiceless fricatives, place assignment based on COG and spectral SD is quite
reasonable, between 64 and 77% correct, with one notable exception: the dental place
of articulation was poorly recognized by the LDA (40% correct or less). The dental
fricatives /θ, ð/ present an obvious problem for the Sudanese-Arabic EFL speakers.
These fricatives are most systematically but asymmetrically confused by the LDA with
labials, showing that the dental tokens are included in the scatter cloud of the labials but
not vice versa. 24
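The within-speaker z-normalisation that precedes the LDA can be sketched as follows. The data layout (a list of speaker and value pairs) and the use of the sample standard deviation are assumptions made for illustration; the LDA itself is not reproduced here.

```python
import statistics


def z_normalise_by_speaker(tokens):
    """Convert (speaker, value) tokens to z-scores computed within each speaker."""
    values_by_speaker = {}
    for speaker, value in tokens:
        values_by_speaker.setdefault(speaker, []).append(value)
    stats = {speaker: (statistics.mean(vals), statistics.stdev(vals))
             for speaker, vals in values_by_speaker.items()}
    return [(value - stats[speaker][0]) / stats[speaker][1]
            for speaker, value in tokens]


# Hypothetical COG values (Hz) for two speakers with different overall means.
tokens = [("A", 4000.0), ("A", 6000.0), ("B", 5000.0), ("B", 7000.0)]
print([round(z, 3) for z in z_normalise_by_speaker(tokens)])
```

After normalisation the two speakers' tokens receive the same z-scores, although speaker B's raw values are 1,000 Hz higher throughout.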
Table 7.1 Observed versus predicted place of articulation, based on Linear Discriminant Analysis
with COG and spectral SD as predictors for voiced and voiceless English fricatives spoken by
Sudanese-Arabic learners. Correct predictions in bold face.
Voicing     Observed place   Predicted place of articulation (%)            N
                             labial   dental   alveolar   post-alveolar
Voiced      labial            70.0     25.0      5.0                        20
            dental            40.0     40.0     20.0                        20
            alveolar          13.6     18.2     68.2                        22
Voiceless   labial            68.2     27.3      4.5         .0             22
            dental            54.5     22.7     13.6        9.1             22
            alveolar            .0     13.6     77.3        9.1             22
            post-alveolar     10.5     15.8      5.3       68.4             19
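The overall percentages of correct place assignment quoted in the text can be recovered from the diagonal cells of Table 7.1, weighting each row by its number of tokens (N). The sketch below uses the percentages and N values from the table; the function name is an assumption.

```python
def overall_percent_correct(rows):
    """rows: (percent correct on the diagonal, number of tokens) per category."""
    total_tokens = sum(n for _, n in rows)
    return sum(percent * n for percent, n in rows) / total_tokens


# Diagonal cells and N from Table 7.1.
voiced = [(70.0, 20), (40.0, 20), (68.2, 22)]
voiceless = [(68.2, 22), (22.7, 22), (77.3, 22), (68.4, 19)]

print(round(overall_percent_correct(voiced)))
print(round(overall_percent_correct(voiceless)))
```

The results, approximately 60% and 59%, are to be compared with chance levels of 33% (three categories) and 25% (four categories), respectively.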
Finally, I performed an LDA on the discrimination of voiced versus voiceless
counterparts, including only labial, dental and alveolar places of articulation. The LDA
was done for all data and for onset versus coda position separately. As before, the LDA
was performed using COG and spectral SD as predictors after intra-individual z-
normalisation. The results of this LDA can be seen in Table 7.2.
24 Previous findings showed that overlapping exists between alveolar and dental fricatives of the
Sudanese-Arabic dialect that suggests a sort of retraction or merger between dental, alveolar and
palato-alveolar sounds (see Dickins 2007, Watson 2002). This retraction forms a major cause of
intelligibility problems, and impedes a precise articulation of fricatives (Cruttenden 2008, Raphael,
Borden and Harris 2003).
Table 7.2 Observed versus predicted voicedness versus voicelessness, based on Linear Discrimi-
nant Analysis with COG and spectral SD as predictors for English onset and coda fricatives
spoken by Sudanese-Arabic learners. Correct predictions in bold face.
Position   Observed voicing   Predicted voicing (%)     N
                              voiced    voiceless
All        voiced              57.6       42.4          85
           voiceless           19.4       80.6          62
Onset      voiced              60.5       39.5          43
Coda       voiced              54.8       45.2          42
Table 7.2 shows that, overall, voiced versus voiceless fricatives are poorly discriminated
in terms of COG and Spectral SD. When the results are lumped together across onset
and coda positions, mean correct assignment of the voicing feature is 67%, which is
only 17 points better than chance. Performance of the algorithm did not improve
noticeably when I performed separate analyses for onset and coda positions, with mean
percentages correct voicing assignment of 69% and 66%. The results reveal a bias
favouring voiceless decisions, indicating that voiced fricatives have greater overlap with
their voiceless counterparts than vice versa.
The automatic determination of voicing is the only possible comparison that can be
made with published data on English speakers. Maniwa et al. (2009) mention an overall
percentage of correctly assigned voicing of 95. Taking this information into account, I
may conclude that the voiced-voiceless distinction is insufficiently well coded in the
COG and spectral SD properties of English fricatives as pronounced by Sudanese-
Arabic learners of English.
7.5.9 Conclusions
The acoustic analysis of temporal and intensity measures of English consonants
produced by Sudanese EFL learners permits the following conclusions:
The Sudanese EFL learners tend to apply Voice Onset Time (VOT) trends that differ
from those of the native speakers of English. The EFL voiced and voiceless plosives
fall in the short-lag range of the native English continuum, most likely due to L1
influence. Therefore, the learners need to enhance their VOT strategies in order to
produce a correct voicing contrast.
English dental and alveolar fricatives /ð, θ, s, z/, labiodentals /f, v/ and /ʃ, ʒ/ have
centre of gravity (COG) values which are closer to one another than in the native
English reference data. Moreover, coda affricates show unstable patterns of duration
(i.e. they tend to be longer) for both the consonants themselves and for the vowels
preceding them.
Although duration values of the preceding vowel show the same (relative) ordering
along the acoustic continuum that is found in the English reference data, the durations
are unstable in comparison to native English realisations. Unstable durations most likely
occurred as a result of categorical differences between the Sudanese learners’ L1
(Arabic) and English. Similarly, consonant duration values tend to be twice as long as
those of L1 English. Most probably, longer durations in the learners’ production are
due to the lack of their knowledge of English consonants and their slow speaking rate.
The Centre of Gravity data reveal relative correspondence to the native English
patterning. Correspondence takes place because the Sudanese data of the sibilant
fricatives appear with spectral peaks at relatively higher frequencies than non-sibilants.
This correspondence occurs probably because Arabic has many consonants that
resemble those of English. However, the COG values of the native speakers’ fricatives
are higher than those of the Sudanese EFL learners. This is probably because the
learners are not skilful enough at producing precise English fricatives due to insufficient
practice or partial learning.
Chapter Eight
Acoustic analysis of
English consonant clusters
8.1 Introduction
This chapter focuses on the production problems with English consonant clusters that
are experienced by Sudanese university EFL learners. It attempts to provide acoustic
accounts of the problems found in the English consonant clusters that were produced
by such learners. Cross-linguistic studies have paid much attention to the acquisition of
English singleton consonants. However, relatively little research has been done on
problems with the production of consonant clusters among EFL learners. Initial and coda
consonant clusters occur in a large number of English vocabulary items, which suggests
the necessity of further effort on the part of Sudanese EFL learners in the production
of consonant clusters. More importantly, research has revealed that incorrect perception
and production of English consonant clusters of two or three segments such as /tr, pr,
spl/ result in intelligibility problems for many second-language learners (see Altenberg
2005, McLeod and Arciuli 2009). More specifically, Sudanese university EFL learners
arguably have problems pronouncing such sounds in words
beginning and/or ending with clusters, like flow, clock, special, twelve, glass, string, proper,
ground, etc. A process of vowel epenthesis often occurs before these clusters (e.g. spell
becomes ispell or espell) or between the cluster members, where flow becomes (‘>’) filow,
glass > gilass, cream > kiream, and text > tekist, etc. (Mohamed 2005, Patil 2006). The
insertion of /ɪ/ between the members of English onset obstruent clusters /s + {t, p, k, l,
w, n, m}/ is intended to make the cluster consonants of English easier to produce. In
general, Arabic syllable structure does not permit consonant clusters of two segments
such as /pl, pr, gr, sp, θw/ or three-segment clusters like /spr, skr, str, spl/, etc., nor does
it allow them in coda position. Similarly, Sudanese Arabic (SA) allows only CV, CVC
and CVV syllables, but complex syllables such as those yielded by English onset and
coda consonant clusters as e.g. in split, twelfths, bursts and glimpsed are forbidden in SA
(Broselow 1984, Kaye 1997, Mohamed 2005, Raimy 1997). Production problems of
English consonant clusters occur due to different constraints on word syllabification
that exist in English and in Arabic. Studies on second-language acquisition attribute
problems with consonant clusters to motoric output constraints that are based on
permissible types of syllables in the first language (Carlisle 2001). These constraints
result in epenthetic vowels among many Spanish speakers of English as a repair strategy
(Altenberg 2005, see also Davidson 2006). Other studies on English consonant clusters
attribute the inaccuracy of production to incorrect acoustic cues used by second-language
learners. This study investigates the learning problems with English consonant clusters
in an experimental approach aiming to find an empirical account of the causes of such
problems. I argue that the acoustic analysis of the durational properties of the English
consonant clusters produced by Sudanese EFL learners will reveal a range of
acoustic differences that might yield effective insight into the issue at hand.
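The syllable-structure constraint described above can be made concrete with a small sketch. It assumes, following the text, that Sudanese Arabic licenses only CV, CVC and CVV syllables. The function name and the CV-skeleton strings assigned to the English words are hypothetical illustrations and are not part of the method used in this study.

```python
# Syllable templates that the text reports as permitted in Sudanese Arabic.
SA_TEMPLATES = {"CV", "CVC", "CVV"}


def is_licit_in_sa(skeleton: str) -> bool:
    """Return True if a CV skeleton matches one of the permitted templates."""
    return skeleton.upper() in SA_TEMPLATES


# Hypothetical CV skeletons for monosyllabic English words from this chapter.
english_examples = {"flow": "CCVV", "glass": "CCVVC", "split": "CCCVC"}

for word, skeleton in english_examples.items():
    status = "licit" if is_licit_in_sa(skeleton) else "not licit"
    print(f"{word}: {skeleton} is {status} in SA")
```

None of the three example skeletons matches a permitted template, which is the mismatch that the literature expects learners to repair.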
8.2 Objective
This section aims to examine the production errors in English consonant clusters made
by Sudanese EFL learners. The investigation attempts to derive acoustic accounts on
the basis of duration differences that may exist between the target and Sudanese EFL
learners’ clusters.
8.3 Participants
8.3.1 Sudanese EFL learners
Eleven male Sudanese-Arabic speakers were recruited primarily from the student
population of the Department of English at Gadarif University. The total number of
the student population was 22. These students specialized in English as a foreign
language. They were all semi-final students who had reached a considerable level of
English proficiency. In general, they used English inside the classroom and in other
academic activities such as debates, discussions, etc. However, only 11 students were
selected for the experiment. Only this subset of students spoke Arabic as their
mother tongue, whilst the others spoke Arabic as a second language.
8.3.2 Native speakers of RP English
Two native speakers (one male, one female) of RP English served as the control
speakers in this study. The EFL speakers’ production will be compared with the
properties of the control speakers’ tokens.
8.4 Methods
8.4.1 Material
Seventeen onset and coda cluster items were chosen as the stimulus
material for this study. These cluster consonants form problem areas for Sudanese
learners of English. All words are meaningful; non-existent words were not used in the
experiment. The items were varied according to certain factors observed in the literature on
production errors in English onset and coda clusters that challenge native speakers of Arabic
learning English (Patil 2006, Altaha 1995). The distribution was as follows:
• Onset clusters of plosive + liquid consonants (4).
• Onset clusters of fricative + plosive + liquid (2).
• Onset clusters of fricative + liquid (2).
• Coda clusters of plosive + fricative, fricative + plosive, nasal + plosive (9).
The set of 17 clusters was almost evenly distributed between onset (eight items) and
coda positions (nine items). For a full list of words included in the experiment see
Appendix 3.3.
8.4.2 Test battery
Recordings were made on a laptop computer using Adobe Audition. In individual
sessions, the subjects (11 Sudanese EFL learners and two model speakers of RP English)
were seated in a quiet room with their lips a few centimetres away from a
head-mounted close-talking swan-neck Sennheiser HSP4 microphone. They were asked
to read a list of monosyllabic English words, which included the 17 target English
consonant clusters. These words were embedded in a fixed carrier sentence
(Say … again). The carrier sentences were intended to help the subjects speak at a
constant rate. Moreover, keywords were provided in the list along with the target words
as a guideline to help learners achieve a correct pronunciation (see Appendix 3.3). The
subjects were encouraged to give their best possible production of the words. If the
experimenter suspected that an error in the production was simply a reading error,
rather than a genuine indication of the subject’s inability to pronounce a certain word,
the subject was asked to repeat the word. The recorded materials were then submitted
to acoustic analysis using the Praat speech processing software.
8.4.3 Praat
For speech analysis, the Praat speech processing programme was used. Praat is an
open-source software tool, which is used for speech-signal editing and labelling, as well as for
various acoustic (spectral, formant and duration) analyses and manipulations (Boersma
and Weenink 1996). It has the further advantages that it can easily be adapted for specific
research purposes and that the results can be exported to Excel-compatible spreadsheets.
8.5 Results of cluster production
In this section I present the acoustic results of the English consonant clusters produced
by Sudanese EFL university learners, in both onset and coda positions. There are two
sections in this part arranged according to cluster position.
This section describes the measured durational properties of the English onset and
coda consonant clusters, which were read by both Sudanese EFL learners and native
speakers of RP English. The measurements were intended to reveal production problems
in the durational properties of the learners’ realisations of the targeted English clusters. In greater
detail, these included all the cluster members of the target items of the test list: the first
(C1), second (C2) and third (C3) consonant cluster members, in both initial and coda
positions. Plosives were split up into two subphones, viz. a silent interval (‘si’) reflecting
the closure duration and a second portion (called ‘rest’) containing the release noise
burst. For the sake of data processing, however, all consonant types were split up into
si- and rest-components, where the si-component was set to zero when the consonant
was not a plosive. Accordingly, Figures 8.2, 8.3, 8.5 and 8.6 use labels
such as C1—si, which stands for the silent interval of the first consonant cluster member.
C1—rest stands for the noise burst duration when the first cluster member is a plosive.
Similarly, C2—si represents the silence duration of the plosive, but this time of the
second consonant cluster member /p, t, k/. In this case, C2—rest indicates the noise
burst of such a plosive. Finally, C3—rest stands for the third consonant cluster member;
in the data these are usually /l/ or /r/. The spectrogram in Figure 8.1 provides an
illustration of the durational components. Notice that the same legend is used for the
coda clusters, where C1—rest refers to the first cluster members /n, ŋ/ or /l/.
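The segmentation scheme just described can be sketched as a small data structure. This is a minimal illustration under stated assumptions: the class, the function and the durations are hypothetical, and the keys mirror the si and rest labels of the figures, written here with a plain hyphen. It is not the script that was used for the measurements.

```python
from dataclasses import dataclass

# Plosives are the only consonants given a non-zero silent interval (si).
PLOSIVES = {"p", "t", "k", "b", "d", "g"}


@dataclass
class Segment:
    phoneme: str
    closure_ms: float  # silent interval; discarded for non-plosives
    noise_ms: float    # release burst (plosives) or the consonant itself (others)


def split_components(cluster):
    """Return the si and rest durations (ms) for each cluster member C1, C2, C3."""
    components = {}
    for position, segment in enumerate(cluster, start=1):
        is_plosive = segment.phoneme in PLOSIVES
        # The si-component is set to zero when the consonant is not a plosive.
        components[f"C{position}-si"] = segment.closure_ms if is_plosive else 0.0
        components[f"C{position}-rest"] = segment.noise_ms
    return components


# Invented durations for an /spl/ onset, for illustration only.
spl = [Segment("s", 5.0, 100.0), Segment("p", 60.0, 20.0), Segment("l", 0.0, 50.0)]
print(split_components(spl))
```

Storing a zero si-component for fricatives, nasals and liquids keeps every cluster member in the same two-field format, which is the data-processing convenience mentioned in the text.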
Figure 8.1 Illustration of the durational properties of the English onset and coda consonant
clusters. It shows the position of each durational component of a consonant cluster in the spectrogram,
using labels such as C1—si, C2—si, C1—rest, C2—rest, C3—rest (further see text).
8.5.1 Onset clusters
Figures 8.2-3 show the mean duration of English initial consonant clusters produced by
the Sudanese EFL learners and by the native speakers of RP English. T-tests were used
to determine the statistical significance of the duration difference per test component
between the two speaker groups. However, the results show no significant differences:
t(11) = –.299 (p = .771, two-tailed) for the silent interval of the onset plosive C1—si,
t(1.038) = 2.0 (p = .293, two-tailed) for C1—rest, t(1.003) = .802 (p = .569, two-tailed)
for C2—si, t(1.218) = 1.2 (p = .435, two-tailed) for C2—rest and t(11) = –.1 (p = .955,
two-tailed) for C3—rest.
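Fractional degrees of freedom such as t(1.038) are what a t-test without the equal-variance assumption (Welch's test) would produce. The text does not name the variant that was used, so this reading is an assumption. The sketch below computes the Welch statistic and its Welch-Satterthwaite degrees of freedom with the Python standard library; the durations are invented and are not the study's data.

```python
from statistics import mean, variance


def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance t-test on two samples."""
    # Squared standard error of each sample mean.
    se2_a = variance(a) / len(a)
    se2_b = variance(b) / len(b)
    t = (mean(a) - mean(b)) / (se2_a + se2_b) ** 0.5
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1)
    )
    return t, df


# Invented durations (ms): 11 learners and 2 native control speakers.
learners = [110, 95, 120, 130, 105, 98, 115, 125, 102, 118, 108]
natives = [70, 140]
t, df = welch_t(learners, natives)
print(f"t({df:.3f}) = {t:.3f}")
```

With only two control speakers, the degrees of freedom can fall anywhere between 1 and 11, and they approach 1 when the variance of the two-speaker group dominates. Such a test has very little power, so the absence of significant differences should be read with caution.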
Figure 8.2 Mean duration (ms) of nine English initial consonant clusters (plosives to the left,
fricative clusters to the right). Components of the clusters are shown as separate bars (further see
text).
Figure 8.3 Mean duration (ms) of nine English initial consonant clusters (plosives to the left,
fricative clusters to the right). Components of the clusters are shown as separate bars (further see
text).
As Figures 8.2-3 show, there is a range of differences between the mean duration
values of the English onset consonant clusters produced by Sudanese and native
speakers of English. The English plosive/liquid clusters /pl, kl, dr/ produced by the
EFL speakers tend to have longer silence durations than those of the native speakers;
mean cluster durations are 164, 117 and 103 ms, against 138, 72 and 97 ms, respectively.
However, the cluster /gl/ shows a shorter duration (102 ms) compared to that of the RP
speakers (124 ms). Moreover, clusters starting with voiceless stops, like /pl, kl/, have
even longer silence durations, but shorter silent intervals are observed when such
clusters contain /gl, dr/. This is in contrast with their counterparts in the RP native
speakers’ tokens, which do not show this pattern of duration distribution. Similarly, the
stop+liquid clusters produced by Sudanese EFL learners showed longer noise burst
values compared with those of the native speakers: these are 30, 29, 49 and 25 ms
against 16, 21, 12 and 29 ms, respectively. Moreover, it was observed that the alveolar
voiceless fricative C1 /s/ in English initial consonant clusters /sl, spl, spr, st, sw/ was
produced with a shorter friction duration than that of the native English speakers.
Mean durations of /s/ in each of these clusters are 114, 97, 94, 111 and 141 ms for the
Sudanese EFL learners and 212, 184, 172, 228 and 242 ms for the native speakers.
More interestingly, in contrast to the results of the native speakers, the C2 of the
Sudanese EFL learners following the English fricative /s/ in /spl, spr/ manifested
longer duration values: 111, 110 and 98 ms against 73, 84 and 57 ms for the native
speakers, respectively. The acoustic analysis of the English clusters of Sudanese EFL
learners indicates no vowel epenthesis, neither in initial English
consonant clusters such as /sl, spr, spl, st, sw/ nor between the two cluster members, e.g.
initial /pl, gl, kl, dr/ and coda /bd/, in contradistinction to what has been suggested in
the literature. More or less unexpectedly, the results show that the stops following the
fricative /s/ which were produced by Sudanese EFL learners have stronger aspiration
than those of the native RP speakers (see also Figure 8.4). These findings suggest the
existence of production problems with initial plosive+liquid clusters and codas.
Figure 8.4 Duration (s) of components of onset clusters beginning with fricatives for English
native speakers and for Sudanese learners of English.
Notice that the overall duration of the fricative clusters is much longer for the native
speakers (317 ms) than for the Sudanese learners (259 ms). It is not the case, however,
that all consonant clusters produced by the EFL speakers are shorter since the plosive
clusters of the learners were about equal in duration to those of the native speakers. In
addition to the shorter duration of the fricative clusters (possibly indicating incomplete
or sloppy articulation), the internal division of the component durations differs
considerably between the foreign and native tokens. In the EFL tokens, the fricative
lasts about as long as the rest of the cluster, whilst the /s/ in the native clusters is about
twice as long as the rest of the cluster.
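The proportions mentioned above can be checked against the mean /s/ durations reported in section 8.5.1. The sketch below is a rough check only. It assumes that the five per-cluster /s/ means can be averaged with equal weight and set against the overall fricative-cluster durations of 259 and 317 ms, which may have been computed over a slightly different set of tokens. The function name is hypothetical.

```python
from statistics import mean

# Mean /s/ durations (ms) per cluster, as reported in the text.
s_learners = [114, 97, 94, 111, 141]
s_natives = [212, 184, 172, 228, 242]

# Overall duration (ms) of the fricative clusters, as reported in the text.
total_learners, total_natives = 259, 317


def fricative_to_rest_ratio(s_durations, total):
    """Ratio of the mean /s/ duration to the remainder of the cluster."""
    s_mean = mean(s_durations)
    return s_mean / (total - s_mean)


ratio_learners = fricative_to_rest_ratio(s_learners, total_learners)
ratio_natives = fricative_to_rest_ratio(s_natives, total_natives)
print(f"learners: {ratio_learners:.2f}, native speakers: {ratio_natives:.2f}")
```

Under these assumptions the ratio comes out at roughly 0.75 for the learners and roughly 1.9 for the native speakers, which agrees with the description of a fricative that lasts about as long as the rest of the cluster versus one that is about twice as long.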
8.5.2 Coda clusters
Figures 8.5-6 show mean duration values of English coda consonant clusters produced
by the Sudanese EFL learners and by the native speakers of RP English. T-tests were used
to determine the difference in duration values per test component between the two
speaker groups. However, the results show no significant differences: t(11) = .064 for
C1—si, t(11) = .983 for C1—rest, t(11) = .009 for C2—si, and t(11) = .213 for C2—rest.
Figure 8.5 Mean duration values (ms) of nine English coda consonant clusters (plosives to the
left, other consonant clusters to the right). (Sub)components of the clusters (Consonant 1,
Consonant 2 and Consonant 3) are shown as separate bars (further see text).
Figure 8.6 Mean duration (ms) of nine English coda consonant clusters (plosives to the left,
other consonant clusters to the right). (Sub)components of the clusters (Consonant 1, Consonant
2 and Consonant 3) are shown as separate bars (further see text).
Results in Figures 8.5-6 show that Sudanese EFL learners tend to produce inaccurate
English coda consonant clusters. This is observed in the production of several cluster
consonants. The production of the English coda cluster /bd/ implies inaccuracy of
acoustic cue implementation compared with that of the native speakers in Figure 8.4.
First, whilst the native speakers tend to make a longer silence duration (177 ms),
Sudanese EFL learners tend to make a shorter duration (118 ms). Second, /bd/
production revealed that the speaking style of the Sudanese learners differs from that of
the native speakers. The findings do not show much difference in terms of cluster types,
which suggests that similar production strategies are used irrespective of cluster type.
8.6 Discussion and conclusions
Findings based on the production of English consonant clusters support the hypothesis
that Sudanese native speakers of Arabic have difficulty with English consonant clusters.
The reversed aspiration process in plosives /p/ and /t/ preceded by /s/ in initial
English consonant clusters such as /spr, spl, st/ read by Sudanese EFL learners (see
Figures 8.2-4) is most likely due to phonological differences which exist between
English and the learners’ L1. As the data of the native RP speakers show, in English
the voiceless /p, t, k/ are aspirated at the beginning of a (stressed) syllable but remain
unaspirated when in final position or when preceded by tautosyllabic /s/ (e.g. Spencer
1996). On the other hand, the Sudanese EFL learners’ plosives /p, t/ following the
fricative /s/ are strongly aspirated, whilst /s/ itself has a weak and short frication (see
Figure 8.4). As related research showed, the L1 stress system is responsible for such a
type of problem since aspiration is dependent on stress (Spencer 1996). In Sudanese
Arabic (and essentially in traditional Arabic syllable structure) there are certain prefixes
such as in in-ka ¥tal ‘was killed’, in-ka ¥sara ‘was broken’ and in -tab ¥dala ‘exchanged’, etc.,
where stress (indicated by ‘¥’) lodges on the second syllable, so that the vowel in the first
syllable escapes stress (Kenstowicz 1994). Thus, due to interference of this L1 syllable-
structure rule, Sudanese EFL learners’ plosives have strong aspiration, while /s/ has a
weak friction. Therefore, it is probably this point of syllable structure that accounts for
the contrast above. Moreover, this view would also account for the absence in my results
of vowel epenthesis as a repair strategy for Arabic EFL speakers, a strategy that
dominates the previous literature. Note, once again, that the acoustic analysis provided
in the present chapter has not indicated any phonetic properties suggesting vowel
epenthesis. 25 That is, there is no epenthetic vowel in the English initial fricative+plosive
clusters of the learners, but an incorrect articulation of initial English fricative+plosive
clusters with a weak friction of English /s/ followed by a strong aspiration of /p, t, k/.
The findings also imply that the occurrence of vowel epenthesis in the production of
English clusters, which is hypothesized to be due to L1 transfer, can be reduced by
different factors such as the learners’ knowledge, practice, modification, etc., of English
consonant clusters. Similar results were reported by related research in which post-test
results revealed that Arab EFL learners produced lower error rates in English
consonant clusters compared to their pre-test findings, where their performance was less
accurate. The results show that the EFL learners managed to explore the
phonotactic constraints of English better after exposure to a small amount of training,
which provides a shortcut to using English constraints. This means that teaching
phonotactics guided the learners’ attention to the presence of such cues. As non-native
learners transfer their L1 phonotactic constraints to English, phonotactics should
represent an important part of L2 ear training and pronunciation programs. This
conclusion indicates that appropriate practice would help EFL learners to perceive and
pronounce, without epenthetic vowels, the legal English consonant clusters that are
illegal in their native language. Moreover, training might play a role in limiting L1
transfer in auditory processing (Al-jasser 2008). 26
25 It is hypothesized that if a schwa occurs between two cluster members, the vocal tract between
the constrictions of the two consonants will be sufficiently open for a vowel to be perceived.
Moreover, the tongue shapes arise if the output of the phonology for CC word-initial clusters
does not actually include a schwa gesture with its own target (Davidson et al. 2004).
26 Actually, vowel epenthesis has been observed in the English of native Arabic speakers. Studies
demonstrated that syllable structure changes by syllable preference laws where, if a change
compromises syllable structure, it is not a syllable change but a change of some other parameters
that may affect syllable structure. Observations made in the relationship between the members of
consonant clusters reveal that closed syllables trigger errors among speakers who come from
languages lacking such types of syllables. Most closed syllables targeted are modified by the L2
speaker through epenthesis or deletion of vowels, due to L1 transfer. This appears among many
native Arabic speaker groups. However, when learners have had considerable exposure to
English, this phenomenon diminishes (Carlisle 2001).
The influence of English spelling can often add to incorrect production of English
consonant clusters. While the English spelling system is complex, colloquial Sudanese
Arabic has a simple phonetic spelling system, which follows a direct letter-sound
correspondence. Because they tend to pronounce English words as they are spelled, my learners
run into serious intelligibility problems (see also Mohamed 2005, Patil 2006).
Learners tend to experience less serious problems with the production of
plosive+liquid and nasal+liquid clusters, probably because Sudanese EFL learners are more
acquainted with this pattern of English clusters. However, the production of the coda
clusters /bd, nt, ts/ by the EFL learners points to low accuracy in English
clusters, or rather to a failure of the speakers to adequately overlap the consonant gestures. More
importantly, the incorrect acoustic correlates of /bd/ suggest
difficulties which the learners face in the production of the English word-final
morpheme -ed, in words like fibbed, clicked, looked, etc. This phonological phenomenon is
widespread among EFL learners whose L1 does not permit word-final clusters.
These errors are attributable to the unfamiliarity of the learners with English consonant
clusters or to their slow speaking style. To avoid such problems, learners have to
improve their abilities to produce English cluster consonants. They need to be mentally
prepared for a major shift in articulation. This requires that both instructors and
learners be cognitively aware of the existence of clusters as complex consonantal
entities, which necessitates additional perceptual effort and conscious articulatory focus.
Chapter Nine
Intelligibility assessment:
written questionnaires
9.1 Introduction
This chapter reports on a written questionnaire that asked informants overt questions about
their speech intelligibility problems, focusing on pronunciation and perception abilities
that represent major components of intelligibility. The questionnaire invited both
Sudanese University EFL learners and their teachers to delineate these problems, giving
details about their nature, causes and the contribution of the courses taught, and so on.
Admittedly, it seems impractical to use a written questionnaire as an instrument of data
collection, asking respondents to report on their actual pronunciation or perception of
a language. Researchers face a number of practical problems when they approach
testing pronunciation by means of writing. For example, they claim that it is not
possible to measure aspects of language that include features like sounds, syllables,
words and connected speech, all of which involve both understanding and speaking, by
asking learners to write answers. Nevertheless, a number of language studies have
addressed matters such as phonetics, phonology and speech problems, across languages,
by means of written questionnaires, which proved to be effective instruments of data
collection. Some aspects of language knowledge are only available to introspection,
which can best be investigated by written questionnaires (Labov 1966, Martinet 1945,
Wells 1999). The use of a written questionnaire as a survey of pronunciation
preferences, for example, has been found to be an effective research tool which permits
the researcher to explore data from a spoken corpus. Additionally, many informants
have strong views about certain investigative matters, but they can only express their
impressions successfully by means of written questionnaires. Furthermore, the
information gathered may help to establish priorities for future work and a series of
tests can provide a sense of achievement that can be motivating for both the students
and their teachers. In this study, written questionnaires were used to provide data that
help predict/infer the effectiveness of the courses taught and the extent to which the
lack of explicit language knowledge and the involvement of L1 rules may lead to
intelligibility problems among Sudanese EFL learners. In other words, they seek answers
to questions such as how intelligible these subjects are to native English speakers (i.e.
distinguishing vowels, single or cluster consonants) and what the causes of intelligibility
problems are.
Moreover, written questionnaires are effective techniques of data collection because
they give candidates freedom and adequate time to describe what they feel towards a
specific linguistic phenomenon.
9.2 Objective
The questionnaires in this study aim to provide feedback about the speech intelligibility
problems of Sudanese University EFL learners. The feedback comes in the form of
impressions and judgments provided by both EFL learners and teachers. Part of the
questionnaire data is also intended to provide background information about speech
intelligibility problems, contributing in this way to the literature.
Moreover, the assessment of the data acquired from these informants may yield insights
which may support or refute the conclusions arrived at by other speech
intelligibility measurements adopted in the study.
Furthermore, part of the information provided by the questionnaire, in the form of
opinions, justifications, or explanations, can be added to the literature as questionnaire
data. Thus, questionnaires work as a further potential source of information, which may
be compared with other sources of knowledge.
9.3 Subjects
Data were collected from twenty respondents: ten Sudanese
university EFL learners preparing for their bachelor’s degree at Gadarif University and
ten school and university teachers of English. Because of resource constraints, the
respondents were sampled purposively (Trochin 2006, Reimer 2008). This approach of
sampling corresponds to statistical tables for the estimation of the sampling error.27
27 Sampling accuracy refers to the measurement of variance that occurs around the estimated
statistics treating a given sample of population. Sample accuracy is important where statistical
tables can show the degree of precision (sampling errors) that is obtainable for samples of
different sizes (e.g. the sampling error of sample size between 10 or 90 equals 3.0%). Interestingly,
we can obtain accurate results with a small sample size. The achievement of accuracy depends on
how truly representative of the population the chosen sample is. When invalid populations are
used, erroneous predictions occur. Moreover, sample sizes should be determined by theoretical
requirements like the precision of the sample operation and ultimately constraints of time and
cost (McCollough and Van Atta 1963).
9.4 The construction of the student and teacher questionnaires
9.4.1 Test content
The content of this test included sample behaviour of the syllabus taught. These were
basic principles of English phonetics and phonology, and perception and production of
English speech sounds. The content included basic English phonology principles such
as phonemes, allophones and acoustic cues, accompanied by practice activities. The
content of the teacher questionnaires included items such as principles of English
pronunciation, perception and production matters and intelligibility problems. It also
covered areas such as structure of the curriculum and the teaching methods of English
and the students’ performance.
9.4.2 Format and structure
Questionnaires in this research (shown in Appendices 9.1a-b) were constructed on the
basis of the related literature on EFL learners (Duan and Gu 2004, Smith 1992, Van
den Doel 2006). There were two types of questionnaires distributed to both the learners
and teachers of English at the Sudanese educational institutes. Each questionnaire
comprised four sections. In both the student and teacher questionnaires, section one
started with preliminary questions that asked the subjects to provide information about
the efficacy of the English phonetics and phonology courses taught and the sort of
problems which were expected to obstruct the speech intelligibility process. Multiple-
choice questions were also raised in sections two and three to test specific issues such
as the level of difficulty experienced in the perception or production of the vowels,
single or cluster consonants of English. In this respect, the subjects were asked to rate the
frequency (a: never, b: rarely, c: often, d: frequently, e: always/permanently) or quality (a:
weak, b: fair, c: good, d: excellent) that they thought would precisely indicate their level
of performance. The students were asked to perform tasks such as match, complete,
underline, etc. In the teachers’ test, the teachers were asked to express their viewpoints
on similar tasks depending on their experience in language teaching. Finally, the
questionnaires were concluded by open-ended questions which sought feedback about
the impact of the lack of L2 phonological knowledge and the interference of the
learners’ L1. Instead of the term ‘phonology’ the word ‘pronunciation’ was often used
in the questionnaire as a synonym.
Although the reliability issue applies mostly to research results and conclusions, I
considered it desirable at the time of the questionnaire design to have reliable (accurate)
tests from an earlier stage and to avoid running the risk of missing data on any relevant
research question. Usually, the reliability of the data is determined according to the
frequency of choices. The more often the item is chosen from among the options given,
the more reliable it is. This is because the more agreement of data sources on a
particular issue, the more reliable the interpretation of the data.
In the data display, the choices are summarised as the mean and standard deviation of the
subjects’ responses to each item. The items of the two
questionnaires are displayed in three groups of tables, according to their domain, i.e. (i)
general matters, (ii) the perception and (iii) the production of English speech sounds,
respectively. This arrangement achieves clarity in data presentation and makes
comparison easier.
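The kind of per-item summary described here (distribution over the five scale values, mean, standard deviation and most frequent response) can be sketched as follows. The function name and the responses are invented for illustration. The use of the sample standard deviation is an assumption, since the text does not state which variant was computed.

```python
from collections import Counter
from statistics import mean, stdev


def summarise_item(responses):
    """Return the distribution over scale values 1-5, mean, SD and modal response."""
    counts = Counter(responses)
    distribution = {value: counts.get(value, 0) for value in range(1, 6)}
    # In case of a tie, the lowest scale value among the most frequent ones is returned.
    modal = max(distribution, key=distribution.get)
    return distribution, mean(responses), stdev(responses), modal


# Invented responses of ten respondents to a single item (not the study's data).
distribution, item_mean, item_sd, modal = summarise_item([4, 4, 3, 5, 4, 2, 4, 3, 4, 5])
print(distribution, item_mean, round(item_sd, 2), modal)
```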
9.4.3 Test procedure/apparatus
The subjects were asked to write down their answers in the appropriate place, or to tick, cross or
account for matters raised in the test. Some students had difficulty in replying to the
test items; e.g., they misunderstood the questions or provided inappropriate answers.
Therefore, I helped them to proceed by translating the test items into Arabic.
Translation and elucidation of test items took place inside the classroom for all subjects
so that the students shared an equal understanding of the test items. On the other hand,
teachers were well aware of the topic of the questionnaire and they provided useful
information. Most of them found completing the questionnaire both demanding and
useful, yet it took some of them up to three months to hand in the answer sheets.
9.5 Scoring procedure
This section describes the scoring procedures that were applied to the questionnaire
data of both the students and the teachers. Each test item is marked by means of grade
descriptors, which express either frequency (often, rarely, etc.) or quality (weak, good,
etc.). Marks range between 5 and 1, where 5 represents the highest mark and 1 the
lowest. The grades are interpreted as follows: (i) in the case of quality grades, 5 equals
A [full mark/excellent], 4 equals B [very good], marks from 2.5 to 3 equal C [good] and
marks from 1 to 2.4 equal D/E [weak/not].
On the other hand, (ii) the occurrence or absence of a language phenomenon such as
an error, problem or difficulty is mostly scored with frequency grades, as follows:
Grade 5 equals A [permanently], which means that the phenomenon always occurs.
Grade 4 equals B [frequently], which means that the phenomenon often occurs. Grades
from 2.5 to 3 equal C [neutral with respect to frequency], marks from 2.4 down to 2.0
equal D [rarely] and 1 equals E [none]. Note that in all cases the tables below present
the results in terms of scale values, highlighting the most frequent responses.
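The mapping from scale values to grade labels can be sketched as a simple lookup. The following is a minimal illustration, not the procedure actually implemented in the study: the function name `grade` and the two band tables are hypothetical, and the scheme described above leaves some values unassigned (e.g. means between 3 and 4, or between 1 and 2 on the frequency scale), so the sketch assumes that such values fall into the band below them.

```python
# Lower bound of each band, highest first. Labels follow the scheme described
# above; assigning the unlisted gaps to the band below is an assumption.
QUALITY_BANDS = [(5.0, "A"), (4.0, "B"), (2.5, "C"), (1.0, "D/E")]
FREQUENCY_BANDS = [(5.0, "A"), (4.0, "B"), (2.5, "C"), (2.0, "D"), (1.0, "E")]


def grade(score, bands):
    """Return the label of the highest band whose lower bound the score reaches."""
    if not 1.0 <= score <= 5.0:
        raise ValueError("scale values range from 1 to 5")
    for lower_bound, label in bands:
        if score >= lower_bound:
            return label


print(grade(2.6, QUALITY_BANDS))    # C
print(grade(2.2, FREQUENCY_BANDS))  # D
```

Keeping the bands in a table rather than in nested conditions makes it easy to adjust the cut-off points if the gaps are to be treated differently.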
9.6 Overall results
9.6.1 Results of the student questionnaire
I will now present the results of the questionnaire obtained from the students. I will
first deal with the items that ask about general matters, then deal with questions relating
to perception problems and finish with the items that ask about production problems.
Since the sample of respondents is fairly small, it is important to ascertain that the
respondents show at least a reasonable agreement amongst themselves. Agreement is
expressed in terms of the reliability coefficient called Cronbach’s alpha. The coefficient
computed for the ten respondents was α = 0.860, which shows that the level of
agreement was good.
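For readers unfamiliar with the coefficient, Cronbach’s alpha can be computed as sketched below. This is only an illustration of the standard formula, alpha = k/(k − 1) × (1 − sum of the individual variances / variance of the summed scores), with the k respondents treated as the "items" of the reliability analysis. The function name and the toy ratings are hypothetical and are not the data of this study.

```python
from statistics import variance


def cronbach_alpha(ratings):
    """Cronbach's alpha for a list of columns, one list of scores per respondent.

    Every respondent must have answered the same questionnaire items in the
    same order, so all columns have equal length.
    """
    k = len(ratings)
    if k < 2:
        raise ValueError("at least two respondents are needed")
    n_items = len(ratings[0])
    if any(len(column) != n_items for column in ratings):
        raise ValueError("all respondents must answer the same items")
    sum_of_variances = sum(variance(column) for column in ratings)
    # Summed score per questionnaire item across all respondents.
    totals = [sum(column[i] for column in ratings) for i in range(n_items)]
    return k / (k - 1) * (1 - sum_of_variances / variance(totals))


# Toy example: two respondents who give identical answers to three items.
print(cronbach_alpha([[1, 2, 3], [1, 2, 3]]))
```

Values close to 1 indicate that the respondents rank the questionnaire items in much the same way; values near 0 indicate little agreement.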
9.6.1.1 General Matters
This section presents the students’ assessment of their own level of intelligibility and
their impressions of the courses taught.
Table 9.1 presents the students’ responses to the items on intelligibility and on the
courses taught. The table shows the distribution of the ten responses per item
over the five scale values, as well as the mean and standard deviation of the scale values.
Table 9.1 Student responses to the four questionnaire items that pertain to general matters. The
table shows the distribution of the ten responses per item over the five scale values, as well as
the mean and standard deviation of the scale values. The modal (most frequent) response
category is highlighted in the table. See appendix 9.1a for a verbatim copy of the questionnaire
items.
                                       Scale value
No.   Item                             1   2   3   4   5   mean   SD
1.1   Understand spoken English        2   0   8   0   0   2.6    0.84
1.2   Native speaker understand you    1   3   4   2   0   2.7    0.94
1.3   Practical & interesting courses  0   1   6   0   3   3.5    1.08
1.4   Relevant & authentic courses     0   1   5   1   3   3.6    1.07
Generally, the results of the questionnaire in Table 9.1 show that the Sudanese
university learners of English have difficulty in identifying English speech sounds. The
subjects claim to habitually face intelligibility problems; however, they have a positive
impression of the phonology and phonetics courses taught.
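The summary columns of the tables can be recomputed from the response counts. The sketch below is a minimal illustration with a hypothetical function name; it assumes the sample standard deviation (dividing by n − 1), which reproduces the tabulated values for item 1.1 of Table 9.1.

```python
from statistics import mean, stdev


def summarise(counts):
    """Return (mean, SD, modal scale value) for one questionnaire item.

    counts[i] is the number of respondents who chose scale value i + 1.
    If two scale values tie for the mode, the lower one is returned.
    """
    responses = [value for value, n in enumerate(counts, start=1) for _ in range(n)]
    modal_value = max(range(1, len(counts) + 1), key=lambda v: counts[v - 1])
    return round(mean(responses), 2), round(stdev(responses), 2), modal_value


# Item 1.1 of Table 9.1: two responses of 1 and eight responses of 3.
print(summarise([2, 0, 8, 0, 0]))  # (2.6, 0.84, 3)
```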
9.6.1.2 Perception of English speech sounds
The section below presents the students’ results for the degree of difficulty and
success that the learners experience in the perception of the English phonemes.
Table 9.2 presents the responses to the ten questionnaire items that pertain to speech
sound perception. It shows the distribution of the ten responses per item over the five
scale values, as well as the mean and standard deviation of the scale values. The modal
(most frequent) response category is highlighted in the table. For tabulation purposes,
some items in the table are identified by only one word; see Appendices 9.1a-b for the
verbatim text of each questionnaire item.
Table 9.2 Distribution of responses of the students in the written survey about the intelligibility
problems they experience. The table shows the distribution of the ten responses per item over
the five scale values, as well as the mean and standard deviation of the scale values. The modal
(most frequent) response category is highlighted in the table. See appendix 9.1a for a verbatim
copy of the questionnaire items.
                                                Scale value
No.     Item                                    1   2   3   4   5   mean   SD
1.3.1   Successful in perceiving E. consonants  0   1   9   0   0   2.9    0.31
2.3.2   Plosives                                0   2   6   2   0   3.0    0.66
3.3.3   Fricatives                              0   4   5   1   0   2.7    0.67
4.3.4   Nasals                                  0   4   4   2   0   2.8    0.78
5.3.4   Approximants                            1   1   6   2   0   2.9    0.87
6.3.6   Difficulty with final clusters          0   1   9   0   0   2.9    0.31
7.3.7   Difficulty with initial clusters        1   2   5   2   0   2.8    0.91
8.3.7   Difficulty to distinguish short vowels  0   1   7   2   0   3.1    0.56
9.3.7   Difficulty to distinguish long vowels   0   6   3   0   1   2.6    0.96
10.3.7  Difficulty to distinguish diphthongs    7   3   0   0   0   1.3    0.48
The results in Table 9.2 show that the students have difficulty recognizing short vowels,
long vowels and diphthongs. Moreover, the long vowels and diphthongs are more
problematic than the short vowels. More importantly, the results reveal that the
subjects do not report serious problems in the perception of English consonants,
although fricatives and nasals were often found to be somewhat difficult. Initial
consonant clusters are reported as less problematic than final clusters. Thus, these
results indicate that the English consonants are more intelligible to Sudanese listeners
of English than the vowels and consonant clusters.
9.6.1.3 Production of English speech sounds
In this section, I will present the results for the learners’ production of English speech
sounds. These cover the learners’ ability to produce L2 sounds correctly and their level
of intelligibility.
Table 9.3 presents the types of problems the students experience in producing the
English speech sounds. It also gives background information on the level of success
these students think they achieved in learning English speech sounds and the effect of
their L1. Table 9.3 shows the distribution of the ten responses per item over the five
scale values, as well as the mean and standard deviation of the scale values. The modal
(most frequent) response category is highlighted in the table.
Table 9.3 Student responses to the questionnaire items that pertain to speech sound production.
The table shows the distribution of the ten responses per item over the five scale values, as well
as the mean and standard deviation of the scale values. The modal (most frequent) response
category is highlighted in the table. See appendix 9.1a for a verbatim copy of the questionnaire
items.
                                                                Scale value
No.      Item                                                   1   2   3   4   5   mean   SD
5.2.5    Problems with pronunciation                            0   4   0   6   0   3.2    1.03
6.2.6    Difficulty experienced with E. sounds                  0   1   9   0   0   2.8    0.33
1.4.1    How successful in producing E. cons                    0   0   10  0   0   3.0    0.00
2.4.2    Successful in producing E. plosives                    2   0   7   1   0   2.7    0.94
2.4.3    Successful in producing E. fricatives                  1   1   5   3   0   3.0    0.94
2.4.4    Successful in producing E. nasals                      1   4   5   0   0   2.4    0.69
2.4.5    Difficulty in producing E. approximants                3   1   5   1   0   2.4    1.77
3.4.3    Difficulty in producing E. plosives                    1   2   4   3   0   2.9    0.99
4.4.4    Difficulty in producing final clusters                 0   2   6   2   0   3.0    0.66
5.4.5    Difficulty in producing short vowels                   2   4   4   0   0   2.2    0.78
6.4.6    Difficulty in producing long vowels                    4   4   2   0   0   1.8    0.78
7.4.7    Difficulty in producing diphthongs                     7   3   0   0   0   1.3    0.48
9.4.9    Difficulty to pronounce cloth, rich, chair, etc.       0   2   6   2   0   3.3    0.82
10.4.10  Difficulty to pronounce words with silent letters      0   1   6   2   1   2.8    1.22
11.4.11  Difficulty to pronounce here, there, three, final /r/  0   0   4   0   6   4.2    1.03
12.4.12  Words ending in -ary, -ory, -able                      2   1   4   1   2   3.0    1.41
13.4.13  Learning E. pronunciation improves intelligibility     0   1   2   2   5   5.0    0.00
8.4.8    Mother-tongue affects E. pronunciation                 2   0   7   0   1   2.6    0.84
According to the results in Table 9.3, English pronunciation forms a permanent
problem for Sudanese university EFL learners. The English vowels are reported as the
most problematic area. The subjects claim to make more errors in producing the
English diphthongs and long vowels, and fewer errors in producing the short
vowels. On the other hand, the subjects think that their pronunciation of
English single and cluster consonants is more correct. Moreover, they claim to make
more pronunciation errors in final clusters than in initial clusters. This finding supports
related literature, which reported that more English pronunciation errors are detectable
in coda clusters (Patil 2002). In the practical tasks involving fricatives, nasals, stops and
words with silent letters, etc., the students report no difficulties. However, they admit
that lack of awareness of the L2 pronunciation norms and L1 interference have an
influence on their intelligibility in English.
9.6.2 Results of teacher questionnaires
9.6.2.1 General Matters
A reliability analysis of the teachers’ responses to the questionnaire was carried out first.
Cronbach’s alpha was computed as before but turned out to be rather low in the case of
the teacher responses, α = 0.553, which shows that the level of agreement was poor to
moderate at best. Closer inspection of the reliability data revealed that one respondent
correlated negatively with each of the other nine teachers. Therefore, I eliminated the
single contradictory respondent and recomputed alpha, which then rose to α = 0.616,
indicating at least moderate reliability.
I will now present the results of the questionnaire obtained from the teachers. I will
first deal with the items that ask about general matters, then deal with questions relating
to perception problems and finish with the items that ask about production problems.
Table 9.4 presents the responses about the courses, learning strategies and intelligibility
components of the survey, which was conducted through teacher assessment of student
performance.
Table 9.4 Distribution of teacher responses to the four questionnaire items that pertain to general
matters. The table lists the responses per item over the five scale values, as well as the mean and
standard deviation of the scale values. The modal (most frequent) response categories are
highlighted in the table. See appendix 9.1b for a verbatim copy of the questionnaire items.
                                                 Scale value
No.   Item                                       1   2   3   4   5   mean   SD
1.1   How intelligible are the students?         4   3   2   1   0   2.0    1.05
1.2   Is intelligibility pronunciation-related?  1   2   6   1   0   2.7    0.82
1.3   Are learning strategies effective?         1   4   2   2   0   2.6    1.01
1.4   Relevant & authentic courses?              0   2   5   3   0   3.1    0.74
The results of the teacher questionnaires (Table 9.4) show that teachers think
favourably of the courses and the learning strategies, which indicates that the courses
are effective and are urgently needed for the achievement of speech intelligibility. The
teachers’ assessments also reveal a tight relationship between pronunciation and speech
intelligibility problems. These results support the students’ findings (cf. Tables 9.1 and
9.4).
9.6.2.2 Perception of English speech sounds
I will now present the results of the questionnaire obtained from the teachers. The
results deal with problems facing the learners in identifying English speech sounds.
Table 9.5 presents the responses to questionnaire items referring to learning difficulties
and the effects of some linguistic factors on intelligibility of Sudanese learners of
English. The components of the survey were conducted through teacher assessment of
student performance.
Table 9.5 Distribution of responses of the instructors in the written survey about the intelligibility
problems facing Sudanese EFL students. The table lists the ten responses per item over the five
scale values, as well as the mean and standard deviation of the scale values. The modal (most
frequent) response category is highlighted in the table. See appendix 9.1b for a verbatim copy of
the questionnaire items.
                                                                 Scale value
No.   Item                                                       1   2   3   4   5   mean   SD
2.1   Difficulty to regroup with same vowel/consonant sound      0   6   2   2   0   2.6    0.84
2.2   Difficulty to find out an odd vowel/consonant sound        1   5   4   0   0   2.3    0.68
2.3   Difficulty to discriminate between voiced/voiceless cons.  1   0   6   3   0   3.1    0.88
2.4   Difficulty perceiving E. final clusters                    1   2   6   1   0   2.7    0.82
2.5   Difficulty perceiving initial clusters                     1   2   6   1   0   2.7    0.82
2.6   Difficulty to distinguish E. short vowels                  0   7   2   0   1   2.5    0.97
2.7   Difficulty to distinguish long vowels                      4   4   2   0   0   1.8    0.79
2.8   Difficulty to distinguish diphthongs                       2   6   2   0   0   2.0    0.67
2.9   Degree of perception errors due to L1 interference         0   4   5   0   1   2.8    0.92
2.10  Degree of perception errors due to lack of L2 knowledge    0   4   5   1   0   2.7    0.68
The results of the teacher questionnaires reveal that Sudanese listeners of English
encounter difficulties recognizing English speech. According to the assessment of the
language teachers (Table 9.5), the subjects concerned repeatedly make errors in the
perception of the English phonemes. The English vowels are reported as the most
difficult to understand, i.e., the listeners’ level of perception of the short, long and
diphthongal vowels is claimed to be poor. Despite the fact that the single and cluster
consonants of English are a bit more intelligible than vowels, these too constitute a
perception problem. The instructors claim that their students regularly have difficulty
regrouping and sorting out words with the same consonant sounds (minimal pairs or
quartets). It is worth noting that the students’ and the teachers’ questionnaires reflect
the same judgment, viz. that English vowels are more difficult to understand than the
consonants (cf. Tables 9.2 and 9.5).
9.6.2.3 Production of English speech sounds
I will now present the results of the questionnaire obtained from the teachers that deal
with problems facing the learners in producing English speech sounds.
Table 9.6 presents the responses about learning difficulties and the effects of some
linguistic factors on intelligibility of Sudanese learners of English. The components of
the survey were conducted through teacher assessment of student performance.
Table 9.6 Distribution of instructors’ responses to the questionnaire items that pertain to speech
sound production of Sudanese EFL students. The table lists the ten responses per item over the
five scale values, as well as the mean and standard deviation of the scale values. The modal (most
frequent) response category is highlighted in the table. See appendix 9.1b for a verbatim copy of
the questionnaire items.
                                                               Scale value
No.    Item                                                    1   2   3   4   5   mean   SD
2.1.4  To pronounce fricatives /s, z/, /θ, ð/, /f, v/, /ʃ, ʒ/  1   6   3   0   0   2.2    0.63
2.1.5  To produce a consistent vowel quality                   4   6   0   0   0   1.6    0.52
3.3.1  Difficulty in producing initial E. clusters             2   5   2   0   1   2.3    1.16
3.3.2  Difficulty in producing final clusters                  2   4   4   0   0   2.2    0.79
3.1.1  Difficulty in producing short vowels                    2   6   1   0   0   1.9    0.60
3.1.2  Difficulty in producing long vowels                     2   5   2   0   1   2.3    1.16
3.1.4  Difficulty in producing diphthongs                      4   5   1   0   0   1.7    0.68
2.2.1  Mother-tongue interference                              0   3   6   1   0   2.8    0.63
2.2.2  Use universals to achieve intelligibility               3   5   1   0   1   2.1    1.20
2.2.3  Avoid difficult sounds                                  3   2   2   2   1   2.6    1.43
2.2.4  Overgeneralisation                                      2   3   3   1   1   2.6    1.27
2.2.5  Substitute sounds of L1 for L2                          2   5   2   1   0   2.2    0.92
2.2.6  Ability to dissociate sounds of L1 for L2               3   5   2   0   0   1.9    0.74
As the assessment by Sudanese English language teachers shows (Table 9.6), Sudanese
EFL learners have little knowledge of the English vowels and they are weak in the
pronunciation of such vowels, but they show an acceptable level in producing the
English consonants. These instructor assessments clearly converge with the results of
the student questionnaire; the students claim to frequently face difficulties in the
pronunciation of short, long and diphthongal vowels of English, but they assert having
no serious problems in the production of the consonants (compare Tables 9.3 and
9.6). The results also reveal that the speakers frequently substitute L1 sounds for L2
sounds in their production of the English sounds.
9.7 Correlation between student and instructor judgments
It could be observed in the preceding sections that the Sudanese EFL students and
their instructors often agree on which aspects of English pronunciation and listening
ability are easy or difficult. This is a good thing: it would be highly undesirable if
students had a completely different view of their strengths and weaknesses than their
instructors. In order to quantify the degree of correspondence between student and
teacher judgments, Figure 9.1 presents the mean ratings of the students and teachers on
the intelligibility of Sudanese learners of English in a scatterplot for the 15
questionnaire items that are shared between students and instructors. The black line,
which runs across the figure, shows the linear relation between the students’ responses
and the corresponding responses given by the instructors. The dotted line is a reference
line which defines positions in the graph where students and instructors would have
given the same evaluation of the students’ performance.
Figure 9.1 Scatter plot of the mean ratings of the two subject groups. The tight clustering of
data points around the line is indicative of a positive correlation between the students’ and
instructors’ results.
Figure 9.1 shows that there is a moderately strong linear relationship between the
students’ and the instructors’ judgments, with a significant positive correlation of r
= .569 (p < .025, one-tailed). This indicates that the self-rated performance by the
students and the assessment of the students’ performance by their instructors
correspond reasonably well to each other, although the students tend to have a more
optimistic view of their proficiency than their instructors have, as is evidenced by the
fact that the majority of the scatter points in the graph lie above the reference line.
9.8 Discussion and conclusions
The results of the written questionnaires revealed speech intelligibility problems
experienced by Sudanese EFL learners. According to the assessments and responses of
both Sudanese EFL learners and teachers, the perception and production of the
English vowels are the areas where the subjects’ performance is worst. The subjects
attributed such problems to L1 influence and to the lack of explicit language
knowledge. Many results in the previous literature support their conclusions
(Mohammed 1991, Fahal 2004).
The results also revealed that RP long vowels and diphthongs proved to be more
difficult to learn than short vowels. Similar results were reported in related studies
where native speakers of Arabic have difficulty distinguishing between English
central and back vowels such as /ɑ, ɔː, ʊ/ as in cot, caught and boat, all of which are often
pronounced as /ɔː/ due to the absence of these vowels from their L1 (Brett 2004).
On the other hand, the results of both the students and the language teachers suggest
that the English single and cluster consonants are comparatively better perceived and
produced by the Sudanese learners than the vowels. This is probably because the
learners are more familiar with consonant sounds than vowels.
An interesting finding is that the content of the phonology and phonetics syllabus
taught is assessed as effective and feasible. This finding hints at an inconsistency
between the students’ performance and their assessment of the syllabus taught. In
other words, the data reveal that the students’ scores on the English speech sounds are
generally low, especially those on the vowels, whilst the courses are described as
practical and interesting. This is probably because other factors, such as insufficient
cognitive knowledge or communicative context, contributed to this problem.
Moreover, the correlation coefficient shows a positive relation between the judgments
of the students and the teachers, r = .569 (p < .025, one-tailed), which indicates that
the students and the teachers are in agreement with each other in the feedback
obtained through the questionnaires.
The correlation also revealed that the students’ results concur with those of the teachers;
however, the students’ judgments tend to be higher than those of their instructors,
probably because the former are not critical enough of their own level of achievement.
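The significance level reported for this correlation can be checked by hand. The sketch below converts r into a t statistic with n − 2 degrees of freedom, assuming n = 15 (the number of questionnaire items shared by students and instructors). The function name is hypothetical, and the two critical values are taken from a standard t table for 13 degrees of freedom.

```python
import math


def t_statistic(r, n):
    """t value for testing a Pearson correlation r based on n paired observations."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


# One-tailed critical t values for 13 degrees of freedom (standard t table).
T_CRIT_025 = 2.160
T_CRIT_01 = 2.650

t = t_statistic(0.569, 15)
print(round(t, 2))                     # 2.49
print(t > T_CRIT_025, t > T_CRIT_01)   # True False
```

Under these assumptions the statistic exceeds the one-tailed critical value for p = .025 but not the one for p = .01.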
Chapter Ten
Conclusion
10.1 Introduction
Studies in English as a second/foreign language are paying more attention to the
learning problems of ESL/EFL learners. They attempt to answer questions such as
what makes EFL/ESL learners more intelligible to native speakers? and what makes it
difficult for native English speakers to understand L2 speech? In this study,
the investigation focuses on receptive and productive intelligibility problems
of Sudanese EFL learners. That is, it attempts to measure the comprehension and
production skills of these learners when they are involved in interactions
with native English speakers. The final aim of the study is to scrutinize the nature and
linguistic causes of these problems. In doing so, I assume that most learning problems
of ESL/EFL learners are due to insufficient L2 knowledge and transfer of the learners’
L1. In a broader context, attention to receptive and productive proficiency in
ESL/EFL has increased due to the educational and academic desire to improve the oral
communicative abilities of Sudanese EFL learners, who should be able to use English
in a variety of communicative domains (politics, science, education, trade, commerce,
etc.).
The experimental work treats two language domains, L2 speech production and
perception, which represent the major components of speech intelligibility. Specifically,
the investigation targets segmental analysis of the English vowels, single consonants
and consonant clusters, which form the basic building blocks of spoken words. This is
because (some) linguists assume that more than 50 percent of the intelligibility of a
spoken utterance depends on correct sound production (Fraser 2005) rather than on
other matters such as syntax and morphology.
This chapter identifies and elucidates the issues approached in the present study, on the
problems of speech intelligibility among Sudanese university EFL learners. The study
has yielded a large amount of information concerning the topic at hand. Each area of
investigation in this study contributes to an understanding of the problems of speech
intelligibility facing these learners and therefore to a better understanding of how the
entire problem could be approached. The chapter will provide an account of the most
general aspects of this knowledge divided into three sections: summary, conclusion and
recommendations.
10.2 Summary
The study represents an experimental attempt to explore the problems of
speech intelligibility experienced by Sudanese EFL learners (see aims in section above).
The investigation is based on the measurement of segmental intelligibility of English
speech sounds, which was evaluated by three auditory discrimination tests, reported in
different chapters (chapters 3, 4 and 5). To execute these discrimination tasks, I used the
Modified Rhyme Test (MRT), which asks listeners to indicate in multiple-choice format
what they heard. Test stimuli included vowels, onset and coda single consonants and
consonant clusters of English in meaningful monosyllabic words. These words were
read in a fixed carrier phrase (Say … again). The second part of the stimuli consisted of a
number of simple predictable and meaningful sentences adapted from the Speech
Perception in Noise (SPIN) test, such as To open the jar twist the lid., in which the listeners
had to write down the sentence-final keyword only.
For the assessment of phonemes and multi-phonemes (consonant clusters), the
responses were scored as either intelligible or unintelligible. A speaker with a score of
(close to) 100% can be interpreted as completely intelligible, while a score below 50% is
considered indicative of unintelligible performance (Lafon 1966). Word intelligibility at
the sentence level was determined by the percentage of final words in SPIN sentences
that were correctly identified. However, partially correct answers were also examined
since they give information about the differential perception of phonemes in onset,
nucleus and coda position.
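The scoring just described can be illustrated with a small sketch. It is not the scoring script used in the study: the function names and the response data are hypothetical, and each word is assumed to be split in advance into onset, nucleus and coda. Whole-word scoring treats a response as either correct or incorrect, whereas position-wise scoring gives credit for partially correct answers.

```python
POSITIONS = ("onset", "nucleus", "coda")


def word_score(pairs):
    """Percentage of responses that are identical to the target word."""
    return 100 * sum(target == response for target, response in pairs) / len(pairs)


def position_scores(pairs):
    """Percentage correct per position, so partly correct answers still count."""
    scores = {}
    for index, name in enumerate(POSITIONS):
        hits = sum(target[index] == response[index] for target, response in pairs)
        scores[name] = 100 * hits / len(pairs)
    return scores


# Hypothetical (target, response) pairs, each split into onset, nucleus and coda.
pairs = [
    (("b", "ɪ", "t"), ("b", "e", "t")),
    (("k", "æ", "t"), ("k", "æ", "t")),
    (("s", "iː", "d"), ("s", "iː", "t")),
    (("f", "ʊ", "l"), ("f", "ʊ", "l")),
]
overall = word_score(pairs)
print(overall, position_scores(pairs))
print("below the 50% criterion" if overall < 50 else "at or above the 50% criterion")
```

In this toy data set half of the words are fully correct, while the position-wise scores show that all errors are located in the nucleus or the coda.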
Different groups of participants were involved in performing the auditory
discrimination and word recognition tests mentioned above. These include Sudanese
EFL learners (listeners/speakers), native speakers of RP English (either listeners or
speakers) and Dutch and American listeners of English. None of these listeners
participated in the experiments more than once. Experimental data of perception tests
delineate the type of information that listeners extract from the tests and the
performance of the listeners as better or worse, and so on. Phenomena to which
listeners are particularly sensitive are conflation, substitution, deletion or addition of
segments in the three tests.
Next, the study included three production tests targeting the measurement of the
acoustic correlates of vowels, consonants and clusters spoken with Sudanese-Arabic
accented English (chapters 6, 7 and 8). In the data analysis the results obtained for the
Sudanese EFL speakers were compared to acoustic properties of the same (or similar)
stimuli produced by native speakers of RP English.
Comparison of Sudanese EFL learners’ data with native-speaker control data
forms an essential ingredient in the evaluation of both auditory and productive task
performance. The purpose of comparison is to scrutinize issues like relative accuracy,
correctness and standardization of the learners’ performance in both auditory and
productive tasks. As part of the comparison, I involved Dutch and American listeners
of English in auditory tasks to obtain a better understanding of the learning problems
investigated. Written questionnaires were also used (chapter 9) as part of an assessment
process that asked the participants (both Sudanese EFL students and their teachers) to
reply to questions and perform tasks. The purpose of the written questionnaires is to
supply data about the same field of investigation but with a different instrument.
Moreover, the data may reinforce the findings obtained from the perception and
production tests. The final objective of the accounts is to give insight into the nature
and causes of learning problems of speech intelligibility. Accounts also provide
statistical insight into these errors in terms of means, frequency and correlation for
credibility purpose. In this way, segmental error patterns manifest in the learners’
performance will present answers to the questions raised and clarify the causes of
intelligibility problems experienced by Sudanese EFL learners.
There is a clear convergence in the results of the tests throughout the study. Moreover,
the findings support many tendencies reported in previous literature.
10.3 Conclusion
This section provides answers to the questions that are raised in the study. It also
provides miscellaneous conclusions in other aspects of the research that do not address
specific research questions.
10.3.1 Nature of speech intelligibility problems of Sudanese EFL learners
The findings of this study reveal that Sudanese EFL learners face speech intelligibility
problems: they experience difficulties in recognizing and producing native
English speech. The learners’ perception level (e.g. segmental intelligibility as quantified
by means of the Modified Rhyme Test) of English speech sounds is low. Their mean
correct scores in the identification test of the English vowels, single coda consonants,
coda clusters, and word recognition in SPIN sentences, which represent the most
problematic areas, are 47.8, 66.0, 71 and 33%, respectively (Chapter 3). More
importantly, as EFL listeners, the learners produced different types of error patterns
when they were involved in interaction with native speakers of English. These errors
included the confusion of the English /ʌ/ and /ɔ/, substitution of /ʊ/ for /uː/ and /f/
for /v/, and confusion of the coda clusters /st~sk/ and /bd~ld/, etc. These results
demonstrate that speech intelligibility may vary significantly depending on the speech
sounds present in the native language. The properties of the native language of the
learner and those of the target language determine the direction of difficulties and
pattern of errors that learners experience with L2 learning. This was observed in the
learners’ acoustic results of the English vowels, as shown by automatic classification
through Linear Discriminant Analysis (LDA). The classification data indicate that
vowels are problematic and reveal a variety of error patterns, such as the confusion
of /uː/ as /ɔː/ and substitutions of /ɜː/ for /æ, ʌ, e, or ɔ/, /ʌ~ɔ/, etc. As the study
concludes, these patterns of errors indicate that the learners apply their L1 strategies to
the learning of English speech sounds.
Speech intelligibility problems also arise when Sudanese EFL learners are involved in
interaction with native British and American listeners. The data reveal that such
problems occur because the learners produce incorrect English speech sounds. For
example, the confusion of /æ~ʌ/ and /ɔ~ʊ/ occurs due to phonological differences
between L1 and L2. The learners also often make production errors of a non-phonetic
nature such as those that relate to the difference of spelling systems between English
and Arabic. For instance, front vowel /e/ is frequently replaced by /ɪ/ due to the
influence of the Sudanese-Arabic spelling system (Chapters 4, 5 and 6). Similarly, the
substitutions of the English consonants, e.g. /s/ for /θ/, /t/ for /d/ etc., frequently
occur, particularly between L1 and L2 phonemes which share similar phonetic features.
As the study reveals, the ultimate cause of these problems is that the native listeners
are not able to determine how the sound structures of the learners’
speech work. Thus, the native listeners’ failure to discover the systematicity of the
learners’ speech production makes it difficult for them to interpret the speech signal
correctly. The study also reveals that Sudanese-Arabic accented English deviates from
the native norms of English. Deviations become manifest primarily when the learners
attempt to produce English speech, where many systematic errors occur across sound
categories.
10.3.2 Intelligibility of Sudanese university EFL learners to native listeners of English
How intelligible are Sudanese university EFL learners to native English listeners? The
answer to this question accounts for productive intelligibility of Sudanese EFL learners
to native speakers of English. The learners show various levels of speech intelligibility
to the native listeners of English. Variation depends on the types of English speech
sounds and tasks involved. To start, both British and American listeners face
perception problems with practically all EFL vowels, part of the onset and coda
consonants and clusters produced by the Sudanese learners. The English speech sounds
which were produced by Sudanese EFL learners were identified by both British and
American listeners less successfully than when the same test items were read by a native
speaker of RP English.
Figure 10.1 below summarizes the differences in intelligibility of the Sudanese and
native speakers of English, as established from the responses given by British and
American native listeners.
Moreover, these results concur with the data obtained from the SPIN sentences where
Sudanese learners show lower intelligibility scores, of 69.2 and 64.8% with native
British and American listeners, respectively. In the SPIN words, the vowel nuclei
proved to be more difficult than singleton and cluster consonants (see Figure 10.1; see
also Chapter 5).
On the other hand, Sudanese EFL listeners have difficulty in understanding native (RP)
English speech. The lowest perception scores were found for the English vowels
(around 48% correct) and for word recognition in the SPIN test (around 30% correct).
Moreover, the perception of the coda consonants and clusters proved more difficult
than that of single onset consonants and consonant clusters: 66 and 71% against 94
and 75%, respectively (Chapter 3). The negative correlation of r = –.682 (p < .05)
between identification scores obtained for vowels and onset consonants indicates that
poorer identification of vowels goes together with better results for onset consonants.
On the other hand, vowels rather than consonants displayed a fairly high positive
correlation with correct word identification (r = .700, p < .01). It would seem therefore
that correct vowel identification is a more important determinant of word recognition
than identification of either onset or coda consonants. This conclusion may not be true
of word recognition in general but it seems valid in the special situation where native
English listeners are confronted with Arabic-accented English, in which the quality of
the consonants is generally better than that of the vowels.
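The correlation coefficients reported above can be illustrated with a small computation. The following is a minimal sketch, assuming that r denotes the Pearson product-moment correlation. The function name pearson_r and the score lists are hypothetical and are not the data of this study.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical percent-correct scores for five listeners (NOT the study's data)
vowel_scores = [40.0, 45.0, 50.0, 55.0, 60.0]
word_scores = [22.0, 30.0, 27.0, 35.0, 38.0]
r_vowel_word = pearson_r(vowel_scores, word_scores)
```

A value of r close to +1 indicates that listeners with higher vowel scores also tend to have higher word recognition scores, whereas a negative value indicates the opposite tendency.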
Figure 10.1 Summary of perception differences of vowels, consonants and clusters of English
spoken by a Sudanese EFL learner and a British speaker of English.
The vowel results in Chapter 6 also respond to the question raised above. The chapter
examines the intelligibility of Sudanese EFL learners to native listeners of English,
where English listeners were simulated by Linear Discriminant Analysis (LDA). The
results of the acoustic analysis of the English vowels spoken by Sudanese EFL learners
appear to be relatively similar to their counterparts in Chapter 5, where the same vowel
tokens were identified by native English listeners.
These conclusions suggest that although the learning of the English speech sounds is
problematic in general, vowels in particular form a major element blocking intelligibility.
Moreover, the results are consistent with previous studies in which non-native English
listeners had greater difficulty in decoding impoverished (LPC-resynthesized) speech
than human speech (Reynolds, Bond and Fucci 2006). Some types of speech
intelligibility problems of Sudanese EFL learners indicate their limited English skills.
Linguistically, there is a distance between the learners’ L1 and L2, a factor that constitutes an
essential source of their intelligibility problems. On the other hand, native
listeners/speakers benefit from their similar national background and so they show a
high intelligibility level.
10.3.3 The most difficult sounds
English vowel production proved to be the most difficult aspect for the Sudanese EFL
learners, as the results have shown. A fair conclusion is that these learners make
relatively more production errors in English vowels than in English singleton and
cluster consonants. Acoustically, there is a large spectral contrast between the English
vowels produced by Sudanese EFL learners and those of the native speakers. Unlike
those of the native speakers, which show a similar distribution in the vowel space across
speakers, the English vowel tokens of the learners are incorrectly distributed in the vowel
space. The members of short/long (lax/tense) vowel pairs are closer to each other,
whilst the central and back vowels of the learners exhibit no relation to the native
English vowel repertory. Statistically, identification results obtained by Linear
Discriminant Analysis (LDA) reveal rather poor English vowel production for the
learners targeted. When the LDA was trained on RP data but tested on L2 vowels,
correct automatic identification was only 42%. Comparison of the LDA results with those
of the human identification of the same vowel tokens (reported below) provides
strong additional support for the conclusion that vowels are the most difficult sound type to pronounce.
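The train-on-native, test-on-learner procedure can be sketched as follows. This is a simplified illustration only: it assumes two predictors (F1 and F2 in Hz), equal prior probabilities and three vowel categories, and all formant values are invented for the example. The function names train_lda and classify are hypothetical, and the analysis in Chapter 6 may have used other predictors and software.

```python
import math

def train_lda(tokens):
    """Fit a two-predictor LDA: class means plus the inverse of the pooled
    within-class covariance matrix. tokens is a list of (label, f1, f2)."""
    by_class = {}
    for label, f1, f2 in tokens:
        by_class.setdefault(label, []).append((f1, f2))
    means = {}
    s11 = s12 = s22 = 0.0
    for label, points in by_class.items():
        m1 = sum(p[0] for p in points) / len(points)
        m2 = sum(p[1] for p in points) / len(points)
        means[label] = (m1, m2)
        for f1, f2 in points:
            d1, d2 = f1 - m1, f2 - m2
            s11 += d1 * d1
            s12 += d1 * d2
            s22 += d2 * d2
    dof = len(tokens) - len(by_class)  # degrees of freedom of the pooled estimate
    s11, s12, s22 = s11 / dof, s12 / dof, s22 / dof
    det = s11 * s22 - s12 * s12
    inverse = ((s22 / det, -s12 / det), (-s12 / det, s11 / det))
    return means, inverse

def classify(model, f1, f2):
    """Return the label with the highest linear discriminant score (equal priors)."""
    means, inv = model
    best_label, best_score = None, -math.inf
    for label, (m1, m2) in means.items():
        w1 = inv[0][0] * m1 + inv[0][1] * m2
        w2 = inv[1][0] * m1 + inv[1][1] * m2
        score = w1 * f1 + w2 * f2 - 0.5 * (w1 * m1 + w2 * m2)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Invented F1/F2 values in Hz (NOT measurements from this study)
native_tokens = [
    ("i:", 280, 2250), ("i:", 300, 2300), ("i:", 320, 2200),
    ("a:", 680, 1100), ("a:", 700, 1050), ("a:", 720, 1150),
    ("u:", 310, 900), ("u:", 330, 950), ("u:", 350, 870),
]
learner_tokens = [
    ("i:", 310, 2150), ("a:", 650, 1200), ("a:", 500, 1000), ("u:", 400, 1100),
]

model = train_lda(native_tokens)  # train on native data only
hits = sum(classify(model, f1, f2) == intended for intended, f1, f2 in learner_tokens)
percent_correct = 100.0 * hits / len(learner_tokens)
```

In this toy example the third learner token, intended as "a:", falls closer to the native "u:" category and is therefore counted as an identification error, which is how a low percentage of correct automatic identification arises.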
The English vowels produced by Sudanese EFL learners were correctly identified by
British and American listeners (Chapter 5) in 68 and 63 percent of the cases, respectively, as
determined by the Modified Rhyme Test. Their performance on the learners’ single
consonants and clusters is relatively better. The single consonants were correctly
identified at 85.0 and 84.8% by the British and American listeners, respectively, while
scores on clusters were 84 and 88%, respectively.
In the preceding section it was concluded that correct vowel identification correlates
significantly with the word recognition scores obtained by native English (and
American) listeners to Sudanese-accented English. No such relationship could be
established for consonant identification and word recognition. The conclusion follows,
then, that vowel pronunciation is not only the most difficult problem for the Sudanese
learners, but the errors they produce are also most detrimental to their intelligibility at
the sentence level.
On the basis of the joint evidence provided by identification by machine (through LDA)
and by human listeners, it appears that Sudanese EFL learners find the pronunciation
of English vowels the most difficult. This would imply that vowel nuclei frequently are
an essential ingredient of correct word production. A further point is that when an
English vowel presents a perception problem, it also presents a production problem.
This confirms that there is a relationship between the strategies Sudanese EFL
learners use in learning English vowels and the patterns of errors they make, i.e. an L1 effect
(Flege 1981, 1995).
Thus, these findings are consistent with the literature that found a strong
relationship between segmental errors and degradation of intelligibility. That is, the
involvement of the L1 articulation system causes Arabic speakers of English to
substitute L1 /a/ for the English /æ/ phoneme in words like add, bat and dad. Similar
problems occur in the learning of the English obstruents /θ~s/ and /z~ð/ (Arslan and
Hansen 1996). These findings also show that the perception errors of English speech
sounds often predict production errors and vice versa.
10.3.4 Linguistic causes of intelligibility problems
This part gives an account of the linguistic causes of the intelligibility problems of
Sudanese learners of English.
10.3.4.1 L1 and L2 Inventory differences
This section addresses the phonological differences that exist between English and
Sudanese Arabic. It seeks evidence of phonemic contrasts between these languages
and discusses how these contrasts may affect the learning of the target language.
It is assumed that there are differences between the inventories of these languages that
compromise the learners’ perception and production of English speech sounds. For
more detail, see § 2.1, which presents a contrastive analysis of the two inventories.
10.3.4.2 Lack of explicit knowledge aggravates the intelligibility problems
This section seeks to account for how lack of explicit knowledge of English hinders the
intelligibility of Sudanese EFL learners. It is argued that the mastery of English
phonetics and phonology is necessary for the achievement of intelligibility (see § 2.2).
10.3.4.3 Procedure of error analysis
The research is motivated by the questions of what Sudanese EFL learners actually acquire or hear when
they attempt to learn English, which part of their output deviates from the correct
norm of the target language, and what the causes are (e.g. differences in sound inventories,
differences between L1 and L2 rules, and the lack of explicit L2 knowledge). To
answer these questions, error analysis methods were applied as a scientific procedure that
serves to obtain credible explanations (Ellis 2003, Taylor 1986).
Error analysis. This refers to a systematic procedure of identification, description and
explanation of errors made by the learners. The aim is to see what linguistic elements
are responsible for these errors. There is a need for a corpus of errors made by the
learners, which would enable the researcher to detect such elements in the performance.
The frequency of errors of various types made by the learners is determined in the corpus.
However, if some errors do not occur frequently, this does not mean they are less
difficult; they are still of interest. To determine error frequency, a survey of the
performance of a number of Sudanese, Dutch and native speakers of English was
carried out. Most errors made by the Sudanese learners in the perception and
production of English speech sounds range between 30% and 90% in the area of vowels,
coda consonants, onset consonants and onset and coda clusters, while minor errors
range between 10% and 20% (see Chapters 3-9).
Data collection and error identification. I collected samples of the learners’ language that
effectively illustrate the features of their performance in order to compile a
comprehensive list of errors. The sample involved all the results of the study in which
the learners took part as either listeners or speakers (Chapters 3-9). Accounts of errors
are based on a number of mechanisms, starting with the
recognition of an error (definition) and the effect of L1 transfer, where the presence of
L2 errors mirrors L1 transfer. Other mechanisms are the process of using L2
knowledge in performance (in particular data dealing with communication problems),
the importance of explicit knowledge of L2 speech sounds, training transfer and the
utilization of innate knowledge of linguistic universals (unmarked or common
phenomena). In regard to these mechanisms, the learners’ output
represents an important source of evidence for speech errors that occur at the level
of the segment, syllable and word. Related literature, observations and analyses from rigorous
research also provide data that help to assess, to make decisions and to determine
where errors occur, i.e. which speech sounds cause students difficulties, and what their
frequency and gravity are. Similarly, the data from the written questionnaires of the Sudanese
EFL learners and teachers constitute an extra source of information. Finally, the
collected data is expected to provide a deeper understanding of the nature and
classification of speech intelligibility problems of the learners concerned.
Error description. The description of the learner errors (the learners’ problems in speech
perception and production) involved a comparison with performance in the target
language. This refers to the performance of the native listeners/speakers who
participated in the study as control groups, of the other groups involved, and to related
previous studies. Error description in this context identified problem areas such as
confusion or substitution of speech sounds.
Error explanation and classification. This refers to the description of the source of the
problems of speech intelligibility that Sudanese EFL learners faced. It is an attempt
to establish the processes that are expected to be responsible for the occurrence of
these errors. A tentative classification of the errors then follows, aimed at identifying
the nature of the source of such errors. Classification treats types of errors such as
interference error (reflecting the L1), intralingual error (reflecting failure to learn, or
partial/incomplete learning of a rule), developmental error (reflecting errors that occur
while a learner is building faulty hypotheses about L2), etc. Tables 10.1 and 10.2 below
account for the causes of the errors/problems of the learners. In Table 10.1,
two gross error categories will be distinguished which describe the pattern observed in
the target language, i.e. English. In the error pattern which will be called ‘confusion’,
two sounds are used interchangeably as response categories. This is a symmetrical
confusion pattern. I reserve the term ‘substitution’ for asymmetrical confusion patterns
whereby a sound that should be perceived as phoneme /x/ is (more or less)
consistently perceived as a token of phoneme /y/ but not vice versa.
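The distinction between symmetrical ‘confusion’ and asymmetrical ‘substitution’ can be made concrete with a small sketch that operates on a stimulus-by-response count matrix. The function name error_pattern, the 0.2 cut-off and all counts are assumptions made for illustration. The study itself does not specify a numerical criterion here.

```python
def error_pattern(matrix, a, b, threshold=0.2):
    """Label the error pattern for phonemes a and b.

    matrix[stimulus][response] holds response counts. The threshold is an
    illustrative assumption, not a value taken from the study.
    """
    def rate(stimulus, response):
        row = matrix[stimulus]
        total = sum(row.values())
        return row.get(response, 0) / total if total else 0.0

    a_as_b = rate(a, b)  # proportion of a-stimuli reported as b
    b_as_a = rate(b, a)  # proportion of b-stimuli reported as a
    if a_as_b >= threshold and b_as_a >= threshold:
        return "confusion"  # symmetrical: errors in both directions
    if a_as_b >= threshold:
        return f"substitution ({a} heard as {b})"
    if b_as_a >= threshold:
        return f"substitution ({b} heard as {a})"
    return "no systematic error"

# Hypothetical response counts (invented for illustration)
counts = {
    "e": {"e": 12, "I": 8},
    "I": {"I": 13, "e": 7},
    "th": {"th": 9, "s": 11},
    "s": {"s": 19, "th": 1},
}
vowel_pattern = error_pattern(counts, "e", "I")
fricative_pattern = error_pattern(counts, "th", "s")
```

With these invented counts the vowel pair is labelled a confusion, since errors go in both directions, whereas the fricative pair is labelled a substitution, since errors go in one direction only.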
Table 10.1 Causes of errors and/or speech intelligibility problems experienced by Sudanese EFL
learners in this study with focus on perception problems.
Perception Errors
No. | Category | Description | Example | Explanation
1 | Confusion | Listeners fail to discriminate between central and back vowels. | /ʌ~ɒ/, /ɑː~ɔː/, /ɒ~ʌ/ | L1 interference: when L2 knowledge is lacking, learners fall back on the habits of their L1.
2 | Confusion | /ɜː/ is misperceived in words like work/worse. | /ɜː/ as /ɒ, ʌ, e/ | Involvement of L1 (due to partial learning or insufficient L2 knowledge).
3 | Confusion | /e/ and /ɪ/ are misperceived interchangeably, as in enter, pet. | /e~ɪ/ or /ɪ~e/ | Partial learning/transfer of L1 orthography, incorrect perceptual representations.
4 | Confusion | Listeners fail to distinguish between such vowel tokens. | /ɑʊ~ʊ/, /ɪ~e/, /aɪ~eɪ/ | Partial learning and L1 interference.
5 | Substitution | Listeners mistake the English onset /r/ for Arabic /w/. | /r~w/ | Due to close F2 and F3 (partial learning or limited L2 knowledge).
6 | Substitution | Listeners fail to distinguish between such pairs. | /g~k/, /d~t/, /f~v/, /z~s/ | Voicing feature resists learning due to insufficient knowledge.
7 | Substitution | Listeners hear English /θ/ as /s/. | /θ~s/ | Incorrect perceptual representations. Learners carry over L1 phonetic habits into English.
8 | Substitution | Phonological alternations or misperception of a cluster or one of the cluster members. | /kl~gr/, /sl~sn/, /spr~spl/, /nz~mz/, /st~sk/ | The speech signal is not detected well (lack of L2 experience, unfamiliarity, or place/manner of articulation effect).
In Table 10.2, which summarizes error patterns found in the EFL production data of
the subjects, similar terminology is used. Here the pattern which is called ‘confusion’,
denotes a symmetrical error pattern: two phonemes which should be kept distinct in
English are used indiscriminately. In the substitution pattern phoneme /x/ is used (and
perceived as such by native English listeners) when phoneme /y/ should be used but
not vice versa.
Table 10.2 Causes of errors and/or speech intelligibility problems experienced by Sudanese EFL
learners in this study with focus on production problems.
Production Errors
No. | Category | Description | Example | Explanation
1 | Confusion | Learners fail to discriminate between central and back vowels. | /ɒ~ʌ/, /ɜː~ɑː/, /ɒ~ʊ/, /ɛ~ɑː/, /aʊ~ʊ/ | Incorrect perceptual representations (L1 effect and lack of L2 knowledge).
2 | Substitution | Learners fail to discriminate between English fully front and central vowels. | /æ>ʌ/ | Incorrect vowel source (L1 interference).
3 | Substitution | Learners fail to discriminate between front vowels. | /e>ɪ/ | Spelling/graphical differences between L1 and L2.
4 | Substitution | Diphthongs are rendered as monophthongs. | /eɪ>e/ | L1 interference (Sudanese Arabic vowel source).
5 | Substitution | Voiced and voiceless fricatives are substituted in initial and final positions. | /θ>s/, /ð>z/ | Incorrect representations of English fricatives due to L1 filter effect (also reduced acoustic contrast in COG).
6 | Substitution | Learners fail to distinguish between consonants of the same place of articulation. | /z>s/, /ŋ>n/ | Lack of clear distinctive voicing feature (lack of L2 exposure).
7 | Substitution | Learners show no clear distinction in producing either the first or the second onset cluster member. | /kl>gl/ in onset | Weak explosion of the voiceless velar: phonotactic restrictions between L1 and L2.
8 | Substitution | Learners show no clear distinction in producing either the first or the second coda cluster member. | /nt>ŋk/, /nz>mz/ in coda | Unclear voicing feature: phonotactic restrictions between L1 and L2.
9 | Acoustic feature (VOT) | Learners produce incorrect or imprecise voice onset time (VOT), particularly in coda plosives. | /b, p/, /d, t/, /g, k/ | Involvement of L1 acoustic correlates.
10 | Acoustic feature | Inaccurate or incorrect production of fricative+plosive+liquid or fricative+liquid clusters. Weak friction and strong aspiration. | /sl, spl, spr, sk/ | Involvement of L1 acoustic cues (use of learner’s L1 strategy).
10.3.5 Pedagogical implications of error analysis
The explanation of the sources of the speech intelligibility problems above helps to
infer the following:
1. It seems that, generally, many of the intelligibility problems are due to either L1
interference or lack of explicit L2 knowledge. Thus, L1 interference leads to errors
caused by incorrect perceptual representations, wrong acoustic features, incorrect
implementation of L2 rules, etc. On the other hand, the lack of explicit L2
knowledge causes problems such as intra-lingual (partial learning) errors, errors
due to insufficient practice, orthographic errors, wrong implementation, training
transfer and errors due to unfamiliarity. Linguists attribute the former group of
errors to competence and the latter group to performance (Corder 1974, Ellis 2003,
Taylor 1986).
2. Acoustic data contributes to the interpretation of some error patterns with a
degree of certainty because it forms experimental evidence. The data supports the
accounts given for many of the errors made in the perception and production of the
English speech sounds. Errors are important in themselves since they constitute
evidence for the learning device that EFL students follow to learn L2 speech
sounds. The errors of Sudanese EFL learners thus represent an area of interest
and significance for teachers as well as for developers of curricula and teaching
materials, and motivate them to devise appropriate materials and effective teaching
techniques and to construct tests that suit the different levels and match the
needs of the learners.
3. In terms of difficulty and error frequency, vowels, particularly central and back
vowels, are the most vulnerable to mispronunciation. The English consonants,
particularly fricatives and the final consonant clusters, also proved problematic.
Importantly, some of these errors carry a high functional load while others do not,
which affects the EFL speaker’s intelligibility. Therefore, it is
useful for second-language teachers to use functional load rankings as a way to deal
with these errors. Instructors can focus on the most difficult areas of language
learning, where some errors are more salient to the listeners/speakers than others;
the less salient errors are those of low functional load.
10.3.6 Findings of the acoustic analysis of the English speech sounds
10.3.6.1 Acoustic analysis of English vowels
1. Spectrally, the Sudanese Arabic-accented English vowels differ from those of
native speech. The vowels, particularly /e, ʌ, ɜː, ɒ, ɑː, uː, ɔː, ɪ, æ/,
occupy different locations in the vowel space of the EFL learners when
compared to native speaker data. In contrast to the vowels produced by native
speakers of English, the vowels of Sudanese speakers are distinguished by lower
formant values. These differences stem from differences between L1 and L2.
2. The English vowel durations of the Sudanese learners correspond significantly
to the ordering of native durations (r = .943, p < .01). The strong correlation
is caused, among other reasons, by the excellent distinction on the part of
the EFL speakers between the duration of short versus long vowels. In fact, some
tense vowels were lengthened more than they should have been, because
the learners tend to produce English vowels with their L1 productive
strategies. The difference in duration between short and long vowels in Arabic
tends to be greater than between lax and tense vowels in English. Nevertheless,
such a correspondence helps the EFL speaker achieve better intelligibility.
10.3.6.2 Acoustic analysis of English consonants
The results for the acoustic correlates in this study reveal differences between the
English consonants produced by Sudanese EFL learners and those of native
English speakers (Chapter 7). These differences are the following:
1. The English voice onset time (VOT) produced by the Sudanese EFL learners
differs strikingly from the native pattern, in that the VOT of both the voiced and
voiceless stops falls in the short-lag range. The native speakers’ VOT is categorical,
where the voiced plosives fall within the short-lag range and the voiceless plosives
have VOT values in the long-lag range. Moreover, the learners’ VOT does not
reflect the effect of the place of articulation in which VOT should increase as the
stop consonant is articulated further back in the mouth. Acoustic differences such
as these indicate that the Sudanese-Arabic learners have difficulty in both detecting
and producing the precise voicing features of English stops.
2. Generally, the vowel duration values of the English vowels preceding obstruents
show relative correspondence to native English voicing contrast, but final
fricatives and affricates slightly differ. These differences are due to the L1
strategies and the slow speaking style of the learners.
3. The durations of the English consonants correspond to the English native norm,
where the voiceless obstruents have longer duration values than the voiced.
However, coda affricates have deviant (and probably incorrect) duration values.
4. The centre of gravity (COG) measurements reveal a relative correspondence to the
English pattern. The Sudanese-accented sibilant fricatives show spectral peaks at
relatively higher frequencies than non-sibilants, as they also do in native English
speech. This correspondence probably occurs because Arabic has many consonants
that resemble those of English. However, the COG values of the native
speakers are higher, in comparison to those of the Sudanese learners, possibly due
to a difference in speaking style.
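The centre of gravity measure mentioned in point 4 can be illustrated as follows. This is a minimal sketch, assuming that COG is the mean frequency of a signal frame weighted by its power spectrum. The function names centre_of_gravity and tone, the naive DFT and the synthetic test tones are illustrative assumptions, not the measurement procedure of Chapter 7.

```python
import cmath
import math

def centre_of_gravity(samples, sample_rate):
    """Power-weighted mean frequency (Hz) of a short signal frame.

    Uses a naive DFT over the bins from 0 Hz up to the Nyquist frequency.
    """
    n = len(samples)
    weighted_sum = 0.0
    power_sum = 0.0
    for k in range(n // 2 + 1):
        coeff = sum(s * cmath.exp(-2j * math.pi * k * t / n)
                    for t, s in enumerate(samples))
        power = abs(coeff) ** 2
        weighted_sum += (k * sample_rate / n) * power
        power_sum += power
    if power_sum == 0.0:
        raise ValueError("signal has no energy")
    return weighted_sum / power_sum

def tone(freq_hz, sample_rate=8000, n=80):
    """Synthetic sine wave used only as test input (not speech data)."""
    return [math.sin(2 * math.pi * freq_hz * t / sample_rate) for t in range(n)]

low = centre_of_gravity(tone(1000), 8000)
mixed = centre_of_gravity([a + b for a, b in zip(tone(1000), tone(3000))], 8000)
```

A signal with more energy at high frequencies, as is typical of sibilant fricatives, yields a higher centre of gravity than a signal whose energy is concentrated at lower frequencies.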
10.3.6.3 Acoustic analysis of consonant clusters
Acoustic measurements of the English clusters reveal that the Sudanese EFL learners
have problems in producing English consonant clusters.
1. The learners’ plosives /p/ and /t/ in clusters beginning with the fricative /s/ are
strongly aspirated, whilst /s/ has weak frication. This contrasts with the pattern of the
native English speakers, where the voiceless /p, t, k/ are aspirated only at the
beginning of a syllable but remain unaspirated when final or when preceded by /s/
in the same syllable.
2. The production of the English coda clusters proved to be difficult. They hardly
show a learning pattern that converges toward the native norm. The results also
show that speech intelligibility problems may have to do with the distribution of
sounds. A few errors in the English coda cluster consonants, like substitutions of
/nz~mz/ and /nz~dz/, seem to reflect the effect of differences in sound distribution
between Arabic and English.
10.3.6.4 Findings of the written questionnaires
The written questionnaire data represents a useful contribution to the research. The
results strongly support the findings of the experimental chapters in the study. The
findings of both the students and the teachers show that there is a speech intelligibility
problem among Sudanese EFL learners. For example, the results reveal that the
learners have problems in recognizing native English speech sounds and that they also find
it difficult to produce English short vowels and diphthongs, fricatives and the nasal pair
/n~ŋ/. However, both the students and the language teachers claim that the English
single and cluster consonants are comparatively better perceived and produced by the
learners than the vowels. The respondents attribute these problems to the lack of explicit
knowledge, L1 interference and insufficient practice. A reliability analysis revealed that
the Sudanese EFL learners show a high agreement amongst themselves (Cronbach’s α
= .860), i.e. they share the same views of their strengths and weaknesses when it comes
to perceiving and producing English sounds. The agreement within the group of
instructors is lower (α = .616, even after eliminating the most atypical respondent),
which shows that the instructors’ opinions on the students’ strengths and weaknesses
are more diversified. Nevertheless, students and instructors are in reasonable agreement
when the analysis is restricted to only those items that are shared between the student
and teacher versions of the questionnaire (r = .569), although overall the students rated
their level of proficiency in English more positively than their instructors did.
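The reliability coefficient reported above can be illustrated with a short computation. The following is a minimal sketch of Cronbach’s alpha, assuming that respondents are treated as the ‘items’ of the scale and questionnaire items as the cases. The function name cronbach_alpha and the ratings are hypothetical and are not the questionnaire data of this study.

```python
from statistics import variance

def cronbach_alpha(ratings):
    """Cronbach's alpha for a table of ratings.

    ratings: one row per questionnaire item, one column per respondent.
    Respondents are treated as the 'items' of the scale (an assumption).
    """
    k = len(ratings[0])  # number of respondents
    if k < 2:
        raise ValueError("need at least two respondents")
    columns = list(zip(*ratings))
    sum_of_variances = sum(variance(column) for column in columns)
    totals = [sum(row) for row in ratings]  # summed rating per questionnaire item
    return k / (k - 1) * (1 - sum_of_variances / variance(totals))

# Hypothetical ratings on a 5-point scale: 5 questionnaire items x 3 respondents
ratings = [
    [1, 2, 1],
    [2, 3, 3],
    [4, 4, 5],
    [5, 5, 4],
    [3, 3, 3],
]
alpha = cronbach_alpha(ratings)
```

Values of alpha close to 1 indicate that the respondents order the questionnaire items in much the same way, whereas lower values indicate more diverse opinions within the group.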
10.3.7 Recommendations
In the following subsections, a number of recommendations will be made for the
teaching of perception and pronunciation of English in the context of the English
curriculum taught at Sudanese universities. It should be pointed out at this juncture that
not all of the recommendations follow from my experimental work in a strict sense.
They may also be based on observations found in the literature (and discussed in this
dissertation) or on English teaching practice that I observed while living outside Sudan.
10.3.7.1 Focus on speech sound production in isolation and in context
Higher priority should be given to the production of English speech, which represents
a major learning problem for Sudanese EFL learners. In this respect, the emphasis in
production should be on getting the sounds right at the word level, dealing with words
in isolation and with words in controlled sentence environments. This way of practising speech
production enables learners/instructors to recognize which sounds are the most
difficult to distinguish (e.g. in minimal pairs like /ɒ~ɔː/ as in cot/caught and /e~ɜː/ as in
bed/bird), which can have a negative impact on intelligibility when not properly
distinguished.
Moreover, production instruction should place more emphasis on language as
communication, as this will motivate successful production. Pronunciation must be treated as a
necessary component of intelligibility: the learners should surpass the threshold
level so that their production does not hinder their communicative abilities.
10.3.7.2 EFL teachers need specific assistance
Sudanese EFL learners who specialize in ELT at teacher colleges and education
faculties should attain a high level of intelligibility, since they represent a model for
English input to their students. Therefore, they should receive special assistance that
enables them to do their job properly. For example, listen-and-imitate techniques,
language laboratory exercises, free conversations, minimal pair drills, etc. are required.
Phonetic description of the articulatory system of the target language is also important
since it offers the learners an opportunity to develop explicit knowledge about the
perceptual representations of L2 sounds. This is because learners cannot produce a
speech sound correctly unless they acquire correct perceptual information about the L2.
10.3.7.3 Experimental approach to problem solving
Future researchers should pay more attention to speech intelligibility problems,
teaching pronunciation, perception, listening skills, etc., as these issues have received relatively
little attention. Their investigations should use experimental evidence to account for the
learning problems concerned, rather than using impressionistic views. Results which are
obtained by means of experiments have some degree of certainty and are scientifically
more credible than impressionistic judgments, especially when the impressions are voiced
by observers who are not native speakers of the target language.
10.3.7.4 Use of language labs to teach foreign languages
Language laboratories are needed to maintain a high level of training in foreign or
second language learning. Learners need to acquire an accurate perceptual
representation of the speech sounds of the target language, which is a necessary
prerequisite for pronouncing the foreign speech sounds adequately. The language
laboratory forms the most suitable place to practise phonetic exercises.
10.3.8 Suggestions for further studies
Taking cues from the results, further large-scale and comprehensive investigations
should be conducted to cover other areas that have to do with the speech intelligibility
issue in the Sudanese EFL classroom. Research will therefore be required on the
following themes:
Insufficient practice, wrong implementation and partial learning represent major causes
of such problems. So, a further study that treats the use of the language laboratory to
teach English phonetics and listening comprehension skills in Sudanese EFL teacher
colleges should be conducted. The primary focus of spoken language is communication,
where listening represents the most important skill in both listening to understand and
listening to imitate. Skills such as these can successfully be developed through language
laboratory exercises that train learners to achieve accurate perception and production of
the sounds of the new language. Moreover, when listening to a foreign language, it is
necessary to know the sounds, rhythms, tunes and stress patterns of that language. A
language laboratory will provide the right environment where the learners can practise
such pronunciation tasks, which will benefit the students’ intelligibility.
Further study is also needed to investigate the possibility of giving more space to
English pronunciation in the curriculum. The materials and classroom activities
included in secondary and tertiary syllabi in Sudanese EFL settings scarcely incorporate
pronunciation teaching. The proposed study can focus on the teachability-learnability
scale, i.e. on which English pronunciation features should be taught and how to sequence
and teach these features with consideration of the differences that exist in the learners’
L1. An important area to be considered is the segmental level, which includes vowel
and consonant sounds as well as syllables. Item sequencing in the syllabus should begin
with the basic sound knowledge, which covers vowels, consonants and clusters, and
should end with words and sentences. The study should also consider to what extent
the explicit teaching of basic phonetics (for instance the organization and function of
the speech organs, such as lips, teeth, alveolar ridge, palate, tongue, vocal folds, etc.) is
helpful in the acquisition of EFL pronunciation skills.
Since Sudanese EFL learners receive training to become qualified teachers, it is
important that these learners should master language skills, particularly pronunciation,
which forms the major component in oral communication. Therefore, research that
assesses the learners’ command of intelligible and comprehensible production of
English speech is necessary. Such research can investigate the possibility of finding
effective ways of pronunciation evaluation targeting students preparing for BA or B.Ed.
degrees in teaching English as a foreign language. Assessment can consider many
activities such as interviewing the EFL teachers to find out what techniques they use to
teach pronunciation. In the class, assessors can make a list of the techniques and
methods that the trainee-teacher employs in teaching pronunciation. The teacher’s
philosophy in teaching pronunciation is also important. Several points should be
addressed here. For example, (i) the amount of time teachers spend on the explanation
of specific pronunciation items, (ii) whether the instructor provides a good model of
pronunciation that students benefit from, (iii) the explicit knowledge of the phonology
which the instructor has about the L2, (iv) the ability to use contrastive analysis in
establishing differences and similarities between L1 and L2 and (v) the effectiveness of the
teachers’ correction of the students’ deviant pronunciation. The study should also
consider, as one of its goals, the assessment of the testing system to be implemented at
the end of the pronunciation course. This can target test construction treating content,
format and time allowed, and the scoring procedure established.
References
Abdalla, S. Y. (2001). Drop in English performance among Sudanese students at
secondary and tertiary levels. Paper presented at: Drop in English standards
among Sudanese university students, A seminar organized by the Institute of
Abdulmajeed Imam for Humanities, Khartoum, Sudan.
Abdalla, S. Y. (2005). Towards a functional approach to the English research on the
writing skills in Sudan. Unpublished Ph.D. dissertation, Khartoum University.
Adank, P., Smits, R. & Van Hout, R. (2004). A comparison of vowel normalization
procedures for language variation research. Journal of the Acoustical Society of
America 116(5), 3099-3107.
Ahmed, M. O. (1988). Vocabulary learning strategies: A case study of Sudanese learners
of English. Unpublished Ph.D. dissertation, University College of North
Wales, Bangor, UK.
Ahmed, M. S. (1984). An experimental investigation of emphasis in Sudanese
colloquial Arabic. Unpublished Ph.D. dissertation, University of Reading.
Al-Arishi, A. Y. (1991). Quality of phonological input of ESL and EFL trained teachers.
System. An International Journal of Educational Technology and Applied Linguistics 19,
63-74.
Al-Arishi, A. Y. (1992). Positional /p, b/ phonological variability in the speech of
Arabic EFL students. Journal of King Saud University, Arts 4(2), 91-107.
Alan, S. K. (1997). Arabic and its relationship to the other Semitic languages. Arabic
phonology – Phonology of Asia and Africa. California State University. Fullerton. 2,
188-204.
Al Dawla, A. G. (2005). An Analysis of syntactic errors in written and oral productions:
A case study of university students studying English at the Faculty of Arts,
University of Khartoum. Unpublished PhD dissertation, University of
Khartoum.
Alghamdi, M. A. (1998). A spectrographic analysis of Arabic vowels: A cross-dialect
study. Journal of King Saud University, Arts 10(1), 3-24.
Al-jasser, F. (2008). The effect of teaching English phonotactics on the lexical
segmentation of English as a foreign language. System. An International Journal of
Educational Technology and Applied Linguistics 36(1), 94-106.
Allen, J. S. & Miller, J. L. (1999). Effects of syllable-initial voicing and speaking rate on
the temporal characteristics of monosyllabic words. Journal of the Acoustical
Society of America 106(4), 2031-2039.
Altaha, F. (1995). Pronunciation errors made by Saudi university students learning
English: Analysis and remedy. ITL: Review of Applied Linguistics 19, 11-123.
Altenberg, E. (2005). The judgment, perception and production of consonant clusters
in a second language. International Review of Applied Linguistics 43, 53-80.
Amayreh, M. M., & Dyson, A. T. (1998). The acquisition of Arabic consonants. Journal
of Speech, Language & Hearing Research 41, 642-653.
Anderson-Hsieh, J. & Koehler, K. (1988). The effect of foreign accent and speaking
rate on native speaker comprehension, Language Learning 38, 561-612.
Arslan, L. M. & Hansen, J. H. L. (1996). Language accent classification in American
English. Speech Communication 18(4), 353-367.
Atechi, S. N. (2006). The intelligibility of native and non-native English speech: A comparative
analysis of Cameroon English and American and British English. Cuvillier, Göttingen,
Berlin.
Ball, M. J. & Rahilly, J. (1999). Phonetics: The science of speech. Oxford University Press,
New York.
Benki, J. R. (2003). Analysis of English nonsense syllable recognition in noise. Phonetica
60, 129-157.
Bent, T. & Bradlow, A. R. (2003). The interlanguage speech intelligibility benefit. Journal
of the Acoustical Society of America 114(3), 1600-1610.
Best, C. & Tyler, M. (2007). Nonnative and second-language speech perception:
Commonalities and complementarities. In O. S. Bohn and M. J. Munro (Eds.),
Language Experience in Second language Speech Learning. In honor of James Emil Flege.
John Benjamins, Amsterdam, 13-34.
Bjarkman, P. C. & Hammond, R. M. (1989). American Spanish Pronunciation: Theoretical
and Applied Perspectives. Georgetown University Press.
Bobda, A. S. (2000). English pronunciation in sub-Sahara Africa as illustrated by the
NURSE vowel. A comprehensive and innovative review of speech in West,
East and Southern Africa. English Today 16, 41-48.
Boersma, P. & Weenink, D. (1996). Praat, a system for doing phonetics by computer, version 3.4.
Institute of Phonetic Sciences of the University of Amsterdam, Report 132.
Bond, K. (2001). Pronunciation problems for Brazilian students of English: Free
resources for teacher and students of English. Karen's Linguistics Issues.
Retrieved from www3.telus.net/linguisticsissues/pronunciation.html.
Bosman, A. J. (1989). Speech perception by the hearing impaired. Unpublished Ph.D.
dissertation, Utrecht University.
Bo-Young, K. (2005). The patterns of vowel insertion in IL phonology: The P-map
account. Proceedings from the Annual Meeting of the Chicago Linguistics Society 41,
University of Chicago, IL.
Bradlow, A., Clopper, C. & Smiljanic, R. (2007). Perceptual similarity space for
languages. Proceedings of the XVIth International Congress of Phonetic Sciences,
Saarbrücken, 1373-1377.
Brett, D. (2004). Computer generated feedback on vowel production by learners of
English as a second language. ReCALL Journal 16(1), 103-113.
Broselow, E. (1984). An investigation of transfer in second language phonology.
International Review of Applied Linguistics 22, 253-326.
Broselow, E. (1992). Parametric variation in Arabic dialect phonology. In E. Broselow, M.
Eid and J. McCarthy (Eds.), Current Issues in Linguistic Theory B5: Perspectives on
Arabic Linguistics. Philadelphia: John Benjamins, 7-45.
Brière, E. J. (1966). An investigation of phonological interference. Language 41(4), 768-
796.
Canepari, L. (2005). A handbook of phonetics: Natural phonetics: Articulatory, auditory and
function. LINCOM, München.
Carlisle, R. S. (2001). Syllable structure universals and second language acquisition.
International Journal of English Studies 1(1), 1-19.
Carr, P. (1999). An introduction: phonetics and phonology. Oxford, Blackwell.
Carrell, J. & Tiffany, W. R. (1960). Phonetics: Theory and application to speech improvement.
McGraw-Hill, London-New York.
Catford, J. C. (1977). Fundamental problems in phonetics. Indiana University Press,
Bloomington.
Catford, J. C. (2001). A Practical Introduction to Phonetics. Cambridge University Press.
Chomsky, N. & Halle, M. (1968). The Sound Pattern of English. Harper & Row, New York,
Evanston and London.
Clements, G. (1990). The role of the sonority cycle in core syllabification. In J.
Kingston and M. Beckman (Eds.), Papers in Laboratory Phonology 1, Between the
Grammar and Physics of Speech. Cambridge University Press, Cambridge, 283-333.
Clements, G. N. & Keyser, S. J. (1988). From CV phonology: A generative theory of
the syllable. Language 64(1), 118-129.
Collins, B. & Mees, I. (1981). The sounds of English and Dutch. Leiden University Press,
The Hague, Boston, London.
Comrie, B., Dryer, M.S., Haspelmath, M. & Gil, D. (Eds.) (2005). World Atlas of
Language Structures. Oxford: Oxford University Press.
Corder, S. P. (1974). Error Analysis. In J. P. B. Allen and S. Pit Corder (Eds.), Techniques
in Applied Linguistics. Oxford University Press, London, 122-154.
Corriente, F. (1978). D-L doublets in Classical Arabic as de-lateralisation of dad
development of its standard reflex. Journal of Semitic Studies 23(1), 50-55.
Cruttenden, A. (2008). Gimson’s Pronunciation of English. Oxford University Press, New
York.
Crystal, D. (1999). The Penguin dictionary of language. Blackwell, Malden, MA.
Cunningham-Andersson, U. (2003). Temporal indicators of language dominance in
bilingual children. PHONUM 9, 77-80.
Cutler, A., Smits, R. & Cooper, N. (2005). Vowel perception: Effects of non-native
language vs. non-native dialect. Speech Communication 47, 32-42.
Dahlquist, L. (2002). Technology and phonemic awareness: A step toward literacy.
Closing the Gap 21.
David J. Ch. & Dickson, S. V. (1999). Phonological awareness: Instructional and
assessment guidelines. Intervention in School and Clinic 34(5), 261-270.
Davidson, L. & Stone, M. (2004). Epenthesis versus gestural mistiming in consonant
cluster production. In G. Garding and M. Tsujimura (Eds.), Proceedings of the
West Coast Conference on Formal Linguistics 22. Cascadilla Press, Somerville, MA.
Davidson, L. (2006). Phonology, phonetics or frequency: Influences on the production
of non-native sequences. Journal of Phonetics 34, 104-137.
De Jong, K. (2004). Stress, lexical focus and segmental focus in English: patterns of
variation in vowel duration. Journal of Phonetics 23(4), 493-516.
Delattre, P. C., Liberman, A. M., & Cooper, F. S. (1955). Acoustic Loci and transitional
cues for consonants. Journal of the Acoustical Society of America 27, 769-773.
Derrick, D. (2005). Production quality of /r/ and /l/ liquids among Cantonese and
Mandarins ESL learners. Journal of the Acoustical Society of America 117(4), 2425-
2425.
Deterding, D. (1997). The formants of monophthong vowels in Standard Southern
British English pronunciation. Journal of the International Phonetic Association 27,
47-55.
Dickins, J. (2007). Sudanese Arabic: Phonematics and syllable structure: Integrating consonants and
vowels. Otto Harrassowitz Verlag, Wiesbaden.
do Val Barros, A. M. (2003). Pronunciation difficulties in the consonant system by
Arabic speakers when learning English after puberty. Unpublished MA thesis,
University of West Virginia.
Docherty, G. J. (1992). The timing of voicing in British English obstruents. Foris, Dordrecht.
Dretzke, B. (1998). Modern British and American English Pronunciation. UTB, Munich.
Duan, P. & Gu, W. (2004). Teaching trial and analysis of English for technical
communication. Asian EFL Journal 2(20), 14.
Eckman, F. R., Elreyes, A. & Iverson, G. K. (2003). Some principles of second
language phonology. Second Language Research 19, 169-208.
Elobeid, A. R. & Maaly, I. A. (1996). Towards parametric representations of Arabic
speech signals. Sudan Engineering Society Journal 40(34), 35-42.
Fahal, Z. M. (2004). Awareness of pronunciation among Sudanese EFL students at
tertiary level – A case study of SUST students. Unpublished MA thesis, Dept.
of Linguistics, Sudan University of Science and Technology.
Fant, G. (1973). Speech Sounds and Features. MIT Press, Cambridge, MA.
Fareh, Sh. (2010). Challenges of teaching English in the Arab world: Why can’t EFL
programs deliver as expected? Procedia - Social and Behavioral Sciences, 2(2), 3600-
3604.
Fender, M. (2008). Spelling knowledge and reading development: Insights from Arab
ESL learners. Reading in a Foreign Language 20(1), 19-42.
Flege, J. E. (1999). Age of learning and second-language speech. In D. Birdsong
(Ed.), Second language acquisition and the Critical Period Hypothesis. Lawrence
Erlbaum, Hillsdale, NJ, 101-132.
Flege, J. E. (2003). Assessing constraints on second-language segmental production and
perception. In A. Meyer & N. Schiller (Eds.), Phonetics and phonology in language
comprehension and production, differences and similarities. Mouton de Gruyter, Berlin,
319-355.
Flege, J. E. (1992). The intelligibility of English vowels spoken by British and Dutch
talkers, in Kent, R. D. (Ed.), Intelligibility in speech disorders. Theory, measurement
and management. Amsterdam/Philadelphia: John Benjamins, Studies in Speech
Pathology and Clinical Linguistics 1, 157-232.
Flege, J. E. & Port, R. (1981). Cross-language phonetic interference: Arabic to English.
Language and Speech 24, 125-146.
Flege, J. E. (1976). Instrumental study of L2 speech production: Some methodological
considerations. Language Learning 37(2), 285-295.
Flege, J. E. (1995). Second language learning. Theory, findings and problems. In W.
Strange (Ed.), Speech perception and linguistic experience: Issues in cross-language research.
York Press, Baltimore, MD, 233-277.
Flege, J. E., & Font, R. (1980). Phonetic approximation in second language acquisition.
Language Learning 30(1), 117-134.
Fokes, J., Bond, Z. S., & Steinberg, M. (1985). Acquisition of the English voicing
contrast by Arab children. Language and Speech 28, 81-92.
Fraser, H. (2005). Teaching pronunciation: A guide for teachers of English as a second language and
learn to speak clearly in English. Fyshwick, Australia: Catalyst Interactive, 2001
(Windows CD-ROM).
Frisch, S. (1996). Similarity and frequency in phonology. Ph.D. dissertation,
Northwestern University, Evanston, IL.
Gass, S. M., & Selinker, L. (2008). Second language acquisition: An introductory course (3rd ed.).
Routledge, New York.
Giegerich, H. J. (1992). An introduction to English phonology. Cambridge University Press,
Cambridge.
Gierut, J. (1999). Syllable onsets: Clusters and adjuncts in acquisition. Journal of Speech,
Language and Hearing Research 42, 708-726.
Gierut, J. & Champion, A. H. (2001). Syllable onsets II: Three-element clusters in
phonology treatment. Journal of Speech, Language and Hearing Research 44(4), 886-
904.
Gilbers, D. (1992). Phonological networks: A theory of segment representation. Ph.D.
dissertation, Groningen University.
Gilbert, J. B. (1984). Clear speech: Pronunciation and listening comprehension in American English.
Teacher’s manual and answer key. Cambridge University Press, Cambridge.
Gilbert, J. (1995). Pronunciation practices as an aid to listening comprehension. In D. J.
Mendelson and J. Rubin (Eds.), A guide for the teaching of Second Language Learning.
Dominic Press, San Diego, 97-111.
Gimson, A. G. (1989). An introduction to pronunciation of English. Cambridge University
Press.
Goldrick, M. (2004). Phonological features and phonotactic constraints in speech
production. Journal of Memory and Language 51(4), 586-603.
Groenen, P., Maassen, B. & Crul, Th. (1996). The specific relation between perception
and production errors for place of articulation in developmental apraxia of
speech. Journal of Speech and Hearing Research 39(3), 468-482.
Gussenhoven, C. & Broeders, A. (1976). The Pronunciation of English; A course for Dutch
learners. Longman, London.
Gussenhoven, C. & Jacobs, H. (1998). Understanding phonology. Arnold, London.
Hassan, Z. M. (2003). Temporal compensation between vowel and consonant in
Swedish & Arabic in sequences of CV: C & CVC and the word overall
duration. PHONUM 9, 45-48.
Hayat, A. (2005). Transcribing Arabic phonemes. A preliminary attempt. I-mag 3, 29-33.
(available from www.I-mag.org).
Heeren, W. & Schouten, M. E. H. (2008). Perceptual development of phoneme
contrasts: How sensitivity changes along acoustic dimensions that contrast
phoneme categories. Journal of the Acoustical Society of America 124(4), 2291-2302.
Hewings, M. (2004). Pronunciation practice activities. A source book for teaching English
pronunciation. Cambridge University Press.
Hillenbrand, J. M., & M. J. Clark (2000). Some effects of duration on vowel recognition.
Journal of the Acoustical Society of America 108(6), 3014-3022.
Hoffer, B. (1970). Contrastive analysis of generative phonology. Journal-Newsletter of the
Association of Teachers of Japanese 6(3), 3-11.
House, A. S. (1961). On vowel duration in English. Journal of the Acoustical Society of
America 33, 1174-1178.
Hudgins, C. V., Hawkins, J. E., Jr., Karlin, J. E., & Stevens, S. S. (1947). The
development of recorded auditory tests for measuring hearing loss for speech.
The Laryngoscope 57, 57-89.
Huthaily, Kh. (2003). Contrastive phonological analysis of Arabic and English.
Unpublished MA thesis, University of Montana.
Hyman, L. M. (1975). Phonology: Theory and analysis. Holt, Rinehart & Winston, New
York.
Iverson, P. & Kuhl, P. K. (1995). Mapping the perceptual magnet effect for speech
using signal detection theory and multidimensional scaling. Journal of the Acoustical
Society of America 97(1), 553-562.
Iverson, P., Ekanayake, D., Hamann, S., Sennema, A. & Evans, B. G. (2008). Category
and perceptual interference in second-language phoneme learning: An
examination of English /w/-/v/ learning by Sinhala, German, and Dutch
speakers. Journal of Experimental Psychology: Human Perception and Performance
34(5), 1305-1316.
Jacewicz, E., Fox, R. A. & Salmons, J. (2006). Prosodic prominence effects on vowels
in chain shifts. Language Variation & Change 18(3), 285-316.
Jenkins, J. (2000). The phonology of English as an international language: new models, new norms,
new goals. Oxford University Press, Oxford.
Jenkins, J. (2002). A sociolinguistically based, empirically researched pronunciation
syllabus for English as an International Language. Applied Linguistics 23(1), 83-
103.
Jesry, M. M. (2005). Theoretically-based practical recommendations for improving
EFL/ESL students’ pronunciation. Journal of King Saud University, Language &
Translation 18, 1-33.
Johnson, J. S. & Elissa J. N. (1989). Critical period effects in second language learning:
The influence of maturational state on the acquisition of English as a second
language. Cognitive Psychology 22, 60-99.
Jongman, A., Herd, W. & Al-Masri, M. (2007). Acoustic correlates of emphasis in
Arabic. Proceedings of the 16th International Congress of Phonetic Sciences, Saarbrücken,
913-916.
Jones, M. & Llamas, C. (2008). Fricated realizations of /t/ in Dublin and
Middlesbrough English: an acoustic analysis of plosive frication and surface fricative
contrasts. English Language and Linguistics 21(3), 419-443.
Kalikow, D. N., Stevens, K. N. & Elliott, L. L. (1977). Development of a test of speech
intelligibility in noise using sentence materials with controlled word
predictability. Journal of the Acoustical Society of America 61(5), 1337–1352.
Kang, H., & Yoon, K. (2005). Tense and lax distinction of English [s] in intervocalic
position by Korean speakers: Consonant/vowel ratio as a possible universal
cue for consonant distinctions. Studies in Phonetics, Phonology and Morphology
11(3), 407-419.
Karouri, A. M. (1996). Phonetics of classical Arabic: A selectional study of the problematic sounds.
Khartoum University Press, Khartoum.
Kawasaki, H. (1982). An acoustic basis for universal constraints on sound sequences.
PhD dissertation, University of California, Berkeley.
Kawasaki, H. (1993). The phonetics of sound change. In Ch. Jones (Ed.), Historical
Linguistics: Problems and Perspectives. Longman, London.
Kaye, A. S. (1997). Arabic and its relationship to the other Semitic languages. In A.S.
Kaye (Ed.), Phonologies of Asia and Africa (including the Caucasus), Vol. 2.
Eisenbrauns, Winona Lake, IN, 188-204.
Kenstowicz, M. J. (1994). Phonology of generative grammar. Blackwell, Cambridge.
Kent, R. D., Dembowski, J. & Lass, N. J. (1996). The acoustic characteristics of
American English. In N. J. Lass (Ed.), Principles of experimental phonetics. Mosby,
St. Louis, MO, 185-225.
Kharma, N. & Hajjaj, A. (1989). Errors in English among Arabic speakers: Analysis and
remedy. Longman, London.
Khattab, Gh. (2000). VOT Production in English and Arabic bilingual and monolingual
children. Leeds Working Papers in Linguistics and Phonetics, 8, 95-122.
Khattab, Gh. (2002). /r/ production in English and Arabic bilingual and monolingual
speakers. Leeds Working Papers in Linguistics and Phonetics, 9, 91-129.
Klecka, W. R. (1980). Discriminant Analysis. Sage Publications, Beverly Hills, CA.
Kluge, D. C., Rauber, A. S., Reis, M. S. & Bion, R. A. H. (2007). The relationship
between perception and production of English nasal codas by Brazilian
learners of English. Proceedings of Interspeech 2007, 2297-2300.
Kopczwski, A. & Mellani, R. (1993). The vowels of Arabic and English. Papers and
Studies in Contrastive Linguistics. Adam Mickiewicz University and Constantine
University of Algeria 27, 184-192.
Krashen, S. D. (1973). Lateralization, language learning and the critical period: Some
new evidence. Language Learning 23, 63-74.
Krashen, S. D. (1985). The input hypothesis: issues and implications. Longman, Harlow.
Kuhl, P. K. (1994). Learning and representation in speech and language. Current Opinion
in Neurobiology 4, 812-822.
Kuhl, P. K. (2000). A new view of language acquisition. Proceedings of the National
Academy of Sciences 97(22), 11850-11857.
Labov, W. (1966). The social stratification of English in New York City. Center for
Applied Linguistics, Washington, DC.
Lachs, L. (1999). Use of partial stimulus information in spoken word recognition
without auditory stimulation. Research On Spoken Language Processing.
Progress Report No. 23. Dept. of Psychology, Indiana University,
Bloomington, 83-85.
Ladefoged, P. (1993). A course in phonetics. Harcourt, Brace and Jovanovich, Fort Worth,
TX.
Ladefoged, P. (2003). Phonetic data analysis: An introduction to fieldwork and instrumental
techniques. Blackwell, Malden MA.
Lafon, J. C. (1966). The Phonetic test and the measurement of hearing. Centrex Publishing,
Eindhoven.
Laufer, A. (1988). The emphatic and pharyngeal sounds in Hebrew and Arabic. Language
and Speech 31, 191-199.
Laver, J. (2002). Principles of phonetics. Cambridge University Press.
Lee, B. D. (1969). Identification of American English initial /l/ and /r/ by native
speakers of Japanese. MA thesis, Dept. of Linguistics, University of Tokyo.
Lehn, W. & Slager, W. R. (1983). A contrastive study of Egyptian Arabic and American
English: The segmental phonemes. In B. Wallace Robinett & J. Schachter
(Eds.), Second language learning: Contrastive analysis, error analysis and related aspects,
University of Michigan Press, Ann Arbor, 32-40.
Lehn, W. (1963). Emphasis in Cairo Arabic. Linguistic Society of America, 39(1), 29-39.
Liberman, A. M., Harris, K. S., Hoffman, H. S. & Griffith, B. C. (1957). The
discrimination of speech sounds within and across phoneme boundaries.
Journal of Experimental Psychology 54(5), 358-368.
Logan, J. S., Greene, B. & Pisoni, D. B. (1989). Segmental intelligibility of synthetic
speech produced by rule. Journal of the Acoustical Society of America 86, 566-582.
Long, D. (1996). Quasi-standard as a linguistic concept. American Speech 71(2), 118-135.
Long, M. (1990). Maturational constraints on language development. Studies in Second
Language Acquisition 12, 251-285.
Luchini, P. (2005). Task-based pronunciation teaching: A state-of-the-art perspective.
Asian EFL Journal 7(4), 191-202.
Maniwa, K., Jongman, A. & Wade, T. (2009). Acoustic characteristics of clearly spoken
English fricatives. Journal of the Acoustical Society of America 125, 3962-3973.
Marslen-Wilson, W. D. & Welsh, A. (1978). Processing interactions and lexical access
during word recognition in continuous speech. Cognitive Psychology 10, 29-63.
Martinet, A. (1949). Phonology as functional phonetics. Oxford University Press, London.
Massaro, D. W. (1975). Understanding language: An information processing analysis of speech
perception, reading and psycholinguistics. Academic Press, New York.
McCollough, C. & Van Atta, L. (1965). Introduction to statistics and correlation: A program for
self-instruction. McGraw-Hill, New York.
McLeod, S., van Doorn, J., & Reed, V. A. (2001). Normal acquisition of consonant
clusters. American Journal of Speech-Language Pathology 10, 99-110.
McLeod, Sh. & Arciuli, J. (2009). School-aged children’s /s/ and /r/ consonant clusters.
Folia Phoniatrica et Logopaedica 61, 337-347.
Miller, G. A. & Nicely, P. E. (1955). An analysis of perceptual confusions among some
English consonants. Journal of the Acoustical Society of America 27(2), 338-352.
Miller, L. K. (1981). Perceptual independence of the hemifields in children and adults.
Journal of Experimental Child Psychology, 32, 298-312.
Mitleb, F. (1981). Segmental and non-segmental structure in phonetics: evidence from foreign accent.
Ph.D. dissertation, University of Indiana.
Mitleb, F. (1984). Timing of English vowels spoken with an Arabic accent. In M. P. R.
van den Broecke & A. Cohen (Eds.), Proceedings of the Tenth International Congress
of Phonetic Sciences. Foris, Dordrecht, 700-705.
Mitchell, T. F. (2004). Arabic phonology 1: Translated with introduction and commentary.
Oxford University Press, Oxford.
Mitchell, T. F. & El-Hassan, S. H. (1989). English Pronunciation for Arabic Speakers.
Longman, London.
Mohamed, Y. M. (2005). Pronunciation difficulties experienced by Sudanese learners of
EFL. A contrastive investigation of phonemic and phonotactic structures.
Ph.D. dissertation, Khartoum University.
Mohammed, A. M. M. (1991). Error-based interlinguistic comparisons as a learner-
centred technique of teaching English grammar to Arab students. Ph.D.
dissertation, University of Salford.
Morris, R. J., McCrea, C. R. & Herring, K. D. (2007). Voice onset time differences
between adult males and females: Isolated syllables. Journal of Phonetics 36(2),
308-317.
Munro, J. M. (1993). Production of English vowels by native speakers of Arabic:
Acoustic measurement and accentedness ratings. Language and Speech 36, 39-62.
Munro, J. M., Derwing, T. D. & Morton, L. S. (2006). The mutual intelligibility of L2
speech. Studies in Second Language Acquisition 28(1), 111-131.
Nair-Venugopal, Sh. (2003). Intelligibility in English: Of what relevance today to
intercultural communication? Language and Intercultural Communication, 3(1), 36-
47.
Newman, D. & Verhoeven, J. (2002). Frequency analysis of Arabic vowels in
connected speech. Antwerp Papers in Linguistics 100, 77-86.
Nielsen, K. Y. (2004). Segmental difference in the visual contribution to speech
intelligibility. MA thesis, University of California, Los Angeles.
Nooteboom, S. G. (1981). Lexical retrieval from fragments of spoken words:
Beginnings versus endings. Journal of Phonetics 9, 407-424.
Nwesri, A. F. A., Tahaghoghi, S. M. M. & Scholer, F. (2006). Capturing out-of-
vocabulary words in Arabic text. Proceedings of the 2006 Conference on Empirical
Methods in Natural Language Processing, Sydney, Australia, 258-266.
Obrecht, D. H. (1968). Effects of the second formant on the perception of velarized consonants in
Arabic. The Hague: Mouton.
Ohata, K. (2007). Phonological differences between Japanese and English: Several
potentially problematic areas of pronunciation for Japanese learners. Asian
EFL Journal 6(4), 1-2.
Odisho, E. Y. (2005). Techniques of teaching comparative pronunciation in Arabic and English.
Gorgias Press, Piscataway, New Jersey.
Pascoe, M. (2005). What is intelligibility? How do SLP’s evaluate and address children’s
intelligibility intervention? The Apraxia-Kids Monthly, 6, 5.
Patil, Z. N. (2006). On the nature and role of English in Asia. Linguistics Journal 2(2), 88-
132.
Piske, Th., Flege, J. E., Ian, R. A. & Meador, M. D. (2002). The production of English
vowels by fluent early and late Italian-English bilinguals. Phonetica 59, 49-72.
Rababah, Gh. (2005). Communication problems facing Arab learners of English: A
personal perspective. Journal of Language Learning 3(1), 17-18.
Raimy, E. (1997). Syllable repair in Sudanese Arabic. Toronto Working Papers in Linguistics
16, 117-131.
Ramsaran, S., Ed. (1999). Studies in the Pronunciation of English: A commemorative volume in
honour of A. C. Gimson. Routledge, London and New York.
Raphael, L. J., Borden G. J. & Harris, K. S. (2003). Speech science primer: Physiology, acoustics
and perception of Speech. Lippincott, Williams and Wilkins, Baltimore, MD.
Rasmussen, Z. B. (2007). The inter-language speech intelligibility benefit: Arabic-
accented English. MA thesis, The Speech Acquisition Lab, University of Utah.
Reynolds, M. E., Bond, Z. S. & Fucci, D. (2006). Synthesized speech intelligibility
among native speakers and non-native speakers of English. Augmentative and
Alternative Communication Research 22(4), 258-268.
Rhebergen, K. S. & Versfeld, N. J. (2005). A Speech Intelligibility Index-based
approach to predict the speech reception threshold for sentences in
fluctuating noise for normal-hearing listeners. Journal of the Acoustical Society of
America 117, 2191-2192.
Roach, P. (2004). British English: Received Pronunciation. Journal of the International
Phonetic Association 34(2), 240-245.
Roach, P. J., Hartman, J. W., & Setter, J. E. (Eds.) (2006). Daniel Jones’ English
Pronouncing Dictionary. Cambridge University Press, Cambridge.
Ruhaif, S. A. (2007). Difficulties in oral/aural communication for Arab learners of
English. NNETESOL. Saint Michael’s College.
Ryding, K. C. (2005). A reference grammar of Modern Standard Arabic. Cambridge University
Press.
Schmidt, R. (1992). Psychological mechanisms underlying second language fluency.
Studies in Second Language Acquisition 14, 357-385.
Schmidt, R.W. (1977). Sociolinguistic variation and language transfer in phonology.
Working Papers in Bilingualism 12, 79-95.
Scholes, R. J. (1968). Phonemic Interference As a Perceptual Phenomenon. Language
and Speech 11, 86-103.
Scott, K. (1999). The Impact of accent, noise and linguistic predictability on the
intelligibility of non-native speakers of English. Ph.D. dissertation, Florida
University.
Seidlhofer, B. (2005). Key concepts in ELT: English as a lingua franca. ELT Journal
59(4), 339-341.
Seo, M. (2003). A segment contact account of the patterning of sonorants in consonant clusters. Ph.D.
dissertation, Ohio State University.
Shibatani, M. (1973). The role of surface phonetic constraints in generative phonology.
Language 48(1), 87-106.
Singleton, D. & Lengyel, Z. (Eds.) (1995). The age factor in second language acquisition: a
critical look at the Critical Period Hypothesis. Multilingual Matters, Clevedon.
Smith III, J. O. & Abel, J. S. (1999). Bark and ERB bilinear transforms. IEEE
Transactions on Speech and Audio Processing 7(6), 697-708.
Smith, L. E. (1992). Spread of English and issues of intelligibility. In B. B. Kachru (Ed.),
The other tongue. University of Illinois Press, Urbana, IL, 76-90.
Smith, L. E. & Bisazza, J. (1982). The comprehensibility of three varieties of English
for college students in seven countries. Language Learning 32, 259-269.
Steinlen, A. K. (2002). The influence of consonants on native and non-native vowel production.
Günter Narr, Tübingen.
Strange, W., Bohn, S. O., Trent, S. A. & Nishi, K. (2004). Acoustic and perceptual
similarity of North German and American English vowels. Journal of the
Acoustical Society of America, 115, 1791-1807.
Subramaniam, N. & Ramachandraiah, A. (2006). Speech intelligibility issues in
classroom acoustics: A review. IE(I) Journal-AR, 87, 28-33.
Suhana, Sh. (2001). A cross-linguistic study of phonological development. Journal of
Undergraduate Research, University of Florida, 2, 11.
Takayuki, A., Dawn, B., Peter, C. & Sullivan, K. (1999). Perceptual cues to vowel
quantity: Evidence from Swedish and Japanese. Proceedings Fonetik 99, Swedish
Phonetics Conference, Göteborg, 29-31 June, 1999.
Taylor, G. (1986). Errors and explanations. Applied Linguistics 7, 144-166.
Tomokiyo, L. M., Black, A. W., & Lenzo, K. A. (2003). Arabic in my hand: Small-
footprint synthesis of Egyptian Arabic. Proceedings of Eurospeech 2003, 2049-
2052.
Thelwall, R. (1990). Illustrations of the IPA: Arabic. Journal of the International Phonetic
Association 20, 37-41.
Traunmüller, H. (1990). Analytical expressions for the tonotopic sensory scale. Journal of
the Acoustical Society of America, 88, 97-100.
Trochim, W. M. K. (2006). Social Research Methods Knowledge Base. Atomic Dog Publishing.
Trudgill, P., & Hannah, J. (2002). Guide to the variations of standard English. Oxford
University Press, New York.
Tsukada, K. (2009). An acoustic comparison of vowel length contrasts in Arabic,
Japanese and Thai: Durational and spectral data. International Journal on Asian
Language Processing, 19(4), 127-138.
Tucker, B. V. & Warner, N. (2010). What it means to be phonetic or phonological: The
case of Romanian devoiced nasals. Phonology 27, 289-324.
Van Bezooijen, R. & Van Heuven, V. J. (1997). Assessment of speech synthesis. In D.
Gibbon, R. Moore, R. Winski (Eds.), Handbook of standards and resources for
spoken language systems. Mouton de Gruyter, Berlin/New York, 481-653.
Van den Doel, R. (2006). How friendly are the natives? An evaluation of native-speaker judgments
of foreign-accented British and American English. LOT dissertation series nr. 144.
LOT, Utrecht.
Van Heuven, V. J. (1986). Some acoustic characteristics and perceptual consequences
of foreign accent in Dutch spoken by Turkish immigrant workers. In J. van
Oosten & J. F. Snapper (Eds.), Dutch Linguistics at Berkeley, papers presented at the
Dutch Linguistics Colloquium held at the University of California, Berkeley on November
9th, 1985, Berkeley: The Dutch Studies Program, U. C. Berkeley, 67-84.
Van Heuven, V. J. (2008). Making sense of strange sounds. (Mutual) intelligibility of
related language varieties. A Review. International Journal of Humanities and Arts
Computing 2, 39-62.
Van Heuven, V. J. & Wang, H. (2007). Quantifying the interlanguage speech
intelligibility benefit. Proceedings of the 16th International Congress of Phonetic Sciences,
Saarbrücken, 1729-1732.
Van Son, R. J. J. H. & Pols, L. C. W. (1999). An acoustic description of consonant
reduction. Speech Communication 28, 125-140.
Venkatagiri, H. S. & Levis, J. M. (2007). Phonological awareness and speech
comprehensibility: An exploratory study. Language Awareness 16(4), 263-277.
Walker, R. (2001). Pronunciation for international intelligibility. Karen’s linguistics
issues: Free resources for teacher and students of English. English Teaching
Professional Magazine 22, 1-4.
Wang, H. (2007). English as a Lingua Franca. Mutual intelligibility of Chinese, Dutch and
American speakers of English. LOT Dissertation series nr. 147, LOT, Utrecht.
Wang, H. & Van Heuven, V. J. (2003). Mutual intelligibility of Chinese, Dutch and
American speakers of English. In P. Fikkert & L. Cornips (Eds.), Linguistics in
the Netherlands 2003, Amsterdam/Philadelphia: John Benjamins, 213-224.
Wang, H. & Van Heuven, V. J. (2004). Cross-linguistic confusion of vowels produced
and perceived by Chinese, Dutch and American speakers of English. In L.
Cornips & J. Doetjes (Eds.), Linguistics in the Netherlands 2004. John Benjamins,
Amsterdam/Philadelphia, 205-216.
Wang, H. & Van Heuven, V. J. (2006). Acoustical analysis of English vowels produced
by Chinese, Dutch and American speakers. In J. M. van de Weijer & B. Los
(Eds.), Linguistics in the Netherlands 2006. John Benjamins, Amsterdam/Philadelphia,
237-248.
Watt, D. J. L., Docherty, G. J. & Foulkes, P. (2003). First accent acquisition: a study of
phonetic variation in child-directed speech. Proceedings of the 15th International
Congress of Phonetic Sciences, Barcelona, 1959-1962.
Watson, J. C. E. (2002). The phonology and morphology of Arabic. Oxford University Press.
Wells, J. C. (1962). A study of the formants of the pure vowels of British English. M.A.
thesis. University of London. Website 2/1/2002. Wells, Formants of pure
vowels: relative amplitude.
Wells, J. C. (1999). British English Pronunciation preferences: a changing scene. Journal
of the International Phonetic Association 29(1), 33-50.
Woods, A., Fletcher, P. & Hughes, A. (1986). Statistics in language studies. Cambridge
University Press, Cambridge.
Zwicker, E. (1961). Subdivision of the audible frequency range into critical bands
(Frequenzgruppen). Journal of the Acoustical Society of America 33, 248.
Samenvatting
Communication by means of language takes place between two interactants, a speaker
and a listener. When the listener recognises the words and understands what the speaker
is saying to him, the speech is intelligible and the communication successful. If a
listener fails to understand the speaker, or understands him poorly, the cause may lie
with either of the two interactants. This dissertation deals with poor intelligibility in
situations in which the speaker or the listener (or both) commands the language of
communication only as a second or foreign language. More specifically, this study
concerns intelligibility problems of Sudanese university students of English as a foreign
language (EFL) and the linguistic causes behind these problems.
The research comprises (i) three auditory perception experiments, (ii) three production
experiments and (iii) two written questionnaires. The experiments target the segmental
intelligibility of speech produced in EFL with a Sudanese-Arabic accent.
In the perception experiments I used a variant of the Modified Rhyme Test (MRT) as an
instrument for measuring segmental intelligibility (Logan, Greene and Pisoni 1989). In
this test words have to be recognised in a closed set of four alternatives, from which
the listener must choose the word the speaker intended. The score is the percentage of
correctly identified words. The test targets the identification of phonemes and
multi-phonemes. Phonemes are individual vowels and consonants, whereas multi-phonemes
contain consonant sequences (so-called clusters). In addition, the recognisability of
words was measured in short, predictable everyday sentences, for which the test material
was taken from the Speech Perception in Noise (SPIN) test (Kalikow, Stevens and Elliot
1977), which has previously been applied successfully in comparable research. The SPIN
score is the percentage of correctly recognised words, where in each of the 25 test
sentences only the final word has to be recognised, as in Spread some butter on your bread
(keyword underlined).
The first perception experiment examined how well Sudanese university students
understand native speakers of English (chapter 3). The second experiment compares the
intelligibility of Sudanese EFL speakers and native speakers of British English for
Dutch students as EFL listeners (chapter 4). The third perception experiment was carried
out with English and American native listeners, who heard speech material from a
representative Sudanese EFL speaker and a native speaker of British English (chapter 5).
Three speech production experiments were carried out to chart the acoustic correlates
of vowels (chapter 6), consonants (chapter 7) and consonant clusters (chapter 8) as
spoken in English with a Sudanese-Arabic accent. The aim of these experiments was to
examine the acoustic properties of these sounds in comparison with those of native
speakers of British English, in order to gain insight into similarities and differences
between the two sound systems. Resonance frequencies of the first and second vowel
formant (F1 and F2), vowel duration, Voice Onset Time (VOT), intensity, consonant
duration, preceding vowel duration, spectral Centre of Gravity (COG) and cluster
duration were analysed and compared between the native and the EFL realisations of the
target sounds.
A third source of information was a written questionnaire in which Sudanese EFL learners
and their teachers were asked for their opinions and value judgments. By means of a
series of questions in closed (multiple-choice) or open format, these questionnaires
aimed to obtain a picture of the subjective ideas about strengths and weaknesses in the
pronunciation and recognition of English sounds by Sudanese students of English, as
experienced by both the students themselves and their teachers. These subjective data
complement the objective research data from the laboratory experiments, yielding a more
complete view of the problem.
I will now give a short summary per chapter.
Chapter 1 describes the research plan. It offers an introduction to the topic, sets out
the objectives and formulates the research questions. This chapter also gives general
information about test methods, the design of the experiments, and the choice of
subjects and test materials.
Chapter 2 presents a contrastive analysis of the sound inventories of English and
Arabic. The aim of this part is to provide insight into the similarities and differences
between the sound systems of English and Arabic, and to discuss techniques and
strategies used to predict errors and learning problems in the acquisition of a foreign
language (L2). In doing so I discuss linguistic theories and hypotheses such as
contrastive analysis (CA), error analysis (EA) and the Markedness Differential
Hypothesis (MDH). The chapter ends with predictions of which structural differences
between English and Arabic will lead to pronunciation problems and thereby compromise
the intelligibility of Sudanese-Arabic EFL learners.
Chapter 3 deals with the identification of English vowels, consonants and consonant
clusters as well as the recognition of words in SPIN sentences. The test material was
recorded by a representative native speaker of Standard British English (Received
Pronunciation, RP). The listeners were a group of 10 Sudanese EFL students who were
training to become teachers of English. The test material comprised a list of
monosyllabic English words read in a fixed carrier sentence (Say … again), covering all
English vowels, all consonants and a selection of consonant clusters. The results show
that the students experienced serious problems in the perceptual identification of the
English sounds as well as in the recognition of words in sentences. Sounds were
misidentified more often for vowels (48% correct) than for consonants (85% correct) and
clusters (75% correct). Only 30 percent of the words in SPIN sentences were recognised
correctly. The error analysis shows that these intelligibility problems result from
transfer of the norms of the native language (L1) and from insufficient knowledge of the
sound structure of the L2.
Chapter 4 focuses on the identification of vowels, single consonants and consonant
clusters as well as the recognition of words in SPIN sentences. Ten Dutch listeners took
part in the perception experiments. Dutch listeners were included in this study because
they form a group of non-native listeners of English who have many important
characteristics in common with my Sudanese students but differ from them in one crucial
respect, namely that Dutch and English are closely related languages whereas Arabic and
English are not. The test material was recorded by two speakers, i.e. the same native
speaker of English who was also used in chapter 3 (in fact the same material was reused)
and a representative Sudanese EFL speaker, whose task was to produce the same speech
materials as the native speaker. This Sudanese speaker was selected from a larger group
of 11 Sudanese students of English on the basis of an assessment of recording quality.
The test materials of the native speaker were correctly identified by the Dutch
listeners with scores of 88% (vowels), 100% (consonants), 96% (clusters) and 70% for
words in SPIN sentences. Recognition of the Sudanese EFL test material turned out
considerably poorer, with 50, 80, 84 and 27% correct, respectively. The last score in
particular, which concerns the recognition of words in context, indicates that the
intelligibility of this type of EFL English is insufficient for communication in English
in international settings (English used as a Lingua Franca). The problems stem partly
from the Dutch listeners’ unfamiliarity with the precise sound contrasts of English; the
Dutch listeners misidentify a number of sounds produced by the native English speaker.
These problems become much more serious, however, when the test material is spoken with
a Sudanese-Arabic accent, which for Dutch listeners constitutes an unfamiliar kind of
English.
In chapter 5 the same material as in chapter 4 was presented to 20 native listeners of
English (10 British, 10 American). The Sudanese EFL speaker proves to be less
intelligible to these listeners than the native speaker. The British and American
listeners have only sporadic problems with the native material, as shown by scores of
92% (vowels), 99% (consonants), 97% (clusters) and 94% (words). The recognition
percentages for the EFL speaker were 66, 85, 86 and 67, respectively. On these grounds
we might conclude that the English of Sudanese students is good enough for communication
with native speakers of English, regardless of whether these are British or American.
Chapter 6 contains an acoustic analysis of the English vowels as produced by 11 Sudanese
EFL students, with data from Deterding (1997), based on 10 native speakers of British
English (5 male and 5 female BBC radio speakers), serving as reference material. A list
containing all English vowels was recorded by the Sudanese speakers in a fixed carrier
sentence (Say … again). The native speakers had recorded the same English vowels in
isolated words. The vowel realisations were analysed acoustically by measuring the
resonance frequency F1 (a measure of the degree of mouth opening during vowel
articulation) and F2 (which corresponds to tongue position along the front-back
dimension) as well as vowel duration. The results show that the EFL vowels were often
articulated at incorrect positions in the vowel space but that the duration contrast
between tense and lax English vowels was maintained very well in the EFL utterances.
Automatic vowel identification by means of Linear Discriminant Analysis, with F1, F2 and
vowel duration as predictors, indicates that there are substantial mismatches between
the English vowel system of native speakers and that of the Sudanese EFL students. The
probable causes of these problems are, once again, the EFL learners’ unfamiliarity with
vowel-rich languages such as English, differences in speaking style and interference
from the Arabic native language (L1 filter).
Chapter 7 concentrates on the acoustic analysis of the English consonants produced by
the same 11 Sudanese students of English as in chapter 6. The stimuli comprised a list
of monosyllabic CVC words in the fixed carrier sentence Say … again. All consonants at
the beginning (onset) and at the end (coda) of the English syllable were included in the
material. The realisations were analysed acoustically in terms of Voice Onset Time
(VOT), duration of the preceding vowel, duration of the consonant itself, peak intensity
as well as Centre of Gravity (COG) and spectral standard deviation. The results were
compared with literature data on the same acoustic parameters published for (either
British or American) native speakers of English. The comparison brings to light
considerable differences in the acoustic parameter values between the native speakers
and the EFL students. The Sudanese speakers produce systematically different VOT and COG
values, which mostly derive from the sound system of their L1 (Arabic).
Chapter 9 presents the construction and findings of the written questionnaires. Ten
Sudanese university EFL learners and 10 professional EFL teachers indicated which
pronunciation and listening problems they observed in practice in themselves and in the
students, respectively. The results correspond closely to the findings of the
experimental chapters. The students report problems in recognising English speech
sounds, and they also find it difficult to produce English short vowels, diphthongs and
fricatives, and to distinguish between the nasal sounds /P~0/. Both the students and
their teachers indicate that English single consonants and clusters are recognised and
produced better than the vowels. The respondents attribute the problems to lack of
explicit knowledge of the English sound system, to interference from the L1 and to
insufficient practice. A reliability analysis shows that there is a high degree of
agreement in the responses of the students (Cronbach’s α = .860), which means that the
students share the same ideas about their strengths and weaknesses in the perception and
production of English speech sounds. The mutual agreement among the teachers is lower
(α = .616), which indicates that the teachers’ opinions about their students’ learning
problems and the underlying causes are somewhat more divided. Nevertheless, the opinions
of students and teachers turn out to agree reasonably well when the comparison is
restricted to only those questions that were put to both students and teachers
(r = .569), although across the board the students had a somewhat higher opinion of
their performance than the teachers did.
Chapter 10, finally, summarises the main findings of this study, draws conclusions and
makes recommendations for the Sudanese educational field and suggestions for further
research.
Summary
The primary function of language is social contact between human beings. A person
speaks to influence the actions of his/her fellows, i.e. to involve them in interaction.
In all situations of language use there are two major roles, played by the speech
participants: speaker and hearer. Normally, these two functional roles are present,
either actually or implicitly, in every speech act. When the speech participants achieve
successful communication, i.e. when the hearer understands what the speaker says, the
speech is described as intelligible. However, when a speech participant fails to
understand the speaker’s message, the speech is said to be unintelligible. Failure to
understand or produce intelligible speech is classified by linguists as a speech
intelligibility problem, which may originate with the hearer, the speaker or both, owing
to linguistic factors. Moreover, linguists assume that most speech intelligibility
problems occur between L1 and L2 speakers coming from different language environments.
This study investigates speech intelligibility problems experienced by Sudanese
university EFL learners and seeks experimental evidence on the nature and the linguistic
causes of these problems.
The research comprised (i) three auditory perception experiments, (ii) three production
experiments and (iii) two paper-and-pencil questionnaires. The experiments target the
segmental intelligibility of speech produced in Sudanese-Arabic accented English.
In the perception tasks I used the Modified Rhyme Test (MRT) as a suitable instrument
for the measurement of segmental intelligibility (Logan, Greene and Pisoni 1989). The
test involves word identification tasks in a closed set of four alternatives, where the
listeners are asked to select the alternative they think the speaker intended. The score is
the percentage of correctly identified items. Test items target phonemes and
multi-phonemes. Phonemes refer to vowels and single consonants, whilst multi-phonemes
refer to consonant clusters. Word intelligibility, on the other hand, was determined on
the basis of final words embedded in short redundant sentences which were copied
from the Speech Perception in Noise (SPIN) test (Kalikow, Stevens and Elliot 1977),
which has been used successfully in related research. The SPIN score is based on 25
meaningful sentences, in each of which one contextually predictable keyword had to be
recognised, e.g. Spread some butter on your bread (with the sentence-final keyword
underlined).
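The percent-correct scores reported in this summary can be illustrated with a minimal Python sketch. The function names percent_correct and spin_keyword and the listener responses below are hypothetical and serve only as an example; the thesis summary does not describe the scoring procedure beyond the percentage of correct responses.

```python
def percent_correct(responses, targets):
    """Percentage of trials in which the listener's response matches the intended word."""
    if not targets or len(responses) != len(targets):
        raise ValueError("responses and targets must be non-empty and of equal length")
    hits = sum(r.strip().lower() == t.strip().lower()
               for r, t in zip(responses, targets))
    return 100.0 * hits / len(targets)


def spin_keyword(sentence):
    """Return the sentence-final keyword of a SPIN sentence, without punctuation."""
    return sentence.rstrip(" .!?").split()[-1].lower()


# Hypothetical example: three SPIN sentences and one listener's written responses.
sentences = [
    "Spread some butter on your bread.",
    "To open the jar twist the lid.",
    "The story had a clever plot.",
]
targets = [spin_keyword(s) for s in sentences]
responses = ["bread", "lid", "pot"]  # assumed listener data, third item misheard
print(round(percent_correct(responses, targets), 1))
```

The same percent_correct function applies to the closed-set MRT items, where each response is one of the four alternatives offered to the listener.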
The first perception test aims at testing how well Sudanese university EFL listeners
identify sounds and recognise words produced by native speakers of English (chapter
3). The second experiment compares the intelligibility of the Sudanese EFL learners
and native speakers of RP English using Dutch university students as non-native
listeners (chapter 4). The third experiment tests the intelligibility of Sudanese EFL
learners and native speakers of RP English for both British and American listeners
(chapter 5).
Three speech production experiments were carried out in order to measure the acoustic
correlates of vowels (chapter 6), consonants (chapter 7) and consonant clusters (chapter
8) spoken in Sudanese-Arabic accented English. The aim of these experiments is to
examine the acoustic properties of these sounds in comparison to those produced by
native speakers of RP English. Such a comparison reveals the differences in the non-
native and the native realisations of the target sounds. Vowel resonance frequencies
(first and second vowel formants, F1 and F2), vowel duration, voice onset time (VOT),
intensity, preceding vowel duration, consonant duration, centre of gravity (COG)
and cluster duration were analysed and compared between native and non-native
tokens of the target sounds.
A third source of information was obtained through the administration of
paper-and-pencil questionnaires collecting the assessments and impressions from both Sudanese
EFL learners and teachers. Through a series of questions in either closed (multiple-
choice) or open format, the questionnaires aimed to establish subjective impressions on
strengths and weaknesses in the Sudanese students’ pronunciation and auditory
recognition of English sounds, as experienced by the students themselves and by their
instructors. These subjective data complement the objective research findings obtained
by the laboratory experiments, in order to establish a comprehensive survey of the
topic area investigated.
I will now present a short summary per chapter.
Chapter 1 addresses the research plan. It introduces the topic area of the study, sets
out the goals and formulates the research questions. It also provides information about
the testing methods, the experimental design, the subjects and the test materials.
Chapter 2 comprises two sections. Section 1 presents a contrastive analysis of the
English and Arabic speech sound inventories. The goal of this section is to provide
insight into the similarities and differences between English and Arabic and to discuss
techniques and strategies used in the prediction of errors and learning problems in L2.
In doing so I discuss linguistic theories and hypotheses such as contrastive analysis
(CA), error analysis (EA) and the Markedness Differential Hypothesis (MDH). Section
1 ends with predictions of which structural differences between Arabic and English may
compromise the speech intelligibility of Sudanese-Arabic learners of English. Section 2
reviews related literature on speech intelligibility problems in a broader context. It
sheds light on the nature and types of these problems and discusses the scientific
methods used.
Chapter 3 presents the identification of English vowels, consonants, clusters and
words embedded in SPIN sentences. The materials were spoken by a representative
native speaker of Standard British English (Received Pronunciation, or RP). The
listeners were a group of 10 Sudanese EFL university students of English. The test
material includes a list of monosyllabic words of English targeting vowels, consonants
and clusters, read in a fixed carrier phrase (Say … again). The results reveal serious
problems experienced by the students in the perceptual identification of English speech
sounds and in word recognition. Sudanese EFL listeners misidentified vowels (48%
correct) more often than consonants (85% correct) and clusters (73% correct). Only 30
percent of the words in SPIN sentences were recognised correctly. The error analysis
shows that the intelligibility problems are due to transfer of L1 norms and to
insufficient knowledge of the sound structure of the L2.
Chapter 4 addresses the identification of English vowels, singleton consonants,
consonant clusters and the recognition of words in SPIN sentences. Ten Dutch listeners
participated in the perception tests. The Dutch listeners were included in this study
since they constitute a group of non-native listeners who share many characteristics
with my Sudanese students and yet differ in one crucial aspect, viz. the circumstance
that Dutch and English are related languages while Arabic and English are not. The test
materials were spoken by two speakers, viz. the same native speaker of RP English that
was used in chapter 3 (in fact the same materials) and a representative Sudanese EFL
speaker who produced the same materials as the native speaker. The Sudanese speaker
was chosen from among 11 speakers by means of a sound quality test. The native
English materials were correctly identified with scores of 88% (vowels), 100%
(consonants), 96% (consonant clusters), and 70% for words in SPIN sentences. The L2
materials were recognised with clearly poorer scores, viz. 50, 80, 84 and 27% correct,
respectively. Especially the latter score indicates that the intelligibility of this type of
non-native English is insufficient for adequate communication between non-natives in
an international context (English as Lingua Franca). The problems are partly due to
the Dutch listeners’ uncertainty about the phonemic contrasts of English, i.e. they
misidentify some phonemes even when the materials are produced by a
native speaker of English. The problems are considerably aggravated when the
materials are spoken with a Sudanese-Arabic accent, which is a type of English that is
unknown to the Dutch listeners.
In Chapter 5 the same materials that were used in Chapter 4 were presented to 20
native listeners of English (10 British, 10 American). The data reveal that the Sudanese
EFL speaker is less intelligible for the target listeners than the native RP speaker.
British and American listeners show no serious perception problems with English
speech sounds produced by the native speaker, with scores of 92% (vowels), 99%
(consonants), 97% (clusters) and 94% (words). The corresponding percentages for the
non-native speaker were 66, 85, 86 and 67.
Chapter 6 addresses the acoustic analysis of English vowels produced by 11 Sudanese
EFL learners (university students), whilst the data of Deterding (1997), which are based on
10 speakers of British English (five male and five female BBC broadcasters), serve as
control data. A list of all the English vowels in monosyllabic words was read in a
carrier phrase (Say … again) by the Sudanese speakers. The native speakers read the
same English vowels in isolated words. The vowel tokens were acoustically analysed.
Resonance frequencies F1 (corresponding with degree of mouth opening) and F2
(indicative of tongue position), as well as vowel duration were measured. The results
show that non-native vowels were articulated at incorrect positions in the vowel space
but that the duration contrast between tense and lax vowels was well preserved in the
non-native vowel tokens. Automatic vowel identification by Linear Discriminant
Analysis, using F1, F2 and vowel duration as predictors, revealed substantial mismatches
between the native and non-native English vowel systems. Again, the probable causes
of these problems are the learners’ unfamiliarity with a vowel inventory as large as
that of English, differences in speaking style and the filter effect of the learners’ L1.
Chapter 7 focuses on the acoustic analysis of English consonants produced by the same
11 Sudanese university EFL learners that were studied in Chapter 6. Stimuli comprised
a list of monosyllabic CVC words embedded in a fixed carrier phrase (Say … again). All
onset and coda consonants of English were included in the test materials. The
consonant tokens were acoustically analysed in terms of Voice Onset Time (VOT),
preceding vowel duration, consonant duration, peak intensity as well as centre of
gravity (COG) and Spectral Standard Deviation. The results were compared with
literature data on the same acoustic parameters published for (either British or
American) native English. The findings show considerable discrepancies in the acoustic
parameter values obtained from native and non-native speakers. The Sudanese speakers
produce systematically different VOT and COG values, due to influence of their L1
sound system.
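As a rough illustration of two of the spectral measures named above, the following Python sketch computes a centre of gravity (COG) and a spectral standard deviation as the power-weighted mean and spread of frequency over a magnitude spectrum. The function names, the input format and the choice of power weighting (exponent 2) are assumptions made for this example; the analysis settings actually used in the study are not stated in this summary.

```python
import math


def centre_of_gravity(freqs_hz, magnitudes, power=2.0):
    """Weighted mean frequency; weights are |magnitude| ** power (assumed convention)."""
    weights = [abs(m) ** power for m in magnitudes]
    total = sum(weights)
    if len(freqs_hz) != len(magnitudes) or total == 0:
        raise ValueError("need equally long inputs with non-zero spectral energy")
    return sum(f * w for f, w in zip(freqs_hz, weights)) / total


def spectral_sd(freqs_hz, magnitudes, power=2.0):
    """Weighted standard deviation of frequency around the centre of gravity."""
    cog = centre_of_gravity(freqs_hz, magnitudes, power)
    weights = [abs(m) ** power for m in magnitudes]
    total = sum(weights)
    var = sum(w * (f - cog) ** 2 for f, w in zip(freqs_hz, weights)) / total
    return math.sqrt(var)


# Hypothetical two-bin spectrum with equal energy at 1000 Hz and 3000 Hz.
freqs = [1000.0, 3000.0]
mags = [1.0, 1.0]
print(centre_of_gravity(freqs, mags), spectral_sd(freqs, mags))
```

A fricative whose energy is concentrated at higher frequencies yields a higher COG, which is why this measure can help separate, for instance, alveolar from post-alveolar sibilants.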
Chapter 8 deals with an acoustic analysis of English consonant clusters, which were
read by eleven Sudanese EFL learners, and by two native speakers of RP English (one
male, one female) serving as control speakers. A selection of onset and coda clusters in
meaningful English words was read in a fixed carrier phrase by both groups of speakers.
The durations of the consonants that made up the clusters were measured. Statistical
analysis reveals systematic deviations in the component durations between native and
non-native tokens, which are attributable to the influence of the learners’ L1. Counter
to expectation, however, no epenthetic vowels breaking up the clusters were found in
the recordings.
Chapter 9 presents the construction and results of a written questionnaire. Ten
Sudanese university EFL learners and 10 teachers provided assessments of intelligibility
problems experienced in the learning of English. The results are fully in line with the
findings of the experimental chapters in the study. The learners claim to have problems
in recognizing native English speech sounds and they also find it difficult to produce
English short vowels, diphthongs, fricatives and the nasal pair /P~0/. However,
both the students and the instructors claim that the English single and cluster
consonants are comparatively better perceived and produced by the learners than the
vowels. The respondents attribute these problems to lack of explicit knowledge, L1
interference and insufficient practice. A reliability analysis revealed that the Sudanese
EFL learners show a high agreement amongst themselves (Cronbach’s α = .860), i.e.
they share the same views of their strengths and weaknesses in perceiving and producing
English sounds. The agreement among the instructors is lower (α = .616), which shows
that the instructors’ opinions on the students’ strengths and weaknesses are more
diversified. Nevertheless, students and instructors are in reasonable agreement when
the analysis is restricted to only those items that are shared between the student and
teacher versions of the questionnaire (r = .569), although overall the students rated
their level of proficiency in English more positively than their instructors did.
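The reliability coefficient mentioned above, Cronbach’s alpha, can be sketched as follows. This is only an illustration of the standard formula, alpha = k/(k-1) * (1 - sum of column variances / variance of row totals), under an assumed data layout: each row is one questionnaire question and each column one respondent, so that alpha expresses agreement among respondents. The function name and the example ratings are hypothetical and are not taken from the study.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a table of ratings (list of equally long rows).

    Columns are the k units whose consistency is assessed; rows are the cases.
    """
    k = len(rows[0])
    if k < 2 or len(rows) < 2 or any(len(r) != k for r in rows):
        raise ValueError("need at least 2 rows and 2 columns of equal length")
    column_vars = [variance([r[j] for r in rows]) for j in range(k)]
    total_var = variance([sum(r) for r in rows])
    if total_var == 0:
        raise ValueError("row totals do not vary; alpha is undefined")
    return (k / (k - 1)) * (1 - sum(column_vars) / total_var)


# Hypothetical ratings: 3 questions (rows) answered by 2 respondents (columns).
ratings = [
    [1, 2],
    [2, 1],
    [3, 3],
]
print(round(cronbach_alpha(ratings), 3))
```

Values close to 1 indicate that the columns rank the rows in much the same way, which corresponds to the high agreement reported for the learners; lower values, as for the instructors, indicate more divergent ratings.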
Chapter 10, finally, summarises the main findings of this study, draws conclusions and
makes recommendations and suggestions for teaching practice and future research.
Appendices
Appendix 3.1 Vowel list: /hVd/ meaningful words in a fixed carrier phrase (Say … again); 19
different full vowels and diphthongs read by Sudanese EFL learners and native speakers of RP
English. The stimuli were used in the perception tests in chapters 3, 4 and 5 as well as in the
acoustic analysis in chapter 6.
No. Vowel Keywords

1. air chair, pair
2. pet met, let
3. pat rat, fat
4. pot lot, got
5. nut hut, cut
6. pit hill, tin
7. peat feet, meet
8. fool cool, school
9. full bull, good
10. mile file, Nile
11. peer dear, fear
12. poor sure, tour
13. late shade, rate
14. out shout, loud
15. boy toy, foil
16. bird girl, curt
17. bard hard, card
18. board lord, short
19. boat coat, goat
Appendix 3.2.a Onset consonants list of meaningful words in a fixed carrier (Say … again). The
stimuli were read by one Sudanese EFL learner and one native speaker of RP English. The
stimuli were used in the perception tests in chapters 3, 4 and 5 as well as in the acoustic analysis
in chapter 7.
No. Onset consonants Keywords
1. got god, ghost
2. bang ban, bark
3. shut ship, shop
4. pin pit, pill
5. fit fish, fill
6. then this, them
7. thaw theme, thin
8. zeal zero, zebra
9. den dish, deaf
10. sip sit, sick
11. job jot, jog
12. vest vent, verb
13. tame take, tale
14. cold core, cop
15. chat chair, charge
Appendix 3.2.b Coda consonants list of meaningful words in a fixed carrier (Say … again). The
stimuli were read by one Sudanese EFL learner and one native speaker of RP English. The
stimuli were used in the perception tests in chapters 3, 4 and 5 as well as in the acoustic analysis
in chapter 7.
No. Coda consonants Keywords

1. sack pack, lack
2. mash rash, cash
3. page wage, rage
4. heath teeth, beneath
5. sad lad, mad
6. pat cat, rat
7. safe wave, rave
8. pub rub, hub
9. rave cave, shave
10. match patch, latch
11. cop shop, stop
12. lace race, face
13. raze raise, plays
14. cog fog, log
15. with the, weather
Appendix 3.3 Onset and coda consonant clusters list of meaningful words in a fixed carrier
(Say …..again). The stimuli were read by one Sudanese EFL learner and one native speaker of RP
English. The stimuli were used in the perception tests in chapters 3, 4 and 5 as well as in the
acoustic analysis in chapter 8.
Onset consonant clusters Keyword

sty stain, steam
splint split, splash
slack slam, slap
ply plot, play
drain dream, drip
glaze glare, glue
swine swipe, sweet
clean clear, clip
Coda consonant clusters Keyword

fibbed limbed
lint mint, hint
elm film, helm
putts cuts, nuts
mast fast, cast
buns Huns, runs
bugs rugs, figs
wink link, pink
wits fits, hits
Appendix 3.4 SPIN (Speech in Noise) sentence intelligibility test. Only contextually highly
predictable keywords were used. Keywords are always sentence final.
1. Throw out all the useless junk.

2. She cooked him a hearty meal.
3. Her entry should win the first prize.
4. The stale bread was covered with mould.
5. The fireman heard her frightened scream.
6. Your knees and your elbows are joints.
7. I ate a piece of chocolate fudge.
8. Instead of a fence plant a hedge.
9. The story had a clever plot.
10. The landlord raised the rent.
11. Her hair was tied with a blue bow.
12. He’s employed by a large firm.
13. To open the jar twist the lid.
14. The swimmer’s leg got a bad cramp.
15. Our seats were in the second row.
16. The thread was wound on the spool.
17. They tracked the lion to his den.
18. Spread some butter on your bread.
19. A spoiled child is a brat.
20. Keep your broken arm in a sling.
21. The mouse was caught in the trap.
22. I have got a cold and a sore throat.
23. Ruth poured herself a cup of tea.
24. The house was robbed by a thief.
25. Wash the floor with a mop.
Appendix 3.5 Instructions and answer sheet of the identification test of English vowels read by
native speakers of RP English responded to by Sudanese EFL listeners
Part 1. Identification of English vowels
Date: ………… Listener position: [ ]
Instructions
You will hear 20 English-spoken items on the CD. Every item contains the same short
utterance “Say xxx again”, where xxx is a one-syllable word. Each time you hear an
item, decide which one of the four possibilities listed under A-B-C-D is the one that
was said. To indicate your choice, tick the appropriate box on your answer sheet.
Remember that you have to make a choice for every word you hear, one choice, no
more, no less. If you do not know what to answer, just gamble.
After you hear an item, you have five seconds to place your tick mark. To help you
keep track, you will hear a beep after every tenth item on the CD.
You will now first hear two practice items.
A. B. C. D.
a. ɷ net ɷ nut ɷ not ɷ nit
b. ɷ boy ɷ buy ɷ bay ɷ bow
If everything is clear, we will now start the test items proper.

A. B. C. D.
1. ɷ pat ɷ putt ɷ pot ɷ put
2. ɷ pet ɷ put ɷ pit ɷ pat
3. ɷ put ɷ pet ɷ pat ɷ pot
4. ɷ peat ɷ pat ɷ pet ɷ pit
5. ɷ net ɷ nut ɷ not ɷ nit
6. ɷ fill ɷ fool ɷ fell ɷ full
7. ɷ fool ɷ full ɷ fill ɷ fell
8. ɷ pit ɷ peat ɷ pet ɷ put
9. ɷ bard ɷ board ɷ bird ɷ beard
10. ɷ board ɷ bird ɷ beard ɷ bard
11. ɷ beard ɷ bard ɷ bird ɷ board

12. ɷ boy ɷ buy ɷ bay ɷ bow
13. ɷ male ɷ mile ɷ mill ɷ meal
14. ɷ let ɷ lit ɷ late ɷ light
15. ɷ peer ɷ pair ɷ poor ɷ pore
16. ɷ ate ɷ oat ɷ out ɷ at
17. ɷ err ɷ or ɷ ear ɷ air
18. ɷ peer ɷ poor ɷ pair ɷ pore
20. ɷ put ɷ pat ɷ pit ɷ pet
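The answer sheets in this appendix are four-alternative forced-choice forms, so a listener who gambles on every item is expected to score about 25% correct. A minimal sketch of how such a sheet might be scored is given below; the answer key, the responses and the function name are hypothetical and are not taken from the study.

```python
# Hypothetical scoring sketch for a four-alternative forced-choice sheet.
# The answer key and responses below are invented for illustration only.

CHANCE_LEVEL = 100.0 / 4  # expected percent correct when guessing among A-D


def percent_correct(responses, answer_key):
    """Return the percentage of items whose ticked box matches the key."""
    if len(responses) != len(answer_key):
        raise ValueError("one response is required for every item")
    hits = sum(1 for given, key in zip(responses, answer_key) if given == key)
    return 100.0 * hits / len(answer_key)


example_key = ["A", "C", "B", "D"]        # hypothetical key for four items
example_responses = ["A", "C", "D", "D"]  # hypothetical listener responses
example_score = percent_correct(example_responses, example_key)
```

A score is only informative to the extent that it exceeds CHANCE_LEVEL, which is why the instructions ask listeners to gamble rather than leave an item blank.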
Appendix 3.6 Answer sheet of the identification test of English consonants read by native
speakers of RP English responded to by Sudanese EFL learners
Part 2. Identification of English consonants
Instructions
You will now first hear five practice items.
A. B. C. D. A. B. C. D.
a. ͚ sap ͚ sack ͚ sat ͚ sag f. ͚ hit ͚ lit ͚ bit ͚ wit
b. ͚ match ͚ man ͚ mash ͚ mat g. ͚ pot ͚ cot ͚ jot ͚ got
c. ͚ pale ͚ page ͚ pane ͚ pave h. ͚ must ͚ bust ͚ gust ͚ dust
d. ͚ cog ͚ cop ͚ con ͚ cock i. ͚ rob ͚ cob ͚ job ͚ bob
e. ͚ heat ͚ heave ͚ heath ͚ he’s j. ͚ bed ͚ led ͚ wed ͚ red
If everything is clear, we will now start the test items proper. Turn the page over for the
answer sheet for the consonant test.
A. B. C. D. A. B. C. D.
1. ͚ sap ͚ sack ͚ sat ͚ sag 31. ͚ hit ͚ lit ͚ bit ͚ wit
2. ͚ match ͚ man ͚ mash ͚ mat 32. ͚ pot ͚ cot ͚ jot ͚ got
3. ͚ pale ͚ page ͚ pane ͚ pave 33. ͚ must ͚ bust ͚ gust ͚ dust
4. ͚ cog ͚ cop ͚ con ͚ cock 34. ͚ rob ͚ cob ͚ job ͚ bob
5. ͚ heat ͚ heave ͚ he’s ͚ heath 35. ͚ bed ͚ led ͚ wed ͚ red
6. ͙ bad ͙ ban ͙ bang ͚ bad 36. ͚ nut ͚ but ͚ gut ͚ shut
7. ͚ can ͚ cap ͚ cam ͚ cab 37. ͚ tin ͚ fin ͚ pin ͚ chin
8. ͚ sap ͚ sad ͚ sag ͚ san 38. ͚ hid ͚ lid ͚ bid ͚ fid
9. ͚ pad ͚ pan ͚ pack ͚ pat 39. ͚ tot ͚ pot ͚ lot ͚ not
10. ͚ sap ͚ sack ͚ sat ͚ sag 40. ͚ peel ͚ zeal ͚ feel ͚ seal
11. ͚ pun ͚ put ͚ pub ͚ puck 41. ͚ thaw ͚ law ͚ paw ͚ saw
12. ͚ cog ͚ cop ͚ con ͚ cock 42. ͚ pen ͚ ten ͚ den ͚ then
13. ͚ mash ͚ mad ͚ mat ͚ match 43. ͚ fen ͚ pen ͚ yen ͚ hen
14. ͚ cop ͚ cod ͚ con ͚ cock 44. ͚ fat ͚ hat ͚ bat ͚ chat
15. ͚ lane ͚ lace ͚ late ͚ lame 45. ͚ went ͚ bent ͚ rent ͚ dent
16. ͚ dad ͚ dam ͚ dan ͚ dab 46. ͚ rip ͚ dip ͚ tip ͚ sip
17. ͚ save ͚ sage ͚ sane ͚ safe 47. ͚ wick ͚ pick ͚ tick ͚ lick
18. ͚ rate ͚ race ͚ raze ͚ rape 48. ͙ fang ͙ bang ͙ gang ͙ rang
19. ͚ match ͚ man ͚ mash ͚ mat 49. ͙ den ͙ ten ͙ men ͙ pen
20. ͚ heat ͚ heave ͚ heath ͚ he’s 50. ͙ name ͙ game ͙ tame ͙ dame
21. ͚ pale ͚ page ͚ pane ͚ pave

22. ͚ pane ͚ pale ͚ page ͚ pave
23. ͚ map ͚ mat ͚ man ͚ mad
24. ͚ wit ͚ with ͚ wick ͚ wiz
25. ͚ raze ͚ rate ͚ rave ͚ rape
26. ͚ peel ͚ zeal ͚ feel ͚ seal
27. ͙ den ͙ ten ͙ men ͙ pen
28. ͚ rob ͚ cob ͚ bob ͚ job
29. ͚ west ͚ vest ͚ nest ͚ best
30. ͚ hold ͚ cold ͚ told ͚ gold
Appendix 3.7 Instructions and answer sheet of the identification test of English consonant
clusters read by native speakers of RP English and responded to by Sudanese EFL learners.
Part 3. Identification of English consonant clusters
Instructions
A. B. C. D.
a. ɷ film ɷ fist ɷ fills ɷ filth
b. ɷ pry ɷ fry ɷ cry ɷ fly
If everything is clear, we will now start the test items proper.

A. B. C. D.
1. ɷ blaze ɷ craze ɷ glaze ɷ graze
2. ɷ sty ɷ spy ɷ sky ɷ sly
3. ɷ sprint ɷ splint ɷ squint ɷ print
4. ɷ smack ɷ snack ɷ slack ɷ stack
5. ɷ pry ɷ try ɷ dry ɷ ply
6. ɷ brain ɷ drain ɷ grain ɷ crane
8. ɷ swine ɷ shrine ɷ twine ɷ spine
9. ɷ queen ɷ clean ɷ green ɷ glean
11. ɷ buzzed ɷ bugs ɷ bulb ɷ bussed
12. ɷ filth ɷ film ɷ filled ɷ fibbed
13. ɷ lilt ɷ limp ɷ lint ɷ link
14. ɷ else ɷ elk ɷ elf ɷ elm
15. ɷ putts ɷ puns ɷ pulse ɷ punt
16. ɷ mask ɷ marched ɷ marked ɷ mast
17. ɷ butts ɷ buds ɷ buns ɷ bums
19. ɷ winch ɷ wins ɷ wind ɷ wink
20. ɷ wits ɷ wimp ɷ width ɷ wisp
Appendix 3.8 Instructions and answer sheet of the SPIN word recognition test. Sentences were
read by a native speaker of RP English and responded to by Sudanese EFL learners.
Part 4. Recognition of words in English sentences
Instructions
You will now hear 25 sentences on the CD. Each time you hear a sentence, write down
only the last word you think you have heard. This time there will be no practice items.
Please write down a single word for every sentence you hear. After every sentence you
will have five seconds to write down a word. There will be a beep after every tenth
sentence.
Nr. Last word of sentence

1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
11.
12.
13.
14.
15.
16.
17.
18.
19.
20.
21.
22.
23.
24.
25.
End of tests. Thank you very much for your help.
Note: the same stimuli for the identification tests of English vowels, consonants,
consonant clusters and SPIN sentences were used in chapters 3, 4 and 5.
Appendix 4.1 Instruction and answer sheets of the identification test of English vowels read by
one Sudanese speaker and one native speaker of RP English. Test responded to by Dutch
listeners of English.
Perception tests Part 1. Identification of English vowels
Instructions
You will hear 20 English-spoken items twice [in two texts A and B] on the CD. The A
texts will be pronounced by a non-native speaker of English, but the B texts by a native
speaker. Note that the sound quality of the recording has been degraded by the addition
of noise. This was done to make the listening task more difficult. Every item contains
the same short utterance “Say xxx again”, where xxx is a one-syllable word. Each time
you hear an item, decide which one of the four possibilities listed under A-B-C-D is the
one that was said. To indicate your choice, tick the appropriate box on your answer
sheet.
Note that pronunciation errors may occur in the vowel or the consonants: only check
the sounds that are printed in bold face. Remember that you have to make a choice for
every word you hear, one choice, no more, no less. If you do not know what to answer,
just gamble.
After you hear an item, you have five seconds to place your tick mark. You may also
use part of this time to look at the response alternatives of the next item. To help you
keep track, you will hear a beep after every fifth item on the CD.
A. B. C. D.
If everything is clear, we will now start the test items proper. Turn the page over for the
answer sheet for the vowel test.
Text [A] Non-native speaker
A. B. C. D.

11. ɷ pit ɷ peat ɷ pet ɷ putt

12. ɷ put ɷ pot ɷ pit ɷ putt
13. ɷ pat ɷ putt ɷ pot ɷ pit
14. ɷ pat ɷ put ɷ putt ɷ pet
15. ɷ pat ɷ pit ɷ pet ɷ peat
16. ɷ net ɷ not ɷ nit ɷ nut

17. ɷ fill ɷ fool ɷ full ɷ fell
18. ɷ board ɷ bard ɷ bird ɷ beard
19. ɷ peat ɷ pat ɷ pit ɷ pet
Text [B] Native speaker

A. B. C. D.
2. ɷ pet ɷ put ɷ pit ɷ pat
3. ɷ put ɷ pet ɷ pat ɷ pot
4. ɷ pit ɷ pat ɷ pet ɷ peat
6. ɷ fill ɷ fool ɷ fell ɷ full

8. ɷ pit ɷ pet ɷ peat ɷ put
9. ɷ bard ɷ board ɷ bird ɷ beard


20. ɷ put ɷ pat ɷ pit ɷ pet
Appendix 4.2 Instructions and answer sheets of the identification test of English consonants
read by Sudanese speakers and native speakers of RP English. The test was responded to by
Dutch listeners of English.
Part 2. Identification of English consonants.

Date: ………… Listener position : [ ]
Instructions
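[…] The A texts will be pronounced by a non-native speaker of English, but the B texts by a
native speaker. Note that the sound quality of the recording has been degraded by the addition
of noise. This was done to make the listening task more difficult. Every item contains
the same short utterance “Say xxx again”, where xxx is a one-syllable word. Each time
you hear an item, decide which one of the four possibilities listed under A-B-C-D is the
one that was said. To indicate your choice, tick the appropriate box on your answer
sheet. Note that pronunciation errors may occur in the vowel or the consonants: only
check the sounds that are printed in bold face. Remember that you have to make a
choice for every word you hear, one choice, no more, no less. If you do not know what
to answer, just gamble.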
A. B. C. D.
a. ͚ sap ͚ sack ͚ sat ͚ sag
b. ͚ match ͚ man ͚ mash ͚ mat
If everything is clear, we will now start the test items proper. Turn the page over for the
answer sheet for the consonant test.
Text [A] Non-native speaker
A. B. C. D. A. B. C. D.
1. ͙ then ͙ zen ͙ ten ͙ den 21. ͚ can ͚ cap ͚ cam ͚ cab

2. ͙ den ͙ ten ͙ men ͙ pen 22. ͚ wick ͚ with ͚ whiz ͚ wits
3. ͙ fang ͙ bang ͙ gang ͙ rang 23. ͚ pad ͚ pan ͚ pack ͚ pat
4. ͚ sits ͚ fits ͚ wits ͚ pits 24. ͚ raze ͚ race ͚ rave ͚ rape
5. ͚ rip ͚ dip ͚ tip ͚ sip 25. ͚ pane ͚ pale ͚ page ͚ pave
6. ͚ went ͚ bent ͚ rent ͚ dent 26. ͚ wit ͚ with ͚ wick ͚ whiz
7. ͚ fit ͚ lit ͚ bit ͚ hit 27. ͚ cop ͚ cod ͚ con ͚ cock
8. ͚ bed ͚ led ͚ wed ͚ red 28. ͚ sap ͚ sat ͚ sag ͚ sad
9. ͚ zen ͚ ten ͚ den ͚ then 29. ͙ bad ͙ ban ͙ bang ͚ bad
10. ͚ thaw ͚ law ͚ paw ͚ saw 30. ͚ heat ͚ heave ͚ he’s ͚ heath
11. ͚ peel ͚ zeal ͚ feel ͚ seal 31. ͚ pale ͚ page ͚ pane ͚ pave
12. ͚ tot ͚ pot ͚ lot ͚ not 32. ͚ heat ͚ heave ͚ heath ͚ he’s
13. ͚ west ͚ vest ͚ nest ͚ best 33. ͚ match ͚ man ͚ mash ͚ mat
14. ͚ tin ͚ fin ͚ pin ͚ chin 34. ͚ rave ͚ race ͚ raze ͚ rape
15. ͚ nut ͚ but ͚ gut ͚ shut 35. ͚ save ͚ sage ͚ sane ͚ safe
16. ͚ rob ͚ cob ͚ job ͚ bob 36. ͚ dad ͚ dam ͚ dan ͚ dab
17. ͚ must ͚ bust ͚ gust ͚ dust 37. ͚ lace ͚ lane ͚ late ͚ lame
18. ͚ pot ͚ cot ͚ jot ͚ got 38. ͚ sap ͚ sack ͚ sat ͚ sag
19. ͚ hit ͚ lit ͚ bit ͚ wit 39. ͚ mash ͚ mad ͚ mat ͚ match
20. ͚ hold ͚ cold ͚ told ͚ gold 40. ͚ pun ͚ put ͚ pub ͚ puck
Text [B] Native speaker
A. B. C. D. A. B. C. D.
1. ͚ sap ͚ sack ͚ sat ͚ sag 21. ͚ hit ͚ lit ͚ bit ͚ wit
2. ͚ match ͚ man ͚ mash ͚ mat 22. ͚ pot ͚ cot ͚ jot ͚ got
3. ͚ pale ͚ page ͚ pane ͚ pave 23. ͚ must ͚ bust ͚ gust ͚ dust
4. ͚ pane ͚ pale ͚ page ͚ pave 24. ͙ fang ͙ bang ͙ gang ͙ rang
5. ͚ heat ͚ heave ͚ he’s ͚ heath 25. ͚ bed ͚ led ͚ wed ͚ red
6. ͙ bad ͙ ban ͙ bang ͚ bad 26. ͚ nut ͚ but ͚ gut ͚ shut
7. ͚ can ͚ cap ͚ cam ͚ cab 27. ͚ tin ͚ fin ͚ pin ͚ chin
8. ͚ sap ͚ sad ͚ sag ͚ san 28. ͚ hit ͚ lit ͚ bit ͚ fit
9. ͚ pad ͚ pan ͚ pack ͚ pat 29. ͚ tot ͚ pot ͚ lot ͚ not
10. ͚ save ͚ sage ͚ sane ͚ safe 30. ͙ then ͙ ten ͙ zen ͙ den
11. ͚ pun ͚ put ͚ pub ͚ puck 31. ͚ thaw ͚ law ͚ paw ͚ saw
12. ͚ raze ͚ rate ͚ rave ͚ rape 32. ͚ zen ͚ ten ͚ den ͚ then
13. ͚ mat ͚ match ͚ man ͚ mash 33. ͚ peel ͚ zeal ͚ feel ͚ seal
14. ͚ cop ͚ cod ͚ con ͚ cock 34. ͙ den ͙ ten ͙ men ͙ pen
15. ͚ lane ͚ lace ͚ late ͚ lame 35. ͚ went ͚ bent ͚ rent ͚ dent
16. ͚ dad ͚ dam ͚ dan ͚ dab 36. ͚ rip ͚ dip ͚ tip ͚ sip
17. ͚ rate ͚ race ͚ raze ͚ rape 37. ͚ wits ͚ sits ͚ pits ͚ fits
18. ͚ sap ͚ sat ͚ sag ͚ sad 38. ͚ rob ͚ cob ͚ bob ͚ job
19. ͚ wit ͚ with ͚ wick ͚ whiz 39. ͚ west ͚ vest ͚ nest ͚ best
20. ͚ save ͚ sage ͚ sane ͚ safe 40. ͚ hold ͚ cold ͚ told ͚ gold
Appendix 4.3 Answer sheets of the identification test of English consonant clusters read by
Sudanese speakers and native speakers of RP English. The test was responded to by Dutch
listeners of English.
Part 3. Identification of English consonant clusters
Instructions
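[…] The A texts will be pronounced by a non-native speaker of English, but the B texts by a
native speaker. Note that the sound quality of the recording has been degraded by the addition
of noise. This was done to make the listening task more difficult. Every item contains
the same short utterance “Say xxx again”, where xxx is a one-syllable word. Each time
you hear an item, decide which one of the four possibilities listed under A-B-C-D is the
one that was said. To indicate your choice, tick the appropriate box on your answer
sheet. Note that pronunciation errors may occur in the vowel or the consonants: only
check the sounds that are printed in bold face. Remember that you have to make a
choice for every word you hear, one choice, no more, no less. If you do not know what
to answer, just gamble.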
A. B. C. D.
a. ɷ film ɷ fist ɷ fills ɷ filth
b. ɷ pry ɷ fry ɷ cry ɷ fly
If everything is clear, we will now start the test items proper. Turn the page over for
the answer sheet for the cluster test.
Test [A] Non-native speaker
A. B. C. D.
1. ɷ wits ɷ wimp ɷ width ɷ wisp
6. ɷ putts ɷ puns ɷ pulse ɷ puffs

13. ɷ ply ɷ dry ɷ cry ɷ fly
Test [B] Native speaker
A. B. C. D.
2. ɷ ply ɷ dry ɷ fly ɷ cry
15. ɷ putts ɷ puns ɷ pulse ɷ puffs
Appendix 4.4 Answer sheets of the SPIN test of English sentences, which were read by Sudanese
speakers and native speakers of RP English. The test was responded to by Dutch listeners of
English.
Part 4. Identification of English SPIN

Instructions
You will now hear 25 sentences twice [in two texts A and B] on the CD. Each time you
hear a sentence, write down only the last word you think you have heard. This time
there will be no practice items. Please write down a single word for every sentence you
hear. After every sentence, you will have five seconds to write down a word. There will
be a beep after every fifth sentence.
Test [A] Non-native speaker

1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
11.
12.
13.
14.
15.
16.
17.
18.
19.
20.
21.
22.
23.
24.
25.
Test [B] Native speaker

1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
11.
12.
13.
14.
15.
16.
17.
18.
19.
20.
21.
22.
23.
24.
25.
End of tests. Thank you very much for your help

Appendix 6.1 English vowel durations (ms) of eleven Sudanese university learners of English.
Missing data are indicated by ‘---’.
                Speaker no.
No.  Vowel    1    2    3    4    5    6    7    8    9   10   11
1.   ɑː     252  207  174  133  293  160  356  215  130  232  170
2.   æ      258  148  149   98  204  137  190  189   99   69  113
3.   aʊ     415  200  241  245  247  280  366  256  272  161  190
4.   aɪ     207  141  197  174  261  188  353  165  158  177  164
5.   e      200   42   59   38  112   78  103   66   53   89   59
6.   eə     278  433  269  166  279  247  318  217  246  210  211
7.   eɪ     237  145  181  114  221  179  302  163  181  241  180
8.   ɪ       92   42   61   58   67   58   68   51   32   61   47
9.   iː     191  148  160   93  158  113  180   55   92  217   86
10.  ɪə     248  280  241  142  256  212  277  202  144  201  139
11.  ɒ      156  110   58   47  113  213  312   65   68  128   82
12.  ɔː     264  188  186  147  265  158  318  191  128  178  163
13.  əʊ     262  137  170  145  125  232  346  133  118  266  113
14.  ɔɪ     363  226  244  241  450  181  666  227  261  210  202
15.  ʊ      137   56   82   92   88   92   92   75   72  103   78
16.  uː     138   84  154  115  187  134  337  150  125  186  114
17.  ʊə     234  129  163  165  203  203  197  164  238  ---  226
18.  ʌ       77   91   81   68   74   91   94  ---   77   99   66
19.  ɜː     252  244  159  140  265  179  313  123  ---   46   90
Appendix 6.2 Mean absolute duration (ms) of RP English vowels (abstracted from Wells 1962;
see further http://www.phon.ucl.ac.uk/home/wells/formants/table-7-uni.htm).
No.  Vowel  Duration (ms)
1.   ɪ      139
2.   ʊ      142
3.   ʌ      148
4.   e      170
5.   ɒ      178
6.   æ      210
7.   iː     293
8.   uː     294
9.   ɜː     309
10.  ɔː     330
11.  ɑː     335
Mean of all vowels 232
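The overall mean in the last row can be checked from the eleven vowel means. The sketch below lists the durations in table order (rows 1 to 11) and assumes that the overall figure is an unweighted mean rounded to the nearest millisecond; the variable names are illustrative.

```python
# Mean durations (ms) of rows 1-11 of Appendix 6.2, in table order.
durations_ms = [139, 142, 148, 170, 178, 210, 293, 294, 309, 330, 335]

# Unweighted mean over the eleven vowels (an assumption about how the
# "Mean of all vowels" row was obtained).
mean_ms = sum(durations_ms) / len(durations_ms)
rounded_mean_ms = round(mean_ms)
```

Under that assumption the rounded value agrees with the 232 ms given in the table.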
Appendix 7.1 Individual speaker VOT mean values for English stops produced by Sudanese
speakers.
Target plosives – VOT

Speaker b d g p t k
1. 103.00 68.00 114.00 16.00 32.00 32.00
2. 53.00 10.00 30.00 18.00 22.00 39.00
3. 15.50 6.00 16.00 13.00 22.00 18.00
4. 50.00 7.33 12.50 16.50 24.50 23.50
5. 19.50 19.33 32.00 22.50 53.00 32.50
6. 102.00 37.67 85.00 12.50 18.50 32.00
7. 0.00 36.33 67.50 25.00 34.00 17.00
8. 25.50 23.33 18.00 30.50 12.33 22.00
9. 9.00 8.33 19.00 22.00 9.00 20.00
10. 0.00 17.67 28.50 16.00 12.00 29.00
11. 62.50 22.67 52.50 15.50 16.00 29.50
Appendix 7.2 Mean Centre of Gravity (COG) values (Hz) of the English obstruents produced
by Sudanese learners. The top panel shows COG values for voiced obstruents; voiceless
obstruents are shown in the bottom panel. The EFL values differ substantially from those
obtained from native English speakers (not shown here).
Appendix 7.3 Duration (ms) of preceding vowels produced by Sudanese learners of English.
Native data from Kent, Dembowski and Lass (1996).
No.  Consonant  EFL learners  Native speakers
1.   p           93           165
2.   b           95           250
3.   t          170           175
4.   d          199           255
5.   k          173           170
6.   g          114           275
7.   f          162           225
8.   v          208           280
9.   θ          168           225
10.  ð           78           310
11.  s          181           312
12.  z          204           325
13.  ʃ          174           ---
14.  ʒ          ---           ---
15.  tʃ         180           ---
16.  dʒ         130           ---
Appendix 7.4 Mean consonant duration (ms) of Sudanese EFL learners and native speakers of
English. Consonant duration data of native speakers cited from Lavoie (2001).
No.  Consonant  EFL learners  Native speakers
1.   p          187           115
2.   b          160            89
3.   t          202           123
4.   d          139            84
5.   k          187           112
6.   g          186            81
7.   f          185           110
8.   v          181            82
9.   θ          203           108
10.  ð          176            58
11.  s          207           120
12.  z          166            78
13.  ʃ          185           120
14.  ʒ          ---           ---
15.  tʃ         206           139
16.  dʒ         122           107
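To make the comparison in this table concrete, the sketch below computes, for each row with data in both columns, the ratio of the learners' mean consonant duration to the native value. Rows are referred to by their table number; the ratio is an illustrative summary measure and is not an analysis reported in the study.

```python
# Mean consonant durations (ms) from Appendix 7.4, keyed by row number.
# Row 14 is omitted because both cells are missing ('---').
efl_ms = {1: 187, 2: 160, 3: 202, 4: 139, 5: 187, 6: 186, 7: 185, 8: 181,
          9: 203, 10: 176, 11: 207, 12: 166, 13: 185, 15: 206, 16: 122}
native_ms = {1: 115, 2: 89, 3: 123, 4: 84, 5: 112, 6: 81, 7: 110, 8: 82,
             9: 108, 10: 58, 11: 120, 12: 78, 13: 120, 15: 139, 16: 107}

# A ratio above 1 means the learners' consonant is longer than the native one.
ratios = {row: efl_ms[row] / native_ms[row] for row in efl_ms}

efl_mean = sum(efl_ms.values()) / len(efl_ms)
native_mean = sum(native_ms.values()) / len(native_ms)
longer_rows = [row for row, ratio in ratios.items() if ratio > 1.0]
```

In this table every ratio is above 1, that is, the learners' consonants are longer than the native reference values in all fifteen rows with data.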
Appendix 7.5 Relative intensity (decibels) of English consonants produced by Sudanese EFL
learners and by native speakers (native data from Ball and Rahilly 1999).
No.  Obstruent  Sudanese EFL learners  Native speakers
1.   p          60.0                    7
2.   b          62.0                    8
3.   t          61.0                   11
4.   d          61.0                   13
5.   k          60.0                   11
6.   g          60.6                   11
7.   f          65.1                    7
8.   v          66.0                   10
9.   θ          67.0                    0
10.  ð          57.9                   10
11.  s          63.6                   12
12.  z          64.0                   12
13.  ʃ          65.1                   13
14.  ʒ          ---                    13
15.  tʃ         65.0                   16
16.  dʒ         62.0                   13
Appendix 9.1a Paper-and-pencil questionnaire filled in by Sudanese EFL students at the
University of Gadarif, Sudan.
1- Students’ questionnaire
Dear students
This questionnaire is directed to provide data about speech intelligibility problems
experienced by Sudanese university learners of English. It attempts to find the effect of
both mother-tongue (Arabic) and the lack of L2 pronunciation knowledge of the
learners concerned.
Section [1]: Preliminary information. Please reply to the issues below:
1- Gender ͚ male / ͚ female

2- Degree ͚ B.A. / ͚ B.Ed
3- Class/year ͚ first ͚ second ͚ third ͚ fourth ͚ fifth
4- The number of phonetics/phonology courses you have studied during the whole
period of the B.A./B.Ed. programme.
͚ One ͚ two ͚ more
5- Would you please write a few lines about the nature of these courses?
…………………………………………………………………………………………
…………………………………………………………………………………………
(more response space available in student’s copy)
Section [2] Please choose the most applicable answer from the following:
1- How well do you understand spoken English?
a- weak b- fair c- good d- very good e- excellent
2- How well do native speakers of English understand you?
3- How interesting and practical are the courses you study?
a- not b- hardly c- average d- very e- maximally
4- How relevant and authentic are the courses you study, with respect to the
development of pronunciation skills?
a- not b- hardly c- average d- very e- maximally
5- How often do you have problems with the pronunciation of English sounds?
a- never b- rarely c- often d- frequently e- permanently
6- Which English sounds do you find difficult?
a- vowels b- consonants c- clusters d- … and … e- all
7- How do you experience these problems?

a- I totally fail to understand
b- I often fail to understand English pronunciation
c- I fail to produce English pronunciation
d- …………………………………………..
Section [3] Recognizing English speech sounds

C- Short vowels:
1- How often do you find it difficult to distinguish between minimal pairs such as
bet/bat pen/pin pot/pat cot/coat cat/cart
net/gnat pan/pun cat/cut mad/mud pull/bull
up/rub hill/hell sit/seat hit/heat look/loop
Read the statements BELOW and then choose one option from the ones below:
II- Explain why you have chosen a, b, c, d or e above?
a- I identify short vowel sounds.
b- I partly grasp them, because I often misidentify the short vowel /æ/ as /ɑː/.
c- I never recognise short vowels. I find it difficult.
III- Can you give further details of these difficulties?
…………………………………………………………………………………………
…………………………………………………………………………………………
D- Long vowels
1- I ……… experience problems in correctly perceiving long vowels like /uː, ɔː, iː/, etc.,
in words such as calm, warm, worm, glue, choose, hoop, beat, boot, bough, leap, lead, read.
2- Explain why you have chosen a, b, c, d or e above.
a- I hear them well and identify them. b- I confuse / fail to discriminate.
c- I often confuse some long vowels. d- I totally fail to identify them.
3- Please can you give example(s) of the errors you commit? E.g., I fail to discriminate
between /ʊ/ and /uː/ in minimal pairs like food/boot, full/fool, pull/pool, etc.
…………………………………………………………………………………………
…………………………………………………………………………………………
E- Diphthongs
1- I ……… experience problems in perceiving diphthongs like /aɪ, eə, ɪə, ɔɪ, aʊ, eɪ/, etc.,
in words such as: fire/fear, bare/bowl, cow/power
2- Explain why you have chosen a, b, c, d or e given above.
a- I identify such sounds. b- I partly grasp such sounds.
c- I often fail to discriminate sounds. d- I do not realize them at all.
3- Give example(s) of the errors you commit, e.g.: I fail to discriminate between /əʊ/ in
boat and /ɔː/ in taught, caught.
…………………………………………………………………………………………
…………………………………………………………………………………………
A- Consonant sounds
How successful are you in perceiving the following English consonants?
1- Plosives like /p, b, t, d, k, g/.
2- Fricatives like /f, v, s, z, θ, ð/ or affricates like /tʃ, dʒ/.
3- Nasals like /m, ŋ, n/.
4- Approximants like /l, r, w, j/.
B- Clusters
How often do you experience difficulty perceiving the clusters below, whenever you
hear them?
1- Initial clusters like prompt, play, scream, string, spring and sword.
2- Final clusters in words such as: ground, interrupt, risk, next , blink
3- What state do you find yourself in whenever you are exposed to speech including
this type of English clusters?
a- I hear consonant clusters well and understand them. For example, I hear
play as play and alert as alert, etc.
b- I substitute consonant clusters, because I often hear clusters like /pl/ as
/bl/ or /br/ and /fl/ as /fr/.
c- I never understand them at all. I find it difficult to discriminate them.
Section [4] Producing English speech sounds:

A- Short vowels
1- Do you have difficulty in pronouncing the English vowel sounds in words such as
the pin/pen, tin/ten, win/won, bat/bad, cat/cot, etc.?
1- I ……… have difficulty in understanding short vowels like /e, ʊ, ɑ, ɪ, ʌ, æ, ɒ, ə/.
2- Choose a letter (a, b, c or d) from the list below to complete your answers.
a- I produce these sounds correctly.
b- I partly produce these sounds correctly.
c- I often find it difficult to produce such phonemes.
d- I cannot produce them at all.
3- In case you chose the letters b, c or d, please give example(s) of the errors you make.
E.g., I interchangeably substitute /e~ɪ/ in words like wit/wet, pit/pet, etc.
…………………………………………………………………………………………
…………………………………………………………………………………………
B- Long vowels
1- I ………… experience problems in producing long vowels like /ɔː, iː, ɜː/.
2- Explain why you have chosen a, b, c or d above.
a- I produce the English long vowels correctly.
b- I can produce some of them correctly.
c- I often find it difficult to produce long vowels.
d- I cannot produce them at all.
3- Would you please give example(s) of the errors you make? E.g., I fail to discriminate
between /ʊ, uː/ in pairs like foot/food, look/Luke, full/fool, book/boo.
…………………………………………………………………………………………
…………………………………………………………………………………………
C- Diphthongs
1- I ……… experience problems producing diphthongs like /aɪ, eə, ɪə, ɔɪ, aʊ, eɪ/, etc., in
words like: bye, boy, hear.
2- Explain why you have chosen a, b, c or d above.
1- I produce them well.
2- I partly produce diphthongs.
3- I often find it a bit difficult to produce such English vowels.
D- Consonant sounds
Draw a circle around one answer from the following:
How successful are you in producing the following English consonants?
1- Plosives like /p~b/, /t~d/, /k~g/.
2- Fricatives like /s~z/, /f~v/, /ʃ~ʒ/, /θ~ð/.
3- Nasals like /m, n, ŋ/.
4- Approximants like /l, r, w, j/.
E- Clusters
How often do you find it difficult to produce clusters in continuous speech?
1- Initial clusters in words like prompt, play, scream, string, spring, and sword.
2- Final clusters such as ground, interrupt, risk, clerk, bring
3- What state do you find yourself in whenever you are involved in a speech that
requires the production of clusters?
a- I can produce consonant clusters correctly.
b- I can produce some of them correctly.
c- I find it difficult to produce consonant clusters.
d- I cannot produce them at all.
4 - In case you choose answers b, c or d, can you give examples of the errors you make?
E.g., I fail to pronounce words like: print, double, count, stay, spring, bare, eight, etc.
…………………………………………………………………………………………
…………………………………………………………………………………………
Section [5] Causes

1- Do you think your mother-tongue influences your perception and production of
English speech?
2- Is it difficult to pronounce words like cloth, eight, fine, rich, bridge, chair, literature, please?
3- How difficult is it to pronounce words like enough, rhyme, obstacle, column, etc.?
4- Is it difficult to pronounce words like here, there, this, tree, three, etc.?
5- Is it difficult to pronounce words like theory, dictionary, library, probable, etc.?
6- Do you think your intelligibility in English would be better if you learned more about
English pronunciation?
…………………………………………………………………………………………
…………………………………………………………………………………………
Thank you for your assistance

Appendix 9.2b Paper-and-pencil questionnaires filled in by Sudanese EFL teachers at secondary
schools (SELTI) and University of Gadarif, Sudan.
This questionnaire provides information about the pronunciation problems of English
vowels, single and cluster consonants, which are assumed to compromise the
intelligibility of the Sudanese university learners of English. The information collected
through this questionnaire represents the professional judgments of experienced
language teachers of English in the Sudanese context.
Section (1): General pronunciation matters. Show the most appropriate grade of your
students.
1- How intelligible do you rate your students’ English pronunciation?
2- To what degree is your students’ intelligibility related to pronunciation?
3- How often do pronunciation-learning strategies of your students influence their
perception and production of intelligible speech?
Section (2)
(A)- Please indicate the degree of difficulty you assume the students experience in the
following tasks.
1- To regroup the words that have the same vowel or consonant sounds in words like
meat, rate, maid, let, say, said, query.
2- To find out the odd member among consonant or vowel sounds such as dull, bull,
wool, pull or warn, dawn, scorn, barn.
3- To discriminate between voiceless and voiced consonantal sounds such as /p, t, k/
and /b, d, g/, etc.
4- To pronounce fricatives like /s, z, ð, θ, f, v, ʃ, ʒ/.
5- To produce a consistent vowel quality.
(B) - Please choose one option to reply to the following pronunciation phenomena.
1- How often does the interference of L1 [Arabic] and L2 orthography [spelling] cause
erroneous pronunciation?
2- To what extent do the subjects concerned tend to make use of the phonological
universals to achieve intelligible speech?
3- How often do subjects tend to avoid learning the problematic sounds?
4- How often do your students tend to apply a newly learnt pronunciation rule in an
inappropriate context [overgeneralization]?
5- How often do learners tend to substitute similar sounds of their L1 for L2 but which
are acoustically different?
6- How successful are the Sudanese university learners of English in disassociating their
L2 sound utterance from the repertoire of L1?
Section (3)
What problems do you think do the students experience in the learning of vowels,
consonants and consonant clusters of English? Please give examples of such problems
[wrongly pronounced sounds].
[A] Vowels
I- Examples of pronunciation problems on the short vowels /e, ɑ, ʌ, æ, ə, ʊ, ɪ/.
…………………………………………………………………………………………
…………………………………………………………………………………………
(more response space available in instructor’s copy)
II- Examples of pronunciation problems on the long vowel sounds /iː, uː, ɜː, ɑː, ɔː/.
…………………………………………………………………………………………
…………………………………………………………………………………………
III- Examples of pronunciation problems on the following English diphthongs: /ɪə, eɪ,
ɔɪ, ʊə, əʊ, eə, aɪ, aʊ/.
…………………………………………………………………………………………
…………………………………………………………………………………………
[ B ] Consonants
I- Examples of pronunciation problems involving consonants [both onset and coda],
e.g. substitution of /b/ for /p/:
…………………………………………………………………………………………
…………………………………………………………………………………………
[C] Clusters
I- Examples of pronunciation problems experienced with initial clusters. For example,
addition of /ɪ/ in front of /st/ and /spr/, etc.
…………………………………………………………………………………………
…………………………………………………………………………………………
II- Examples of pronunciation problems involving final clusters, e.g. /pt, st, nt, ŋk/, etc.
…………………………………………………………………………………………
…………………………………………………………………………………………
Section (4)
Influence of mother-tongue and lack of pronunciation knowledge of the learners
[A] Mother-tongue transfer

1- To what degree are pronunciation errors caused by mother-tongue transfer [Arabic]
of the Sudanese university learner of English?
2- The largest number of pronunciation errors committed by the subjects as a result of
mother-tongue transfer, appears on the level of:
a- consonants b- clusters c- vowels d- both …&… e- all
3- To what extent does the mother-tongue transfer influence the learner’s perception of
intelligible speech negatively?
4- To what extent does the mother-tongue transfer influence the learner’s production of
intelligible speech negatively?
5- How do you interpret this linguistic phenomenon?

…………………………………………………………………………………………
…………………………………………………………………………………………
…………………………………………………………………………………………
[B] The lack of the learners’ pronunciation knowledge:

1- To what degree are pronunciation errors caused by the lack of knowledge of the
English sound system on the part of the student?
2- The largest number of pronunciation errors committed by the subjects as a result of
the lack of the pronunciation knowledge, appears on the level of:
a- consonants b- clusters c- vowels d- both …&… e- all
3- To what extent does the learner’s lack of pronunciation knowledge negatively
influence the perception of intelligible speech?
4- To what extent does the learner’s lack of pronunciation knowledge negatively
influence the production of intelligible speech?
5- How do you justify your judgment on this linguistic phenomenon?
…………………………………………………………………………………………
…………………………………………………………………………………………
Section (5)
What other linguistic factors, not covered in this questionnaire, do you think delay the
achievement of intelligible speech?
…………………………………………………………………………………………
…………………………………………………………………………………………
Thank you for your assistance

Curriculum vitae
Ezzeldin Mahmoud Tajeldin Ali was born in 1971 in Showak, Sudan. He obtained a BA
degree in English Language Teaching (ELT) from Gezira University, Sudan in 1996
and an MA degree in the same discipline from the same university in 2001. Since then
he has been a lecturer in English Language in the Faculty of Education at Gadarif
University, Sudan. At the same university he is also the head of the English Language
Translation Unit.
From 2007 until 2010 he was affiliated to the Leiden University Centre of Linguistics as
a PhD candidate doing research on the intelligibility of English spoken by Sudanese-
Arabic students of English. During this period in the Netherlands he was supported by
a grant from the Sudanese Ministry of Education and exempt from his regular teaching
duties. The present dissertation is the result of this research project.
