An experimental approach
Published by
LOT phone: +31 30 253 6006
Trans 10
3512 JK Utrecht e-mail: lot@uu.nl
The Netherlands http://www.lotschool.nl
ISBN: 978-94-6093-057-7
NUR 616
DISSERTATION
by
References 199
Samenvatting 211
Summary in English 215
I am grateful to the Leiden University Fund (LUF) and the Leiden University
Centre for Linguistics (LUCL). Their commitment to solving many of the practical
problems was invaluable support for this study.
In the first years of my stay, I enjoyed sharing coffee and chat with Rob Goedemans,
Maarten Hijzelendoorn, Jurgen van Oostenrijk, Ellen van Zanten, Chaoju Tang, Sandra
Barasa, Vincent van Heuven, Jos Pacilly and others. Special thanks are also due to the
staff members of the Sudan Embassy in the Netherlands, and many Dutch, foreign and
Sudanese friends in Leiden. They were friends indeed when help was needed.
Thanks go to the library staff at Leiden University for being so helpful and for offering
service in every respect.
I would like to express my gratitude to the staff members at the Ministry of Higher
Education in Sudan and at Gadarif University for the scholarship I received during my
stay in the Netherlands, and to the contact persons at the Ministry of Higher Education
for their patience in listening to my telephone calls.
Introduction
Instructors of English as a foreign language (EFL) aim to help their students achieve
successful communication in the language. The students may use English to
perform various communicative tasks, such as assignments, debates and
examinations, as part of their daily work during a semester. They also need
English to engage in complicated communicative activities in real life, such as
job interviews and academic and professional pursuits. It is therefore
fair to conclude that achieving effective communication is a complex task for EFL
learners, one that requires mastering many language skills: listening, reading,
writing and speaking, as well as pronunciation and comprehension. In this
context, however, pronunciation, comprehension and listening will receive more
attention than the other skills, since they bear most directly on the study at issue. The
learners need to produce accurate speech sounds as well as to show high ability in
comprehension when they are involved in interactions. However, for various reasons,
learners of English as a foreign language have problems making themselves intelligible.
They fail either to understand the message conveyed by speech or to pronounce
English intelligibly. Many language studies are now attempting to investigate this type
of speech learning problem.
In Sudan, as everywhere in the world, the study of speech intelligibility problems in
English has recently emerged as a rapidly growing field of inquiry, extending across
different disciplines of language teaching. Researchers and language teachers have
approached many English language issues such as syntax, lexis, comprehension, reading
and other skills (see e.g. Towards a functional approach to the English research on the writing
skills in Sudan – Abdalla 2005, 2001 and Vocabulary learning strategies: A case study of
Sudanese learners of English – Ahmed 1988). Their accounts indicate that much effort has
been expended in these areas. However, relatively little empirical investigation has been
done on English speech perception and production problems in Sudan, except for a
few studies and textbooks that provide reviews in a more impressionistic manner
(English pronunciation for Arabic speakers – Mitchell and El Hassan 1993 and Errors in
English among Arabic speakers: Analysis and Remedy – Kharma and Hajjaj 1989).
Impressionistic views such as these inform the scholarly reader about the topic
under investigation only in descriptive terms and provide virtually no practically oriented
reference methodology. Therefore, they do not contribute effectively to the solution of
the problem. Similarly, examining issues such as pronunciation or
intelligibility problems using only written tests, which ask candidates overt questions
2 TAJELDIN ALI: SPEECH INTELLIGIBILITY PROBLEMS OF SUDANESE EFL LEANERS
about the types of problems they face in recognizing and pronouncing, for example,
English phonemes or words, will mostly provide inadequate findings.
More specifically, as it has been argued that phonemes form the basic sound knowledge
of speech, the study focuses on segmental analysis of the English speech sounds
spoken by Sudanese EFL learners. Recently, comprehensive literature surveys have
been carried out on segmental analysis, e.g. assessing constraints on second-language
segmental production and perception (Flege 2003). In L2 pedagogy, intelligibility
and speech perception are issues that motivate many investigations targeting segmental
difficulties experienced by ESL/EFL learners.3 Segmental analysis
approaches the measurement of intelligibility in phonological and phonetic terms that
closely relate to differences between the sound systems of the learners’ L1 and L2. This
evokes the argument that phonemic differences may well represent the most difficult
learning problems experienced by EFL learners: many related studies reveal the
influence of phonemic variation across languages (do Val Barros 2003).
Different considerations necessitate the use of segmental analysis in this context. Firstly,
3 A distinction is often made between English as a second language (ESL) and English as a
foreign language (EFL). In the former case English is the dominant language in the learning
environment, e.g. when an immigrant has to learn the language in England. In the latter case the
new language is learnt in the learners’ country of origin, typically as part of the school curriculum.
CHAPTER ONE: INTRODUCTION 3
there are many differences between the phonemic systems of English and Arabic. The
Arabic vowel system distinguishes only three vowels, viz. /a, u, i/. These vowels are
mostly unwritten (or marked by diacritics on consonant symbols) and represent short
vowels. They are not part of the Arabic alphabet or ordinary spelling; the vowels are
inferred from context. The vowels perform a morpho-phonemic function in Arabic
word formation (Hayat 2005, Alan 1997). They also function to mark inflectional
categories such as tense, gender and number, which reveals the nature of the Arabic
non-concatenative morphological system underlying deep phoneme regularities
(Kenstowicz 1994). The situation is different in English, which has a large number of vowels
of a more complicated nature, comprising pure vowels (or monophthongs) as well as
diphthongs (such as /eɪ/, /aɪ/, etc.), all of which may occur in accented as well as
unaccented syllables.
Secondly, there are acoustic differences between the English and Arabic vowel
systems. Vowel length is an important temporal feature distinguishing between vowels
in the two languages. In Arabic, length signals a short/long distinction (length in
relation to vowels is like gemination in relation to consonants). In English, some vowels
are long, e.g. /iː/ in seed and /ɑː/ in car, whilst others are short, e.g. /ɪ/ in fit and /e/ in bed
(Mitchell 2004), but the long and short vowels are also distinguished by differences in
phonetic vowel quality (determined by degree of mouth opening, constriction place
along the front-back dimension and degree of lip rounding). This reinforces the
argument that, phonologically, durational differences in Arabic vowels are
independent of vowel quality, whilst in English they are not: they do
not necessarily stand in a systematically orthogonal relation to quality differences (De Jong
2004). Similar phonemic and acoustic differences exist between the English and Arabic
consonant inventories. Arabic has complicated phonological features such as emphasis
and gemination, which may not correspond to those of English.
The theme sketched above has recently motivated researchers of ESL/EFL (e.g.
Strange, Bohn, Trent and Nishi 2004, Wang and Van Heuven 2006) to conduct
experimental analyses of the English vowel system as spoken and perceived by native
and non-native speakers. In the current study, experimental analysis has been
conducted targeting receptive and productive speech intelligibility. The analysis covers
perception tasks which treat receptive intelligibility. It also covers important properties
such as the graphical representation of the vowel space and the temporal structure of
English vowels and consonants identified and produced by Sudanese speakers dealing with
The topic of this research is the speech intelligibility problems that Sudanese
EFL learners face at university. The investigation attempts to account for the extent to
which linguistic factors can impede receptive and productive intelligibility of
English speech sounds. Linguistic factors herein refer to (i) L1 interference and (ii)
awareness of English speech sounds. Concerning the first factor, L1 interference, the
purpose is to examine to what extent the learners’ mother tongue obstructs their
learning of English speech sounds. This is because Sudanese EFL learners arguably
experience difficulty in identifying and producing English speech sounds (i.e.
vowels, consonants and clusters) due to transfer from their L1. These problems have been
addressed before, but from an impressionistic point of view. One of these studies
suggests that Sudanese learners of English have difficulties perceiving and producing
the English vowels (Mohammed 1991). Another study (Bobda 2000) more specifically
claims that the production of the English vowels /ɜ, ɜː, ʌ/ poses a problem for Sudanese
EFL learners due to their Arabic linguistic background.
Lack of explicit knowledge of English speech sounds is the second factor
that is argued to cause intelligibility problems for Sudanese learners of English. It is
assumed that the learners’ explicit knowledge of English speech is insufficient, which
delays their recognition and production of these sounds. Explicit knowledge, in this
study, covers the articulatory and auditory awareness of English required to recognize
and produce English speech sounds. Articulatory knowledge includes learning to
produce new sounds, which implies that the learners are unfamiliar with such speech
sounds. The learners therefore need to develop articulatory habits, which can be
acquired through more exercise or exposure. Awareness of this aspect of knowledge
enables Sudanese EFL learners to understand, choose and use L2 sounds efficiently in
interactions. The learners also need to know the correct distribution of
English speech sounds, in isolated words or in connected speech, so that when they hear
these sounds, in recordings or spoken live, they can recognize them in their correct order
in syllables or words. The
learners may also need to know the perceptual representations of speech sound patterns,
which are built from auditory mapping information. This is because the perception
of specific L2 speech sounds can influence the identification of these sounds.
Perceptual confusion patterns may be indicative of the structure of the perceptual space
and the strategies used by L2 listeners. Furthermore, the learners in this study need to
have some background in the acoustic and temporal cues of English, in so far as they
are important for phonemic distinctions. As the related literature shows, most of the
perception and production errors in English speech sounds result from a lack of
these aspects of knowledge (e.g. Mohammed 1991).
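The notion of perceptual confusion patterns can be made concrete with a small tabulation. The following Python sketch is an illustration only and is not taken from the procedures of this study: it counts how often each stimulus vowel was identified as each response vowel and lists the largest off-diagonal cells. The function names and the listener responses are hypothetical and were invented for the example.

```python
from collections import Counter, defaultdict


def confusion_matrix(trials):
    """Count how often each stimulus was identified as each response.

    `trials` is an iterable of (stimulus, response) pairs.
    """
    matrix = defaultdict(Counter)
    for stimulus, response in trials:
        matrix[stimulus][response] += 1
    return matrix


def most_confused(matrix):
    """Return (stimulus, response, count) for every error cell, largest first."""
    errors = [
        (stimulus, response, count)
        for stimulus, row in matrix.items()
        for response, count in row.items()
        if response != stimulus
    ]
    return sorted(errors, key=lambda cell: cell[2], reverse=True)


# Hypothetical listener responses (invented data, for illustration only).
trials = [
    ("ɪ", "ɪ"), ("ɪ", "e"), ("ɪ", "e"), ("ɪ", "iː"),
    ("e", "e"), ("e", "ɪ"), ("iː", "iː"), ("iː", "iː"),
]
matrix = confusion_matrix(trials)
top_errors = most_confused(matrix)
```

In this invented data set the largest error cell is the stimulus /ɪ/ heard as /e/, which is the kind of asymmetry that a confusion analysis is meant to expose.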
To find evidence with a realistic degree of certainty about the speech intelligibility
problems, an experimental approach will be adopted. For receptive intelligibility
measurements, I will implement auditory discrimination methods such as the Modified
Rhyme Test (MRT), which presents isolated stimuli (vowels, consonants and clusters)
read in a fixed carrier phrase (Say ….. again), and the SPIN (Speech Perception in Noise,
see § 2.2.5.3) test, with the target words embedded in meaningful sentences. The MRT
tests the existence of categorical distinctions on the part of the listener; i.e. it is a
segmental intelligibility measurement (Flege 1976). When EFL materials are presented
to native listeners, the tests show whether the L2 production of the learners contains
contrasts of interest, e.g. rake vs. lake. Conversely, when native English materials are
presented to L2 listeners, the tests show whether or not the EFL learners know how to
make the relevant perceptual distinctions in the target language.
The study will also attempt to measure the acoustic correlates of the English speech
sounds produced by the Sudanese learners of English. Acoustic correlates include (i)
duration and phonetic quality (position in the F1-by-F2 formant space) for vowels, (ii)
duration, voice onset time (VOT), centre of gravity and intensity for consonants and (iii)
duration of constituent sounds in consonant clusters. The aim of the measurements is
to find out how differences in spectral and temporal properties between Sudanese
EFL learners’ L1 (Arabic) and English can affect intelligibility. Finally, paper-and-pencil
questionnaires will be distributed to elicit Sudanese students’ and their instructors’
opinions on the difficulties they experience in learning English as a foreign language;
these intuitions may help in understanding the problems uncovered by the functional
tests.
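Of the acoustic correlates listed above, the spectral centre of gravity is perhaps the least self-explanatory. The Python sketch below shows one common way of computing it, as the power-weighted mean frequency of a signal's spectrum. The text does not state how the measure was obtained in this study, so the power weighting, the naive discrete Fourier transform and the synthetic test tone are all assumptions made for the sake of illustration.

```python
import cmath
import math


def power_spectrum(samples, sample_rate):
    """Return (frequencies, powers) for the non-negative half of a naive DFT."""
    n = len(samples)
    freqs, powers = [], []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(samples)
        )
        freqs.append(k * sample_rate / n)
        powers.append(abs(coeff) ** 2)
    return freqs, powers


def centre_of_gravity(samples, sample_rate):
    """Power-weighted mean frequency (in Hz) of the signal (assumed definition)."""
    freqs, powers = power_spectrum(samples, sample_rate)
    total = sum(powers)
    if total == 0:
        raise ValueError("a silent signal has no centre of gravity")
    return sum(f * p for f, p in zip(freqs, powers)) / total


# Synthetic check (not study data): a pure 1000 Hz tone, 64 samples at 8000 Hz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(64)]
cog = centre_of_gravity(tone, sample_rate)
```

For a pure tone all the energy sits at one frequency, so the centre of gravity coincides with that frequency; for a fricative the value summarises where the noise energy is concentrated.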
hardly gives real insight into such a subject, unlike the experimental conduct that may
provide more real and accurate feedback using technology.
In fact, few studies in the Sudanese context have approached the issue of speech
intelligibility problems experimentally, as this study will do (see § 1.1). Most
investigations focus on English learning problems such as reading, writing, syntax,
listening skills, and so on. Only a few studies treat problems of English
pronunciation or perception among Sudanese EFL learners, and they do so in descriptive
terms.
Thirdly, multiple methods are used in the investigation to corroborate the data sources,
increasing the reliability of this research. The use of several data sources and different
methods amounts to a form of triangulation, i.e. the use of a variety of data collection
methods, as practised in the social sciences. The idea behind triangulation is to seek
agreement between different data sources, which supports a more reliable interpretation
of the data.
This study aims to devote greater care to the speech intelligibility problems
experienced by Sudanese learners of English. Very little effort has been devoted to such
types of language problems. Even the specialized workshops on speech intelligibility
problems (i.e. workshops dealing with EFL problems in the Sudanese context)
have provided little information. Their findings account insufficiently for the
problems concerned. Moreover, in terms of methods, these studies use data
obtained by means of interviews, which yield descriptions and impressions of the
research problems rather than results extracted from experiments. In general, the
current research attempts a further investigation of the impediments to speech
intelligibility among Sudanese EFL learners. The study will act as a pioneer project in
the sense that it provides experimental evidence on the issue at hand, in
order to serve as a blueprint for future attempts to solve such types of
problems. Thus, the research specifies two goals:
1. To what extent are Sudanese university EFL learners intelligible to native listeners
of English?
2. Are English vowels the most difficult to pronounce as opposed to consonants or
consonant clusters?
3. Which English speech sounds produced by Sudanese EFL learners do native
listeners find most difficult to recognize within each of the categories vowels,
consonants and consonant clusters?
4. What is the precise nature of the speech intelligibility problems observed among
the Sudanese learners of English?
5. What are the linguistic causes of such problems? More specifically,
a. Do the inventory differences between the learners’ L1 and the target language
present a major cause of these problems?
b. Does insufficient explicit knowledge of the English sound system on the part of
EFL learners aggravate their intelligibility problems?
This part provides a short description, which serves as a bird’s eye view of the research
methods and experimental design adopted in this study. However, more specific
information on the experimental design will be provided later in the separate chapters.
Different ways of data collection are adopted in this study, which include perception
tests, production tests and written questionnaires. All the tests target English speech
sounds, including vowels, consonants, clusters, and words embedded in high-probability
sentences (SPIN).
This study targets three groups of participants who come from different linguistic
backgrounds. The Sudanese university EFL learners represent the test group, which
participated in all the experiments as listeners and/or speakers of English. Similarly,
native speakers of RP English were involved in the experiments as listeners/speakers of
RP English (model groups). American and Dutch groups of subjects participated in the
experiments as listeners only. More importantly, the selection of participants varied in
terms of nationality, linguistic distance and the number of times they took the tests.
None of the individual subjects involved in these experiments participated in any of
the perception tests, production tests or questionnaires more than once. Furthermore,
recruits for the perception tests included native listeners (students and professors)
of British and American English, some of whom answered the test questions from inside
the Netherlands, while others answered them online from a distance, e.g. from Britain or
America. This variation in the recruitment criteria will contribute to the reliability of the
results.
For the measurement of the subjects’ intelligibility, the Modified Rhyme Test (MRT)
was used in all the perception tests; it is considered to be the most accurate and
reliable measurement of speech intelligibility (see Logan, Greene and Pisoni
1989). The MRT measures segmental intelligibility through a word identification task
employing a set of four-alternative forced-choice test items.
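Because each MRT item offers four alternatives, a listener who merely guesses is expected to score about 25 per cent correct. The Python sketch below shows how raw percent-correct scores on such a task might be computed and, optionally, rescaled so that chance performance maps to zero. The chance correction is a common psychometric adjustment and is an assumption here, since the text does not say whether scores were corrected; the function names and the response data are hypothetical.

```python
def percent_correct(responses, targets):
    """Percentage of items on which the response matches the target word."""
    if not targets or len(responses) != len(targets):
        raise ValueError("responses and targets must be non-empty and equal in length")
    hits = sum(r == t for r, t in zip(responses, targets))
    return 100.0 * hits / len(targets)


def chance_corrected(percent, alternatives=4):
    """Rescale a score so that guessing level maps to 0 and perfect to 100.

    Assumed adjustment: (score - chance) / (100 - chance), floored at 0.
    """
    chance = 100.0 / alternatives
    return max(0.0, 100.0 * (percent - chance) / (100.0 - chance))


# Hypothetical responses to four items (invented, for illustration only).
targets = ["rake", "lake", "seed", "fit"]
responses = ["rake", "rake", "seed", "fit"]
raw = percent_correct(responses, targets)
corrected = chance_corrected(raw)
```

With three of four items correct the raw score is 75 per cent, and under the assumed correction this corresponds to about 67 per cent above chance.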
The production experiments serve to establish the acoustic correlates of English speech
sounds spoken by Sudanese EFL learners. The tests seek insight into the phonetic and
acoustic differences between the learners’ L1 and L2 in areas such as vowel duration,
voice onset time (VOT), centre of gravity and preceding vowel effect. Importantly, before
doing the production tests, the learners received a short training. Firstly, they were asked to
read three lists of English key words including vowels, single consonants and clusters.
The aim of the key words was to guide the learners to the correct pronunciation of the
target phonemes. Secondly, the learners were instructed to pay special attention to
different types of vowels (lax vowels, tense vowels, diphthongs), to the contrast
between voiced and voiceless consonants, and to initial and coda clusters (see separate
chapters).
The third important means of data collection was written questionnaires that invited
Sudanese EFL learners and their teachers to voice their subjective opinions as to what
difficulties they experienced in correctly producing and perceiving English sounds and
sound combinations. The availability of this type of data may afford a better
understanding of the topic under investigation. It is also a
relaxed technique of data collection, which offers the subjects an opportunity to think
and write down their answers.
1.7 Chapterization
Chapter 2 includes two sections. Section one provides a contrastive analysis of the
English and Arabic phoneme inventories, describing differences and similarities
between the two systems. Section two provides a linguistic
background and reviews the contributions of relevant literature.
Chapter 5 treats the speech intelligibility problems of Sudanese EFL learners when their
speech production is presented to native listeners of English (British and Americans).
The chapter aims to establish the extent to which Sudanese EFL speech is less
intelligible to native English listeners than native English speech.
Chapter 10 presents a summary of the research and its findings. It discusses the
implications of the findings for current views on the role of native-language
interference in second-language acquisition, and makes recommendations for future
research.
Chapter Two
2.1.1 Introduction
The difficulty of learning the phonological categories of a target language has received
much discussion in second-language studies. Brière (1966) attributes the problems of
learning phonological categories to the competing phonemic categories of the L1 and
L2 systems, the allophonic features of the phonemes and the distribution of these
categories within their respective systems. Therefore, the presence or absence of these
features plays an important role in the learning of L2 speech sounds. That is, the higher
the degree of similarity between the phonological systems of the source (L1)
and the target (L2) language, the easier it is for the second or foreign language speaker
to learn the phonological categories. In this sense, the notion of the phonological
system of a language refers not only to the sounds of that language, but also to a
combination of distinctive and non-distinctive features that may cause interference.2 L1
interference affects the learning of L2 speech sounds in two ways. The learners tend to
pick up only the distinctive features and to ignore the redundant ones. They also tend to
interpret the target sounds in terms of the features of their L1 sound system. However,
in another account of interference it is argued that it is easier for second-language
2 Interference is a language phenomenon that refers to the transfer of L1 rules to the learning of
L2. In the learning theories of second language (Flege 1995), a sort of language filter occurs in
the learning process of a second/foreign language where the norms of L1 may facilitate learning
or inhibit it. In the case of similarities, L1 norms facilitate the learning of L2 through positive
transfer. However, a negative effect often takes place and this is normally associated with the
differences in L1. This negative transfer is also called interference (Miller 1981). Native speakers
can identify foreign accents that appear in the speech produced by L2 speakers. Therefore,
pronunciation errors of second-language learners do not just present random attempts to
produce unfamiliar sounds but rather reflect the sound inventory, rules of combining sounds, and
the stress and intonation patterns of their native languages (Ohata 2007).
speakers to learn an entirely new phoneme that is absent from their mother tongue than
to learn a sound that partially resembles an L1 sound. All in all, learning problems of L2
phonemes occur when a second-language speaker starts from the assumption that L2
speech sounds are the same as those of his/her L1. In this sense, the learners start by
using their L1 perceptual strategies in recognizing or producing the new language
sounds. Contrastive analysis is a branch of linguistics that seeks to identify the types of
phonological errors that EFL/ESL speakers make when perceiving and pronouncing a
second or foreign language. Moreover, contrastive analysis makes predictions with
regard to a hierarchy of difficulties, which is based on the new phonemes, new
allophones and new sequences, i.e., those aspects that stand out as the distinctive
properties of the target language (Brière 1966, Hoffer 1970). The phonetics of a
language should also be considered since it causes many of the difficulties facing the
ESL/EFL learners. Contrastive analysis in this section aims to make predictions about
the types of errors that might be a true reflection of learning problems. It attempts to
show the degree of dissimilarity between the sound inventory of English as the target
language, and of Arabic, which is the first language of Sudanese EFL learners.
Importantly, the discussion of Arabic considers both Modern Standard Arabic (MSA)
and Sudanese Colloquial Arabic (SCA); it starts with MSA and then moves on to
SCA, discussing the areas in which the two differ. In the present research it would
seem impossible to make a unique choice between the two varieties, i.e. MSA and SCA,
as the EFL learners’ native language background, and it is precisely for this reason that
I will assume that both varieties have to be considered together when accounting for
the learning problems experienced by Sudanese students of English. Several considerations
have led me to this decision.
Firstly, MSA forms the common base from which the phonemes of Arabic dialects
stem. Secondly, as part of the educated class, the learners’ everyday communication is
not totally free from MSA. The learners’ shift to MSA may arguably influence their
colloquial Arabic, serving to reduce the differences between Sudanese colloquial
and Modern Standard Arabic. Thirdly, the context of Arabic linguistics is characterised
by what is known as diglossia, i.e. a language phenomenon in which two varieties
of a language are used side by side. The two varieties at issue are MSA and the spoken
vernaculars. The vernaculars, which are used in everyday communication across the Arab
world, are characterized as more mutable and flexible forms of language than MSA.
This language reality reinforces the argument that the existence of two varieties side by
side serves to narrow the distance between these varieties. The reality of Sudanese
Arabic supports these arguments: its sounds show only limited change relative to MSA.
Its vowel inventory developed /e/ and /o/. As for consonants, /ð, z/ are merged into
/z/ and /θ, s/ into /s/, whilst /q/ is pronounced /g/. Furthermore, Sudanese Arabic
permits no diphthongs or consonant clusters at all (details in §§ 2.1.2.1, 2.1.3, 2.1.4). So,
it is possible to argue that all other vowels and consonants do not differ from MSA.
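The consonant correspondences just described can be written down as a simple mapping. The Python sketch below is a minimal illustration under the assumption that the mergers apply to every occurrence of the phoneme; the mapping table and the function name are hypothetical, and vowel changes (such as the development of /e/ and /o/ or changes in length) are deliberately not modelled.

```python
# Consonant correspondences stated in the text: MSA phoneme -> Sudanese Arabic.
SUDANESE_MERGERS = {"ð": "z", "θ": "s", "q": "g"}


def to_sudanese_consonants(phonemes):
    """Apply the consonant mergers to a list of MSA phoneme symbols.

    Phonemes not affected by a merger are passed through unchanged.
    """
    return [SUDANESE_MERGERS.get(p, p) for p in phonemes]


# A toy phoneme sequence, used only to exercise the mapping.
msa_form = ["q", "u", "l"]
sudanese_form = to_sudanese_consonants(msa_form)
```

A table of this kind makes explicit which MSA contrasts are neutralised in Sudanese Arabic, and hence which English contrasts (e.g. /θ/ versus /s/) might lack support from the learners' colloquial variety.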
According to Ryding (2005), the Arabic language context does not show a sharp
division between the written and spoken forms of Arabic varieties across the Arab
world, as might be the case in some other languages. There is a continuum of
language varieties, which runs from high to low. Thus, MSA takes the highest position,
followed by formal (a spoken standard form, see Long 1996) and colloquial varieties.
But this reality is conditioned by several factors such as the speakers’ academic
background and the use of Modern Standard Arabic as a means of communication on
CHAPTER TWO: LINGUISTIC BACKGROUND AND LITERATURE 13
television and other public media everywhere in the Arab world. Importantly, the use of
Standard Arabic and colloquial varieties side by side can eliminate dialects.
Vowels are characterized by a free passage of the air stream. It is possible to describe
and feel the movement and posture of the tongue relative to the passive surfaces of the
vocal tract; however, no closure or stricture occurs when vowels are produced.
Importantly, there is a need to make use of the auditory and articulatory means of
perception and description of vowels. In this context, the ear is all-important for this
task since speech can be seen as a matter of input and output.
Arabic is a language which has a small inventory of vowel sounds. Its vowel system is a
classical triangular system that maintains the Proto-Semitic vocalism represented as
open, close front, close back: /a, u, i/ (see Figure 2.1), each of which may be short or
long (geminated) (Kaye 1997). These three vowels are often described as diacritics,
which refer to special unwritten marking interpreted as short /a, u, i/.3 Munro (1993)
reports similar descriptions according to which Standard Arabic has three basic short
vowels /i, u, a/, of which /i/ is realized as /ɪ/, /u/ as /ʊ/ and /a/ as /æ/, but he adds
that there are five long vowels realized as /iː, eː, aː, uː, oː/. Arabic has only two
diphthongs /au, eɪ/ (Hayat 2005). However, Mitchell (2004) states that the diphthongal
feature is absent from the Arabic speech sound system. There is variation in
vowels across Arabic dialects. According to Dickens (2007) the Sudanese vowel
inventory contains five short vowels /i, u, a, e, o/ and five long vowels /iː, uː, aː, eː, oː/,
which uncontroversially form an extension of the short vowels (see also Munro 1993,
Raimy 1997). However, in Sudanese Arabic, /e/ is also realized as a reduced form of
/eɪ/, whilst /o/ is a reduced form of /aʊ/ and often realized as /u/. Moreover, in
Sudanese urban Arabic there is alternation between /i/ and /j/ on the one hand, and
between /u/ and /w/ on the other, depending on the position of /j/ or /w/ in the
syllable. Since no vowels are possible in initial position in Arabic, the alternation is
analysed as an underlying phoneme /j/ which is realized as /i/ in nucleus position but
remains a consonant /j/ in marginal position. Similarly, /w/ is realized as /u/ in the
nucleus and as a consonant /w/ in peripheral position. However, Sudanese Arabic /j/
can often be represented well by /iː/ rather than /i/, whilst /w/ can be represented by
/uː/ rather than /u/ in nucleus position. This account is clear in some Arabic words
such as /iːd/ ‘an annual occasion (festival in Arabic culture)’ versus /id/ ‘water well’
and /guːl/ (‘say’ in Sudanese Arabic) versus /qul/ (‘say’ in MSA) and so on.
3 In the Arabic script, the harakat (diacritic marks) are special marks that are usually left
unwritten (they are not part of the Arabic alphabet or ordinary spelling, but are understood
from context) and which represent the short vowel sounds /a, i, u/. The literal meaning of
harakat is ‘movements’, e.g. in the context of
moving airwaves that we produce while pronouncing vowels. Diacritic marks stand for the English
lax vowels /a, i, u/ (Chomsky and Halle 1968, Hayat 2005, Alan 1997). This characteristic affects
the ability of Arab learners of English to extract and process the English vowels, which form part
of English words. That is, the Arabic orthography of the daily newspapers does not use diacritics.
Native speakers of Arabic focus only on consonants, the structure of which encodes the roots
with general semantic value. This process cannot be applied to vowels in English (Fender 2008).
Figure 2.1. Arabic vowels as described in classical triangular Proto-Semitic (after Kaye 1997).
The figure stands for the original Arabic vowel system which forms the base of Modern Standard
Arabic and other Arabic dialects.
Importantly, as sound properties, Arabic vowels play an essential role, e.g. in syllable
and word formation: they do not bear lexical meaning as consonants do, but they act as
connectors in word structure. This means that in word structure, vowels form
constituent morphemes sprinkled through the word rather than occurring as
continuous segments. This characteristic is clear in word families such as darasa ‘he
studied’ and hamala ‘he carried’, where the a-vowels are inflectional affixes. It is worth
noting that, in these families of semantically related words, the only constant formal
property is that each stem has three consonants in a fixed order (drs and hml,
respectively). Vowel-consonant interspersion, in a way, reveals deep regularities in the
nature of Arabic vowels and how and where they work. It also reveals that the distribution
of consonants versus vowels in Arabic is determined by the CV template that
characterizes the morphological category a given word belongs to; i.e. it marks
inflectional categories such as tense on verbs and number on nominals, etc.
(Kenstowicz 1994, Frisch 1996, Nwesri, Tahaghoghi and Scholer 2006). In English, the
consonant versus vowels distribution is lexically contrastive (cf. VCC art, CVC rat and
CVCC taunt).
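The interleaving of a consonantal root with a vocalic template described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function name `interdigitate`, the plain romanized roots and the template string `CaCaCa` are introduced here for the example and are not part of the cited analyses.

```python
def interdigitate(root: str, template: str) -> str:
    """Fill each 'C' slot of a CV template with the next root consonant."""
    if template.count("C") != len(root):
        raise ValueError("template must have one C slot per root consonant")
    consonants = iter(root)
    # Every 'C' is replaced by a root consonant; all other characters
    # (the template vowels) are copied unchanged.
    return "".join(next(consonants) if slot == "C" else slot for slot in template)


# The two word families mentioned in the text share one template.
print(interdigitate("drs", "CaCaCa"))
print(interdigitate("hml", "CaCaCa"))
```

The sketch makes the point of the paragraph concrete: the three root consonants keep their fixed order, while the vowels are supplied by the template.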
On the other hand, the English vowel system is complex (see Figure 2.2 below). It
consists of nineteen (or even twenty) vowel phonemes. These include eleven (twelve if
/ə/ is accepted as a separate phoneme) pure vowels (or: monophthongs) and eight
diphthongs in stressed position which can be categorized in different terms. Vowel
production involves the position of the lips, the tongue, the parts of the tongue used
and the degree of raising. With respect to tongue position in the mouth, there are three
distinctions in RP English; i.e., front vowels /iː, ɪ, e, æ/, central vowels /ɜː, ʌ/ and back
vowels /uː, ʊ, ɔː, ɒ, ɑː/. The back vowels are rounded while the front and mid vowels are
unrounded. In terms of the degree of tongue height, /iː, ɪ, uː, ʊ/ are high vowels, /e, ɜː,
ɔː/ are mid vowels and /æ, ɑː, ʌ, ɒ/ are low vowels. The RP vowels are divided into
tense/long and lax/short vowels (force of articulation). This contrast is primarily one
of vowel quality; the difference in duration is only a secondary cue of the tense/lax
distinction. The tense vowels occur in both closed and open syllables whereas the lax
vowels may only occur in closed syllables. Tense vowels are accompanied by ‘ː’ as a
length mark, such as in /uː, ɜː/. Importantly, the distinction between English short/
long vowels depends upon three oppositions, which make the task more complex (see
Dretzke 1998). Another important feature of English vowels is that the tense/lax vowel
tokens, i.e. /uː, ʊ/, can often be distinguished by quality alone as in foot/boot, quality and
quantity as in good/food, while the quality of /ʊ/ has to be kept quite distinct from that
of a reduced form of /uː/. This feature can cause learning problems for ESL/EFL
learners whose native languages have a small vowel inventory. Additionally, English has
sequences of vowels known as diphthongs. These are /eɪ, əʊ, aɪ, aʊ, ɔɪ,
ɪə, eə, ʊə/. Diphthongs are vowel sounds that have a glide within the syllable. The first
element in English diphthongs is called the starting point and the second is the one in
which the glide is made. The diphthongs mentioned above are illustrated by words such
as laid, load, lied, loud, Lloyd, leered, laird and lured, respectively. The centring diphthongs
/ɪə/, /eə/ and /ʊə/ are a prominent characteristic of British English (Mitchell 2004). A
number of generalizations apply to RP diphthongs. Their length is equivalent to that of
long vowels and they are susceptible to regional variation (Cruttenden 2008).
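The classification of the RP monophthongs by tongue position and tongue height given above can be written down as a small lookup table. The following Python sketch is illustrative only: the dictionary and function names are introduced here, the symbols are the standard IPA values for the eleven monophthongs listed in the text, and vowel length is read off the length mark alone.

```python
# Tongue position (front/central/back) as listed in the text.
BACKNESS = {
    "front": ["iː", "ɪ", "e", "æ"],
    "central": ["ɜː", "ʌ"],
    "back": ["uː", "ʊ", "ɔː", "ɒ", "ɑː"],
}
# Degree of tongue height as listed in the text.
HEIGHT = {
    "high": ["iː", "ɪ", "uː", "ʊ"],
    "mid": ["e", "ɜː", "ɔː"],
    "low": ["æ", "ɑː", "ʌ", "ɒ"],
}

# Invert the two tables so that each vowel maps to its category.
BACKNESS_OF = {v: label for label, vowels in BACKNESS.items() for v in vowels}
HEIGHT_OF = {v: label for label, vowels in HEIGHT.items() for v in vowels}


def describe(vowel: str) -> dict:
    """Return height, backness and length for one RP monophthong."""
    return {
        "height": HEIGHT_OF[vowel],
        "backness": BACKNESS_OF[vowel],
        "long": vowel.endswith("ː"),  # the length mark signals a tense/long vowel
    }


print(describe("ɪ"))
print(describe("ɔː"))
```

A table of this kind makes it easy to see which English categories have no counterpart in a three-vowel system such as that of Arabic.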
16 TAJELDIN ALI: SPEECH INTELLIGIBILITY PROBLEMS OF SUDANESE EFL LEANERS
Figure 2.2. English pure (or: monophthongal) vowels (after Roach, Hartman and Setter 2006).
Vowels are described in relation to the tongue and lips positions. High vowels are in the top of
the chart, mid vowels in the middle and the low vowels appear in the lowest part of the chart.
The horizontal dimension captures the front (left) to back (right) distinction.
Vowel length presents an important temporal cue, which classifies vowels into short
and long tokens. In Arabic, all three vowels /a, u, i/ are subject to a short/long
distinction. Similarly, English possesses short/long contrast; however, in English,
vowel duration is influenced by the following consonant and other environmental
features (Mitchell 2004). In Arabic, short and long vowels are clearly different from
each other. Long vowels tend to be twice as long as the short ones. In this sense, there
is a possibility that short/long vowels of Arabic across dialects can correspond to
English equivalent short/long vowels (Munro 1993, Mitleb 1981). However, in Arabic
listeners/speakers may attend to more than just acoustic vowel duration to distinguish
between short/long vowels (Tsukada 2009). Arguably, Sudanese Arabic applies a
similar duration strategy to distinguish between short/long vowels.
4 For an explanation of vowel formants (resonances) see § 6.3. Here it suffices to know that the
lowest resonance frequency, F1, reflects the degree of mouth opening; the second lowest resonance
frequency, F2, is related to vowel backness and lip rounding.
be closer to Arabic /a/; however, the front vowels /ɪ~iː/ show no serious spectral
problems. In general, Arabic effects on L2 vowel production are pervasive in all vowels
(Munro 1993). Specifically, Sudanese Arabic shows differences in vowel spectral
properties (i.e., L1 and L2 formant values). That is, the Sudanese Arabic long vowels /iː,
aː, uː/ show relatively lower F1 and F2 values compared with their English counterparts
(Elobeid and Maaly 1996).
Figures 2.3-4 provide a comprehensive survey of the F1, F2 and F3 of Arabic and
English vowels. In general, the formants of Sudanese vowels tend to have lower values
compared with their English counterparts (cf. Figures 2.3 and 2.4). However, all
Sudanese Arabic vowels show formant directions relatively similar to those of English.
This information suggests that the durations of the Sudanese Arabic vowels may
have some kind of correspondence to English duration rates. It also implies that the
Sudanese EFL learners may not have problems producing English durations of short
and long vowels.
Figure 2.3. F1, F2 and F3 values (plotted vertically, in Hz) of Sudanese Arabic vowels (after
Alghamdi 1998).
Figure 2.4. F1, F2 and F3 values (plotted vertically, in Hz) of English vowels (after Deterding
1997).
This section provides linguistic information about the similarities and differences that
exist between English and Arabic language sound systems. The section will attempt to
survey the types of learning errors which may occur due to phonetic and phonological
differences between English and the learners’ L1 (Arabic) using the data of the related
studies.
Table 2.1 below provides some patterns of phonemes which exist in the English vowel
inventory but which may or may not exist in the Arabic inventory. This information is
useful in making predictions of the learning problems which Sudanese learners of
English are assumed to face.
Table 2.1 Some predictions of learning problems of English vowels. It provides accounts for the
sort of errors assumed to be made by Sudanese EFL learners.
The first language of the subjects is Arabic, a language with at least 28 consonantal
sounds. These are the obstruents /b, d, t, k, f, z, s, m, n, ð, θ, dʒ, ʃ/, approximants /w, l,
j/, trill /r/ and the back consonants: glottal /ʔ, h/, uvular /ɣ, x, q/ and pharyngeal /ħ, ʕ/,
plus the emphatic stops and fricatives /tˤ, dˤ, ðˤ, sˤ/ (Huthaily 2003, Kaye 1997, Laufer
1988). English, the target language, has 24 consonants /b, p, d, t, g, k, dʒ, tʃ, v, f, ð, θ, z, s,
ʒ, ʃ, m, n, ŋ, l, w, j, h/ and an approximant /r/. In principle, some kinds of similarities
exist between English and Arabic consonants, in a wide range that includes obstruents,
nasals and approximants (see Suhana 2001). However, some consonants have specific
characteristics that mark them as unique due to categorical phonemic differences.
Arabic has a considerable presence of plosives at different places of articulation. It has
both voiced /b, d/ and voiceless stops /t, k/. However, unlike English, Arabic lacks
/p/ and /g/. In some Arabic dialects such as Iraqi and
Lebanese there is a voiceless /p/ probably due to the influence of Persian (Kaye 1997).
Along the same line, the phonemic system of Sudanese Arabic (SA) (see Figure 2.5
below) has /g/ instead of the uvular /q/. In fact, /g/ is often used by Bedouins in the
place of /q/, which suggests that the latter is the original phoneme (Karouri 1996).
Arabic has a large number of fricative sounds, including four pairs that show a voicing
contrast and three voiceless fricatives with no voiced counterpart. In terms of
articulation, the fricative pair /ð, θ/ are dental in English whilst in Arabic they are
rather inter-dental sounds. These fricatives are absent in many of the languages of the
world, which makes them a major source of mispronunciation for ESL/EFL
learners. Such a case applies to the consonant inventory of Sudanese colloquial Arabic:
i.e. the inter-dental fricatives do not exist in the Sudanese IPA phonetic chart. They
have merged with the apico-dental fricatives /s/ and /z/.
The Arabic voiced palatal approximant /j/ and voiced labial-velar approximant /w/ are
found in many languages of the world (often called semi-vowels – see Arabic vowels
above). More importantly, /w/ has two interpretations in Sudanese Arabic. In the
phonetic literature, it is formally described as a labial-velar. This means that the
description of /w/ as a bilabial glide refers to the phonetic realization that labiality is the
primary articulation and that velarity is a concomitant secondary feature which forms a
natural corollary of this labiality; other linguists, however, claim the opposite. According
to Dickens (2007), the phonemic system of Sudanese Arabic makes this
phenomenon more reasonable. That is, Sudanese Arabic has /w/ articulated both as
a post-dorso-velar, in terms of standard articulation, and as a palatal-velar, in terms of
the functionalist analysis.
Sudanese Arabic also contains /x/, a sound produced in the same place in the mouth as
English /g/ but with a fricative sound. It is usually transliterated as <kh> and
corresponds to the final sound in Scottish loch ‘lake’ or German lach ‘laugh’. One more
distinctive Arabic consonant is /q/. It is called a uvular because the tongue touches the
uvula. Another consonant is pronounced even farther back, the tongue touching the
back wall of the throat (pharynx) just enough to produce a hissing sound like //. This
consonant forms one of the most distinctive Arabic sounds. The glottal Arabic sound
/ʔ/ is classified as a stop consonant. However, speakers of Arabic as a second or foreign
language often consider it a vowel sound. In this sense, there is an important
consideration that no syllable or word in Arabic starts with a vowel. Therefore, if an
Arabic word is heard to begin with a vowel, it actually begins with a glottal stop /ʔ/
(hamza). Non-native speakers tend to classify many Arabic words such as umma /umma/
‘nation’ and usbuu /usbuː/ ‘a week’, as initiated with a vowel. It is probably because they
are not familiar with such a type of phoneme that forms a real stop in Arabic (see
Ryding 2005).
The voiceless affricate /tʃ/ is absent from the Arabic consonant inventory; the only
post-alveolar affricate that exists in Arabic is the voiced /dʒ/. Some Arabic dialects,
e.g. Iraqi and Lebanese, do have /tʃ/, but Sudanese Arabic does not have
it. Development of /tʃ/ in some Arabic dialects is likely due to the influence of Persian,
which possesses this sound. On the other hand, the English consonant
inventory includes both the post-alveolar voiced and voiceless affricates /dʒ, tʃ/.
In terms of distribution, emphatics are only a feature of a CV syllable (but not VC),
which represents its minimum span, e.g., as in /tˤɪb/ ‘medicine’, /dˤɪd/ ‘against’, etc. It
is also possible to find more than one emphatic in words containing two or more
syllables.
Arabic (in the area around Khartoum) tends to be just less pharyngealized, the Buttana
dialect (a large region in Eastern Sudan) lost emphatic /sˤ/ almost completely. However,
in words such as /raːs/ ‘a head’, /rʊsuːm/ ‘fees’ native speakers of Sudanese colloquial
Arabic change /s/ to /sˤ/. This new use of emphatic sounds is common among
Baggara Arabs in Western Sudan (Shuwa Arabs).
Phonetic influence of emphatics: One important issue to be discussed in this context is the
phonetic effect of emphatics. Traditional analysis of Arabic provides little acoustical
information on emphatics. However, it accounted for the contrast between emphatic
and plain forms in terms of the orthographical system of Arabic. According to this
analysis, the only evidence of contrast between plain and emphatic consonants is that
only forms with emphatic graphs in the spelling of Arabic can be considered emphatics.
For example, plain and emphatic forms /t, tˤ/, as in /tɪːn/ ‘figs’ and /tˤɪːn/ ‘clay’, and /d,
dˤ/ as in /raːkɪd/ ‘still, e.g. still water’, and /raːkɪdˤ/ ‘runner, jogger’ respectively,
constitute evidence of contrast. On the other hand, consonants like /n, k, m/, etc. are not
considered emphatics since they do not have emphatic counterparts in Arabic
orthography (Lehn 1963). In synchronic descriptions, however, one generally finds no claims
to the effect that the emphatic consonants per se are acoustically different from their
plain counterparts. The emphatic~plain contrast is apparent only from the effect of the
contrast on the adjacent vowels. Vowels following emphatics show a raised vowel
formant F1 and a lowered F2 in comparison to their counterparts in a plain
environment (Obrecht 1968, Ahmed 1984, Newman and Verhoeven 2002, Watson 2002). In a
recent study, Jongman, Herd and Al-Masri (2007) reported that the effect of emphatics
on following vowels is clear in all Arabic dialects, and that the F2 lowering in short
vowels appears to be stronger than in long vowels. Moreover, compared to high vowels
such as /i, u/ the F2 lowering effect is stronger for low vowels, such that it results in a
different vowel quality for /a/: in an emphatic environment, the following vowel /a/ is
heard as /o/. 5
Arguably, Sudanese Arabic emphatics will affect the learning of English vowels. The
learners may transfer the emphatic feature when learning English /t/ in words such as talk,
taught, (lap)top, tall, tough, etc., pronouncing it as emphatic [tˤ]. The reason for this is that
in the learners’ L1, most CVC syllables with back vowels /o, aː, oː, u/ begin with /tˤ/.
On the strength of this assumption, I expect that word categories such as the above will
have different F2 and F1 and thereby different vowel qualities than similar words not
beginning with /t/. I will highlight some more phonological contrasts in the next
section.
5 Although the literature mentions effects of the plain~emphatic contrast on following vowels
only, it seems to me that the effect is more or less symmetrical and should also affect the F1 and
F2 of preceding vowels. It is precisely for this reason that the contrast can also be perceived in
pre-pausal position, i.e., when no vowel follows the consonant.
Figures 2.5-6 provide a description of the articulatory systems of English and Arabic.
They provide information on the distribution, place and manner of articulation of
speech sounds in each language.
Figure 2.5 presents the consonants of Sudanese Arabic (SA). It provides background
about the number and characteristics of sounds and how they differ from Standard
Arabic, whilst Figure 2.6 illustrates the English consonants in terms of number and
distribution. These charts allow the readers to compare the phonetic and phonological
distribution of the speech sounds in English and Sudanese Arabic.
[Figure 2.5: consonant chart. Place of articulation (columns): labial, labio-dental, apico-dental, apico-alveolar, dorso-pre-palatal, post-dorso-velar, post-dorso-post-velar, pharyngeal, glottal. Manner of articulation (rows): stop, fricative, nasal, liquid, glide.]
Figure 2.5. Phonetic representation of the Sudanese Arab phonemic system (after Dickens 2007).
Place of articulation: Bilabial, Labio-dental, Dental, Alveolar, Alveo-palatal, Palatal, Velar, Glottal
Manner of articulation:
Stop: p b, t d, k g, ʔ
Fricative: f v, θ ð, s z, ʃ ʒ, h
Affricate: tʃ dʒ
Nasal: m, n, ŋ
Lateral approximant: l
Retroflex approximant: r
Glide: w, j
Figure 2.6 English consonant sounds (after Roach, Hartman and Setter 2006).
This section provides linguistic information about the similarities and differences that
exist between the English and Arabic sound systems. Phonemes that exist in the
English consonant inventory may or may not exist in the Arabic inventory. This
information enables the researcher to make predictions of learning problems.
Table 2.2. Some predictions about the learning problems of English consonants. It provides
accounts for the sort of errors assumed to be made by Sudanese EFL learners.
English and Arabic have different rules of syllable and word construction. English
syllable structure is flexible. It permits a wide range of syllable patterns such as CV,
CVC, VC, CVCC, CCV, CCVC, CCVCC, and so on. The syllable structure of Modern
Standard Arabic is considerably more restricted. It allows: (i) a light or open syllable
which includes CV and CVV, in words such as /maː/ ‘not’, /fiː/ ‘in’ and (ii) a closed
syllable CVC as in /mɪn/ ‘from’ and (iii) super-heavy syllables which include the
following types: CVC1C1 and CVVC1C1 (with geminate coda clusters), CVC1C2 and
CVVC1C2 (with non-geminate coda clusters) as well as CVCCV and CVVCV (Mitchell
2004). Importantly, the CV and CVC syllable patterns frequently occur in Arabic
prepositions, whilst the super-heavy types prevail in nouns, verbs and derivations from
these lexical categories. The Arabic CVCC syllable type is only allowed after a pause
(which is orthographically indicated by a diacritic mark called ‘sukun’). Sudanese Arabic
and MSA have similar syllable patterns. However, the final consonant cluster in the
CVCC type is frequently split in Sudanese Arabic by vowel epenthesis.
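The difference between the two syllable inventories can be made explicit by reducing a transcribed syllable to its CV skeleton and checking it against a list of permitted types. The Python sketch below is a simplified illustration: the names `cv_skeleton`, `allowed_in_msa` and `MSA_SYLLABLES` are introduced here, a long vowel is counted as VV, and the MSA set contains only the CV, CVV, CVC and cluster-final types named above (geminate and non-geminate codas both reduce to CC).

```python
VOWEL_SYMBOLS = set("aeiouɪʊæɒɔɑʌɜə")

# Syllable types named in the text for Modern Standard Arabic (assumed set).
MSA_SYLLABLES = {"CV", "CVV", "CVC", "CVCC", "CVVCC"}


def cv_skeleton(segments: list) -> str:
    """Reduce a list of phoneme strings to a CV skeleton (long vowel = VV)."""
    skeleton = []
    for seg in segments:
        if seg[0] in VOWEL_SYMBOLS:
            skeleton.append("VV" if seg.endswith("ː") else "V")
        else:
            skeleton.append("C")  # affricates such as 'tʃ' count as one consonant
    return "".join(skeleton)


def allowed_in_msa(segments: list) -> bool:
    """True if the syllable's skeleton is one of the assumed MSA types."""
    return cv_skeleton(segments) in MSA_SYLLABLES


print(cv_skeleton(["m", "aː"]), allowed_in_msa(["m", "aː"]))
print(cv_skeleton(["m", "ɪ", "n"]), allowed_in_msa(["m", "ɪ", "n"]))
print(cv_skeleton(["s", "p", "l", "ɪ", "t", "s"]), allowed_in_msa(["s", "p", "l", "ɪ", "t", "s"]))
```

An English syllable whose skeleton falls outside the set is a candidate for the repair strategies discussed in the following paragraphs.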
The VC syllable type is a common feature of English word structure where many words
start with a vowel such as in echo, inch, ebb. As explained above, no syllable/word in
Arabic begins with a vowel. Also, English permits consonant clusters in the onset of
syllables while Arabic does not (see details next section). Clearly, then, the constraints
on the syllable structure differ substantially between English and Arabic. The study of
these differences may provide information that is needed to understand the problems
that Sudanese EFL learners have with English consonant clusters.
Consonant clusters are a feature of many of the languages of the world. In the 486
language sample in the World Atlas of Language Structures no fewer than 425 (87%) have
clusters (Comrie, Dryer, Haspelmath and Gil 2005: feature/map 12). McLeod, Doorn
and Reed (2001) and Ramsaran (1999) state that in their sample of 104 languages that
have clusters, 39 percent have word-initial clusters, only 13 percent have final clusters,
while the remaining 48 percent have both. In English, only one third of the
monosyllabic words begin with consonant clusters, whereas the predominance of
clusters is found in word-final position. This dominance is explained by the phonemes
/s, z, t, d/ that can be appended in suffixes. When such morphophonemes are discarded,
the incidence of consonant clusters declines to only 19%. Consonant clusters are
sequences of two or three consonants that come together in a word, without being
separated by a vowel. In English, the groups /spl/ and /ts/ are consonant clusters in
the word splits. Some linguists argue that the term can properly be applied only to those
consonant clusters that occur within one syllable. Others contend that consonant
clusters are more usefully defined when they may occur across syllable boundaries. The
longest consonant clusters in the word extra, given the conservative definition, would
be /kst/ and /str/, while the latter, more liberal view allows /kstr/. In English, the
longest possible initial cluster is CCC, as in split, whilst the longest possible final cluster
is CCCC, as in twelfths, but in practice the probability of finding final clusters longer
than three is extremely small (Ramsaran 1999).
As explained above, Modern Arabic dialects have simple syllable shapes such as CV,
CVC and CVV. The occurrence of syllables with clusters such as CVCC is largely
restricted to Modern Standard Arabic. On the dialectal level, consonant clusters are rare
in both initial and coda positions. Therefore, many Arabic dialects apply syllable repair
strategies, which are largely controlled by the sonority properties of the individual
consonants. For instance, Sudanese Arabic (SA) adopts a strategy by which a CVCC
cluster is broken up by the insertion of /i/ or /a/. Thus, /himl/ ‘a load’ becomes
/himil/ and /kalb/ ‘a dog’ becomes /kalib/. This syllabification process is very common
in both geminate and non-geminate clusters (Broselow 1992, Raimy 1997).
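The cluster-breaking strategy just described can be sketched as a simple rewrite rule. The Python function below is an illustration only: the name `repair_final_cluster` is introduced here, the words are given in plain romanization (himl, kalb), the vowel set is limited to a, i and u, and the choice between epenthetic /i/ and /a/ is left to the caller because the text does not state the conditioning.

```python
VOWELS = set("aiu")


def repair_final_cluster(word: str, epenthetic: str = "i") -> str:
    """Break up a word-final CC cluster by inserting a vowel between the two consonants."""
    ends_in_cluster = (
        len(word) >= 2
        and word[-1] not in VOWELS
        and word[-2] not in VOWELS
    )
    if ends_in_cluster:
        # CVCC -> CVCVC: the epenthetic vowel goes before the last consonant.
        return word[:-1] + epenthetic + word[-1]
    return word  # no final cluster, nothing to repair


print(repair_final_cluster("himl"))
print(repair_final_cluster("kalb"))
print(repair_final_cluster("min"))
```

Applied to an English word ending in a cluster, the same rule would predict the kind of intrusive vowel that is often reported in the speech of Arabic-speaking learners.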
Another repair strategy requires some syllables to begin with a vowel. When the
passive-marking prefix /n/ is added to the (active) verb katl, ‘he killed’, resyllabification
and vowel epenthesis are required, as in in-katal ‘he was killed’. This means that SA
obeys syllable constraints that require repair strategies when an underlying form cannot
be syllabified to obtain sufficient ‘syllabic harmony’ (Kenstowicz 1994, Raimy 1997).
Finally, it is possible to argue that such vowel epenthesis strategies can affect the
pronunciation of English consonant clusters by Sudanese EFL learners.
The structure of consonant clusters is often highly complicated. It seems safe to say
that there are no two languages in the world with the same inventory of clusters.
English clusters are tightly related to the syllabic system of the words where the syllable
is always composed of a vowel sound plus 0, 1, 2, or 3 onset and/or coda consonants
that form the consonant clusters. Moreover, the English clusters are not formed in an
arbitrary way, although there is not a clear rule for their formation. Be that as it may,
some researchers have provided rules depending on their experience and empirical
work. For example, some clusters are sequenced as (i) /s/ + /p, t, k, f, m, n, w, l, j/ (in
this case /s/ is pre-initial) or (ii) pre-initial plus initial plus post-initial, e.g. /spr, skr/.
In the phonological sequential constraints of English, a word can start with only certain
segments. For example, if a word begins with three consonants, then the sequential
constraint must be /s + {pj, pl, pr, tj, tr, kj, kr, kw}/; any other word-initial combination
of three consonants is unacceptable even if /s/ precedes a perfectly legal two-member
cluster, e.g. /*stw, *skl, *sfj/ (Hyman 1975). 6
6 /j/ in an English onset cluster may only occur before /uː/ as in spurious, stew, skewer, furious.
/*skl/ occurs in sclerosis but is ruled out since this is a (Greek) loan word.
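The sequential constraint on three-member onsets can be stated as a membership test. The following Python sketch is illustrative: the names `LEGAL_AFTER_S` and `legal_ccc_onset` are introduced here, and the set contains exactly the eight two-consonant sequences that the constraint allows after /s/.

```python
# Two-consonant sequences that may follow word-initial /s/ in a CCC onset.
LEGAL_AFTER_S = {"pj", "pl", "pr", "tj", "tr", "kj", "kr", "kw"}


def legal_ccc_onset(onset: tuple) -> bool:
    """Check a three-consonant word-initial cluster against the constraint."""
    if len(onset) != 3 or onset[0] != "s":
        return False
    return onset[1] + onset[2] in LEGAL_AFTER_S


print(legal_ccc_onset(("s", "p", "l")))  # as in split
print(legal_ccc_onset(("s", "t", "w")))  # one of the starred clusters
```

The starred examples fail the test even though /tw/, /kl/ and /fj/ are legal two-member onsets on their own, which is the point the constraint makes.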
This section provides linguistic information about similarities and differences that exist
between consonant sounds in English and Arabic. It provides patterns of the clusters
which exist in English but which may or may not exist in Arabic. This information
helps to predict learning problems.
All English initial and coda clusters may represent learning problems for Sudanese EFL
learners as these types of sequenced consonants are absent from the Arabic inventory
(see literature – next section). Some of the more obvious learning problems involving
consonant clusters are exemplified in Table 2.3.
Table 2.3 Predictions of learning problems of English consonant clusters. It provides accounts
for the sort of errors assumed to be made by Sudanese EFL learners.
This section applies the markedness principle to other phonological aspects (e.g. in L1
and L2) to locate the contrast properties. In the syllable structure of a language, the
purpose is to seek how a speaker who comes from an L1 with e.g. a syllable structure
that only permits CV, will have difficulty adapting to an L2 syllable structure with more
complex syllable types such as CCV, CVC, CCCVC, and so on. Tables 2.4, 2.5 and 2.6
show examples of phonological features that are marked in English, Arabic, or in both.
Table 2.4. Linguistic facts on which the proposed markedness hypothesis bases its predictions. This
table shows the degree of marked and unmarked vowels in English and Arabic. Dialectal
variation is considered when some features exist in formal Arabic but not across dialects.
1. Vowels
Description | Languages | Frequency
A language maintains a complex vowel system | English | Most frequent
A language maintains a short/long vowel contrast | English and Arabic | Frequent in both languages
A language maintains more diphthongal categories | English (Arabic has only two diphthongs /eɪ, aʊ/. In Sudanese these diphthongs are rendered as /e, o/). | Frequent in English. Rare in Arabic
Table 2.5. Linguistic facts on which the proposed markedness hypothesis bases its predictions. This
table shows the degree of marked and unmarked consonants in English and Arabic. Dialectal
variation is considered when some features exist in formal Arabic but not across dialects.
2. Consonants
Description | Languages | Frequency
A language maintains aspiration | English: voiceless stops /p, t, k/ are aspirated when they are syllable initial, in words such as pot, cat, car but unaspirated after /s/ in words like spew, stew, skip. Arabic: /t, k/ are aspirated when they appear in the beginning of a stressed syllable and are released in word final position. | Frequent in English
A language with more fricative sounds | English and Arabic | Frequent in both languages
A language maintains voicing in initial, medial and coda positions | English and Arabic | Frequent in both languages
A language maintains an allophonic feature | English: aspirated [pʰ, tʰ, kʰ] allophones of /p, t, k/ |
Table 2.6. Linguistic facts on which the proposed markedness hypothesis bases its predictions.
This table shows the degree of marked and unmarked cluster consonants in English and Arabic.
Dialectal variation is considered when some features exist in formal Arabic but not across
dialects.
3. Consonant clusters
Description | Languages | Frequency
A language maintains CV and CVC, CVCC syllable structure | English and Arabic | Frequent in both languages. CVCC frequent in Standard Arabic but not in Sudanese dialects.
A language maintains a complex syllable structure CVCC, CCVC, CCCVC, or VC | English | Frequent in English only
2.1.6 Conclusion
This part provides a short outline, which is suitable as a bird’s eye view of this section.
It reviews literature about the impediments to speech intelligibility, a problem that is
argued to be experienced by Sudanese university students specializing in English. It
describes the contribution of previous studies to this topic accounting for the effect of
both the learner’s L1 transfer and the lack of phonological awareness of English in the
occurrence of intelligibility problems. The section also talks about the methods and
tests they used discussing their adequacy. There is much more concern with segmental
analysis of the vowels, single and cluster consonants of English, which form the basic
sound knowledge of speech. Therefore, all related literature that deals with speech
intelligibility, speech perception, pronunciation problems of vowels, single and cluster
consonants of English, will be the subject of the survey. Moreover, previous research
on speech problems of Arabic-speaking students of English forms a primary source of
information to account for the perception and production problems in English among
Sudanese learners. However, other ESL/EFL literature is also useful as a second source,
which views the topic in a broader sense, accounting for the speech problems which are
faced by non-native speakers of English from different linguistic backgrounds.
Speech refers to the expressive utterances used by human beings to communicate their
ideas. Speech is transmitted by a set of sounds that are produced by varying the
strictures along the path of the air flowing from the lungs. Such sounds formed by the
human voice have an effective role and permit it to bear a message by variations in
timbre (Lafon 1966). There is a difference between speech and language. As voice,
speech has characteristics that may imply certain messages. For example, it is possible
to identify a person by his/her voice as either sharp, low or loud, etc., whilst other
elements, such as the sequences of phonemes that are used to differentiate between the
words in the lexicon, are qualities of language. Thus, a language is a system of
conventional signals used for communication in a society. Such a pattern of conventions
consists of distinctive sound units such as phonemes, a vocabulary system and the
association of meaning with words. When a person performs a speech task in
interactions, all these linguistic elements are involved in the achievement of verbal
communication.
Therefore, segments are subject to more phonetic and acoustic features, e.g., the
duration of the vowel segments in words such as mitt/meat can differ from that of bid/
bead for a number of reasons. Thus, as major sound units of spoken words, segments
have some articulatory features in common with each other but undergo variations due
to environmental differences, e.g., the preceding and the following vowels (Lass 1996,
Laver 2002). Therefore, the relationship between segments and phonemes can be
interpreted as a matter of realization where segments are most commonly represented
in different phonemic environments. Several adjacent sounds in connected speech may
carry information on the same phoneme, and there is an overlapping in so far as one
and the same sound segment carries information on several adjacent segments (Fant
1973, Gilbers 1992).
2.2.2 Accent
Furthermore, there are two types of accent. First, a foreign accent refers to speech
produced by non-native speakers of a language in which these speakers involve their L1
perceptual and productive phonemic strategies in the learning of L2. For example, if a
person has difficulty pronouncing some of the sounds of a second language he is
learning, he may substitute similar sounds that occur in his L1. The speech such
a learner produces sounds wrong or ‘foreign’ to native speakers of the target language.
The other kind of accent is simply the way a group of people speak their native
language. This is determined by who they are, where they live and to what social groups
they belong. People who live in close contact grow to share an accent, which will differ
from the way other groups in other places speak. For example, someone who lives in
the United States will have a native accent that is different from that of a British
English speaker.
The origin of the term RP (‘received’ pronunciation) has been subject to controversy,
but A.J. Ellis’ On Early English Pronunciation and John Walker’s Critical Pronouncing Dictionary
and Expositor of the English Language are among the sources that contributed to its appearance.
However, received means ‘generally accepted by the best society’. Received
Pronunciation is used by the educated class, in formal affairs, and it used to be the
language variety deemed suited for radio and television. In this way, RP is not a regional
accent but is recognizable as being the standard or neutral accent (Cruttenden 2008,
Roach 2004). It is a form of English now used in Britain which dominates the areas
around London and the two historic towns of Oxford and Cambridge. In the past, RP was
the English pronunciation best represented in the BBC, courts, films, theatre,
television programmes, etc. Bernard Shaw’s plays Arms and the Man and Pygmalion
represent a real reflection of the RP accent, forming in this way a linguistic reference
for language scholars who seek evidence for the inextricable link between accents and
social class. However, the RP accent is now known and accepted on radio and
television. It is also described in books on phonetics and is taught to L2 learners of
English.
2.2.2.2 Feasibility of RP
Received Pronunciation (RP) has recently become the target of a great deal of criticism
as elitist and limited to certain speech communities, but the reality of language use
contradicts such criticisms. As some studies show, RP is placed higher on a scale
of perceived attractiveness than other varieties, as an accent that is widely
preferred and commonly used by the speech community in formal situations. Previous
studies describe RP sounds as more mutually intelligible. In a related activity in which
participants listened to accent samples in order to judge which one they preferred as
a suitable model, responses to both the RP and South East accents were 100% positive,
while other accents such as Devon, Belfast, Shield and Pontypridd were rejected. The
latter strain the listeners, sound elliptic, cause comprehension problems and
form a gross deviation from the standard sounds encountered in normal listening. RP,
on the other hand, is an accent that forms an understandable and usable model for
non-native speakers in everyday communication (Ramsaran 1999, Trudgill and Hannah
2005). The use of RP enables speakers to overcome comprehension difficulties that
they may otherwise encounter when involved in speech in which regional accents form
a language reality; this fact indicates that, linguistically, RP is a genuinely regionless
model that is known and easily understood all over England and elsewhere (Collins and
Mees 1981). In the communities where English is used as a second language, such as
India, Nigeria and so forth, the RP accent is no longer the major model. Such
communities have developed their own local English accents, which fit non-native
environments. Although this idea sounds practical, it is often neither safe nor fair,
simply because the use of RP can at least be desirable in establishing certain minimum
standards for the achievement of mutual intelligibility. Its phonemic system is capable
of conveying a message efficiently from a native English listener’s standpoint, given
that the listener has time to ‘tune in’ to the speaker’s pronunciation in a given context.
It retains the accentual characteristics of English while it is possible to reduce the
segmental inventory of English and retain a good level of intelligibility.
For instance, a speaker can reduce the vowel system to a central pair /Ö, /. This
change makes it difficult to understand the message. Distribution of post-vocalic /r/ in
words such as farm, heard, bird, etc., is observed in some English dialects; however, RP
does not permit this phenomenon. RP includes accentual features that represent crucial
elements to natural forms of English and exhibit a considerable homogeneity in their
consonant systems. Thus, in turn it prevents any further simplifications keeping these
features shared with the natural system (Cruttenden 2008). The vowel system (see
Figure 2.1 below) has no regional variation; it has a variation of other types, though. In
particular, there is a variation between conservative and advanced RP. This largely
reflects the linguistic change that has occurred in RP with advanced pronunciations
typical of younger speakers. For instance, the RP vowel system no longer shows a
distinction between /ɔə/ as in sore and /ɔː/ as in saw, etc. There is also a widespread
loss of /ʊə/ and its merger with /ɔː/. Thus, some words such as sure are pronounced as
/ʃɔː/ like shore. In the majority of accents now the phoneme /uː/ is commonly used in
words like suit, resume and enthusiasm, etc. In RP (and in Popular London) both /uː/ and
/juː/ are heard in words like hue, due, Tuesday, etc. However, the tendency to omit /j/ is
stronger among younger speakers (Cruttenden 2008). The phoneme /uː/ is retained in
words like Susan and super (Trudgill and Hannah 2002). No changes to monophthongs
are classed as almost complete, but the loss of schwa in the diphthong /eə/ results in
the monophthong /eː/ in words like share, pear, though some older speakers use the
diphthongal pronunciation (Cruttenden 2008). In RP, words like pip and peep differ in
vowel length. If you say the two words, you will probably find that tense peep is
longer than lax pip. The long (tense) vowels are indicated by ‘ː’, so the long counterpart
of RP /ɪ/ is /iː/ and so on, see Figure 2.1 above.
It is well known that speakers substitute speech sounds from their L1 for those of their
L2 in the attempt to communicate, which results in producing accented speech.
Normally this occurs due to the absence of one or more sound features from the
speaker’s L1. It also occurs when L2 knowledge is lacking, which makes speakers resort
to a repair strategy to adapt to new features. Arabic speakers of English often make
production errors that indicate an Arabic accent. For example, they apply vowel
epenthesis before (‘prothesis’) or inside (‘anaptyxis’) English consonant clusters as a
repair strategy. Thus, words like special and speak are pronounced as /ɪspeʃl/ and /ɪspiːk/ by
Arabic learners of English (Patil 2006). This is quite similar to what has been reported
for Spanish learners of English (Lado 1957, Hyman 1975) as well as Brazilian learners
of English (Bond 2001). Examples of anaptyxis are found in fly and drain, which are
pronounced as /fɪlaɪ/ and /dɪreɪn/ by Arab learners of English, e.g., Sudanese and
Egyptians. Moreover, Korean speakers of English insert a vowel more often in stop+C
clusters rather than in strident+C or sonorant+C clusters. Interestingly, it has been
found that stop+C clusters reveal an asymmetry between voiced and voiceless stops.
For instance, Korean speakers insert a vowel more frequently in voiced stop+C clusters
than in voiceless ones. The same pattern was observed in Mandarin Chinese and
Cantonese speakers’ production of English consonant clusters where various clusters
they produced were illegal in English (Kwon 2005).
Errors like these help make predictions about how speakers/listeners of one language
will reproduce speech sounds of another language. They also reflect the psychological
reality of phonological descriptions. In some cases, native listeners may find difficulty in
understanding the English spoken in a different manner from their own (foreign
accent). Accent modification training would help to eliminate the problem, or at least to
develop a new accent that would improve communication ability. Not all sound
substitutions and omissions are speech errors. Instead, they may be related to a feature
of a dialect or accent. For example, speakers of African American Vernacular English
(AAVE) may use /d/ for /ð/, e.g. /dɪs/ for /ðɪs/ this. This is not a speech sound
disorder, but rather one of the phonological features of AAVE.
Linguistically, speech perception and production mutually support each other: i.e., the
occurrence of the first relates to occurrence of the second (Gilbert 1995). Perception
represents the power supply of speech production. Kuhl (1994) defines speech
perception as a process that involves the employment of cognitive, motor and sensory
skills to hear and understand speech. Kuhl explains that a child perceives speech by
forming mental conceptual maps of the speech it hears in its environment. Such
conceptual maps are stored in the brain; they later constitute the specifics of speech
perception and serve as blueprint guidelines that a child uses to produce speech.
Therefore, the process of speech perception is not immediate, but an output of long-
term operations that accumulate over time. It is a series of organized events that
involve the establishment and storage of information over time. During subsequent
stages, information is developed, transformed, reduced, elaborated, stored, recovered
and used to make different types of decisions. Speech production, on the other hand, is
a process that requires the brain to transmit a message to the speech organs, and these
in turn produce the patterns of speech sounds on demand (Cruttenden 2008, Crystal
Researchers regard the relationship between the perception and production of speech
sounds as an important issue because it brings them to an understanding of the
mental processes involved in the learning of L2 speech sounds. It also provides them
with insight into the types and the nature of speech perception and production
problems that ESL/EFL speakers face.
Fraser (2005) claims that most of the impediments to speech intelligibility are
attributable to segmental factors and that more than 50% of speech intelligibility is
accounted for on the basis of sound (rather than morphological or syntactic deviations).
Similarly, Jenkins (2000) stated that while the syntactic level plays a salient role in
comprehensibility in EFL interactions, pronunciation forms the most prominent single
element of intelligible speech.7 Furthermore, the measurement of speech intelligibility
based on linguistic elements necessitates the use of native speakers of the target
language as a standard reference/model.
7 Earlier, Van Heuven (1986) reasoned that faulty syntax and morphology can only compromise a
speaker’s intelligibility if words can be recognized. After all, if the words are pronounced so
poorly that they cannot be recognized, it will not be possible to establish any order (i.e. syntax)
among them.
Speech intelligibility tests have a long history. With regard to the data obtained using
such tests, only a few tests have proved to be effective. The Modified Rhyme Test (MRT)
is one of the worldwide-standardized measurements of segmental intelligibility of
speech. The MRT forms an extension of two earlier attempts. These are the PB and RT
tests. The PB test refers to the Phonetically Balanced word lists that were compiled at
Harvard University during the Second World War. The lists were composed of
monosyllabic quartets, which were chosen in a way that gives an approximation of the
relative frequency of phoneme occurrence in the language. Each PB list consisted of 50
monosyllabic words, which is enough to adequately approximate the relative frequency
of phoneme occurrence in English. One of the features of the PB list is that the relative
difficulty of the stimuli is constrained so that the stimuli that are always missed or
always correct are removed, leaving only those items that provided useful information.
The PB test was developed to compare phonetic discrimination and for overall
recognition accuracy. The test material targets both vowels and consonants. 8 The PB
word list provided a considerable contribution to speech intelligibility research, but
further requirements were needed to make the test more adequate and economic. These
requirements gave birth to the Rhyme Test. The RT presents the stimulus in stem form,
e.g., [-ot, -ag], etc., and the listener is required to complete or provide the missing letter,
while s/he is listening to the items spoken. However, the RT has some drawbacks,
since it focuses only on initial consonants, while non-initial ones like /ŋ, ʒ/ are excluded.
8 Phonetically Balanced word lists (PB lists) have been used widely since the Second World War
in statistical intelligibility testing. The words in each list are presented in a new, random order
each time the list is used, and each item is spoken in the same carrier phrase. The PB intelligibility
test requires more training than other statistical tests, and is particularly sensitive to variation in
signal-to-noise ratio. In other words, a relatively small change in S/N causes a large change in the
intelligibility score. Moreover, phonemically, PB presents a balanced tool for the measurement of
speech intelligibility. It has stimuli lists which are composed of monosyllabic CVC words that
have been selected in such a way that the lists reflect the statistical distribution of the phonemes
in that dialect. Because of the limited size of typical PB word lists, repetition of the list is very
likely to lead to the listener learning the list. This problem can be overcome by only presenting
the list once, or by training the subjects first so that the effects of learning have leveled out
before the actual tests. Once the list is learned, the PB word list is equivalent to a limited
response set, i.e. effectively a multiple-choice test. (Hudgins et al. 1947).
Such weaknesses of the RT have given rise to the Modified Rhyme Test (MRT). The
MRT forms the most accurate and reliable measure of intelligibility (Logan, Greene and
Pisoni 1989). Speech intelligibility measures involve word identification tasks in a closed
set of six items. The test has a list of 300 words, consisting of a representative sample
of stimulus words arranged into rows. Each row encompasses six words with the same
rhyme. The rhyme functions give an economic value, which serves to reduce the
speaker’s vocal effort. Furthermore, the methods and materials of the MRT require that
both the speaker and the listener be trained. That is, the administrator instructs the
speaker to read the words using a carrier phrase, while s/he takes notes regarding
feedback about the loudness, clarity and the rate of the speakers’ performance
throughout the reading of the list. These notes will be used in the stage of analysis.
Listeners, on the other hand, hear the words and then
respond. The duration of the test is measured from the time when the button is
pushed until the end of the word presentation. The score is the number of items
correctly responded to. Test items normally target single phonemes, phoneme sequences or words;
these cover the vowels, single consonants and consonant clusters of English. The formal assessments
interpret the responses as either intelligible or unintelligible; put in figures, a score of
(close to) 100% is interpreted as completely intelligible performance (Lafon 1966).
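The scoring rule described above (the score is the number of items correctly responded to, expressed against a closed set of six alternatives) can be sketched in Python as follows. This is only an illustrative sketch: the function name mrt_score, the data layout and the sample rows of rhyming words are assumptions made here and are not taken from the published MRT materials.

```python
from typing import Sequence, Tuple

# Each trial is (alternatives shown, word spoken, word chosen by the listener).
Trial = Tuple[Sequence[str], str, str]


def mrt_score(trials: Sequence[Trial]) -> Tuple[int, float]:
    """Return (number correct, percentage correct) for closed-set trials."""
    if not trials:
        raise ValueError("at least one trial is required")
    correct = 0
    for alternatives, target, response in trials:
        # The spoken word must be one of the alternatives in its row.
        if target not in alternatives:
            raise ValueError(f"target {target!r} is not among the alternatives")
        if response == target:
            correct += 1
    return correct, 100.0 * correct / len(trials)


# Hypothetical rows of six rhyming words (invented for illustration only).
row_a = ["bat", "cat", "fat", "hat", "mat", "sat"]
row_b = ["pin", "tin", "bin", "fin", "win", "sin"]
trials = [
    (row_a, "cat", "cat"),  # correct response
    (row_a, "fat", "hat"),  # initial consonant misidentified
    (row_b, "tin", "tin"),  # correct response
    (row_b, "bin", "pin"),  # initial consonant misidentified
]
n_correct, percentage = mrt_score(trials)
print(n_correct, percentage)
```

With these invented responses, two of the four items are correct, so the percentage score is 50; a score close to 100% would correspond to the completely intelligible performance mentioned above.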
Many approaches that have been designed for the measurement of intelligibility do not
give an adequate account and have many drawbacks. Examples are the use of
comprehension questions (Anderson and Koehler 1988) and picture selection in
response to a stimulus (Smith and Bisazza 1982). Yet, they offer something
valuable and their drawbacks motivate questions. Consider, for instance, the
comprehension question test that draws conclusions about the listener’s efficient
comprehension from the scores provided; the assessment will be a reasonable one since
listeners respond correctly. However, a comprehension test of this type cannot account
for how well listeners’ responses correlate with speakers’ intentions. Speech
intelligibility is a complex phenomenon, which is influenced by several variables. Firstly,
audibility alone is not a sufficient condition for good speech intelligibility, just as
adding more light to a blurred text does not make it more legible. Similarly, the addition
of more sound intensity to speech that is affected by reverberation, echoes or
distortion does not make it more intelligible. In standardized speech intelligibility
testing, the talker-to-listener transmission path is measured under three assumptions: the
talker speaks without accent or speech impediments; the speech
is in a normal form with normal emphasis of words; and the listener
possesses normal hearing abilities. The actual performance will
vary, especially if the assumptions made about the talker and listener cannot be met in
practice. Secondly, the transmission of the voice signal from talker to listener also affects
speech intelligibility. Several factors can spoil the integrity of a voice signal while it is on
its way to the listener. A poor signal-to-noise ratio, for example, masks the voice signal.
Reverberation, i.e., echoes in rooms, is a special kind of noise that causes smearing or
blurring of the sounds and makes the speech less audible and difficult to understand.
Fant (1973) states that an intelligibility test has to account for the phonemic distances
which exist between speech sounds. An intelligibility test should also take into account the specific
conditions under which it takes place, so that it gives the experimenter
precise feedback. In this respect, the rhyme test has the advantage of minimizing the rate
of contextual confusion. Several advantages make the MRT practical. A confusion matrix
of phonemes can be calculated from the scores of the tests. That is, the actual rate of
intelligibility is simply the number of words correctly responded to. Naive listeners can
participate more than once without being exposed to any training, which is a very
distinctive feature compared with other types of tests. Reliable results can be obtained
even with a small number of subjects, which usually ranges from 10 to 20. More
importantly, the results of recent research have proven the MRT to be an excellent
measure of segmental intelligibility of natural speech.
Several studies have expanded upon the paradigm of the MRT word list. Importantly,
some researchers reduce the number of response alternatives in the test from
six to four items, which helps the listeners avoid becoming
confused by a large number of alternatives; hence, they make fewer
perception errors (Wang 2007).
The SPIN (Speech Perception in Noise) test is a speech perception test that is based on
simple English sentences of two types: high-probability and low-probability
sentences. The words at the end of high-probability sentences are
predictable from the body of the sentence, e.g., spread some butter on your bread. On the
other hand, the words at the end of low-probability sentences cannot be predicted from
the sentence, e.g., Mary could discuss the tack. The function of the SPIN test is the assessment of
listeners’ ability to understand everyday speech by combining bottom-up and top-down
information. Normally, words are more intelligible in sentence context than in isolation,
as many studies have revealed. The sentence context decreases the probability of errors
by the listeners (Kalikow, Stevens and Elliot 1977, Miller 1981). This is because
sentences impose constraints on the set of alternative words, which will increase
intelligibility. Measurement is based on a recognition task of twenty-five words
embedded in meaningful and highly predictable sentences, as in she wore her broken arm in
a sling (target word underlined). Listeners only write down the final word that they think
they heard in each sentence. This part of the SPIN test has proved to be efficient at
assessing speech recognition abilities (Rhebergen and Versfeld 2005). Although
listeners’ performance is primarily quantified in terms of numbers of whole words
correctly recognized, partially correct answers are also important since they give
information about the perception of phonemes in onset, nucleus and coda position.
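The whole-word scoring of the SPIN test described above can be sketched in Python as follows. The sketch is illustrative only: the function name spin_scores, the condition labels "high" and "low" and the listener responses are assumptions made here; only the target words bread, sling and tack come from the example sentences in the text.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Each item: (predictability condition, target final word, listener's written response).
Item = Tuple[str, str, str]


def spin_scores(items: Iterable[Item]) -> Dict[str, float]:
    """Proportion of sentence-final words correctly recognized per condition."""
    totals: Dict[str, int] = defaultdict(int)
    hits: Dict[str, int] = defaultdict(int)
    for condition, target, response in items:
        totals[condition] += 1
        # Compare case-insensitively and ignore surrounding whitespace.
        if response.strip().lower() == target.strip().lower():
            hits[condition] += 1
    return {condition: hits[condition] / totals[condition] for condition in totals}


# Hypothetical responses (invented for illustration only).
items = [
    ("high", "bread", "bread"),
    ("high", "sling", "Sling "),
    ("low", "tack", "tap"),
    ("low", "tack", "tack"),
]
scores = spin_scores(items)
print(scores)
```

Keeping the two sentence types separate makes the contribution of context visible: a listener who scores higher on high-probability than on low-probability sentences is drawing on top-down information. Partially correct answers such as tap for tack are simply counted as wrong here, so a phoneme-level analysis of onset, nucleus and coda would need a finer comparison than this sketch provides.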
The term ‘confusion matrix’ refers to a visualization tool typically used in supervised
learning. Each column of the matrix displays the instances in a predicted class, while
each row represents the instances in an actual class. The value of a confusion matrix is
that it makes it easy to judge whether the system is confusing two classes, i.e. commonly mislabeling
one as another. The raw data can then be analyzed in terms of phonetic classes of
perceptual or phonological features to give values about what is confused and what is
not. For instance, one can examine consonant confusion across manners of articulation
or analyze the data in terms of voicing. Benki (2003) and Nielsen (2004) analyzed
confusion data for voicing, place of articulation and manner of articulation in syllables
and phonemes. 9 Bosman (1989) states that the interpretation of consonant confusion is
usually based on the features shared by the confused phonemes. Phonemes that have a
feature in common are more susceptible to confusion than phonemes that differ with
respect to this feature (all else being equal). Vowel perception is largely determined by
the first two vowel formants, F1 and F2; the vowel space is determined by the position
of the tongue-hump [front~back] and the degree of constriction [close/high~
open/low]. Typically, listeners most frequently confuse vowels that are adjacent
in the F1-by-F2 vowel space.
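The construction of a confusion matrix, and its subsequent collapse onto a single phonological feature, can be sketched in Python as follows. The sketch is illustrative only: the function names confusion_matrix and collapse, the four-consonant set and the stimulus-response pairs are assumptions made here and do not represent data from any of the studies cited.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[str, str]  # (stimulus phoneme, response phoneme)


def confusion_matrix(pairs: Sequence[Pair], labels: Sequence[str]) -> List[List[int]]:
    """Rows are stimuli (actual class); columns are responses (predicted class)."""
    counts = Counter(pairs)
    return [[counts[(stim, resp)] for resp in labels] for stim in labels]


def collapse(pairs: Sequence[Pair], feature: Dict[str, str]) -> List[Pair]:
    """Replace each phoneme by its value on one feature (e.g. voicing)."""
    return [(feature[stim], feature[resp]) for stim, resp in pairs]


# Hypothetical listener data, invented purely for illustration.
pairs = [
    ("p", "p"), ("p", "b"), ("p", "t"),
    ("b", "b"), ("b", "b"),
    ("t", "t"), ("t", "d"),
    ("d", "d"), ("d", "t"),
]
phonemes = ["p", "b", "t", "d"]
voicing = {"p": "voiceless", "t": "voiceless", "b": "voiced", "d": "voiced"}

by_phoneme = confusion_matrix(pairs, phonemes)
by_voicing = confusion_matrix(collapse(pairs, voicing), ["voiceless", "voiced"])

for label, row in zip(phonemes, by_phoneme):
    print(label, row)
print(by_voicing)
```

Cells on the diagonal count correct identifications and off-diagonal cells count confusions. Collapsing the same pairs onto voicing, place or manner shows which feature is transmitted well and which is not, which is the kind of feature-based analysis referred to above.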
Several factors lead to confusability of segments in L1 with those of L2. First, and
foremost, incorrect perception in the L2 is caused by the degree of similarity between
the L2 sound and the nearest sound category in the listener’s native language. But there
are other factors to be considered as well. Environmental factors such as lighting, viewing
angle and distance between the speaker and the listener clearly affect the quality of
the optical information provided and the listener’s ability to use this
information in lip-reading. A third factor is the interaction of different linguistic levels such as
semantic, syntactic, lexical and phonological constraints as potential sources of
disambiguation of a spoken utterance. Such factors facilitate the performance of the
listeners/speakers under consideration, acting as a combination for maximum benefit
(Lachs 1999). Many researchers present stimulus words embedded in test items as part
of semantically and syntactically meaningful sentences and compare the listener’s
performance on the same (or similar) words presented in isolation or in meaningless
contexts, as a way of determining the listener’s ability to use contextual information in
the speech recognition process (see e.g. Nielsen 2004, Wang 2007).
9 The classical reference on confusion studies is Miller and Nicely (1955) in their ground-breaking
work on the role of distinctive features in the perception of consonants.
Sudanese EFL learners are expected to make different types of English vowel
production errors, e.g., in words such as bait, and, ask, let, fate, make, lace, poor, peat, put, pot,
putt, bit, fear, bet, stay, etc. Mohammed (1991) described pronunciation errors made by
Sudanese EFL learners as the result of inter-linguistic transfer and ineffective teaching.
Al-Arishi (1992) and Bobda (2000) found that the English NURSE vowel /ɜː/ is
rendered in Sudan as /ʌ/, or /ɔ/ if /ɜː/ is represented orthographically as <or> in
words like work, worth, word, etc. Here the absence of /ɔ/ in the Sudanese-Arabic vowel
inventory and the misleading spelling of English conspire to produce the incorrect
vowel substitution pattern observed. In related L2 production of English vowels,
similar errors were reported in several studies of Arabic-speaking groups. Brett (2004)
found that Arabic speakers of English face serious difficulties in distinguishing between
English vowels such as /ɔ/, /ɔː/, /əʊ/ as in cot, caught, and coat, all of which are often
pronounced as /ɔː/ or undergo substitutions. Altaha (1995) also reported that Arabic
learners of English produce the English front vowel /e/ as /ɪ/ so that words such as set
and sit are both pronounced as /sɪt/.
More importantly, English vowel production problems are detected even among ESL
learners who come from language backgrounds linguistically related to English.
German learners of English have difficulties differentiating between /æ/ and /e/ in bat
vs. bet, on the one hand and between /ʌ/ and /ɔ/ as in duck and dock on the other
(Steinlen 2002). Some errors due to orthographical influence involving the production
of the English /e/ (in words like red, bed, dead) were detected among Italian speakers of
English, where /e/ was pronounced as /eɪ/ (Piske et al. 2002).
The literature has revealed that English vowel production is also influenced by
differences in temporal cues. In English, incorrect vowel duration compromises
intelligibility (Jenkins 2000, Walker 2001). In the production of the English vowels,
Arab learners of English showed an exaggeration of duration differences between short
(lax) and long (tense) vowels. Specifically, Arabic ESL speakers produced the English
/i~ɪ, eɪ~e, u~ʊ/ tense and lax vowel pairs with duration ratios of 2.6:1, 2.6:1 and 2.5:1,
respectively. In contrast to this, native English control speakers produced lower
duration ratios of only 2.2:1 in all three vowel pairs. Moreover, the Arab groups
produced the native-like ordering of vowel duration for front vowels, but the order
among the back vowels differed due to transfer of L1 (Mitleb 1981, Munro 1993). That
is, the learners used their L1 productive strategies to produce English vowels. It is
possible to conclude that L2 learners of English need to be aware that the English
short vowels are not as short as those of their L1 (and that the long vowels are not as long as
those of Arabic). Linguistic theories describe ESL/EFL learners’ incorrect
pronunciation as the result of neurological development that occurs in the brain due to a
process of normal maturation at puberty. After this period the speech production and
perception systems become specialized for the processing of only L1 sounds. The
specific native-language prototypes interfere with the learner’s perception of some
L2 contrasts by acting as perceptual magnets, which pull L2 vowels towards the L1
prototypes. Thus, L2 vowel sounds which are located near an L1 vowel prototype are
discriminated less readily than vowels that are not located near L1 prototypes. It has
been assumed that the phonetic ‘prototype’ for each sound category exists in memory
and plays a unique role in speech perception and production (Iverson and Kuhl 1995).
However, Flege (1976) found that the incorrect conceptual representations of English
sounds adopted by such learners are strongly responsible for speech production
problems. That is, in Flege’s Speech Learning Model (SLM), it has been hypothesized
that without accurate perceptual targets to guide sensorimotor learning of sounds,
production of the L2 sounds will be inaccurate. This is because learners of the L2 may
fail to perceive L2 sounds which are affected by the L1 (Flege 1995). The lack of
knowledge of the English vowels was also reported to contribute to English
pronunciation problems. Research results of some Sudanese secondary school learners
of English recently showed that phonological awareness is urgently needed for
intelligible speech. The results revealed that the subject group exposed to
pronunciation knowledge achieved better results than those who received no training
(Al Dawla 2005, Mohammed 1991). Similar problems with the production of the
English speech sounds are widely spread among Arabic speaking learners of English.
Similar problems manifest themselves in the perception of the English vowels when
Sudanese EFL learners are exposed to English. The learners have problems
discriminating between /e/ and /eɪ/ in words like let, shade, make, rate, etc. Moreover,
the English tense and lax vowels /ɪ, iː, ʊ, uː/ are frequently substituted for one another in words such as
beat/bit, sit/seat. Listeners also fail to deal with the vowels in words such as pot, put, pert, cut, etc.
Very little has been written about the English vowel perception problems that Sudanese
university EFL learners face. However, in related studies, Huthaily (2003) reports that
Arabic native speakers misperceive /ɪ, e, ɔ, æ, ɔː/ due to the unfamiliarity of Arab
speakers with such a large number of vowels as those of English. Brett (2004) reports
that perception problems of English vowels experienced by Arabic EFL learners
probably occur due to the fact that their L1 (Arabic) lacks central vowels.
Although Sudanese Arabic has many consonants that resemble those of English,
Sudanese EFL learners have difficulty understanding and pronouncing some English
consonants. In this sense, Sudanese EFL learners arguably fail to discriminate between
English fricatives such as /θ, s, ð, z/ in words like thin/sin and then, there/zero, zeal, etc.
The voiced labiodental /v/ is often replaced by /f/ or /b/ as in words like very/berry
and volleyball/bolleyball. Previous studies of Arabic speakers learning English report
similar pronunciation problems. The English consonants /p, v, s, θ, ð, z, dʒ, ŋ/ are
reported to be difficult to produce for Arabic speakers (Al-Arishi 1992, Altaha 1995, do
Val Barros 2003, Jesry 2005, Ruhaif 2007). Moreover, do Val Barros (2003) states that
such types of pronunciation difficulties occur after puberty and are caused by the
interference of productive strategies of the mother tongue. Do Val Barros (2003)
explains that the English /ŋ/ represents the highest percentage of pronunciation errors
made by such subjects. This is most probably because sound pairs such as /n~ŋ/,
/b~p/, /v~f/ are allophones of one phoneme in the Arabic language, whilst they
present separate phonemes in English. In attempting to account for these types of
problems, Rababah (2003) states that pronunciation errors of Arab EFL students are
attributable to deficiencies in linguistic competence and to the differences that exist
between the English and Arabic pronunciation systems, resulting in communication
breakdown. Similarly, Patil (2000) explains that divergences like consonant devoicing
(mug pronounced as muck) cause communication to break down because they damage
essential phonological features, which play a significant role in intelligibility. The
replacement of English voiceless /p/ by voiced /b/ is common, which is attributable to
interference of the speaker’s L1. American listeners have difficulty recognizing English
stops /p, b/ which are produced by Saudi speakers as /b/, due to VOT differences
between the L1 and L2 inventories. In other previous studies, the vowel context effect
of VOT is assumed to derive from aerodynamic properties of the human speech
production mechanism. This effect is expected to manifest itself also in Arabic where
the VOT in /NV/, /MN/ was longer by about 10 ms, e.g. in Lebanese Arabic. The study
reported a small mean difference (52 ms) for /NV/, /MN/ in Arabic against (51 ms) in
English which is likely to be due to underestimation of the real VOT difference
between Arabic and English. This is because of the difference in vowel context and
because the subjects used to estimate the Arabic phonetic norm were speakers of
English L2 (and may therefore have produced Arabic stops that resembled English
stops in terms of VOT). Fortunately, neither the confounding factor of vowel context
nor the subjects’ L2 experience weakens the assumption that voiceless stops in
Arabic and English differ in terms of VOT, a difference which requires native speakers of
Arabic to produce voiceless stops with longer VOT values in English than in Arabic.
However, the confounding of vowel context does undermine the validity of the finding
that Arabic subjects shortened VOT when switching from Arabic to English. Most or
all of the observed ‘shortening’ of VOT, which averaged about 14 ms, was likely due to
the difference in vowel context in the Arabic and English speech material (Flege 1976).
This type of interference may occur on the level of phonetic implementation of a
certain phonemic feature, i.e. similar phonemes in different languages may have
different implementations, which cannot be easily grasped by EFL/ESL learners (Flege
and Port 1981, Rasmussen 2007). This phenomenon may support the assumption that
similarities of sound structure between two languages facilitate the learning of an L2.
However, other studies have proven the opposite in a learning situation where both L1
and L2 contain similar phones. That is, the learning of these sounds turns out to be
more difficult than learning new contrasting phonemes that are completely absent in
the L1. In other words, it is more difficult to acquire a sound in the target language
which is relatively similar to the native language than one which is substantially
different. Although no coherent explanation for this phenomenon has been
forthcoming, there is substantial literature documenting that similarities between the
native language and the target language can cause problems in L2 acquisition (Flege and
Port 1981, Eckman, Elreyes and Iverson 2003).
On the other hand, Sudanese EFL learners also have problems in understanding
English speech sounds. To my knowledge, very few reports have been provided about
these learners; however, arguably, the English consonants /s/ and /θ/ are substituted
for each other, e.g. in words such as sick/thick and sink/think, as are /ð/ and /z/
in words like then/zen. The recognition of English consonants such as /tʃ, dʒ, f, v/ also
proves to be difficult. Literature shows that Arabic EFL learners experience similar
perception problems with English consonants (Rasmussen 2007). The English
approximants /r, l, w/ also present perception problems for the learners. The sound
/w/ is often heard as /r/ or /l/, as in rent/lent/went. For instance, a word like went is heard
as rent, which is probably due to similarity in the manner of articulation between these
approximants. This type of substitution error reveals a kind of linguistic development
where there is a phonological rule merging /r/ with /w/. It reinforces the possibility that
two different phonological representations are often possible for the same sound
(Hyman 1975). In terms of phonetics, the Arabic /r/ is an alveolar trill whilst the
English /r/ is a retroflex frictionless continuant, a voiced alveolar or post-alveolar
approximant, which is incorrectly produced by Arabic learners of English who treat it
as a counterpart of the /r/ of their mother tongue. Probably because the Arabic phoneme /r/ is
pronounced with more physiological effort than that of English, such a manner of
articulation results in an incorrect perceptual representation of the English /r/, which is
pronounced with less force (Khattab 2002).
English consonant clusters are expected to cause problems for Sudanese EFL learners.
It has been argued that the learners have problems understanding onset and coda
clusters in words like flow, clock, special, twelve, glass, string, proper, ground. Insertion of an
epenthetic vowel before or between the cluster members generally occurs. In the
literature, insertion of a vowel between the cluster members by Arab EFL
learners is reported in words such as cream /kiriːm/, text /tekist/, etc. (Patil 2006, Carlisle
2001). Similarly, the English affricate /dʒ/ is often split by /i/, e.g., a word like bridge is
pronounced as /bridiʒ/ (Rababah 2003).10 The insertion of /ɪ/ between the
members of English onset clusters of the type /s/ + /t, p, k, l, w, n, m/ is
intended to facilitate the production of English clusters. This is because clusters
such as /pl, pr, gr, sp, θw/, etc., or three-segment initial clusters like /spr, skr, str, spl/ are
totally absent from the Sudanese colloquial Arabic inventory (Kaye 1997). Arguably,
similar learning problems arise in the perception of English clusters, where Sudanese
EFL listeners misperceive English cluster consonants. Previous studies show that Arab
listeners of English use their L1 phonotactic constraints to identify English clusters
even when these phonotactics do not facilitate the perception of the target language.
An English cluster like /nt/ is heard as /ŋk/, /pl/ as /bl/, /pr/ as /pl/, /dr/ as /gr/,
and /θr/ as /tr/.
10 The pronunciation of village as /vilig/ has also been reported.
Linguists believe that the sound sequences of languages are controlled by phonotactic
constraints that are encoded in the processing system of such languages. This principle
gives each language its own sound sequences, describing which sounds should take up
the initial position in a syllable and which ones occupy final positions. The types of the
representations that are used in the processing system of language to encode constraints
are the subject of an important area of debate in second-language studies. In English,
the sound /ŋ/ is not permitted in all positions. It can
appear in syllable-final position, but there it cannot be preceded by long
vowels or diphthongs (Goldrick 2004). Other phonotactic constraints on English
syllable structure are that /tʃ, dʒ, ð, z/ do not cluster in onsets and /l, r, w/ only occur
alone or as non-initial elements in onset clusters. Moreover, /r, h, j, w/ do not occur in
final position in RP and Australian English, although /r/ can occur in final position in
rhotic dialects such as American English.
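The segmental restrictions just listed can be expressed as simple membership checks. The following is a minimal sketch in Python, not an implementation taken from the studies cited; the function names and set names are hypothetical, and only the handful of constraints mentioned above are covered.

```python
# Illustrative sketch only: encodes just the few English constraints named
# in the text. All names are hypothetical.
NO_ONSET_CLUSTER = {"tʃ", "dʒ", "ð", "z"}  # do not combine in onset clusters
NON_INITIAL_IN_CLUSTER = {"l", "r", "w"}   # alone, or non-initial in onset clusters
NO_FINAL_RP = {"r", "h", "j", "w"}         # excluded from final position in RP


def onset_ok(onset):
    """Check an onset given as a list of consonant symbols, e.g. ["p", "l"]."""
    if "ŋ" in onset:  # assumption: /ŋ/ is limited to syllable-final position
        return False
    if len(onset) > 1:
        if any(c in NO_ONSET_CLUSTER for c in onset):
            return False
        if onset[0] in NON_INITIAL_IN_CLUSTER:
            return False
    return True


def coda_ok_rp(coda, long_nucleus=False):
    """Check a coda for RP; long_nucleus marks a long vowel or diphthong."""
    if not coda:
        return True
    if coda[-1] in NO_FINAL_RP:
        return False
    if "ŋ" in coda and long_nucleus:
        return False
    return True
```

For example, onset_ok(["p", "l"]) accepts /pl/ as an onset, whereas onset_ok(["l", "p"]) rejects /lp/ because a liquid may not be the first member of an onset cluster.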
Similar sequencing constraints also apply to English consonant clusters; they determine the
sound sequences that can appear in a syllable and the positions in the syllable where
particular sounds can occur (onset or coda). Thus, sequencing constraints govern which
sound classes may appear adjacent to each other and they aid the identification of
word boundaries. Differences between L1 and L2 phonotactic constraints often cause perceptual
and phonological problems among L2 speakers, as previous studies show. Seo (2003)
reported that segment positional restrictions motivate phonological alternations on
similar consonant clusters, which result in poor speech perception. An account of
speech perception of some cross-linguistic patterning provides correct predictions that
homorganic C+liquid sequences are more likely to undergo phonological change than
heterorganic C+liquid sequences in a given language. Findings of cross-language
investigations of 31 languages from different language families show that nasal+liquid and
obstruent+liquid clusters (or sonorant+sonorant and obstruent+sonorant sequences)
that are homorganic, like /nt, lt/, are more vulnerable to phonological change
than heterorganic sequences such as /pr, br, pl, kr/ (onsets) and /lp, rk/ (codas).
Compared with heterorganic consonants, homorganic consonants have an additional
shared acoustic property, e.g., vowel formant transitions for the same place of
articulation, assuming that they are adjacent to a vowel. Thus, the two sounds in a
homorganic C+liquid sequence can be considered as being phonetically more similar to
each other than those in a heterorganic C+liquid sequence are. Moreover, phonological
change can also occur due to the absence of contexts with appropriate phonetic cues:
e.g., velar-to-alveolar shift is interpreted as a repair strategy. According to Kawasaki
(1982) and Ohala (1992, 1993), if two sounds in a sequence are acoustically and
auditorily similar, the degree of distinctiveness of the two sounds would be diminished
and thus they would be subject to modification. Vowel epenthesis is one of the repair
strategies that occur due to phonotactic differences between L1 and L2. A good
example of this phenomenon is found in the production of English
consonant clusters by Iraqi and Egyptian speakers. Both dialects have syllable-
structure conditions that disallow consonant clusters in word-initial position. Yet
speakers of each dialect modify English words with initial consonant clusters in a
different manner. Egyptian speakers will pronounce /fləʊ/ flow as [fɪləʊ] whereas Iraqi
speakers will pronounce it as [ɪfləʊ]. Both pronunciations can be attributed to rules of
epenthesis in the native language that bring underlying syllable structures into
conformity with surface structure restrictions. In a word such as flow, the first
consonant is extra-syllabic (unassociated with a nucleus) and a vowel must be inserted
One principle that has recently been established to treat consonant cluster sequencing
more adequately is the sonority principle. Phoneticians postulate various
phonetic features to characterize sonority. One feature is that the position of a segment
in a syllable is determined by its sonority. The most sonorous segments form the
peak/nucleus of the syllable, whereas the others are arranged around the syllable
nucleus according to their degree of sonority. In other words, there is a downward cline
towards the syllable margins, which starts from the peak of sonority. Thus, vowels are
the most sonorous sounds, followed in decreasing order by liquids, nasals, fricatives
and stops as in the following words: trip, drip, ripe, come and please (Clements 1990,
Gierut 1999, Gierut and Champion 2001, Ladefoged 1993). Sonority plays a prominent
role in accounting for phonotactic patterns across languages. However, this does not
mean it can account for every phonotactic matter or pattern since many constraints
have very little to do with syllable structure and lie, in this way, totally outside the
domain of sonority theory. For instance, there is a common constraint which requires
that obstruent clusters agree in voicing, and it operates not only within syllables but also
across syllable boundaries in many languages (e.g., French, Russian, Catalan) showing
its entire independence of syllabification. One constraint, which often overrides the
syllable contact principle, is the prohibition of a complex syllable onset. If a cluster is
composed of a sonorant plus obstruent or ends in one of a small set of obstruent
clusters, it is well-formed and requires no epenthetic vowel. In other cases, an
epenthetic vowel appears between its two cluster members (Clements 1990).
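The sonority cline described above can be illustrated with a short check of whether a syllable rises in sonority up to its peak and falls after it. This is only a sketch under stated assumptions: the numeric ranks, the small segment inventory and the function names are hypothetical, chosen to mirror the ordering given in the text (vowels, liquids, nasals, fricatives, stops).

```python
# Hypothetical numeric ranks mirroring the ordering given in the text.
SONORITY = {"vowel": 5, "liquid": 4, "nasal": 3, "fricative": 2, "stop": 1}

# Small illustrative inventory; not a full description of English.
SOUND_CLASS = {
    "ɪ": "vowel", "a": "vowel", "ʌ": "vowel",
    "l": "liquid", "r": "liquid",
    "m": "nasal", "n": "nasal",
    "s": "fricative", "f": "fricative", "z": "fricative",
    "p": "stop", "t": "stop", "k": "stop", "b": "stop", "d": "stop",
}


def sonority_profile(segments):
    """Map a list of segments to their sonority ranks."""
    return [SONORITY[SOUND_CLASS[s]] for s in segments]


def obeys_sonority_sequencing(segments):
    """True if sonority rises strictly to a single peak and then falls strictly."""
    profile = sonority_profile(segments)
    peak = profile.index(max(profile))
    rising = all(a < b for a, b in zip(profile[:peak], profile[1:peak + 1]))
    falling = all(a > b for a, b in zip(profile[peak:], profile[peak + 1:]))
    return rising and falling
```

With this sketch, trip (["t", "r", "ɪ", "p"]) passes, while a hypothetical sequence such as ["r", "t", "ɪ", "p"] fails. An /s/+stop onset as in string would also be flagged, which is one of the well-known cases that sonority alone does not cover.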
Explicit knowledge refers to language rules and vocabulary items that second/foreign
language learners acquire through instruction (teaching). The learners will be able to
reflect this knowledge directly in their actual use of the target language (Krashen 1985,
Ellis 1994). Thus, the concept of explicit knowledge implies two considerations
involving second/foreign language learning. Firstly, the learners’ explicit knowledge
develops due to the learning experiences in which they acquire explanations of the ways
the target language functions. Secondly, compared to implicit knowledge, explicit
knowledge is an essential element for language acquisition, particularly for adult learners,
when the task of acquisition demands paying attention (Schmidt 1992).11 Most probably,
these considerations are part of the reason why linguists have focused on explicit
knowledge in designing pedagogical materials for second/foreign-language teaching.
In designing these materials, some linguists focused on ‘form’, which can include
grammar points, vocabulary items, a language function or pronunciation (Ellis 1994).
According to Venkatagiri and Levis (2007), explicit explanations of structural properties
of the target language pay off in all of these areas, but most of all in the area of
pronunciation. Explicit knowledge of phonology should therefore play an important
role in improving the pronunciation accuracy of learners. In other words, conscious
knowledge of L2 speech sounds can help learners to achieve correct perception and
production of a second language.
Related studies describe attempts at teaching explicit knowledge of (aspects of) the
sound structure of the target language to foreign language learners. In the Sudanese
context, it was shown that teaching explicit knowledge improved the quality of the
learners’ English pronunciation (Al Dawla 2005, Fahal 2004). Earlier, Munro (1993)
found that the production of English vowels by Arabic speakers improved with
increased training, i.e. through increased knowledge of the target sound system.
11 According to Krashen (1985) and Bjarkman and Hammond (1989), implicit knowledge refers
to the tacit or subconscious knowledge which is developed and stored in the form of
generalizations during the learning of the target language. Linguists claim that a newborn baby
starts language acquisition from a genetically determined zero stage, proceeding forward to a
complete state of language knowledge using its subconscious (i.e. implicit) knowledge. Implicit
knowledge forms the available knowledge that learners need in order to acquire a second
language. If learners are at stage ‘i’ of language development, for example, they can acquire i+1 if
they comprehend an input item including i+1.
significant difference in meaning (Dahlquist 2002, David, Shirley and Dickson 1999).
Along the same lines, by mastering explicit phonemic knowledge, learners will be able
to differentiate between acceptable and unacceptable phonemic sequences;
e.g., in English an /st/ cluster is acceptable, while /sf/ is not. In a wider context,
the variation of phonemes from word to word and from speaker to speaker makes the
learning of phonemes more complicated. However, if L2 learners have background
knowledge of this variation, they are more likely to achieve intelligible speech. According to Carr
(1999) the acquisition of native-like English pronunciation is a difficult task that
requires much effort, especially for learners past the age of puberty, but is a very
important element to avoid frustration among the speech participants. This means the
learners need to explicitly know more about the phonemes, i.e., they need to focus on
sound units (Gussenhoven and Broeders 1976).
Orthographical issues. The phonemic system of a language is related to its writing system.
Therefore, a sort of reference to spelling in the language should take place that gives
guidelines on pronunciation and perception of speech.
Quite apart from the difference in symbols, the difference between the English and
Arabic writing systems results in speech intelligibility problems. English has a complex
orthography, whilst the Arabic orthography is phonemic, such that one letter represents
one sound. These differences cause Sudanese learners of English to have pronunciation
problems. Historically, in most languages, members of the speech community learned
orthography from their elders. Even if the relationship between letters and sounds
was clear and direct when the written forms were first created,
it does not remain so as time passes: later
generations no longer perceive this relationship, and consequently problems arise in
pronouncing words. The physical preservation of written forms gave rise to
conservative practices in orthography, by virtue of which the graphic form remains
unchanged while the spoken form undergoes modification. The English word knight,
for example, is of Germanic origin and is cognate with German Knecht. Conservative
English orthography writes it as knight, but it is pronounced like night, /naɪt/. There
are many English words with similar spellings that have come to be pronounced
differently: e.g., plough, through, rough or roll, doll, home, come, etc. Thus, English orthography
is a less reliable guide to pronunciation than the orthographic systems of many other languages. In
addition to the complex nature of English spelling inherited from the past, there are
idiosyncrasies in spelling that make it tricky to use. Idiosyncrasies refer to the large
number of consonant and vowel sounds varying from one dialect to another but which
give poor links to letters. Such relations make prediction of pronunciation difficult for
ESL/EFL learners. For instance, some learners have difficulty figuring out what
phoneme the digraph ‘th’ represents in words such as thin and then. This is because ‘th’
represents two phonemes in English: the voiceless and the voiced dental fricatives /θ, ð/
(Heffner 1975). English is a language which has borrowed words from various
languages, such as Latin, Greek, Arabic and Russian. This feature makes English
pronunciation problematic, particularly for non-native speakers, since the relation
between letters and sounds in many of these borrowed words is not clear. Consider, for
example, words such as tchotchkes, chemical, alcohol, gnocchi, in which some letters are
written but not pronounced. Learners of English who come from a linguistic
background of simple orthography systems need to make much effort to learn how to
pronounce such words. Moreover, some borrowed words retain the spelling and
pronunciation of their language of origin. In the 14th century, there was a tendency,
motivated by an enthusiasm for things in a neoclassical style, which allowed the
spellings of words to undergo adaptation: a word like nacioun changed its spelling
to nation, while ‘gg’, which denotes ‘jh’, was replaced by ‘dg’ in word-final
position. Furthermore, different spellings for the same sound exist simply because
the sounds were once pronounced differently, e.g., ‘ee, ea’. Later, when the spelling of words
containing the newly unified sound had stabilized, the double spellings were
preserved. Long vowel sounds underwent a shift from a continental pronunciation that
is more like Spanish or French to the current one, after which the vowels took two
forms, short and long, as in ship/sheep, etc.
Teaching background. English pronunciation receives little space in the syllabus taught at
the primary, secondary and tertiary level in Sudan. Arguably, very few pronunciation
lessons are ever interspersed between the syllabus items, which represent inappropriate
and often insufficient phonological and phonetic components for EFL learners. There
are very few lessons that treat the basics of English speech articulation in high schools,
whereas only two or three courses are taken by university students of English that
present issues such as descriptive phonetics and listening comprehension skills.
The problems of teaching EFL in the Sudanese context are attributable to many factors.
According to Mitchell and El Hassan (1993), there are no practical teacher’s books to be
used during the teaching of English. This means that teachers teach the language
relying on their own experience, which is not always scientific.
Moreover, results of a related study (Fareh 2010) revealed that the teaching of EFL in
Sudan and other Arab countries is a challenge for a number of
reasons. The textbooks used do not take into account many essential educational
requirements such as the learners’ level of English, attitudes, interests, etc. Their
contents are not authentic, and they are presented at a high level of language,
demanding much from the learners. Moreover, the content of these courses does not
meet the needs of the learners and is often too extensive to finish within a term or
semester. Furthermore, the teaching strategies are typically teacher-centered, so that
the learners have few opportunities to practise language skills in the target language.
This situation is exacerbated by the use of inappropriate methods of language
instruction, e.g., the teaching of English pronunciation and listening skills is not always
carried out with the use of language labs.
Arguably, assessment of EFL in Sudan largely focuses on the learners’ writing and
reading abilities while listening and speaking, including pronunciation, receive little
attention from assessors. Consequently, the learners do not show much development in
the learning of these skills.
2.2.10 Summary
This section provides a summary of chapter two. The chapter reviewed the testing
methods used in the measurement of speech intelligibility in second or foreign language
studies. It also reviewed the contributions of previous literature on speech intelligibility
problems that are faced by Sudanese EFL learners.
1. Several methods and tests deal with speech problems; however, previous studies
show that the Modified Rhyme Test and SPIN represent a highly adequate
approach to speech intelligibility measurement.
2. There is a wide range of phonetic and phonological differences between English,
which represents the target language, and Arabic, which represents learners’ L1.
These differences are worthy of study and are assumed to form a potential source
of learning difficulties for Sudanese EFL learners.
3. Vowels represent the most difficult area of English sounds for Sudanese EFL
learners to understand and produce. Previous studies refer to L1 effects, wrong
implementation or lack of knowledge of English phonetics and phonology.
4. English consonants are less problematic for the learners; however, learners have
difficulty identifying and pronouncing some English consonants such as /s, θ/, /z,
ð/, /ʃ, ʒ/, /tʃ, dʒ/, /ŋ/ and /p, v/.
5. The learners face more problems in their attempts to pronounce onset and coda
consonant clusters. Coda clusters are more difficult to understand than initial
clusters. Consonant clusters such as those that occur in English do not exist in the
Arabic language. Therefore, adult L2 learners are equipped with their L1
phonotactic constraints and have to deal with the mismatch that exists between L1
and L2.
6. Related studies have provided few accounts of the phonetic and acoustic correlates
of the learning problems experienced by Sudanese EFL learners. Therefore, much
more profound investigation is necessary to provide a clearer picture.
7. Arabic learners of English often perceive the phonological principles of English;
however, they fail to implement them and this is attributable to the paucity of L2
knowledge.
8. Studies of English as a second or foreign language use native speakers of English
as control groups/model speakers for comparative purposes. Error analysis is based
on the differences which exist between learners’ performance and that of native
speakers. Differences are highly predictive of difficulties experienced by the
learners, whilst similarities imply fewer problems manifested in the learning of L2
speech sounds. Several types of errors have been detected in related studies, which
include substitutions, conflations, confusion, developmental interlingual errors and
insertion/deletion.
9. The study of the perception and pronunciation problems in English dealing with
Sudanese EFL learners receives little attention. The school and university
syllabuses give insufficient space to the teaching of these aspects of knowledge,
whilst the way these skills are taught is inadequate and traditional.
10. The related literature shows that most English pronunciation and perception
errors are due to the following: (i) the intricate nature of the English vowels, (ii)
unfamiliarity of ESL/EFL speakers with large numbers of vowel sounds, (iii)
Intelligibility of RP English
to Sudanese listeners
3.1 Introduction
This chapter aims to present experimental evidence for the causes of speech
intelligibility problems which Sudanese university EFL learners face. The investigation
attempts to account for the linguistic factors that are assumed responsible for these
problems. In a recognition task of L2 speech, for example, the learners’ L1 represents
one of the linguistic factors affecting the learning process. That is, ESL/EFL learners
are sensitive to the speech sounds of their mother tongue, most of which are easily
intelligible to them. This means they do not have problems identifying sounds in their
own language. However, problems arise when the learners are involved in perception
tasks using second or foreign language speech. These problems form one of the urgent
ESL/EFL issues which require measuring the learners’ receptive intelligibility. The
measurement of receptive intelligibility addresses the listener’s ability to recognize the
acoustic waveform produced by the speakers as a string of meaningful units (words) (see
Kent, Dembowski and Lass 1996). Among the different types of instrumental analysis
which treat speech recognition, segmental intelligibility measurement can be considered
an advantageous method. Therefore, this study is done on the basis of segmental
analysis of vowels, single consonants and consonant clusters of English. It targets the
types of identification errors made by the Sudanese listeners in the native speech,
accounting for issues like how vowels, consonants and clusters of English manifest
themselves as learning problems. Specifically, it is assumed that many reasons are
responsible for the intelligibility problems among Sudanese EFL learners. In an EFL/ESL
context, previous studies revealed that differences in phonetic and phonological
implementation in a learner’s mother-tongue often result in misperception of the
speech sounds of L2. According to Fokes, Bond and Steinberg (1985) Arab listeners of
English are inconsistent in identifying aspirated and unaspirated voiceless stops. They
have more difficulty with the labial than the alveolar categories. The identification
problem is attributed to the effect of the place of articulation of the stops and the
identity of the vowels. Moreover, voicing decisions at the labial place of articulation are
more difficult than at the alveolar place for all subjects. Acoustically, intelligibility
problems faced by EFL/ESL learners often occur due to the influence of consonants
on vowels as an example of the ways in which speech sounds interact in different
phonetic environments. Therefore, listeners need to know that in some environments,
the English vowel /iː/ as in beat, bead should not be realized in precisely the same way as /iː/ in
peat, keep, a difference which often reduces the intelligibility of a foreign learner of English (Allen
and Miller 1999). Problems such as these also require drawing attention to the learners’
explicit knowledge of English speech sounds. Many error analyses of L2 speech
sounds point out that learners’ misperceptions of L2 pronunciation are the result of
partial learning, orthographical differences, and so on, which supports the hypothesis
that when L2 norms are lacking, learners usually fall back on habits of their mother-
tongue. This chapter attempts to examine experimentally the negative effect of two
linguistic factors on speech intelligibility for Sudanese EFL learners: (i) transfer from the
learners’ L1 (Arabic) and (ii) lack of explicit knowledge of English speech sounds.
Finally, this theme is discussed in four sections, each of which is integrated with the
others so as to provide coherence.
3.2 Method
The Modified Rhyme Test (MRT) was used in the experiments. The MRT is considered
to be a highly accurate and reliable measure of intelligibility (Logan, Greene and Pisoni
1989). Speech intelligibility measures involve word identification tasks in a closed set of
four alternatives, where the listeners are asked to select the response they think the
speaker intended. The score is the number of correctly responded-to items. Test items
normally target phonemes, multi-phonemes, or words. Phonemes refer to vowels and
single consonants, whilst multi-phonemes refer to consonant clusters. The formal
assessments of phonemes and multi-phonemes interpret the responses as either
intelligible or unintelligible; a score of (close to) 100% is interpreted as completely
intelligible performance (Lafon 1966). Word intelligibility, on the other hand, was
determined on the basis of final words embedded in short redundant SPIN sentences.
SPIN is an abbreviation of ‘Speech Perception in Noise’ Test (Kalikow, Stevens and
Elliott 1977, Wang 2007, Wang and Van Heuven 2007). The test asks listeners to
recognise 25 keywords embedded in meaningful and highly predictable sentences, as in
She wore her broken arm in a sling (keyword underlined). Listeners only write down the
final word that they think they heard in each sentence. This part of the SPIN test
proved to be efficient at assessing speech recognition abilities (Rhebergen and Versfeld
2005). Although the listeners’ performance is primarily quantified in terms of number
of whole words correctly recognized, partially correct answers are also important since
they give information about the perception of specific phonemes in onset, nucleus and
coda position.
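The closed-set scoring described above for the MRT (one response chosen from four alternatives, score equal to the number of correct items) can be sketched as follows. This is an illustration rather than the scoring procedure actually used in the study; the function name, the data layout and the second test item are hypothetical, while the pit item follows the example given in the description of the test materials.

```python
from collections import Counter


def score_mrt(items, responses):
    """Score a closed-set identification test.

    items: list of (target, alternatives) pairs; responses: list of chosen words.
    Returns (number correct, percent correct, Counter of (target, response) errors).
    """
    if len(items) != len(responses):
        raise ValueError("one response is required per item")
    correct = 0
    confusions = Counter()
    for (target, alternatives), response in zip(items, responses):
        if response not in alternatives:
            raise ValueError(f"{response!r} is not among the alternatives")
        if response == target:
            correct += 1
        else:
            confusions[(target, response)] += 1
    percent = 100.0 * correct / len(items)
    return correct, percent, confusions


# Hypothetical two-item test; the second item is invented for illustration.
items = [
    ("pit", ["pit", "pet", "put", "pot"]),
    ("thick", ["thick", "sick", "tick", "kick"]),
]
correct, percent, confusions = score_mrt(items, ["pet", "thick"])
```

The tally of (target, response) pairs is kept because the error patterns, and not only the percentage correct, are what reveal which sounds are confused with which.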
3.2.2 Participants
The subjects of the study were ten Sudanese university English students in the
Department of English at El Gadarif University in the Sudan. The subjects involved in
these experiments specialized in English language teaching (Teaching English as a
Foreign Language, TEFL). They had studied for six semesters when they participated in
the listening test. During the period of study, which extends for four years, students
attended three courses in the field of pronunciation; these are (i) an introduction to
phonetics, (ii) phonology and (iii) practical phonetics, delivered in three successive
semesters. They also attended two classes on English listening skills, which usually take
place in semesters one and three. English is treated as a foreign language (not a second
language), the learning of which starts in the fifth year of primary school and continues
at secondary schools for three years. English lessons obtained during these stages vary
between 5 and 6 hours per week; English is treated as a school course that provides
basic principles of the language in a traditional way of language teaching.
The test materials were produced by one male native speaker of RP English. The
speaker was asked to read the test material with an RP accent. He was advised to
read in a consistent manner.
The experimental stimuli included four tests. These were (i) a vowel test, which was
composed of minimal quartets including short and long vowels as well as diphthongs,
(ii) single consonants in either onset or coda position and (iii) consonant clusters in
onset or coda position. These target sounds were embedded in meaningful C*VC*
words (where C* stands for one to three consonants). (iv) The fourth test comprised 25
sentences taken from the high-predictability set included in the SPIN (Speech
Perception in Noise) test (Kalikow et al. 1977). These are short everyday sentences in
which the sentence-final target word is made highly predictable from the earlier words
in the sentence, as in She wore her broken arm in a sling (target word underlined). Word
stimuli in the first three tests were embedded in a fixed carrier sentence Say…again,
which ensured a fixed intonation with a rise-fall accent on the target word. The vowel
and the single consonant tests contained items on each individual vowel or consonant
phoneme in the RP inventory. 12
12 Inadvertently, the vowel test did not include an item targeting the vowel /əʊ/ as in boat.
The consonant test targeted all the consonants in onset position and in coda position.
For the cluster test, the number of test items had to be limited as the total inventory of
onset and coda clusters is very large; including all the clusters would have been too
demanding on the listeners. Nine onset and eight coda clusters were selected that
present potential problems to Sudanese-Arabic learners of English (Allen 1997, Patil
2006). All items in the tests were chosen such that they occurred in dense lexical
neighbourhoods, i.e. there should be many words in English that differ from the test
item only in the target sound. For instance, the vowel /ɪ/ was tested in the word pit,
since the /p_t/ consonant frame can be filled by many other vowels, as in peat, pet, pat,
pot, part, port, put, putt and pout. These so-called lexical neighbours, differing from the
target word only in the identity of the test sound, make up the pool of possible
distracters (alternatives) in the construction of the MRT test. When selecting the three
distracters needed for each test item, we preferred lexical neighbours that differ from
the target in only one distinctive feature. For the target pit, we selected pet (differing
in height) and put (differing in backness), each differing from /ɪ/ in just one vowel
feature, together with pot. The last alternative differs from the target in both height
and backness; it was preferred over the one-feature difference in peat (or Pete) because
it was decided to exclude proper names and low-frequency alternatives as much as
possible, as these may show a larger decrement in recognition than high-frequency words.
The full set of test items is included in the Appendix.
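The selection procedure described above can be illustrated with a minimal Python sketch. It is not the procedure actually used to build the test: the feature table, the function names (feature_distance, pick_distracters) and the exclusion list are hypothetical, and the vowel features are deliberately simplified to height, backness and length.

```python
# Hypothetical, simplified vowel features for words in the /p_t/ frame.
# The values are illustrative assumptions, not the feature system of the study.
FEATURES = {
    "pit":  {"height": "high", "backness": "front", "length": "short"},
    "peat": {"height": "high", "backness": "front", "length": "long"},
    "pet":  {"height": "mid",  "backness": "front", "length": "short"},
    "put":  {"height": "high", "backness": "back",  "length": "short"},
    "pot":  {"height": "low",  "backness": "back",  "length": "short"},
}


def feature_distance(word_a, word_b):
    """Count the features on which the vowels of two words differ."""
    a, b = FEATURES[word_a], FEATURES[word_b]
    return sum(a[name] != b[name] for name in a)


def pick_distracters(target, neighbours, excluded=(), n=3):
    """Return the n admissible lexical neighbours closest to the target."""
    pool = [w for w in neighbours if w != target and w not in excluded]
    # sorted() is stable, so neighbours at equal distance keep their list order.
    return sorted(pool, key=lambda w: feature_distance(target, w))[:n]


# 'peat' is excluded here because of its proper-name homophone (Pete).
print(pick_distracters("pit", ["peat", "pet", "put", "pot"], excluded={"peat"}))
```

With peat excluded, the sketch returns pet, put and pot, i.e. the alternatives chosen for pit in the text; pot enters only because no further one-feature neighbour remains in the pool.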
The stimulus sentences were typed on sheets of paper (one sheet for each test) and
then read by a male native speaker of RP English. Recordings took place in a sound-
treated room. The speaker’s voice was digitally recorded (44.1 kHz, 16 bits) through a
high-quality swan-neck Sennheiser HSP4 microphone. The speaker was instructed to
inhale before uttering the next sentence so that each utterance would have
approximately the same loudness, intonation and temporal organisation. The target
words were excerpted from their spoken context using the high-resolution digital
waveform editor in the Praat speech processing software (Boersma and Weenink 1996).
Target words were cut at zero-crossings to avoid clicks at onset and offset. Target
words and SPIN sentences were then recorded onto an audio CD in seven tracks. The
first track contained two practice trials for the vowel test and was followed by track 2,
which contained the 19 vowel test items. Tracks 3 and 4 contained the practice and test
trials for the single consonant test, and tracks 5 and 6 contained the cluster items.
Track 7 comprised the 25 SPIN sentences with no practice items. In the single
consonant and cluster tests, trials targeting onsets preceded those targeting codas.
Other than that, the order of the trials within each part of the test battery was random.
Trials were separated by a 5-second silent interval. After every tenth trial, a short beep
was recorded, to help the listeners keep track on their answer sheets.
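The ordering constraints just described (onset trials before coda trials, random order within each block, a beep after every tenth trial) can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name build_track, the item labels and the seed are hypothetical, and the 5-second silent intervals are not modelled.

```python
import random


def build_track(onset_items, coda_items, beep_every=10, seed=None):
    """Order the trials of one test track.

    Onset trials precede coda trials; the order within each block is random.
    A 'beep' marker is inserted after every `beep_every`-th trial.
    """
    rng = random.Random(seed)
    onsets, codas = list(onset_items), list(coda_items)
    rng.shuffle(onsets)
    rng.shuffle(codas)

    track = []
    for number, word in enumerate(onsets + codas, start=1):
        track.append(("trial", word))
        if number % beep_every == 0:
            track.append(("beep", None))
    return track


# Hypothetical item labels; the real test words are listed in the Appendix.
onset_words = [f"onset_{i}" for i in range(1, 13)]
coda_words = [f"coda_{i}" for i in range(1, 10)]
track = build_track(onset_words, coda_words, seed=1)
```

Passing a fixed seed makes the random order reproducible, which is convenient when the answer sheets have to match the recorded order.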
CHAPTER THREE: INTELLIGIBILITY OF RP ENGLISH TO SUDANESE LISTENERS 57
The stimuli were presented over loudspeakers in a small classroom that seated ten
listeners. Subjects were given standardized written instructions and received a set of
answer sheets that listed four alternatives for each test item. They were instructed for
each trial to decide which of the four possibilities listed on their answer sheet they had
just heard on the CD. They had to tick exactly one box for each trial and were told to
guess in case of doubt. Alternatives were listed in conventional English orthography.
In the final test (SPIN), subjects were instructed to write down only the last word of
each sentence that was presented to them. There were short breaks between tests and
between presenting the practice items and test trials. Subjects could ask for clarification
during these breaks in case the written instructions were not clear to them.
I will now present the results of the test battery in four sections, one for each test. Each
section will first outline the structural differences between the sounds in the source
language (Sudanese Arabic, SA) and in the target language (RP English). Such
comparisons may help to explain why certain English sounds are difficult for Sudanese
learners and others are not.
In this part, I present the results and the discussion separately for four sections:
vowels, single consonants, consonant clusters and SPIN sentences of English.
3.4.1 Vowels
Figure 3.1 presents the percentage of vowels correctly identified by the Sudanese-
Arabic university students, broken down by target vowel. As the figure shows, the
listeners overall correctly identified no more than 47.8 percent of the English vowel
tokens spoken by the native speaker. However, responses to individual vowels differ
widely, with percentages anywhere between 0 and 100. In detail, there is a complete
failure in the recognition of the short vowel /ʌ/ and the long vowel /ɑː/. These are
followed by a high rate of misperception of the lax English vowels /ɪ/ and /ʊ, e, ɔ/.
Similarly, the tense vowels /ɜː, uː/ and diphthongs like /eə, ʊə, eɪ, aɪ, ɪə, aʊ/ also
proved to be problematic. However, listeners made no errors in perceiving the two vowels
/ɔɪ, ɔː/, while few errors were made in the perception of the short vowel /æ/. The low
overall percentage reveals that listeners find the perception of the English vowels
difficult, for various reasons that are discussed below.
Figure 3.1 Mean percentage of native RP English vowels correctly identified by Sudanese
listeners broken down by target vowel. Error bars represent ±1 Standard Error.
Table 3.1 shows the results in more detail. This is a confusion matrix with the stimulus
sounds (‘target’) presented to the listeners listed in the rows and the response vowels
(‘Perceived RP vowels’) listed in the columns. Correct responses are listed in the cells
along the main diagonal of the matrix (indicated in bold print), while incorrect
responses (so-called confusions) are located in off-diagonal cells. Confusions that occur
in 30 percent of the cases or more have been highlighted in the matrix (grey-shaded
cells). These cells identify types of errors that point to specific difficulties on the part of
the listeners.
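How such a confusion matrix is tallied, and how cells at or above the 30 percent criterion are flagged, can be sketched in Python as follows. The function names are hypothetical and the ten responses are invented for illustration; they are not the data of this study.

```python
from collections import Counter, defaultdict


def confusion_matrix(responses):
    """Tally (target, perceived) pairs into rows of response counts."""
    matrix = defaultdict(Counter)
    for target, perceived in responses:
        matrix[target][perceived] += 1
    return matrix


def percent_correct(matrix, target):
    """Percentage of responses on the main diagonal for one target."""
    row = matrix[target]
    return 100.0 * row[target] / sum(row.values())


def major_confusions(matrix, threshold=30.0):
    """List off-diagonal cells at or above the threshold percentage."""
    flagged = []
    for target, row in matrix.items():
        total = sum(row.values())
        for perceived, count in row.items():
            share = 100.0 * count / total
            if perceived != target and share >= threshold:
                flagged.append((target, perceived, share))
    return flagged


# Invented responses of ten listeners to one target word (not the study's data).
responses = [("pit", "pit")] * 5 + [("pit", "pet")] * 2 + [("pit", "put")] * 3
matrix = confusion_matrix(responses)
```

In this invented example the target is identified correctly in half of the cases, and only the confusion with put reaches the 30 percent criterion.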
Table 3.1 Confusion matrix of 20 English vowels and diphthongs (in the rows) perceived by ten
Sudanese-Arabic listeners (in the columns). Correct responses are on the main diagonal, indicated
in bold face. Confusions (≥ 30%) are in grey-shaded cells. The vowel /əʊ/ should have been
presented but was not.
Perceived RP vowels
Target
ʌ ɜː ɑː æ aʊ aɪ e eə eɪ ɪ iː ɪə ɔ ɔː əʊ ɔɪ ʊ uː ʊə
ʌ 0 1 9
ɜː 4 1 2 3
ɑː 0 1 9
æ 9 1
aʊ 5 1 4
aɪ 3 5 2
e 3 5 1 1
eə 2 6 2
eɪ 1 1 8
ɪ 5 2 3
iː 1 4 5
ɪə 7 3
ɔ 3 7
ɔː 10
əʊ
ɔɪ 10
ʊ 2 8
uː 4 6
ʊə 3 7
3.4.2 Discussion
The perception of the English vowels is a serious problem for the Sudanese-Arabic
listeners in this study. The listeners frequently confused the low central short vowel /ʌ/
with the peripheral low back short vowel /ɔ/, whilst the half-open vowel /ɜː/ was
identified as /ɔː/, because their L1 (Arabic) inventory lacks central vowels (Brett 2004).
It is most likely that linguistic differences between the listeners’ L1 and L2 lead to
negative transfer (mapping model, cf. Kuhl 2000) in the listeners’ perception process.
That is, listeners are not familiar with the types of vowels needed in English because
these are not distinguished in the Arabic phoneme system. Therefore, they tend to
mentally equate L2 vowel sounds with those of their L1, so that the non-native listener
fails to hear a contrast between two sounds that native listeners of the target language
make.
In a similar case, Tomokiyo, Black and Lenzo (2003) reported difficulty in achieving
inter-coder agreement between Arabic and English vowels. In particular, the presence of
an /e/ or /o/ vowel proved difficult for the Arabic listeners to identify with any great
consistency. Tomokiyo et al. attribute this to the influence of Modern Standard Arabic
(MSA), where formal marking (i.e. in the writing system) indicates the existence of only
/a, i, u/. More importantly, duration often has a negative influence on the recognition
of English vowels. This appears in several cases where the Sudanese listeners conflated
/ʊ/ with /uː/ and /ɪ/ with /iː/, and confused /ɑː/ with /ɔː/. This type of error
motivates the hypothesis that duration is an important acoustic cue in cross-linguistic
speech perception (Hillenbrand and Clark 2000). According to Hillenbrand and Clark,
when a vowel is shortened, /æ/ tends to be heard as /ɛ/ and /ɑ/ as /ɔ/, whilst
lengthened /ɛ/ tends to shift to /ʌ/ and /ʌ/ to /ɑ/ or /ɔ/, a process which leads to
confusion. However, Hillenbrand and Clark observed only minor changes in the
perception of /ʊ, uː/ and /ɪ, iː/; for these vowels the effect of incorrect duration was
negligible. This account implies that the duration cue does not have serious effects on
all short/long vowel contrasts. A more specific case was reported by Munro (1993): the
English vowels interpreted by Arabic groups (including Sudanese) manifested the same
ordering of vowel duration differences for front vowels, but a different ordering for
back vowels. This means that if the English vowels perceived or produced by Arabic
speakers tend to be longer or shorter, it is probably not because their L1 is a quantity
language in which length is an intrinsic element that requires vowels to be realized as
either short or long (geminated). Rather, it is because similar cues are used in English.
These data suggest that English tense-lax vowels are close to Arabic long/short vowels
in terms of quality and duration.
Moreover, such perception errors can be attributed to inadequate knowledge
of English vowels, which prompts listeners to conflate, guess, or fall back on their L1
norms (Flege and Font 1980, Fokes, Bond and Steinberg 1985, Walker 2001). It is also
probable that because Sudanese listeners come from a language background with a
small number of vowels, they find the perception of the English vowels difficult.
According to Cruttenden (2008), this is most predictable in those areas where vowels
are close together in the vowel space, so that confusions are possible within these areas:
/ɪ, iː/, /ʊ, uː/, /e, æ/, and /ʌ, ɔː, ɒ, ɑː/.
In conclusion of this section, it should be noted that there is also confusion in the
group of diphthongs. The diphthong /#7/ is misidentified as /7/, /+/ as /G/ and
/C+/ as /G+/. Misidentification of such English vowels can be attributed to the fact that
each two confused diphthongs share at least one sub-phone; a feature which serves to
complicate the perception task for listeners.
Figure 3.2 shows the results of the perception test of the ten Sudanese listeners for the
English single consonants, presented to them in the onset of syllables (upper panel) or
in the coda (bottom panel).
Figure 3.2 Mean percentage of correctly recognized consonants by ten Sudanese listeners, broken
down by 24 target consonants. Lower and upper panels present the results for coda and onset
consonants, respectively. Error bars are ± 2 Standard Errors.
Figure 3.2 shows that the overall identification of the onset consonants is better than
that of coda consonants, with means of 95% against 75% correct. In onsets, listeners
show near-perfect perception of the stops /b, t, d, k/ and the fricatives /f, v, s, ʃ/ as
well as /m, n, h, j/. However, a few errors were made in the identification of voiceless
labial /p/ and voiced velar /g/. Listeners also substituted /g/ for /k/, which are produced
at the same place of articulation (velar), and /dʒ/ for /d/. Other errors occurred in the
recognition of the voiceless fricative /θ/ and the voiced /z/. Here listeners confused
the voiced /z/ with voiceless /s/, /θ/ with /s/ and /d/ with /ð/, whilst /p/ was
perceived as /tʃ/. An interesting finding is that listeners frequently perceived the
retroflex /r/ as /w/.
Table 3.2 presents the Sudanese listeners’ perception of English onset consonants in
more detail. The diagonal line running across the table displays the correct scores of
perception while the scores scattered around it represent the problem areas.
Table 3.2 Confusion matrix of 22 English onset consonants (in the rows) as perceived by ten
Sudanese-Arabic listeners (in the columns). Further, see Table 3.1.
Perceived RP consonants
Target
b tʃ d ð f g h dʒ k l m n p r s ʃ t θ v w j z
b 10
tʃ 9 1
d 10
ð 2 8
f 10
g 9 1
h 10
dʒ 10
k 10
l 10
m 10
n 10
p 1 9
r 6 4
s 10
ʃ 10
t 10
θ 1 1 8
v 10
w 1 9
j 10
z 1 9
Compared with onset consonants, the results in Figure 3.2 show that listeners make more
errors in the perception of coda consonants; the overall mean percentage of
correctly identified consonants is lower for codas than for onsets. A confusion rate of 90%
was found in the recognition of the voiceless stop /p/ as /d/, /k/ and /n/. Listeners
also made errors in the perception of /g/; i.e., they confused /g/ with /k/ and /g/ with
/n/. Conversely, they confused /k/ with /g/ and /k/ with /t/, whilst /t/ was
misidentified as /d/ or /k/. Nasal codas proved to be a problematic area of perception,
with confusion rates ranging between 50% and 60%. For example, listeners
frequently confused /ŋ/ and /m/ with /n/. On the other hand, labio-dental /f/ was
confused with /v/ and /v/ with /z/. Listeners made very few errors in identifying /b/,
/s/ and /tʃ/, and no errors in the perception of /l/, /ʃ/ and /dʒ/. The
confusion matrix of coda consonant perceptions is presented in Table 3.3. In the table
the plosives /p, t, d, k, g/ appear more problematic, whilst /l, ʃ, dʒ/ were perfectly
perceived.
Table 3.3 Confusion matrix of 19 English coda consonants (in the rows) perceived by ten
Sudanese-Arabic listeners (in the columns).
Perceived RP consonants
Target
b tʃ d dʒ f g k l m n ŋ p s ʃ t θ v z ð
b 8 1 1
tʃ 9 1
d 6 1 3
dʒ 10
f 6 4
g 5 3 2
k 1 6 2
l 10
m 6 4
n 7
ŋ 1 3 5
p 3 5 1 1
s 9 1
ʃ 10
t 2 1 5
θ 4 6
v 7 3
z 6 4
ð 1 9
3.4.4 Discussion
One of the findings is that the Sudanese listeners confused English /r/ and /w/. This
finding supports the claim that the learners’ production of L1 sounds often influences
the way they perceive an L2 counterpart. That is, the /r~w/ glide confusion is very
likely because the English /r/, which is not a trill but a frictionless continuant, is
mistaken for the nearest vowel-like sound in Arabic, which would be /w/. There are
strong indications that /w/ is perceptually close to English /r/: there is a sound
change in progress in which young speakers of English now pronounce onset /r/ as
/w/ (see Watt, Docherty and Foulkes 2003). In the majority of English accents /r/ is
articulated as a voiced alveolar or post-alveolar approximant. The retroflex variant of
/r/ is distinguished by a particularly low F3 that is close to F2, while energy above F3 is
normally weak due to the existence of two anterior constrictions in the vocal tract, one
made by the tip or blade of the tongue and the other by the narrowed lips. The Arabic
/r/, on the other hand, is normally a tap or an alveolar trill that requires vibration of the
tongue against the ridge. Allophonic variation is mainly concerned with the distinction
between single and geminate /r/ in intervocalic position, whereby single /r/ is
produced as a tap and geminates as trills (as they are in Spanish). Because of these
phonemic and acoustic features, the substitution of /w/ for /r/ can occasionally occur
(Khattab 2002). This suggests that this type of problem is due to the learners’ lack of
knowledge and to insufficient practice of the English /r/ as a post-alveolar approximant.
On the other hand, the replacement of /g/ by /k/, /z/ by /s/, /f/ by /v/, and /θ/ by
/s/ shows a systematic pattern of errors. The first two errors are a shift from voiced to
voiceless sounds. These pairs are produced at the same place of articulation; the sounds
/g/ and /k/ are velar, while /z/ and /s/ are alveolar. It is most probable that the
perception errors /g, k/ and /z, s/ result from the similarity in place of articulation.
However, errors such as these may also occur because the voiced/voiceless contrast is
violated; i.e. these sounds are probably substituted because the voicing feature is not
distinguished, or resists learning. Flege and Font (1981), however, attribute this type
of error in English stops to the place of articulation rather than to voicing.
Additionally, the confusion of /θ/ with /s/ is probably caused by interference from the
perceptual strategies of the listener’s L1, where the English (inter)dental /θ/ was
mistaken for the nearest Arabic sound, which is the (alveolar) dental /s/. The
substitution of /ð/ for /z/ and /θ/ for /s/ is often attributed to the L1 effect. That is,
in the consonant inventory of Sudanese and other Arabic dialects, the interdentals
/θ, ð/ have merged with the apico-dental (often labelled as alveolar or sibilant) /s, z/
(Corriente 1978, Dickins 2007, Karouri 1996, Watson 2002). Thus, an
Arabic word like /hæːða/ ‘this’ is pronounced as [hæːza], whilst /θæːbit/ ‘firm’ is
pronounced as [sæːbit], a problem which is reflected in the perception of L2 speech
sounds. The affricate /tʃ/ was also misperceived as /p/ because the articulation of the
two stops /t/ and /p/ involves a complete closure followed by a release. This makes
listeners think of affricates as stops with a slow fricative release. It is very common
among L2 interlocutors that intelligibility is compromised when there is background
noise or unfamiliarity with the speaker’s accent (Ball and Rahilly 1999, Subramaniam and
Ramachandrainh 2006).
the Sudanese listeners use the acoustic correlates of Arabic stops instead, which triggers
the confusion. This type of error with English stops is described as a wrong
approximation of the duration of the vowel that should precede or follow such
stops. To avoid these problems, Arabic speakers learning English need to
shift their L1 correlates of voiced and voiceless stops towards the English
norm (Fokes et al. 1985, Khattab 2000). They need to use a longer VOT value for
initial voiceless plosives and to lengthen the vowel preceding syllable-final voiced
stops/obstruents. Other perception errors are that the Sudanese listeners confused
voiceless coda consonants with their voiced counterparts, as in /s~z/ and /f~v/, as a
result of the similarity in place of articulation, whilst the confusion among /n, ŋ, m/ is
due to nasality. Many types of perception errors result from similarity in place and
manner of articulation, at both the onset and the coda level. The absence of some
phonemes like /v, ŋ, p/ from the Arabic inventory adds to the perception problems of
listeners.
Figure 3.3 shows means (and standard error) for a group of ten Sudanese listeners in
the perception of English consonant clusters. As the figure shows, in contrast to vowels,
consonant clusters yield fewer errors of perception. Furthermore, the performance of
the listeners for onset clusters is better than for coda clusters; the overall correct scores
being 75 and 71%, respectively.
Listeners misrecognized /dr/ as /gr/, which is more frequent than /dr/ as /kl/, and
these are followed by the misidentification of /sl/ as /sn/. They were also observed to
confuse /spl/ and /spr/ interchangeably, and to perceive /kl/ as /gr/ and /spr/ as /pr/
or /skw/. However, there are no errors in the perception of the initial clusters /gl/,
/pl/ and /sw/. On the other hand, final clusters are more prone to misperception. That
is, the error rates shown in Figure 3.3 indicate that most perception errors occur at
the coda level; these are the substitution of /bd/ for /ld/, /st/ for /sk/, /nz/ for /mz/
and /nz/ for /dz/. Listeners also made errors in identifying /lm/, /ts/, /nt/ and /mp/,
but fewer errors were observed in recognizing the item /gz/, whilst /ŋk/ was correctly
recognized. More details are shown in Tables 3.4 and 3.5 below. They provide a clearer
picture of the correct and confused consonant clusters. The correct scores appear in
bold face on the diagonal running across each table, while the cells scattered around
it represent the confusions.
Figure 3.3 Mean percentage of English onset and coda clusters correctly identified by ten
Sudanese listeners. Error bars are ± 2 Standard Errors.
Table 3.4 Confusion matrix of eight English onset consonant clusters (targets, in the rows)
perceived by ten Sudanese-Arabic listeners (responses, in the columns).
Table 3.5 Confusion matrix of eight English coda consonant clusters (targets, in the rows)
perceived by ten Sudanese-Arabic listeners (responses, in the columns).
3.4.6 Discussion
The plosive+liquid replacement of /dr/ by /gr/ and the fricative+plosive /st/ by /sk/
can be accounted for as an alveolar-to-velar shift within the same manner of
articulation. The misperception of /kl/ as /gr/ (velar+liquid) is attributable to the
velarity of the first cluster members and to the shared manner of articulation of the
second. Generally speaking, these types of perception errors support the
hypothesis that the perception of L2 sounds is often influenced by the perceptual and
articulatory properties of the L1 (Canepari 2005, Cruttenden 2008), where listeners often
resort to the nearest corresponding sound. Moreover, this type of perception error,
where a voiced obstruent precedes the voiced liquid /r/, often takes place due to
phonological alternations in similar consonant clusters, mostly in homorganic
C+liquid sequences. These phonological alternations usually occur when the speech
signal is not detected well due to a lack of experience with voicing lead in
phonetically voiced stops, or due to the absence of appropriate phonetic cues (Seo
2003). Similar interpretations apply to the interchangeable misperception of the
voiceless sibilant+voiceless stop+liquid clusters /spl/ and /spr/, and to the
misperception of /spr/ as /pr/ and /skw/, where substitution errors of the third cluster
member /l, r, w/ took place. However, this type of error also points to the influence of
the similarity in manner of articulation shared by the approximants. On the other hand,
the confusion of the coda nasal+fricative clusters /nz/ and /mz/ is due to nasality, but
the confusion of the nasal-initial and plosive-initial clusters /nz/ and /dz/ is probably
due to the influence of the place of articulation shared by their members. Additionally,
listeners follow a repair strategy in perceiving /bd/ as /ld/. They adopt the nearest
speech sound that helps them to understand a word/message; i.e. listeners transfer
their L1 phonotactic constraints when listening to English.13 This strategy reflects the
prominent role played by the Sonority Sequence Principle in accounting for phonotactic
patterns across languages
(Carr 1999, Clements and Keyser 1988, Gierut 1999, Gierut and Champion 2001). Thus,
The SPIN-test (Speech Perception in Noise test) targets word recognition at the
sentence level. It aims to examine the learners’ performance in speech perception by
including the effect of semantic context. In the SPIN test, listeners are exposed to a set
of 25 specific meaningful sentences. Their task is to write down the last word
embedded in each sentence. In this way, the final goal of such types of tests is to
provide a measure of the ability of a listener to understand speech in an everyday
listening situation.
Figure 3.4 provides the means of Sudanese listeners’ performance on the SPIN test.
Figure 3.4 Percentage of (parts of) English words correctly recognized by Sudanese-Arabic
listeners (further see text). Vertical axis: percent correct responses; horizontal axis:
ons_cor, nuc_cor, cod_cor, word_cor, word_comp.
to recognize some sounds in the words correctly. For instance, correct identification of
sounds in the onset position of syllables (‘ons_cor’) is at 70%, whilst vowels (‘nuc_cor’)
and coda consonants (‘cod_cor’) are around 45% correct. The mean of the component
identification (‘word_comp’) is about 50%. The observation that onsets were perceived
more accurately than the vowels and codas ties in with the more detailed results of the
MRT tests. Together, these results indicate that onset consonants, whether single or
clusters, were identified more successfully than vowels and codas.
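The five measures plotted in Figure 3.4 can be thought of as component-wise comparisons between each written response and its target word. The following Python sketch shows one way such scores could be derived; it is an assumption-laden illustration rather than the scoring script of this study. In particular, the function names are hypothetical, the example responses are invented, and word_comp is assumed to be the mean of the three component scores.

```python
def score_response(target, response):
    """Compare one response with its target, component by component.

    Both arguments are (onset, nucleus, coda) tuples of transcribed segments.
    """
    ons, nuc, cod = (t == r for t, r in zip(target, response))
    return {
        "ons_cor": ons,
        "nuc_cor": nuc,
        "cod_cor": cod,
        "word_cor": ons and nuc and cod,
        # Assumption: word_comp is the mean of the three component scores.
        "word_comp": (ons + nuc + cod) / 3,
    }


def percent_scores(pairs):
    """Average each measure over (target, response) pairs, in percent."""
    scored = [score_response(t, r) for t, r in pairs]
    return {key: 100.0 * sum(s[key] for s in scored) / len(scored)
            for key in scored[0]}


# Invented example: the target 'sling' answered once correctly, once as 'slim'.
sling = ("sl", "ɪ", "ŋ")
pairs = [(sling, ("sl", "ɪ", "ŋ")), (sling, ("sl", "ɪ", "m"))]
scores = percent_scores(pairs)
```

Under this scheme a response can earn credit for its onset and vowel even when the word as a whole is wrong, which is why the component scores can exceed the whole-word score.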
3.4.8 Discussion
The Sudanese listeners performed poorly on simple and predictable English
sentences, reaching only 30% correct. However, they performed better on single
consonants and clusters and were especially poor at the vowel level. These observations
provide empirical evidence that words and vowels are the most problematic aspects for
the listeners. It is reasonable to predict that vowel perception would be more of a
challenge for Sudanese-Arabic listeners of English than single and cluster consonants.
This prediction is based on the observation that the learners’ L1 (Arabic) has only five
or six vowels, which makes it difficult for such learners to correctly classify the vowels
in the much richer system of any variety of English (Cruttenden 2008). Moreover, the
observations bear out the prediction that the large number of consonant sounds
existing in the listeners’ L1 facilitates the perception task (positive transfer); i.e.
listeners are at least more familiar with consonants than with vowels.
Table 3.6 presents a correlation matrix for the test results obtained separately for
vowels, single consonants, cluster consonants and SPIN sentences of ten Sudanese
listeners. Within the category of consonants and consonant clusters separate test
components are distinguished for target sounds in onset position, coda position and
averaged over both positions. The correlation coefficient computed is Pearson’s
product moment correlation r, which expresses the strength of the linear relationship
between two sets of scores. The value of r ranges between –1 and +1. If r is positive,
then higher scores on one variable (e.g. test score X) tend to go together with higher
scores on the second variable (test score Y). If r is negative, the relationship between
the two sets of scores is reversed, i.e., higher scores on test X go together with lower
scores on test Y (and vice versa). The absolute size of r expresses the strength of the
relationship. If r = 0, there is no linear relationship between the two variables at all;
when |r| = 1, the relationship is perfect, so that the score on Y can be predicted with
certainty from X (and vice versa). The best way to interpret intermediate r-coefficients
is to square the value of r. So, if test scores X and Y are correlated at r = .7, then test
score X can be predicted from test score Y with an accuracy of 49 percent (r² = .49) on a
scale between 0 and 100, i.e., between zero correlation and perfect correlation (see e.g.
Woods, Fletcher & Hughes 1986 for more discussion on how to interpret correlation
coefficients).
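For readers who wish to see the computation spelled out, the following Python sketch calculates r and its square from first principles. The function name pearson_r is hypothetical and the two score lists are invented; they are not the listeners' results.

```python
import math


def pearson_r(x, y):
    """Pearson's product-moment correlation between two score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Invented scores of five listeners on two tests (not the study's data).
test_x = [1, 2, 3, 4, 5]
test_y = [2, 1, 4, 3, 5]
r = pearson_r(test_x, test_y)
print(f"r = {r:.2f}, r squared = {r * r:.2f}")
```

For these invented scores r = .80, so that r² = .64: on the interpretation given above, 64 percent of the variation in one test score is accounted for by the other.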
Table 3.6 Correlation coefficients of scores on vowels, single consonants (in onsets, codas and
both), cluster consonants (in onsets, codas and both) and SPIN sentences. R-values indicate the
linear relation between the listeners’ test scores for any pair of test components.
Bolded |r| > .6: Correlation is significant at the 0.05 level (2-tailed).
Bolded |r| > .7: Correlation is significant at the 0.01 level (2-tailed).
Normally, one would expect listeners who are good at identifying one type of sound,
for instance vowels, to be also good at recognizing other sounds, such as consonants
and clusters. By extension of the same argument, listeners who are good at identifying
sounds (whether vowels or consonants) should also be good at recognizing words. I
would expect all correlation coefficients in Table 3.6 to be positive. Some positive
correlations, however, are fairly trivial and should not be considered. It is predictable,
for instance, that correct perception of either onset or coda consonants should
correlate strongly with the averaged score of onset and coda consonants, simply
because 50 percent of the average is determined by each of the component scores. I
will not discuss such part-whole relationships in the remainder of this section.
Correlations that are both positive and significant are found only between variables
measured in the words used in the SPIN-test. Observe, first of all, that the recognition
of onset consonants in SPIN words correlates with the recognition of the vowels in such
words (r = .710, p < .01). This is what I would expect but, of course, it goes against the
inexplicable negative correlation reported earlier between onset consonants and vowels
in the MRT words. I also find that the recognition of onset consonants and that of coda
consonants are correlated in the SPIN-words. Interestingly, the chances of recognizing
the entire SPIN-word are better if the listener identifies the coda correctly (r = .899,
p < .01) than when he correctly identifies the vowel (r = .700, p < .01) or the onset
consonant (r = .567, n.s.). This observation goes against the general claim that sounds
contribute less to word recognition as they occur later in the word (e.g. Marslen-Wilson
& Welsh 1978, Nooteboom 1981).
I should point out, finally, that it is strange to find no correlation between any of the
individual test components in the MRT word tests and the listeners’ performance on
the contexted SPIN word recognition test.
To sum up, the perception of the listeners in the SPIN materials is very poor at the
sentence level, but it provides feedback about which of the three types of English
phonemes is most problematic for Sudanese listeners. In this connection, the results of
the Sudanese listeners’ correct word identification in the SPIN-test are comparable to
those obtained for Mandarin Chinese listeners exposed to a similar SPIN test (Wang
2007). Similarity of performance between the two groups can be attributed to the fact
that both Chinese and Sudanese listeners speak English as a second/foreign language.
The listeners also come from linguistic backgrounds that are entirely unrelated to
English; Chinese is a Sino-Tibetan language, whilst Arabic is Semitic. In contrast,
the Dutch listeners in Wang (2007) had a high percentage of correctly recognized words,
due to more exposure to English than the non-Germanic groups. Furthermore,
phonetically, the Dutch L1 sounds are closer to the English targets than those of either
Arabic or Mandarin. Predictably, American listeners had the best performance on the
SPIN test simply because they are native speakers of English (Wang 2007).
Durational aspects do not show serious effects on the identification of English vowels
because there is some kind of correspondence between the listeners’ L1 (Arabic)
long/short vowel durations and those of the English tense-lax vowels. However, the
confusion within the tense-lax vowel pairs /ʊ, uː/ and, less frequently, /ɪ, iː/ indicates
interference from the subjects’ L1 and probably a lack of knowledge of English vowel
sounds.
With regard to the interdependency between the perception and production of
speech sounds, differences in place and manner of articulation between the English
and Arabic phonetic systems require that the Sudanese listeners extend their L1
phoneme inventory towards that of the L2 to achieve a better performance in English speech.
The perception of the English single and cluster consonants is more difficult in the
72 TAJELDIN ALI: SPEECH INTELLIGIBILITY PROBLEMS OF SUDANESE EFL LEANERS
coda than in the onset position. The listeners transfer their L1 phonotactic constraints
when listening to English consonant clusters. This mostly occurs with coda consonants
where the listeners fail to distinguish or implement certain phonetic features.
The conclusions drawn above provide cognitive insights that help us understand the nature
and the causes of the speech perception problems experienced by Sudanese
listeners of English. They thus represent useful guidelines for addressing such
problems in the learning and teaching of English in ESL/EFL contexts. One important
guideline is that the teaching of speech perception should target
the mastery of the basic principles of English phonology, phonetics and acoustic cues.
Many second/foreign language learners who lack such knowledge have difficulties with
English speech, e.g. recognizing English vowels in different contexts, or
discriminating between quartets such as pit, pat, pot, put. There is therefore sometimes
a need for pupil involvement in group work for task-based learning, whereby some
pupils have roles which require them to listen or speak quite a lot. Moreover, the
listeners’ L1 inventory has a negative effect on speech intelligibility.
This effect should be taken more seriously, and addressed more practically, in the
learning and teaching of English speech perception and production. Teachers, for
example, need to create an ‘English atmosphere’ in the classroom, in which more exposure
to native English speech helps to reduce the L1 effect.
Chapter Four
Learning the speech sounds of a second language can often be described as a process that
depends on phonological representations, in which the native source language (L1)
influences the target language (L2). The negative influence of an L1 with few vowel
contrasts may interfere with
attempts on the part of ESL/EFL learners to distinguish between English minimal
pairs like bet/bait, cat/cart, din/den, sin/thin, half/halve, bed/bet, wit/wet, worse/worth, pea/bee,
peer/pair etc. In this task, learners make an effort to produce the intended speech
sound correctly, although most of them fail. One reason why learners face
problems such as these is the discrepancy that exists between the perceptual
representations of phonemes in the L1 and the L2. Previous studies revealed that Japanese EFL
learners have perception and production problems with the English /r~l/ contrast in
words like lot vs. rot (Lee 1969). In a more recent related study, Arabic learners of
English were shown to have difficulty distinguishing /ð, z, θ, s/ because English
fricatives are softer than their Arabic counterparts (Koeczynski and Mellani 1993).
Linguists are very much concerned with measuring these types of errors, which are
manifest in the performance of the second or foreign language learners. A test that
addresses speech production issues such as these (in words like lake vs. rake) and
accuracy of L2 sounds measures segmental intelligibility. When L2 speech sounds are
recognized correctly by native speakers this constitutes evidence that the L2 production
distinguishes the required categories. However, failure is also useful evidence, which
provides insight that helps to predict the nature and the causes of intelligibility
problems (Flege 1976). This study measures the segmental intelligibility of the speech
sounds produced by Sudanese university EFL learners (native speakers of English are
included in this study but as a control group only). It attempts to account for the extent
to which linguistic elements can impede the intelligibility of these speaker groups when
Dutch listeners of English assess them. The involvement of Dutch listeners of English
as a judgment group was intended to provide additional feedback on the quality of
Sudanese-Arabic accented English in an international context. The mere dichotomy of
native/non-native speaker has proven to be of limited value (Atechi 2006, Smith 1992).
By including non-native Dutch listeners of English, native RP speakers as control
groups and Sudanese EFL learners as the test group, the study is expected to provide
more evidence on the intelligibility problems under investigation.
Arguably, Sudanese EFL learners typically make a wide variety of production errors in
English vowels, consonants and clusters. Substitutions of English vowels are
observed in words such as pot, put, pat, coat, palm, warm, flute, etc. It is assumed that these
types of errors occur because the speakers are not familiar with a vowel inventory as
large as that of English. Similar errors are also observed when
these subjects produce English consonant clusters. For example, a vowel sound is
usually inserted before (prothesis) or between (anaptyxis) the members of English
clusters in words such as flow, sprint, special, and so on. In doing so, speakers attempt to
achieve perceptible pronunciation even though consonant clusters are absent from the
Arabic sound system. Differences in phonological representations between English and
the Sudanese learners’ L1 (Arabic) make the problem more difficult.
This study reports the intelligibility of English speech sounds produced by Sudanese
EFL learners as opposed to those of native English speakers, assessed auditorily by
Dutch listeners of English.
4.2 Objective
The objective of the study is to find experimental evidence for the causes of the speech
intelligibility problems experienced by Sudanese university speakers of English, based
on the assessments of Dutch listeners of English. The data obtained can also help us
understand, and draw cognitive insights into, the nature and causes of the pronunciation
problems the learners face.
4.3 Participants
The Sudanese participants were ten university students of English at Gadarif University in Sudan.
The subjects specialize in English language teaching
(Teaching English as a Foreign Language, TEFL) and had already completed six semesters
of study. During the period of study, which extends over four years, the students attend
three courses in the field of pronunciation: (i) an introduction to phonetics, (ii)
phonology and (iii) practical phonetics, delivered in three consecutive semesters, besides
two classes in English listening skills that usually take place in semesters one and three.
Arabic is the mother tongue of all the students, whilst English is treated
as a foreign language (not a second language), the learning of which starts at the basic
level in the fifth year and continues at secondary school for three years. The English
CHAPTER FOUR: INTELLIGIBILITY OF SUDANESE ENGLISH TO DUTCH LISTENERS 75
lessons at these stages take up between 5 and 6 hours per week. At primary and
secondary school the basic principles of English are taught in a traditional way.
One of the ten university students was involved in the perception tests as a speaker of
Sudanese-accented English. This speaker was asked to read out a list of English
stimulus items which included vowels, single consonants and consonant clusters, besides SPIN
sentences. The Sudanese model speaker was selected by means of a sound quality test
from among 11 Sudanese speakers of English. The sound quality test was
administered online; raters of different nationalities were invited to listen to the
test and to score each speaker by clicking on one of the grade options
provided. The assessment of the speakers’ sound quality was based on the
mean score obtained by each speaker. Finally, the speaker whose individual
mean was closest to the grand mean was chosen as the representative subject.
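The selection rule described above can be sketched as follows. This is a minimal illustration under two assumptions: that the grand mean is the mean of the individual speaker means, and that ties are resolved in favour of the first speaker listed. The speaker labels and ratings are hypothetical.

```python
def pick_representative(ratings):
    """Return the speaker whose mean rating lies closest to the grand mean.

    ratings maps a speaker label to the list of scores that speaker received.
    """
    speaker_means = {spk: sum(r) / len(r) for spk, r in ratings.items()}
    # Assumption: the grand mean is the mean of the speaker means.
    grand_mean = sum(speaker_means.values()) / len(speaker_means)
    best = min(speaker_means, key=lambda spk: abs(speaker_means[spk] - grand_mean))
    return best, speaker_means[best], grand_mean

# Hypothetical ratings for three candidate speakers (illustration only).
example_ratings = {"S1": [2, 3, 3], "S2": [4, 4, 5], "S3": [3, 4, 3]}
chosen, chosen_mean, grand = pick_representative(example_ratings)
```

Choosing the speaker nearest the grand mean, rather than the best-rated one, keeps the model speaker representative of the group instead of favouring an unusually clear or unclear voice.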
The native-speaker participant was the speaker of English (RP accent) who had been involved
earlier in the perception tests as a model speaker of English, as described in chapter
three. As explained there, this speaker was asked to read out stimulus items
which included vowels, single consonants and consonant clusters of English, as well as SPIN
sentences.
The listeners were ten Dutch students who were preparing for bachelor’s and
master’s degrees in various fields of study at Leiden University. These participants took
part in the perception tests as listeners only (see § 4.3.3.2).
Although English and Dutch are closely related languages, Dutch
listeners of English face a variety of learning problems with English vowels and
consonants.
Both English and Dutch have a large number of vowels. Moreover, both English and
Dutch vowels fall into three categories (i) checked vowels, (ii) free steady-state vowels
and (iii) diphthongs. However, there are also differences between the two vowel
systems and the associated phonotactic possibilities. For example, the Dutch vowel
inventory includes a set of combinations of free vowel+glide sequences that does not
exist in English.
As a case in point, Dutch listeners confuse English /æ/ and /e/ because
Dutch has only /ɛ/ in this part of the vowel space, which is
positioned between the two English vowels /æ/ and /e/.¹⁴ The major cause of these
perception errors is the influence of the L1 vowel inventory (Cutler et al. 2005, Flege 1992).
The inability of Dutch listeners to distinguish between the English vowels /æ/ and /e/
in minimal pairs such as cattle ~ kettle has been described as creating
pseudo-homophones, i.e. it collapses minimal pairs such as this one. Robust priming
effects were found for Dutch ESL listeners, who responded faster to cattle after first
hearing kettle (and vice versa), but not for native English listeners (Cutler et al. 2005).
There are other vowels that may cause learning problems for Dutch learners of English.
According to Flege (1992), English /i/ and /u/ are classified as similar to Dutch /i/
and /u/. He adds that Dutch /i/ is lower than its English counterpart, yet this need not
cause serious learning problems. However, English /u/ appears to cause a learning
problem because some learners substitute it for Dutch /ʊ/ (see also Collins and Mees
1999, Wang and Van Heuven 2007). Additionally, the English central vowel /ʌ/ is
classified as a new vowel for Dutch learners of English since there is no phonetic
representation for it in the Dutch vowel inventory, so it might be expected to cause
learning problems for Dutch ESL learners. However, /ʌ/ has also been classified as a
similar phoneme that presents no learning problem, for two reasons. First, the vowel
exists in the Dutch inventory but is left unexploited by Dutch. Second, the
Dutch vowel /ɑ/ has acoustic values similar to those of American and British /ʌ/.
For these reasons Dutch learners of English need little effort to identify English
/ʌ/.
Dutch and English consonants are similar in most respects; however, there are also
differences. Some English fricatives pose perception and articulation problems for
Dutch learners. There is a problem with the articulation of /ð/ and /θ/, in that
members of pairs such as /θ~s/ and /ð~d/ are not clearly distinguished. Most Dutch
speakers of English have learnt some English in primary and secondary school, so they
already know the /θ~s/ contrast. However, previous studies show that these speakers
have difficulty in distinguishing between the English fricatives /θ~s/ (Collins and Mees
1999). This is probably because the dental fricative /θ/ is absent from the Dutch
consonant inventory. Yet another learning problem with English consonants is the
substitution of /v/ for English /w/. This error is most likely due to an orthographic
effect: the sound written as w in Dutch would be a good approximation of English /v/
but it is not used in this way (Collins and Mees 1981).
On the other hand, the perception of English fricatives seems to be less problematic.
Heeren and Schouten (2008) reported that the identification and discrimination of
British-English /θ~s/ by Dutch listeners improved after training, which is consistent
with results from earlier training studies. That is, trained listeners
performed better in the post-test than in the pre-test and in several respects they also
did better than the untrained control group. The improvement in their performance
14 Arguably, Dutch students with Southern (Limburgian) accents would have difficulty learning
English /e/ due to the fact that their L1 has [æ] rather than [ɛ] as the realisation of the lax low front
vowel (Smakman, personal communication).
excluded acquired similarity, but acquired distinctiveness was not found exclusively at
the phoneme boundary. Furthermore, control listeners, who received no training, also
improved simply by performing the tests twice, in the pre-test and the post-test; this
gain is attributed to the test experience that the control group acquired within the
design of a phoneme training study. Moreover,
Iverson et al. (2008) state that Dutch speakers use the phonetic categories of their L1
in perceiving and producing English /v/ and /w/. This means that the learners assimilate
English /w/ to their L1 Dutch /ʋ/ category and English /v/ to their L1 Dutch
/v/ category. This learning strategy suggests that the learning problems with these phonemes
can be attributed to perceptual interference. Nevertheless, Iverson et al. (2008)
conclude that Dutch speakers are consistently accurate in identifying and producing
English /v/ and /w/ because of their experience with English and because of their
eagerness to learn new languages.
Dutch listeners and Sudanese learners of English were matched on important characteristics
such as age and education level. Both groups of subjects were university students
preparing for a bachelor’s degree, and were in a similar age bracket (around 19-25 years
old). These characteristics have an important influence on second language learning.
Other conditions related to language proficiency, such as phonetic distinctions,
training in L2 speech and everyday exposure to English, could affect intelligibility
(see Kluge et al. 2007, Scott 1999). The Dutch listeners had a good command of
English both in read speech and spontaneous speech, a feature which enables non-native
listeners to make relatively effective judgments and fewer comprehension errors.
The Dutch listeners can be assumed to be unfamiliar with Sudanese Arabic-accented
English. Thus, they can be labelled as naive listeners (Best and Tyler 2007), a
characteristic which is considered an effective determinant of speech intelligibility.
The involvement of Dutch listeners as non-native speakers of English in the
intelligibility assessment along with native listeners of English (the same test will be
done by native British and American listeners of English in a later chapter) was
intended to determine whether English with an unknown accent is a greater
handicap for non-native than for native listeners, even if the non-native listeners’ L1
is rather similar to the target language.
The participants were also selected on the basis of their language background, such
that all of them speak Dutch as their mother-tongue. There are close similarities
between the linguistic systems of Dutch and English, so that Dutch listeners should
be able to understand English better than most other non-native listeners. Inclusion
of Dutch listeners will allow testing whether there is a difference in intelligibility of
Sudanese-accented English between two groups of non-native listeners: (i) Dutch
listeners, who do not share the L1 with the speakers, and (ii) Sudanese listeners,
who share the speakers’ L1.
The Modified Rhyme Test (MRT) was used in the experiments. The MRT is considered
to yield a highly accurate and reliable measure of intelligibility (Logan, Greene and
Pisoni 1989). It measures speech intelligibility by means of word identification tasks
with a closed set of four alternatives, from which the listeners are asked to select the
one they think the speaker intended. The score is the number of correctly identified items. Test
items normally target phonemes, multi-phonemes or words. Phonemes refer to vowels
and single consonants, whilst multi-phonemes refer to consonant clusters. The formal
assessment of phonemes and multi-phonemes scores the responses as either intelligible
or unintelligible; put in figures, a score of (close to) 100% is interpreted as completely
intelligible performance (Lafon 1966). Word intelligibility, on the other hand, was
determined by the recognition of final words embedded in short redundant SPIN
sentences. SPIN is an acronym for the ‘Speech Perception in Noise’ test (Kalikow,
Stevens and Elliott 1977, Wang and van Heuven 2003, Wang 2007). The test asks
listeners to recognise 25 short, meaningful and highly predictable everyday sentences
and to write down only the final word of each sentence, as in She wore her broken
arm in a sling (target word: sling). This part of the SPIN test has proved to be efficient
at assessing speech recognition abilities (Rhebergen and Versfeld 2005). Although the
listeners’ performance is primarily quantified in terms of the number of whole words
correctly recognized, partially correct answers are also important since they give
information about the perception of phonemes in onset, nucleus and coda position.
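The two scoring schemes described above can be sketched as follows. This is an illustration only: the function names are hypothetical, responses are assumed to be compared as plain orthographic strings, and the partial scoring assumes that each word has already been split into onset, nucleus and coda by hand.

```python
def mrt_percent_correct(targets, responses):
    """Percentage of closed-set trials in which the ticked alternative is the target."""
    if len(targets) != len(responses) or not targets:
        raise ValueError("targets and responses must be non-empty and equally long")
    hits = sum(t == r for t, r in zip(targets, responses))
    return 100.0 * hits / len(targets)

def spin_word_correct(target, response):
    """Whole-word scoring: ignore case and surrounding spaces in the written answer."""
    return target.strip().lower() == response.strip().lower()

def position_matches(target_segments, response_segments):
    """Partial scoring: compare (onset, nucleus, coda) triples position by position."""
    positions = ("onset", "nucleus", "coda")
    return {p: t == r for p, t, r in zip(positions, target_segments, response_segments)}

# Hypothetical responses of one listener (illustration only).
score = mrt_percent_correct(["pit", "pet", "put", "pot"], ["pit", "pet", "pot", "pot"])
whole_word = spin_word_correct("sling", " Sling ")
partial = position_matches(("sl", "ɪ", "ŋ"), ("sl", "ɪ", "n"))
```

In the last example the hypothetical listener wrote a word that matches the target in onset and nucleus but not in the coda, which is the kind of partially correct answer the text refers to.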
The experimental stimuli comprise four tests. These are (i) a vowel test, which is
composed of minimal quartets including short and long vowels as well as diphthongs,
(ii) a test of single consonants in either onset or coda position, (iii) a test of
consonant clusters in onset or coda position and (iv) the SPIN sentence test. The target
sounds in the first three tests were embedded in meaningful C*VC*
words (where C* stands for one to three consonants). Word stimuli in the first three
tests were embedded in a fixed carrier sentence Say…again, which ensured a fixed
intonation with a rise-fall accent on the target word. The vowel and the single
consonant tests contained items on each individual vowel or consonant phoneme in the
RP inventory. Moreover, the consonant test targeted all the consonants in onset
position and in coda position. For the cluster test, the number of test items had to be
limited as the total inventory of onset and coda clusters is very large; including all the
clusters would have been too demanding on the subjects. Nine onset and eight coda
clusters were selected that represent problems for Sudanese-Arabic learners of English
(Kaye 1997, Patil 2006). All items in the tests were chosen such that they occurred in
dense lexical neighbourhoods, i.e. there should be many words in English that differ
from the test item only in the target sound. For instance, the vowel /ɪ/ was tested in
the word pit, since the /p_t/ consonant frame can be filled in by many other vowels, as
in peat, pet, pat, pot, part, port, put, putt and pout. These so-called lexical neighbours,
differing from the target word only in the identity of the test sound, make up the pool
of possible distracters (alternatives) in the construction of the MRT test. When
selecting the three distracters needed for each test item, I preferred lexical
neighbours that differ from the target in only one distinctive feature. For the target pit,
alternatives with vowels that differed from /ɪ/ in just one vowel feature were selected,
i.e. pet (differing in height) and put (differing in backness), together with pot. The
latter alternative differs from the target in both height and backness; this was
preferred to the one-feature
difference in peat (or Pete) as it was decided to exclude proper names and low-frequency
alternatives as much as possible. The full set of test items is included in Appendices 4.1-
4.4.
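The distracter selection described above can be illustrated with a small sketch. The feature table is a hypothetical simplification introduced for this example (three features with coarse values); it is not the feature system used to construct the actual test, and the exclusion of peat is passed in by hand to mirror the decision reported in the text.

```python
# Hypothetical feature values, for illustration only. Length is included so
# that peat can be told apart from pit.
FEATURES = {
    "pit":  {"height": "high", "backness": "front", "length": "short"},
    "pet":  {"height": "mid",  "backness": "front", "length": "short"},
    "put":  {"height": "high", "backness": "back",  "length": "short"},
    "pot":  {"height": "low",  "backness": "back",  "length": "short"},
    "peat": {"height": "high", "backness": "front", "length": "long"},
    "part": {"height": "low",  "backness": "back",  "length": "long"},
}

def feature_distance(word_a, word_b):
    """Number of features on which the vowels of two neighbours differ."""
    fa, fb = FEATURES[word_a], FEATURES[word_b]
    return sum(fa[name] != fb[name] for name in fa)

def pick_distracters(target, excluded=(), n=3):
    """Rank the lexical neighbours by feature distance and keep the n closest."""
    pool = [w for w in FEATURES if w != target and w not in excluded]
    pool.sort(key=lambda w: (feature_distance(target, w), w))
    return pool[:n]

# peat is excluded by hand, following the decision reported in the text.
distracters = pick_distracters("pit", excluded={"peat"})
```

With these assumed feature values the sketch returns pet, put and pot for the target pit, i.e. two one-feature neighbours followed by the closest remaining two-feature neighbour.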
4.5.2 Recordings
The stimulus sentences were typed on paper sheets (one sheet for each test), and then
read by a male Sudanese EFL learner and a native speaker of RP English. Recordings took
place in a sound-treated room. Each speaker’s voice was digitally recorded (44.1 kHz, 16
bits) through a high-quality swan-neck Sennheiser HSP4 microphone. The speakers
were instructed to inhale before uttering the next sentence. The target words were
excerpted from their spoken context using the high-resolution digital waveform editor
contained in the Praat speech processing software (Boersma and Weenink 1996). Target
words were cut at zero-crossings to avoid clicks at onset and offset. Target words and
SPIN sentences were then recorded onto Audio CD in seven tracks. The first track
contained two practice trials for the vowel test and was followed by track 2, which
contained the 19 test vowel items. Tracks 3 and 4 contained the practice and test trials
for the single consonant tests and tracks 5 and 6 contained the cluster items. Track 7
comprised the 25 SPIN sentences with no practice items. In the single consonant and
cluster tests, trials targeting onsets preceded the items targeting codas. Other than that,
the order of the trials within each part of the test battery was random. Trials were
separated by a 5-second silent interval. After every tenth trial, a short beep was
recorded, to help the listeners keep track on their answer sheets.
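Cutting at zero-crossings, as mentioned above, can be sketched as follows. This is not the routine used by Praat; it is a minimal stand-alone illustration that assumes the waveform is available as a plain list of sample values, and the function name is hypothetical.

```python
def nearest_zero_crossing(samples, index):
    """Return the index closest to `index` where the waveform is zero or changes sign."""
    def is_crossing(i):
        # A crossing lies at i if the sample is zero or the sign flips between i-1 and i.
        return samples[i] == 0 or (i > 0 and (samples[i - 1] < 0) != (samples[i] < 0))

    # Search outwards from the requested cut point, one sample at a time.
    for offset in range(len(samples)):
        for i in (index - offset, index + offset):
            if 0 <= i < len(samples) and is_crossing(i):
                return i
    return index  # no crossing found; leave the cut where it was

# Hypothetical waveform fragment (illustration only).
wave = [3, 2, 1, -1, -2, 1, 4]
cut_start = nearest_zero_crossing(wave, 1)
cut_end = nearest_zero_crossing(wave, 6)
```

Moving both cut points to the nearest crossing means that the excerpt starts and ends near zero amplitude, which is what prevents an audible click.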
The stimuli were presented over loudspeakers in a small classroom that seated ten
listeners. The listeners were given standardized written instructions and received a set
of answer sheets that listed four alternatives for each test item. They were instructed for
each trial to decide which of the four possibilities listed on their answer sheet they had
just heard on the CD. They had to tick exactly one box for each trial and were told to
gamble in case of doubt. Alternatives were listed in conventional English orthography.
In the final test (SPIN), listeners were instructed to write down only the last word of
each sentence that was presented to them. There were short breaks between tests and
between presenting the practice items and test trials. Subjects could ask for clarification
during these breaks in case the written instructions were not clear to them.
I will now present the results of the test battery in four sections, one for each test. Each
section will first outline the structural differences between the sounds in the source
language, Sudanese Arabic (SA), and in the target language, RP English. Such
comparisons may help us understand why certain English sounds are difficult for Sudanese
learners and others are not.
4.6.1 Vowels
4.6.1.1 Results
Figure 4.1 presents the mean scores on the perception test of English vowels
for the ten Dutch listeners. The Dutch listeners had low scores in perceiving /3,
#7, G/ produced by Sudanese speakers of English, whilst they totally misidentified the
short vowels /G, n/ and the diphthong /G+/. However, they made few errors in
recognizing the vowels /+Ö, «Ö, 7, WÖ, 7/ and even fewer errors were made in the
perception of /#Ö/ and /C+/. These perception errors, which cover short and
long vowels as well as diphthongs, indicate that English vowels spoken by Sudanese
university students of English are less intelligible to Dutch listeners. Several factors may
cause these perception problems; they will be discussed later. It is noteworthy that
the Dutch listeners identified /+, nÖ, n+, +/ with no errors.
On the other hand, the means in Figure 4.1 show that Dutch listeners have a higher
identification rate for the English vowels spoken by native speakers of English than for
those of Sudanese EFL learners: the overall rate of correct identification is 88% for the
native speakers against 50% for the Sudanese speakers. More specifically, Figure 4.1 shows that the
English vowels /¡, «Ö, n, +Ö, #Ö, G+, C+, +, nÖ, n+/ were perfectly perceived and few errors
were made in the recognition of /7, WÖ, 7, G, #7/, whilst the front short vowel /G/ was
hardly recognized. Interestingly, the listeners’ correct scores are strikingly
different for the two speaker groups: low with Sudanese speakers and high with the native
speakers of English. The error patterns of the listeners with the two speaker groups
nevertheless present interesting parallels.
Figure 4.1 Mean percent correct identification of English vowels by ten Dutch listeners. The
vowels were spoken by a Sudanese (‘non-native’) and a native RP speaker of English.
Tables 4.1 and 4.2 are confusion matrices. They provide a numerical account of the
correct scores and the confusions made by Dutch listeners when they heard English
vowels spoken by Sudanese EFL learners and the native English speakers, respectively.
Correct scores appear along the main diagonal of each table and confusions
in the off-diagonal cells. Missing responses occurred with three stimulus vowels /G+,
¡,G/ (20%, 10%, 10%, respectively). These have been omitted from Table 4.1 to make
it easier to read. In Table 4.1 the vowels /+Ö, 7, WÖ, G, «Ö, ', 3, 7/ form the most
problematic areas, whilst in Table 4.2 /7, WÖ, G, G, 3/ were highly confused vowels. The
tables also show that listeners misidentify the vowel /#7/ as /7/ with both speakers (in
70% of the responses to the Sudanese speaker’s token and in just 10% of the responses
to the native RP token).
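The way percent-correct scores and frequent confusions are read off a confusion matrix can be sketched as follows. The labels A, B and C and the counts are hypothetical placeholders, not entries from Tables 4.1 or 4.2; the threshold of three responses is an assumed parameter of the sketch.

```python
def percent_correct(confusions):
    """confusions[target][response] = number of listeners who gave that response."""
    per_target = {}
    total_hits = total_responses = 0
    for target, row in confusions.items():
        n = sum(row.values())
        hits = row.get(target, 0)  # the diagonal cell
        per_target[target] = 100.0 * hits / n if n else float("nan")
        total_hits += hits
        total_responses += n
    overall = 100.0 * total_hits / total_responses
    return per_target, overall

def frequent_confusions(confusions, threshold=3):
    """Off-diagonal cells whose count reaches the (assumed) threshold."""
    return [(t, r, c) for t, row in confusions.items()
            for r, c in row.items() if r != t and c >= threshold]

# Hypothetical matrix for three stimulus categories and ten listeners each.
example = {
    "A": {"A": 9, "B": 1},
    "B": {"A": 7, "B": 3},
    "C": {"C": 10},
}
per_target, overall = percent_correct(example)
frequent = frequent_confusions(example)
```

Missing responses are simply absent from a row in this sketch, so a row with fewer than ten responses is scored over the responses that were given.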
Table 4.1. Confusion matrix of 18 English vowels and diphthongs spoken by a Sudanese EFL
speaker (in the rows) and perceived by ten Dutch listeners (in the columns). Correct responses
are on the main diagonal, indicated in bold face. (Confusions ≥ 3 are indicated in grey-shaded
cells). The vowel /7/ should have been presented but was not. Three responses are missing (see
text).
Perceived RP vowels
Target
¡ «Ö #Ö 3 #7 C+ G G G+ + KÖ + n nÖ n+ 7 WÖ 7 7
¡ 9
«Ö 5 4 1
#Ö 9 1
3 4 3 1 2
#7 3 7
C+ 7 2 1
G 0 7 3
G 6 3
G+ 5 0 3
+ 10
KÖ 4 6
+ 10
n 2 0 8
nÖ 10
n+ 10
7 4 6
WÖ 5 5
7 4 6
Table 4.2 Confusion matrix of 18 English vowels and diphthongs spoken by a native speaker of
RP English (in the rows) and perceived by ten Dutch listeners (in the columns). Correct
responses are on the main diagonal, indicated in bold face. (Confusions ≥ 3 are indicated in
grey-shaded cells). The vowel /7/ should have been presented but was not.
Perceived RP vowels
Target
¡ «Ö #Ö 3 #7 C+ G G G+ + +Ö + n nÖ n+ 7 WÖ 7 7
¡ 10
«Ö 10
#Ö 10
3 10
#7 9 1
C+ 10
G 3 7
G 2 8
G+ 10
+ 10
+Ö 10
+ 10
n 10
nÖ 10
n+ 10
7 7 3
WÖ 3 7
7 3 7
More perception errors on English vowels were made by Dutch listeners when they
heard Sudanese EFL learners. There were mutual substitutions of the English
vowels /7~WÖ, +~+Ö, 3~G/, which may be attributed to the influence of the listeners’ L1
vowel inventory. Collins and Mees (1981) confirmed that the English tense and lax
pairs /WÖ~7/ and /3~G/ are the most difficult vowel sounds for Dutch listeners/speakers
to produce/emulate. Confusions of these English vowel pairs frequently occur
because there are no similar vowel sounds in their L1. Interestingly, Wang (2007)
reported similar results, where Dutch listeners repeatedly confused /+~+Ö, 7~WÖ, 3~G/
when they listened to Chinese speakers of English, due to the lack of a clear category
boundary between /3/ and /G/ and because of the differences that exist between the
speakers’ L1 and L2. The Dutch listeners made similar perception
errors, /7~WÖ, 3~«, G~«Ö,#7~7/, when responding to the native RP speaker, although,
obviously, the number of perceptual confusions in the latter case was much smaller. It
would appear, therefore, that the difference in intelligibility between the native RP and
Sudanese-Arabic speakers for Dutch listeners is the joint product of an incorrect
representation of the English vowel system both on the part of the Sudanese speakers
and of the Dutch listeners.
The tense vs. lax perception errors such as /+Ö~+, WÖ~7/ might be attributed to the
duration difference between English and Arabic. However, acoustically this explanation
seems less probable, because the long/short vowels of the Sudanese
speakers’ L1 (Arabic) correspond to the English tense-lax vowels. Therefore,
it is possible to classify such errors as by-products of the incorrect English
vowels produced by Sudanese speakers, which probably resulted from the wrong
realization or implementation of the English vowels. The wrong realization of English
vowels can be attributed to interference from the Sudanese speakers’ L1 (Munro 1993,
Munro, Derwing and Morton 2006). In a related study, Bobda (2000) found that
Sudanese speakers render the English vowels /«Ö/ as /« or G/ and /G+/ as /G/ due to their
L1 linguistic background. Indeed, the incorrect production of the central and back
English vowels is a frequent type of error among Arabic-speaking groups,
which probably occurs due to the total absence of these types of vowels from the Arabic
vowel inventory (see Brett 2004). These findings indicate that Sudanese speakers of
English have difficulty learning central and back English vowels.
However, comparatively, the findings reveal an advantage for the native speakers of
English, who are clearly more intelligible to Dutch listeners than the Sudanese speakers.
The close relationship between the English and Dutch vowel inventories, which
correspond to some extent in terms of the number of vowels and their phonetic features,
may partly explain the difference (Wang and Van Heuven 2004). In addition to this
linguistic advantage, however, Dutch listeners have had more exposure to English speech
in everyday life, which represents a kind of systematic practice of English. This circumstance
enables the listeners to overcome many of the learning difficulties that might be
experienced by non-native speakers of English lacking exposure to English.
4.6.2 Consonants
4.6.2.1 Results
Figure 4.2 presents the correct identification of English consonants in a perception test
done by ten Dutch listeners.
Figure 4.2 Correctly identified English consonants in a perception test done by ten Dutch
listeners. The results are shown separately for consonants produced by the Sudanese-accented
and the native RP speaker.
Generally, listeners’ performance on the consonants is better than on the vowels; the
mean vowel intelligibility of the Sudanese EFL tokens and that of the native speaker of
English is 50 and 88%, respectively. For the consonants, correct identification scores
are 78% and 81% for consonants spoken by Sudanese and 100% and 99% for
consonants spoken by native speakers of English, in onset and coda positions,
respectively. Dutch listeners, therefore, made more perception errors when they
responded to English consonants spoken by Sudanese speakers. In onset position,
frequent substitution errors were obtained in consonant pairs /F~V/, /&~\/, /6~U/,
/I~M/ and /P~N/. Fewer errors were made in the perception of /F, D, H, X, Y, V5, P/,
86 TAJELDIN ALI: SPEECH INTELLIGIBILITY PROBLEMS OF SUDANESE EFL LEANERS
where /d/ was misperceived as /p/, /b/ as /v/ or /f/ interchangeably, /f/ as /v/ or
/w/, /p/ was misperceived as /tʃ/ and /n/ as /l/. However, the listeners performed
better on coda consonants. The most frequent error patterns for codas are the
substitution of the obstruent pairs /ð~z/, /k~g/, /θ~s/, /t~d/, /s~z/ and /ŋ~n/.
Although the error rates are low, they are systematic and revealing: listeners often
repeated the same types of perception errors in both onset and coda positions,
particularly with Sudanese speakers. On the other hand, Figure 4.2 shows that Dutch
listeners had nearly perfect perception of the English consonants spoken by native
speakers, particularly for onset consonants. The only error, at a rate of 10 percent,
occurred where /ð/ was replaced by /d/. However, the listeners made more errors in
coda consonants. The nasal /n/ was misidentified as /ŋ/ and /m/, /ð/ was replaced by
/t/, /k/ by /g/ and less frequently /z/ was replaced by /s/. These results indicate that
Dutch listeners found the native speakers of English more intelligible than Sudanese
speakers.
Tables 4.3 to 4.6 present a numerical account of the confusion structure in the
perception of the English consonants. Tables 4.3 and 4.4 show how ten Dutch listeners
identified the English consonants read by Sudanese speakers in onset
and coda positions. Tables 4.5 and 4.6 display the scores of the same listeners in
the same perception test but with the items spoken by the native speaker of English.
The correct identification appears along the diagonal running across the table, while the
incorrect scores are in the off-diagonal cells. An interesting finding is that, with the
native speaker, listeners made more perception errors on coda consonants than on
onset consonants (99% against 100% correct). With the Sudanese speakers the pattern
is reversed: Dutch listeners found the onset consonants (78% correct) more difficult
than the coda consonants (81% correct).
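The way such confusion matrices are read can be illustrated with a short Python sketch. It is not part of the original analysis: the function names and the toy counts are hypothetical and are not taken from Tables 4.3 to 4.6. Each target consonant is mapped to the responses it received from ten listeners; the percentage correct is the share of responses on the diagonal, and confusions of three or more are listed separately, in line with the shading used in the tables.

```python
from typing import Dict, List, Tuple

# target consonant -> {perceived consonant: number of listeners}
Confusions = Dict[str, Dict[str, int]]

# Hypothetical toy data (ten responses per target), for illustration only.
toy_matrix: Confusions = {
    "t": {"t": 7, "d": 3},
    "k": {"k": 10},
    "s": {"s": 9, "z": 1},
}


def percent_correct(matrix: Confusions) -> float:
    """Share of responses on the main diagonal, as a percentage."""
    correct = sum(row.get(target, 0) for target, row in matrix.items())
    total = sum(sum(row.values()) for row in matrix.values())
    return 100.0 * correct / total


def frequent_confusions(matrix: Confusions, threshold: int = 3) -> List[Tuple[str, str, int]]:
    """Off-diagonal cells whose count reaches the threshold."""
    return [
        (target, response, count)
        for target, row in matrix.items()
        for response, count in row.items()
        if response != target and count >= threshold
    ]


print(round(percent_correct(toy_matrix), 1))
print(frequent_confusions(toy_matrix))
```

Keeping the counts per target, rather than only the overall percentage, preserves the direction of each confusion (which target was heard as which response).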
Table 4.3 Confusion matrix of English onset consonants spoken by a Sudanese EFL speaker (in
the rows) and perceived by ten Dutch listeners (in the columns). Correct responses are on the
main diagonal, indicated in bold face. Confusions appear in off-diagonal cells (confusions ≥ 3 are
indicated in shaded cells).
Perceived RP consonants
Target
b d ð dʒ f g h k l m n p r s ʃ t θ tʃ v w z
b 9 1
d 0 1 9
ð 3 7
dʒ 10
f 1 9
g 5 4 1
h 10
k 10
l 10
m 10
n 4 6
p 8 2
r 10
s 10
ʃ 10
t 10
θ 7 3
tʃ 10
v 2 1 5 2
w 10
z 10
Table 4.4 Confusion matrix of English coda consonants spoken by Sudanese EFL learners (in the
rows) and perceived by ten Dutch listeners (in the columns). Correct responses are on the main
diagonal, indicated in bold face. Confusions appear in off-diagonal cells (confusions ≥ 3 are
indicated in shaded cells).
Perceived RP consonants
Target
b d ð dʒ f g k l m n ŋ p s ʃ t θ tʃ v z
b 9 1
d 10
ð 5 5
dʒ 10
f 10
g 8 2
k 2 3 4 1
l 1 9
m 1 9
n 10
ŋ 1 9
p 2 8
s 10
ʃ 10
t 3 7
θ 6 3 1
tʃ 10
v 10
z 5 5
Table 4.5 Confusion matrix of English onset consonants spoken by a native speaker of RP
English (in the rows) and perceived by ten Dutch listeners (in the columns). Correct responses
are on the main diagonal, indicated in bold face. Confusions appear in off-diagonal cells.
Perceived RP consonants
Target
b d ð dʒ f g h k l m n p r s ʃ θ v w z
b 10
d 10
ð 1 9
dʒ 10
f 10
g 10
h 10
k 10
l 10
m 10
n 10
p 10
r 10
s 10
ʃ 10
θ 10
v 10
w 10
z 10
Table 4.6 Confusion matrix of English coda consonants spoken by a native speaker of RP
English (in the rows) and perceived by ten Dutch listeners (in the columns). Correct responses
are on the main diagonal, indicated in bold face. Confusions appear in off-diagonal cells
(confusions ≥ 3 are indicated in shaded cells).
Perceived RP consonants
Target
b d ð dʒ f g k l m n ŋ p s ʃ t θ tʃ v z
b 10
d 10
ð 8 2
dʒ 9 1
f 10
g 10
k 1 9
l 1 9
m 9 1
n 3 7
ŋ 10
p 1 9
s 10
ʃ 10
t 10
θ 8 2
tʃ 10
v 10
z 1 9
The replacement errors /t~d, k~g, f~v, n~l, z~s/ indicate that the members of each pair are
similar in place of articulation. The errors might be caused by the unfamiliarity of
Dutch listeners with Sudanese English. It is also possible to attribute these types of English
consonant perception errors to the different voicing contrasts of the Arabic consonant
inventory on which the speakers rely in production, i.e. the absence of the energy required
for English voiceless consonants. The latter reasoning applies particularly to the misperceptions of the
English /t/ as /d/, /ð/ as /z/ and /θ/ as /s/, where the speakers’ L1 (Arabic) transfer
acts as a barrier that blocks the acquisition of the L2 consonants and passes only Arabic
speech sounds, i.e. the L1 filter effect. Previous studies revealed that many Arabic
speakers of English have difficulty producing /θ, ð, s, z/ due to L1 interference (Altaha
1995, Rababah 2003, do Val Barros 2003). A good example of L1 interference has
been observed in a recent study, which revealed that the boundaries between the fricative
and dental fricative pairs /t~d, ð~z, θ~s/ in Sudanese colloquial Arabic have almost
become blurred (Dickins 2007). This is most probably the reason why Dutch listeners
substitute these fricatives. Interestingly, this conclusion accounts for the repetition of
the same error patterns made by Dutch listeners for the onset and coda consonants that
were produced by the Sudanese speakers. More interestingly, these error patterns did
not occur at all when the English consonants were produced by the native speaker of
English.
The misperception of /f/ as /w/ can be explained in orthographic terms, assuming that
Dutch /w/ is treated as /v/ because the energy contrast in Dutch /f/ or /v/ is
inconsistent or totally absent. Phonologically, the interchangeable
substitutions of /f/ for /v/ are seen as the result of an L1 filter in the Dutch listeners’
perceptual and productive sound inventory. This is because the contrast between these
bilabials is not a matter of a fortis/lenis (voiced~voiceless) property (Collins and Mees
1981). The misidentification of English /n/ as /m/, when pronounced by the native
speaker of English in word-final position, can be attributed to the wrong realization of
these phonemes by Dutch listeners. The reason for this is that Dutch and English
nasals have different phonetic characteristics in word-final position, i.e. Dutch nasals
occurring in word-final position retain their voicing feature but English final nasals do not, or
only partially (see Tucker and Warner 2010). Interestingly, these perception errors did
not occur when the same English target sounds were read by Sudanese speakers,
which may be due to an interlanguage benefit between Dutch listeners and Sudanese
speakers.
4.6.3.1 Results
Figures 4.3 and 4.4 present the correctly identified English consonant clusters spoken
by Sudanese and native speakers of English in a perception test done by ten Dutch
listeners.
Figure 4.3 Correctly identified English onset consonant clusters by ten Dutch listeners spoken by
a Sudanese and a native RP speaker of English.
As the results in Figures 4.3 and 4.4 show, Dutch listeners achieved better performance
on English consonant clusters than on vowels and single consonants. Their performance
was even better when they were exposed to the native speaker of English than to
the Sudanese speakers. For the vowels, the overall means are 50% for the Sudanese and
88% for the native speaker; for single consonants, 78% (onset) and 81% (coda) against
99% and 99%; and for consonant clusters, 86% (onset) and 81% (coda) against 100% and
91%. Onset clusters spoken by the native speaker were perfectly
identified by the listeners except for an incidental error rate of 10% made in the perception
of /spr/. However, more substitution errors, namely /dr/ for /br/ and /gr/, /gl/ for /kl/, /pl/
for /fl/, /sl/ for /sn/, and /spl/ for /skw/, were made by the listeners when they heard
the same onset clusters spoken by Sudanese speakers. These findings indicate that the
onset consonant clusters read by the native speakers of English are more intelligible to
Dutch listeners than those of the Sudanese speakers. They also show that the
misperception within the cluster pairs /kl~gl/, /sl~sn/ and /pl~fl/ is revealing and
more systematic than /dr~br, gr/ on the one hand, and /spl~skw/ on the other.
Figure 4.4 Correctly identified English coda consonant clusters by ten Dutch listeners spoken by
a Sudanese and a native speaker of RP English.
In Tables 4.7 and 4.9 there are relatively few errors in the perception of the onset
clusters, whether spoken by the Sudanese or by the native speaker of English. Just one
single misperception occurred with the native speaker: /spr/ was perceived as /spl/.
However, a few more errors were made by Dutch listeners when the English clusters
were produced by the Sudanese speakers, as Table 4.7 shows. These clusters often
contain /g/ as the first element of either the stimulus or the response.
On the other hand, Tables 4.8 and 4.10 show that Dutch listeners made more
errors in the perception of the coda clusters, whether produced by the Sudanese EFL
learners or by the native speaker of English. The data also show similar patterns of
perception errors for coda clusters, irrespective of the speaker type. The listeners
substituted /nz/ for /mz/ and less frequently /bd/ for /lt/ or /ld/. The listeners were
also observed to mistake /ŋk/ for /nd/, /st/ for /sk/, /nt/ for /mp/ and /nz/ for /ts/
or /dʒ/. However, the error rates of Dutch listeners in the cluster items spoken by
Sudanese EFL learners are high, particularly for coda clusters. These findings reveal
that there is a positive relation between the listeners’ performance in single and cluster
consonants rather than between the clusters and vowels (see further § 4.6.5).
Table 4.7 Confusion matrix of English onset consonant clusters spoken by a Sudanese speaker of
English (in the rows) and perceived by ten Dutch listeners (in the columns). Correct responses
are on the main diagonal, indicated in bold face. Confusions appear in off-diagonal cells
(confusions ≥ 3 are indicated in shaded cells).
Table 4.8 Confusion matrix of English coda consonant clusters spoken by a Sudanese speaker of
English (in the rows) and perceived by ten Dutch listeners (in the columns). Correct responses
are on the main diagonal, indicated in bold face. Confusions appear in off-diagonal cells
(confusions ≥ 3 are indicated in shaded cells).
Table 4.9 Confusion matrix of English onset consonant clusters spoken by a native speaker of
RP English (in the rows) and perceived by ten Dutch listeners (in the columns). Correct
responses are on the main diagonal, indicated in bold face.
Table 4.10 Confusion matrix of English coda consonant clusters spoken by a native speaker of
RP English (in the rows) and perceived by ten Dutch listeners (in the columns). Correct
responses are on the main diagonal, indicated in bold face. Confusions appear in off-diagonal
cells (confusions ≥ 3 are indicated in shaded cells).
4.6.3.2 Discussion
Errors made by Dutch listeners in the perception of the velar /kl~gl/ and alveolar
/sl~sn/ initial cluster members spoken by Sudanese subjects probably occur due to
insufficiency or absence of the energy required for voiceless sounds. Moreover, the error
patterns /dr~br/ and /spr~spl/ can be attributed to the similarity in the manner of
articulation, or to voicing, whilst the /pl~fl/ misperception can be seen as being due to
labiality shared by the first cluster members. Additionally, the listeners’ errors in the
initial member of coda clusters /nz~mz/ are most likely caused by nasality. Dutch
listeners were observed to repeat similar types of errors with both Sudanese and native
speakers, which indicates that these error patterns have to do with the fact that the
listeners are Dutch. Despite the fact that English and Dutch have different phonological
systems, they are closely related to one another. Consequently, the Dutch listeners
showed better performance on English clusters produced by native speakers than on
their counterparts produced by Sudanese speakers. Wang (2007) reported similar
conclusions and argued that Dutch listeners achieved better performance on the
English clusters produced by Americans due to the rather close linguistic similarity that
exists between the Dutch listeners’ L1 and the L2.
Figure 4.5 presents the scores of ten Dutch listeners obtained on the SPIN test, the
items of which were read by both Sudanese (left-hand part of figure) and native
speakers of English (right-hand part).
Figure 4.5 Mean percentage of correctly recognised words (CW) by ten Dutch listeners obtained
in a SPIN test. Also, percentages of correctly identified word components (onset, vocalic nucleus,
coda) are indicated. Items were read by one Sudanese EFL learner (left-hand part of figure) and
one native speaker of English (right-hand part).
As Figure 4.5 shows, Dutch listeners’ recognition of keywords in simple and
predictable English sentences was poor, reaching only 27% when the sentences were spoken by the
Sudanese speaker. However, the listeners performed better (70%) on the same
materials read by the native speaker. Similarly, they had lower scores on onset, nucleus
and coda positions in the SPIN items read by the Sudanese speaker (68%, 51% and
42%, respectively) against higher scores (96%, 91% and 76%) when the same SPIN
items were read by the native speaker. These results indicate that the SPIN sentences of
native speakers are more intelligible to Dutch listeners than the sentences of Sudanese
speakers. The results also reveal that Dutch listeners of English managed to recognize
many sounds in the words correctly even if they failed to recognize a keyword in its
entirety. The onsets were perceived more accurately than the vowels and the codas,
an observation that ties in with the results of the MRT tests.
Moreover, the findings indicate that onset consonants, whether single or in clusters,
were identified more successfully than vowels and codas. This implies that the listeners’
performance is always better when they hear native speakers.
4.6.5 Correlations
Tables 4.11 and 4.12 present the correlations between the four parts of this study. These
parts include vowels, single and cluster consonants and words as whole units of English.
The tables show how the perception of English vowels and consonants in
the MRT items is correlated with that of their counterparts (segments, clusters and whole words) in
the SPIN sentences.
Table 4.11 Correlation matrix of dependent variables (identification scores) for materials
produced by a Sudanese-Arabic EFL speaker.
              vowels      consonants  cons_ons    cons_cod    clusters    clust_ons   clust_cod   word_all    word_ons    word_nuc
consonants     .154
cons_onset     .336        .944
cons_coda     -.329        .688
clusters      -.489       -.329       -.498        .188
clust_onset   -.630       -.385       -.551        .153        .951
clust_coda    -.266       -.224       -.375        .205        .933        .777
word_all       .359       -.060       -.006       -.179       -.297       -.270       -.292
word_onset     .818        .309        .485       -.232       -.623       -.699       -.456        .569
word_nucleus   .461        .217        .269       -.014       -.144       -.322        .080        .659        .604
word_coda      .138       -.274       -.274       -.165        .239        .315        .122        .285        .263       -.137
Bolded r > .8: p ≤ 0.01 (2-tailed).
Bolded r < .7: p ≤ 0.05 (2-tailed).
Table 4.12 Correlation matrix of dependent variables (identification scores) for materials
produced by a native speaker of RP English.
              vowels      consonants  cons_ons    cons_cod    clusters    clust_ons   clust_cod   word_all    word_ons    word_nuc
consonants    -.186
cons_onset     .171       -.021
cons_coda     -.211        .982       -.207
clusters       .278       -.078       -.444        .000
clust_onset    a           a           a           a           a
clust_coda     .278       -.078       -.444        .000       1.000        a
word_all      -.454       -.186       -.032       -.176       -.284        a          -.284
word_onset    -.046        .003        .035        .000       -.035        a          -.035        .379
word_nucleus  -.420       -.175       -.212       -.131       -.230        a          -.230        .945        .292
word_coda     -.446       -.215        .014       -.216       -.087        a          -.087        .939        .410        .811
a No r can be computed because at least one of the variables is constant (perfect scores only).
Bolded r > .8: p ≤ 0.01 (2-tailed).
Correlation coefficients computed between the scores for vowels, single and cluster
consonants and SPIN sentences provide statistical support for these observations. A positive relationship exists between
correct identification of nuclei and words, r = .659 (ins.) and .945 (p < .01) produced by
Sudanese and native speakers, respectively. This relationship reveals that listeners
usually get a correct word score whenever a nucleus vowel is correctly recognized,
which in turn indicates that the perception of vowels is a decisive factor of word
predictability. However, there is a weak negative relationship between the coda consonants
and word codas spoken by both Sudanese and native speakers: r = –.165 and –.216,
respectively. A negative coefficient means that listeners who score higher on single coda
consonants tend to score slightly lower on word codas, but at these magnitudes the two
scores are practically unrelated. Coda consonants, whether single or in clusters, nevertheless appear more difficult to perceive
than their onset counterparts. A weak positive correlation exists between vowels and
onset consonants: r = .336 and .171 for Sudanese and native speakers of English,
respectively. It gives rise to the prediction that the correct identification of English
vowels assumes correct identification of English onset consonants. However, these
weak positive or negative relations imply a rather unstable performance on the part of
the Dutch listeners when they are exposed to Sudanese speakers. This can be
interpreted as a by-product of incorrect English pronunciation of Sudanese speakers.
That is, incorrect pronunciation of some CVC stimuli changes their meaning, which
influences their predictability. Data of a similar test (Wang 2007) supports the claim
that correct identification of Dutch listeners responding to Chinese EFL speakers is
poorer (32%) than when responding to native speakers of English (67%). Dutch
listeners had a high word correct percentage due to more exposure to English than
either the Sudanese or Chinese listener groups. Moreover, linguistically their L1 norm is
much closer to English than that of the Sudanese and Chinese listeners. The Chinese
and Sudanese-Arabic L1 linguistic systems are entirely unrelated to English, the former
being a Sino-Tibetan and the latter a Semitic language. A previous study, which
measured the perceptual similarity between languages on the basis of their overall
sound structure, found that the mean distance of Dutch from English is 3.7 and that
the proximity of Dutch to English is based on known genetic and structural similarities.
According to the same study Arabic is 12.5 distance units away from English, which
makes it the farthest language from English and Dutch compared to other languages
(Bradlow, Clopper and Smiljanic 2007). In conclusion, the findings reveal that the
perception of vowels and coda consonants is more difficult for Dutch listeners than
that of single and cluster consonants.
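How a coefficient of this kind is obtained, and why none can be reported when one of the variables is constant (as for the perfectly identified onset clusters in Table 4.12), can be sketched in Python. The sketch assumes that r is the Pearson product-moment coefficient; the function name and the example scores are hypothetical and do not reproduce the data of this study.

```python
import math
from typing import Optional, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> Optional[float]:
    """Pearson correlation between two equally long lists of listener scores.

    Returns None when either variable is constant, because the coefficient
    is then undefined (division by a zero variance).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two score lists of equal length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        return None
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return cross / math.sqrt(ss_x * ss_y)


# Hypothetical scores for three listeners.
print(pearson_r([1, 2, 3], [2, 4, 6]))        # scores rise together
print(pearson_r([1, 2, 3], [3, 2, 1]))        # one rises as the other falls
print(pearson_r([10, 10, 10], [7, 8, 9]))     # constant variable: no r
```

A positive value means that listeners who score higher on one measure tend to score higher on the other; a negative value means the opposite, and values near zero indicate that the two scores are largely unrelated.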
4.6.6 Conclusions
Dutch listeners made more perception errors on English central and back vowels read
by Sudanese speakers than on those read by the native speaker, probably because the
speakers produced these vowels incorrectly. These types of vowels are absent from the Sudanese speakers’ L1
(Arabic) vowel inventory.
Similar perception errors were experienced in perceiving English onset and coda
consonants produced by Sudanese speakers. The English fricatives /ð, z, θ, s/ proved to
be problematic for Dutch listeners. These types of perception errors can be interpreted
as a by-product of partial learning or insufficient practice.
Generally, fewer errors were made on the level of consonant clusters. Moreover, the
listeners made even fewer perception errors of English single and clustered consonants
when the material was read by a native speaker of English.
Dutch listeners found native speakers of English more intelligible than Sudanese
speakers, first because English and Dutch are closely related languages with rather similar
sound systems. Secondly, Dutch listeners have regular exposure to the target language,
which facilitates the learning of English. Thirdly, they are not familiar with Sudanese-
accented English.
5.1 Introduction
Researchers need to test in greater detail the ways in which non-native speech of
English varies from that of the native speakers and to determine the extent to which
such variation can impede intelligibility between the speech interlocutors. A task such
as this requires looking at the phonetic and phonological difference between L1 and L2
to find out which segmental variations are possible and how they can impede or
enhance speech intelligibility. This is often necessary since phonemic variation between
languages has negative effects on the learning of L2 speech; i.e. many studies of non-
native speech indicated the potential for reduced comprehension, particularly when
actual practice of the second/foreign language is infrequent. According to Jenkins
(2000), (incorrect) habit formation is one of the major factors responsible for
intelligibility problems where the muscular habits that are always operated to produce
the L1 speech sounds, are automatically activated in L2 production. This process
requires non-native speakers to pay more attention to produce accurate speech.
However, as soon as these speakers release control to focus on the content of the
message, they produce erroneous pronunciation. This situation continues until
sufficient practice leads to the mastery of L2 sounds that are phonetically different
from those of the L1. However, incorrect speech habits are not the underlying cause of
the pronunciation problems in foreign-accented speech. The incorrect production of
L2 speech sounds occurs due to categorical differences between L1 and L2, where non-
native speakers use incorrect perceptual representations (normally L1 sounds) for the
production of L2 sounds (Flege 1976). Many L2 speakers of English fail to distinguish
between phonemic and allophonic sounds of English, or they conflate or confuse
certain speech sounds as a result of differences between L1 and L2. For example, Arabic
speakers of English conflate /b/ and /p/, because the latter has no phonological
representation in Arabic (Cruttenden 2008, Flege 1976). Similar problems occur among
Russian speakers, who confuse clear /l/ as in leaf, black and lose and dark /ɫ/ as in pool,
full and milk, which form contrastive phonemes in Russian, but are allophones in
English.
5.2 Objective
The results of this chapter will provide useful feedback on how well the Sudanese-
Arabic EFL learners (at the university level) are understood by listeners in the target
population, i.e. by native listeners of English. In doing so, this study attempts to
account for issues such as which English speech sounds are problematic and which
linguistic elements cause such problems. Therefore, the study provides cognitive
insights into the nature and the causes of the error patterns detected in the investigated area.
5.3 Method
The Modified Rhyme Test (MRT) was used in the experiments. The MRT is considered
to be a highly accurate and reliable measure of intelligibility (Logan, Greene and Pisoni
1989) at the phoneme level. Speech intelligibility measures involve word identification
tasks in closed sets of four alternatives, where the listeners are asked to select the
response they think the speaker intended. The score is the number of correctly
responded-to items. Test items normally target phonemes, multi-phonemes, or words.
Phonemes refer to vowels and single consonants, whilst multi-phonemes refer to
consonant clusters. Phoneme and multi-phoneme responses are scored as either
intelligible or unintelligible. A score of (close to) 100% is interpreted as completely
intelligible performance (Lafon 1966).
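The scoring rule described above can be sketched in a few lines of Python. The function name and the example responses are hypothetical. The chance level is not stated in the text; it follows from the closed set of four alternatives, on the assumption that a listener who guesses picks each alternative equally often.

```python
from typing import Sequence

# With four alternatives per trial, pure guessing yields 25% on average
# (assumption: every alternative is equally likely to be chosen).
CHANCE_LEVEL = 100.0 / 4


def mrt_score(responses: Sequence[str], targets: Sequence[str]) -> float:
    """Percentage of trials on which the chosen word equals the intended word."""
    if len(responses) != len(targets) or not targets:
        raise ValueError("one response is required for every trial")
    correct = sum(r == t for r, t in zip(responses, targets))
    return 100.0 * correct / len(targets)


# Hypothetical listener: four trials, one of them answered incorrectly.
intended = ["pit", "pat", "put", "pot"]
chosen = ["pit", "pet", "put", "pot"]
print(mrt_score(chosen, intended), CHANCE_LEVEL)
```

Comparing an observed score with the chance level helps to judge whether a low score still reflects some intelligibility or is no better than guessing.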
Word intelligibility, on the other hand, was established by having listeners recognise 25
keywords, each one embedded in final position in a short everyday sentence taken from
CHAPTER FIVE: INTELLIGIBILITY OF SUDANESE ENGLISH TO NATIVE LISTENERS 103
5.3.2 Participants
The study participants were ten Sudanese university students in the Department of
English at Gadarif University in the Sudan. The learners involved in these experiments
specialized in English language teaching (TEFL). They had studied for six semesters
when they participated in the test. During the period of study, which extends over four
years, students attended three courses in the field of pronunciation; these are (i) an
introduction to phonetics, (ii) phonology and (iii) practical phonetics, delivered in three
subsequent semesters. They also attended two classes on English listening skills, which
usually took place in semesters one and three. English is treated as a foreign language
(not a second language), the learning of which starts in the fifth year of primary school
and continues at secondary schools for three years. English lessons obtained during
these stages vary between 5 and 6 hours per week; English is treated as a school subject
that provides basic principles of the language in a traditional way of language teaching.
A Sudanese model speaker was selected from among 11 Sudanese speakers of English
by means of a quality sound test. The quality sound test was administered
through the internet. Candidates of different nationalities were invited to listen to the
recordings of the 11 speakers and then to assess the sound quality of each speaker by
clicking on one of the grade options provided. A mean judgment score was computed
for each speaker, and the speaker whose mean judgment score was closest to the grand mean
was chosen as the representative learner.
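The selection rule can be illustrated with a small Python sketch. It rests on two assumptions that the text does not spell out: that the grand mean is the mean of the speakers' mean scores, and that ties are resolved in favour of the speaker listed first. The speaker labels and ratings are hypothetical.

```python
from statistics import mean
from typing import Dict, List


def representative_speaker(ratings: Dict[str, List[float]]) -> str:
    """Return the speaker whose mean rating lies closest to the grand mean."""
    speaker_means = {speaker: mean(scores) for speaker, scores in ratings.items()}
    # Assumption: the grand mean is the mean of the per-speaker means.
    grand_mean = mean(list(speaker_means.values()))
    return min(speaker_means, key=lambda s: abs(speaker_means[s] - grand_mean))


# Hypothetical judgments from three raters for three speakers.
toy_ratings = {
    "S1": [2, 3, 2],
    "S2": [4, 4, 5],
    "S3": [3, 4, 3],
}
print(representative_speaker(toy_ratings))
```

Choosing the speaker nearest the grand mean, rather than the best-rated one, yields a talker who is typical of the group and neither unusually clear nor unusually difficult.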
In the control part of the study a single male native speaker of English (RP accent) was
used as a model speaker of English. He was asked to read out stimulus items which
include vowels, single and cluster consonants of English, as well as the SPIN sentences.
The group of native English listeners comprised ten British and ten American speakers
of English preparing for BA or MA degrees in various academic disciplines at Leiden
University. Listeners were recruited by means of online or poster invitation. The
subjects were asked to fill in a short questionnaire before they started the
perception test. In the questionnaire, they provided information about their
nationality (British or American) and their linguistic
background. None of the listeners spoke Arabic, the first
language of the Sudanese speakers involved in the experiments. Moreover, all
respondents declared that they were unfamiliar with English spoken with a Sudanese-
Arabic accent. All respondents used their first language on a daily basis, within their
expatriate communities. Some subjects, friends or family of some of the Leiden-based
students, did the experiments online in their home country.
The experimental stimuli included four tests. These were (i) a vowel test, which was
composed of minimal quartets including short and long vowels as well as diphthongs,
(ii) single consonants in either onset or coda position and (iii) consonant clusters in
onset or coda position. These target sounds were embedded in meaningful C*VC*
words (where C* stands for one to three consonants). The fourth test comprised 25
sentences taken from the high-predictability set included in the SPIN (Speech
Perception in Noise) test (Kalikow, Stevens and Elliott 1977, also see above). Word
stimuli in the first three tests were embedded in a fixed carrier sentence [Say…again],
which ensured a fixed intonation with a rise-fall accent on the target word. The vowel
and the single consonant tests contained items on each individual vowel or consonant
phoneme in the RP inventory (see footnote 15). Moreover, the consonant test targeted all the
consonants in onset position and in coda position. For the cluster test, the number of
test items had to be limited as the total inventory of onset and coda clusters is very
large; including all the clusters would have been too demanding on the listeners. Nine
onset and eight coda clusters were selected that represent problems to Sudanese-
Arabic learners of English (Allen 1997, Patil 2006). All items in the tests were chosen
such that they occurred in dense lexical neighbourhoods, i.e. there should be many
words in English that differ from the test item only in the target sound. For instance,
the vowel /ɪ/ was tested in the word pit, since the /p_t/ consonant frame can be filled
in by many other vowels, as in peat, pet, pat, pot, part, port, put, putt and pout. These so-called
lexical neighbours, differing from the target word in only the identity of the test
sound, make up the pool of possible distracters (alternatives) in the construction of the
MRT test. When selecting the three distracters needed for each of the test items, lexical
neighbours that differ from the target in only one distinctive feature were preferably
selected. For the target pit, we selected pet (differing in height) and put (differing in
backness), each of which differs from /ɪ/ in just one vowel feature, and pot.
The latter alternative differs from the target in both height and backness; we preferred
this to the one-feature difference in peat (or Pete) as we decided to exclude proper names
and low-frequency alternatives as much as possible. The full set of test items is included
in Appendix 4.2.
15 Inadvertently, the vowel test did not include an item targeting the vowel /əʊ/ as in boat.
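The preference for distracters that differ from the target in as few features as possible can be sketched as follows. This is an illustration only, not the procedure actually used to build the test: the feature values are a coarse, hypothetical coding of height, backness and length for the pit example, and the function names are invented.

```python
from typing import Dict, List, Tuple

# Hypothetical coarse coding: (height, backness, length) of each word's vowel.
FEATURES: Dict[str, Tuple[str, str, str]] = {
    "pit": ("high", "front", "short"),
    "pet": ("mid", "front", "short"),
    "put": ("high", "back", "short"),
    "pot": ("low", "back", "short"),
    "peat": ("high", "front", "long"),
}


def feature_distance(word_a: str, word_b: str) -> int:
    """Number of vowel features on which the two words differ."""
    return sum(a != b for a, b in zip(FEATURES[word_a], FEATURES[word_b]))


def rank_distracters(target: str, candidates: List[str]) -> List[str]:
    """Order candidates from fewest to most feature differences (ties alphabetical)."""
    return sorted(candidates, key=lambda w: (feature_distance(target, w), w))


print(rank_distracters("pit", ["pot", "put", "pet"]))
print(feature_distance("pit", "peat"))
```

Under this coding peat is also a one-feature neighbour of pit; a ranking of this kind would therefore have to be combined with the exclusion of proper names and low-frequency words described above.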
The stimulus sentences were typed on sheets of paper (one sheet for each test), and
then read by 11 male Sudanese EFL learners (see above) and one native speaker of RP
English. One representative Sudanese speaker was selected from the larger group of 11
by means of a quality sound test (see § 5.3.2.1). The native speaker of English was a
British male candidate who was selected as a model speaker of RP English (see §
3.2.2.2). Recordings took place in a sound-treated room. Each speaker’s voice was
digitally recorded (44.1 kHz, 16 bits) through a high-quality swan-neck Sennheiser
HSP4 microphone. The speakers were instructed to inhale before uttering the next
sentence so that a clear recording was achieved. The target words were excerpted from
their spoken context using the high-resolution digital waveform editor included in the
Praat speech processing software (Boersma and Weenink 1996). Target words were cut
at zero-crossings to avoid clicks at onset and offset. Target words and SPIN sentences
were then recorded onto Audio CD in seven tracks. The first track contained two
practice trials for the vowel test and was followed by track 2, which contained the 19
vowel test items. Tracks 3 and 4 contained the practice and test trials for the single
consonant tests and tracks 5 and 6 contained the cluster items. Track 7 comprised the
25 SPIN sentences with no practice items. In the single consonant and cluster tests,
trials targeting onsets preceded the items targeting codas. Other than that, the order of
the trials within each part of the test battery was random. Trials were separated by a 5-
second silent interval. After every tenth trial a short beep was recorded, to help the
listeners keep track on their answer sheets.
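The cutting of target words at zero-crossings can be illustrated with a minimal sketch in Python. It is not the Praat routine used for the recordings: the function names and the sine-wave test signal are hypothetical, and the sketch only assumes that a zero-crossing is a change of sign between two adjacent samples. Moving both edges of an excerpt to such a point keeps the first and last sample close to zero, which is what prevents a click.

```python
import math

def nearest_zero_crossing(samples, index):
    """Return the sample index closest to `index` at which the signal changes sign."""
    n = len(samples)
    for offset in range(n):
        for i in (index - offset, index + offset):
            if 0 < i < n and (samples[i - 1] < 0) != (samples[i] < 0):
                return i
    return index  # no sign change anywhere: leave the cut point unchanged

def cut_at_zero_crossings(samples, start, end):
    """Excerpt samples[start:end] after moving both edges to the nearest zero-crossing."""
    return samples[nearest_zero_crossing(samples, start):nearest_zero_crossing(samples, end)]

# Hypothetical signal: 0.1 s of a 100 Hz sine wave sampled at 44.1 kHz.
rate = 44100
signal = [math.sin(2 * math.pi * 100 * t / rate) for t in range(rate // 10)]
word = cut_at_zero_crossings(signal, 1000, 3000)
print(len(word), word[0], word[-1])
```

In this example the requested edges (samples 1000 and 3000) fall in the middle of a cycle; the excerpt that is returned starts and ends on samples whose amplitude is near zero.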
The stimuli were presented over loudspeakers in a small classroom that seated ten
listeners. Subjects were given standardized written instructions and received a set of
answer sheets that listed four alternatives for each test item. They were instructed for
each trial to decide which of the four possibilities listed on their answer sheet they had
just heard on the CD. They had to tick exactly one box for each trial and were told to
gamble in case of doubt. Alternatives were listed in conventional English orthography.
In the final test (SPIN), subjects were instructed to write down only the last word of
each sentence that was presented to them. There were short breaks between tests and
between presenting the practice items and test trials. Subjects could ask for clarification
during these breaks in case the written instructions were not clear to them.
106 TAJELDIN ALI: SPEECH INTELLIGIBILITY PROBLEMS OF SUDANESE EFL LEANERS
5.5.1 Vowels
This section will present the results of the test battery in four sections, one for each test.
Each section will first outline the structural differences between the sounds in the
source language, Sudanese Arabic (SA) and in the target language, RP English. Such
comparisons may help understand why certain English sounds are difficult for
Sudanese learners and others are not.
5.5.1.1 Results
Figure 5.1 presents the total mean correct identification scores obtained by the two
groups of native listeners of English, i.e. ten British and ten American listeners, on the
vowel part of the MRT tests.
Figure 5.1 Correct responses (%) to English vowel tokens of ten British and ten American
listeners. The error bars include ±2 Standard Errors of the mean. The vowels were produced by
one Sudanese and one native speaker of British English.
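The ±2 SE error bars can be computed along the following lines. This is a minimal sketch in Python, assuming that the standard error is the sample standard deviation of the listeners' scores divided by the square root of the number of listeners; the scores listed are hypothetical and are not taken from the experiment.

```python
import math
import statistics

def mean_and_error_bar(scores):
    """Return the mean and the half-width of a +/-2 SE error bar."""
    se = statistics.stdev(scores) / math.sqrt(len(scores))
    return statistics.mean(scores), 2 * se

# Hypothetical percent-correct scores for ten listeners (not the thesis data).
scores = [60, 65, 70, 55, 75, 65, 70, 60, 65, 65]
mean, half_width = mean_and_error_bar(scores)
print(f"{mean:.1f} +/- {half_width:.1f}")
```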
As Figure 5.1 shows, the vowel identification scores of the native listeners (British and
American) are higher for the English vowel tokens produced by the native speaker than
for the same vowel tokens read by the designated Sudanese speaker. The overall mean
percent correct for the British listeners is 67% for the Sudanese speaker and 93% for the
native speaker, against 65% and 91%, respectively, for the American listeners. A
repeated measures analysis of variance (RM-ANOVA) with native language of the
speaker (native, foreign) as a within-subject factor and nationality of the listener (British,
American) as a between-subjects factor shows that only the effect of speaker type is
significant, F(1, 18) = 152.3 (p < .001). The main effect of listener nationality and the
listener × speaker interaction are not significant, F(1, 18) < 1 for both.
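The structure of this analysis can be sketched in Python. The sketch is not the statistical software used for the study: it assumes two listener groups of equal size and a within-subject factor with two levels, in which case each F ratio can be derived from per-listener difference scores (native minus foreign) and per-listener averages, with df = (1, N − 2). With ten listeners per group this gives the df = (1, 18) reported above. The function names and the data are hypothetical, and p values are not computed.

```python
import statistics

def pooled_variance(a, b):
    """Pooled within-group variance of two samples, df = len(a) + len(b) - 2."""
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    ss = sum((x - mean_a) ** 2 for x in a) + sum((x - mean_b) ** 2 for x in b)
    return ss / (len(a) + len(b) - 2)

def mixed_anova_2x2(group_a, group_b):
    """F ratios for a 2 (between) x 2 (within) design with equal group sizes.

    Each group is a list of (native_score, foreign_score) pairs, one per listener.
    """
    if len(group_a) != len(group_b):
        raise ValueError("this sketch assumes equal group sizes")
    n = len(group_a)
    # Within-subject effects are carried by the per-listener difference scores.
    diff_a = [native - foreign for native, foreign in group_a]
    diff_b = [native - foreign for native, foreign in group_b]
    # The between-subjects effect is carried by the per-listener averages.
    avg_a = [(native + foreign) / 2 for native, foreign in group_a]
    avg_b = [(native + foreign) / 2 for native, foreign in group_b]
    var_diff = pooled_variance(diff_a, diff_b)
    var_avg = pooled_variance(avg_a, avg_b)
    grand_diff = statistics.mean(diff_a + diff_b)
    return {
        "speaker": 2 * n * grand_diff ** 2 / var_diff,
        "listener": n * (statistics.mean(avg_a) - statistics.mean(avg_b)) ** 2 / (2 * var_avg),
        "speaker_x_listener": n * (statistics.mean(diff_a) - statistics.mean(diff_b)) ** 2 / (2 * var_diff),
        "df": (1, 2 * n - 2),
    }

# Hypothetical scores for three listeners per group (not the thesis data).
british = [(90, 60), (95, 65), (85, 65)]
american = [(90, 70), (92, 62), (88, 68)]
f = mixed_anova_2x2(british, american)
print(f)
```

In the hypothetical data every listener scores much higher on the native speaker, so the speaker effect yields a large F, whereas the two groups hardly differ from each other, so the listener effect and the interaction yield F < 1. This is the same pattern as in the vowel results.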
The confusion matrices in Tables 5.1-2 present details of the listeners’ performance
on the vowel identification task. The tables show that the listeners found
the English vowels produced by the Sudanese speaker more difficult than those read
by the native speaker. Table 5.1 shows that the British listeners never identified
the English front mid close /G/ correctly: they perceived it as /+/ or, less often, as /KÖ/.
The English open /3/ also proved difficult for the listeners. It was frequently misheard
as /¡/ and less frequently as /7/. Another frequent perception error was the confusion of
the English tense /KÖ/ with its lax counterpart /+/; less often, /KÖ/ was perceived as
/3/ or /G/. Perception errors involving the central and back
English vowels included the replacement of /n/ by /7/ and, less often, by
/¡/ or /3/, whilst the back low /#Ö/ was replaced by /«Ö/. A further miscellaneous
error was the misperception of /nÖ/ as /#Ö/. Similar perception
error patterns were found for the American listeners exposed to the same English
vowel tokens spoken by the Sudanese speaker (see Table 5.2). Interestingly, most of
these errors involve the central and back vowels, which suggests a systematic
relation with the way these vowels were produced. This relationship will be
discussed later. On the other hand, no serious problems were found when the English
vowels were read by the native speaker. However, both British and American listeners
often confused the members of the English lax-tense pairs /7~WÖ/ and /+~KÖ/.
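The way a confusion matrix is read can be made explicit with a short sketch in Python. It assumes that a cell counts as a frequent confusion when it holds at least 30% of the responses to a target, which is the threshold the table captions appear to use. The response counts are hypothetical, with word labels borrowed from the pit/pet/put/pot example, and the function names are illustrative only.

```python
def frequent_confusions(matrix, threshold=0.30):
    """List (target, response, proportion) for off-diagonal cells at or above threshold."""
    found = []
    for target, responses in matrix.items():
        total = sum(responses.values())
        for response, count in responses.items():
            if response != target and count / total >= threshold:
                found.append((target, response, count / total))
    return found

def percent_correct(matrix):
    """Overall percentage of responses that fall on the main diagonal."""
    correct = sum(row.get(target, 0) for target, row in matrix.items())
    total = sum(sum(row.values()) for row in matrix.values())
    return 100 * correct / total

# Hypothetical counts for ten listeners per target (not the thesis data).
matrix = {
    "pit": {"pit": 10},
    "pet": {"pet": 0, "pit": 9, "pot": 1},
    "put": {"put": 5, "pot": 3, "pit": 2},
}
confusions = frequent_confusions(matrix)
overall = percent_correct(matrix)
print(confusions, overall)
```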
Table 5.1 Confusion matrix of English vowels and diphthongs produced by a Sudanese EFL
learner (in the rows) and perceived by ten British listeners (in the columns). Correct responses are
on the main diagonal, indicated in bold face. Confusions (≥ 30%) are in grey-shaded cells. The
vowel /7/ should have been presented but was not.
Perceived RP vowels
Target
«Ö ¡ #Ö 3 #7 C+ G G G+ + KÖ + n nÖ n+ 7 WÖ 7 W
«Ö 6 1 2 1
¡ 9 1
#Ö 3 7
3 5 3 2
#7 9 1
C+ 10
G 0 9 1
G 2 7 1
G+ 1 3 6
+ 10
KÖ 1 1 5 3
+ 10
n 2 1 0 7
nÖ 1 9
n+ 1 1 8
7 1 9
WÖ 1 1 8
7 2 1 7
Table 5.2 Confusion matrix of English vowels and diphthongs produced by a Sudanese EFL
learner (in the rows) and responded to by ten American listeners (in the columns). Correct
responses are on the main diagonal, indicated in bold face. Confusions (≥ 30%) are in
grey-shaded cells. The vowel /W/ should have been presented but was not.
Perceived RP vowels
Target
«Ö ¡ #Ö 3 #7 C+ G G G+ + KÖ + n nÖ n+ 7 WÖ 7
«Ö 5 1 4
¡ 6 4
#Ö 1 8 1
3 7 1 1 1
#7 10
C+ 9 1
G 1 9
G 10
G+ 4 2 4
+ 10
KÖ 1 5 4
+ 1 1 8
n 3 1 6
nÖ 10
n+ 10
7 9 1
WÖ 1 5 4
7 1 9
Many of the errors made by the British and American listeners in identifying the
English vowels produced by the Sudanese speaker most likely have linguistic causes. The
replacement of the English /G/ by /+/ can be attributed to two factors. Firstly, it
may be triggered by an L1 effect which permits only vowel sounds available in the
Arabic vowel repertoire, viz. /K, C, W/, while it blocks /G/, since the latter is not part of
the Arabic vowel system (see Kopczski and Mellani 1993). This explanation is less
probable, however, since previous studies have shown that Arabic speakers have developed
/G/ (Munro 1993, Dickins 2007). 16 Secondly, a replacement error of this type can most
16 Sudanese Arabic also developed monophthongs. These include /G/, which historically
descends from the diphthong /CL/ as in /CLP/ ‘an eye’, which coalesced (merged) in dialects such
as Cairene and Central Sudanese. In Arabic varieties spoken in large parts of the Levant these
Similarly, the misperception of /3/ as /¡/ or /7/ is due to an incorrectly produced English
vowel. Arabic speakers almost always have problems with the pronunciation of the
front open /3/. They tend to pronounce the English /3/ in the same way as they
produce their L1 back open lengthened vowel /C/; in Sudanese and Cairene Arabic, for
instance, /C/ is pronounced as in [D3ÖD] ‘door’ (Kaye 1997). This is probably why
native Arabic speakers are advised to keep the English short vowel /3/ fully front
(Cruttenden 2008). Along similar lines, Bobda (2000) concluded that Sudanese speakers
of English fluctuate between /¡, «Ö, 7/ due to interference from their Arabic L1
background. The confusion of lax-tense /+~KÖ/ by the British and American listeners
can also be attributed to incorrect vowel production, which probably resulted from the
wrong implementation of English vowel categories. It is less probable that these
substitution errors are the result of incorrect vowel length in the learners’ L2, because
vowel distinctions in both the English and the Arabic vowel system are based on
short/long contrasts. Munro (1993), however, reported that the English vowels spoken
by Sudanese Arabic EFL learners are influenced by their L1 (Arabic) vowel system,
which has a short/long vowel contrast that is based solely on quantity. Thus, a
substantial number of subjects pronounced the English tense-lax vowels in terms of the
Arabic long/short vowel categories. This account seems weak, however, because a
short/long contrast is also used in the English tense-lax vowel distinction. Errors of this
type often occur when speakers have had relatively little exposure to English
speech.
vowels are realized as /Gu/ or /nt/. In Sanani and a number of Peninsula dialects, the diphthongs
are maintained in all phonological contexts. Moreover, among some Cairene speakers the
monophthongs are shortened in closed syllables to give short /G/ or /n/, hence they are not
considered to be separate vowels (Watson 2002).
17 English pronunciation preferences often do not pay as much attention to the relation
between sounds and letters as they do to social conventions; if they did, a sort of balance
would result. This makes English pronunciation a problematic area, particularly for
non-native speakers, because the relation between letters and sounds is not clear (Wells 1999).
5.5.2 Consonants
5.5.2.1 Results
Figure 5.2 presents the mean percentage of correctly identified consonants by two
groups of native listeners of English, i.e. ten British and ten American listeners. Again,
the MRT items were spoken by a designated representative Sudanese learner of English
and by a native speaker of RP English.
Figure 5.2. Mean correct identification of English onset and coda consonants by 10 British and
10 American listeners of English. The error bars include ±2 Standard Errors of the mean. The
consonants were produced by one Sudanese and one native speaker of British English.
As Figure 5.2 shows, the British and American listeners identified the English
consonants very well. The overall percentage of correctly identified
consonants is 85.0% (British) and 84.8% (American) when the consonants were produced
by the Sudanese speaker, and 99.0% and 99.2% when they were spoken by the native
speaker of English. The RM-ANOVA shows that the effect of speaker type is highly
significant, F(1, 18) = 94.5 (p < .001). The British listeners scored marginally higher
than the American listeners on the consonants read by the Sudanese speaker, but the
difference between the listener groups is not significant, F(1, 18) < 1. Performance on the
consonants read by the native speaker is almost the same for the two listener
groups, so that the speaker × listener interaction is not significant either, F(1, 18) < 1.
This is probably because both listener groups are native speakers of English. Nevertheless,
a few English onset and coda consonants were misperceived (see Tables 5.3-6).
Table 5.3 Confusion matrix of English onset consonants produced by a Sudanese EFL speaker
(targets, in the rows) and responded to by ten British listeners (in the columns). Correct
responses are on the main diagonal, indicated in bold face. Confusions (≥ 30%) are in shaded
cells.
D V5 F V & H I J L M N O P R T U 5 6 X Y \
D 10
V5 10
F 2 5 3
V 4 6
& 0 10
H 10
I 8 2
J 1 8 1
L 10
M 10
N 10
O 10
P 9 1
R 1 1 8
T 1 9
U 10
5 10
6 5 5
X 2 7 1
Y 10
\ 10
Table 5.4 Confusion matrix of English onset consonants produced by a Sudanese EFL speaker
(targets, in the rows) and responded to by ten American listeners (in the columns). Correct
responses are on the main diagonal, indicated in bold face. Confusions (≥ 30%) are in shaded
cells.
Perceived RP consonants
Target
D V5 F V & H I J F< M N O P R T U 5 6 X Y \
D 10
V5 10
F 0 10
V 5 5
& 0 10
H 10
I 10
J 1 9
F< 10
M 10
N 10
O 10
P 10
R 10
T 10
U 10
5 10
6 7 3
X 9 1
Y 10
\ 10
Table 5.5 Confusion matrix of English coda consonants produced by a Sudanese EFL speaker
(targets, in the rows) and responded to by ten British listeners (in the columns). Correct
responses are on the main diagonal, indicated in bold face. Confusions (≥ 30%) are in shaded
cells.
Perceived RP consonants
Target
D V5 F F< & H I M N O P 0 R U 5 V 6 X \
D 9 1
V5 10
F 10
F< 10
& 7 3
H 10
I 10
M 10
N 8 1 1
O 10
P 10
0 3 7
R 1 9
U 10
5 1 9
V 10
6 1 5 4
X 1 9
\ 6 4
Table 5.6 Confusion matrix of English coda consonants produced by a Sudanese EFL speaker
(targets, in the rows) and responded to by ten American listeners (in the columns). Correct
responses are on the main diagonal, indicated in bold face. Confusions (≥ 30%) are in shaded
cells.
Perceived RP consonants
Target
D V5 F F< V5 H I M N O P 0 R U 5 V 6 X \
D 10
V5 10
F 10
F< 10
& 5 5
H 8 2
I 10
M 2 4 4
N 10
O 10
P 10
0 1 9
R 3 7
U 10
5 1 9
V 10
6 4 1 2 3
X 1 9
\ 4 6
As for the onset consonants produced by the Sudanese EFL speaker, both British and
American listeners misidentified /&/ as /\/ without exception, while frequent
misperceptions of /6/ as /U/ and of /F/ as /V/ were also observed. It is worth mentioning
that the American listeners always misperceived /F/ as /V/. These are probably the most
serious errors made by the listeners on the English consonants read by the Sudanese
speaker. Similar error patterns involving the English dental fricatives were found
for the coda consonants read by the Sudanese speaker. For both listener groups, these
included the replacement of /&/ by /\/, of /6/ by /U/ or /&/, and of /\/ by /U/ or /6/,
as well as /0~P/ confusion. Miscellaneous other
confusions such as /M~I/ and /H~X/ were found for the American listeners only.
Both listener groups made several other confusions in coda position, which
included /X~R, 5~V5, R~M/. The error frequency obtained for the
fricative consonants is higher in onset than in coda position.
In contrast to the above, the listeners showed nearly perfect perception of all English
onset and coda consonants when these were articulated by the native speaker. As for
the onset consonants, the British listeners misperceived /&/ as /6/ and /6/ as /U/,
whilst the American listeners showed perfect perception. As for coda consonants, the
most prominent type of error was an interchangeable (symmetrical) confusion of
/O~P/ by the British listeners, which showed up as an asymmetrical substitution of
/0/ for /P/ in the responses of the American listeners.
The conflation of /&/ with /\/ and of /6/ with /U/ in the consonants read by the Sudanese
speaker can be attributed to incorrectly produced English consonants. This conflation
results from interference of the speaker’s L1, Sudanese colloquial Arabic; in formal Arabic
these sounds are pronounced correctly (Mohammed 1991). In the Sudanese consonant
inventory the interdentals /6, &/ have merged with the apico-dentals (often labeled as
alveolar or sibilant) /U, \/ (Dickins 2007, Watson 2002, Corriente 1978). Thus, Arabic
words like /J3&C/ ‘this’ are mispronounced as /J3\C/, whilst /63DKV/ ‘firm’ is
mispronounced as /U3DKV/, which influences the production of the English dental and
alveolar fricatives. In a
number of Arabic dialects, the line separating dental continuants from sibilant (hissing)
sounds is becoming blurred. The consonant charts of Classical Arabic (CA) and
Modern Standard Arabic (MSA) contain three subsets, grouped as stops /V F, V/,
sibilants /U \, \/ and interdentals /6, &, &/. The distinction between
sibilants and interdentals has been lost at the colloquial dialectal level, but not in
formal Arabic. However, the loss of this boundary is compensated for by four
distinctive features for two subsets, which include voiced-plain, voiceless-plain,
voiceless-emphatic and voiced-emphatic consonants (Schmidt 1987, Watson 2002,
Dickins 2007). This change, therefore, has side-effects on the perception of L2
dental fricatives. According to Kaczwski and Mellani (1993), to avoid these types of
confusions, Arabic-speaking learners of English (of different colloquial dialects) need to
rearrange the distinctive features that separate interdentals from alveolars, relative to
those of Arabic. Furthermore, the difficulty with English /6, &/ does not always lie in their
articulation, since most EFL learners can produce them correctly in isolation. The
problem is aggravated, however, when such dentals are combined with /U/ and /\/,
particularly for speakers of languages which contain no dental fricatives. All of /U, \/ and
/6, &/ are produced close to the upper incisors, so that learners need to practice drills
containing combinations of such sounds (Cruttenden 2008).
phonemes and (ii) the potential frequency of such pairs in interactions. This claim
motivates the prediction that in an error hierarchy, contrast between phonemes such as
/6~U, &~\/ may imply a high functional load due to their rare occurrence in many
languages, which in turn leads to intelligibility problems. Thus, the fact that these
phonemes are difficult to learn, being both rare and highly marked sounds across languages,
plays a major role in making them a prominent source of speech
intelligibility problems (see Jenkins 2000, Seidlhofer 2005, Van den Doel 2006).
Other substitution errors, such as the /M~I/ confusion in the coda consonants read by the
Sudanese speaker, are likely due to the lack of a clear voicing cue separating voiced
from voiceless stops, which are distinguished across very narrow VOT boundaries. Moreover,
in the Arabic inventory the VOT values of final plosives are
normally low or absent, which blurs the voicing distinction between such pairs.
Consequently, the native listeners made incorrect judgments of the English velar
consonants. The misrecognition of the English /N/ as /P/ may be attributable to the
similarity of their place of articulation. It is more probably due, however, to the effect of
pre-pausal features that affect a wide range of modern Arabic dialects, including Central
Sudanese dialects.
Other perception errors like /R~H, X~D/ can be attributed to the labiality shared by
bilabial stops and labio-dental fricatives, or to voicing. Background noise or
unfamiliarity with the speaker’s accent often hampers intelligibility between
interlocutors (Ball and Rahilly 1999). On the other hand, the native listeners do not
show serious perception errors for the English consonants read by the British speaker,
which is most likely due to the similarity of their linguistic backgrounds. In other words,
both British and American listeners benefited from the linguistic background
they share with the native speaker.
5.5.3.1 Results
Figure 5.3 presents the mean percentage of correctly identified consonant clusters in
the responses by ten British and ten American listeners of English. The clusters were
read by one Sudanese and one British speaker of English.
As Figure 5.3 shows, both British and American listeners achieved less than optimal
identification of the English clusters read by the Sudanese speaker: correct identification
is 84 and 88% for British and American listeners, respectively. Their performance is near
ceiling for the same consonant clusters read by the native speaker: the overall mean
scores are 98% and 96%, respectively. The overall effect of speaker type (native,
non-native) is highly significant as shown by an RM-ANOVA, F(1, 18) = 24.8 (p < .001).
The results also seem to indicate that the Sudanese speaker was more intelligible to
the American than to the British listeners, but the RM-ANOVA shows that neither the
main effect of listener nationality, F(1, 18) < 1, nor the speaker × listener type
interaction, F(1, 18) = 1.8 (p = .198), reaches significance.
Figure 5.3. Percentage of correctly identified English onset and coda clusters by 10 British and 10
American listeners of English. The error bars include ±2 Standard Errors of the mean. The
consonant clusters were produced by one Sudanese and one native speaker of British English.
Tables 5.7-10 present the confusion matrices for the British and American listeners’
perception of the English onset and coda clusters produced by the Sudanese
speaker. As Tables 5.7-8 show, few errors were made in the perception of the
onset clusters; one example is the replacement of /MN/ by /IN/ by both groups of listeners.
Moreover, the British listeners replaced /RN/ by /HN/, whilst the American listeners replaced
/RN/ by /FT/ and /UY/ by /UR/. Generally, Tables 5.7-8 do not show any serious
difference in error rates between the two listener groups. This is probably
because onset clusters are generally easier to identify.
However, the British and American listeners made more perception errors in the coda
clusters, as Tables 5.9-10 show. The British listeners misidentified /UV/ as /UM/, /PV/ as
/0M/ or /OR/, and /0M/ as /PF/. Other miscellaneous errors, which showed no regular
pattern, are the misperception of /P\/ as /F\/ and of /VU, UV/ as /MF/. On the other hand,
fewer errors were made by the American listeners in the perception of English coda
clusters produced by the Sudanese speaker. These included the replacement of /UV/ by
/UM/ and of /0M/ by /PF/. This finding shows that the error frequency in the perception
of the English consonant clusters by the British and American listeners is higher
in the coda clusters.
The error rate is smaller when the English consonant clusters were read by the native
speaker in both onset and coda clusters. As the results show, the perception by both
listener groups is nearly perfect. The British listeners misperceived /UV/ as /UM/ whilst
American listeners replaced /UN/ by /UR/. The nasal cluster member /P\/ was also
mistaken for /O\/ by both listener groups.
Table 5.7 Confusion matrix of English onset clusters produced by one Sudanese speaker (targets,
in the rows) and responded to by ten British listeners (in the columns). Correct responses are on
the main diagonal, indicated in bold face.
Table 5.8 Confusion matrix of English onset clusters produced by one Sudanese speaker (in the
rows) and responded to by ten American listeners (in the columns). Correct responses are on the
main diagonal, indicated in bold face.
Table 5.9 Confusion matrix of English coda clusters produced by one Sudanese speaker (targets,
in the rows) and responded to by ten British listeners (in the columns). Correct responses are on
the main diagonal, indicated in bold face. Confusions (≥ 30%) are in shaded cells.
Target
DF I\ NO 0M PV P\ UV VU DN F\ MF NF NM NV l6 PF UM
DF 6 2 1 1
I\ 9 1
NO 9 1
0M 7 3
PV 9 1
P\ 9 1
UV 5 1 4
VU 10
Table 5.10 Confusion matrix of English coda clusters produced by one Sudanese speaker (targets,
in the rows) and responded to by ten American listeners (in the columns). Correct responses are
on the main diagonal, indicated in bold face. Confusions (≥ 30%) are in shaded cells.
DF I\ NO 0M PV P\ UV VU UM NU NV N6 OR F\ PF NH NM NF
DF 4 5 1
I\ 10
NO 6 1 2 1
0M 6 1 3
PV 2 5 1 2
P\ 7 2 1
UV 3 7
VU 10
The replacement of the onset cluster /MN/ by /IN/ is the most frequent perception error
pattern. It is most likely due to the lack of a clear voiced-voiceless distinction,
which is made across very narrow VOT boundaries. In the Sudanese L1
(Arabic) inventory, the distinction between stop consonants such as these uses VOT and
aspiration features, but these are implemented differently than in English. While both
English and Arabic fall into the two-category group of languages in terms of the
number of stop categories they contain, the two languages differ in their VOT patterns.
Arabic follows a binary system of presence or absence of glottal pulsing during the
closure period of the stop, while in English there need not be any vocal cord vibration
during the production of either member of the pair /M, I/ (Kattab 2000). Consequently,
the EFL speaker produced the English velar consonants incorrectly, which
prevented the listeners from using the appropriate acoustic cue, VOT/aspiration, to
distinguish the initial consonant in /MN~IN/. In the coda position, the misidentification
of the second consonant in clusters, as in /UV~UM/ and /PV~0M/, probably occurred
because the second cluster members share their manner of articulation
(plosives). The misidentification of /OR, 0M/ as
/PF/, in turn, reflects the effect of plosive release: weakly exploded stop consonants are
often vulnerable to confusion.
Other miscellaneous errors such as /RN~HN/, /RN~FT/, /P\~F\/ and /UV~MF/ in
onset and coda positions, which do not show a clear pattern, can possibly be
understood as the result of differences in phonotactic restrictions between English and
Arabic. Many findings in the field of non-native speech perception have shown that the
perception of speech segments is determined by two factors: language-specific and
language-universal constraints. That is, the phonotactic restrictions of each language
determine which sound sequences can occur in a syllable and which sounds can appear in
onset or coda position.
Interestingly, the findings suggest that the Sudanese speaker is more intelligible to the
American listeners than to their British counterparts, although this difference is not
significant (for the ANOVA see the results above). This may imply that the American
listeners are more familiar with the foreign
accent, or with foreign accents in general, than the British listeners are. On the other
hand, the British listeners benefited from the fact that they speak the same variety
(British English) as the native speaker, since the British listeners obtained higher scores
on the native speaker’s materials than the American listeners did.
5.5.4.1 Results
Figure 5.4 presents the mean correct scores on the SPIN test obtained by ten British
and ten American listeners. The sentences were read by one Sudanese and one British
speaker of English. Error bars (± 2 standard error, SE) are also shown. The figure also
shows the correct identification scores on components of the SPIN keywords. Separate
scores were computed for the onsets, vocalic nuclei and codas of the SPIN keywords.
Also, a composite score was computed by taking the mean of these three component
scores. Note that the composite score is always higher than the word-recognition score:
for a keyword to be counted as correctly recognized, all components had to be
identified correctly by the listener. I will present and statistically analyse only the word-
recognition scores. The component scores will be analysed in a later section when I will
make an attempt to predict word recognition from the component scores.
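The relation between the component scores, the composite score and the word-recognition score can be made concrete with a small sketch in Python. The function name and the four responses are hypothetical. Because a keyword counts as recognized only when its onset, nucleus and coda are all correct, the word score can never exceed any of the component scores, and therefore cannot exceed their mean.

```python
def score_keywords(responses):
    """Return percent-correct scores for a list of keyword responses.

    Each response is a tuple (onset_ok, nucleus_ok, coda_ok) of booleans.
    """
    n = len(responses)
    onset = 100 * sum(r[0] for r in responses) / n
    nucleus = 100 * sum(r[1] for r in responses) / n
    coda = 100 * sum(r[2] for r in responses) / n
    return {
        "onset": onset,
        "nucleus": nucleus,
        "coda": coda,
        "composite": (onset + nucleus + coda) / 3,          # mean of the three components
        "word": 100 * sum(all(r) for r in responses) / n,   # all components correct
    }

# Hypothetical responses to four keywords (not the thesis data).
responses = [
    (True, True, True),
    (True, False, True),
    (True, True, False),
    (False, True, True),
]
scores = score_keywords(responses)
print(scores)
```

In this example each component is correct in three out of four keywords, yet only one keyword is recognized as a whole.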
Figure 5.4 Mean correct recognition of keywords and components thereof by ten British and ten
American listeners of SPIN sentences produced by one Sudanese (top panel) and one British
speaker (bottom panel) of English. Error bars are ± 2 SE.
As Figure 5.4 shows, the performance of the British and American listeners is nearly
perfect on the SPIN sentences produced by the native RP speaker, with overall mean
values of 93 and 95%, respectively (right-most bar in each cluster). However, lower
word-recognition rates were obtained when the same sentences were read by the
Sudanese speaker of English: the overall means drop to 65 and 69% for the British and
American listeners, respectively. Moreover, the American listeners obtained higher
scores on the SPIN sentences than the British listeners, irrespective
of the speaker’s accent. The main effect of speaker type (Sudanese EFL versus native
British) is highly significant in an RM-ANOVA, F(1, 18) = 239.9 (p < .001). The
effect of listener type (American versus British), however, is a trend at best, F(1, 18) =
3.3 (p = .085). The speaker × listener interaction is not significant, F(1, 18) < 1.
Figure 5.4 also provides details on the listeners’ performance in the perception of the
SPIN keyword components produced by the Sudanese and the British speaker. The
correct identification of onset consonants in the keywords by the British and American
listeners is 85 against 93%, respectively, when the consonants were read by the Sudanese
speaker. The listeners responded perfectly to the same consonants spoken by the British
speaker: the mean correct score is 100% for both listener groups. The effect of speaker
type is highly significant, F(1, 18) = 90.8 (p < .001); F(1, 18) = 7.5 (p = .013) for both the
main effect of listener group and the speaker × listener interaction.
The results for the vowel nuclei show a small difference in perception between the
British and American listeners: the mean correct identification scores are 76
against 84% when the items were read by the designated Sudanese EFL speaker, and 97
against 100% when the items were read by the native speaker, F(1, 18) = 136.2 (p < .001)
for the speaker effect and F(1, 18) = 10.4 (p = .005) for the main effect of listener
nationality. However, the interaction between speaker and listener groups is a trend at
best, F(1, 18) = 3.0 (p = .099).
On the other hand, performance on the coda consonants proved to be the poorest of
all. The British listeners had slightly higher scores than the Americans when the sentences
were read by the British speaker: the mean scores are 97 against 96%. Both
listener groups obtained lower scores when the same coda consonants were read by the
Sudanese speaker: the mean scores are 69 against 75%, respectively. Again, the effect of
speaker type is highly significant, F(1, 18) = 191.2 (p < .001), whereas the effect of
listener group is not, F(1, 18) = 1.3 (p = .271). The interaction between speaker type
and listener group just fails to reach significance, F(1, 18) = 4.3 (p = .053).
Both British and American listeners obtained excellent recognition scores on simple
and predictable English sentences produced by the native RP speaker. However, the
American listeners performed slightly better than their British counterparts, regardless
of whether the materials were spoken by the Sudanese or by the native RP speaker of
English: the mean recognition scores are 69 and 95% for the American listeners
and 65 and 93% for the British listeners, respectively. The listeners’ performance is
always better when they hear the native speaker. A possible reason for the American
advantage is that the SPIN sentences, which were developed in the
USA, refer to American rather than to British everyday situations. The coda consonants
proved to be a difficult area, in which the listeners performed poorly in
comparison to the onset consonants and nucleus vowels. The correlation figures below
may provide more insight.
5.6 Correlations
Tables 5.11-12 present correlation matrices for the scores on vowels, single consonants
and cluster consonants, and for the component scores on the SPIN keywords, i.e. onset,
nucleus and coda, the mean of these three components, and the recognition score on the
entire keyword in the SPIN sentences. The correlation coefficients were computed for
the mean percent correct scores of British (upper part of tables) and American (lower
part of tables) native listeners, separately for the non-native (Table 5.11) and native
speaker (Table 5.12). The tables present linear product-moment correlation coefficients
(r) between the listeners’ perception scores for all tests and test components in the
battery.
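The product-moment coefficient described above can be illustrated with a minimal sketch. The function name and the listener scores below are hypothetical and serve only to show the computation; they are not the data of this study.

```python
import math

def pearson_r(x, y):
    """Linear product-moment correlation between two equally long score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least two scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares around the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical percent-correct scores of ten listeners (NOT the thesis data)
onset_scores = [72, 65, 80, 58, 77, 69, 83, 61, 74, 70]
coda_scores = [66, 60, 71, 55, 75, 59, 78, 62, 64, 68]

r = pearson_r(onset_scores, coda_scores)
print(f"r = {r:.3f}")
```

Each cell of a correlation matrix would be one such coefficient, computed over the listeners' scores on two tests or test components.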
Table 5.11 Correlation matrix for scores on vowels, single consonants, cluster consonants and
(components of the) SPIN test (onset correct, nucleus correct, coda correct, mean of onset +
nucleus + coda, whole word correct) read by one Sudanese speaker of English.
The correlations among the SPIN results differ with the nationality of the listener and of
the speaker (for an explanation of the concept of the correlation coefficient, see chapter
three). For the SPIN test components read by the designated Sudanese EFL learner, the
correlation between onset consonants and nucleus vowels is positive and significant for
the American listeners, r = .692 (p < .05), whilst it is positive but insignificant for the
British listeners, r = .339. These figures imply that correct perception of the vowel
nucleus is predictive of correct perception of the onset, particularly for the American
listeners. Moreover, the coda consonant component correlates positively with the onset
consonants and nucleus vowels, at r = .375 and .552 for the British listeners and at
r = .445 and .573 for the American listeners, respectively. These relations indicate that
both British and American listeners identify the onset consonants well whenever they
succeed in identifying nucleus vowels and coda consonants (and vice versa).
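Whether a coefficient of a given size reaches significance depends on the number of listeners. The sketch below converts r to a t statistic with n - 2 degrees of freedom. The group size of ten listeners is an assumption, inferred from the 18 error degrees of freedom reported for the two listener groups; the function name is hypothetical.

```python
import math

def t_from_r(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

N_LISTENERS = 10    # assumption: ten listeners per listener group
T_CRITICAL = 2.306  # two-tailed critical t for alpha = .05 with 8 df

for label, r in [("American listeners, onset-nucleus", 0.692),
                 ("British listeners, onset-nucleus", 0.339)]:
    t = t_from_r(r, N_LISTENERS)
    verdict = "significant" if abs(t) > T_CRITICAL else "not significant"
    print(f"{label}: r = {r:.3f}, t = {t:.2f}, {verdict} at the .05 level")
```

Under this assumption, r = .692 exceeds the critical value whereas r = .339 does not, which agrees with the significance levels reported in the text.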
On the other hand, we find no useful correlation between the vowel, consonant and
cluster tests and their SPIN component counterparts, a null effect we did not expect.
There is a weak correlation between SPIN coda consonants and consonants, r = .222.
Vowels and consonants otherwise show a negative association with the SPIN
components; the coda consonants, which relate positively to consonants, are the
exception. Similarly, English vowels heard by American listeners showed a relatively
high positive correlation with coda consonants, r = .625, although this correlation is not
significant. It is possible to attribute the absence of correlation between the SPIN
components and their MRT counterparts (vowels, consonants and clusters) to the
learners’ paucity of exposure to English, which leads to less consistent performance.
Table 5.12 Correlation matrix for scores on vowels, single consonants, cluster consonants and
SPIN test (onset correct, nucleus correct, coda correct, mean of onset + nucleus + coda, whole
word correct) read by one British speaker of English.
Individual vowels, consonants and clusters show some weak correlations. Clusters
showed a positive (but statistically insignificant) correlation with consonants, r = .385.
Moreover, clusters correlate positively with vowels (r = .347) and with coda consonants
(r = .326). This suggests that cluster scores are, to some extent, predictive of correct
perception of vowels and coda consonants read by Sudanese speakers and responded to
by British listeners.
5.7 Conclusions
Errors made by British and American listeners in the perception of the English front,
central and back vowels produced by Sudanese speakers were largely due to the fact that
the learners’ native language, Sudanese Arabic, distinguishes merely three vowel
qualities. These English vowels are not part of the speakers’ L1 vowel inventory, so
they represent a learning difficulty. Moreover, the learners’ limited knowledge of the
English sound-letter correspondences often leads to the misperception of these vowels.
Such perception errors often take place due to partial learning or insufficient practice.
The British and American listeners frequently confused the English fricatives /s, θ/
and /z, ð/ read by the Sudanese speaker, due to interference from the Sudanese-Arabic
source consonant system. These errors are caused by the filter effect of the speaker’s L1
Sudanese Arabic (SA) consonant inventory, in which contrasts that exist between
English consonants are not made.
British and American listeners showed no serious perception problem with English
speech sounds which were produced by the native control speaker.
The results also reflect the effect of the linguistic backgrounds of speech participants
on intelligibility. That is, native listeners are better equipped to interpret the speech of a
native talker. On the other hand, non-native talkers may produce an L2 speech sound
with an articulation basis that is typical of their L1 rather than of the target language,
which leads to misinterpretation of such a sound. This means that ESL/EFL listeners
from the same native language background as the talker will be more likely to access
the correct phonemic category than listeners who do not share the talker’s native
language.
Vowels and coda consonants (rather than consonant clusters and single initial
consonants) of English proved to be the most problematic area in the perception of
Sudanese-Arabic accented English for native (British and American) listeners.
Chapter Six
Producing the English vowels is one of the most challenging tasks for Sudanese
university EFL learners. Such learners arguably have difficulties, e.g., distinguishing
between English vowels like /e/ and /ɜː/ in words like gale ~ girl and /ɑː, æ, ʌ, ɒ/ in
words like cart, cat, cut, cot. Cross-linguistic studies have shown that segmental errors like
these frequently occur in ESL/EFL due to differences between L1 and L2 (Flege 1995,
Gilbert 1984). Many learners whose L1 lacks contrastive sounds of L2 tend to replace
L2 sounds by the nearest sound available in their L1. The vowel /u/, for
example, may be realized with significantly higher F2 values in English than in French
due to the absence of an /y/ category in English. This is probably why English /u/,
when substituted in French tous /tu/, is perceived as /y/ by native French listeners (Flege
1976). Findings such as these suggest that language-specific differences are responsible
for learning difficulties with L2 speech sounds. The lack of L2 knowledge may also
contribute to production problems of English vowels by ESL/EFL learners. This has
to do with the explicit knowledge acquired by the L2 learners through pronunciation
lessons. Most ESL/EFL classes focus on teaching language aspects such as
syntax, vocabulary and morphology to help learners grasp the structure of English
sentences. However, learning to produce correct pronunciation is not given much
attention in these syllabuses. Although a few lessons treat phoneme articulation in a
broader sense, the accompanying exercises do not address any specific pronunciation
difficulties. In these lessons, teachers ask the learners to pronounce a set of minimal
pairs repeatedly. The learners react to such pronunciation tasks reluctantly, and this is
probably why the lessons are less effective. In the Sudanese context, for example, EFL
learners receive lessons for the development of listening skills, in which tape
recordings are played. Most other communication skills are practised only inside the
classroom. Therefore, the learners do not get sufficient opportunities to practise skills
needed in real life.
To account for the processes involved in cross-language speech production like these
and to predict difficulties experienced by adult second or foreign language (L2) learners,
the spectral and temporal patterns of L2 speech sounds produced by these learners
should be examined. Instrumental studies have focused on aspects like formant frequency
(in Hz) in the production of L2 vowels by L2 learners. The focus was limited to areas of
difference, where a vowel in L2 has no counterpart in L1. However, other studies went
further and also examined the production of L2 vowels that do have a phonological
counterpart in L1, seeking to achieve several goals. Firstly, by examining the
production patterns they aimed to gain insight into the mechanisms that second (or
foreign) language learners adopt in order to deal with the target English phonological
system. Secondly, they aimed to establish the extent and nature of the similarities and
differences between the phonetic inventories of the learners’ native language (L1) and
those of the target L2 (Flege 1976). In the present study, the phonetic and acoustic
distance between English (L2) and the Sudanese EFL learners’ Arabic (L1) forms a major
factor in L2 production, which motivated the present investigation. Differences in spectral
properties (i.e., F1 and F2 formant values) between English and Arabic represent one
example: the Sudanese Arabic long vowels /iː/, /aː/ and /uː/ showed relatively lower F1
and F2 values than their English counterparts (Elobeid and Maaly 1996). This property
may influence the learners’ articulation of L2 through interference of L1 perceptual vowel
representations. Similar problems might arise in the production of L2 due to
differences in the vowel space and in temporal cues between the L1 and L2. However, if
some vowels in L1 correspond to others in L2, this should also be considered.
6.2 Methods
6.2.1 Material
Recordings were made on a laptop computer using Adobe Audition software. The
subjects were seated in a quiet room with their lips a few centimetres away from a
head-mounted close-talking microphone. They were asked to read a list of mono-
syllabic English words which included all the target English vowels. These words were
embedded in a carrier sentence (Say …again). The carrier sentence was intended to help
the subjects to speak at a constant rate. The list of items (including keywords) can be
found in Appendix 3.1. The subjects were encouraged to give their best possible
production of such words. If the experimenter suspected that an error in the
production was simply a reading error, rather than a genuine indication of the subject’s
inability to pronounce a certain word, the subject was asked to repeat the word. The
recorded material was then submitted to acoustic analysis using Praat software
(Boersma and Weenink 1996).
6.2.2 Speakers
Ten Sudanese native Arabic speakers preparing for a bachelor’s degree in English
language teaching were recruited primarily from the student population at Gadarif
University. In selecting the participants, semi-final learners who had reached a
considerable level of English were preferred. This is because they were expected to
achieve better performance. Practically, these students use English only inside the
CHAPTER SIX: ACOUSTIC ANALYSIS OF ENGLISH VOWELS 129
classroom and in other academic activities such as debates, discussions, etc. For the
control group of native speakers, the data published by Deterding (1997) was used,
which provides measurements of English vowels recorded by five male and five female
BBC broadcasters. The data is found in a directory that contains ten files in Excel
format. Each file contains the measurements of the first three formants of the eleven
monophthongal vowels of RP. Importantly, the words containing the target vowels
were not spoken in sentences but in isolation.
6.3 Procedure
When studying the details of vowel production, the customary procedure is to measure
the lowest two resonance frequencies of the vocal tract, denoted as the first and second
formants (F1, F2), respectively. F1 and F2 can be related to vowel quality in a fairly
straightforward fashion (e.g. Delattre, Liberman and Cooper 1955). F1 corresponds
closely to the degree of mouth opening (close versus open vowels) whilst F2 is a
correlate of vowel backness. The task of formant measurement was done in a number
of steps. Firstly, I roughly estimated where the formants were by looking at the
spectrogram of the stimuli, particularly the target vowels. Formant tracks were
automatically computed for the lowest three formants (F1, F2, and F3) in the frequency
range between 0 and 3200 Hz and superposed onto the spectrogram. Whenever there
was a visual mismatch between the formant tracks and the spectrogram, the model
order (number of formants required) and/or the frequency range of the Linear
Predictive Coding (LPC) analysis was changed, until a satisfactory match was obtained.
Then segmentation points were set in a text grid at the onset and offset of the target
vowel, while the number of formants to be extracted (two or three) and the frequency
cut-off (in Hz) were noted on a separate tier. Using a script, the duration and the formant
frequencies were then extracted from the recordings off-line. Formant values were
extracted at the temporal midpoint of the target vowel. 18 The data were then further
analysed with SPSS statistical software.
Then, in order to make acoustic distances between vowels in the formant space
optimally correspond to auditory distances, formant values were rescaled from hertz to
Barks (using the conversion formula advocated by Traunmüller 1990). 19
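The hertz-to-Bark rescaling can be sketched as follows. The formula is the one commonly cited from Traunmüller (1990), z = 26.81 f / (1960 + f) - 0.53; the end corrections for very low and very high values are part of that published formula, but whether they were applied in the present analysis is an assumption. The function name and the example formant values are hypothetical.

```python
def hz_to_bark(f_hz, apply_end_corrections=True):
    """Convert a frequency in Hz to Bark using the formula cited from Traunmueller (1990)."""
    z = 26.81 * f_hz / (1960.0 + f_hz) - 0.53
    if apply_end_corrections:
        # Corrections at the extremes of the scale; irrelevant for most F1/F2 values
        if z < 2.0:
            z += 0.15 * (2.0 - z)
        elif z > 20.1:
            z += 0.22 * (z - 20.1)
    return z

# Hypothetical midpoint formant values (Hz) of one vowel token
f1_hz, f2_hz = 500.0, 1500.0
print(f"F1 = {hz_to_bark(f1_hz):.2f} Bark, F2 = {hz_to_bark(f2_hz):.2f} Bark")
```

Because the transformation compresses the higher frequencies, equal distances in Bark correspond more closely to equal auditory distances than equal distances in hertz do.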
18 I gratefully acknowledge the help of Ing. Jos J.A. Pacilly, senior technician at the LUCL
Phonetics Laboratory, in writing the necessary Praat scripts.
19 The Bark scale is a psycho-acoustical frequency scale proposed by Zwicker (1961). It is
named after Heinrich Barkhausen, who proposed the first subjective measurements of
loudness. The scale ranges from 1 to 24, corresponding to the first 24 critical bands of
hearing. The band edges (in Hz) are at 20, 100, 200, 300, 400, 510, 630, 770, 920, 1080,
1270, 1480, 1720, 2000, 2320, 2700, 3150, 3700, 4400, 5300, 6400, 7700, 9500, 12000
and 15500. According to Smith and Abel (1999), Bark units represent samplings of a
continuous variation in the frequency response of the ear to a sinusoid or narrow-band noise.
6.4.1 Vowels
Figures 6.3-4 below present acoustic vowel charts of eleven English vowels produced
by Sudanese and British speakers, respectively. As a correlate of vowel height F1 (in
Barks) is plotted vertically against F2 (in Barks), which is plotted horizontally (from
right to left) as a correlate of vowel backness. Each point in the graph represents the
centroid (mean F1-F2 coordinates) in the acoustic vowel space of one vowel type,
measured at the temporal midpoint of the ten tokens produced by the Sudanese
speakers (or by a variable number in the L1 control data). In the graphs long (tense)
and short (lax) English vowels are indicated separately. The short vowels are the corner
points of the polygon with the grey shading.
Figure 6.3 Mean positions in the vowel space of English vowel tokens produced by Sudanese
speakers. Long tense vowels are linked by the unshaded polygon, whilst the short lax vowels are
shown in the shaded polygon. F1 values are plotted vertically and F2 horizontally.
Figure 6.4 Mean positions in the vowel space of English vowel tokens produced by British
speakers. Long tense vowels are linked by the unshaded polygon, whilst the short lax vowels are
shown in the shaded polygon. F1 values are plotted vertically and F2 horizontally.
It is apparent from the results that the English vowel space of the Sudanese speakers
differs from that of the native speakers. The short and long English vowels of the
Sudanese speakers are closely similar in quality (though not identical), whilst their
British equivalents are clearly dissimilar. This implies that the Sudanese speakers follow
the same track in producing the short and long English vowels, so that their acoustic
output for such vowels shows a kind of correspondence. In the vowel space, the high front
vowel /iː/ is situated close to the lower front /ɪ/. Similarly, the rounded back /ɔː/ and /ɒ/
lie close to each other, whereas for the native speakers these vowels are clearly separate,
i.e. /ɔː/ is located high back, whilst /ɒ/ tends to be low back in the vowel area. This
suggests that the vowels produced by Sudanese learners do not conform to native English
patterns. Similarly, the English long vowel /uː/ of the Sudanese speakers is produced
further back than that of the British speakers. Moreover, several Sudanese English vowels
do not show a clear learning pattern, i.e., do not look like those of the target language. As
Figure 6.1 shows, /e/ is less open and is in fact quite close to /ɪ/. The short open /æ/ is
quite near /ʌ/ and /ɑː/, unlike that of the native speakers. These types of pronunciation
problems occur due to different factors.
6.4.1.2 Discussion
The statistical analysis of the acoustic output reveals that the English vowels spoken by
the Sudanese speakers and by their British counterparts are dispersed over different
contrastive categories. Generally, this suggests that Sudanese EFL learners have
problems in implementing native English norms. In detail, one of the most interesting
findings is that the members of the English tense-lax vowel pairs /uː~ʊ/ and /ɔː~ɒ/
are very close to one another in the vowel space. This pattern of error reveals a clear
effect of the speakers’ L1 vowel system; i.e. the English tense/lax vowels were
pronounced according to the subjects’ L1 productive strategy (Mitleb 1981). On the
other hand, the English tense vowel /iː/ shows no serious production problems,
probably because it is similar to the Arabic /iː/ (see Munro 1993). The misclassification
of /e/ as /ɪ/ (Figure 6.1) indicates that this vowel has not been learned as a distinct
category. It has been argued that English /e/ has no equivalent in Arabic, so that Arab
students tend to replace it by /ɪ/ or /ʌ/ (Kopczynski and Meliani 1993). However, this
claim sounds less plausible, since previous studies have shown that Sudanese Arabic has
/e/ (Munro 1993, Dickins 2007). 20 Therefore, this type of error is most probably related
to spelling/graphical differences between English and Arabic, in that the Sudanese-Arabic
speakers pronounce English /e/ in the same way it is spelt. As a result, the English vowel
/e/ in words such as enter, envelope, wet, let, etc., is frequently mispronounced as /ɪ/ by
the Sudanese speakers. The major cause of this confusion is probably partial learning of
the English front vowels. Moreover, this type of error is also attributable to transfer of
the Arabic spelling system, which maintains a direct letter-to-sound correspondence.
This means that each vowel or consonant letter of Arabic has one sound, which
corresponds to its spelling, and that there are no silent (unpronounced) letters.
The fluctuation of the English front low short vowel /æ/, which is physically shown in
a mid position between // and /ʌ/, points to the lack of this vowel type in the
learners’ L1 vowel inventory (Brett 2004). This type of problem may exist due to
differences in vowel realization between English and Arabic. In a related study of
Arabic vowels, the Sudanese informants tended to produce Arabic vowels, e.g., /aː/
(which typically sounds like /æ/), with rising tones (Alghamdi 1998). Arguably, this is one
of the reasons why Arabic speakers are frequently advised to keep the English /æ/ fully
front to avoid confusion with /ʌ/ (Cruttenden 2008).
The lack of vowel contrasts in Arabic makes the learning of English vowels difficult.
Arabic and English show similar simple syllable nuclei in that both show phonetically
short and long vowel patterns. But because Arabic has fewer contrasts, the range of
20 Sudanese Arabic also developed monophthongs. These include /e/, which historically
descends from the diphthong /aj/ as in /ajn/ ‘an eye’, which coalesced (merged) in dialects
such as Cairene and Central Sudanese. In Arabic varieties spoken in large parts of the
Levant these vowels are realized as /Gu/ or /nt/. In Sanani and a number of Peninsula
dialects, the diphthongs are maintained in all phonological contexts. Moreover, among
some Cairene speakers the monophthongs are shortened in closed syllables to give short
/e/ or /ɒ/, hence they are not considered to be separate vowels (Watson 2002).
allophonic variation of each vowel phoneme is greater than that of English; e.g., Arabic
/a/ has allophones within the area bounded by /ɛ, æ, ɑ, ʌ/. Thus, English contrasts
such as bet-bat, cat-cot, cot-cut, cot-caught cause difficulty (Lehn and Slager 1983). All in all,
error patterns such as these are often accounted for on the basis of differences in
formant values between L1 and L2, as previous studies have shown. These
differences result in incorrect articulation of L2 vowels (Liberman et al. 1957, Scholes
and Robert 1968).
Figure 6.5 presents the mean durations of English vowel tokens of Sudanese university
students and native speakers of English. Duration values are arranged in descending
order from left to right. Durations are measured in milliseconds. In the figure, the
native speakers’ vowel durations appear longer than those of the Sudanese speakers,
because the native tokens were spoken in isolation.
Figure 6.5 Mean duration (s) of English vowels produced by Sudanese (square markers) and
native (circles) speakers of English, broken down by vowel type.
Z-normalization was used to get more insightful vowel duration values (see
normalization above). The computation of correlation coefficients revealed a strong
positive relationship between the Sudanese speakers’ mean vowel durations and those
of the native speakers (r = .943, p < .01). The mean duration values of the
pure English vowels produced by Sudanese speakers are as follows: /ɪ/ 59 ms, /iː/ 145
ms, /e/ 69 ms, /ɒ/ 108 ms, /ɔː/ 199 ms, /ʊ/ 90 ms, /uː/ 159 ms, /æ/ 150 ms, /ʌ/ 81
ms, /ɜː/ 109 ms and /ɑː/ 211 ms (see Appendix 6.1 for individual vowel durations and
mean norm vowel durations). The strong correlation implies that the English vowel
durations of Sudanese speakers correspond relatively well to English vowel duration
norms (see Catford 2001, Jacewicz, Fox and Salmons 2006). In other words, the
tense/long English vowel durations of Sudanese learners correspond to the longest
native RP durations whilst the lax/short ones correspond to the shortest durations.
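As a minimal sketch of z-normalization, the eleven mean durations listed above can be expressed as z-scores relative to their own mean and standard deviation. The use of the sample standard deviation and the normalization over vowel means (rather than over the tokens of each speaker) are assumptions made for illustration only; the function name is hypothetical.

```python
from statistics import mean, stdev

def z_normalise(values):
    """Express each value as its distance from the mean in standard deviations."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (an assumption of this sketch)
    return [(v - m) / s for v in values]

# Mean durations (ms) of the eleven vowels, in the order listed in the text
durations_ms = [59, 145, 69, 108, 199, 90, 159, 150, 81, 109, 211]

z_scores = z_normalise(durations_ms)
for d, z in zip(durations_ms, z_scores):
    print(f"{d:4d} ms -> z = {z:+.2f}")
```

After normalization the values have a mean of zero and a standard deviation of one, so that the durations of speakers with different overall speaking rates can be compared on one scale.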
The observed correspondence fits the assumption that the Arabic tense-lax vowel
categories resemble those of English in terms of quality and duration. However, the
resemblance is not perfect since each of the two languages possesses distinctive
acoustic features (see Elobeid and Maaly 1996). In previous studies, Sudanese
speakers showed an English vowel duration ordering similar to that of native speakers,
in particular in tense/lax vowel pairs; however, in terms of vowel quality (location in the
F1-by-F2 space) the members of these pairs were insufficiently distinct from one another.
This is likely because the Sudanese learners incorrectly interpret English tense/lax vowels
in terms of Arabic temporal properties (Mitleb 1984, Munro 1993). In terms of acoustic
cues, the Arabic long/short vowel distinction can best be described as a tense-lax contrast
based on quantity (Alghamdi 1998, Flege and Port 1981, Hassan 2003, Kopczynski and
Meliani 1993, Walkers 2001). 21 On the other hand, in English, the distinction between
the tense-lax vowel pairs is primarily a qualitative difference as perceived by native
speakers (Carrs 1999, Catford 2001, Cunningham-Anderson 2003). Thus, cross-linguistic
differences such as these potentially lead to difficulty for ESL/EFL learners.
The results also imply that the Sudanese speakers are aware of the English long/short
vowel contrast but have difficulty implementing the exact acoustic norms of the
English vowels. Moreover, the poor performance in this area could be attributed to the
speakers’ relatively limited exposure to English vowel sounds.
21 Vowel quantity is defined as that phonological distinction of a vowel relative to one or more
other vowels of similar timbre in the language. Contrasts in vowel quantity are often acoustically
realized by the duration of vowels, where a long vowel quantity has a duration that extends to
twice that of a short vowel. The greater duration associated with a long vowel quantity also
allows the possibility for a more extreme articulation than a corresponding short vowel quantity.
Consequently, the vowel spectrum, in particular the first and second formant frequencies, and
therefore perceived timbre, may also be affected by vowel quantity (Takayuki et al. 1999).
Since perception data are at this moment available only for the English vowel tokens of a
single (representative) Sudanese-Arabic EFL speaker, I would like to make an educated
guess of how native English listeners would identify all the Sudanese L2 English vowel
tokens collected in this study (or how Sudanese L2 listeners would identify the L1
English vowels produced by many different RP speakers). In order to do so, Linear
Discriminant Analysis (LDA) will be used. LDA (Klecka 1980, Strange, Bohn, Trent
and Nishi 2004) is an automatic classification technique that can be trained to optimally
classify the vowel tokens in this study in terms of the English vowel categories. In the
training stage of the analysis, exemplars of L1 tokens of English were fed to the
algorithm, in terms of F1 and F2 (Bark transformed and subsequently z-normalised
within speakers) as well as vowel duration (z-transformed). As the results will show,
the algorithm, once trained on the native English vowel data, achieved a good
classification of the native English vowel tokens (76% correct identification; chance
would be 9% correct, i.e. 1 in 11). Then the same algorithm (optimized for L1 English
vowel categories) was used to classify the Sudanese L2 English vowel tokens. In this
way, the LDA functions as a model of a typical native L1 listener, on the assumption
that an L1 listener knows where the vowel tokens in his language are typically located
and how far individual vowel tokens may stray from their prototypes (i.e.
centroids in the F1-by-F2(-by-duration) space). I have also repeated the process and
trained the model with Sudanese L2 English tokens; it was then examined how well this
LDA model identified the vowels spoken by Sudanese learners and by native speakers
of English.
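The train-on-native, test-on-learner logic can be sketched with a simplified discriminant classifier. This is an illustration under stated assumptions, not the procedure actually run for this study: it uses only two predictors (F1 and F2 in Bark) instead of three, assumes equal prior probabilities for all vowel categories, and all token values, labels and function names are hypothetical. With equal priors, LDA assigns a token to the category whose centroid is nearest in terms of the Mahalanobis distance based on the pooled within-category covariance matrix.

```python
from collections import defaultdict

def train_lda(tokens):
    """tokens: list of (label, (f1, f2)). Returns centroids and inverse pooled covariance."""
    groups = defaultdict(list)
    for label, point in tokens:
        groups[label].append(point)
    centroids = {lab: (sum(p[0] for p in pts) / len(pts),
                       sum(p[1] for p in pts) / len(pts))
                 for lab, pts in groups.items()}
    # Pooled within-category covariance matrix [[a, b], [b, d]]
    a = b = d = 0.0
    for lab, pts in groups.items():
        cx, cy = centroids[lab]
        for x, y in pts:
            a += (x - cx) ** 2
            b += (x - cx) * (y - cy)
            d += (y - cy) ** 2
    dof = len(tokens) - len(groups)
    a, b, d = a / dof, b / dof, d / dof
    det = a * d - b * b
    inverse = ((d / det, -b / det), (-b / det, a / det))
    return centroids, inverse

def classify(model, point):
    """Assign the point to the category with the nearest centroid (Mahalanobis distance)."""
    centroids, inv = model
    def dist2(c):
        dx, dy = point[0] - c[0], point[1] - c[1]
        return (dx * (inv[0][0] * dx + inv[0][1] * dy)
                + dy * (inv[1][0] * dx + inv[1][1] * dy))
    return min(centroids, key=lambda lab: dist2(centroids[lab]))

def percent_correct(model, tokens):
    hits = sum(classify(model, point) == label for label, point in tokens)
    return 100.0 * hits / len(tokens)

# Hypothetical (F1, F2) values in Bark for three vowel categories
native_tokens = [("i", (2.5, 13.5)), ("i", (2.8, 13.8)), ("i", (2.6, 13.2)),
                 ("a", (6.8, 9.5)), ("a", (7.1, 9.9)), ("a", (6.9, 9.2)),
                 ("u", (3.0, 6.5)), ("u", (3.3, 6.9)), ("u", (2.9, 6.2))]
learner_tokens = [("i", (2.9, 13.0)), ("a", (4.0, 7.5)), ("u", (3.2, 6.0))]

model = train_lda(native_tokens)  # the "native listener" model
print(percent_correct(model, native_tokens))
print(percent_correct(model, learner_tokens))
```

In this toy example the learner token intended as "a" lies closest to the native "u" centroid and is therefore counted as a confusion, which is the kind of outcome that the confusion matrices record.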
Tables 6.1-4 below show the results of the LDA in confusion matrices.
Table 6.1 Confusion matrix of native (RP) English vowels classified by Linear Discriminant
Analysis. The algorithm was trained and tested on RP vowels (76.4% correctly
classified vowel tokens). Correctly classified vowels are on the main diagonal (bolded).
In the rows of the matrices, the vowel types are listed as intended by the speakers,
whilst in the columns the vowel types identified by the LDA are displayed as the most
likely category. As a result, the main diagonal in the matrix contains the correct
identifications, while confusions are found in the off-diagonal cells. I will first examine
Table 6.1, which contains the results of the LDA when trained and tested on L1
English vowels.
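The layout of such a matrix, with intended categories in the rows and assigned categories in the columns, can be sketched as follows. The labels, tokens and function names are hypothetical and serve only to show how the overall percentage correct follows from the main diagonal.

```python
from collections import Counter

def confusion_matrix(intended, classified, labels):
    """Rows: vowel intended by the speaker. Columns: vowel assigned by the classifier."""
    counts = Counter(zip(intended, classified))
    return [[counts[(row, col)] for col in labels] for row in labels]

def percent_correct(matrix):
    """Share of tokens on the main diagonal, as a percentage of all tokens."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return 100.0 * correct / total

# Hypothetical classification outcome for five tokens
labels = ["i", "u", "a"]
intended = ["i", "i", "u", "u", "a"]
classified = ["i", "i", "u", "i", "a"]

matrix = confusion_matrix(intended, classified, labels)
for label, row in zip(labels, matrix):
    print(label, row)
print(f"{percent_correct(matrix):.1f}% correct")
```

With eleven vowel categories, a classifier that assigns categories at random would be expected to score 1 in 11, i.e. about 9% correct.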
Table 6.1 shows that correct classification of vowel type ranges between 60% (for /ʊ/)
and 97% (for /iː/), with an average of 76.4%. The strongest confusion is found
between /uː/ and /ʊ/: the tense vowel is misclassified as its lax counterpart in 25% of
the cases and the lax member is confused with the tense member in 19%. Even though the
classification is imperfect (as would be the classification by human listeners), I may
now classify the Sudanese L2 tokens by applying the native classification schema. The
results are presented in Table 6.2.
Table 6.2 Confusion matrix of Sudanese accented English vowels classified by Linear
Discriminant Analysis. The algorithm was trained on RP data but tested on Sudanese-Arabic
accented L2 vowels (42% correct vowel classification). Correctly classified vowels are on the
main diagonal (bolded). Confusions ≥ 30% are indicated in grey-shaded cells.
The performance of the LDA in Table 6.2 was poor (42% overall correct vowel
identification) in comparison to the previous one (76.4%). Similar types of errors
recurred: /uː/ was almost always replaced by /ɔː/ and less often by /ʊ/, and /ɔː/
by /ɒ/. Other frequent errors were the misclassifications of /ɪ/ as /iː/, /e/ as /ɪ/, and
/æ/ as /ʌ/, /ɑː/ or /ɜː/; finally, /ɜː/ was misidentified as /e/ and less often as /ʌ/ and
/ɒ/. The last analysis is an LDA trained on L2 data and used to classify native English
vowels.
Table 6.3 Confusion matrix of Sudanese accented English vowels classified by Linear
Discriminant Analysis. The algorithm was trained and tested on Sudanese-Arabic accented EFL
vowels. Correctly classified vowels are on the main diagonal (bolded). 54.7% of the vowel tokens
were correctly classified. Confusions ≥ 30% are indicated in grey-shaded cells.
Table 6.3 shows that many of the English vowels produced by the Sudanese speakers
were misclassified, with a mean of 54.7% correct and many confusions. For example,
/ɪ/ was misclassified as /iː/ (57% confusion), /ɒ/ as /ɔː/ or /ʊ/, /ʊ/ as /ɒ/, /ʌ/ as
/ɑː/ or /ɔː/, and /e/ as /ɪ/ (46%). The results also showed that /ɜː/ was
misclassified as /e, æ, ʌ, ɔː/. Interestingly, there were no serious errors in the
classification of /iː/. There are other slight misclassifications of the English vowels
produced by the subjects, which do not reflect a clear error pattern.
In Table 6.4 the classification was even poorer (48.7% correct) when the same English
vowel tokens were identified automatically in native listeners’ terms. For instance, /ɒ/
was misclassified as /ɔː/ and sometimes as /ʌ, ɑː, uː, ʊ/. Moreover, /ɔː/ was almost
always misclassified as /uː/ and less often as /ʊ/, whilst the members of the tense-lax
pair /uː~ʊ/ were misclassified as each other. Automatic identification also shows that the
lax vowel /ɪ/ is often replaced by /e/ or vice versa. Furthermore, the English vowel tokens
/e, ʌ, ɜː, æ, ɑː/ were interchangeably substituted for one another; however, the English
vowel pair /ɪ~iː/ was rarely confused.
Table 6.4 Confusion matrix of native English vowels classified by Linear Discriminant
Analysis. The LDA was trained with L2 vowels but tested on L1 vowels. Correctly classified
vowels are on the main diagonal (bolded). 48.7% of the vowel tokens were correctly classified.
Confusions ≥ 30% are indicated in grey-shaded cells.
In conclusion, the classification matrices show that the production of English vowels
proved to be problematic for the Sudanese speakers, whereas the results for the native
speakers revealed better performance, as Table 6.1 shows. These results suggest
that the Sudanese speakers do not follow a clear learning pattern, probably
because these types of vowels are lacking in Arabic. The data also bear out the
prediction that Sudanese listeners/speakers are more intelligible to each other than to
the native English speakers and vice versa, which reflects the inter-language
speech-intelligibility effect, in which speech participants benefit if speakers and
listeners share the same native language. 22
22 Inter-language means using a language system which is neither the L1 nor the L2. It is a third
language, with its own grammar, its own lexicon and so on. The rules used by the learner are to
be found neither in his own mother tongue nor in the target language. In this context,
inter-language describes the possibility that, in interactions, listeners can explicitly categorize
unfamiliar speakers by regional dialect or linguistic background (Van Heuven and Wang 2007).
Obviously, for English native listeners, native speakers of English are most intelligible.
Similarly, non-native listeners find non-native speakers with the same linguistic background
more intelligible than native speakers. This is called the matched inter-language speech
intelligibility benefit. On the other hand, the degraded level of intelligibility that occurs
between native and non-native speech participants is referred to as the mismatched
inter-language speech intelligibility benefit (Bent and Bradlow 2003).
140 TAJELDIN ALI: SPEECH INTELLIGIBILITY PROBLEMS OF SUDANESE EFL LEANERS
6.5 Conclusions
The articulation of /e, ʌ, ɜː, ɒ, ɑː, uː, ɔː, ɪ, æ/ proved to be difficult: the subjects
performed poorly on these vowels. However, remarkably few errors were made in the
pronunciation of the tense vowel /iː/, probably because the Sudanese speakers
have a close equivalent of this vowel in their L1.
Unlike the native speakers’ vowels, Sudanese EFL learners’ vowels are mostly
characterized by lower formant values (probably due to inventory differences
between L1 and L2). The speakers need to expand their vowel inventory to produce
less foreign-accented English vowels.
The English short/long vowel durations of the Sudanese learners show a similar
ordering to those of the native English speakers. However, some vowel durations are
slightly lengthened, probably because the learners tend to produce
English vowels with their L1 production strategies.
Both speaker types benefit from their native-language backgrounds (inter-language), as was
shown by the results of applying automatic vowel classification to native English and
Sudanese-accented EFL vowel tokens after the classification algorithm had been
trained with native and EFL data. If it is accepted that the automatic classification
procedure mimics the performance of a human (native or non-native) listener, the
results support the hypothesis that both the Sudanese and the British speakers are
more intelligible when they are perceived by (simulated) listeners with the
same native-language background.
Production errors detected in this study went in different directions, which suggests
that the Sudanese learners of English do not follow a clear learning pattern.
Chapter Seven
7.2 Objective
The objective of the study is to find experimental evidence for the production
problems with English consonants spoken by Sudanese university EFL learners. It has
been argued that pronunciation difficulties arise from differences between L1 and L2
speech sounds. These difficulties not only result in pronunciation problems but also
lead native English listeners to perceive unintended English speech sounds,
causing intelligibility problems.
The data obtained can help us understand which English consonants are the most difficult
to produce and what the causes of these difficulties are. It should thus be possible to
obtain cognitive insights into the L2 production problems and to utilize these insights
for pedagogical purposes.
7.3 Methods
7.3.1 Material
Stimuli comprised a list of CVC words embedded in the carrier phrase Say … again.
The consonants were plosives, fricatives and affricates produced by 11 Sudanese
university EFL learners (see Appendices 3.2a-b). The nasals and semivowels were
excluded as they were not expected to present production problems. Moreover, the
onset C1 could be each of the possible onset consonants specified above (i.e. excluding
/ŋ/ and /ʒ/ because they do not occur in initial position). Similarly, C2
could be each of the possible coda consonants, i.e., excluding /h/, /j/,
/w/ and /r/, which do not occur in coda position. Additionally, /ʒ/ was not tested in coda
position, even though it occurs in words such as beige or rouge. Such French loans are
too infrequent to warrant the inclusion of /ʒ/.
7.3.2 Participants
Eleven male Sudanese Arabic speakers were recruited primarily from the student
population at Gadarif University. In selecting the subjects, I focused on semi-final
students who had reached a considerable level of English proficiency and, hence, from
whom a relatively good performance was expected. These students specialized in
English. In general, they used it inside the classroom and in other academic activities
such as debates, discussions, etc.
As there were no native speakers at hand, I relied on published native-speaker
data relevant to this study for comparison purposes.
CHAPTER SEVEN: ACOUSTIC ANALYSIS OF ENGLISH OBSTRUENTS 143
Voice onset time (VOT) data of Docherty (1992) were used for comparison purposes.
Docherty’s data were provided by five male native speakers of Southern British English,
aged between 18 and 22. These were students preparing for a bachelor’s degree at
Edinburgh University who had been educated and brought up in South-East England.
None of these subjects had a regional accent and there were no systematic
differences between them.
I used the centre of gravity (COG) and spectral SD values of Maniwa et al. (2009). This
study included eight fricatives of English recorded by 20 male and female native
speakers of American English (aged 19-34). The fricatives were embedded in /ɑCɑ/
non-words. Each non-word was recorded in isolation in a conversational and in a clear
speaking style. Data on the preceding vowel duration were extracted from House
(1961), English consonant durations from Catford (1977) and peak intensity data from
Ball and Rahilly (1999).
7.4 Procedure
Materials were recorded on a laptop computer using Adobe Audition. The subjects
were seated in a quiet room with their lips a few centimetres away from a head-
mounted close-talking microphone. They were asked to read a list of monosyllabic
English words which included all the target English consonants. These words were
embedded in a carrier sentence (Say …again). The carrier sentences were intended to
help the subjects to speak at a constant rate. Moreover, keywords were provided in the
list along with the target words as a guideline to help learners achieve correct
pronunciation (see Appendices 3.2a-b). The subjects were encouraged to give their best
possible production of the words. If the experimenter suspected that an error in the
production was simply a reading error, rather than a genuine indication of the subject’s
inability to pronounce a certain word, he asked the subject to repeat the word. The
recorded materials were then submitted to acoustic analysis using Praat software.
7.4.2 Praat
For speech analysis, the Praat speech-processing programme was used. Praat is an
open-source software tool for speech signal editing and labelling, as well
as for various acoustic (spectral, formant and duration) analyses and manipulations
(Boersma and Weenink 1996). It has the further advantage of being easily adaptable for
specific research purposes; results can also be exported to Excel-compatible
spreadsheets for offline statistical analysis.
This section presents the acoustic characteristics of the English plosives, fricatives and
affricates produced by Sudanese EFL learners. The measurements included the voice
onset time (VOT), duration of the preceding vowel in different consonant environments,
consonant duration, peak intensity, centre of gravity (COG) and the standard
deviation (SD) of the spectrum (explained in § 7.5.8).
English plosives differ from other consonants in their manner of articulation. That is to
say, they can be characterized acoustically by three main phases: (i) a
closure leading to silence (the silent interval), (ii) a release noise burst and (iii) a fast
movement of the articulators into or away from the vowel. In the first phase,
a perceptible period of silence appears throughout the whole spectrum. However, in
the voiced plosives /b, d, g/ there is usually only a near-absence of energy, as some
low-frequency energy is maintained during the closure. This low-frequency energy
distinguishes the voiced plosives from their voiceless counterparts by the presence of a
voice bar, which appears in the spectrogram below 250 Hz. As a result, the closure release
has a relatively higher intensity in voiceless than voiced stops because at the moment of
release, intra-oral pressure is lower in voiced stops than the voiceless stops. Release is
the second phase. It causes a rapid escape of air, which in turn gives rise to random
pressure variations, i.e. a noise burst. According to Kent, Dembowski and Lass (1996)
there is a short period of varying constriction of the upper vocal folds, which occurs
immediately after release and which results in a post-release fricative-like periodic
sound. For the voiceless plosives /p, t, k/, there is usually a higher onset or offset in
fundamental frequency into the following and/or preceding vowel. Moreover, there is
likely to be a marked rising bend of F1 of the adjacent vowel in the case of /b, d, g/ that
is not as marked in the case of /p, t, k/. Furthermore, distinctions between different
plosives, i.e. bilabial, alveolar and velar stops, are indicated by the noise frequency of
the burst that appears at the onset of the release stage together with bends of F2 and F3,
which are also known as formant transitions. Such transitions move into the following
or preceding vowels. These stages overlap rather than being discrete
and are not necessarily evident in any individual stop token. In the third
phase, the articulators move apart from each other, giving rise to turbulence due to airflow
through the glottis (aspiration). This airflow starts just before the onset of the vocal
fold vibrations. The next section provides details and illustrations of the plosives’
VOT.
Voice onset time (VOT) is a term which is widely used to describe the timing of voicing
in stops. It refers to the interval (in ms) which exists between the release of the stop
closure and the start of the voicing for a following voiced segment. The voice onset
time is used as an acoustic parameter to distinguish syllable-onset cognates in many
languages of the world (Docherty 1992). Figure 7.1 provides spectrographic illustrations
of the voice onset time of some English initial plosives: bab, dad and gag. In the pattern
shown, the voice onset time codes the voicing category. As the spectrogram shows,
there is no voicing during the closure of any of the three initial plosives (0 ms VOT).
Immediately after the silence (shown as a white bar in Figure 7.1), there is a
burst of energy (a noise burst, shown in Figure 7.1 as a dark line between the
silence and the vowel bar) followed by voicing. The vowel sound appears as a wide
black bar, which normally follows the burst. The time from the
burst to the beginning of the following vowel is the voice onset time (VOT).
Failure of ESL/EFL speakers to produce plosives like /p, t, k/ with long-lag
VOT values that correspond to the values of the native speakers in the same
phonetic context is detectable by native speakers. Such differences in VOT values
contribute to a foreign accent and to intelligibility problems.
Figure 7.1 An illustration of the voice onset time (VOT) in native English plosives.
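The definition of VOT given above (the interval between the release of the stop closure and the start of voicing) can be illustrated with a minimal Python sketch. The function names, the landmark times and the 30 ms boundary between short-lag and long-lag values are assumptions made for illustration only; they are not taken from this study or from Praat.

```python
def voice_onset_time_ms(release_s, voicing_onset_s):
    """VOT in ms: time of voicing onset minus time of the stop release (burst)."""
    return (voicing_onset_s - release_s) * 1000.0


def vot_category(vot_ms, long_lag_boundary_ms=30.0):
    """Label a VOT value; the 30 ms boundary is an illustrative assumption."""
    if vot_ms < 0:
        return "voicing lead"  # voicing starts before the release (negative VOT)
    if vot_ms < long_lag_boundary_ms:
        return "short lag"
    return "long lag"


# Hypothetical landmark times (in seconds), not measurements from this study.
vot = voice_onset_time_ms(release_s=0.250, voicing_onset_s=0.310)
print(round(vot), vot_category(vot))
```

A negative result corresponds to pre-voicing during the closure, a small positive result to a short-lag stop, and a large positive result to an aspirated (long-lag) stop.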
In this section, I present the voice onset time (VOT) of English plosives produced by
Sudanese EFL learners. Figure 7.2 shows the VOT of English initial and coda plosives
produced by Sudanese EFL learners. Figure 7.3 shows the voice onset time (VOT) of
English plosives which were produced by native speakers (data from Docherty 1992).
Both datasets were produced in a carrier phrase (Say … again).
Figure 7.2 presents the Voice Onset Time (VOT, in ms) measured for the English
plosives produced by 11 Sudanese EFL learners for onset and coda consonants
separately and broken down further by place of articulation.
Figure 7.2 Mean Voice onset time (VOT, in ms) of English plosives produced by 11 Sudanese
learners of English. Onset and coda stops are plotted separately and broken down by place of
articulation. Error bars are ±2 standard errors of the mean.
Figure 7.3 presents VOT measurements obtained from a group of native English
control speakers (Docherty 1992). These measurements were made for plosives in onset
position only. For the sake of comparison I copied the corresponding EFL values into
Figure 7.3, which therefore partially repeats the EFL data in Figure 7.2.
Figure 7.3 shows that the English VOT of the Sudanese learners is very different from
the English norm. The difference is highly significant by a paired t-test, t(5) = 13.7 (p
< .001).
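For readers unfamiliar with the paired t-test reported above, the statistic can be sketched as follows. This is a minimal illustration, assuming that the test pairs the native and learner mean VOT of each onset plosive (six pairs would give the reported 5 degrees of freedom). The function name and the numbers in the example are placeholders, not the data of this study.

```python
import math
from statistics import mean, stdev


def paired_t(first, second):
    """Return (t, df) for a paired t-test on two equally long sequences."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two sequences of equal length (at least 2 pairs)")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    standard_error = stdev(diffs) / math.sqrt(n)  # sample SD of the differences
    if standard_error == 0:
        raise ValueError("all differences are identical; t is undefined")
    return mean(diffs) / standard_error, n - 1


# Placeholder mean VOT values (ms) for six onset plosives; NOT the thesis data.
native_ms = [15.0, 21.0, 27.0, 42.0, 64.0, 62.0]
learner_ms = [5.0, 8.0, 12.0, 30.0, 40.0, 50.0]
t, df = paired_t(native_ms, learner_ms)
print(f"t({df}) = {t:.2f}")
```

The p-value is then obtained from the t distribution with the returned degrees of freedom, for which a statistics package would normally be used.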
Figure 7.3 Mean Voice Onset Time (VOT, in ms) in initial plosives produced by British (native)
and Sudanese (non-native) speakers of English. Data of native speakers were extracted from
Docherty (1992). Sudanese EFL data are my own (see Figure 7.2).
The learners’ VOT values for both voiced and voiceless stops fall almost entirely within the short-lag
range of the continuum. This is quite different from the native speakers (Figure 7.3),
whose voice onset time falls within the short-lag range for the voiced plosives, whilst
that of the voiceless counterparts falls in the long-lag range. This finding reveals
systematic acoustic differences between English and the Sudanese learners’ L1 (Arabic).
The distribution of their VOT values shows a different organization from that of
English, which probably reflects different categorical distinctions between English and
Arabic. Similar findings were reported by Flege and Port (1980), Fokes, Bond and
Steinberg (1985) and Khattab (2002). They demonstrated that the acoustic correlates of
voicing for onset stops in Arabic are the presence of glottal pulsing (pre-voicing/
phonation), which occurs during the closure interval for voiced stops (i.e. negative
VOT), and presence of a noise burst for unvoiced stops (i.e. short positive VOT). On
the other hand, in English, the voicing contrast is shown by the presence of silence (in
the oscillogram this appears as an entirely flat line) during the closure interval followed
by a short noise burst for voiced plosives (i.e. short positive VOT) and aspiration for
voiceless plosives (i.e. long positive VOT). One more aspect of difference is that the
learners produced relatively shorter VOT values for the English voiceless stops in the
coda position than in the onset, which shows a feature of L1 influence. That is, Arabic
speakers of English tend to show hardly any voicing contrast in coda plosives. However,
the voiced onset and coda plosives showed similar results; coda VOT values are shorter
than in onsets, which prompts caution about such a finding. More interestingly, even at
the individual speaker level, the overall mean VOT values for the English plosives
(4, 6, 9, 11, 12, 5, 0, −5, −14, −4 and −55 ms for the eleven individual speakers; for
the individual mean voice onset time values of each plosive see Appendix 7.1) tend to be
shorter than those of the native speakers. This is due to the use of Arabic acoustic correlates
(Flege and Port 1981). These findings suggest that the VOT data of the Sudanese
learners are unstable, showing no length pattern similar to that of the native English
VOT norm. This implies that the learners have difficulty acquiring the English voicing
contrast properly, particularly for the coda consonants. They also suggest that the
learners do not adopt a clear learning pattern in the production of the English plosives,
due to partial learning. Flege (1976) concluded that VOT differences between Arabic
and English are neither due to the confounding factor of vowel context which requires
speakers to produce English stops with longer VOT values, nor to a lack of experience.
Flege described such differences as the result of wrong phonetic representations (these
are normally L1 sound categories) which L2 learners use as guides for the production of
L2 speech sound categories. Nevertheless, it appears that the learners often acquire
some English voicing contrast features correctly; i.e., the VOT values of some stops
increase as the articulation moves further back in the mouth. This property was fully
shown by the voiceless onset plosives /p, t, k/, whose voice onset time values increase
monotonically with place of articulation: 34, 42 and 53 ms, respectively. The voiceless
coda stops showed a similar rank order for /p/ and /t/ VOT values (0 and 3 ms,
respectively); however, the /k/ VOT value is 0. Similarly, the voiced coda plosives /b, d,
g/ do not follow the predicted effect of place of articulation, since their VOT values are 8,
7 and 0 ms, respectively. These data strongly suggest that Sudanese EFL learners have
insufficient experience with English.
Figure 7.4 presents the duration (in ms) of the vowels preceding voiced versus voiceless
target consonants, as spoken by the Sudanese learners. The consonant types have been
broken down by position in the syllable (onset versus coda) and by manner of
articulation (plosive, fricative, affricate).
Figure 7.4 Mean duration of the English vowel preceding onset and coda plosives, fricatives and
affricates produced by Sudanese learners. Error bars are ±2 standard errors of the mean.
Figure 7.4 shows that the Sudanese learners’ English vowel durations preceding voiced
and voiceless plosives broadly correspond to the English norms for the preceding
vowel duration, but the difference between the voiced and voiceless members in each
pair never reaches significance. Even the largest difference found in any of the six pairs,
i.e., in coda affricates, falls short of significance, t(19) = 1.5 (p = .154, two-tailed). The
mean duration of the vowels is longer before voiced plosives and shorter before
voiceless stops in both onset and coda positions, a finding that concurs with
Cruttenden (2008) and Dretzke (1998) (see also Appendix 7.3). Similarly, the duration
values of the vowels preceding onset fricatives show a distinctive pattern. This is
because vowels preceding the voiced fricatives are longer than vowels before the
voiceless counterparts. However, fricatives and affricates show variation in the duration
of the preceding vowels. This appears clearly in the coda fricatives and onset affricates
where the duration values of the preceding vowel do not conform to the pattern that is
expected of the English voicing contrast. Affricates also show similar differences,
where the durations of the vowels preceding onset affricates violate the native English
norm, although coda affricates reflect the correct voicing contrast similar to that of the
native speakers. Unstable duration values such as these probably occur due to the
influence of the speaking style. One more interesting finding is that duration values are
nearly equal for vowels followed by alveolar and dental fricatives, particularly in coda
position. This might be due to an incorrect production of English fricatives, which
probably resulted from the voicing influence of the source consonants. That is, learners
tend to substitute their L1 counterparts, in which the boundaries between /s/ and /θ/ and
between /z/ and /ð/ are blurred (see Dickins 2007, Watson 2002), probably because
Arabic speakers of English often fail to implement a proper voicing
contrast with final English consonants. In both cases, the ultimate result is an
incorrect preceding vowel duration.
Figure 7.5 presents the mean duration values of the English onset and coda plosives,
fricatives and affricates produced by Sudanese learners.
Figure 7.5 Total mean duration values of the English onset and coda plosives, fricatives and
affricates produced by Sudanese learners. Error bars are ±2 standard errors of the mean.
Figure 7.5 shows that, generally, the English voiceless consonants produced
by Sudanese learners have longer duration values than their corresponding voiced
consonants in both onset and coda positions; coda affricates differ, though. Moreover,
onset consonants show longer duration values than coda consonants. These findings
reveal that the voicing contrast of such consonants shows a pattern that consistently
corresponds to results in the literature on native English consonant duration (Catford
1977, Cruttenden 2008, House 1961). Moreover, affricates show similar duration
features in onset position but differ in codas. However, the English consonant
durations of the Sudanese learners tend to be longer than the durations of the native
speakers. The overall mean durations of the individual consonant tokens produced by
the Sudanese EFL learners and those produced by native speakers concur with these
results (see Appendix 7.4).
Intensity correlates with the (perceived) loudness of a sound. Intensity is the square of
the amplitude of the sound wave averaged over a moving time window that
should be long enough to include at least two glottal pulses. It is determined by the size
of the variation of air pressure, and is conveniently expressed in decibels (dB)
(Ladefoged 2003). Voiced sounds have greater intensities at low frequencies (typically
below 1000 Hz) than voiceless sounds. This makes (low-frequency) intensity a
relative cue that can be used for distinguishing between voiced and voiceless
consonants.
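The description of intensity above (squared amplitude averaged over a time window and expressed in dB) can be sketched as follows. This is a minimal illustration, assuming that the samples are calibrated sound pressures in pascal and that the conventional 20 µPa reference is used; the function names, the window length and the hop size are hypothetical and do not reproduce the settings used for the measurements in this chapter.

```python
import math

REFERENCE_PRESSURE_PA = 2e-5  # conventional 20 micropascal reference (assumed)


def intensity_db(samples_pa):
    """Mean squared amplitude of one window, in dB re the reference pressure."""
    mean_square = sum(s * s for s in samples_pa) / len(samples_pa)
    if mean_square == 0:
        return float("-inf")  # digital silence has no finite dB value
    return 10.0 * math.log10(mean_square / REFERENCE_PRESSURE_PA ** 2)


def peak_intensity_db(samples_pa, window_len, hop):
    """Maximum of the windowed intensity contour over a segment."""
    starts = range(0, len(samples_pa) - window_len + 1, hop)
    return max(intensity_db(samples_pa[i:i + window_len]) for i in starts)
```

Because the scale is logarithmic, a tenfold increase in pressure amplitude raises the intensity by 20 dB. The segment passed to `peak_intensity_db` must be at least one window long.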
Figure 7.6 Mean peak intensity of the English onset and coda plosives, fricatives and affricates
produced by Sudanese learners. Error bars are ±2 standard errors of the mean.
Generally, the intensity data for the English consonants of the Sudanese learners show
variation at both onset and coda positions (see Figure 7.6; for values
of all the consonants, see Appendix 7.5). The onset lenis plosives have greater intensity
(67 dB) than their fortis counterparts (64 dB). Similarly, the lenis coda plosives,
fricatives and affricates have marginally greater intensity than the fortis consonants: 57,
67 and 65 dB against 57, 66 and 63 dB, for plosive, fricative and affricate pairs,
respectively. This means that the onset plosives and all voiced and voiceless coda
consonants show a relative but non-significant correspondence to the English intensity
pattern, in which voiced sounds tend to have greater intensity than voiceless sounds (Ladefoged
2003). However, the onset lenis fricatives and affricates have lower intensity (65 and
60 dB, respectively) than their onset fortis counterparts (65 and 67 dB, respectively).
Acoustic correlates are used as parameters to measure properties such as the differences
between speech sounds and the qualities of these sounds. Formant values are used as
correlates to distinguish between different vowel sounds since they are linked to the
relative positions and movements of the tongue. However, formants are inappropriate
measures for most consonants. Instead, the spectral centre of gravity (COG) may be used
to capture information on the place of articulation of fricatives (and of frication noise
in general). Computationally, COG is the mean frequency calculated as the first spectral
moment, expressed numerically as $\sum_i f_i E_i / \sum_i E_i$ (the discrete counterpart of
$\int f\,E(f)\,df / \int E(f)\,df$), where f and fi are frequencies in Hertz, and
E(f) and Ei the spectral power as a function of the frequency (see Figure 7.7). For
instance, if the frequency range is sampled between 0 and 10,240 Hz with, say, 1024
points separated by 10-Hz intervals, the COG is the weighted mean of these 1024
frequencies (at 5, 15, 25, 35, …, 10,235 Hz), where each frequency is weighted by its
intensity. If the emphasis is on the high end of the spectrum, as in /s/-like fricatives,
the COG will assume a relatively high value; if low frequencies are dominant, as for
velar and uvular fricatives, the COG will be found at a relatively low frequency. Figure
7.7 provides an illustration of a sound with low-frequency emphasis. It can be seen that
the COG, indicated by the dashed vertical line, is at a rather low value, to the left of the
centre of the analysis bandwidth.
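The weighted-mean computation described above can be sketched in a few lines of Python, using the 1024-point example from the text. The function name and the two artificial spectra are assumptions made for illustration; this is not the Praat implementation used for the actual measurements.

```python
def centre_of_gravity(freqs_hz, power):
    """First spectral moment: sum(f_i * E_i) / sum(E_i)."""
    total = sum(power)
    if total <= 0:
        raise ValueError("spectrum contains no energy")
    return sum(f * e for f, e in zip(freqs_hz, power)) / total


# The example from the text: 1024 bins of 10 Hz, centred at 5, 15, ..., 10,235 Hz.
freqs = [5 + 10 * i for i in range(1024)]

flat = [1.0] * 1024                                   # equal energy in every bin
low_emphasis = [1.0 / (1 + i) for i in range(1024)]   # energy falls with frequency

print(centre_of_gravity(freqs, flat))          # centre of the analysis band
print(centre_of_gravity(freqs, low_emphasis))  # well to the left of the centre
```

A flat spectrum puts the COG in the middle of the analysis band, whereas a spectrum with low-frequency emphasis pulls it towards the low end, as in Figure 7.7.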
Figure 7.7 Illustration of the Centre of Gravity. COG is represented by the dashed line. It is the
mean of all frequencies within the analysis band, weighted by the acoustic energy at each
frequency (from Van Son and Pols 1999).
The place of articulation of a fricative (or noise burst of the homorganic plosive or
affricate) defines the size of the resonating cavity beyond the constriction point. The
larger (especially longer) the cavity beyond the constriction point, the lower its
resonance frequency, and thereby the COG value. However, at least one
further parameter is needed to define the gross shape of the friction spectrum.
Two different fricatives, for instance /f/ and /ʃ/, may have similar COG values but
differ in the distribution of intensity around the centre of gravity. Typically, /f/ has a
flat and level spectrum with intensity evenly distributed over all frequencies whereas /ʃ/
has its energy concentrated more closely around the COG. A measure of this dispersion is afforded
by the standard deviation (SD) of the spectrum. If the spectrum contains just one sine
wave, the SD would be zero, indicating that the spectrum is maximally compact: there
is no energy in the spectrum at any frequencies other than at the COG. If the spectrum
is white noise, then there would be maximal dispersion of energy over all available
frequencies within the range analysed, and the SD would be the analysis range divided by √12.
In my analyses, the range extends between 0 and 11,025 Hz (i.e. the digital sampling
frequency divided by 2, also called the Nyquist frequency). An /f/-like
noise spectrum would then have a spectral SD approximating 11,025/√12 = 3183 Hz.
In terms of the earlier example, the spectral SD would be computed by taking the
difference between each of the 1,024 frequencies fi and the COG and then
determining the root-mean-square average of these differences after weighting each
individual difference by the intensity at fi.
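The spectral SD described above, and the value it takes for a flat (white-noise-like) spectrum, can be checked with a short Python sketch. The function name and the test spectra are assumptions made for illustration; the sketch follows the intensity-weighted root-mean-square definition given in the text rather than any particular software implementation.

```python
import math


def spectral_sd(freqs_hz, power):
    """Intensity-weighted root-mean-square deviation of frequency from the COG."""
    total = sum(power)
    if total <= 0:
        raise ValueError("spectrum contains no energy")
    cog = sum(f * e for f, e in zip(freqs_hz, power)) / total
    variance = sum(e * (f - cog) ** 2 for f, e in zip(freqs_hz, power)) / total
    return math.sqrt(variance)


freqs = [5 + 10 * i for i in range(1024)]      # 0 to 10,240 Hz in 10-Hz bins

single_component = [0.0] * 1024
single_component[300] = 1.0                    # all energy at one frequency
flat = [1.0] * 1024                            # energy spread evenly

print(spectral_sd(freqs, single_component))    # no dispersion around the COG
print(spectral_sd(freqs, flat))                # close to 10240 / sqrt(12)
print(11025 / math.sqrt(12))                   # flat spectrum up to 11,025 Hz
```

The flat spectrum behaves like a uniform distribution over the analysis range, whose standard deviation is the range divided by the square root of 12.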
In my recordings, COG and the spectral SD were computed for the middle portions of
the friction sounds. The exact time points of the onset and offset of the noise bursts
for plosives, affricates and fricatives were marked in Praat TextGrids. An analysis
window was then automatically defined between 25 and 75 percent of the duration of
the friction portion of the target sound, such that the COG and spectral SD were
measured for the central half of the friction portion, which can be assumed to be
relatively stable and optimally representative (see also Maniwa et al. 2009).
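The definition of the analysis window can be made explicit with a small Python sketch. The function name and the example times are hypothetical; in the actual procedure the onset and offset times come from the Praat TextGrid labels.

```python
def central_window(onset_s, offset_s, lower=0.25, upper=0.75):
    """Return (start, end) of the window between 25% and 75% of a segment."""
    duration = offset_s - onset_s
    if duration <= 0:
        raise ValueError("offset must be later than onset")
    return onset_s + lower * duration, onset_s + upper * duration


# Hypothetical friction portion from 1.000 s to 1.200 s (not a real measurement).
start, end = central_window(1.000, 1.200)
print(start, end)
```

The window always covers half of the friction duration, so longer fricatives contribute proportionally longer stretches of noise to the spectral estimate.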
In the remainder of this chapter I will concentrate on the COG and spectral SD of the
fricatives. Only for fricatives are there data available in the literature on native speakers
of English that can be compared with my results. No such data can be found for
plosives and affricates. The full data for all manner categories can be found in
Appendix 7.2.
Figure 7.8 presents the centre of gravity (COG) values and the spectral SD of the
English fricatives produced by Sudanese learners and native speakers of English. The
latter data were obtained by estimating the values of the measurement points at 25, 50
and 75% of the friction duration in Figure 2 in Maniwa et al. (2009: 3968). It is assumed
that the mean of the three COG and SD measurements in the central 50% of the
fricative duration is equivalent to a single COG and SD determination averaged over
the middle 50% of the duration of the fricative noise, as was done in my own analysis.23
The first thing to be observed in Figure 7.8 is that the Sudanese learners use the
two-dimensional friction space less effectively than the native speakers do. For one
thing, native /s/ has a COG over 7,000 Hz with a fairly narrow concentration of
energy, while the EFL counterpart has the COG at a substantially lower frequency
(approximately 5,000 Hz) and with a wider spread of energy. Overall, the Sudanese
speakers show roughly the same COG values for voiced and voiceless cognates
whereas the native speakers observe a large difference in COG such that voiced
fricatives have clearly lower values than their voiceless counterparts. This latter
difference is what should be expected given that the voiced fricatives have a lot of low
frequency energy as a result of vocal cord vibration. The results are compatible with my
earlier finding that Sudanese EFL speakers fail to make a proper distinction between
the voiced and voiceless fricatives. Interestingly, the /s~z/ pair does not suffer from this
shortcoming: even though the COG values of the EFL /s~z/ are much lower than
those in native English, the absolute difference between the cognates is of equal
magnitude. The relative location of /s/ versus /z/ in the native data differs radically
from that in the EFL data. In the EFL data there is a tendency for the COG and
spectral SD values to be strongly correlated, r = .837 (p = .019, two-tailed). The voiceless
sounds are always characterized by a higher COG and a larger spectral SD than
their voiced counterparts, which shows that vocal cord vibration is largely absent from
the voiced counterparts. This is in clear distinction to the native English data, where
COG and spectral SD are not correlated, r = .387 (p = .344, two-tailed, ins.). The
spectral SD of native voiced fricatives is always larger than that of the voiceless
counterpart, while the COG is at lower values. This finding is compatible with the
23 In fact, Maniwa et al. (2009) collected two sets of COG and SD measurements; one set was
defined on conversational speech, the second set was collected for optimally clear repetitions of
the target items. I assume that the speaking style of the recordings in my own materials is more
like clear speech than like conversational speech.
Figure 7.8 Centre of gravity (COG) and Spectral standard deviation values (in Hz) of the English
fricatives produced by Sudanese learners (top panel) and native speakers of English (bottom
panel).
In order to determine how well the fricatives are distinct in the EFL data I ran Linear
Discriminant Analyses (LDAs) on the fricative tokens, categorizing place of articulation
for voiced (three categories) and voiceless (four categories) fricatives separately, with
COG and spectral SD as predictors (see § 6.4.1.4 for an explanation of the procedure).
COG and spectral SD values were z-normalised within individual speakers (over
fricative tokens only) in order to abstract away from speaker-individual differences in
mean COG and spectral SD. The results of these two LDAs are shown in Table 7.1,
which is a confusion matrix of predicted and observed category membership (place of
articulation). The upper part of Table 7.1 shows the results for voiced fricatives, the
lower part deals with the voiceless counterparts. Overall correct assignment of place of
articulation amounted to 60% correct for the voiced fricatives (27 points better than
chance, which is 33%). Correct place assignment rose to 59% for the voiceless
fricatives, which is more than twice as good as chance (= 25%). For both the voiced
and the voiceless fricatives, place assignment based on COG and spectral SD is quite
reasonable, between 64 and 77% correct, with one notable exception: the dental place
of articulation was poorly recognized by the LDA (40% correct or less). The dental
fricatives /θ, ð/ present an obvious problem for the Sudanese-Arabic EFL speakers.
These fricatives are most systematically but asymmetrically confused by the LDA with
labials, showing that the dental tokens are included in the scatter cloud of the labials but
not vice versa. 24
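The two bookkeeping steps around the LDA, normalising within speakers and scoring a confusion matrix against chance, can be sketched as follows. This is a minimal illustration, not the procedure actually run for Table 7.1: the function names, the toy COG values and the toy confusion counts are all invented, and the use of the sample standard deviation for the z-scores is an assumption.

```python
from collections import defaultdict
from statistics import mean, stdev


def z_normalise_by_speaker(tokens):
    """tokens: list of (speaker, value). Returns z-scores in the same order.

    Each value is expressed relative to the mean and (sample) SD of the
    same speaker's tokens, which removes between-speaker offsets.
    """
    by_speaker = defaultdict(list)
    for speaker, value in tokens:
        by_speaker[speaker].append(value)
    stats = {s: (mean(v), stdev(v)) for s, v in by_speaker.items()}
    return [(value - stats[s][0]) / stats[s][1] for s, value in tokens]


def percent_correct(confusion):
    """confusion[observed][predicted] = count; returns overall % correct."""
    total = sum(sum(row.values()) for row in confusion.values())
    hits = sum(row.get(observed, 0) for observed, row in confusion.items())
    return 100.0 * hits / total


def chance_level(n_categories):
    """Guessing rate (in %) when all categories are equally likely."""
    return 100.0 / n_categories


# Invented toy data: two speakers with very different COG ranges (Hz).
toy_cog = [("s1", 4000.0), ("s1", 5000.0), ("s1", 6000.0),
           ("s2", 2000.0), ("s2", 2500.0), ("s2", 3000.0)]
z_scores = z_normalise_by_speaker(toy_cog)

# Invented toy confusion matrix with dentals absorbed by the labials.
toy_confusion = {
    "labial":   {"labial": 7, "dental": 2, "alveolar": 1},
    "dental":   {"labial": 5, "dental": 3, "alveolar": 2},
    "alveolar": {"labial": 1, "dental": 2, "alveolar": 7},
}
toy_score = percent_correct(toy_confusion)
toy_chance = chance_level(len(toy_confusion))
```

After normalisation both toy speakers occupy the same z-range, so a classifier trained on the pooled tokens is not distracted by the difference in their overall COG level.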
Table 7.1 Observed versus predicted place of articulation, based on Linear Discriminant Analysis
with COG and spectral SD as predictors for voiced and voiceless English fricatives spoken by
Sudanese-Arabic learners. Correct predictions in bold face.
24 Previous findings showed an overlap between the alveolar and dental fricatives of the
Sudanese-Arabic dialect, which suggests a sort of retraction or merger between dental, alveolar and
palato-alveolar sounds (see Dickins 2007, Watson 2002). This retraction forms a major cause of
intelligibility problems, and impedes a precise articulation of fricatives (Cruttenden 2008, Raphael,
Borden and Harris 2003).
Table 7.2 Observed versus predicted voicedness versus voicelessness, based on Linear Discriminant
Analysis with COG and spectral SD as predictors for English onset and coda fricatives
spoken by Sudanese-Arabic learners. Correct predictions in bold face.
Table 7.2 shows that, overall, voiced versus voiceless fricatives are poorly discriminated
in terms of COG and Spectral SD. When the results are lumped together across onset
and coda positions, mean correct assignment of the voicing feature is 67%, which is
only 17 points better than chance. Performance of the algorithm did not improve
noticeably when I performed separate analyses for onset and coda positions, with mean
percentages of correct voicing assignment of 69% and 66%, respectively. The results reveal a bias
favouring voiceless decisions, indicating that voiced fricatives have greater overlap with
their voiceless counterparts than vice versa.
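The comparison with chance and the notion of a decision bias can be made concrete with a short sketch. The function names are hypothetical and the 2 x 2 counts are invented to illustrate a voiceless bias; they are not the counts of Table 7.2. Only the 67% and 60% scores are taken from the text.

```python
def points_above_chance(percent_correct, n_categories):
    """Percentage points by which a score exceeds the 1/k guessing rate."""
    return percent_correct - 100.0 / n_categories


def predicted_share(confusion, category):
    """Percentage of all decisions that were assigned to `category`."""
    total = sum(sum(row.values()) for row in confusion.values())
    assigned = sum(row.get(category, 0) for row in confusion.values())
    return 100.0 * assigned / total


# Scores reported in the text: voicing (two categories) and
# voiced place of articulation (three categories).
voicing_gain = points_above_chance(67.0, 2)
voiced_place_gain = points_above_chance(60.0, 3)

# Invented counts: confusion[observed][predicted].
toy_voicing = {
    "voiced":    {"voiced": 50, "voiceless": 50},
    "voiceless": {"voiced": 20, "voiceless": 80},
}
voiceless_share = predicted_share(toy_voicing, "voiceless")
```

With equally many voiced and voiceless tokens, an unbiased classifier would assign about half of its decisions to each category; a share well above 50% for one category is what is meant by a bias here.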
The automatic determination of voicing is the only possible comparison that can be
made with published data on English speakers. Maniwa et al. (2009) mention an overall
percentage of correctly assigned voicing of 95%. Taking this information into account, I
may conclude that the voiced-voiceless distinction is insufficiently well coded in the
COG and spectral SD properties of English fricatives as pronounced by Sudanese-
Arabic learners of English.
7.5.9 Conclusions
The acoustic analysis of temporal and intensity measures of English consonants
produced by Sudanese EFL learners permits the following conclusions:
The Sudanese EFL learners tend to apply Voice Onset Time (VOT) trends that differ
from those of the native speakers of English. The EFL voiced and voiceless plosives
fall in the short-lag range of the native English continuum, most likely due to L1
influence. Therefore, the learners need to enhance their VOT strategies in order to
produce a correct voicing contrast.
English dental and alveolar fricatives /ð, θ, s, z/, labiodentals /f, v/ and palato-alveolars /ʃ, ʒ/ have
Centre of gravity (COG) values which are closer to one another than in the native
English reference data. Moreover, coda affricates show unstable patterns of duration
(i.e. they tend to be longer) for both the consonants themselves and for the vowels
preceding them.
Although duration values of the preceding vowel show the same (relative) ordering
along the acoustic continuum that is found in the English reference data, the durations
are unstable in comparison to native English realisations. Unstable durations most likely
occurred as a result of categorical differences between the Sudanese learners’ L1
(Arabic) and English. Similarly, consonant duration values tend to be twice as long as
those of L1 English. Most probably, the longer durations in the learners’ production are
due to their lack of knowledge of English consonants and their slow speaking rate.
The Centre of Gravity data reveal relative correspondence to the native English
patterning. Correspondence takes place because the Sudanese data of the sibilant
fricatives appear with spectral peaks at relatively higher frequencies than non-sibilants.
This correspondence occurs probably because Arabic has many consonants that
resemble those of English. However, the COG values of the native speakers’ fricatives
are higher than those of the Sudanese EFL learners. This is probably because the
learners are not skilful enough at producing precise English fricatives due to insufficient
practice or partial learning.
Chapter Eight
Acoustic analysis of
English consonant clusters
8.1 Introduction
This chapter focuses on the production problems of English consonant clusters that
are experienced by Sudanese university EFL learners. It attempts to provide an acoustic
account of the problems with the English consonant clusters that were produced
by such learners. Cross-linguistic studies have paid much attention to the acquisition of
English singleton consonants. However, relatively little research has been done on
problems in the production of consonant clusters among EFL learners. Initial and coda
consonant clusters occur in a large number of English vocabulary items, which suggests
the necessity of further effort on the part of Sudanese EFL learners in the production
of consonant clusters. More importantly, research has revealed that incorrect perception
and production of English consonant clusters of two or three segments such as /tr, pr,
spl/ result in intelligibility problems for many second language learners (see Altenberg
2005, McLeod and Arciuli 2009). More specifically, Sudanese university EFL learners
arguably have pronunciation problems with such sounds in words
beginning and/or ending with clusters like: flow, clock, special, twelve, glass, string, proper,
ground, etc. A process of vowel epenthesis often occurs before these clusters (e.g. spell
becomes ispell or espell) or between the cluster members where flow becomes (‘>’) filow,
glass > gilass, cream > kiream, and text > tekist, etc. (Mohamed 2005, Patil 2006). An
insertion of /ɪ/ between the members of English onset obstruent clusters /s + {t, p, k, l,
w, n, m}/ as such is intended to facilitate producing cluster consonants of English. In
general, Arabic syllable structure does not permit consonant clusters of two segments
such as /pl, pr, gr, sp, θw/ or three-segment clusters like /spr, skr, str, spl/, etc., nor does
it allow them in coda position. Similarly, Sudanese Arabic (SA) allows only CV, CVC
and CVV syllables, but complex syllables such as those yielded by English onset and
coda consonant clusters as e.g. in split, twelfths, bursts and glimpsed are forbidden in SA
(Broselow 1984, Kaye 1997, Mohamed 2005, Raimy 1997). Production problems of
English consonant clusters occur due to different constraints on word syllabification
that exist in English and in Arabic. Studies on second-language acquisition attribute
problems with consonant clusters to motoric output constraints that are based on
permissible types of syllables in the first language (Carlisle 2001). These constraints
result in epenthetic vowels among many Spanish speakers of English as a repair strategy
(Altenberg 2005, see also Davidson 2006). Other studies on English consonant clusters
attribute the inaccuracy of production to incorrect acoustic cues used by second-language
learners. This study investigates the learning problems with English consonant clusters
in an experimental approach aiming to find an empirical account for the causes of such
problems. I argue that the acoustic analysis of the durational properties of the English
8.2 Objective
This section aims to examine the production errors in English consonant clusters made
by Sudanese EFL learners. The investigation attempts to derive acoustic accounts on
the basis of duration differences that may exist between the target and Sudanese EFL
learners’ clusters.
8.3 Participants
Eleven male Sudanese-Arabic speakers were recruited primarily from the student
population of the Department of English at Gadarif University. The total number of
the student population was 22. These students specialized in English as a foreign
language. They were all semi-final students who had reached a considerable level of
English proficiency. In general, they used English inside the classroom and in other
academic activities such as debates, discussions, etc. However, only 11 students were
selected for the experiment. Only this subset of participants speaks Arabic as their
mother tongue, whilst the others speak Arabic as a second language.
Two native speakers (one male, one female) of RP English served as the control
speakers in this study. The EFL speakers’ production will be compared with the
properties of the control speakers’ tokens.
8.4 Methods
8.4.1 Material
Seventeen onset and coda cluster items were chosen as the stimulus
material for this study. These cluster consonants form problem areas for Sudanese
learners of English. All words are meaningful; non-existent words were not used in the
experiment. The pairs were varied according to certain factors observed in literature on
production errors in English onset and coda clusters challenging Arabic native speakers
learning English (Patil 2006, Altaha 1995). The distribution was as follows:
The set of 17 clusters was almost evenly distributed between onset (eight items) and
coda positions (nine items). For a full list of words included in the experiment see
Appendix 3.3.
8.4.3 Praat
For speech analysis, the Praat speech processing programme was used. Praat is an
open-source software tool, which is used for speech-signal editing and labelling, as well as for
various acoustic (spectral, formant and duration) analyses and manipulations (Boersma
and Weenink 1996). It has the further advantages that it can easily be adapted for specific
research purposes and that the results can be exported to Excel-compatible spreadsheets.
In this section I present the acoustic results of the English consonant clusters produced
by Sudanese EFL university learners, in both onset and coda positions. There are two
sections in this part arranged according to cluster position.
This section describes the measured durational properties of the English onset and
coda consonant clusters, which were read by both Sudanese EFL learners and native
speakers of RP English. The measurements were aimed at identifying production problems
in the durational properties of the learners’ English clusters. In greater
detail, these included all the cluster members of the target items of the test list: the first
(C1), second (C2) and third (C3) consonant cluster members, in both initial and coda
positions. Plosives were split up into two subphones, viz. a silent interval (‘si’) reflecting
the closure duration and a second portion (called ‘rest’) containing the release noise
burst. For the sake of data processing, however, all consonant types were split up into
si- and rest-components, where the si-component was set to zero when the consonant
was not a plosive. Accordingly, in Figures 8.2, 8.3, 8.5 and 8.6, you will find keywords
such as C1—si, which stands for the silent interval of first consonant cluster member.
C1—rest stands for the noise burst duration when the first cluster member is a plosive.
Similarly, C2—si represents the silence duration of the plosive, but this time of the
second consonant cluster member /p, t, k/. In this case, C2—rest indicates the noise
burst of such a plosive. Finally, C3—rest stands for the third consonant cluster member;
in the data these are usually /l/ or /r/. The spectrogram in Figure 8.1 provides an
illustration of the durational components. Notice that the same legend is used for the
coda clusters, where C1—rest refers to the first cluster members /n, ŋ/ or /l/.
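The si/rest bookkeeping described above can be sketched as a small data structure. This is only an illustration under stated assumptions: the class and function names, the plosive inventory and the toy durations are hypothetical, and the component labels are written with a plain hyphen.

```python
from dataclasses import dataclass

# Assumed plosive inventory for the sketch.
PLOSIVES = {"p", "t", "k", "b", "d", "g"}


@dataclass
class ClusterMember:
    phone: str
    si_ms: float    # silent interval (closure); 0 for non-plosives
    rest_ms: float  # release burst for plosives, whole segment otherwise


def make_member(phone, duration_ms, closure_ms=0.0):
    """Split a segment into si and rest; si is forced to 0 for non-plosives."""
    if phone in PLOSIVES:
        return ClusterMember(phone, closure_ms, duration_ms - closure_ms)
    return ClusterMember(phone, 0.0, duration_ms)


def label_components(members):
    """Map a cluster to the keywords used in the figures (C1-si, C1-rest, ...)."""
    components = {}
    for position, member in enumerate(members, start=1):
        components[f"C{position}-si"] = member.si_ms
        components[f"C{position}-rest"] = member.rest_ms
    return components


# Invented durations (ms) for an /spl/ onset.
toy_spl = [
    make_member("s", 100.0),
    make_member("p", 80.0, closure_ms=60.0),
    make_member("l", 50.0),
]
toy_components = label_components(toy_spl)
```

Giving every consonant both components, with a zero silent interval for non-plosives, keeps all clusters in the same tabular shape, which is the data-processing convenience the text refers to.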
Figure 8.1 Illustration of the durational properties of the English onset and coda consonant
clusters. It shows the positions of the durational components of a consonant cluster in the spectrogram
using keywords such as C1—si, C2—si, C1—rest, C2—rest, C3—rest (further see text).
Figures 8.2-3 show the mean duration of English initial consonant clusters produced by
the Sudanese EFL learners and of the native speakers of RP English. T-tests were used
to determine the statistical significance of the duration difference per test component
between the two speaker groups. However, the results show no significant differences:
t(11) = –.299 (p = .771, two-tailed) for the silent interval of the onset plosive C1—si,
t(1.038) = 2.0 (p = .293, two-tailed) for C1—rest, t(1.003) = .802 (p = .569, two-tailed)
for C2—si, t(1.218) = 1.2 (p = .435, two-tailed) for C2—rest and t(11) = –.1 (p = .955,
two-tailed) for C3—rest.
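The degrees of freedom reported above are of two kinds: 11, which is what a pooled-variance t-test gives for 11 learners and 2 control speakers (11 + 2 - 2), and fractional values close to 1, which is what Welch's unequal-variance correction gives when the two-speaker group has by far the larger variance. This reading is an inference from the reported values, not something the text states. The sketch below computes both statistics; the function names and the duration values are invented for illustration.

```python
from statistics import mean, variance


def pooled_t(a, b):
    """Student's t with pooled variance; df = n1 + n2 - 2."""
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    t = (mean(a) - mean(b)) / (pooled_var * (1 / n1 + 1 / n2)) ** 0.5
    return t, n1 + n2 - 2


def welch_t(a, b):
    """Welch's t with Welch-Satterthwaite degrees of freedom."""
    n1, n2 = len(a), len(b)
    v1, v2 = variance(a) / n1, variance(b) / n2
    t = (mean(a) - mean(b)) / (v1 + v2) ** 0.5
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Invented durations (ms): 11 learners with little spread,
# 2 control speakers who differ widely from each other.
toy_learners = [100, 105, 98, 102, 101, 99, 103, 97, 104, 100, 102]
toy_natives = [60, 120]

t_pooled, df_pooled = pooled_t(toy_learners, toy_natives)
t_welch, df_welch = welch_t(toy_learners, toy_natives)
```

With only two control speakers the Welch degrees of freedom cannot fall below 1, and they stay near 1 whenever the control variance dominates, which leaves such a test with very little power.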
Figure 8.2 Mean duration (ms) of nine English initial consonant clusters (plosives to the left,
fricative clusters to the right). Components of the clusters are shown as separate bars (further see
text).
Figure 8.3 Mean duration (ms) of nine English initial consonant clusters (plosives to the left,
fricative clusters to the right). Components of the clusters are shown as separate bars (further see
text).
As Figures 8.2-3 show, the mean duration values of the English onset consonant
clusters produced by the Sudanese speakers differ in several ways from those of the native
speakers of English. The English plosive/liquid clusters /pl, kl, dr/ produced by the
EFL speakers tend to have longer silence durations than those of the native speakers;
mean cluster durations are 164, 117 and 103 ms, against 138, 72 and 97 ms, respectively.
However, the cluster /gl/ shows a shorter duration (102 ms) compared to that of the RP
speakers (124 ms). Moreover, clusters starting with voiceless stops, like /pl, kl/, have
even longer silence duration, but shorter silent intervals are observed when such
clusters contain /gl, dr/. This is in contrast with their counterparts in the RP native
speakers’ tokens, which do not show this pattern of duration distribution. Similarly, the
stop+liquid clusters produced by Sudanese EFL learners showed longer noise burst
values compared with those of the native speakers: these are 30, 29, 49 and 25 ms
against 16, 21, 12 and 29 ms, respectively. Moreover, it was observed that the alveolar
voiceless fricative C1 /s/ in English initial consonant clusters /sl, spl, spr, st, sw/ was
produced with shorter friction duration than those of the native English speakers.
Mean durations of /s/ in each of such clusters are 114, 97, 94, 111 and 141 ms for the
Sudanese EFL learners and 212, 184, 172, 228 and 242 ms for the native speakers.
More interestingly, in contrast to the results of the native speakers, the C2 of the
Sudanese EFL learners following the English fricative /s/ in /spl, spr/ manifested
longer duration values: 111, 110 and 98 ms against 73, 84 and 57 ms for the native
speakers, respectively. The acoustic analysis of the English clusters of Sudanese EFL
learners indicates no vowel epenthesis, either before initial English
consonant clusters such as /sl, spr, spl, st, sw/ or between the two cluster members, e.g.
initial /pl, gl, kl, dr/ and coda /bd/, in contradistinction to what has been suggested in
the literature. More or less unexpectedly, the results show that the stops following the
fricative /s/ which were produced by Sudanese EFL learners have stronger aspiration
than those of the native RP speakers (see also Figure 8.4). These findings suggest the
existence of production problems with initial plosive+liquid clusters and codas.
Figure 8.4 Duration (s) of components of onset clusters beginning with fricatives for English
native speakers and for Sudanese learners of English.
Notice that the overall duration of the fricative clusters is much longer for the native
speakers (317 ms) than for the Sudanese learners (259 ms). It is not the case, however,
that all consonant clusters produced by the EFL speakers are shorter since the plosive
clusters of the learners were about equal in duration to those of the native speakers. In
addition to the shorter duration of the fricative clusters (possibly indicating incomplete
or sloppy articulation), the internal division of the component durations differs
considerably between the foreign and native tokens. In the EFL tokens, the fricative
lasts about as long as the rest of the cluster, whilst the /s/ in the native clusters is about
twice as long as the rest of the cluster.
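The internal division described here can be checked roughly against the figures quoted in the text. The sketch below assumes that the overall cluster durations (259 and 317 ms) are means over the same five /s/-initial clusters for which the mean /s/ durations are listed earlier; the text does not state this, so the resulting ratios are indicative only. The function name is hypothetical.

```python
from statistics import mean

# Mean /s/ durations (ms) per cluster, as quoted in the text.
s_learners = [114, 97, 94, 111, 141]
s_natives = [212, 184, 172, 228, 242]

# Overall fricative-cluster durations (ms), as quoted in the text.
total_learners = 259
total_natives = 317


def fricative_to_rest_ratio(s_durations, cluster_total):
    """Ratio of the mean /s/ duration to the remainder of the cluster."""
    s_mean = mean(s_durations)
    return s_mean / (cluster_total - s_mean)


ratio_learners = fricative_to_rest_ratio(s_learners, total_learners)
ratio_natives = fricative_to_rest_ratio(s_natives, total_natives)
```

Under that assumption the ratio comes out at roughly 0.75 for the learners and roughly 1.9 for the native speakers, which is in line with the description of a fricative that is about as long as the rest of the cluster in the EFL tokens and about twice as long in the native tokens.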
Figures 8.5-6 show mean duration values of English coda consonant clusters produced
by the Sudanese EFL learners and by the native speakers of RP English. A t-test is used
to determine the difference in duration values per test component between the two
speaker groups. However, the results show no significant differences: t(11) = .064 for
C1—si, t(11) = .983 for C1—rest, t(11) = .009 for C2—si, and t(11) = .213 for C2—rest.
Figure 8.5 Mean duration values (ms) of nine English coda consonant clusters (plosives to the
left, other consonant clusters to the right). (Sub)components of the clusters (Consonant 1,
Consonant 2 and Consonant 3) are shown as separate bars (further see text).
Figure 8.6 Mean duration (ms) of nine English coda consonant clusters (plosives to the left,
other consonant clusters to the right). (Sub)components of the clusters (Consonant 1, Consonant
2 and Consonant 3) are shown as separate bars (further see text).
Results in Figures 8.5-6 show that Sudanese EFL learners tend to produce inaccurate
English coda consonant clusters. This is observed in the production of several cluster
consonants. The production of the English coda cluster /bd/ implies inaccuracy of
acoustic cue implementation compared with those of the native speakers in Figure 8.4.
First, whilst the native speakers tend to make a longer silence duration (177 ms),
Sudanese EFL learners tend to make a shorter duration (118 ms). Second, /bd/
production revealed that the speaking style of the Sudanese learners differs from that of
the native speakers. The findings do not show much difference in terms of cluster types,
which suggests that similar production strategies are used irrespective of cluster type.
Findings based on the production of English consonant clusters support the hypothesis
that Sudanese native speakers of Arabic have difficulty with English consonant clusters.
The reversed aspiration process in plosives /p/ and /t/ preceded by /s/ in initial
English consonant clusters such as /spr, spl, st/ read by Sudanese EFL learners (see
Figures 8.2-3-4) is most likely due to phonological differences which exist between
English and the learners’ L1. As the data of the native RP speakers shows, in English
the voiceless /p, t, k/ are aspirated at the beginning of a (stressed) syllable but remain
unaspirated when in final position or when preceded by tautosyllabic /s/ (e.g. Spencer
1996). On the other hand, the Sudanese EFL learners’ plosives /p, t/ following the
fricative /s/ are strongly aspirated, whilst /s/ itself has a weak and short frication (see
Figure 8.4). As related research showed, the L1 stress system is responsible for such a
type of problem since aspiration is dependent on stress (Spencer 1996). In Sudanese
Arabic (and essentially in traditional Arabic syllable structure) there are certain prefixes
such as in in-kátal ‘was killed’, in-kásara ‘was broken’ and in-tábdala ‘exchanged’, etc.,
where stress (indicated by the acute accent) lodges on the second syllable, so that the vowel in the first
syllable escapes stress (Kenstowicz 1994). Thus, due to interference of this L1
syllable-structure rule, Sudanese EFL learners’ plosives have strong aspiration, while /s/ has a
weak friction. Therefore, it is probably this point of syllable structure that accounts for
the contrast above. Moreover, this view would also account for the absence of vowel
epenthesis in my results as a repair strategy for Arabic EFL speakers, which view
dominates the previous literature. Note, once again, that the acoustic analysis provided
in the present chapter has not indicated any phonetic properties suggesting vowel
epenthesis. 25 That is, there is no epenthetic vowel in the English initial fricative+plosive
clusters of the learners, but an incorrect articulation of initial English fricative+plosive
clusters with a weak friction of English /s/ followed by a strong aspiration of /p, t, k/.
The findings also imply that the occurrence of vowel epenthesis in the production of
English clusters, which is hypothesized to be due to L1 transfer, can be reduced by
different factors such as the learners’ knowledge, practice, modification, etc., of English
consonant clusters. Similar results were reported by related research in which post-test
results revealed that Arab EFL learners produced lower error rates in English
consonant clusters compared to their pre-test findings where they showed low and less
accurate performance. The results show that the EFL learners managed to explore the
phonotactic constraints of English better after exposure to a small amount of training,
which provides a shortcut to using English constraints. This means that teaching
phonotactics guided the learners’ attention to the presence of such cues. As non-native
learners transfer their L1 phonotactic constraints to English, phonotactics should
represent an important part of L2 ear training and pronunciation programs. This
conclusion indicates that appropriate practice would help EFL learners to perceive and
pronounce, without epenthetic vowels, the legal English consonant clusters that are
illegal in their native language. Moreover, training might play a role in limiting L1
transfer in auditory processing (Al-jasser 2008). 26
25 It is hypothesized that if a schwa occurs between two cluster members, the vocal tract between
the constrictions of the two consonants will be sufficiently open for a vowel to be perceived.
Moreover, the tongue shapes arise if the output of the phonology for CC word-initial clusters
does not actually include a schwa gesture with its own target (Davidson et al. 2004).
26 Actually, vowel epenthesis has been observed in the English of native Arabic speakers. Studies
demonstrated that syllable structure changes by syllable preference laws where, if a change
compromises syllable structure, it is not a syllable change but a change of some other parameters
that may affect syllable structure. Observations made in the relationship between the members of
consonant clusters reveal that closed syllables trigger errors among speakers who come from
languages lacking such types of syllables. Most closed syllables targeted are modified by the L2
speaker through epenthesis or deletion of vowels, due to L1 transfer. This appears among many
The influence of English spelling can often add to incorrect production of English
consonant clusters. While the English spelling system is complex, colloquial Sudanese
Arabic has a simple phonetic spelling system, which follows a direct letter-sound
correspondence. Therefore, by pronouncing words as they are spelled, my learners run into serious
intelligibility problems due to spelling differences (see also Mohamed 2005, Patil 2006).
These errors are attributable to the unfamiliarity of the learners with English consonant
clusters or to their slow speaking style. To avoid such problems, learners have to
improve their abilities to produce English cluster consonants. They need to be mentally
prepared for a major shift in articulation. This requires that both instructors and
learners be cognitively aware of the existence of clusters as complex consonantal
entities, which necessitates additional perceptual effort and conscious articulatory focus.
native Arabic speaker groups. However, when learners have had considerable exposure to
English, this phenomenon diminishes (Carlisle 2001).
Chapter Nine
Intelligibility assessment:
written questionnaires
9.1 Introduction
This chapter reports on a written questionnaire that asked informants overt questions about
their speech intelligibility problems focusing on pronunciation and perception abilities
that represent major components of intelligibility. The questionnaire invited both
Sudanese University EFL learners and their teachers to delineate these problems giving
details about their nature, causes and the contribution of the courses taught, and so on.
9.2 Objective
The questionnaires in this study aim to provide feedback about the speech intelligibility
problems of Sudanese University EFL learners. The feedback comes in the form of
impressions and judgments provided by both EFL learners and teachers. Part of the
questionnaire data is also intended to provide background information about speech
intelligibility problems, contributing in this way to the literature.
Moreover, the assessment of the data acquired from these informants may yield insights
which may support or refute the conclusions arrived at by other speech
intelligibility measurements adopted in the study.
9.3 Subjects
Data were collected from twenty respondents: ten Sudanese
university EFL learners preparing for their bachelor’s degree at Gadarif University and
ten school and university teachers of English. Because of resource constraints, the
respondents were sampled purposively (Trochin 2006, Reimer 2008). This approach of
sampling corresponds to statistical tables for the estimation of the sampling error.27
27 Sampling accuracy refers to the measurement of variance that occurs around the estimated
statistics treating a given sample of population. Sample accuracy is important where statistical
tables can show the degree of precision (sampling errors) that is obtainable for samples of
different sizes (e.g. the sampling error of sample size between 10 or 90 equals 3.0%). Interestingly,
we can obtain accurate results with a small sample size. The achievement of accuracy depends on
how truly representative of the population the chosen sample is. When invalid populations are
used, erroneous predictions occur. Moreover, sample sizes should be determined by theoretical
requirements like the precision of the sample operation and ultimately constraints of time and
cost (McCollough and Van Atta 1963).
The content of this test included sample behaviour of the syllabus taught. These were
basic principles of English phonetics and phonology, and perception and production of
English speech sounds. The content included basic English phonology principles such
as phonemes, allophones and acoustic cues, accompanied by practice activities. The
content of the teacher questionnaires included items such as principles of English
pronunciation, perception and production matters and intelligibility problems. It also
covered areas such as structure of the curriculum and the teaching methods of English
and the students’ performance.
Although the reliability issue applies mostly to research results and conclusions, I
considered it desirable at the time of the questionnaire design to have reliable (accurate)
tests from an earlier stage and to avoid running the risk of missing data on any relevant
research question. Usually, the reliability of the data is determined according to the
frequency of choices. The more often the item is chosen from among the options given,
the more reliable it is. This is because the greater the agreement among data sources on a
particular issue, the more reliable the interpretation of the data.
In the data display, the choices are summarised by the mean and standard deviation of the
subjects’ responses to each item. The data display of the items of the two
questionnaires is arranged into three groups of tables, in terms of their domain, i.e. (i)
general matters, (ii) the perception and (iii) the production of English speech sounds,
respectively. This arrangement achieves clarity in data presentation and makes it easier
for comparison.
The subjects were asked to write down their answers in the right place, and to tick, cross or
account for matters raised in the test. Some students had difficulty in replying to the
test items; e.g., they misunderstood the questions or provided inappropriate answers.
Therefore, I helped them to continue by translating the test items into Arabic.
Translation and elucidation of test items took place inside the classroom for all subjects
so that the students shared an equal understanding of the test items. On the other hand,
teachers were well aware of the topic of the questionnaire and they provided useful
information. Most of them found completing the questionnaire both demanding and
useful, yet it took some of them up to three months to hand in the answer sheets.
This section describes the scoring procedures, which were applied to the questionnaire
data of both the students and teachers. There are marks for each test item, which are
assigned by a number of grade descriptors, e.g., good, weak, etc. The concepts of these
grades are either frequency (often, rarely, etc.) or quality (weak, good, etc.) grades. In
more detail, marks were assigned by figures that range between 5 and 1 where 5
represents the highest mark, while 1 represents the lowest mark. Thus, the grades are
interpreted as follows: (i) in the case of quality grades, 5 equals A [full mark/excellent],
4 equals B [very good], marks from 2.5 to 3 equal C [good] and marks from 1 to 2.4 equal
D/E [weak/not].
On the other hand, (ii) the occurrence or absence of a language phenomenon such as
an error, problem or difficulty mostly deals with frequency grades, and is scored as
follows: Grade 5 equals A [permanently], which means that this phenomenon always
occurs. Grade 4 equals B [frequently], which means this phenomenon often occurs.
Grades from 2.5 to 3 equal C [neutral with respect to frequency], marks from 2.4
down to 2.0 equal D and 1 equals E, which latter two grades are interpreted as [rarely] and
[none], respectively. Notice that in all cases, the tables below present the results in
terms of scale values, highlighting the most frequent responses.
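The two scoring schemes can be written down as simple lookup functions. This is a sketch of the bands exactly as stated above, with hypothetical function names. Marks that fall between the stated bands (for example 3.5) are not covered by the description, so the sketch returns None for them rather than guessing.

```python
def quality_grade(mark):
    """Map a mark to the quality descriptors given in the text."""
    if mark == 5:
        return "A (excellent)"
    if mark == 4:
        return "B (very good)"
    if 2.5 <= mark <= 3:
        return "C (good)"
    if 1 <= mark <= 2.4:
        return "D/E (weak/not)"
    return None  # band not specified in the text


def frequency_grade(mark):
    """Map a mark to the frequency descriptors given in the text."""
    if mark == 5:
        return "A (permanently)"
    if mark == 4:
        return "B (frequently)"
    if 2.5 <= mark <= 3:
        return "C (neutral)"
    if 2.0 <= mark <= 2.4:
        return "D (rarely)"
    if mark == 1:
        return "E (none)"
    return None  # band not specified in the text
```

For instance, `quality_grade(2.6)` falls in band C, whereas `quality_grade(3.5)` yields None because the scheme as described leaves the range between 3 and 4 undefined.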
I will now present the results of the questionnaire obtained from the students. I will
first deal with the items that ask about general matters, then deal with questions relating
to perception problems and finish with the items that ask about production problems.
Since the sample of respondents is fairly small, it is important to ascertain that the
respondents show at least a reasonable agreement amongst themselves. Agreement is
expressed in terms of the reliability coefficient called Cronbach’s alpha. The coefficient
computed for the ten respondents was α = 0.860, which shows that the level of
agreement was good.
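For reference, Cronbach's alpha can be computed with the standard formula alpha = k/(k-1) * (1 - sum of column variances / variance of row totals). The sketch below treats the respondents as the k columns and the questionnaire items as the rows, which is an assumption about how agreement among respondents was set up; the function name and the score matrix are invented and do not reproduce the reported value of 0.860.

```python
from statistics import variance


def cronbach_alpha(scores):
    """scores: list of rows; each row holds one score per column.

    Here rows are questionnaire items and columns are respondents
    (an assumption of this sketch).
    """
    k = len(scores[0])
    column_variances = [variance(column) for column in zip(*scores)]
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(column_variances) / total_variance)


# Invented scores: four items rated by three respondents.
toy_scores = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 1],
    [5, 4, 5],
]
toy_alpha = cronbach_alpha(toy_scores)

# Respondents who agree perfectly give alpha = 1.
perfect_alpha = cronbach_alpha([[1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4]])
```

Alpha approaches 1 when the respondents order the items in the same way and drops towards 0 when their scores are unrelated.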
This section will present the results of the students’ level of intelligibility and their
impressions of the courses taught.
Table 9.1 presents the students’ assessment of their intelligibility and of the courses
taught. The table shows the distribution of the ten responses per item
over the five scale values, as well as the mean and standard deviation of the scale values.
Table 9.1 Student responses to the four questionnaire items that pertain to general matters. The
table shows the distribution of the ten responses per item over the five scale values, as well as
the mean and standard deviation of the scale values. The modal (most frequent) response
category is highlighted in the table. See appendix 9.1a for a verbatim copy of the questionnaire
items.
                                      Scale value
No.  Item                              1   2   3   4   5  mean    SD
1.1  Understand spoken English         2   0   8   0   0   2.6  0.84
1.2  Native speaker understand you     1   3   4   2   0   2.7  0.94
1.3  Practical & interesting courses   0   1   6   0   3   3.5  1.08
1.4  Relevant & authentic courses      0   1   5   1   3   3.6  1.07
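The mean, standard deviation and modal category in such a table follow directly from the response counts. The Python sketch below illustrates this for item 1.1 (counts 2, 0, 8, 0, 0 over scale values 1 to 5). The function name summarize is hypothetical, and the use of the sample standard deviation (n - 1 in the denominator) is an assumption; it reproduces the values reported for this item.

```python
from math import sqrt


def summarize(counts):
    """Return (mean, sample SD, modal scale value) for response counts.

    counts[i] is the number of respondents who chose scale value i + 1.
    """
    n = sum(counts)
    mean = sum((i + 1) * c for i, c in enumerate(counts)) / n
    # Sum of squared deviations from the mean, weighted by the counts.
    ss = sum(c * ((i + 1) - mean) ** 2 for i, c in enumerate(counts))
    sd = sqrt(ss / (n - 1))
    mode = counts.index(max(counts)) + 1   # first scale value with the highest count
    return mean, sd, mode


mean, sd, mode = summarize([2, 0, 8, 0, 0])   # item 1.1 in Table 9.1
print(round(mean, 2), round(sd, 2), mode)     # 2.6 0.84 3
```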
Generally, the results of the questionnaire in Table 9.1 show that the Sudanese
university learners of English have difficulty in identifying English speech sounds. The
subjects claim to habitually face intelligibility problems; however, they show a positive
impression about the phonology and phonetic courses to be learnt.
The section below will present the students’ results for the extent of difficulty and
success that the learners experience in the perception of the English phonemes.
Table 9.2 presents the responses to the ten questionnaire items that pertain to speech
sound perception. It shows the distribution of the ten responses per item over the five
scale values, as well as the mean and standard deviation of the scale values. The modal
(most frequent) response category is highlighted in the table. For tabulation purposes,
some items in the table are identified by only one word; see Appendices 9.1a-b for the
verbatim text of each questionnaire item.
176 TAJELDIN ALI: SPEECH INTELLIGIBILITY PROBLEMS OF SUDANESE EFL LEANERS
Table 9.2 Distribution of responses of the students in the written survey about the intelligibility
problems they experience. The table shows the distribution of the ten responses per item over
the five scale values, as well as the mean and standard deviation of the scale values. The modal
(most frequent) response category is highlighted in the table. See appendix 9.1a for a verbatim
copy of the questionnaire items.
                                                Scale value
No.     Item                                     1   2   3   4   5  mean    SD
1.3.1   Successful in perceiving E. consonants   0   1   9   0   0   2.9  0.31
2.3.2   Plosives                                 0   2   6   2   0   3.0  0.66
3.3.3   Fricatives                               0   4   5   1   0   2.7  0.67
4.3.4   Nasals                                   0   4   4   2   0   2.8  0.78
5.3.4   Approximants                             1   1   6   2   0   2.9  0.87
6.3.6   Difficulty with final clusters           0   1   9   0   0   2.9  0.31
7.3.7   Difficulty with initial clusters         1   2   5   2   0   2.8  0.91
8.3.7   Difficulty to distinguish short vowels   0   1   7   2   0   3.1  0.56
9.3.7   Difficulty to distinguish long vowels    0   6   3   0   1   2.6  0.96
10.3.7  Difficulty to distinguish diphthongs     7   3   0   0   0   1.3  0.48
The results in Table 9.2 show that the students have difficulty recognizing short vowels,
long vowels and diphthongs. Moreover, the long vowels and diphthongs are more
problematic than the short vowels. More importantly, the results reveal that the
subjects do not report serious problems in the perception of English consonants,
although fricatives and nasals were often found to be a bit difficult. The performance
on initial cluster consonants is less problematic than on final clusters. Thus, these
results indicate that the English consonants are more intelligible to Sudanese listeners
of English than the vowels and consonant clusters.
In this section, I will present the results of the learners’ production of English speech
sounds. These cover the learners’ ability to produce L2 sounds correctly and their level of
intelligibility.
Table 9.3 presents the types of problems the students experience in producing the
English speech sounds. It also gives background information on the level of success
these students think they achieved in learning English speech sounds and the effect of
their L1. Table 9.3 shows the distribution of the ten responses per item over the five
scale values, as well as the mean and standard deviation of the scale values. The modal
(most frequent) response category is highlighted in the table.
Table 9.3 Student responses to the questionnaire items that pertain to speech sound production.
The table shows the distribution of the ten responses per item over the five scale values, as well
as the mean and standard deviation of the scale values. The modal (most frequent) response
category is highlighted in the table. See appendix 9.1a for a verbatim copy of the questionnaire
items.
                                                                 Scale value
No.      Item                                                     1   2   3   4   5  mean    SD
5.2.5    Problems with pronunciation                              0   4   0   6   0   3.2  1.03
6.2.6    Difficulty experienced with E. sounds                    0   1   9   0   0   2.8  0.33
1.4.1    How successful in producing E. cons                      0   0  10   0   0   3.0  0.00
2.4.2    Successful in producing E. plosives                      2   0   7   1   0   2.7  0.94
2.4.3    Successful in producing E. fricatives                    1   1   5   3   0   3.0  0.94
2.4.4    Successful in producing E. nasals                        1   4   5   0   0   2.4  0.69
2.4.5    Difficulty in producing E. approximants                  3   1   5   1   0   2.4  1.77
3.4.3    Difficulty in producing E. plosives                      1   2   4   3   0   2.9  0.99
4.4.4    Difficulty in producing final clusters                   0   2   6   2   0   3.0  0.66
5.4.5    Difficulty in producing short vowels                     2   4   4   0   0   2.2  0.78
6.4.6    Difficulty in producing long vowels                      4   4   2   0   0   1.8  0.78
7.4.7    Difficulty in producing diphthongs                       7   3   0   0   0   1.3  0.48
9.4.9    Difficulty to pronounce cloth, rich, chair, etc.         0   2   6   2   0   3.3  0.82
10.4.10  Difficulty to pronounce words with silent letters        0   1   6   2   1   2.8  1.22
11.4.11  Difficulty to pronounce here, there, three, final /r/    0   0   4   0   6   4.2  1.03
12.4.12  Words ending in -ary, -ory, -able                        2   1   4   1   2   3.0  1.41
13.4.13  Learning E. pronunciation improves intelligibility       0   1   2   2   5   5.0  0.00
8.4.8    Mother-tongue affects E. pronunciation                   2   0   7   0   1   2.6  0.84
A reliability analysis for the teachers’ responses to the questionnaire was done first.
Cronbach’s alpha was computed as before but turned out to be rather low in the case of
the teacher responses, α = 0.553, which shows that the level of agreement was poor to
moderate at best. Closer inspection of the reliability data reveals that one respondent
correlated negatively with each of the other nine teachers. Therefore, I eliminated the
single contradictory respondent and recomputed alpha, which then rose to α = 0.616,
which is at least a moderate reliability.
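The procedure described here (finding a respondent who correlates negatively with all others, removing that respondent, and recomputing alpha) can be sketched in Python as follows. The ratings are invented and deliberately extreme so that the effect is easy to see; they are not the teachers' data, and the helper names are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def cronbach_alpha(rows):
    """rows: one list per questionnaire item, one score per respondent."""
    k = len(rows[0])
    columns = list(zip(*rows))
    sum_of_variances = sum(variance(col) for col in columns)
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum_of_variances / total_variance)


def pearson(x, y):
    """Pearson correlation between two equally long score sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical ratings: 5 items (rows) x 4 respondents (columns).
# The last respondent rates in the opposite direction to the others.
ratings = [
    [1, 1, 1, 5],
    [2, 2, 2, 4],
    [3, 3, 3, 3],
    [4, 4, 4, 2],
    [5, 5, 5, 1],
]
columns = list(zip(*ratings))

# A respondent counts as contradictory if the correlation with every
# other respondent is negative.
contrary = [
    j for j, col in enumerate(columns)
    if all(pearson(col, other) < 0 for i, other in enumerate(columns) if i != j)
]

alpha_all = cronbach_alpha(ratings)
reduced = [[v for j, v in enumerate(row) if j not in contrary] for row in ratings]
alpha_reduced = cronbach_alpha(reduced)
print(contrary, round(alpha_all, 3), round(alpha_reduced, 3))
```

In this constructed case alpha rises from 0 to 1 once the contradictory respondent is removed; with real data the change is typically much smaller, as in the figures reported above.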
I will now present the results of the questionnaire obtained from the teachers. I will
first deal with the items that ask about general matters, then deal with questions relating
to perception problems and finish with the items that ask about production problems.
Table 9.4 presents the responses about the courses, learning strategies and intelligibility
components of the survey, which was conducted through teacher assessment of student
performance.
Table 9.4 Distribution of teacher responses to the four questionnaire items that pertain to general
matters. The table lists the responses per item over the five scale values, as well as the mean and
standard deviation of the scale values. The modal (most frequent) response categories are
highlighted in the table. See appendix 9.1b for a verbatim copy of the questionnaire items.
                                                Scale value
No.  Item                                        1   2   3   4   5  mean    SD
1.1  How intelligible are the students?          4   3   2   1   0   2.0  1.05
1.2  Is intelligibility pronunciation-related?   1   2   6   1   0   2.7  0.82
1.3  Are learning strategies effective?          1   4   2   2   0   2.6  1.01
1.4  Relevant & authentic courses?               0   2   5   3   0   3.1  0.74
The results of the teacher questionnaires (Table 9.4) show that teachers think
favourably of the courses and the learning strategies, which indicates that the courses
are effective and are urgently needed for the achievement of speech intelligibility. The
teachers’ assessments also reveal a tight relationship between pronunciation and speech
intelligibility problems. These results support the students’ findings (cf. Tables 9.1 and
9.4).
I will now present the results of the questionnaire obtained from the teachers. The
results deal with problems facing the learners in identifying English speech sounds.
Table 9.5 presents the responses to questionnaire items referring to learning difficulties
and the effects of some linguistic factors on intelligibility of Sudanese learners of
English. The components of the survey were conducted through teacher assessment of
student performance.
Table 9.5 Distribution of responses of the instructors in the written survey about the intelligibility
problems facing Sudanese EFL students. The table lists the ten responses per item over the five
scale values, as well as the mean and standard deviation of the scale values. The modal (most
frequent) response category is highlighted in the table. See appendix 9.1b for a verbatim copy of
the questionnaire items.
                                                                 Scale value
No.   Item                                                        1   2   3   4   5  mean    SD
2.1   Difficulty to regroup with same vowel/consonant sound       0   6   2   2   0   2.6  0.84
2.2   Difficulty to find out an odd vowel/consonant sound         1   5   4   0   0   2.3  0.68
2.3   Difficulty to discriminate between voiced/voiceless cons.   1   0   6   3   0   3.1  0.88
2.4   Difficulty perceiving E. final clusters                     1   2   6   1   0   2.7  0.82
2.5   Difficulty perceiving initial clusters                      1   2   6   1   0   2.7  0.82
2.6   Difficulty to distinguish E. short vowels                   0   7   2   0   1   2.5  0.97
2.7   Difficulty to distinguish long vowels                       4   4   2   0   0   1.8  0.79
2.8   Difficulty to distinguish diphthongs                        2   6   2   0   0   2.0  0.67
2.9   Degree of perception errors due to L1 interference          0   4   5   0   1   2.8  0.92
2.10  Degree of perception errors due to lack of L2 knowledge     0   4   5   1   0   2.7  0.68
The results of the teacher questionnaires reveal that Sudanese listeners of English
encounter difficulties recognizing English speech. According to the assessment of the
language teachers (Table 9.5), the subjects concerned repeatedly make errors in the
perception of the English phonemes. The English vowels are reported as the most
difficult to understand, i.e., the listeners’ level of perception of the short, long and
diphthongal vowels is claimed to be poor. Despite the fact that the single and cluster
consonants of English are a bit more intelligible than vowels, these too constitute a
perception problem. The instructors claim that they regularly face difficulty on the part
of their students regrouping and sorting out the words with the same consonant sounds;
minimal pairs or quartets. It is worth noting that the feedback of the students and the
teachers’ questionnaires reflect the same judgment, viz. that English vowels are more
difficult to understand than the consonants (cf. Tables 9.2 and 9.5).
I will now present the results of the questionnaire obtained from the teachers that deal
with problems facing the learners in producing English speech sounds.
Table 9.6 presents the responses about learning difficulties and the effects of some
linguistic factors on intelligibility of Sudanese learners of English. The components of
the survey were conducted through teacher assessment of student performance.
Table 9.6 Distribution of instructors’ responses to the questionnaire items that pertain to speech
sound production of Sudanese EFL students. The table lists the ten responses per item over the
five scale values, as well as the mean and standard deviation of the scale values. The modal (most
frequent) response category is highlighted in the table. See appendix 9.1b for a verbatim copy of
the questionnaire items.
                                                               Scale value
No.    Item                                                     1   2   3   4   5  mean    SD
2.1.4  To pronounce fricatives /s, z/, /θ, ð/, /f, v/, /ʃ, ʒ/   1   6   3   0   0   2.2  0.63
2.1.5  To produce a consistent vowel quality                    4   6   0   0   0   1.6  0.52
3.3.1  Difficulty in producing initial E. clusters              2   5   2   0   1   2.3  1.16
3.3.2  Difficulty in producing final clusters                   2   4   4   0   0   2.2  0.79
3.1.1  Difficulty in producing short vowels                     2   6   1   0   0   1.9  0.60
3.1.2  Difficulty in producing long vowels                      2   5   2   0   1   2.3  1.16
3.1.4  Difficulty in producing diphthongs                       4   5   1   0   0   1.7  0.68
2.2.1  Mother-tongue interference                               0   3   6   1   0   2.8  0.63
2.2.2  Use universals to achieve intelligibility                3   5   1   0   1   2.1  1.20
2.2.3  Avoid difficult sounds                                   3   2   2   2   1   2.6  1.43
2.2.4  Overgeneralisation                                       2   3   3   1   1   2.6  1.27
2.2.5  Substitute sounds of L1 for L2                           2   5   2   1   0   2.2  0.92
2.2.6  Ability to dissociate sounds of L1 for L2                3   5   2   0   0   1.9  0.74
As the assessment by Sudanese English language teachers shows (Table 9.6), Sudanese
EFL learners have little knowledge of the English vowels and they are weak in the
pronunciation of such vowels, but they show an acceptable level in producing the
English consonants. These instructor assessments clearly converge with the results of
the student questionnaire; the students claim to frequently face difficulties in the
pronunciation of short, long and diphthongal vowels of English, but they assert having
no serious problems in the production of the consonants (compare Tables 9.3 and
9.6). The results also reveal that the speakers frequently substitute L1 sounds for L2 sounds in their
production of the English sounds.
It can be observed in the preceding sections that the Sudanese EFL students and
their instructors often agree on which aspects of English pronunciation and listening
ability are easy or difficult. This is a good thing: it would be highly undesirable if
students had a completely different view of their strengths and weaknesses than their
instructors. In order to quantify the degree of correspondence between student and
teacher judgments, Figure 9.1 presents the mean ratings of the students and teachers on
the intelligibility of Sudanese learners of English in a scatterplot for the 15 questionnaire
items that are shared between students and instructors. The black line, which runs
across the figure, shows the linear relation between the students’ responses and the
corresponding responses given by the instructors. The dotted line is a reference line
which defines positions in the graph where students and instructors would have given
the same evaluation of the students’ performance.
Figure 9.1 Total mean rating of the two subject groups shown in a scatter plot. The
clustering of data points around the line is indicative of a positive correlation between the
students’ and instructors’ results.
Figure 9.1 shows that there is a moderately strong linear relationship between the
students’ and the instructors’ judgments, with a significant positive correlation of r
= .569 (p < .025, one-tailed). This indicates that the self-rated performance by the
students and the assessment of the students’ performance by their instructors
correspond reasonably well to each other, although the students tend to have a more
optimistic view of their proficiency than their instructors have, as is evidenced by the
fact that the majority of the scatter points in the graph lie above the reference line.
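The significance of such a correlation can be checked from r and the number of data points alone. The Python sketch below assumes n = 15 (the number of shared questionnaire items mentioned above) and the usual t-test for a Pearson correlation with n - 2 degrees of freedom. The function names are hypothetical, and the upper-tail probability is obtained by simple numerical integration so that no external library is needed.

```python
import math


def t_pdf(x, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def one_tailed_p(r, n, steps=20000):
    """Return (t, one-tailed p) for a positive Pearson r based on n data points."""
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1 - r * r)
    # Simpson's rule for the area under the density between 0 and t
    # (steps must be even); the upper tail is 0.5 minus that area.
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return t, 0.5 - total * h / 3


t_value, p_value = one_tailed_p(0.569, 15)
print(round(t_value, 2), round(p_value, 3))
```

Under these assumptions the one-tailed probability lies between .01 and .025.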
of the explicit language knowledge. Many results in the previous literature support their
conclusions (Mohammed 1991, Fahal 2004).
The results also revealed that RP long vowels and diphthongs proved to be more
difficult to learn than short vowels. Similar results were reported in related studies
where the native speakers of Arabic have difficulty distinguishing between English
central and back vowels such as /ɑ, ɔː, ʊ/ as in cot, caught and boat, all of which are often
pronounced as /ɔː/ due to the absence of these vowels from their L1 (Brett 2004).
On the other hand, the results of both the students and the language teachers suggest
that the English single and cluster consonants are comparatively better perceived and
produced by the Sudanese learners than the vowels. This is probably because the
learners are more familiar with consonant sounds than vowels.
An interesting finding is that the content of the phonology and phonetics syllabus
taught is assessed as effective and feasible, as the data show. This finding gives a hint
of inconsistency between the students’ performance and their assessment of the
syllabus taught. In other words, the data reveal that the students’ scores on the English
speech sounds are generally low, especially those on the vowels, whilst the courses are
referred to as practical and interesting. This is probably because other linguistic aspects
such as insufficient cognitive knowledge or communicative context, etc., contributed to
this problem.
Moreover, the correlation coefficient shows a positive relation between
the judgments of the students and those of the teachers, r = .569 (p < .025, one-tailed),
which indicates that the students and the teachers are in conformity with each
other in terms of the feedback obtained through the questionnaires.
The correlation also revealed that the students’ results concur with those of the teachers;
however, the students’ judgments tend to be higher than those of their instructors,
probably because the former are not critical enough of their own level of achievement.
Chapter Ten
Conclusion
10.1 Introduction
Experimental work treats two language domains, L2 speech production and perception,
which represent the major components of speech intelligibility. Specifically, the
investigation targets segmental analysis of the English vowels, single and cluster
consonants, which form the basic building blocks of spoken words. This is because
(some) linguists assume that more than 50 percent of the intelligibility of a spoken
utterance depends on correct sound production (Fraser 2005) rather than on other
matters such as incorrect syntax and morphology.
This chapter identifies and elucidates the issues approached in the present study on the
problems of speech intelligibility among Sudanese university EFL learners. The study
has yielded a large amount of information concerning the topic at hand. Each area of
investigation in this study contributes to an understanding of the problems of speech
intelligibility facing these learners and therefore to a better understanding of how the
entire problem could be approached. The chapter will provide an account of the most
general aspects of this knowledge divided into three sections: summary, conclusion and
recommendations.
10.2 Summary
Next, the study included three production tests targeting the measurement of the
acoustic correlates of vowels, consonants and clusters spoken with Sudanese-Arabic
accented English (chapters 6, 7 and 8). In the data analysis the results obtained for the
Sudanese EFL speakers were compared to acoustic properties of the same (or similar)
stimuli produced by native speakers of RP English.
Comparison of Sudanese EFL learners’ data with native-speaker control data
forms an essential ingredient in the evaluation of both auditory and productive task
performance. The purpose of comparison is to scrutinize issues like relative accuracy,
correctness and standardization of the learners’ performance in both auditory and
productive tasks. As part of the comparison, I involved Dutch and American listeners
of English in auditory tasks to obtain a better understanding of the learning problems
investigated. Written questionnaires were also used (chapter 9) as part of an assessment
process that asked the participants (both Sudanese EFL students and their teachers) to
reply to questions and perform tasks. The purpose of the written questionnaires is to
supply data about the same field of investigation but with a different instrument.
Moreover, the data may reinforce the findings obtained from the perception and
production tests. The final objective of the accounts is to give insight into the nature
and causes of learning problems of speech intelligibility. The accounts also provide
statistical insight into these errors in terms of means, frequencies and correlations, which
lends credibility to the findings. In this way, segmental error patterns manifest in the learners’
performance will present answers to the questions raised and clarify the causes of
intelligibility problems experienced by Sudanese EFL learners.
There is a clear convergence in the results of the tests throughout the study. Moreover,
the findings support many tendencies reported in previous literature.
10.3 Conclusion
This section provides answers to the questions that are raised in the study. It also
provides miscellaneous conclusions in other aspects of the research that do not address
specific research questions.
The findings of this study reveal that Sudanese EFL learners face speech intelligibility
problems. They experience difficulties in recognizing and producing native
English speech. The learners’ perception level (e.g. segmental intelligibility as quantified
by means of the Modified Rhyme Test) of English speech sounds is low. Their mean
correct scores in the identification of the English vowels, single coda consonants and
coda clusters, and in word recognition in SPIN sentences, which represent the most
problematic areas, are 47.8, 66.0, 71 and 33%, respectively (Chapter 3). More
importantly, as EFL listeners, the learners produced different types of error patterns
when they were involved in interaction with native speakers of English. These errors
included the confusion of the English /ʌ/ and /ɔ/, substitution of /ʊ/ for /uː/ and /f/
for /v/ or confusing the coda clusters /st~sk/ and /bd~ld/, etc. These results
demonstrate that speech intelligibility may vary significantly depending on the speech
sounds present in the native language. The properties of the native language of the
learner and those of the target language determine the direction of difficulties and
pattern of errors that learners experience with L2 learning. This was observed in the
learners’ acoustic results of the English vowels, as shown by automatic classification
through Linear Discriminant Analysis (LDA). The classification data indicate that
vowels are problematic and reveal a variety of error patterns, such as the confusion
of /uː/ with /ɔː/ and substitutions of /ɜː/ for /æ, ʌ, e, or ɔ/, /ʌ~ɔ/, etc. As the study
concludes, these patterns of errors indicate that the learners apply their L1 strategies to
the learning of English speech sounds.
Speech intelligibility problems also arise when Sudanese EFL learners are involved in
interaction with native British and American listeners. The data reveal that such
problems occur because the learners produce incorrect English speech sounds. For
example, the confusion of /æ~ʌ/ and /ɔ~ʊ/ occurs due to phonological differences
between L1 and L2. The learners also often make production errors of a non-phonetic
nature such as those that relate to the difference of spelling systems between English
and Arabic. For instance, the front vowel /e/ is frequently replaced by /ɪ/ due to the
influence of the Sudanese-Arabic spelling system (Chapters 4, 5 and 6). Similarly,
substitutions of English consonants, e.g. /s/ for /θ/, /t/ for /d/ etc., frequently
occur, particularly between L1 and L2 phonemes which share similar phonetic features.
As the study reveals, the ultimate cause of these problems is that the native listeners
are not able to determine how the sound structures of the learners’
speech work. Thus, the native listeners’ failure to discover the systematicity of the
learners’ speech production makes it difficult for them to interpret the speech signal
correctly. The study also reveals that Sudanese-Arabic accented English deviates from
the native norms of English. Deviations become manifest primarily when the learners
attempt to produce English speech, where many systematic errors occur across sound
categories.
How intelligible are Sudanese university EFL learners to native English listeners? The
answer to this question accounts for productive intelligibility of Sudanese EFL learners
to native speakers of English. The learners show various levels of speech intelligibility
to the native listeners of English. Variation depends on the types of English speech
sounds and tasks involved. To start, both British and American listeners face
perception problems with practically all EFL vowels, part of the onset and coda
consonants and clusters produced by the Sudanese learners. The English speech sounds
which were produced by Sudanese EFL learners were identified by both British and
American listeners less successfully than when the same test items were read by a native
speaker of RP English.
Figure 10.1 below summarizes the differences in intelligibility of the Sudanese and
native speakers of English, as established from the responses given by British and
American native listeners.
Moreover, these results concur with the data obtained from the SPIN sentences where
Sudanese learners show lower intelligibility scores, of 69.2 and 64.8% with native
British and American listeners, respectively. In the SPIN words, the vowel nuclei
proved to be more difficult than singleton and cluster consonants (see Figure 10.1, see
also Chapter 5).
On the other hand, Sudanese EFL listeners have difficulty in understanding native (RP)
English speech. The lowest perception scores were found for the English vowels
(around 48% correct) and for word recognition in the SPIN test (around 30% correct).
Moreover, the perception of the coda consonants and clusters proved more difficult
than that of single onset consonants and consonant clusters: 66 and 71% against 94
and 75%, respectively (Chapter 3). The negative correlation of r = –.682 (p < .05)
between identification scores obtained for vowels and onset consonants indicates that
poorer identification of vowels goes together with better results for onset consonants.
On the other hand, vowels rather than consonants displayed a fairly high positive
correlation with correct word identification (r = .700, p < .01). It would seem therefore
that correct vowel identification is a more important determinant of word recognition
than identification of either onset or coda consonants. This conclusion may not be true
of word recognition in general but it seems valid in the special situation where native
English listeners are confronted with Arabic-accented English, in which the quality of
the consonants is generally better than that of the vowels.
Figure 10.1 Summary of perception differences of vowels, consonants and clusters of English
spoken by a Sudanese EFL learner and a British speaker of English.
The vowel results in Chapter 6 also respond to the question raised above. The chapter
examines the intelligibility of Sudanese EFL learners to native listeners of English,
where English listeners were simulated by Linear Discriminant Analysis (LDA). The
results of an acoustic analysis of the English vowels spoken by Sudanese EFL learners
appear to be relatively similar to their counterparts in Chapter 5 where the same vowel
tokens were identified by native English listeners.
These conclusions suggest that although the learning of the English speech sounds is
problematic in general, vowels in particular form a major element blocking intelligibility.
Moreover, there is consistency with the previous studies where non-native English
listeners have greater difficulty in decoding impoverished (LPC-resynthesized) speech
than human speech (Reynolds, Bond and Fucci 2006). Some types of speech
intelligibility problems of Sudanese EFL learners indicate their limited English skills.
Linguistically, there is distance between the learners’ L1 and L2, a factor that presents an
essential source of their intelligibility problems. On the other hand, native
listeners/speakers benefit from their similar national background and so they show a
high intelligibility level.
English vowel production proved to be the most difficult aspect for the Sudanese EFL
learners, as the results have shown. A fair conclusion is that these learners make
relatively more production errors in English vowels than in English singleton and
cluster consonants. Acoustically, there is a large spectral contrast between the English
vowels produced by Sudanese EFL learners and those of the native speakers. Unlike
those of the native speakers, which show similar distribution in the vowel space across
speakers, English vowel tokens of the learners show incorrect distribution in the vowel
space. The members of short/long (lax/tense) vowel pairs are closer to each other,
whilst the central and back vowels of the learners exhibit no relation to the native
English vowel repertory. Statistically, identification results obtained by Linear
Discriminant Analysis (LDA) reveal rather poor English vowel production for the
learners targeted. When the LDA was trained on RP data but tested on L2 vowels, the
rate of correct automatic identification was only 42%. Comparison of the LDA results with those
of the human identification of the same vowel tokens (in the next paragraph) provides
additional strong support that vowels are the most difficult sound type to pronounce.
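The idea of training a classifier on native vowels and testing it on learner vowels can be illustrated with a small Python sketch. It is not the analysis carried out in this study: it assumes only two vowel categories, two predictors (F1 and F2 in Hz), equal prior probabilities and invented formant values. Under equal priors, linear discriminant analysis amounts to assigning each token to the class whose mean is nearest in Mahalanobis distance, using the pooled within-class covariance, and that is what the code does.

```python
def mean_vec(points):
    """Mean of a list of (F1, F2) pairs."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def train_lda(train):
    """train: dict mapping vowel label -> list of (F1, F2) tokens.

    Returns the class means and the inverse of the pooled covariance matrix.
    """
    means = {v: mean_vec(pts) for v, pts in train.items()}
    sxx = sxy = syy = 0.0
    n_total = 0
    for v, pts in train.items():
        mx, my = means[v]
        for x, y in pts:
            sxx += (x - mx) ** 2
            sxy += (x - mx) * (y - my)
            syy += (y - my) ** 2
        n_total += len(pts)
    dof = n_total - len(train)                  # tokens minus number of classes
    a, b, d = sxx / dof, sxy / dof, syy / dof   # pooled covariance [[a, b], [b, d]]
    det = a * d - b * b
    inverse = (d / det, -b / det, a / det)      # entries (0,0), (0,1), (1,1)
    return means, inverse


def classify(token, means, inverse):
    """Assign the token to the class with the smallest Mahalanobis distance."""
    i00, i01, i11 = inverse

    def dist2(m):
        dx, dy = token[0] - m[0], token[1] - m[1]
        return i00 * dx * dx + 2 * i01 * dx * dy + i11 * dy * dy

    return min(means, key=lambda v: dist2(means[v]))


# Hypothetical native-speaker training tokens (F1, F2 in Hz).
native = {
    "i": [(280, 2250), (300, 2300), (320, 2200)],
    "a": [(700, 1100), (720, 1150), (680, 1050)],
}
# Hypothetical learner tokens: (intended vowel, (F1, F2)).
learner = [
    ("i", (310, 2150)),
    ("i", (600, 1300)),
    ("a", (690, 1120)),
    ("a", (650, 1200)),
]

means, inverse = train_lda(native)
predictions = [classify(tok, means, inverse) for _, tok in learner]
accuracy = sum(p == target for p, (target, _) in zip(predictions, learner)) / len(learner)
print(predictions, accuracy)
```

A learner token counts as correctly identified only if it falls nearest to the native category the speaker intended, so a low percentage correct signals that the learner vowels are displaced relative to the native vowel space.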
The English vowels produced by Sudanese EFL learners were correctly identified by
British and American listeners (Chapter 5) in 68 and 63 percent of cases, respectively, as
determined by the Modified Rhyme Test. Their performance on the Arabic EFL single
consonants and clusters is relatively better. The single consonants were correctly
identified at 85.0 and 84.8% by the British and American listeners, respectively, while
scores on clusters were 84 and 88%, respectively.
In the preceding section it was concluded that correct vowel identification correlates
significantly with the word recognition scores obtained by native English (and
American) listeners to Sudanese-accented English. No such relationship could be
established for consonant identification and word recognition. The conclusion follows,
then, that vowel pronunciation is not only the most difficult problem for the Sudanese
learners, but the errors they produce are also most detrimental to their intelligibility at
the sentence level.
On the basis of the joint evidence provided by identification by machine (through LDA)
and by human listeners, it appears that Sudanese EFL learners find the pronunciation
of English vowels the most difficult. This would imply that vowel nuclei frequently are
an essential ingredient of correct word production. One more point is that when an
English vowel represents a perception problem, it also represents a production problem.
This point confirms that there is a relationship between the strategies Sudanese EFL
learners use in learning English vowels and the patterns of errors they make, i.e. an L1 effect
(Flege 1981, 1995).
Thus, these findings show consistency with the literature that found a strong
relationship between segmental errors and degradation of intelligibility. That is, the
This part gives an account of the linguistic causes of the intelligibility problems of
Sudanese learners of English.
This section addresses the phonological differences that exist between English and
Sudanese Arabic. It seeks evidence of phonemic contrasts between these languages
and discusses how these contrasts may affect the learning of the target language.
It is assumed that there are differences in the inventory of each of these languages that
compromise the learners’ perception and production of English speech sounds. For
more detail, see § 2.1, which presents a contrastive analysis of the two inventories.
This section seeks to account for how lack of explicit knowledge of English hinders the
intelligibility of Sudanese EFL learners. It is argued that the mastery of English
phonetics and phonology is necessary for the achievement of intelligibility (see § 2.2).
The research is motivated by three questions: what Sudanese EFL learners actually acquire
or hear when they attempt to learn English, which part of their output deviates from the
correct norm of the target language, and what the causes are (i.e., differences in sound
inventories, differences between L1 and L2 rules, the lack of explicit L2 knowledge, etc.).
To answer these questions, error analysis methods were applied as a scientific procedure
that serves to obtain credible explanations (Ellis 2003, Taylor 1986).
production of English speech sounds range between 30% and 90% in the area of vowels,
coda consonants, onset consonants and onset and coda clusters, while minor errors
range between 10% and 20% (see Chapters 3 - 9).
Data collection and error identification. I collected samples of the learners’ language that
effectively illustrate the features of their performance in order to compile a
comprehensive list of errors. The sample comprised all the results of the experiments in
which the learners took part as listeners or speakers (Chapters 3 - 9). Accounts of errors
are based on a number of mechanisms, starting with the recognition (definition) of an
error and the effect of L1 transfer, where the presence of L2 errors mirrors L1 transfer.
Other mechanisms are the use of L2 knowledge in performance (in particular in data
dealing with communication problems), the importance of explicit knowledge of L2
speech sounds, transfer of training, and the utilization of innate knowledge of linguistic
universals (unmarked or common phenomena). With regard to these mechanisms, the
learners’ output represents an important source of evidence for speech errors that occur
at the level of the segment, the syllable and the word. Related literature, observations
and analyses from rigorous research also provide data that help to determine where
errors occur, i.e. which speech sounds cause students difficulties, and what their
frequency and gravity are. Similarly, the data from the written questionnaires completed
by the Sudanese EFL learners and teachers constitute an extra source of information.
Finally, the collected data are expected to provide a deeper understanding of the nature
and classification of the speech intelligibility problems of the learners concerned.
Error description. The description of the learner errors (the learner’s problems in speech
perception and production) involved a comparison with the performance of the target
language. This refers to the performance of the native listeners/speakers who
participated in the study as control groups, the other groups involved and related
previous studies. Error description in this context identified problem areas like
confusion or substitution of speech sounds, etc.
Error explanation and classification. This refers to the description of the source of the
problems of speech intelligibility, which Sudanese EFL learners faced. It is an attempt
to establish the processes that are expected to be responsible for the occurrence of
these errors. Then a tentative classification of the errors follows aiming at identifying
the nature of the source of such errors. Classification treats types of errors such as
interference error (reflecting the L1), intralingual error (reflecting failure to learn, or
partial/incomplete learning of a rule), developmental error (reflecting errors that occur
while a learner is building – faulty – hypotheses about L2), etc. Tables 10.1-2 below
provide accounts for the causes of the errors/problems of the learners. In Table 10.1,
two gross error categories will be distinguished which describe the pattern observed in
the target language, i.e. English. In the error pattern which will be called ‘confusion’,
two sounds are used interchangeably as response categories. This is a symmetrical
confusion pattern. I reserve the term ‘substitution’ for asymmetrical confusion patterns
whereby a sound that should be perceived as phoneme /x/ is (more or less)
consistently perceived as a token of phoneme /y/ but not vice versa.
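The distinction between symmetrical confusion and asymmetrical substitution can be illustrated with a short sketch. It is not the procedure used in this study: the function name `classify_pair`, the 20% threshold and the response counts are hypothetical and serve only to make the two definitions concrete.

```python
def classify_pair(matrix, a, b, threshold=0.20):
    """Label the error pattern between phonemes a and b.

    matrix[stimulus][response] holds response counts.
    threshold is a hypothetical cut-off (proportion of responses)
    above which a misidentification counts as systematic.
    """
    def rate(stimulus, response):
        total = sum(matrix[stimulus].values())
        return matrix[stimulus].get(response, 0) / total if total else 0.0

    a_as_b = rate(a, b) >= threshold   # a is identified as b
    b_as_a = rate(b, a) >= threshold   # b is identified as a
    if a_as_b and b_as_a:
        return "confusion"             # symmetrical: errors in both directions
    if a_as_b or b_as_a:
        return "substitution"          # asymmetrical: errors in one direction only
    return "no systematic error"


# Invented counts, for illustration only (not data from this study).
counts = {
    "e": {"e": 55, "ɪ": 45},
    "ɪ": {"ɪ": 60, "e": 40},
    "θ": {"θ": 30, "s": 70},
    "s": {"s": 98, "θ": 2},
}
print(classify_pair(counts, "e", "ɪ"))  # confusion
print(classify_pair(counts, "θ", "s"))  # substitution
```

With these invented counts, the vowel pair errs in both directions and is labelled a confusion, whereas the fricative pair errs in one direction only and is labelled a substitution.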
Table 10.1 Causes of errors and/or speech intelligibility problems experienced by Sudanese EFL
learners in this study with focus on perception problems.
Perception Errors
| No. | Category | Description | Example | Explanation |
|---|---|---|---|---|
| 1. | Confusion | Listeners fail to discriminate between central and back vowels | /ʌ~ɔ/, /ɑː~ɔː/, /ɔ~ʌ/ | (L1 interference) when L2 knowledge is lacking learners fall back on the habits of their L1 |
| 2. | Confusion | /ɜː/ is misperceived in words like work/worse | /ɜː/ as /ɔ, ʌ, e/ | Involvement of L1 (due to partial learning or insufficient L2 knowledge) |
| 3. | Confusion | /e/ and /ɪ/ are misperceived interchangeably as in enter, pet. | /e~ɪ/ or /ɪ~e/ | Partial learning/transfer of L1 orthography, incorrect perceptual representations |
| 4. | Confusion | Listeners fail to distinguish between such vowel tokens. | /ɑʊ~ʊ/, /ɪ~e/, /aɪ~eɪ/ | Partial learning and L1 interference |
| 5. | Substitution | Listeners mistake the English onset /r/ for Arabic /w/. | /r~w/ | Due to close F2 and F3 (partial learning or limited L2 knowledge) |
| 6. | Substitution | Listeners fail to distinguish between such pairs. | /g~k/, /d~t/, /f~v/, /z~s/ | Voicing feature resists learning due to insufficient knowledge |
| 7. | Substitution | Listeners hear English /θ/ as /s/ | /θ~s/ | Incorrect perceptual representations. Learners carry over L1 phonetic habits into English. |
| 8. | Substitution | Phonological alternations or misperception of a cluster or one of the cluster members | /kl~gr/, /sl~sn/, /spr~spl/, /nz~mz/, /st~sk/ | The speech signal not detected well (lack of L2 experience or unfamiliarity or place/manner of articulation effect) |
In Table 10.2, which summarizes error patterns found in the EFL production data of
the subjects, similar terminology is used. Here the pattern which is called ‘confusion’,
denotes a symmetrical error pattern: two phonemes which should be kept distinct in
English are used indiscriminately. In the substitution pattern phoneme /x/ is used (and
perceived as such by native English listeners) when phoneme /y/ should be used but
not vice versa.
192 TAJELDIN ALI: SPEECH INTELLIGIBILITY PROBLEMS OF SUDANESE EFL LEANERS
Table 10.2 Causes of errors and/or speech intelligibility problems experienced by Sudanese EFL
learners in this study with focus on production problems.
Production Errors
| No. | Category | Description | Example | Explanation |
|---|---|---|---|---|
| 1. | Confusion | Learners fail to discriminate between central and back vowels | /ɔ~ʌ/, /ɜː~ɑː/, /ɔ~ʊ/, /ɛ~ɑː/, /aʊ~ʊ/ | Incorrect perceptual representations (L1 effect and lack of L2 knowledge) |
| 2. | Substitution | Learners fail to discriminate between English fully front and central vowels | /æ>ʌ/ | (Incorrect vowel source) L1 interference |
| 3. | Substitution | Learners fail to discriminate between front vowels | /e>ɪ/ | Spelling/graphical differences between L1 & L2 |
| 4. | Substitution | Diphthongs rendered to monophthong | /eɪ>e/ | L1 interference (Sudanese Arabic vowel source) |
| 5. | Substitution | Voiced & voiceless fricatives are substituted in initial and final positions. | /θ>s/, /ð>z/ | Incorrect representations of English fricatives due to L1 filter effect (also reduced acoustic contrast in COG) |
| 6. | Substitution | Learners fail to distinguish between consonants of the same place of articulation | /z>s/, /ŋ>n/ | Lack of clear distinctive voicing feature (lack of L2 exposure) |
| 7. | Substitution | Learners show no clear distinction producing either the first or second onset cluster member. | /kl>gl/ in onset | Weak explosion of the voiceless velar: phonotactic restrictions between L1 and L2 |
| 8. | Substitution | Learners show no clear distinction producing either the first or second coda cluster member. | /nt>ŋk/, /nz>mz/ in coda | Unclear voicing feature: phonotactic restrictions between L1 and L2 |
| 9. | Acoustic feature (VOT) | Learners produce incorrect or imprecise voice onset time (VOT), particularly in coda plosives. | /b, p/, /d, t/, /g, k/ | Involvement of L1 acoustic correlates |
| 10. | Acoustic feature | Inaccurate or incorrect production of fricative+plosive+liquid or fricative+liquid. Weak friction and strong aspiration | /sl, spl, spr, sk/ | Involvement of L1 acoustic cues (use of learner’s L1 strategy) |
The explanation of the sources of the speech intelligibility problems above helps to
infer the following:
The results for the acoustic correlates in this study reveal differences between the
English consonants produced by Sudanese EFL learners and those produced by native
English speakers (Chapter 7). These differences are the following:
1. The English voice onset time (VOT) produced by the Sudanese EFL learners
differs strikingly from the native pattern, in that the VOT of both the voiced and
voiceless stops falls in the short-lag range. The native speakers’ VOT is categorical,
where the voiced plosives fall within the short-lag range and the voiceless plosives
have VOT values in the long-lag range. Moreover, the learners’ VOT does not
reflect the effect of the place of articulation in which VOT should increase as the
stop consonant is articulated further back in the mouth. Acoustic differences such
as these indicate that the Sudanese-Arabic learners have difficulty in both detecting
and producing the precise voicing features of English stops.
2. Generally, the vowel duration values of the English vowels preceding obstruents
show relative correspondence to native English voicing contrast, but final
fricatives and affricates slightly differ. These differences are due to the L1
strategies and the slow speaking style of the learners.
3. The durations of the English consonants correspond to the English native norm,
where the voiceless obstruents have longer duration values than the voiced.
However, coda affricates have deviant (and probably incorrect) duration values.
4. The centre of gravity (COG) measurements reveal a relative correspondence to the
English pattern. The Sudanese-accented sibilant fricatives show spectral peaks at
relatively higher frequencies than non-sibilants, as they also do in native English
speech. This correspondence occurs probably because Arabic has many conso-
nants that resemble those of English. However, the COG values of the native
speakers are higher, in comparison to those of the Sudanese learners, possibly due
to a difference in speaking style.
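The VOT pattern described in point 1 can be made concrete with a small sketch. The 30 ms boundary between short-lag and long-lag VOT, the function names and all values are assumptions made for illustration; they are not the measurements or criteria of Chapter 7.

```python
def vot_category(vot_ms, boundary_ms=30.0):
    """Assign a VOT value (in ms) to a coarse category.

    boundary_ms is an assumed short-lag/long-lag boundary; the exact
    value varies across studies and places of articulation.
    """
    if vot_ms < 0:
        return "lead"        # voicing starts before the release
    if vot_ms <= boundary_ms:
        return "short-lag"
    return "long-lag"


def shows_place_effect(mean_vot_by_place):
    """True if mean VOT rises from bilabial to alveolar to velar."""
    order = ["bilabial", "alveolar", "velar"]
    values = [mean_vot_by_place[place] for place in order]
    return all(x < y for x, y in zip(values, values[1:]))


# Invented values (ms), for illustration only.
native_like = {"b": 10, "p": 60}
learner_like = {"b": 8, "p": 22}
for label, data in (("native-like", native_like), ("learner-like", learner_like)):
    print(label, {stop: vot_category(v) for stop, v in data.items()})

print(shows_place_effect({"bilabial": 55, "alveolar": 65, "velar": 75}))  # True
print(shows_place_effect({"bilabial": 20, "alveolar": 18, "velar": 21}))  # False
```

In the invented learner-like data both stops fall in the short-lag category, which mirrors the loss of the categorical voicing contrast described above.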
Acoustic measurements of the English clusters reveal that the Sudanese EFL learners
have problems in producing English consonant clusters.
1. The learners’ plosives /p/ and /t/ in clusters beginning with the fricative /s/ are
strongly aspirated, whilst /s/ has weak frication. This contrasts with the pattern of the
native English speakers, where the voiceless /p, t, k/ are aspirated only at the
beginning of a syllable but remain unaspirated when final or when preceded by /s/
in the same syllable.
2. The production of the English coda clusters proved to be difficult. They hardly
show a learning pattern that converges toward the native norm. The results also
show that speech intelligibility problems may have to do with the distribution of
sounds. A few errors in the English coda cluster consonants, like substitutions of
/nz~mz/ and /nz~dz/, seem to reflect the effect of differences in sound distribution
between Arabic and English.
The written questionnaire data represent a useful contribution to the research. The
results strongly support the findings of the experimental chapters in the study. The
responses of both the students and the teachers show that there is a speech intelligibility
problem among Sudanese EFL learners. For example, the results reveal that the
learners have problems in recognizing native English speech sounds and that they also
find it difficult to produce English short vowels and diphthongs, fricatives and the nasal
pair /n~ŋ/. However, both the students and the language teachers claim that English
single consonants and consonant clusters are perceived and produced better by the
learners than the vowels. The respondents attribute these problems to the lack of explicit
knowledge, L1 interference and insufficient practice. A reliability analysis revealed that
the Sudanese EFL learners show high agreement amongst themselves (Cronbach’s α
= .860), i.e. they share the same views of their strengths and weaknesses when it comes
to perceiving and producing English sounds. The agreement within the group of
instructors is lower (α = .616, even after eliminating the most atypical respondent),
which shows that the instructors’ opinions on the students’ strengths and weaknesses
are more diverse. Nevertheless, students and instructors are in reasonable agreement
when the analysis is restricted to only those items that are shared between the student
and teacher versions of the questionnaire (r = .569), although overall the students rated
their level of proficiency in English more positively than their instructors did.
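For readers unfamiliar with the reliability coefficient reported above, the following sketch shows how Cronbach’s α is computed from a score matrix. It is a minimal illustration and not the analysis script used in this study: the function name `cronbach_alpha`, the arrangement of the data into columns and the ratings themselves are assumptions.

```python
from statistics import variance


def cronbach_alpha(columns):
    """Cronbach's alpha for k score columns observed over the same cases.

    columns[i][j] is the score in column i for case j. Which unit plays
    the role of "column" (questionnaire item or respondent) is an
    assumption of this sketch and must match the intended analysis.
    """
    k = len(columns)
    if k < 2:
        raise ValueError("at least two columns are required")
    n_cases = len(columns[0])
    if any(len(col) != n_cases for col in columns):
        raise ValueError("all columns must cover the same cases")
    # Total score per case, summed over the columns.
    totals = [sum(col[j] for col in columns) for j in range(n_cases)]
    sum_of_variances = sum(variance(col) for col in columns)
    return k / (k - 1) * (1 - sum_of_variances / variance(totals))


# Invented ratings, for illustration only (not data from this study).
ratings = [
    [1, 2, 3, 4],
    [2, 1, 4, 3],
]
print(round(cronbach_alpha(ratings), 3))  # 0.75
```

The coefficient approaches 1 when the columns rank the cases in the same way and drops when they diverge, which is why a lower value for the instructors indicates more diverse opinions.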
10.3.7 Recommendations
Higher priority should be given to the production of English speech, which represents
a major learning problem for Sudanese EFL learners. In this respect, the emphasis in
production should be on getting the sounds right at the word level, dealing with words
in isolation and with words in controlled sentence environments. This way of practising
speech production enables learners and instructors to recognize which sounds are the most
difficult to distinguish (e.g. in minimal pairs like /ɔ~ɔː/ as in cot/caught and /e~ɜː/ as in
bed/bird), which can have a negative impact on intelligibility when not properly
distinguished.
necessary component of intelligibility in which the learners should surpass the threshold
level so that their production does not hinder their communicative abilities.
Sudanese EFL learners who specialize in ELT at teacher colleges and education
faculties should attain a high level of intelligibility, since they serve as a model of
English input for their students. Therefore, they should receive special assistance that
enables them to do their job properly. For example, listen-and-imitate techniques,
language laboratory exercises, free conversations, minimal pair drills, etc. are required.
Phonetic description of the articulatory system of the target language is also important
since it offers the learners an opportunity to develop explicit knowledge about the
perceptual representations of L2 sounds. This is because learners cannot produce a
speech sound correctly unless they acquire correct perceptual information about the L2.
Taking cues from the results, further large-scale and comprehensive investigations
should be conducted to cover other areas that have to do with the speech intelligibility
issue in the Sudanese EFL classroom. Research will therefore be required on the
following themes:
Insufficient practice, wrong implementation and partial learning represent major causes
of such problems. So, a further study that treats the use of the language laboratory to
teach English phonetics and listening comprehension skills in Sudanese EFL teacher
Further study is also needed to investigate the possibility of giving more space to
English pronunciation in the curriculum. The materials and classroom activities
included in secondary and tertiary syllabi in Sudanese EFL settings scarcely incorporate
pronunciation teaching. The proposed study can focus on the teachability-learnability
scale, i.e. which English pronunciation features should be taught and how these features
should be sequenced and taught, taking into account the differences from the learners’
L1. An important area to be considered is the segmental level, which includes vowel
and consonant sounds as well as syllables. Item sequencing in the syllabus should begin
with basic sound knowledge, which covers vowels, consonants and clusters, and
should end with words and sentences. The study should also consider to what extent
the explicit teaching of basic phonetics (for instance the organization and function of
the speech organs, such as lips, teeth, alveolar ridge, palate, tongue, vocal folds, etc.) is
helpful in the acquisition of EFL pronunciation skills.
Atechi, S. N. (2006). The intelligibility of native and non-native English speech: A comparative
analysis of Cameroon English and American and British English. Cuvillier, Göttingen,
Berlin.
Ball, M. J. & Rahilly, J. (1999). Phonetics: The science of speech. Oxford University Press,
New York.
Benki, J. R. (2003). Analysis of English nonsense syllable recognition in noise. Phonetica
60, 129-157.
Bent, T. & Bradlow, A. R. (2003). The interlanguage speech intelligibility benefit. Journal
of the Acoustical Society of America 114(3), 1600-1610.
Best, C. & Tyler M. (2007). Nonnative and second-language speech perception:
Commonalities and complementarities. In O. S. Bohn and M. J. Munro (Eds.),
Language Experience in Second language Speech Learning. In honor of James Emil Flege.
John Benjamins, Amsterdam, 13-34.
Bjarkman, P. C. & Hammond, R. M. (1989). American Spanish Pronunciation: Theoretical
and Applied Perspectives. Georgetown University Press.
Bobda, A. S. (2000). English pronunciation in sub-Sahara Africa as illustrated by the
NURSE vowel. A comprehensive and innovative review of speech in West,
East and Southern Africa. English Today 16, 41-48.
Boersma, P. & Weenink, D. (1996). Praat, a system for doing phonetics by computer, version 3.4.
Institute of Phonetic Sciences of the University of Amsterdam, Report 132.
Bond, K. (2001). Pronunciation problems for Brazilian students of English: Free
resources for teacher and students of English. Karen's Linguistics Issues.
Retrieved from www3.telus.net/linguisticsissues/pronunciation.html.
Bosman, A. J. (1989). Speech perception by the hearing impaired. Unpublished Ph.D.
dissertation, Utrecht University.
Bo-Young, K. (2005). The patterns of vowel insertion in IL phonology: The P-map
account. Proceedings from the Annual Meeting of the Chicago Linguistics Society 41,
University of Chicago, IL.
Bradlow, A., Clopper, C. & Smiljanic, R. (2007). Perceptual similarity space for
languages. Proceedings of the XVIth International Congress of Phonetic Sciences,
Saarbrücken, 1373-1377.
Brett, D. (2004). Computer generated feedback on vowel production by learners of
English as a second language. ReCALL Journal 16(1), 103-113.
Broselow, E. (1984). An investigation of transfer in second language phonology.
International Review of Applied Linguistics 22, 253-326.
Broselow, E. (1992). Parametric variation in Arabic dialect phonology. In E. Broselow, M.
Eid and J. McCarthy (Eds.), Current Issues in Linguistic Theory B5: Perspectives on
Arabic Linguistics. Philadelphia: John Benjamins, 7-45.
Brière, E. J. (1966). An investigation of phonological interference. Language 41(4), 768-
796.
Canepari, L. (2005). A handbook of phonetics: Natural phonetics: Articulatory, auditory and
function. LINCOM, München.
Carlisle, R. S. (2001). Syllable structure universals and second language acquisition.
International Journal of English Studies 1(1) 1-19.
Carr, P. (1999). An introduction: phonetics and phonology. Oxford, Blackwell.
Carrell, J. & Tiffany, W. R. (1960). Phonetics: Theory and application to speech improvement.
McGraw-Hill, London-New York.
REFERENCES 201
Gass, S. M., & Selinker, L. (2008). Second language acquisition: An introductory course (3rd ed.).
Routledge, New York.
Giegerich, H. J. (1992). An introduction to English phonology. Cambridge University Press,
Cambridge.
Gierut, J. (1999). Syllable onsets: Clusters and adjuncts in acquisition. Journal of Speech,
Language and Hearing Research 42, 708-726.
Gierut, J. & Champion, A. H. (2001). Syllable onsets II: Three-element clusters in
phonology treatment. Journal of Speech, Language and Hearing Research 44(4), 886-
904.
Gilbers, D. (1992). Phonological networks: A theory of segment representation. Ph.D.
dissertation, Groningen University.
Gilbert, J. B. (1984). Clear speech: Pronunciation and listening comprehension in American English.
Teacher’s manual and answer key. Cambridge University Press, Cambridge.
Gilbert, J. (1995). Pronunciation practices as an aid to listening comprehension. In D. J.
Mendelson and J. Rubin (Eds.), A guide for the teaching of Second Language Learning.
Dominic Press, San Diego, 97-111.
Gimson, A. G. (1989). An introduction to pronunciation of English. Cambridge University
Press.
Goldrick, M. (2004). Phonological features and phonotactic constraints in speech
production. Journal of Memory and Language 51(4), 586-603.
Groenen, P., Maassen, B. & Crul, Th. (1996). The specific relation between perception
and production errors for place of articulation in developmental apraxia of
speech. Journal of Speech and Hearing Research 39(3), 468-482.
Gussenhoven, C. & Broeders, A. (1976). The Pronunciation of English; A course for Dutch
learners. Longman, London.
Gussenhoven, C. & Jacobs, H. (1998). Understanding phonology. Arnold, London.
Hassan, Z. M. (2003). Temporal compensation between vowel and consonant in
Swedish & Arabic in sequences of CV: C & CVC and the word overall
duration. PHONUM 9, 45-48.
Hayat, A. (2005). Transcribing Arabic phonemes. A preliminary attempt. I-mag 3, 29-33.
(available from www.I-mag.org).
Heeren, W. & Schouten, M. E. H. (2008). Perceptual development of phoneme
contrasts: How sensitivity changes along acoustic dimensions that contrast
phoneme categories. Journal of the Acoustical Society of America 124(4), 2291-2302.
Hewings, M. (2004). Pronunciation practice activities. A source book for teaching English
pronunciation. Cambridge University Press.
Hillenbrand, J. M., & M. J. Clark (2000). Some effects of duration on vowel recognition.
Journal of the Acoustical Society of America 108(6), 3014-3022.
Hoffer, B. (1970). Contrastive analysis of generative phonology. Journal-Newsletter of the
Association of Teachers of Japanese 6(3), 3-11.
House, A. S. (1961). On vowel duration in English. Journal of the Acoustical Society of
America 33, 1174-1178.
Hudgins, C. V., Hawkins, J. E., Jr., Karlin, J. E., & Stevens, S. S. (1947). The
development of recorded auditory tests for measuring hearing loss for speech.
The Laryngoscope 57, 57-89.
Huthaily, Kh. (2003). Contrastive phonological analysis of Arabic and English.
Unpublished MA thesis, University of Montana.
Hyman, L. M. (1975). Phonology: Theory and analysis. Holt, Rinehart & Winston, New
York.
Iverson, P. & Kuhl, P. K. (1995). Mapping the perceptual magnet effect for speech
using signal detection theory and multidimensional scaling. Journal of the Acoustical
Society of America 97(1), 553-562.
Iverson, P., Ekanayake, D., Hamann, S., Sennema, A. & Evans B. G. (2008). Category
and perceptual interference in second-language phoneme learning: An
examination of English /w/-/v/ learning by Sinhala, German, and Dutch
speakers. Journal of Experimental Psychology: Human Perception and Performance
34(5), 1305-1316.
Jacewicz, E., Fox, R. A. & Salmons, J. (2006). Prosodic prominence effects on vowels
in chain shifts. Language Variation & Change 18(3), 285-316.
Jenkins, J. (2000). The phonology of English as an international language: new models, new norms,
new goals. Oxford University Press, Oxford.
Jenkins, J. (2002). A sociolinguistically based, empirically researched pronunciation
syllabus for English as an International Language. Applied Linguistics 23(1), 83-
103.
Jesry, M. M. (2005). Theoretically-based practical recommendations for improving
EFL/ESL students’ pronunciation. Journal of King Saud University, Language &
Translation 18, 1-33.
Johnson, J. S. & Newport, E. L. (1989). Critical period effects in second language learning:
The influence of maturational state on the acquisition of English as a second
language. Cognitive Psychology 21, 60-99.
Jongman, A., Herd, W. & Al-Masri, M. (2007). Acoustic correlates of emphasis in
Arabic. Proceedings of the 16th International Congress of Phonetic Sciences, Saarbrücken,
913-916.
Jones, M. & Llamas, C. (2008). Fricated realizations of /t/ in Dublin and Middles-
brough English: an acoustic analysis of plosive frication and surface fricative
contrasts. English Language and Linguistics 21(3), 419-443.
Kalikow, D. N., Stevens, K. N. & Elliott L.L. (1977). Development of a test of speech
intelligibility in noise using sentence materials with controlled word
predictability. Journal of the Acoustical Society of America 61(5), 1337–1352.
Kang, H., & Yoon, K. (2005). Tense and lax distinction of English [s] in intervocalic
position by Korean speakers: Consonant/vowel ratio as a possible universal
cue for consonant distinctions. Studies in Phonetics, Phonology and Morphology
11(3), 407-419.
Karouri, A. M. (1996). Phonetics of classical Arabic: A selectional study of the problematic sounds.
Khartoum University Press, Khartoum.
Kawasaki, H. (1982). An acoustic basis for universal constraints on sound sequences.
PhD dissertation, University of California, Berkeley.
Kawasaki, H. (1993). The phonetics of sound change. In Ch. Jones (Ed.), Historical
Linguistics: Problems and Perspectives. Longman, London.
Kaye, A. S. (1997). Arabic and its relationship to the other Semitic languages. In A.S.
Kaye (Ed.), Phonologies of Asia and Africa (including the Caucasus), Vol. 2.
Eisenbrauns, Winona Lake, IN, 188-204.
Kenstowicz, M. J. (1994). Phonology of generative grammar. Blackwell, Cambridge.
Trudgill, P., & Hannah, J. (2002). Guide to the variations of standard English. Oxford
University Press, New York.
Tsukada, K. (2009). An acoustic comparison of vowel length contrasts in Arabic,
Japanese and Thai: Durational and spectral data. International Journal on Asian
Language Processing, 19(4), 127-138.
Tucker, B. V. & Warner, N. (2010). What it means to be phonetic or phonological: The
case of Romanian devoiced nasals. Phonology 27, 289-324.
Van Bezooijen, R. & Van Heuven, V. J. (1997). Assessment of speech synthesis. In D.
Gibbon, R. Moore, R. Winski (Eds.), Handbook of standards and resources for
spoken language systems. Mouton de Gruyter, Berlin/New York, 481-653.
Van den Doel, R. (2006). How friendly are the natives? An evaluation of native-speaker judgments
of foreign-accented British and American English. LOT dissertation series nr. 144.
LOT, Utrecht.
Van Heuven, V. J. (1986). Some acoustic characteristics and perceptual consequences
of foreign accent in Dutch spoken by Turkish immigrant workers. In J. van
Oosten & J. F. Snapper (Eds.), Dutch Linguistics at Berkeley, papers presented at the
Dutch Linguistics Colloquium held at the University of California, Berkeley on November
9th, 1985, Berkeley: The Dutch Studies Program, U. C. Berkeley, 67-84.
Van Heuven, V. J. (2008). Making sense of strange sounds. (Mutual) intelligibility of
related language varieties. A Review. International Journal of Humanities and Arts
Computing 2, 39-62.
Van Heuven, V. J. & Wang, H. (2007). Quantifying the interlanguage speech
intelligibility benefit. Proceedings of the 16th International Congress of Phonetic Sciences,
Saarbrücken, 1729-1732.
Van Son, R. J. J. H. & Pols, L. C. W. (1999). An acoustic description of consonant
reduction. Speech Communication 28, 125-140.
Venkatagiri, H. S. & Levis, J. M. (2007). Phonological awareness and speech
comprehensibility: An exploratory study, Language Awareness 16(4), 263-277.
Walker, R. (2001). Pronunciation for international intelligibility. Karen’s linguistics
issues: Free resources for teacher and students of English. English Teaching
Professional Magazine 22, 1-4.
Wang, H. (2007). English as a Lingua Franca. Mutual intelligibility of Chinese, Dutch and
American speakers of English. LOT Dissertation series nr. 147, LOT, Utrecht.
Wang, H. & Van Heuven, V. J. (2003). Mutual intelligibility of Chinese, Dutch and
American speakers of English. In P. Fikkert & L. Cornips (Eds.), Linguistics in
the Netherlands 2003, Amsterdam/Philadelphia: John Benjamins, 213-224.
Wang, H. & Van Heuven, V. J. (2004). Cross-linguistic confusion of vowels produced
and perceived by Chinese, Dutch and American speakers of English. In L.
Cornips & J. Doetjes (Eds.), Linguistics in the Netherlands 2004. John Benjamins,
Amsterdam/Philadelphia, 205-216.
Wang, H. & Van Heuven, V. J. (2006). Acoustical analysis of English vowels produced
by Chinese, Dutch and American speakers. In J. M. van de Weijer & B. Los
(Eds.), Linguistics in the Netherlands 2006. John Benjamins, Amsterdam/Philadelphia,
237-248.
Watt, D. J. L., Docherty, G. J. & Foulkes, P. (2003). First accent acquisition: a study of
phonetic variation in child-directed speech. Proceedings of the 16th International
Congress of Phonetic Sciences, Saarbrücken, 1959-1962.
Watson, J. C. E. (2002). The phonology and morphology of Arabic. Oxford University Press.
Wells, J. C. (1962). A study of the formants of the pure vowels of British English. M.A.
thesis. University of London. Website 2/1/2002. Wells, Formants of pure
vowels: relative amplitude.
Wells, J. C. (1999). British English Pronunciation preferences: a changing scene. Journal
of the International Phonetic Association 29(1), 33-50.
Woods, A., Fletcher, P. & Hughes, A. (1986). Statistics in language studies. Cambridge
University Press, Cambridge.
Zwicker, E. (1961). Subdivision of the audible frequency range into critical bands
(Frequenzgruppen), Journal of the Acoustical Society of America 33, 248.
Summary (Samenvatting)
Communication by means of language takes place between two interactants, a speaker
and a listener. When the listener recognizes the words and understands what the speaker
says to him, the speech is intelligible and the communication successful. If a listener
does not understand the speaker, or understands the speaker poorly, the cause may lie
with either of the two interactants. This dissertation deals with poor intelligibility in
situations in which the speaker or the listener (or both) commands the language of
communication only as a second or foreign language. More specifically, this study deals
with intelligibility problems of Sudanese university students of English as a foreign
language (EFL) and with the linguistic causes behind these problems.
The research comprises (i) three auditory perception experiments, (ii) three production
experiments and (iii) two written questionnaires. The experiments focus on the
segmental intelligibility of speech produced in EFL with a Sudanese-Arabic accent.
A third source of information was a written questionnaire in which Sudanese EFL learners
and their instructors were asked for their opinions and value judgments. By means of
a series of questions in closed (multiple-choice) or open format, these questionnaires
aimed to obtain a picture of the subjective ideas about strengths and weaknesses in
the pronunciation and recognition of English sounds by Sudanese students of English, as
experienced by both the students themselves and their instructors. These subjective
data complement the objective research data from the laboratory experiments, yielding
a more complete view of the problem.
Chapter 1 describes the research plan. It offers an introduction to the topic, sets out
the objectives and formulates the research questions. This chapter also gives general
information on test methods, the design of the experiments, and the choice of
participants and test materials.
Chapter 6 contains an acoustic analysis of the English vowels as produced by 11 Sudanese
EFL students, with data from Deterding (1997), based on 10 native speakers of British
English (5 male and 5 female BBC radio broadcasters), serving as the reference
material. A list of all English vowels was recorded by the Sudanese speakers in a fixed
carrier sentence (Say … again). The native speakers had recorded the same English
vowels in isolated words. The vowel realizations were analysed acoustically, measuring
the resonance frequency F1 (a measure of the degree of mouth opening during vowel
articulation) and F2 (which corresponds to tongue position along the front-back
dimension) as well as vowel duration. The results show that the EFL vowels were often
articulated at incorrect positions in the vowel space but that the duration contrast
between tense and lax English vowels held up very well
214 TAJELDIN ALI: SPEECH INTELLIGIBILITY PROBLEMS OF SUDANESE EFL LEANERS
Chapter 10, finally, summarises the main findings of this study, draws
conclusions and makes recommendations for Sudanese educational practice and suggestions for
further research.
Summary
The primary function of language is social contact, which takes place between human
beings wherever they are. A person speaks to influence the actions of his/her fellows,
i.e. to involve them in interaction. In all situations of language use, there are two
major roles, which are played by the speech participants: speaker and hearer. Normally,
these two functional roles are present, either actually or implicitly, in every speech act.
When the speech participants achieve successful communication, i.e. when the hearer
understands what the speaker says, the speech act is described as intelligible. However,
when a speech participant fails to understand the speaker’s message, the speech is said
to be unintelligible. Failure to understand or produce intelligible speech has recently
been classified by linguists as speech intelligibility problems which may result from the
hearer’s or the speaker’s side or from both due to linguistic factors. Moreover, linguists
assume that most speech intelligibility problems occur between L1 and L2 speakers
coming from different language environments. This study attempts to investigate
speech intelligibility problems experienced by Sudanese university EFL learners and to
find experimental evidence on the nature and the linguistic causes of these problems.
The research comprised (i) three auditory perception experiments, (ii) three production
experiments and (iii) two paper-and-pencil questionnaires. The experiments target the
segmental intelligibility of speech produced in Sudanese-Arabic accented English.
In the perception tasks I used the Modified Rhyme Test (MRT) as a suitable instrument
for the measurement of segmental intelligibility (Logan, Greene and Pisoni 1989). The
test involves word identification tasks in a closed set of four alternatives, where the
listeners are asked to select the alternative they think the speaker intended. The score is
the number of correctly responded-to items. Test items target phonemes and multi-
phonemes. Phonemes refer to vowels and single consonants, whilst multi-phonemes
refer to consonant clusters. Word intelligibility, on the other hand, was determined on
the basis of final words embedded in short redundant sentences, which were taken
from the Speech Perception in Noise (SPIN) test (Kalikow, Stevens and Elliott 1977),
a test that has been used successfully in related research. The measure is based on the
recognition of 25 words embedded in meaningful sentences, in each of which one
contextually predictable keyword had to be recognised, e.g. Spread some butter on your bread
(where the sentence-final keyword is bread).
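The closed-set scoring described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the scoring procedure used in the study: the function names, the five sample items and the correction for guessing are assumptions introduced here. The one fixed point is that, with four alternatives per item, a listener who guesses blindly is expected to score 25% correct.

```python
def mrt_score(responses, targets):
    """Return the number of items where the listener chose the intended word."""
    if len(responses) != len(targets):
        raise ValueError("responses and targets must have the same length")
    return sum(r == t for r, t in zip(responses, targets))


def percent_correct(score, n_items):
    """Convert a raw score to percent correct."""
    return 100.0 * score / n_items


def chance_corrected(pct, n_alternatives=4):
    """Rescale percent correct so that pure guessing maps to 0 and perfect to 100.

    This correction for guessing is an assumption added for illustration;
    the summary itself reports uncorrected percent-correct scores.
    """
    chance = 100.0 / n_alternatives
    return 100.0 * (pct - chance) / (100.0 - chance)


# Hypothetical answer sheet: five items, four alternatives each.
targets = ["pat", "pet", "put", "peat", "net"]
responses = ["pat", "pit", "put", "pet", "net"]

score = mrt_score(responses, targets)
pct = percent_correct(score, len(targets))
print(score, pct, round(chance_corrected(pct), 1))
```

Keeping the raw score and the chance-corrected value separate makes it easier to compare tests with different numbers of alternatives, should that be needed.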
The first perception test aims at testing how well Sudanese university EFL listeners
identify sounds and recognise words produced by native speakers of English (chapter
3). The second experiment compares the intelligibility of the Sudanese EFL learners
and native speakers of RP English using Dutch university students as non-native
listeners (chapter 4). The third experiment tests the intelligibility of Sudanese EFL
learners and native speakers of RP English for both British and American listeners
(chapter 5).
Three speech production experiments were carried out in order to measure the acoustic
correlates of vowels (chapter 6), consonants (chapter 7) and consonant clusters (chapter
Chapter 1 addresses the research plan. It introduces the topic area of the study, sets
out the goals and formulates the research questions. Chapter 1 also provides
information about the testing methods and experimental design, the subjects and the test
materials.
Chapter 3 presents the identification of English vowels, consonants, clusters and
words embedded in SPIN sentences. The materials were spoken by a representative
native speaker of Standard British English (Received Pronunciation, or RP). The
listeners were a group of 10 Sudanese university EFL students. The test
material includes a list of monosyllabic English words targeting vowels, consonants
and clusters, read in a fixed carrier phrase (Say … again). The results reveal serious
problems experienced by the students in the perceptual identification of English speech
sounds and in word recognition. Sudanese EFL listeners misidentified vowels (48%
correct) more often than consonants (85% correct) and clusters (73% correct). Only 30
percent of the words in SPIN sentences were recognised correctly. The error analysis
shows that the intelligibility problems are due to transfer of L1 norms and to
insufficient knowledge of the sound structure of the L2.
In Chapter 5 the same materials that were used in Chapter 4 were presented to 20
native listeners of English (10 British, 10 American). The data reveal that the Sudanese
EFL speaker is less intelligible for the target listeners than the native RP speaker.
British and American listeners show no serious perception problems with English
speech sounds produced by the native speakers, with scores of 92% (vowels), 99%
(consonants), 97% (clusters) and 94% (words). The corresponding percentages for the
non-native speaker were 66, 85, 86 and 67.
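The size of the intelligibility gap for each test component follows directly from the percentages just quoted. The short Python sketch below works it out; the dictionary and function names are illustrative choices, while the eight percentages are the ones reported in the paragraph above.

```python
# Percent-correct scores reported for the native listeners in Chapter 5.
native = {"vowels": 92, "consonants": 99, "clusters": 97, "words": 94}
non_native = {"vowels": 66, "consonants": 85, "clusters": 86, "words": 67}


def intelligibility_gap(native_scores, non_native_scores):
    """Return native minus non-native percent correct for each test component."""
    return {k: native_scores[k] - non_native_scores[k] for k in native_scores}


gaps = intelligibility_gap(native, non_native)

# List the components from the largest gap to the smallest.
for component, gap in sorted(gaps.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{component:<11} {gap:>3} percentage points")
```

On these figures the gap is largest for words and vowels (27 and 26 percentage points) and smallest for consonants and clusters (14 and 11 points).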
Chapter 7 focuses on the acoustic analysis of English consonants produced by the same
11 Sudanese university EFL learners that were studied in Chapter 6. Stimuli comprised
a list of monosyllabic CVC words embedded in a fixed carrier phrase (Say … again). All
onset and coda consonants of English were included in the test materials. The
consonant tokens were acoustically analysed in terms of Voice Onset Time (VOT),
preceding vowel duration, consonant duration, peak intensity as well as centre of
gravity (COG) and Spectral Standard Deviation. The results were compared with
literature data on the same acoustic parameters published for (either British or
American) native English. The findings show considerable discrepancies in the acoustic
parameter values obtained from native and non-native speakers. The Sudanese speakers
produce systematically different VOT and COG values, due to influence of their L1
sound system.
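For readers unfamiliar with the two spectral measures, the following Python sketch shows how a centre of gravity and a spectral standard deviation can be computed from a spectrum. It assumes the conventional definition (the power-weighted mean frequency and the power-weighted spread around it); the summary does not state the exact formula or software settings used in the study, and the function name and the three-bin toy spectrum are hypothetical.

```python
import math


def spectral_moments(freqs_hz, power):
    """Return (centre of gravity, spectral standard deviation) in Hz.

    freqs_hz: frequency of each spectral bin.
    power:    non-negative power in each bin (same length as freqs_hz).
    Power weighting is an assumption made for this illustration.
    """
    if len(freqs_hz) != len(power):
        raise ValueError("freqs_hz and power must have the same length")
    total = sum(power)
    if total <= 0:
        raise ValueError("spectrum carries no energy")
    # First moment: power-weighted mean frequency.
    cog = sum(f * p for f, p in zip(freqs_hz, power)) / total
    # Second central moment: power-weighted variance around the COG.
    variance = sum(p * (f - cog) ** 2 for f, p in zip(freqs_hz, power)) / total
    return cog, math.sqrt(variance)


# Toy three-bin spectrum (hypothetical values, not data from the study).
cog, sd = spectral_moments([1000.0, 2000.0, 3000.0], [1.0, 2.0, 1.0])
print(round(cog), round(sd, 1))
```

A higher COG indicates that energy is concentrated at higher frequencies, which is why the measure is useful for distinguishing obstruents.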
Chapter 8 deals with an acoustic analysis of English consonant clusters, which were
read by eleven Sudanese EFL learners, and by two native speakers of RP English (one
male, one female) serving as control speakers. A selection of onset and coda clusters in
meaningful English words was read in a fixed carrier phrase by both groups of speakers.
The durations of the consonants that made up the clusters were measured. Statistical
analysis reveals systematic deviations in the component durations between native and
non-native tokens, which are attributable to the influence of the learners’ L1. Counter
to expectation, however, no epenthetic vowels breaking up the clusters were found in
the recordings.
Chapter 10, finally, summarises the main findings of this study, draws conclusions and
makes recommendations and suggestions for teaching practice and future research.
Appendices
Appendix 3.1 Vowel list: /hVd/ meaningful words in a fixed carrier phrase (Say … again); 19
different full vowels and diphthongs read by Sudanese EFL learners and native speakers of RP
English. The stimuli were used in the perception tests in chapters 3, 4 and 5 as well as in the
acoustic analysis in chapter 6.
Appendix 3.2.a Onset consonants list of meaningful words in a fixed carrier (Say … again). The
stimuli were read by one Sudanese EFL learner and one native speaker of RP English. The
stimuli were used in the perception tests in chapters 3, 4 and 5 as well as in the acoustic analysis
in chapter 7.
No.  Onset consonants  Keywords
1    got               god, ghost
2    bang              ban, bark
3    shut              ship, shop
4    pin               pit, pill
5    fit               fish, fill
6    then              this, them
7    thaw              theme, thin
8    zeal              zero, zebra
9    den               dish, deaf
10   sip               sit, sick
11   job               jot, jog
12   vest              vent, verb
13   tame              take, tale
14   cold              core, cop
15   chat              chair, charge
Appendix 3.2.b Coda consonants list of meaningful words in a fixed carrier (Say … again). The
stimuli were read by one Sudanese EFL learner and one native speaker of RP English. The
stimuli were used in the perception tests in chapters 3, 4 and 5 as well as in the acoustic analysis
in chapter 7.
Appendix 3.3 Onset and coda consonant clusters list of meaningful words in a fixed carrier
(Say … again). The stimuli were read by one Sudanese EFL learner and one native speaker of RP
English. The stimuli were used in the perception tests in chapters 3, 4 and 5 as well as in the
acoustic analysis in chapter 8.
Appendix 3.4 SPIN (Speech Perception in Noise) sentence intelligibility test. Only contextually highly
predictable keywords were used. Keywords are always sentence final.
Appendix 3.5 Instructions and answer sheet of the identification test of English vowels read by
native speakers of RP English responded to by Sudanese EFL listeners
Instructions
You will hear 20 English-spoken items on the CD. Every item contains the same short
utterance “Say xxx again”, where xxx is a one-syllable word. Each time you hear an
item, decide which one of the four possibilities listed under A-B-C-D is the one that
was said. To indicate your choice, tick the appropriate box on your answer sheet.
Remember that you have to make a choice for every word you hear, one choice, no
more, no less. If you do not know what to answer, just gamble.
After you hear an item, you have five seconds to place your tick mark. To help you
keep track, you will hear a beep after every tenth item on the CD.
     A.         B.         C.         D.
a.   ☐ net      ☐ nut      ☐ not      ☐ nit
b.   ☐ boy      ☐ buy      ☐ bay      ☐ bow
     A.         B.         C.         D.
1.   ☐ pat      ☐ putt     ☐ pot      ☐ put
2.   ☐ pet      ☐ put      ☐ pit      ☐ pat
3.   ☐ put      ☐ pet      ☐ pat      ☐ pot
4.   ☐ peat     ☐ pat      ☐ pet      ☐ pit
5.   ☐ net      ☐ nut      ☐ not      ☐ nit
6.   ☐ fill     ☐ fool     ☐ fell     ☐ full
7.   ☐ fool     ☐ full     ☐ fill     ☐ fell
8.   ☐ pit      ☐ peat     ☐ pet      ☐ put
9.   ☐ bard     ☐ board    ☐ bird     ☐ beard
10.  ☐ board    ☐ bird     ☐ beard    ☐ bard
Appendix 3.6 Answer sheet of the identification test of English consonants read by native
speakers of RP English responded to by Sudanese EFL learners
Instructions
You will hear 50 English-spoken items on the CD. Every item contains the same short
utterance “Say xxx again”, where xxx is a one-syllable word. Each time you hear an
item, decide which one of the four possibilities listed under A-B-C-D is the one that
was said. To indicate your choice, tick the appropriate box on your answer sheet.
Remember that you have to make a choice for every word you hear, one choice, no
more, no less. If you do not know what to answer, just gamble.
After you hear an item, you have five seconds to place your tick mark. To help you
keep track, you will hear a beep after every tenth item on the CD.
     A.         B.         C.         D.              A.         B.         C.         D.
a.   ☐ sap      ☐ sack     ☐ sat      ☐ sag      f.   ☐ hit      ☐ lit      ☐ bit      ☐ wit
b.   ☐ match    ☐ man      ☐ mash     ☐ mat      g.   ☐ pot      ☐ cot      ☐ jot      ☐ got
c.   ☐ pale     ☐ page     ☐ pane     ☐ pave     h.   ☐ must     ☐ bust     ☐ gust     ☐ dust
d.   ☐ cog      ☐ cop      ☐ con      ☐ cock     i.   ☐ rob      ☐ cob      ☐ job      ☐ bob
e.   ☐ heat     ☐ heave    ☐ heath    ☐ he’s     j.   ☐ bed      ☐ led      ☐ wed      ☐ red
If everything is clear, we will now start the test items proper. Turn the page over for the
answer sheet for the consonant test.
     A.         B.         C.         D.              A.         B.         C.         D.
1.   ☐ sap      ☐ sack     ☐ sat      ☐ sag      31.  ☐ hit      ☐ lit      ☐ bit      ☐ wit
2.   ☐ match    ☐ man      ☐ mash     ☐ mat      32.  ☐ pot      ☐ cot      ☐ jot      ☐ got
3.   ☐ pale     ☐ page     ☐ pane     ☐ pave     33.  ☐ must     ☐ bust     ☐ gust     ☐ dust
4.   ☐ cog      ☐ cop      ☐ con      ☐ cock     34.  ☐ rob      ☐ cob      ☐ job      ☐ bob
5.   ☐ heat     ☐ heave    ☐ he’s     ☐ heath    35.  ☐ bed      ☐ led      ☐ wed      ☐ red
6.   ☐ bad      ☐ ban      ☐ bang     ☐ bad      36.  ☐ nut      ☐ but      ☐ gut      ☐ shut
7.   ☐ can      ☐ cap      ☐ cam      ☐ cab      37.  ☐ tin      ☐ fin      ☐ pin      ☐ chin
8.   ☐ sap      ☐ sad      ☐ sag      ☐ san      38.  ☐ hid      ☐ lid      ☐ bid      ☐ fid
9.   ☐ pad      ☐ pan      ☐ pack     ☐ pat      39.  ☐ tot      ☐ pot      ☐ lot      ☐ not
10.  ☐ sap      ☐ sack     ☐ sat      ☐ sag      40.  ☐ peel     ☐ zeal     ☐ feel     ☐ seal
11.  ☐ pun      ☐ put      ☐ pub      ☐ puck     41.  ☐ thaw     ☐ law      ☐ paw      ☐ saw
12.  ☐ cog      ☐ cop      ☐ con      ☐ cock     42.  ☐ pen      ☐ ten      ☐ den      ☐ then
13.  ☐ mash     ☐ mad      ☐ mat      ☐ match    43.  ☐ fen      ☐ pen      ☐ yen      ☐ hen
14.  ☐ cop      ☐ cod      ☐ con      ☐ cock     44.  ☐ fat      ☐ hat      ☐ bat      ☐ chat
15.  ☐ lane     ☐ lace     ☐ late     ☐ lame     45.  ☐ went     ☐ bent     ☐ rent     ☐ dent
16.  ☐ dad      ☐ dam      ☐ dan      ☐ dab      46.  ☐ rip      ☐ dip      ☐ tip      ☐ sip
17.  ☐ save     ☐ sage     ☐ sane     ☐ safe     47.  ☐ wick     ☐ pick     ☐ tick     ☐ lick
18.  ☐ rate     ☐ race     ☐ raze     ☐ rape     48.  ☐ fang     ☐ bang     ☐ gang     ☐ rang
19.  ☐ match    ☐ man      ☐ mash     ☐ mat      49.  ☐ den      ☐ ten      ☐ men      ☐ pen
20.  ☐ heat     ☐ heave    ☐ heath    ☐ he’s     50.  ☐ name     ☐ game     ☐ tame     ☐ dame
Appendix 3.7 Instructions and answer sheet of the identification test of English consonant
clusters read by native speakers of RP English and responded to by Sudanese EFL learners.
Instructions
You will hear 20 English-spoken items on the CD. Every item contains the same short
utterance “Say xxx again”, where xxx is a one-syllable word. Each time you hear an
item, decide which one of the four possibilities listed under A-B-C-D is the one that
was said. To indicate your choice, tick the appropriate box on your answer sheet.
Remember that you have to make a choice for every word you hear, one choice, no
more, no less. If you do not know what to answer, just gamble.
After you hear an item, you have five seconds to place your tick mark. To help you
keep track, you will hear a beep after every tenth item on the CD.
     A.         B.         C.         D.
a.   ☐ film     ☐ fist     ☐ fills    ☐ filth
b.   ☐ pry      ☐ fry      ☐ cry      ☐ fly
     A.         B.         C.         D.
1.   ☐ blaze    ☐ craze    ☐ glaze    ☐ graze
2.   ☐ sty      ☐ spy      ☐ sky      ☐ sly
3.   ☐ sprint   ☐ splint   ☐ squint   ☐ print
4.   ☐ smack    ☐ snack    ☐ slack    ☐ stack
5.   ☐ pry      ☐ try      ☐ dry      ☐ ply
6.   ☐ brain    ☐ drain    ☐ grain    ☐ crane
7.   ☐ blaze    ☐ craze    ☐ glaze    ☐ graze
8.   ☐ swine    ☐ shrine   ☐ twine    ☐ spine
9.   ☐ queen    ☐ clean    ☐ green    ☐ glean
10.  ☐ sprint   ☐ splint   ☐ squint   ☐ print
11.  ☐ buzzed   ☐ bugs     ☐ bulb     ☐ bussed
12.  ☐ filth    ☐ film     ☐ filled   ☐ fibbed
13.  ☐ lilt     ☐ limp     ☐ lint     ☐ link
14.  ☐ else     ☐ elk      ☐ elf      ☐ elm
15.  ☐ putts    ☐ puns     ☐ pulse    ☐ punt
16.  ☐ mask     ☐ marched  ☐ marked   ☐ mast
17.  ☐ butts    ☐ buds     ☐ buns     ☐ bums
18.  ☐ buzzed   ☐ bugs     ☐ bulb     ☐ bussed
19.  ☐ winch    ☐ wins     ☐ wind     ☐ wink
20.  ☐ wits     ☐ wimp     ☐ width    ☐ wisp
Appendix 3.8 Instructions and answer sheet of the SPIN word recognition test. Sentences were
read by a native speaker of RP English and responded to by Sudanese EFL learners.
Instructions
You will now hear 25 sentences on the CD. Each time you hear a sentence, write down
only the last word you think you have heard. This time there will be no practice items.
Please write down a single word for every sentence you hear. After every sentence you
will have five seconds to write down a word. There will be a beep after every tenth
sentence.
Note: the same stimuli were used for the identification tests of English vowels, consonants,
consonant clusters and SPIN sentences in chapters 3, 4 and 5.
Appendix 4.1 Instruction and answer sheets of the identification test of English vowels read by
one Sudanese speaker and one native speaker of RP English. Test responded to by Dutch
listeners of English.
Instructions
You will hear 20 English-spoken items twice [in two texts A and B] on the CD. The A
texts will be pronounced by a non-native speaker of English, but the B texts by a native
speaker. Note that the sound quality of the recording has been degraded by the addition
of noise. This was done to make the listening task more difficult. Every item contains
the same short utterance “Say xxx again”, where xxx is a one-syllable word. Each time
you hear an item, decide which one of the four possibilities listed under A-B-C-D is the
one that was said. To indicate your choice, tick the appropriate box on your answer
sheet.
Note that pronunciation errors may occur in the vowel or the consonants: only check
the sounds that are printed in bold face. Remember that you have to make a choice for
every word you hear, one choice, no more, no less. If you do not know what to answer,
just gamble.
After you hear an item, you have five seconds to place your tick mark. You may also
use part of this time to look at the response alternatives of the next item. To help you
keep track, you will hear a beep after every fifth item on the CD.
     A.         B.         C.         D.
a.   ☐ net      ☐ nut      ☐ not      ☐ nit
b.   ☐ boy      ☐ buy      ☐ bay      ☐ bow
If everything is clear, we will now start the test items proper. Turn the page over for the
answer sheet for the vowel test.
     A.         B.         C.         D.
1.   ☐ board    ☐ bird     ☐ beard    ☐ bard
2.   ☐ beard    ☐ bard     ☐ bird     ☐ board
3.   ☐ boy      ☐ buy      ☐ bay      ☐ bow
4.   ☐ male     ☐ mile     ☐ mill     ☐ meal
5.   ☐ let      ☐ lit      ☐ late     ☐ light
Appendix 4.2 Instructions and answer sheets of the identification test of English consonants
read by Sudanese speakers and native speakers of RP English. The test was responded to by
Dutch listeners of English.
Instructions
You will hear 40 English-spoken items twice [in two texts A and B] on the CD. The A
texts will be pronounced by a non-native speaker of English, but the B texts by a native
speaker. Note that the sound quality of the recording has been degraded by the addition
of noise. This was done to make the listening task more difficult. Every item contains
the same short utterance “Say xxx again”, where xxx is a one-syllable word. Each time
you hear an item, decide which one of the four possibilities listed under A-B-C-D is the
one that was said. To indicate your choice, tick the appropriate box on your answer
sheet. Note that pronunciation errors may occur in the vowel or the consonants: only
check the sounds that are printed in bold face. Remember that you have to make a
choice for every word you hear, one choice, no more, no less. If you do not know what
to answer, just gamble.
After you hear an item, you have five seconds to place your tick mark. You may also
use part of this time to look at the response alternatives of the next item. To help you
keep track, you will hear a beep after every fifth item on the CD.
     A.         B.         C.         D.
a.   ☐ sap      ☐ sack     ☐ sat      ☐ sag
b.   ☐ match    ☐ man      ☐ mash     ☐ mat
If everything is clear, we will now start the test items proper. Turn the page over for the
answer sheet for the consonant test.
     A.         B.         C.         D.              A.         B.         C.         D.
11.  ☐ peel     ☐ zeal     ☐ feel     ☐ seal     31.  ☐ pale     ☐ page     ☐ pane     ☐ pave
12.  ☐ tot      ☐ pot      ☐ lot      ☐ not      32.  ☐ heat     ☐ heave    ☐ heath    ☐ he’s
13.  ☐ west     ☐ vest     ☐ nest     ☐ best     33.  ☐ match    ☐ man      ☐ mash     ☐ mat
14.  ☐ tin      ☐ fin      ☐ pin      ☐ chin     34.  ☐ rave     ☐ race     ☐ raze     ☐ rape
15.  ☐ nut      ☐ but      ☐ gut      ☐ shut     35.  ☐ save     ☐ sage     ☐ sane     ☐ safe
16.  ☐ rob      ☐ cob      ☐ job      ☐ bob      36.  ☐ dad      ☐ dam      ☐ dan      ☐ dab
17.  ☐ must     ☐ bust     ☐ gust     ☐ dust     37.  ☐ lace     ☐ lane     ☐ late     ☐ lame
18.  ☐ pot      ☐ cot      ☐ jot      ☐ got      38.  ☐ sap      ☐ sack     ☐ sat      ☐ sag
19.  ☐ hit      ☐ lit      ☐ bit      ☐ wit      39.  ☐ mash     ☐ mad      ☐ mat      ☐ match
20.  ☐ hold     ☐ cold     ☐ told     ☐ gold     40.  ☐ pun      ☐ put      ☐ pub      ☐ puck
     A.         B.         C.         D.              A.         B.         C.         D.
1.   ☐ sap      ☐ sack     ☐ sat      ☐ sag      21.  ☐ hit      ☐ lit      ☐ bit      ☐ wit
2.   ☐ match    ☐ man      ☐ mash     ☐ mat      22.  ☐ pot      ☐ cot      ☐ jot      ☐ got
3.   ☐ pale     ☐ page     ☐ pane     ☐ pave     23.  ☐ must     ☐ bust     ☐ gust     ☐ dust
4.   ☐ pane     ☐ pale     ☐ page     ☐ pave     24.  ☐ fang     ☐ bang     ☐ gang     ☐ rang
5.   ☐ heat     ☐ heave    ☐ he’s     ☐ heath    25.  ☐ bed      ☐ led      ☐ wed      ☐ red
6.   ☐ bad      ☐ ban      ☐ bang     ☐ bad      26.  ☐ nut      ☐ but      ☐ gut      ☐ shut
7.   ☐ can      ☐ cap      ☐ cam      ☐ cab      27.  ☐ tin      ☐ fin      ☐ pin      ☐ chin
8.   ☐ sap      ☐ sad      ☐ sag      ☐ san      28.  ☐ hit      ☐ lit      ☐ bit      ☐ fit
9.   ☐ pad      ☐ pan      ☐ pack     ☐ pat      29.  ☐ tot      ☐ pot      ☐ lot      ☐ not
10.  ☐ save     ☐ sage     ☐ sane     ☐ safe     30.  ☐ then     ☐ ten      ☐ zen      ☐ den
11.  ☐ pun      ☐ put      ☐ pub      ☐ puck     31.  ☐ thaw     ☐ law      ☐ paw      ☐ saw
12.  ☐ raze     ☐ rate     ☐ rave     ☐ rape     32.  ☐ zen      ☐ ten      ☐ den      ☐ then
13.  ☐ mat      ☐ match    ☐ man      ☐ mash     33.  ☐ peel     ☐ zeal     ☐ feel     ☐ seal
14.  ☐ cop      ☐ cod      ☐ con      ☐ cock     34.  ☐ den      ☐ ten      ☐ men      ☐ pen
15.  ☐ lane     ☐ lace     ☐ late     ☐ lame     35.  ☐ went     ☐ bent     ☐ rent     ☐ dent
16.  ☐ dad      ☐ dam      ☐ dan      ☐ dab      36.  ☐ rip      ☐ dip      ☐ tip      ☐ sip
17.  ☐ rate     ☐ race     ☐ raze     ☐ rape     37.  ☐ wits     ☐ sits     ☐ pits     ☐ fits
18.  ☐ sap      ☐ sat      ☐ sag      ☐ sad      38.  ☐ rob      ☐ cob      ☐ bob      ☐ job
19.  ☐ wit      ☐ with     ☐ wick     ☐ whiz     39.  ☐ west     ☐ vest     ☐ nest     ☐ best
20.  ☐ save     ☐ sage     ☐ sane     ☐ safe     40.  ☐ hold     ☐ cold     ☐ told     ☐ gold
Appendix 4.3 Answer sheets of the identification test of English consonant clusters read by
Sudanese speakers and native speakers of RP English. The test was responded to by Dutch
listeners of English.
Instructions
You will hear 20 English-spoken items twice [in two texts A and B] on the CD. The A
texts will be pronounced by a non-native speaker of English, but the B texts by a native
speaker. Note that the sound quality of the recording has been degraded by the addition
of noise. This was done to make the listening task more difficult. Every item contains
the same short utterance “Say xxx again”, where xxx is a one-syllable word. Each time
you hear an item, decide which one of the four possibilities listed under A-B-C-D is the
one that was said. To indicate your choice, tick the appropriate box on your answer
sheet. Note that pronunciation errors may occur in the vowel or the consonants: only
check the sounds that are printed in bold face. Remember that you have to make a
choice for every word you hear, one choice, no more, no less. If you do not know what
to answer, just gamble.
After you hear an item, you have five seconds to place your tick mark. You may also
use part of this time to look at the response alternatives of the next item. To help you
keep track, you will hear a beep after every fifth item on the CD.
     A.         B.         C.         D.
a.   ☐ film     ☐ fist     ☐ fills    ☐ filth
b.   ☐ pry      ☐ fry      ☐ cry      ☐ fly
If everything is clear, we will now start the test items proper. Turn the page over for
the answer sheet for the cluster test.
     A.         B.         C.         D.
1.   ☐ wits     ☐ wimp     ☐ width    ☐ wisp
2.   ☐ buzzed   ☐ bugs     ☐ bulb     ☐ bussed
3.   ☐ filth    ☐ film     ☐ filled   ☐ fibbed
4.   ☐ lilt     ☐ limp     ☐ lint     ☐ link
5.   ☐ else     ☐ elk      ☐ elf      ☐ elm
6.   ☐ putts    ☐ puns     ☐ pulse    ☐ puffs
7.   ☐ winch    ☐ wins     ☐ wind     ☐ wink
8.   ☐ mask     ☐ marched  ☐ marked   ☐ mast
9.   ☐ butts    ☐ buds     ☐ buns     ☐ bums
10.  ☐ buzzed   ☐ bugs     ☐ bulb     ☐ bussed
     A.         B.         C.         D.
1.   ☐ blaze    ☐ craze    ☐ glaze    ☐ graze
2.   ☐ ply      ☐ dry      ☐ fly      ☐ cry
3.   ☐ sprint   ☐ splint   ☐ squint   ☐ print
4.   ☐ smack    ☐ snack    ☐ slack    ☐ stack
5.   ☐ pry      ☐ try      ☐ dry      ☐ ply
6.   ☐ brain    ☐ drain    ☐ grain    ☐ crane
7.   ☐ blaze    ☐ craze    ☐ glaze    ☐ graze
8.   ☐ swine    ☐ shrine   ☐ twine    ☐ spine
9.   ☐ queen    ☐ clean    ☐ green    ☐ glean
10.  ☐ sprint   ☐ splint   ☐ squint   ☐ print
11.  ☐ buzzed   ☐ bugs     ☐ bulb     ☐ bussed
12.  ☐ filth    ☐ film     ☐ filled   ☐ fibbed
13.  ☐ lilt     ☐ limp     ☐ lint     ☐ link
14.  ☐ else     ☐ elk      ☐ elf      ☐ elm
15.  ☐ putts    ☐ puns     ☐ pulse    ☐ puffs
16.  ☐ mask     ☐ marched  ☐ marked   ☐ mast
17.  ☐ butts    ☐ buds     ☐ buns     ☐ bums
18.  ☐ buzzed   ☐ bugs     ☐ bulb     ☐ bussed
19.  ☐ winch    ☐ wins     ☐ wind     ☐ wink
Appendix 4.4 Answer sheets of the SPIN test of English sentences, which were read by Sudanese
speakers and native speakers of RP English. The test was responded to by Dutch listeners of
English.
You will now hear 25 sentences twice [in two texts A and B] on the CD. Each time you
hear a sentence, write down only the last word you think you have heard. This time
there will be no practice items. Please write down a single word for every sentence you
hear. After every sentence, you will have five seconds to write down a word. There will
be a beep after every fifth sentence.
Appendix 6.1 English vowel durations (ms) of eleven Sudanese university learners of English.
Missing data are indicated by ‘---’.
No.  Vowel  Speaker no.
              1    2    3    4    5    6    7    8    9   10   11
1.   ɑː     252  207  174  133  293  160  356  215  130  232  170
2.   æ      258  148  149   98  204  137  190  189   99   69  113
3.   ɑʊ     415  200  241  245  247  280  366  256  272  161  190
4.   aɪ     207  141  197  174  261  188  353  165  158  177  164
5.   e      200   42   59   38  112   78  103   66   53   89   59
6.   eə     278  433  269  166  279  247  318  217  246  210  211
7.   eɪ     237  145  181  114  221  179  302  163  181  241  180
8.   ɪ       92   42   61   58   67   58   68   51   32   61   47
9.   iː     191  148  160   93  158  113  180   55   92  217   86
10.  ɪə     248  280  241  142  256  212  277  202  144  201  139
11.  ɔ      156  110   58   47  113  213  312   65   68  128   82
12.  ɔː     264  188  186  147  265  158  318  191  128  178  163
13.  ow     262  137  170  145  125  232  346  133  118  266  113
14.  ɔɪ     363  226  244  241  450  181  666  227  261  210  202
15.  ʊ      137   56   82   92   88   92   92   75   72  103   78
16.  uː     138   84  154  115  187  134  337  150  125  186  114
17.  ʊə     234  129  163  165  203  203  197  164  238  ---  226
18.  ʌ       77   91   81   68   74   91   94  ---   77   99   66
19.  ɜː     252  244  159  140  265  179  313  123  ---   46   90
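Per-vowel means can be derived from the table above once the missing cells are left out. The sketch below is a minimal illustration in Python, not the analysis script used for the thesis: the helper names are hypothetical, and the two example rows are rows 8 and 18 of the table, copied as printed.

```python
def parse_row(text):
    """Parse one row of durations; '---' marks a missing measurement."""
    return [None if tok == "---" else int(tok) for tok in text.split()]


def mean_duration(values):
    """Mean over the measurements that are present (missing ones are skipped)."""
    present = [v for v in values if v is not None]
    if not present:
        raise ValueError("no measurements available")
    return sum(present) / len(present)


# Rows 8 and 18 of Appendix 6.1 (eleven speakers each, durations in ms).
row_8 = parse_row("92 42 61 58 67 58 68 51 32 61 47")
row_18 = parse_row("77 91 81 68 74 91 94 --- 77 99 66")

print(round(mean_duration(row_8), 1), round(mean_duration(row_18), 1))
```

Skipping missing cells, rather than treating them as zero, keeps a single absent token from pulling a vowel's mean duration down.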
Appendix 6.2 Mean absolute duration (ms) of RP English vowels (abstracted from Wells 1962;
see further http://www.phon.ucl.ac.uk/home/wells/formants/table-7-uni.htm).
Appendix 7.1 Individual speaker VOT mean values for English stops produced by Sudanese
speakers.
Appendix 7.2 Mean Centre of Gravity (COG) values (Hz) of the English obstruents produced
by Sudanese learners. The top panel shows COG values for voiced obstruents; voiceless
obstruents are shown in the bottom panel. The EFL values differ substantially from those
obtained from native English speakers (not shown here).
Appendix 7.3 Duration (ms) of preceding vowels produced by Sudanese learners of English.
Native data from Kent, Dembowski and Lass (1996).
Appendix 7.4 Mean consonant duration (ms) of Sudanese EFL learners and native speakers of
English. Consonant duration data of native speakers cited from Lavoie (2001).
Appendix 7.5 Relative intensity (decibels) of English consonants produced by Sudanese learners
and by native speakers (native data from Ball and Rahilly 1999).
1- Students’ questionnaire
Dear students
This questionnaire is designed to provide data about speech intelligibility problems
experienced by Sudanese university learners of English. It attempts to establish the effect of
both the mother tongue (Arabic) and the lack of L2 pronunciation knowledge on the
learners concerned.
Section [1]: Preliminary information. Please reply to the issues below:
Section [2] Please choose the most applicable answer from the following:
1- How well do you understand spoken English?
a- weak b- fair c- good d- very good e- excellent
2- How well do native speakers of English understand you?
a- weak b- fair c- good d- very good e- excellent
3- How interesting and practical are the courses you study?
a- not b- hardly c- average d- very e- maximally
4- How relevant and authentic are the courses you study, with respect to the
development of pronunciation skills?
a- not b- hardly c- average d- very e- maximally
5- How often do you have problems with the pronunciation of English sounds?
a- never b- rarely c- often d- frequently e- permanently
6- Which English sounds do you find difficult?
a- vowels b- consonants c- clusters d- … and … e- all
Read the statements BELOW and then choose one option from the ones below:
II- Explain why you have chosen a, b, c, d or e above?
a- I identify short vowel sounds.
b- I partly grasp them, because I often misidentify the short vowel /æ/ as /ɑː/.
c- I never recognise short vowels. I find it difficult.
III- Can you give further details of these difficulties?
…………………………………………………………………………………………
…………………………………………………………………………………………
(more response space available in student’s copy)
D- Long vowels
1- I ……… experience problems in correctly perceiving long vowels like: /uː, ɔː, iː/, etc.,
in words such as calm, warm, worm, glue, choose, hoop, beat, boot, bough, leap, lead, read
a- never b- rarely c- often d- frequently e- permanently
2- Explain why you have chosen a, b, c, or e above?
a- I hear them well and identify them. b- I confuse / fail to discriminate.
c- I often confuse some long vowels. d- I totally fail to identify them.
3- Please can you give example(s) of the errors you commit? E.g., I fail to discriminate
between /ʊ/ and /uː/ in minimal pairs like food/boot, full/fool, pull/pool, etc.
…………………………………………………………………………………………
…………………………………………………………………………………………
(more response space available in student’s copy)
E- Diphthongs
1- I ……… experience problems in perceiving diphthongs like: /aɪ, eə, ɪə, ɔɪ, aʊ, eɪ/, etc.,
in words such as: fire/fear, bare/bowl, cow/power
a- never b- rarely c- often d- frequently e- permanently
2- Explain why you have chosen a, b, c, d or e given above.
a- I identify such sounds. b- I partly grasp such sounds.
c- I often fail to discriminate sounds. d- I do not realize them at all.
3- Give example(s) of the errors you commit, e.g.: I fail to discriminate between /əʊ/ in
boat and /ɔː/ in taught, caught.
…………………………………………………………………………………………
…………………………………………………………………………………………
(more response space available in student’s copy)
A- Consonant sounds
How successful are you in perceiving the following English consonants?
1- Plosives like: /p, b, t, d, k, g/
a- weak b- fair c- good d- very good e- excellent
2- Fricatives like: /f, v, s, z, θ, ð/ or affricates like /tʃ, dʒ/.
a- weak b- fair c- good d- very good e- excellent
3- Nasals like: /m, ŋ, n/
a- weak b- fair c- good d- very good e- excellent
4- Approximants like /l, r, w, j/.
a- weak b- fair c- good d- very good e- excellent
B- Clusters
How often do you experience difficulty perceiving the clusters below, whenever you
hear them?
1- Initial clusters like prompt, play, scream, string, spring and sword.
a- never b- rarely c- often d- frequently e- permanently
2- Final clusters in words such as: ground, interrupt, risk, next, blink
a- never b- rarely c- often d- frequently e- permanently
3- What state do you find yourself in whenever you are exposed to speech including
this type of English clusters?
a- I hear consonant clusters well and understand them. For example, I hear
play as play and alert as alert, etc.
b- I substitute consonant clusters, because I often hear clusters like /pl/ as
/bl/ or /br/ and /fl/ as /fr/.
c- I never understand them at all. I find it difficult to discriminate them.
B- Long vowels
1- I ………… experience problems in producing long vowels like /ɔː, iː, ɜː/
a- never b- rarely c- often d- frequently e- permanently
2- Explain why you have chosen a, b, c or d above.
a- I produce the English long vowels correctly.
b- I can produce some of them correctly.
c- I often find it difficult to produce long vowels.
d- I cannot produce them at all.
3- Would you please give example(s) of the errors you make? E.g., I fail to discriminate
between /ʊ/ and /uː/ in pairs like foot/food, look/Luke, full/fool, book/boo
…………………………………………………………………………………………
…………………………………………………………………………………………
(more response space available in student’s copy)
C- Diphthongs
1- I ……… experience problems producing diphthongs like /aɪ, eə, ɪə, ɔɪ, aʊ, eɪ/, etc., in
words like: bye, boy, hear.
a- never b- rarely c- often d- frequently e- permanently
2- Explain why you have chosen a, b, c or d above.
1- I produce them well.
2- I partly produce diphthongs.
3- I often find it a bit difficult to produce such English vowels.
D- Consonant sounds
Draw a circle around one answer from the following:
How successful are you in producing the following English consonants?
1- Plosives like /p~b/, /t~d/, /k~g/.
a- weak b- fair c- good d- very good e- excellent
2- Fricatives like /s~z/, /f~v/, /ʃ~ʒ/, /θ~ð/
a- weak b- fair c- good d- very good e- excellent
3- Nasals like /m, n, ŋ/
a- weak b- fair c- good d- very good e- excellent
4- Approximants like /l, r, w, j/
a- weak b- fair c- good d- very good e- excellent
E- Clusters
How often do you find it difficult to produce clusters in continuous speech?
1- Initial clusters in words like prompt, play, scream, string, spring, and sword.
a- never b- rarely c- often d- frequently e- permanently
2- Final clusters such as ground, interrupt, risk, clerk, bring
a- never b- rarely c- often d- frequently e- permanently
3- What state do you find yourself in whenever you are involved in a speech that
requires the production of clusters?
a- I can produce consonant clusters correctly.
b- I can produce some of them correctly.
c- I find it difficult to produce consonant clusters.
d- I cannot produce them at all.
4- If you chose answer b, c or d, can you give examples of the errors you make?
E.g., I fail to pronounce words like: print, double, count, stay, spring, bare, eight, etc.
…………………………………………………………………………………………
…………………………………………………………………………………………
(more response space available in student’s copy)
2- Do you think your intelligibility in English would be better if you learned more about
English pronunciation?
…………………………………………………………………………………………
…………………………………………………………………………………………
(more response space available in student’s copy)
Section (1): General pronunciation matters. Select the most appropriate rating for your
students.
1- How intelligible do you rate your students’ English pronunciation?
a- weak b- fair c- good d- very good e- excellent
2- To what degree is students’ intelligibility related to their pronunciation?
a- weak b- fair c- good d- very good e- excellent
3- How often do pronunciation-learning strategies of your students influence their
perception and production of intelligible speech?
a- never b- rarely c- often d- frequently e- permanently
Section (2)
(A)- Please indicate the degree of difficulty you assume the students experience in the
following tasks.
1- To group words that have the same vowel or consonant sounds, in a set like
meat, rate, maid, let, say, said, query.
a- never b- rarely c- often d- frequently e- permanently
2- To find the odd one out among consonant or vowel sounds in sets such as dull, bull,
wool, pull or warn, dawn, scorn, barn.
a- never b- rarely c- often d- frequently e- permanently
3- To discriminate between voiced and voiceless consonantal sounds such as /p, t, k/
and /b, d, g/, etc.
a- never b- rarely c- often d- frequently e- permanently
4- To pronounce fricatives like /s, z, ð, θ, f, v, ʃ, ʒ/.
a- never b- rarely c- often d- frequently e- permanently
5- To produce a consistent vowel quality.
a- never b- rarely c- often d- frequently e- permanently
(B)- Please choose one option to reply to the following pronunciation phenomena.
1- How often does the interference of L1 [Arabic] and L2 orthography [spelling] cause
erroneous pronunciation?
a- never b- rarely c- often d- frequently e- permanently
2- To what extent do the subjects concerned tend to make use of phonological
universals to achieve intelligible speech?
a- never b- rarely c- often d- frequently e- permanently
3- How often do subjects tend to avoid learning the problematic sounds?
a- never b- rarely c- often d- frequently e- permanently
4- How often do your students tend to apply a newly learnt pronunciation rule in an
inappropriate context [overgeneralization]?
a- never b- rarely c- often d- frequently e- permanently
5- How often do learners tend to substitute sounds of their L1 for similar L2 sounds
that are nonetheless acoustically different?
a- never b- rarely c- often d- frequently e- permanently
6- How successful are the Sudanese university learners of English in dissociating their
L2 sound production from their L1 repertoire?
a- weak b- fair c- good d- very good e- excellent
Section (3)
What problems do you think the students experience in the learning of vowels,
consonants and consonant clusters of English? Please give examples of such problems
[wrongly pronounced sounds].
[A] Vowels
I- Examples of pronunciation problems on the short vowels /e, ɑ, ʌ, æ, ə, ʊ, ɪ/.
…………………………………………………………………………………………
…………………………………………………………………………………………
(more response space available in instructor’s copy)
II- Examples of pronunciation problems on the long vowel sounds /iː, uː, ɜː, ɑː, ɔː/.
…………………………………………………………………………………………
…………………………………………………………………………………………
(more response space available in instructor’s copy)
III- Examples of pronunciation problems on the following English diphthongs: /ɪə, eɪ,
ɔɪ, ʊə, əʊ, eə, aɪ, aʊ/.
…………………………………………………………………………………………
…………………………………………………………………………………………
(more response space available in instructor’s copy)
[B] Consonants
I- Examples of pronunciation problems involving consonants [both onset and coda],
e.g. substitution of /b/ for /p/:
…………………………………………………………………………………………
…………………………………………………………………………………………
(more response space available in instructor’s copy)
[C] Clusters
I- Examples of pronunciation problems experienced with initial clusters. For example,
addition of /ɪ/ in front of /st/ and /spr/, etc.
…………………………………………………………………………………………
…………………………………………………………………………………………
(more response space available in instructor’s copy)
II- Examples of pronunciation problems involving final clusters, e.g.: /pt, st, nt, ŋk/, etc.
…………………………………………………………………………………………
…………………………………………………………………………………………
(more response space available in instructor’s copy)
Section (4)
Influence of mother-tongue and lack of pronunciation knowledge of the learners
Section (5)
What other linguistic elements do you think delay the achievement of
intelligible speech but have not been covered here?
…………………………………………………………………………………………
…………………………………………………………………………………………
(more response space available in instructor’s copy)
From 2007 until 2010 he was affiliated with the Leiden University Centre for Linguistics as
a PhD candidate doing research on the intelligibility of English spoken by
Sudanese-Arabic students of English. During this period in the Netherlands he was supported by
a grant from the Sudanese Ministry of Education and exempt from his regular teaching
duties. The present dissertation is the result of this research project.