
ARTICLE IN PRESS

Journal of Phonetics 37 (2009) 357–373


www.elsevier.com/locate/phonetics

Evidence for featural units in the acquisition of speech production skills: Linguistic structure in foreign accent

Kenneth de Jong (a), Yen-Chen Hao (a,*), Hanyong Park (b)
(a) Department of Linguistics, Indiana University, Memorial Hall 322, Bloomington, IN 47405, USA
(b) Speech Research Laboratory, Department of Psychological and Brain Sciences, Indiana University, USA
Received 2 December 2008; received in revised form 16 May 2009; accepted 7 June 2009

Abstract

This study examines correlations in accuracy of the production of one set of segments with accuracy in segments that share a featural
contrast in Korean EFL (English as a Foreign Language) learners. Results indicate that accuracy rates for segment sets that
share gestures in production tend to correlate, while accuracy rates for segments that contrast in the same feature but require the acquisition of different
gestures do not correlate. Data here are from two tasks, a reading task and a mimicry task. Correlation results are similar across the two
tasks, though a larger range of inter-subject differences in overall accuracy is evident in the mimicry task. Comparison of correlation
patterns with previously published correlation patterns in perceptual identification indicates that patterns differ for perception and
production, indicating that the structure of the skill sets, and hence, the acquisitional units for production and perception are different.
© 2009 Elsevier Ltd. All rights reserved.

* Corresponding author. E-mail address: yehao@indiana.edu (Y.-C. Hao).
0095-4470/$ - see front matter © 2009 Elsevier Ltd. All rights reserved.
doi:10.1016/j.wocn.2009.06.001

1. Introduction: Segmental relationships in second language production

Developing spoken communication in a second language (L2) requires learning a large array of motor skills in production, an array of perceptual skills, and internalization of the system which governs spoken behavior and how it is to be perceived. The research reported in this paper examines the acquisition of L2 production skills necessary for spoken communication. One of the fundamental issues in understanding how L2 production skills develop is that of determining what constitutes a skill.

In many models used in the experimental and quantitative phonological literature, the targets of analysis are very specific, commonly the segments that appear in particular phonological environments. For example, the Speech Learning Model (SLM) developed by Flege and colleagues (Flege, 1987, 1988, 1995, and later variants) seeks to characterize accented production in an L2 on a phone-by-phone basis. Flege, Munro, and Skelton (1992), for example, exclusively examine the production of word-final /t/ and /d/ by Mandarin and Spanish learners of English. Thus, the unit of interest is the particular allophonic variants of /t/ and /d/ in that position. Similarly, Flege and Hillenbrand (1986) examine how French, Swedish, and Finnish learners of English perceive and produce the contrast between /s/ and /z/ in syllable-final (coda) position.

The substance of the SLM is that production skills are acquired with respect first of all to the transfer of categories from the native language (L1). The likelihood of transfer is modulated by the degree of perceptual similarity between the segments in the two languages. While the SLM focuses on transfer effects, other models of production learning, especially the Markedness Differential Hypothesis (MDH, Eckman, 1977), focus on aspects of the skills themselves. Eckman’s original insight was that various segments entail a greater degree of difficulty at some level that makes them less commonly used in the world’s languages. This property, called markedness, then tends also to show up in the learning of production skills, where more difficult skills tend to be acquired later on in the learning process.

de Jong, Silbert, and Park (2009) noted that a somewhat stronger interpretation of the MDH is that there is an
implicational relationship between structures, such that it is actually necessary to acquire a less marked structure before one can acquire a more marked structure. This stronger version can be called the Implicational Markedness Hypothesis (IMH).

Models assumed in other L2 phonological research, which we call generalized featural models, differ from the SLM and MDH in that the units of acquisition are general properties of the two languages’ phonological systems. Generalized phonological properties, such as prosodic position or the presence of a featural contrast, are used to predict the abilities of L2 acquirers with regard to specific individual segments. Such models predominate in the linguistically oriented literature. For example, Hancin-Bhatt (1994) classified L1s into two groups with respect to how learners will perceive and produce English dental fricatives; this classification is on the basis of features that generalize across the whole consonant system, dividing up the phoneme inventory of the language.

While some models of production like Task Dynamics (Saltzman & Munhall, 1989) can be used to model very specific learning of particular segments in particular prosodic locations, the general thrust of models such as Articulatory Phonology (Browman & Goldstein, 1986, 1988), a ‘conceptual front-end’ for the Task Dynamics model, is to treat particular allophones as the intersection of a set of specific gestures and how they are coordinated with one another. Hence, examples such as the post-vocalic voicing contrast examined in Flege and Hillenbrand (1986) and Flege et al. (1992) would require the learning of general gestural coordination and dynamics that give rise to voicing contrasts across the obstruents with a voicing contrast. These coordinations would cross-cut all of the segments which have the same voicing contrast, for example, generalizing across labial and dental fricatives. However, the acquisition of the English dental fricatives would require learning oral gestures specific to the dentals, and thus would likely be treated as a more particular skill acquisition affecting only dental fricatives. Therefore, in a Task Dynamics framework, we would not expect the learning of dental fricatives to impinge upon the learning of labial fricatives, since labial fricatives require a different gesture.

The tack taken in the current research is to examine patterns of variation across a set of learners, expecting that, if the units of acquisition are not segments, but are some property that generalizes across segments, then the segments that share this unit of acquisition will tend to be better learned as a group by certain learners relative to other learners. To clarify our manner of speaking about these patterns, we will refer to units of acquisition as skills, perceptual or production action sub-systems that are apparent in the execution of a perceptual or production task. Subjects who have developed a particular skill, then, are expected to be systematically better with a particular aspect of particular segments that involve that skill. Learners who have better acquired this skill, however, might not exhibit better performance with other segments that do not involve that skill. A major question posed by the current paper, then, is the degree to which perception and production skills, as evident in cross-subject accuracy patterns, are the same.

Perceptual identification skills were evaluated in de Jong et al. (2009) by examining cross-listener differences in identification accuracy. In that study, the identification accuracies of Korean EFL learners were examined to determine which segmental contrasts tended to be learned together. College-age Korean learners of English in Korea were presented with productions of English consonants in isolated nonsense mono- and di-syllables. The targets of analysis in de Jong et al. (2009) were coronal and labial stops and fricatives, placed in pre-vocalic and post-vocalic position in monosyllables, and in the medial position of di-syllables. The approach taken was to examine variation among the listeners in their accuracies with different sets of segments, with the expectation that accuracies for one set of contrasts should correlate with accuracies in another set of contrasts that rely on the same perceptual skill. Thus, if distinguishing /f/ from /p/ involves the same skill as distinguishing /θ/ from /t/, we would expect individuals who are relatively good at distinguishing the labials to be relatively good at distinguishing the coronals. However, accuracies in a contrasting pair that relies on different perceptual skills from those used for another pair would not necessarily correlate.

This correlation patterning is what was found. Korean learners who were particularly accurate at differentiating one set of stops from fricatives were also more accurate at differentiating other stops from fricatives. Thus, for example, the individual’s accuracy rates for differentiating coronal stops from fricatives correlated with their accuracy rates for differentiating labial stops from fricatives. Similarly, their accuracy rates for differentiating stops from fricatives correlated across prosodic positions; i.e., their accuracy rates in pre-vocalic position correlated with those in intervocalic and post-vocalic positions. This systematic pattern suggested that some individuals were good at the manner contrast, regardless of the specific segments or prosodic positions involved, and hence the results were interpreted as indicating a single perceptual skill (or skill set) for the manner contrast.

Such correlations were not found for different featural contrasts; for example, accuracy for the manner contrast never correlated with that of the voicing contrast in the group examined in de Jong et al. (2009). Hence, perceiving voicing contrasts was interpreted as constituting a different perceptual skill than perceiving the manner contrast. Some listeners were better with the voicing contrast, but this did not entail they were also better with the manner contrast. One notable deviation from this feature generalization pattern was also found, however. A lack of correlation was found within the voicing contrast when prosodic location was examined. While voicing accuracy for different segments generally correlated with other segments, voicing
accuracy in onset position did not correlate with voicing accuracy in intervocalic or coda position. Similarly, voicing accuracy for consonants placed between vowels (intervocalic position – VCV) did not correlate with that for consonants at the end following a vowel (coda position – VC), and neither of these voicing accuracies correlated with that for consonants placed at the beginning before the vowel (onset position – CV). This pattern, in contrast with the pervasive correlation for manner accuracies across these prosodic positions, suggested the existence of three different voicing perception skills, one for onset position, one for intervocalic position, and a third one for the coda position. The interpretation given for this pattern in de Jong et al. (2009) was that the laryngeal contrasts in Korean are affected by allophonic rules that target intervocalic and coda positions. Intervocalic lax stops in Korean are extensively shortened and voiced, perhaps leading the Korean learners to have different category boundaries for the intervocalic and onset stops, each of which would have to be adjusted for the English contrast. In addition, laryngeal neutralization rules target Korean stops in coda position, creating an additional problem for the acquisition of laryngeal contrasts.

The current research uses this same cross-subject correlational approach for examining the nature of production skills in a learner population. That is, it seeks to evaluate whether there is evidence in patterns of accuracies in L2 productions that would indicate interpretable units of acquisition that generalize across segments. Three hypothetical results are evaluated. (1) Skills could be developed independently for each segment. (2) The generalization pattern could be the same as found for the perception data in de Jong et al. (2009), suggesting the same units of acquisition for production and perception. (3) The generalization pattern could indicate production-specific considerations, as reflected in gestural representations, such as posited in Task Dynamics (Saltzman & Munhall, 1989); contrasts which share either gestural units, or coordination rules which combine various gestures into coherent syllable patterns, will act as units of acquisition.

In order to distinguish one of the generalized models from a segmentally specific model, we take a segmentally specific model as the null hypothesis. If the production skills for a consonant bear no relationship to the production skills for another consonant, we expect that there will be no necessary correlation between a talker’s accuracy in producing one consonant and their accuracy with other segments. If, however, a single production skill is involved with different segments, we expect a talker’s accuracy with one consonant contrast to be correlated with their accuracy with other consonants that share the same production skill. Note that the logic of the null hypothesis used here means that the lack of a correlation does not rule out the presence of a skill that does generalize across segments, since other factors, whether ceiling or floor effects or just general noise in the data, might mask the presence of a relationship.

As examples, the presence of correlations between accuracy in producing voicing contrasts for coronal consonants and accuracy in producing voicing contrasts for labial consonants would eliminate the null hypothesis that voicing production is learned independently for coronal and labial consonants, and support a model in which voicing is a unit in the process of acquiring the production system of the L2. Such an outcome seems particularly plausible in the case of voicing, since voicing is generally modeled as involving separate glottal and oral gestures and a generalized rule for coordinating the glottal with the oral articulation. If learners are acquiring the glottal gestures and the general rule for coordination, then they should show correlated increments in production accuracy across the different places of articulation. For manner contrasts, however, such an outcome would not be predicted by Task Dynamic accounts that have different oral gestures for different fricatives and stops.

The current research also examines the effect of prosodic position. The logic of the analysis is the same. For example, if the production of each segment involves the same skill regardless of position, and if a talker is particularly good at distinguishing stops from fricatives in onset position, they should also be relatively good in other positions. Here, the initial hypothesis from Task Dynamic modeling would be that contrasts would generalize across prosodic positions, since gestural schemata in previous research often required scaling and retiming of the same gesture. However, the effect of scaling and retiming should not be underestimated; it is quite possible that such effects render the production skills for a segment in one position so different as to be functionally unrelated to those skills in another position. Also, with the voicing contrast, the glottal gestures themselves are often quite different in different prosodic positions.

Complicating these predictions is the fact that perceptual factors are clearly involved in the acquisition of speech motor skills (e.g., as in the general model presented in Guenther et al., 1999). Hence, another question to be addressed in the current research is the degree to which accuracies in production are modulated by perceptual deficits. In the current study, the results of two production tasks are compared. One task involves simple reading, and the other task is a mimicry task, involving both production and auditory perception skills.

However, there are other reasons besides a unified skill set that might explain a correlation between the accuracy rates for two sets of consonants. One obvious cause for a correlation would be that some acquirers are simply more experienced with the L2 than are others, and so their production skills across the board are better than are others’. Thus, production accuracy for any two consonants will be correlated. To test for this interpretation of correlations, we examine a variety of sets of consonants, expecting that sets of consonants that share a skill set will systematically exhibit stronger correlations.
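
The cross-subject logic outlined above can be illustrated with a minimal sketch. The per-talker accuracies below are invented for illustration and are not data from this study; the variable names and the helper `pearson_r` are likewise hypothetical. The sketch compares a within-feature correlation (voicing accuracy across two places of articulation) with an across-feature correlation (voicing accuracy against manner accuracy); a shared skill would be expected to show up as a markedly stronger within-feature value.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of accuracies."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Invented per-talker accuracies (one value per talker); NOT data from the study.
coronal_voicing = [0.60, 0.70, 0.80, 0.90, 0.65, 0.85]
labial_voicing = [0.55, 0.72, 0.78, 0.88, 0.60, 0.90]
labial_manner = [0.70, 0.80, 0.70, 0.75, 0.80, 0.75]

# Same feature, different segment sets: expected to correlate if voicing is one skill.
within_feature_r2 = pearson_r(coronal_voicing, labial_voicing) ** 2
# Different features: a high value here would instead point to general proficiency.
across_feature_r2 = pearson_r(coronal_voicing, labial_manner) ** 2

print(f"within-feature r^2 = {within_feature_r2:.3f}")
print(f"across-feature r^2 = {across_feature_r2:.3f}")
```

In this toy case the within-feature value is high and the across-feature value is near zero, which is the pattern that would count against both the segmentally specific model and a pure general-proficiency account.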
2. Methods

The experiment includes two tasks: (1) Reading: Korean learners of English read a list of English nonsense words written in IPA orthography. (2) Mimicry: Four American English speakers read a list of English nonsense words written in IPA. Their recordings were presented to Korean learners of English, and the learners repeated each stimulus the American speakers produced.

2.1. Reading task

2.1.1. Stimuli

The stimuli are nonsense words consisting of all combinations of the vowel /ɑ/ and the consonants given in the top half of Table 1. These consonants are all of the voiced and voiceless, coronal and labial, stops and non-sibilant fricatives, as arrayed in a 2 × 2 × 2 matrix in Table 1. The consonants appeared alone in each of four prosodic locations, either before a vowel (Onset Position – Cɑ, e.g., ‘pa’), after a vowel (Coda Position – ɑC, e.g., ‘op’), or between two vowels. The consonants between two vowels either had primary stress on the vowel preceding the consonant (Post-Stress Position – ɑ́Cɑ, e.g., ‘oppa’), or on the vowel following the consonant (Pre-Stress Position – ɑCɑ́, e.g., ‘apah’).

Table 1
Segments examined in current analyses and comparable Korean segments.

English
                          Coronal               Labial
                          Voiced   Voiceless    Voiced   Voiceless
Stops                     /d/      /t/          /b/      /p/
Non-sibilant fricatives   /ð/      /θ/          /v/      /f/

Korean
                          Coronal                       Labial
                          Fortis   Lenis   Aspirated    Fortis   Lenis   Aspirated
Stops                     /t’/     /t/     /tʰ/         /p’/     /p/     /pʰ/
Fricatives                /s’/     /s/

The eight segments examined here, /p b t d f v θ ð/, can be categorized into two general classes with respect to the Korean phonological system. Korean anterior stops and fricatives are presented at the bottom of Table 1. The English stops are all similar to stop segments in Korean. Korean has labial and coronal stops, though the voicing contrast is somewhat different from that in English. While American English stops in onset position exhibit a two-way contrast between aspirated and unaspirated, Korean stops exhibit a three-way contrast between aspirated, fortis, and lenis. The aspirated Korean stop generally exhibits more aspiration than its English counterpart, and the voiced English stop corresponds to the two unaspirated Korean categories that contrast a lenis stop with a relatively soft release characteristic and low fundamental frequency on the following vowel, and a fortis stop with a sharp release and high fundamental frequency in the following vowel (Ahn & Iverson, 2004; Han & Weitzman, 1970; Jun, 1998; Kang & Guion, 2006; Kim, 1970; Oglesbee, 2008; Park, 2003; Silva, 2006). Though the match across the languages is not exact, previous work shows that the voiced English stop is generally identified as either a fortis or a lenis Korean stop, and Korean listeners tend to subjectively rate the match as quite good (Park & de Jong, 2008; Schmidt, 1996). Therefore the four English stops in the stimuli are considered to have similar counterparts in Korean.

Korean has no non-sibilant fricatives except for /h/, and previous work (Park & de Jong, 2008; Schmidt, 1996) has shown that Korean learners vary considerably as to which Korean consonant label best fits the English fricatives, both individually and across listeners, and they have lower confidence ratings when asked to apply the Korean labels to these fricatives than when they label English stops. Hence the four English non-sibilant fricatives do not seem to correspond well to any sound in Korean.

With respect to the prosodic positions, Korean allows all of the consonant contrasts to appear in pre-vocalic and intervocalic position, but there are a number of allophonic rules in English and in Korean affecting the intervocalic position. English has reduction rules in the post-stress environment, and Korean lenis stops in intervocalic position are shortened and voiced throughout (Jun, 1996), much like post-stress non-coronal stops in English. However, Korean does not have a stress-accent system like English, so it is difficult to evaluate how Korean listeners will treat the two intervocalic positions. While various scholars have indicated subjective differences in the perceived prominence of different syllables, their judgments often contradict one another and are generally not verified by experimental data (de Jong, 1994, 2000; Lim & de Jong, 1999; Sohn, 1999, p. 197). Some varieties do have a limited quantity system that is sometimes analyzed as indicating a contrast in stress, but quantity has not been noted to interact with segmental allophonic variation. Korean also allows consonants in coda position, but the number of contrasts is greatly reduced by a variety of neutralization rules that collapse productions of fricatives and stops of the three different laryngeal types to single voiceless stops.

2.1.2. Talkers

Twenty talkers, 15 female and 5 male, were recruited from the undergraduate student population at Kyonggi University, in Suwon (located near Seoul), Korea. All were in their mid-twenties (between 23 and 28, with the mean 24.35), and none had resided for more than three months in an English-speaking country prior to the experiment.
Each of them was recruited from basic level English classes, and all of them identified themselves as English majors. With their experience with English classes in primary and secondary school, this would mean that they have extensive experience with written English, though commensurate experience with spoken English is rare. Hence, each of them would probably be classified as an inexperienced learner, with respect to native English spoken productions. However, analyses presented below indicate the presence of a broad range of production abilities among the talkers.

2.1.3. Procedure

The 20 Korean learners were seated in a quiet room individually, and presented with the stimuli, each written twice in a randomized list, and were asked to produce each item once. For all of the segments, except the dental fricatives, the consonants were represented with the typical orthographic symbols. For the dental fricatives, the voiced and voiceless segments were indicated with IPA symbols. Each talker was checked informally concerning their familiarity with the written IPA symbols, and all talkers were familiar with the symbols through their instruction in English. In addition, after each prompt, the consonant value was also indicated with a list of three keywords with the consonant appearing in different prosodic positions. In the Korean school system, students typically learn English using the IPA symbols for the dental fricatives; the pronunciation of English words is commonly included with words targeted for learning in the textbooks used in Korean secondary schools. This familiarity was also evident in their responses in the parallel perceptual study in de Jong et al. (2009). Stress was also indicated on the di-syllabic forms with an acute accent, again following the typical approach used in English instruction in Korea.

The productions were recorded digitally onto a Sony MZ-R909 mini-disc recorder in non-compressed mode by means of a stand-alone Sony ECM-MS907 microphone. The total tokens for each talker are 8 (consonants) × 4 (prosodic locations) × 2 (repetitions) = 64.

2.2. Mimicry task

2.2.1. Mimicry production stimuli

Four speakers of American English in their late twenties who grew up and resided at the time of recording in the northern Midwest produced the stimuli written in IPA. The stimuli were the eight segments in four prosodic locations, and each combination appeared twice in the list. Thus the total amount of tokens for each American speaker is 8 (consonants) × 4 (prosodic locations) × 2 (repetitions) = 128. The American speakers were seated in a sound-treated room in the Indiana University Phonetics Lab and recorded onto DAT using a free-standing microphone (Electro Voice RE50). The talkers were familiarized with the orthographic stimuli before recording to ensure that they knew what each of the letters indicated and how the orthography cued the various prosodic locations and stress patterns, and they were monitored for accurate interpretation of each stimulus. Recordings were digitized at 44.1 kHz and target nonsense words were digitally spliced, amplitude normalized, and randomized. The edited stimuli were presented to Korean talkers.

2.2.2. Talkers

The same 20 Korean talkers in the Reading task participated in the Mimicry task. For half of the talkers, the reading production task preceded the mimicry task, and for the other half of them, it followed the mimicry task.

2.2.3. Procedure

The Korean talkers were seated in a quiet room individually, and presented with the stimuli over headphones. Each subject was presented with each of the productions once, and was asked to repeat each stimulus. Stimuli were presented at a comfortable pace with five-second intervals. The productions were recorded digitally onto a Sony MZ-R909 mini-disc recorder in non-compressed mode by means of a stand-alone Sony ECM-MS907 microphone. A block of five items, randomly selected from the stimuli, was run first, and the listeners were asked if they had any questions about the procedure. These practice items were not included in the analyses. Items were randomized in blocks, with four different randomizations used for the 20 subjects.

2.2.4. Assessing accuracy in Reading and Mimicry tasks

The productions of the twenty talkers in Reading and Mimicry were divided into five blocks of four talkers each. The productions for each block were randomized and presented to 10 native American English speakers, recruited mostly from the undergraduate population at Indiana University. Hence, there were 50 native listeners, yielding 10 evaluations per Korean production. The listeners were seated in a quiet room in groups of 1–5, and presented with the stimuli over a centrally located loudspeaker. The listeners were presented with each of the productions once, and were asked to identify the consonant in each stimulus by circling the appropriate Roman consonant symbol from a list of 15 alternatives presented on a paper response form. The 15 alternatives were chosen on the basis of pilot work for the Identification task presented in de Jong et al. (2009) with Korean listeners in the United States. Along with each response alternative was a keyword, which was chosen to exemplify each segment. The response alternatives and keywords are given in Table 2. Also, listeners were provided with the option of indicating a write-in response (Other:___), which listeners used rarely (a total of 2.9% of the responses). The current analysis, then, includes 5 (blocks) × 4 (Korean talkers) × 10 (American listeners) × 8 (segments) × 4 (prosodic locations) × 2 (repetitions) = 12,800 Mimicry responses, and 5 (blocks) × 4 (Korean talkers) × 10 (American listeners) × 8 (segments) × 4
Table 2
Orthographic response alternatives.

Keyword   dog  tell  thin  that  fall  vase  sit  zip  pin  ball  ship  vision  hall  chop  job
Symbol    d    t     θ     ð     f     v     s    z    p    b     sh    zh      h     ch    j
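
As a rough sketch of how responses drawn from Table 2 might be scored at the level of a feature rather than a segment, the snippet below codes each response alternative for voicing and manner and computes the proportion of responses that match the target in one feature. The feature coding (including the separate `affricate` label for ‘chop’ and ‘job’), the use of keywords as identifiers, the treatment of write-in responses as mismatches, and the function name `feature_accuracy` are all assumptions made for illustration; the trial list is invented and is not data from the study.

```python
# Hypothetical coding of the Table 2 alternatives, keyed by keyword.
# Each value is (voicing, manner); the affricate label is an assumption.
RESPONSE_FEATURES = {
    "dog": ("voiced", "stop"), "tell": ("voiceless", "stop"),
    "thin": ("voiceless", "fricative"), "that": ("voiced", "fricative"),
    "fall": ("voiceless", "fricative"), "vase": ("voiced", "fricative"),
    "sit": ("voiceless", "fricative"), "zip": ("voiced", "fricative"),
    "pin": ("voiceless", "stop"), "ball": ("voiced", "stop"),
    "ship": ("voiceless", "fricative"), "vision": ("voiced", "fricative"),
    "hall": ("voiceless", "fricative"),
    "chop": ("voiceless", "affricate"), "job": ("voiced", "affricate"),
}
FEATURE_INDEX = {"voicing": 0, "manner": 1}


def feature_accuracy(trials, feature, targets):
    """Proportion of responses that match the target in one feature.

    trials  -- iterable of (target_keyword, response_keyword) pairs
    feature -- "voicing" or "manner"
    targets -- set of target keywords to include (e.g. the labials)
    """
    idx = FEATURE_INDEX[feature]
    hits = 0
    total = 0
    for target, response in trials:
        if target not in targets:
            continue
        total += 1
        coded = RESPONSE_FEATURES.get(response)  # write-ins are left uncoded
        if coded is not None and coded[idx] == RESPONSE_FEATURES[target][idx]:
            hits += 1
    return hits / total if total else float("nan")


LABIALS = {"pin", "ball", "fall", "vase"}
# Invented (target, response) pairs for one talker; NOT data from the study.
trials = [("ball", "ball"), ("ball", "vase"), ("pin", "ball"), ("fall", "pin")]

print(feature_accuracy(trials, "voicing", LABIALS))  # 0.75
print(feature_accuracy(trials, "manner", LABIALS))   # 0.5
```

Note that a response counts as correct for voicing whenever it agrees with the target in voicing, whatever its place or manner, which is why ‘ball’ heard as ‘vase’ is a voicing hit but a manner miss.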

(prosodic locations) = 6400 Reading responses, totaling 19,200 data points.

2.2.5. Testing for generalization

To test the generalized model, we examine accuracy patterns with respect to a null hypothesis that different segments involve no common skills. We determine whether the accuracy for one set of segments correlates with the accuracy for another set of segments that contrast in the same feature. If each segment involves its own individual production skill, the development of abilities for one segment set need not parallel that of abilities for another segment set that contrasts in the same feature. However, segments that share a production skill should develop in parallel. Common gestural skills, thus, predict that talkers who are better at a gestural distinction in one set of segments (e.g., the voicing contrast between /p/ and /b/) will also be better at another parallel set (e.g., the voicing contrast between /t/ and /d/). It is not necessary that the actual accuracies for any two parallel sets will be the same, because the instantiation of the contrast in the acoustic signal for one pair is not necessarily as robust for another pair. For example, the contrast between /p/ and /f/ is systematically easier for native English listeners to differentiate than that for /b/ and /v/ (Silbert & de Jong, in review). However, what is necessary is that the abilities develop in a correlated fashion. It should also be noted that, since the null hypothesis is that segments involve only individually specific skills, a lack of correlation will fail to provide evidence for a generalized skill, but does not eliminate it as a hypothesis.

Accuracies for each set of consonants were determined as the proportion of times an English listener selected a segment on the response sheet that matched the target segment in terms of the feature in question. For example, when comparing voicing accuracy for labials and coronals, two variables are generated. Labial voicing accuracy is the proportion of times an English listener selected a voiced segment when presented with /v/ or /b/, or selected a voiceless segment when presented with /f/ or /p/. Coronal voicing accuracy is the proportion of times English listeners selected a voiced segment when presented with /ð/ or /d/, or selected a voiceless segment when presented with /θ/ or /t/. The proportions for each talker, then, are plotted against each other, yielding scatter plots, and the degree of correlation between the two variables is assessed using the Pearson r² statistic, which varies from 0 to 1 and whose value indicates the proportion of variation in one variable shared with the other variable, from 0% to 100%.

One obvious problem with interpreting correlations as evidence for a unitary skill set is that it might be the case that the various accuracy rates correlate with one another simply because different talkers vary in overall skill with spoken English. To test for the strength of general correlation across different accuracies, we compare correlations of one feature’s accuracy in one set of data with that feature’s accuracy in a different set (e.g., voiced vs. voiceless labials compared with voiced vs. voiceless coronals) with correlations of that feature’s accuracy with a different feature in a different set (e.g., voiced vs. voiceless coronals compared with coronal stops vs. fricatives). Thus, for example, for voicing accuracy with all coronals, we calculate the proportion of times that a listener chooses a voiced segment when presented with a Korean production of /d/ or /ð/, or chooses a voiceless segment when presented with a Korean production of /t/ or /θ/, and correlate this with the proportion of times the listener chooses a fricative when presented with /v/ or /f/, or chooses a stop when presented with /b/ or /p/. If the accuracies across features also correlate, it is the individual subject’s English proficiency rather than featural generalization that causes the correlation. However, if accuracies systematically correlate within a feature, but not across features, this would disconfirm the segmentally specific model, and provide strong evidence for generalized learning. The criterion for significance of these correlations is set at p = .01.

3. Results

3.1. Reading

3.1.1. Manner accuracy correlations across place of articulation

The first case we examined concerns the development of non-sibilant fricative production as compared to stop production. This development would be considered the development of a ‘new’ skill in the SLM. First we examine whether Korean learners’ ability to distinguish stops and fricatives in labials correlates with that in coronals. The correlation across subjects is plotted in Fig. 1. The x-axis is the manner accuracy for each subject in labials, while the y-axis is the manner accuracy in coronals.

As we can see, the correlation is very low, despite a good distribution of accuracies over 30% of the scale in both dimensions (r² = .018). There are some talkers who are good at both labials and coronals, and some who are bad at both. But there are also subjects who are good at one and bad at the other, suggesting that the acquisition of the

[Fig. 1 scatter plot: manner accuracy for coronals (y-axis, 0.4 to 1.0) against manner accuracy
for labials (x-axis, 0.4 to 1.0); R Sq Linear = 0.018.]

Fig. 1. Accuracy rates for distinguishing coronal stops from fricatives plotted against rates for
labials in the Reading task. Each symbol indicates one talker.

manner contrast in one place of articulation does not entail the acquisition in another place of
articulation. This result differs from the assumption of a generalized featural model. But if we
consider the articulation of these consonants, the result is not unexpected. The different
fricatives require the use of different articulators and hence, different motor skills. Therefore
when learners acquire the manner contrast among labials, this ability is not necessarily
applicable to coronals.

3.1.2. Manner accuracy across voicing

Fig. 2 plots manner accuracy in voiced segments against that in voiceless segments. Here, we find
a strong correlation (r² = .408). Also apparent in Fig. 2, Korean learners are all better at
producing voiceless segments than voiced segments, as is evident in the large offset of all of
the data points below the diagonal, indicating higher voiceless accuracy than voiced accuracy.
This difference in manner accuracy for voiced and voiceless segments was also found in the
perceptual data in de Jong et al. (2009), and corresponds to a difference in accuracy found with
native listeners and talkers in the speech-in-noise data analyzed in Silbert and de Jong (in
review). Thus, the offset from the diagonal apparent in Fig. 2 appears to reflect general facts
about the salience of encoding for labial and coronal consonants, and not necessarily something
specifically about the acquisition process.

Yet, even with this offset, the two accuracies correlate. When the Korean learners are better at
manner contrasts in voiceless segments, they also tend to be better at voiced segments. This
result is expected. Considering articulation, the oral gestures that are creating the distinction
between the stops and fricatives are likely to be very similar for voiced and voiceless segments.

3.1.3. Manner accuracy across prosodic locations

We also examined correlations of manner accuracies across the four prosodic locations: onset,
intervocalic post-stress and pre-stress, and coda. The results, in Table 3, indicate a general
correlation across all of the positions, except that the accuracy in coda position does not
correlate with any other position. This would imply that if the learners acquire the manner
contrast in one prosodic location, it generalizes to other positions except the coda. This seems
to indicate that the production of manner

[Fig. 2 scatter plot: manner accuracy for voiced segments (y-axis, 0.4 to 1.0) against manner
accuracy for voiceless segments (x-axis, 0.4 to 1.0); R Sq Linear = 0.408.]

Fig. 2. Accuracy rates for distinguishing voiceless stops from fricatives plotted against rates
for voiced segments in the Reading task. Each symbol indicates one talker.

Table 3
Pearson r² values for manner accuracy rates across prosodic position in Reading.

                     Onset (CV)   Pre-stress (VCV́)   Post-stress (V́CV)   Coda (VC)
Onset (CV)           1.000        .548               .324                .098
Pre-stress (VCV́)                  1.000              .493                .178
Post-stress (V́CV)                                    1.000               .128
Coda (VC)                                                                1.000

Indicates significance at p < .01.

contrasts is a single skill that is obtained regardless of prosodic location. However, situating
that production in a post-vocalic and coda position presents an additional problem. The coda
position in Korean has a special status, in that there is an extremely restricted set of segments
that occur in this position. Neutralization rules merge the three-way contrast in onset stops
into the lenis stop, and also change the coda fricatives /s/, /s’/, and /h/ into the lenis /t/
(Kim & Jongman, 1996). Therefore it is not unexpected that the manner contrast in this position
is different from that in other locations.

3.1.4. Cross-feature correlation analysis

Although we find a correlation of manner accuracy between voiced and voiceless segments, and also
across various prosodic locations, it does not necessarily support the generalized featural
model. It might also be possible that some subjects are better than others at everything. In that
case, their accuracy for any contrast would correlate with their accuracy for any other contrast.
To test this alternative interpretation, we took the accuracy for two different features, manner
and voicing in different prosodic locations, and correlated them with each other. The results

are given in Table 4. The rows are manner accuracies across four prosodic positions, while the
columns are voicing accuracies.

If it is true that some subjects learn better and are good at everything, we should find a
correlation between the accuracies of any two features. Yet, Table 4 shows that, with one
exception (manner in the coda marginally correlates with voicing in the onset), there are no
significant correlations across features. Since we conducted 16 analyses, this single correlation
could easily be due to chance. If we had adjusted alpha-levels downward to correct for the large
number of significance tests, the correlation in question would not have been considered
significant. We are confident that the overall pattern allows us to conclude that the
correlations between manner accuracies indicate developmental grouping of parallel segments,
rather than the overall proficiency of individual talkers.

3.1.5. Voicing accuracy

After examining the manner accuracies, we turn to another feature, voicing. First, we tested
whether the subjects’ voicing accuracy in labial consonants correlates with that in coronal
consonants, as plotted in Fig. 3.

Table 4
Pearson r² values for manner accuracy rates (rows) and voicing accuracy rates (columns) for each
prosodic position in Reading.

                     Onset (CV)   Pre-stress (VCV́)   Post-stress (V́CV)   Coda (VC)
Onset (CV)           .073         <.001              .024                .034
Pre-stress (VCV́)     .029         <.001              .036                .003
Post-stress (V́CV)    .035         .061               .068                .002
Coda (VC)            .278         .007               .016                .006

Indicates significance at p < .05.
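
The multiple-comparison argument made for the 16 cross-feature tests can be illustrated with a short calculation. The sketch below is not the authors' procedure: it assumes a Bonferroni-style correction and, for the familywise error rate, assumes the 16 tests are independent, which is a simplification. The function names are hypothetical.

```python
def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Per-test alpha after a Bonferroni correction (assumed correction method)."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Chance of at least one false positive across n_tests.

    Assumes the tests are independent, which is only an approximation
    for correlations computed on the same talkers.
    """
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return 1.0 - (1.0 - alpha) ** n_tests


if __name__ == "__main__":
    n_tests = 16   # number of cross-feature correlations in the table
    alpha = 0.05   # level at which the single exception is marginal
    print("corrected per-test alpha:", bonferroni_alpha(alpha, n_tests))
    print("familywise error rate:", round(familywise_error_rate(alpha, n_tests), 3))
```

Under these assumptions, a single result at p < .05 among 16 tests is more likely than not to arise by chance, which is consistent with the reasoning in the text.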

[Fig. 3 scatter plot: voicing accuracy for coronals (y-axis, 0.4 to 1.0) against voicing accuracy
for labials (x-axis, 0.4 to 1.0); R Sq Linear = 0.014.]

Fig. 3. Accuracy rates for distinguishing voiceless from voiced coronal obstruents plotted against
rates for labial obstruents in the Reading task. Each symbol indicates one talker.

Unexpectedly, we did not find a significant correlation (r² = .014). More subjects are better at
labials, but there are also some who are better at coronals but not better at labials. It
suggests that if learners acquire the voicing contrast in one place of articulation, they do not
necessarily apply it to other places of articulation, which does not fit the prediction of a
generalized featural model.

Fig. 4 plots the relationship for voicing accuracy across stops and fricatives. Again we did not
find a significant correlation (r² = .021). However, this lack of correlation is not unexpected.
The voicing contrast in stops is quite different from that in fricatives (e.g. the voicing
contrast in stops involves aspiration, while that in fricatives is reflected in the generation of
voicing in the frication). Note that this prediction of a gestural model is different from that
above for the manner contrast, where we predicted that the production of manner contrasts would
generalize across voicing. There, the oral gestures for stops and fricatives are similar for
voiced and voiceless segments, while here the glottal gestures and their coordination required
for voiced and voiceless segments are different for stops and fricatives (Lisker, Abramson,
Cooper, & Schvey, 1969; and references reported in Yoshioka, Löfqvist, & Collier, 1982). The lack
of generalization for the voicing contrast found here, then, suggests that the motor skills
required to distinguish voicing in one set of segments (stops) may not prepare them to contrast
voicing in the other set (fricatives).

However, a confounding problem with this interpretation is that the distribution of accuracies is
narrower for voicing than for manner. The talkers, in general, differ from one another less in
voicing accuracy than manner accuracy, perhaps due to the laryngeal contrasts being entangled
with laryngeal contrasts in their L1. Hence, it is possible that, with a broader range of
accuracies, a correlation may become apparent. There is, however, little in the distributions in
Figs. 3 and 4 to suggest this. (In addition, the distribution in Fig. 3 reveals one outlier who
was exceptionally bad with voicing in labials while being one of the better talkers with voicing
in coronals. The lack of correlation is not due to this outlier, since removing this subject
still does not yield a significant correlation (r² = .061).)

The voicing accuracies across different prosodic locations are displayed in Table 5. There are no
significant

[Fig. 4 scatter plot: voicing accuracy for fricatives (y-axis, 0.4 to 1.0) against voicing
accuracy for stops (x-axis, 0.4 to 1.0); R Sq Linear = 0.021.]

Fig. 4. Accuracy rates for distinguishing voiceless from voiced fricatives plotted against rates
for stops in the Reading task. Each symbol indicates one talker.

correlations. It appears that the mastery of voicing in one prosodic context is independent from
that in other contexts.

3.2. Mimicry

3.2.1. Manner and voicing correlations

In general, the mimicry data exhibit very strong manner and voicing correlations across all
sub-sets of the data. Talkers’ manner accuracies for labials and coronals significantly correlate
with each other (r² = .561), as they do for voiceless and voiced segments (r² = .699). Manner
accuracies also correlate across all of the different prosodic contexts, as indicated in Table 6.
These correlations are very strong, especially as compared to the correlations found for reading
(above) and the correlations for perceptual identification in de Jong et al. (2009).

For the voicing accuracy correlation across place of articulation, we found a significant
correlation between labials and coronals (r² = .657), but when we turned to voicing accuracies
across stops and fricatives, the correlation is only marginally significant (r² = .255, p < .05).
As we proposed for the Reading data, this lack of strong correlation might be due to the
different motor skills required to produce voicing contrasts in stops and in fricatives. Finally,
regarding the voicing accuracies in different prosodic contexts, we only found marginally
significant correlations between the two intervocalic positions, as shown in Table 7.

For data in the Mimicry task, we found evidence that seems to support a generalized featural
model most of the time. However, these high correlations might be attributed to differences in
the overall proficiency of individual subjects. If this is true, the accuracy of any feature
would correlate with that of any other feature. Hence it is necessary to run a cross-feature
correlation analysis to verify the cause of correlation.

3.2.2. Cross-feature correlation analysis

As with the Reading data, we correlated manner and voicing accuracies in different prosodic
contexts. As Table 8 shows, the voicing accuracy in onset position does not correlate with manner
accuracy in any other position. Also, the voicing accuracy in intervocalic pre-stress position
does not correlate with the manner accuracy in intervocalic post-stress position. However, aside
from these, all the other cells show significant correlations. Thus, for the mimicry results, we
cannot rule out the possibility that the correlations are due to some of the talkers being better
overall at the mimicry task than others

Table 5
Pearson r² values for voicing accuracy rates across prosodic position in Reading.

                     Onset (CV)   Pre-stress (VCV́)   Post-stress (V́CV)   Coda (VC)
Onset (CV)           1.000        <.001              .032                .012
Pre-stress (VCV́)                  1.000              .042                .148
Post-stress (V́CV)                                    1.000               .003
Coda (VC)                                                                1.000
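
The accuracy scores and pairwise r² values reported in tables such as Tables 3 and 5 follow from the scoring procedure described in the method: a listener response counts as correct for a feature when it agrees with the target on that feature, and per-talker accuracies are then correlated across conditions. The following is a minimal sketch of that computation, not the authors' analysis code. The feature table, the ASCII labels "th" and "dh" for the dental fricatives, and all function names are assumptions made for illustration.

```python
# Hypothetical feature table; "th"/"dh" stand in for the voiceless/voiced dental fricatives.
FEATURES = {
    "p": {"voicing": "voiceless", "manner": "stop"},
    "b": {"voicing": "voiced", "manner": "stop"},
    "f": {"voicing": "voiceless", "manner": "fricative"},
    "v": {"voicing": "voiced", "manner": "fricative"},
    "t": {"voicing": "voiceless", "manner": "stop"},
    "d": {"voicing": "voiced", "manner": "stop"},
    "th": {"voicing": "voiceless", "manner": "fricative"},
    "dh": {"voicing": "voiced", "manner": "fricative"},
}


def feature_accuracy(trials, feature):
    """Proportion of (target, response) trials whose response matches the target on `feature`."""
    if not trials:
        raise ValueError("no trials to score")
    hits = sum(FEATURES[target][feature] == FEATURES[response][feature]
               for target, response in trials)
    return hits / len(trials)


def pearson_r2(xs, ys):
    """Squared Pearson correlation between two equal-length lists of per-talker accuracies."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two talkers")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r2 is undefined when one variable has no variance")
    return (sxy * sxy) / (sxx * syy)


def r2_matrix(scores):
    """Upper-triangular r2 values for every pair of conditions.

    `scores` maps a condition name to a list of accuracies, one per talker,
    with talkers in the same order in every list.
    """
    names = list(scores)
    return {(a, b): pearson_r2(scores[a], scores[b])
            for i, a in enumerate(names) for b in names[i + 1:]}


if __name__ == "__main__":
    # Made-up responses for one talker's labial tokens (illustration only).
    labial_trials = [("b", "b"), ("b", "v"), ("p", "b"), ("f", "p")]
    print("labial voicing accuracy:", feature_accuracy(labial_trials, "voicing"))
    print("labial manner accuracy:", feature_accuracy(labial_trials, "manner"))
```

Note that a response such as /v/ for target /b/ counts as correct for voicing but incorrect for manner, which is why the two accuracy measures can dissociate for the same set of trials.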

Table 6
Pearson r² values for manner accuracy rates across prosodic position in Mimicry.

                     Onset (CV)   Pre-stress (VCV́)   Post-stress (V́CV)   Coda (VC)
Onset (CV)           1.000        .646               .526                .486
Pre-stress (VCV́)                  1.000              .753                .549
Post-stress (V́CV)                                    1.000               .663
Coda (VC)                                                                1.000

Indicates significance at p < .01.

Table 7
Pearson r² values for voicing accuracy rates across prosodic position in Mimicry.

                     Onset (CV)   Pre-stress (VCV́)   Post-stress (V́CV)   Coda (VC)
Onset (CV)           1.000        .066               .037                .038
Pre-stress (VCV́)                  1.000              .202                .070
Post-stress (V́CV)                                    1.000               .172
Coda (VC)                                                                1.000

Indicates significance at p < .05.

Table 8
Pearson r² values for manner accuracy rates (rows) and voicing accuracy rates (columns) for each
prosodic position in Mimicry.

                     Onset (CV)   Pre-stress (VCV́)   Post-stress (V́CV)   Coda (VC)
Onset (CV)           <.001        .203               .503                .299
Pre-stress (VCV́)     .023         .234               .584                .250
Post-stress (V́CV)    .102         .178               .504                .265
Coda (VC)            .072         .332               .465                .230

Indicates significance at p < .05.
Indicates significance at p < .01.

were. The mimicry task involves perceiving the auditory stimuli and reproducing them immediately.
Therefore it requires the on-line coordination of perceptual skills and motor control. Such
coordination might be particularly difficult for certain subjects, perhaps due to constraints on
processing or working memory (Baddeley, 2007; Baddeley, Gathercole, & Papagno, 1998). To factor
out differences that might be due to non-feature-specific differences in accuracy, then, we
further conducted a step-wise regression analysis to determine if correlations between features
remain after the effect of overall proficiency is factored out.

3.2.3. Stepwise residual analysis – manner

To factor out overall non-feature-specific differences in accuracy, we first calculated the
overall manner and voicing accuracy of each talker in the whole set of data. We then used these
accuracy values as a measure of their general ability to do the Mimicry task, which is logically
independent of the manner accuracies we are testing. We conducted a two-step regression analysis,
in which a particular manner or voicing accuracy was first correlated with the overall accuracy
in the other feature, and then the residual was correlated with the related manner or voicing
accuracy. For example, a relation between manner accuracy in coronals and in labials was tested
by first correlating manner accuracy in coronals with overall accuracy in voicing, and then
looking for a correlation of the residual from this analysis with labial accuracy. Similarly, for
the voicing correlation across coronals and labials, we use voicing in coronals as the dependent
variable, the overall manner accuracy as the first factor, and the voicing in labials as the
second factor. If the second independent variable causes a significant r² change in accounting
for the dependent variable, we can conclude that featural generalization contributes to the
correlation we found. If the second independent variable does not have a significant effect, it
suggests that the correlation we found could just be due to overall differences between talkers’
abilities to do the Mimicry task.

First we looked at the manner accuracies in coronals and labials, where we found a significant
correlation between these two sets before factoring out the background correlation. We first
accounted for the variance of manner accuracy in coronals by the overall voicing accuracy, which
shows a significant effect (r² = .604), and then looked for the effect of the second factor,
manner accuracy in labials. The result shows that there is no significant effect of the second
factor (r² change = .067), suggesting that after the overall Mimicry ability is factored out, we
no longer see the correlation of manner across place of articulation.

Turning to manner accuracy across voiced and voiceless segments, the manner accuracy of voiced
and voiceless segments correlates before we enter the factor ‘mimicry ability’. After removing
the overall voicing accuracy correlation first, the manner accuracy of the voiceless segments
still causes a significant r² change (r² change = .176) in accounting for the manner accuracy of
voiced segments. It suggests that even if there is individual difference in the ability to
perform the Mimicry task, there is still additional correlation between the manner accuracy in
voiced and voiceless segments. This correlation was also found in the Reading task.

For the manner accuracy across different prosodic locations, we previously found a correlation
between any two prosodic contexts. To filter out the overall ability effect, we used the manner
accuracy in one context (e.g. onset) as the dependent variable, and that in another context
(e.g. coda) as the second independent variable. Then we calculated the overall accuracy in the
other two contexts (in this case intervocalic pre-stress and post-stress) as the first
independent variable that represents the overall ability. The logic of using the accuracy in the
other two prosodic contexts instead of the overall accuracy is that these data constitute a
different sample that can be used to index the talkers’ overall abilities. If the second
independent variable causes a significant r² change, then it indicates featural generalization
across different prosodic positions. If the overall accuracy accounts for most of the variance of
the dependent variable and the second independent variable does not have a significant effect, we
may conclude that the general Mimicry ability accounts for the correlation we saw earlier.

The r² change of the second independent variable is summarized in Table 9. Most correlations
still persist after filtering out the general correlation, except between the onset and
intervocalic post-stress position, and between onset and coda, both of which achieve marginal
significance. It seems to suggest the existence of featural

generalization across prosodic contexts in the Mimicry task. We did not expect to find
correlations between the coda and other positions due to the coda neutralization in Korean. There
was no significant correlation in the Reading task, yet, interestingly, in the Mimicry task we
did find that the accuracy in coda position correlates with that in the two intervocalic
positions.

3.2.4. Step-wise residual analysis – voicing

For the voicing feature, we conducted the same stepwise regression analysis as with the manner
accuracy. When the voicing accuracy of coronals is the dependent variable, the effect of the
voicing accuracy in labials after considering the overall manner accuracy factor is significant
(r² change = .253), indicating that beyond the general ability to perform the Mimicry task,
learners can apply the voicing contrast they acquire in one place of articulation to another
place of articulation.

For the voicing accuracy in stops and fricatives, we did not find significant correlation in the
previous analysis of Mimicry, and after factoring out the overall accuracy, the voicing accuracy
in the stops is still not a significant factor accounting for the variance in the accuracy in
fricatives (r² = .003). This is expected since the voicing contrast in stops and fricatives is
quite different. This lack of correlation is also observed in the Reading data.

Since there is no voicing accuracy correlation between any two prosodic contexts in the Mimicry
data, we did not conduct further regression analyses for this set.

4. Discussion

4.1. Overall patterns of correlation

The current research employs two tasks, one with an auditory input, and the second with a written
input. Since both tasks involve speech motor control, we expected to find similar patterns in the
Reading results and the Mimicry results, and some patterns are, in fact, replicated across the
two tasks. The general pattern of results is summarized in Table 10, with predictions of a
gestural acquisition model and results for the perceptual identification data in de Jong et al.
(2009). The general pattern of correlation conforms to what would be predicted from a gestural
acquisition model, with some exceptions.

Considering cross-segment correlations, we found significant correlation between the manner
accuracy in voiced and voiceless segments in both tasks. This correlation is expected since the
motor skills required for the manner contrast in voiced segments are similar enough to those in
the voiceless segments. Such feature generalization is also observed in previous perception data
(de Jong et al., 2009).

A lack of correlation is also shared by the two production tasks. There is no manner accuracy
correlation across labials and coronals. This is expected since the gestures required to contrast
labial stops and fricatives are different from those to contrast coronal stops and fricatives.
Hence learners have to acquire two sets of gestures for the two places of articulation instead of
acquiring one oral gesture that applies to both coronal and labial segments. This effect seems to
reside specifically in

Table 9
r² change by the second independent factor in manner accuracy across prosodic position in Mimicry.

                     Onset (CV)   Pre-stress (VCV́)   Post-stress (V́CV)   Coda (VC)
Onset (CV)                        .186               .055                .155
Pre-stress (VCV́)                                     .263                .230
Post-stress (V́CV)                                                        .351
Coda (VC)

Indicates significance at p < .05.
Indicates significance at p < .01.
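
The r² change values reported for the stepwise analyses can be understood as the gain in explained variance when a second predictor is added after a first one. The sketch below illustrates that quantity for one dependent variable and two predictors, using the standard closed form for R² with two predictors. It is an illustration under stated assumptions rather than the authors' analysis: the function names are hypothetical, and the significance test on the change (normally an F test) is not included.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 3:
        raise ValueError("need two equal-length lists with at least three talkers")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable has no variance")
    return sxy / sqrt(sxx * syy)


def r2_change(dependent, first, second):
    """Return (r2 at step 1, r2 change at step 2).

    Step 1 regresses `dependent` on `first` (e.g. overall accuracy in the
    other feature); step 2 adds `second` (e.g. accuracy on the parallel
    segment set). Uses the two-predictor identity
    R2 = (r_y1^2 + r_y2^2 - 2*r_y1*r_y2*r_12) / (1 - r_12^2).
    """
    r_y1 = pearson_r(dependent, first)
    r_y2 = pearson_r(dependent, second)
    r_12 = pearson_r(first, second)
    denominator = 1.0 - r_12 ** 2
    if denominator < 1e-12:
        raise ValueError("predictors are collinear; r2 change is undefined")
    r2_step1 = r_y1 ** 2
    r2_full = (r_y1 ** 2 + r_y2 ** 2 - 2.0 * r_y1 * r_y2 * r_12) / denominator
    return r2_step1, r2_full - r2_step1


if __name__ == "__main__":
    # Made-up per-talker values (illustration only, not data from the study).
    overall = [1.0, 2.0, 3.0, 4.0]
    parallel_set = [1.0, -1.0, -1.0, 1.0]
    target_set = [a + b for a, b in zip(overall, parallel_set)]
    print(r2_change(target_set, overall, parallel_set))
```

Entering the overall-ability measure first is what makes the second-step change a conservative estimate: any variance the two predictors share is credited to overall ability rather than to the parallel segment set.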

Table 10
Summary of significant correlations in Reading and Mimicry tasks with perceptual correlations from de Jong et al. (2009) and predictions from a gestural
acquisition model.

Accuracy measure Splitting feature Perceptual results Gestural hypothesis Reading results Mimicry results

Manner Place 
Manner Voice    
Manner Prosodic position   (all but coda) (initial position weaker)
Voicing Place   
Voicing Manner 
Voicing Prosodic position (only intervocalic)  (only intervocalic)

Indicates significance at p < .05.
Indicates significance at p < .01.

articulation, since the previous work in perceptual identification found significant
correlations. This divergence of the results for perception and production tasks would make
sense, since the stop–fricative contrast is similar on the acoustic side, requiring the
differentiation of stop and burst from low-intensity noise for both coronals and labials.
However, on the production side, reference to different articulators would suggest that the two
sets require different production skills.

Another, similar case is the voicing contrast in stops and fricatives. There is no accuracy
correlation in either Reading or Mimicry data. This might be due, again, to the use of different
articulatory mechanisms for the voicing distinction in stops and fricatives. We might further
speculate that the production of the voiced fricatives is particularly at issue, not only due to
the aerodynamic difficulty of maintaining voicing and creating frication, but potentially also
because Korean does not have voiced allophones of its sibilant fricatives. While the lax stops
have voiced allophones in intervocalic position, the fricatives do not. Hence, the production of
voiced frication is not required in Korean, and is a skill that needs to be developed
specifically for English. Again this separation did not appear in perceptual data, where voicing
identification accuracy did correlate across stops and fricatives. This divergence would also
make sense in light of the common effects of voicing for stops and fricatives on the acoustic
side, whereas voicing creation in fricatives is aerodynamically quite different from that in
stops.

The general point, then, of these analyses is that the requirements for the acquisition of
production skills are different from perception skills, and these differences tend to encourage
different sets of skills to be acquired as a group. Put another way, the acquisition of
production skills in the L2 appears to be well described as the acquisition of a set of gestures,
affecting all segments that share the gesture. It is striking how well the predictions concerning
cross-segment correlations based on the criterion of shared gestures predict the pattern of
correlations in the current data. This differs from perceptual acquisition, which seems to
operate at the level of features, which cross-cut segments.

Complicating this general picture are questions of prosodic position, which modulate the
patterning. The voicing accuracy in one prosodic context usually does not correlate with other
contexts in the current data. In English the gestural composition of the voicing distinction for
stops is different in various positions, including aspiration in the onset position, intervocalic
lenition patterns, and vowel duration differences recruited for post-vocalic coda voicing. Thus,
perhaps, it is not surprising that the voicing accuracy for the different positions would not
correlate. These differences are further complicated by the Korean laryngeal contrasts, which
also exhibit large differences across prosodic positions. Hence not unexpectedly, we found only a
marginally significant voicing accuracy correlation between intervocalic positions in the Mimicry
data, and no correlation at all in Reading.

These prosodic patterns differ markedly from those obtained for manner accuracy, which suggest
strongly that the acquisition of fricative production involves skills that generalize across
prosodic positions. All of the correlations, except those involving coda position, are strongly
significant in the Reading data, and most are significant in the Mimicry data. Compared to the
large allophonic differences between voicing contrasts in different prosodic positions, variation
in fricatives across positions is expected to be quite small. The production uniformity would fit
well with an explanation of the correlations as indicating that learners are developing the
gestures for stops and fricatives irrespective of their coordination with the surrounding
context.

One striking aspect of prosodic correlations in the Reading data, however, is the systematic lack
of correlation between post-vocalic coda manner contrasts and the other manner contrasts. It
should be noted that manner accuracies in the mimicry data here, as well as previous perception
data (de Jong et al., 2009) were obtained across all prosodic positions, including codas. Thus,
this coda difference is particularly localized in productions (Reading). An apparent source for
this difference might be the Korean neutralization rules. Due to pervasive neutralization of
post-vocalic consonants, it is possible that the coordination of fricative gestures with a
preceding vowel itself is a novel skill. In this interpretation, the learners must not only
acquire the novel fricative gestures, but also have to acquire a novel coordinative skill to
place the fricatives in post-vocalic coda position. This interpretation would also fit with
previous studies that find coda fricatives to be particularly prone to triggering vowel
epenthesis.

This interpretation is also supported by a more detailed examination of the accuracy patterns,
plotted in Fig. 5. Here, manner accuracies in all positions except codas are plotted as a
function of accuracies in the coda. The lack of correlation is obvious, as is a generally
well-distributed range of accuracies (there are no apparent ceiling or floor effects in the
data). There is one relationship between the accuracy rates, however, that is apparent in the
distributions, specifically the general lack of cases in which accuracy in codas is greater than
in other positions. This pattern is not like the accuracy difference patterns noted above, for
example, in Fig. 2, in which there was a general correlation of the two accuracies, though
shifted off the diagonal. Here, there is no relationship between coda accuracy and accuracy in
other positions for the cases in which coda accuracy is lower; the systematic relationship is
only apparent in the empty lower right half of the figure. This pattern would be one predicted by
the Implicational Markedness Hypothesis, where one structure cannot be acquired until another is
acquired. Here, talkers cannot be accurate with coda distinctions until they are accurate with
the distinction in other positions. Put another way, these data suggest an interpretation where
there are two aspects to coda manner production; first, there is the acquisition of

[Fig. 5 scatter plot: manner accuracy in non-coda position (y-axis, 0.2 to 1.0) against manner
accuracy in coda position (x-axis, 0.2 to 1.0); R Sq Linear = 0.166.]

Fig. 5. Accuracy rates for distinguishing stops from fricatives in the coda position against the
rates in the non-coda positions in the Reading task. Each symbol indicates one talker.

the different fricative-stop gestures, and subsequently, there is the acquisition of the
coordination that allows these gestures to be grouped with a previous vowel as a coda. One cannot
acquire the gestural coordination without having already acquired the gestures that are
coordinated.

We also note one other complicating possibility, and that is that talkers who are particularly
accurate with the manner distinction in codas may be employing an epenthesis strategy, removing
the segments from the coda. The lack of correlation found here, then, could be due to variation
between subjects as to whether they employ this strategy. Our preliminary work on the perceptual
data (Park & de Jong, 2006) does not suggest that this conjecture will prove correct, but clearly
this possibility is one worth pursuing in future research.

4.2. Difference between Reading and Mimicry

Reading and Mimicry results, though similar, also differ in some aspects. For the manner accuracy
across different prosodic contexts, the coda position did not correlate with any other context in
the Reading task. But other contexts correlated with one another. If we consider previous
perception results (de Jong et al., 2009), correlation between any two contexts suggests that
Korean coda neutralization rules interfere with learners’ L2 production more than perception. In
the Mimicry task, on the other hand, a lack of correlation is found between the onset and the
post-nuclear contexts (intervocalic post-stress and coda position). It appears that the coda
neutralization effect is reduced when a perceptual component is involved in the task. But it is
unclear why we did not get correlation between onset and intervocalic post-stress positions in
Mimicry since we did get the correlation in both Identification and Reading tasks. It is possible
that this lack is due to the more complicated multi-step regression analysis, which would tend to
inflate the effects of noise on the results.

Another difference between these two tasks is the voicing correlation between coronal and labial
segments. The articulatory mechanism to contrast voicing in different places of articulation
should be similar. Thus the lack of correlation in the Reading task is unexpected. However, we
did get a significant correlation that indicates featural generalization in the Mimicry task and
also in perception
(de Jong et al., 2009). It is possible that this difference in results is due to a smaller range of accuracies in the Reading data, but it is also possible that, for some reason, the generalization of this feature is manifested particularly in perception-related tasks.

4.3. The relationship between current production tasks and perceptual identification skills

The range of the overall accuracy rates for each subject is similar in the Reading task (.38–.71) and the Identification tasks with similar subjects in de Jong et al. (2009) (.37–.76), but these two tasks displayed somewhat different patterns. In Identification, we found evidence for featural generalization in most cases. Yet in Reading, there is no significant correlation in some sub-sets, most of which can be explained by articulatory phonetics or L1 phonological rules. It might imply that featural generalization develops earlier in perception. It is developed later in production because the motor system is not as flexible as the auditory perceptual system. Hence the gestures acquired by the learners to do certain contrasts in one context may not apply to other contexts. The talkers’ overall accuracy in the Mimicry task is significantly lower than in the Reading task (t(19) = 4.983, p < .05), suggesting that the Mimicry task is more difficult and their performance in mimicry is not only determined by each subject’s L2 proficiency but also by their ability to do this type of task (Baddeley, 2007; Baddeley et al., 1998). Before we filtered out the overall correlation, the correlation patterns of the Mimicry task differed drastically from those in the Reading task. We found correlation in almost all sub-sets that contrast by the same feature, which is similar to perception (de Jong et al., 2009). After we ran the stepwise regression and factored out the background correlation, its patterns were similar to those in the Reading task. But in addition, Mimicry displayed some correlations that were observed in the previous Identification task but not in the Reading task, such as the correlation in manner accuracy across prosodic locations, and voicing accuracy across places of articulation. In such cases the perceptual component may play a more important role, and the motor constraints are diminished.

5. Conclusion

Comparing the previous perception study (de Jong et al., 2009) with the present one sheds light on the relationship between perception and production. The average accuracy rates of the Identification and Reading tasks are very similar (mean = .58 for Identification; mean = .55 for Reading), as is the overall range of accuracies among the subjects, suggesting that learners did not have more difficulty in one as opposed to the other. However, they displayed different learning units in perception and production. The acquisition in perception seems to proceed along the lines of features. Learners acquire a general featural contrast, applying it to all segments that participate in that contrast. Therefore learners’ perception of one set of segments develops in parallel with another set that is distinguished by the same feature. This is evident in the data that learners’ ability to identify a manner or voicing contrast in one set of segments significantly correlates with that in another set. The only exception is with the voicing distinction in different prosodic locations, which suggests that the voicing contrasts in the three contexts are too different for learners to acquire them as a single featural contrast.

Learners’ L2 production, on the other hand, appears to proceed in terms of gestures and their coordination. The accuracy rate correlations are largely predictable from the articulatory point of view. For example, the gestures for the manner contrast in labials are physically different from those in the coronals. Thus we did not find significant correlation between manner accuracies in labials and coronals. But we did find correlation between manner accuracies in voiced and voiceless segments, since the gestures for producing the manner contrast are similar across voiced and voiceless segments. One additional thing to note is that Korean coda neutralization rules impacted learners’ production patterns more than the perception patterns in the previous perception study. The manner accuracies in codas did not correlate with other positions in the Reading task, but correlated with other contexts in the Identification task.

Particularly pressing questions that remain are whether these overall patterns will change as the learners’ proficiency level increases. However, what is clear currently is that, despite the fact that production and perception systems are in general very closely tuned to one another, during the process of second language acquisition, what constitutes a set of skills in perception is not the same as that in production.

References

Ahn, S.-C., & Iverson, G. K. (2004). Dimensions in Korean laryngeal phonology. Journal of East Asian Linguistics, 13, 345–379.
Baddeley, A. (2007). Working memory, thought, and action. New York: Oxford University Press.
Baddeley, A., Gathercole, S., & Papagno, C. (1998). The phonological loop as a language learning device. Psychological Review, 105, 158–173.
Browman, C. P., & Goldstein, L. M. (1986). Towards an articulatory phonology. Phonology Yearbook, 3, 219–252.
Browman, C. P., & Goldstein, L. (1988). Some notes on syllable structure in articulatory phonology. Phonetica, 45, 140–155.
de Jong, K. J. (1994). Initial tones and prominence in Seoul Korean. In S.-H. Lee, & S.-A. Jun (Eds.), The Ohio State University working papers in linguistics (vol. 43, pp. 1–14).
de Jong, K. J. (2000). Attention modulation and the formal properties of stress systems. In J. Boyle, J.-H. Lee, & A. Okrent (Eds.), Chicago Linguistic Society, 36, Vol. 1 (pp. 71–91). Chicago: Chicago Linguistic Society.
de Jong, K. J., Silbert, N., & Park, H. (2009). Segmental generalization in second language segment identification. Language Learning, 59, 1–31.
Eckman, F. R. (1977). Markedness and the contrastive analysis hypothesis. Language Learning, 27, 315–330.
Flege, J. E. (1987). The production of ‘new’ and ‘similar’ phones in a foreign language: Evidence for the effect of equivalence classification. Journal of Phonetics, 15, 47–65.
Flege, J. E. (1988). Factors affecting the degree of perceived foreign accent in English sentences. Journal of the Acoustical Society of America, 91, 370–389.
Flege, J. E. (1995). Second language speech learning: Theory, findings, and problems. In W. Strange (Ed.), Speech perception and linguistic experience: Issues in cross-linguistic research (pp. 233–277). Baltimore, MD: York Press.
Flege, J. E., & Hillenbrand, J. (1986). Differential use of temporal cues to the /s/-/z/ contrast by non-native speakers of English. Journal of the Acoustical Society of America, 79, 508–517.
Flege, J. E., Munro, M. J., & Skelton, L. (1992). Production of the word-final English /t/–/d/ contrast by native speakers of English, Mandarin, and Spanish. Journal of the Acoustical Society of America, 92, 128–143.
Guenther, F. H., Espy-Wilson, C. Y., Boyce, S. E., Matthies, M. L., Zandipour, M., & Perkell, J. (1999). Articulatory tradeoffs reduce acoustic variability during American English /r/ production. Journal of the Acoustical Society of America, 105, 2854–2865.
Hancin-Bhatt, B. J. (1994). Segment transfer: A consequence of a dynamic system. Second Language Research, 10, 241–269.
Han, M.-S., & Weitzman, R. S. (1970). Acoustic features of Korean /P, T, K/, /p, t, k/, and /ph, th, kh/. Phonetica, 22, 112–128.
Jun, S.-A. (1996). Asymmetrical prosodic effects on the laryngeal gesture in Korean. In B. Connell, & A. Arvaniti (Eds.), Phonology and phonetic evidence: Papers in laboratory phonology IV (pp. 235–253). Cambridge: Cambridge University Press.
Jun, S.-A. (1998). The accentual phrase in the Korean prosodic hierarchy. Phonology, 15, 189–226.
Kang, K.-H., & Guion, S. G. (2006). Phonological systems in bilinguals: Age of learning effects on the stop consonant systems of Korean-English bilinguals. Journal of the Acoustical Society of America, 119, 1672–1683.
Kim, C.-W. (1970). A theory of aspiration. Phonetica, 21, 107–116.
Kim, H., & Jongman, A. (1996). Acoustic and perceptual evidence for complete neutralization of manner of articulation in Korean. Journal of Phonetics, 24, 295–312.
Lim, B. J., & de Jong, K. J. (1999). Tonal alignment in Seoul Korean. Journal of the Acoustical Society of America, 106 (4, pt. 2): 2152, 2aSC18.
Lisker, L., Abramson, A. S., Cooper, F. S., & Schvey, M. H. (1969). Transillumination of the larynx in running speech. Journal of the Acoustical Society of America, 45, 1544–1546.
Oglesbee, E. (2008). Multidimensional stop categorization in English, Spanish, Korean, Japanese, and Canadian French. Ph.D. dissertation, Indiana University, Bloomington.
Park, K.-C. (2003). The structure of the accentual phrase in Korean: The interaction between segments and suprasegments in three Korean dialects. Ph.D. dissertation, Indiana University, Bloomington.
Park, H., & de Jong, K. J. (2006). Native Koreans’ perception of voicing in VC position: Prosodic restructuring effects in consonant identification. IULC Working Papers – Online, 6-04.
Park, H., & de Jong, K. J. (2008). Perceptual category mapping between English and Korean prevocalic obstruents: Evidence from mapping effects in second language identification skills. Journal of Phonetics, 36, 704–723.
Saltzman, E. L., & Munhall, K. G. (1989). A dynamical approach to gestural patterning in speech production. Ecological Psychology, 1, 333–382.
Schmidt, A. M. (1996). Cross-language identification of consonants. Part 1. Korean perception of English. Journal of the Acoustical Society of America, 99, 3201–3211.
Silbert, N. H., & de Jong, K. J. (in review). A quantitative evaluation of the roles of distinctive features and perception in intersegment similarity. Journal of Phonetics.
Silva, D. J. (2006). Acoustic evidence for the emergence of tonal contrast in contemporary Korean. Phonology, 23, 287–308.
Sohn, H.-M. (1999). The Korean language. Cambridge: Cambridge University Press.
Yoshioka, H., Löfqvist, A., & Collier, R. (1982). Laryngeal adjustments in Dutch voiceless obstruent production. Annual Bulletin Research Institute of Logopedics and Phoniatrics, 16, 27–35.
