Rapid adaptation to foreign-accented English

Article in The Journal of the Acoustical Society of America · January 2005


DOI: 10.1121/1.1815131 · Source: PubMed


Rapid adaptation to foreign-accented English^a)
Constance M. Clarke^b) and Merrill F. Garrett
Department of Psychology, University of Arizona, Tucson, Arizona 85721

(Received 13 January 2004; revised 9 September 2004; accepted 14 September 2004)


This study explored the perceptual benefits of brief exposure to non-native speech. Native English
listeners were exposed to English sentences produced by non-native speakers. Perceptual processing
speed was tracked by measuring reaction times to visual probe words following each sentence.
Three experiments using Spanish- and Chinese-accented speech indicate that processing speed is
initially slower for accented speech than for native speech but that this deficit diminishes within one
minute of exposure. Control conditions rule out explanations for the adaptation effect based on
practice with the task and general strategies for dealing with difficult speech. Further results suggest
that adaptation can occur within as few as two to four sentence-length utterances. The findings
emphasize the flexibility of human speech processing and require models of spoken word
recognition that can rapidly accommodate significant acoustic-phonetic deviations from native
language speech patterns. © 2004 Acoustical Society of America. [DOI: 10.1121/1.1815131]
PACS numbers: 43.71.Bp, 43.71.Hw [RLD] Pages: 3647–3658

I. INTRODUCTION

Foreign accent is a source of variability in speech that can be particularly detrimental to speech perception. Non-native speech can cause misidentification of words (Lane, 1963; Munro and Derwing, 1995a, b; van Wijngaarden, 2001) and increased processing time (Munro and Derwing, 1995b). However, research shows that experience with accented speech improves perceptual accuracy (Bradlow and Bent, 2003; Clarke, 2000; Weil, 2001; Wingstedt and Schulman, 1987). What is not clear is how much (or little) experience is necessary for changes in perception to occur. Do significant changes require hours, minutes, or only seconds of exposure? Our study addressed this question by recording on-line changes in the ease of processing non-native speech during the first few moments of exposure. A principal assumption was that even a short sample of foreign-accented speech contains phonological regularities that the listener’s speech processing system could exploit. Knowledge of these regularities should lead to more efficient decoding of the speech signal, and therefore to improved processing efficiency after exposure to only a few accented utterances.

Traditionally, variability in the acoustic realization of words and phonemes has been viewed as noise and an impediment to language perception. It was assumed that in the initial stages of speech processing, a normalization mechanism strips away “nonlinguistic” aspects of the signal (such as those due to vocal tract characteristics), revealing invariant acoustic cues to phonetic identity (Liberman et al., 1967; Shankweiler et al., 1977). In compensating for differences in vocal tract characteristics, normalization was thought to involve calibrating vowel perception according to the vowel space dimensions of each speaker (Joos, 1948). There is evidence that vowel identification is indeed evaluated in a relative, rather than absolute, manner. Ladefoged and Broadbent (1957) found that identification of vowels in a /bVt/ context changed according to the frequency range of formants in a carrier sentence. Another finding consistent with the normalization hypothesis is the effect of constantly changing the talker in a series of spoken words. This results in slower processing and increased error rates, presumably due to the need to recalibrate to each talker (Mullennix et al., 1989; Sommers et al., 1994).

However, recent findings indicate that many nonlinguistic aspects of speech are not discarded, as the normalization hypothesis implies, but are retained and affect later speech processing (for a review, see Pisoni, 1997). For example, Palmeri et al. (1993) reported faster recognition for previously presented words when the second presentation was in the same voice rather than a new voice. This should not occur if specific voice characteristics are discarded during speech perception. Consequently, many researchers have argued that some forms of variability should be reconceptualized as useful information rather than noise, and that speech perception models must incorporate variability as a fundamental aspect of spoken word recognition (Luce and McLennan, in press; Nygaard and Pisoni, 1995; Pisoni, 1997).

Against this background, foreign accent presents a useful kind of variability for experimentation. Non-native speech contains multiple departures from native norms that can cause perceptual difficulty. These include deviations from native phoneme prototypes as well as non-native phonetic context rules, syllable structure, and prosodic patterns. Foreign-accented speech thus requires adaptation to a range of variability that supplements previous perception research on variability due to vocal tract characteristics, phonetic context, and speaking rate.

a) This work contributed to the first author’s doctoral dissertation and was conducted under the direction of the second author. Portions of this work were presented in “Perceptual adjustment to foreign-accented English with short term exposure,” in Proceedings of the 7th International Conference on Spoken Language Processing, Denver, CO, 2002, edited by J. H. L. Hansen and B. Pellom, pp. 253–256.

b) Present address: Department of Psychology, University at Buffalo, State University of New York, Buffalo, NY 14260. Electronic-mail: cclarke2@buffalo.edu

J. Acoust. Soc. Am. 116 (6), December 2004 0001-4966/2004/116(6)/3647/12/$20.00 © 2004 Acoustical Society of America 3647
This is not to say that the types of phonetic variability observed in accented speech do not occur in native speech. We assume here that the variations of accented speech can be considered an extreme form of that seen among native speakers (Nygaard and Pisoni, 1998). It might be supposed that adapting to a new native voice involves accommodation of new acoustic speech characteristics, such as voice quality, while adapting to an accented voice requires accommodation of new phonetic characteristics. However, it has been shown that familiarity with a native voice is based not only on voice quality but also on pronunciation idiosyncrasies, i.e., phonetic patterns (Remez et al., 1997) typically due to dialect or idiolect. Therefore, it is plausible that similar mechanisms are at work in coping with differences among native speakers and with differences between native and accented speakers. The advantage in studying accented speech is that these processes are exaggerated, whereas they typically occur too quickly for observation with native speech.

Lane (1963) seems to have been the first to establish that word identification is poorer for accented than for native speech. He found that word identification accuracy for Serbian-, Japanese-, and Punjabi-accented English was approximately 36% lower than for native speech in all signal-to-noise ratio and filtering conditions. More recent work has shown lower intelligibility for a Mandarin accent in second language (L2) learners of English (Munro and Derwing, 1995a, b), and an English accent in L2 learners of Dutch (van Wijngaarden, 2001). van Wijngaarden estimated that the reduction in intelligibility for the speech of fluent non-natives was equivalent to lowering native speech by 3 to 4 decibels (dB).

In addition to poorer word identification, accented speech can slow perceptual processing. Schmid and Yeni-Komshian (1999) showed that mispronunciations were detected more slowly for Spanish- and Tamil-accented speech. And, in a speeded true/false sentence verification task, Munro and Derwing (1995b) found that listeners were slower to verify sentences produced by a Mandarin-accented speaker, even when they transcribed them correctly.

However the perceptual system handles variability in speech, some kind of learning must be involved. Normalization as a way to compensate for talker-specific characteristics must be supplemented or supplanted by mechanisms that retain information about those characteristics and apply it to later perceptual processing. Many studies have demonstrated the perceptual benefits of experience with various speech characteristics, including those due to talker differences (Nygaard and Pisoni, 1998; Nygaard et al., 1994), speaking rate (Dupoux and Green, 1997), hearing impairment (McGarr, 1983), and synthetic speech (e.g., Greenspan et al., 1988; for a review, see Duffy and Pisoni, 1992).

The last few years have seen an increased interest in the perceptual learning of accented speech characteristics (Bradlow and Bent, 2003; Clarke, 2000; Weil, 2001). However, the first experiments investigating this topic were in the 1980s. Gass and Varonis (1984) reported improved sentence transcription accuracy following exposure to a story spoken by a non-native speaker. Wingstedt and Schulman (1987) created a “fake” accent in Swedish speech and found higher word repetition accuracy for participants who had earlier repeated a set of accented sentences. Because this improvement was attained with novel words, the researchers concluded that the listeners had developed “phonological perceptual rules” (p. 339), that is, rules for translating the acoustic-phonetic input for a particular accent to the native representations of the intended phonemes.

Several recent studies have replicated the findings of perceptual improvement following exposure to accented speech. Clarke (2000) trained native English listeners over three sessions to recognize either several Spanish-accented voices or several Chinese-accented voices. In a subsequent word transcription test that included one of the Spanish-accented voices and one of the Chinese-accented voices from training, listeners were more accurate with the accented voice they were trained with than the other accented voice. In a similar study, Weil (2001) exposed listeners to a Marathi-accented voice using word and sentence transcription tasks over four training sessions. In a following transcription test, trained listeners were more accurate with the Marathi-accented voice than were untrained listeners. Finally, Bradlow and Bent (2003) also found a beneficial effect of familiarity with a Chinese-accented voice in a sentence transcription test after two days of transcription training.

In sum, there is clear evidence of perceptual learning for a variety of nonlinguistic speech characteristics, aspects of speech once thought of as noise to be discarded during language processing. This information is retained and used in a beneficial way. Furthermore, the benefits are integral to the speech system: In various cases it has been shown that task improvement did not rely on the use of semantic context (McGarr, 1983), becoming accustomed to odd physical characteristics of the speech (Greenspan et al., 1988), or memorizing alternative pronunciations for individual lexical items (Nygaard et al., 1994; Wingstedt and Schulman, 1987). Rather, the benefits seem to result from adjustments at a relatively early level of processing, probably the phonological level (Duffy and Pisoni, 1992).

Existing studies demonstrate benefits of several hours’ experience with accented speech. But an unanswered question is when perceptual learning of accented speech begins. Does the change in processing require prolonged experience, or is the response more rapid? Knowledge of the timeline for perceptual learning will constrain the possible mechanisms underlying this learning. Previous research with talker-specific characteristics (Ladefoged and Broadbent, 1957) and compressed speech (Dupoux and Green, 1997) has shown changes with exposure to as few as one and ten sentence-length utterances, respectively. This indicates that some form of adaptation occurs almost immediately. Moreover, subjective experience suggests that adaptation to accented speech also occurs quickly. Many listeners report that when first listening to a non-native speaker, comprehension can be difficult, but after a few moments of exposure they “catch on” to the non-native speech patterns, and comprehension improves. These subjective reports might reflect an increased reliance on utterance or situational context to interpret unintelligible words, or on rapid perceptual learning of phonological patterns, or both. The focus of the research reported
here was the learning of phonological patterns specific to the accented speaker. We attempted to rule out other sources of improvement, such as the use of semantic context or general increases in effort or attention. Thus, a finding of improved processing efficiency would suggest adaptive flexibility within the phonological processing system itself.

We used a cross-modal matching task with a reaction time measure to track processing efficiency during the course of exposure to accented voices. Sentence-length English utterances were presented, each ending in a key word not predictable from the semantic context. Immediately following each auditory sentence, a visual probe word appeared on a computer screen. The task was a speeded yes/no response indicating whether the probe word matched the final word of the sentence. This task had three advantages. First, using sentences rather than isolated words more closely matches conversational language and allows for the influence of all phonological aspects of accented speech, including interword phonetic context effects and prosodic patterns. Second, reaction time (RT) may be a more sensitive measure of subtle changes in processing efficiency during adaptation. RT has already proven sensitive to the perceptual difficulty caused by accented speech even when intelligibility is high (Munro and Derwing, 1995b). Third, processing speed is sampled at the end of every sentence, approximately once every 2 s of speech, providing a relatively fine-grained temporal resolution of the effects of perceptual learning.

II. EXPERIMENT 1

This experiment looked for changes in processing efficiency arising from a short period of exposure to accented speech (under one minute). Three stimulus conditions were tested: 16 sentences produced by a native speaker of Spanish with a moderate accent (accent condition); 12 sentences produced by a native speaker of English, followed by four sentences produced by the Spanish-accented speaker (control condition); and 16 sentences produced by the native speaker of English (no accent condition).

It was predicted that the accent group would be slower than the control and no accent groups at the beginning of exposure, but as the experiment progressed the accent group’s RT would decrease due to adaptation. For the last four sentences, the accent group was expected to be faster than the control group because the control group would not have had the previous 12 sentences to adapt to the accented speech. This controlled for the possibility that the accent group’s improvement was due solely to practice with the task. Finally, the no accent condition provided a comparison for the level of processing efficiency the accent group reached by the fourth block (i.e., to see whether they reached native-speech processing efficiency).

A. Method

1. Participants

Forty-eight University of Arizona undergraduates (37 females, 11 males) participated and were paid or given partial course credit. All were native speakers of American English and reported no hearing problems at the time of testing. They were not fluent in Spanish and did not have native Spanish-speaking parents. However, 31 participants had other family members or close friends with a Spanish accent. Data from one additional participant were excluded because the participant exceeded a 20% error rate. Participants were randomly assigned to the three conditions (16 each in accent, control, and no accent).

2. Design

Four blocks of four auditory sentences were presented to each group with no breaks between blocks. In each block, two sentences were followed by matching visual probe words (yes trials), and two were followed by mismatching visual probe words (no trials). Four block orders were created using a Latin square design such that, across participants, every sentence was presented in every block position. Within each group, an equal number of listeners heard each of the four block orders.

Because between-participants RT comparisons were necessary, and given the typically high RT variability among participants, all experimental RTs were normalized according to a separate measure of each participant’s speed at the task: Following the experimental trials, all listeners received eight baseline trials. The sentences in these trials were novel and were produced by a different nonaccented speaker. The mean RT for these trials was subtracted from the experimental RTs for each subject, and the difference RTs were the primary dependent measure of processing speed.¹

3. Stimulus materials

We used 32 low probability (LP) sentences from the Revised Speech Perception In Noise (SPIN-R) test (Bilger, 1984; Kalikow et al., 1977). In LP sentences, the final word is not predictable from the meaning of the sentence (e.g., Ruth must have known about the pie.). The 16 experimental sentences were recorded by a female native speaker of American English (age: 31) and by a female native speaker of Mexican Spanish (age: 45; age of English acquisition: 30). In an earlier study, the native Spanish speaker’s accent was given a mean rating of 6.1 on a scale of 1 (no foreign accent) to 9 (strong foreign accent). The eight practice and eight baseline sentences were recorded by another female native speaker of American English (age: 19).

Sentences were tape recorded in a quiet room using an Electro-Voice RE16 directional microphone, a DBX 760X microphone preamplifier, and a Tascam 122 mkII tape deck. Stimuli were digitized (Macintosh PowerPC 8100; 22.05 kHz, 16 bits), and each sentence was copied into its own file with silence at the beginning and end trimmed. Each stimulus file was peak normalized to 90% of maximum amplitude resolution.

Because processing efficiency was measured by perceptual identification of the final word of each sentence (the target word), the characteristics of these words were carefully controlled. Target words in the experimental sentences were familiar, monosyllabic nouns with a mean frequency of 22 per million (Kucera and Francis, 1967). In both the accented and nonaccented productions, they were each correctly identified in isolation by more than 70% of listeners in

TABLE I. Mean percent error (untransformed; standard deviations in parentheses) by experimental block according to voice condition for experiments 1–3.

Condition           Block 1          Block 2          Block 3          Block 4

Experiment 1
Accent              14.06 (15.73)    4.69 (10.08)     10.94 (15.73)    4.69 (10.08)
Control             3.13 (8.54)      1.56 (6.25)      0.00 (0.00)      9.38 (15.48)
No accent           1.56 (6.25)      1.56 (6.25)      6.25 (11.18)     0.00 (0.00)

Experiment 2
Accent              6.25 (11.18)     4.69 (13.60)     6.25 (14.43)     6.25 (14.43)
Control in noise    14.06 (15.73)    10.94 (12.81)    3.13 (8.54)      4.69 (10.08)

Experiment 3
Accent              1.79 (5.25)      2.38 (5.94)      7.14 (11.50)     3.57 (6.96)
Control in clear    2.38 (5.94)      2.38 (5.94)      1.19 (4.37)      4.17 (7.35)
Control in noise    11.90 (12.71)    13.10 (14.60)    11.90 (12.71)    4.17 (7.35)
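
Table I lists untransformed percentages, whereas the error analyses in this paper are run on rationalized arcsine units (RAU; Studebaker, 1985). As a rough illustration only, the Python sketch below applies the commonly cited form of that transform to a count of errors out of a number of trials. The function name `rau` is hypothetical, and applying the transform per listener and per block with n = 4 trials is an assumption made for the example, not a detail reported here.

```python
import math


def rau(x: float, n: int) -> float:
    """Rationalized arcsine units for x events out of n trials.

    Commonly cited form of the Studebaker (1985) transform:
        theta = asin(sqrt(x / (n + 1))) + asin(sqrt((x + 1) / (n + 1)))
        RAU   = (146 / pi) * theta - 23
    """
    if not 0 <= x <= n:
        raise ValueError("x must lie between 0 and n")
    theta = math.asin(math.sqrt(x / (n + 1))) + math.asin(
        math.sqrt((x + 1) / (n + 1))
    )
    return (146.0 / math.pi) * theta - 23.0


if __name__ == "__main__":
    # Assumption for illustration: one listener, one block of four trials.
    trials_per_block = 4
    for errors in range(trials_per_block + 1):
        percent = 100.0 * errors / trials_per_block
        print(f"{errors} errors ({percent:.0f}%) -> {rau(errors, trials_per_block):.2f} RAU")
```

Under this assumption, two errors out of four maps to 50 RAU, while zero and four errors map to values slightly below 0 and slightly above 100, so the scale is stretched near its ends, where raw percentages are compressed.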

a prior word intelligibility experiment (M_accented = 92.4%; M_nonaccented = 95.8%).² Target words were never repeated within the experiment.

To control for differences in duration, the accented and nonaccented productions of each target word were digitally compressed or lengthened so they both equaled the mean duration of the two original productions.³ The accented words were lengthened by a mean of 15% (range: +5% to +28%), and the nonaccented words were compressed by a mean of 11% (range: −5% to −18%). Stimuli were not rescreened for intelligibility, but the duration manipulations were minimal and produced little to no distortion. The mean durations of the preceding portion of each sentence were similar for the two voices (M_accented = 1.29 s; M_nonaccented = 1.36 s) and were not altered.

For half of the experimental trials, the visual probe word did not match the target word but was a phonetic neighbor differing by one phoneme in either the onset (one case), vowel (four cases), or coda (three cases) position. These words were also monosyllabic English nouns and were similar to the target words in mean frequency (20 per million).

4. Procedure

Listeners were tested individually in a quiet room in front of a computer monitor and a two-button response box. They were instructed to respond quickly and accurately and were warned that at some time during the experiment the voice would change. This voice change occurred between blocks 3 and 4 for the control condition, and between block 4 and the baseline trials for the accent and no accent conditions. The experiment began with eight practice trials, followed by four experimental blocks (16 trials), and then eight baseline trials. Trials in the practice and baseline blocks were presented in the same order for every subject. In each experimental block, sentence order was randomized differently for each subject.

Auditory stimuli were presented over headphones at approximately 73 dB(A) sound pressure level. Stimulus presentation and response collection were controlled by an IBM compatible computer using DMDX software (Forster and Forster, 2003). Each trial began with an auditory sentence followed immediately by a probe word presented in capital letters on the computer screen. The probe remained until the listener pressed the yes or the no button. Accuracy and reaction time feedback were provided on the computer screen after each response. RT measurement began at probe word onset. If a response did not occur within 4 s, it was recorded as no response. Participants pressed a foot pedal to begin the next trial.

B. Results and discussion

For all analyses, the between-participants block order counterbalancing variable was included in the analysis of variance (ANOVA) to remove variance due to counterbalancing. If there was no effect involving counterbalancing group by a conservative criterion of p < 0.25 (see Pollatsek and Well, 1995), the analysis was performed with groups collapsed over this variable. To avoid confusion with the experimental group variable, the counterbalancing group variable will be denoted by cbg. We used an alpha level of 0.05 for all analyses, and the modified Bonferroni correction (Keppel, 1982) to control alpha inflation in planned contrasts.

1. Errors

Incorrect and no response trials were counted as errors. Mean error rates for the experimental blocks are shown in Table I. Error data were first transformed to rationalized arcsine units (RAU) to convert percent error to a linear and additive scale (Studebaker, 1985). To test for differences among conditions for the first three blocks, a 3 (group) × 3 (block: 1–3) × 4 (cbg) mixed design ANOVA was performed. This revealed a significant effect of group, F(2,36) = 5.68, no effect of block, F(2,72) = 2.33, ns, and a marginal interaction, F(4,72) = 2.16, p = 0.08. Tamhane post hoc tests for unequal variances showed the accent group’s error rate was higher than the control group’s, t(18) = 3.07, and marginally higher than the no accent group’s, t(23) = 2.30, p = 0.09. Finally, for block 4, a 3 (group) × 4 (cbg) ANOVA showed a main effect of group, F(2,36) = 3.49. A Tamhane post hoc
test for unequal variances showed the control group’s error rate was marginally higher than the no accent group’s, t(15) = 2.52, p = 0.07.

The higher error rate for the accent group in the first three blocks was unexpected because the accented target words had been screened for intelligibility. Although only correct responses were used in the RT analyses, they may have included more guess responses, which do not reflect the time course of accurate phonological analysis and may have inflated the RTs. We note this issue here and return to it in experiment 3.

2. Reaction time

The following data treatment applied to this and all subsequent RT analyses. Only RTs from correct responses were analyzed. RTs less than 200 ms or greater than 2000 ms were excluded, and RTs beyond two standard deviations above or below a given participant’s mean for experimental and baseline trials were replaced with that cutoff value. Each participant’s mean baseline RT was subtracted from that individual’s experimental block mean RTs. These difference RTs were used as the dependent measure.

Mean difference RTs for the experimental blocks are shown in Fig. 1. RTs for the accent group decreased by over 150 ms from block 1 to block 4. This was statistically significant by a one-way ANOVA for the accent condition with block as a repeated measures variable, F(3,45) = 13.24. Planned contrasts indicated RT decreased significantly between block 1 (M = 178, SD = 109.62) and block 2 (M = 80, SD = 97.41), t(15) = 3.24, and between block 3 (M = 61, SD = 77.66) and block 4 (M = 10, SD = 65.34), t(15) = 2.52. This decrease in RT is consistent with the hypothesis that processing efficiency increases with brief exposure to accented speech.

FIG. 1. Experiment 1 mean difference reaction times (experimental − baseline) according to condition. Error bars indicate standard errors.

To assess the possibility that practice with the task could account for the RT change, the accent and control groups’ RTs for block 4 were compared. A 3 (group) × 4 (cbg) ANOVA included this as well as the comparison of the accent and no accent conditions. A significant effect of group was found, F(2,36) = 5.50, and planned contrasts showed that the accent group (M = 10, SD = 65.34) was significantly faster than the control group (M = 111, SD = 117.27), t(36) = 3.03, but did not differ from the no accent group (M = 21, SD = 102.31), t(36) = 0.34, ns. The control group listeners, who had as much experience with the task, but no prior experience with the accented voice, were significantly slower than the accent group listeners in block 4 when presented with the accented voice. This indicates that the accent group’s improvement across the four blocks cannot be explained by practice effects alone. The results also show no difference between the accent and no accent groups in block 4, suggesting that after only 16 sentences of exposure, the group hearing the accented speech was processing it as quickly as the group hearing native speech.

Two effects in the first three blocks were also of interest: (a) whether listeners were slower to respond to the Spanish-accented speech than to the native speech, and (b) whether the control and no accent groups’ RTs decreased as they were exposed to the nonaccented voice. A 3 (group) × 3 (block: 1–3) mixed ANOVA showed a significant main effect of group, F(2,45) = 4.91. Planned contrasts indicated that, for the first three blocks, the accent group (M = 106, SD = 71.35) was slower than both the control group (M = 44, SD = 89.68), t(45) = 2.18, and the no accent group (M = 20, SD = 79.20), t(45) = 3.04. The slower RT to accented speech is consistent with previous findings of processing difficulty with foreign-accented speech (Munro and Derwing, 1995b). There was also a significant main effect of block, F(2,90) = 17.94, but no group × block interaction, F(4,90) = 1.20, ns, suggesting all groups improved across the first three blocks.

Overall, the results matched experimental predictions. Listeners were initially slower to respond to the Spanish-accented speech, but this difficulty decreased after a relatively brief period of exposure. At the end of 16 sentences, mean RT had decreased by over 150 ms and was almost identical to the no accent group’s RT to native speech. Given that each sentence was approximately 2 seconds long, this exposure comprised less than 1 min of speech. Although practice effects likely played some part in the RT change across blocks, as indicated by the improvement in the first three blocks for the control and no accent groups, they cannot explain the entire effect for the accent group.

III. EXPERIMENT 2

The control condition in experiment 1 ruled out practice effects as an account of the accent group’s improvement over the four experimental blocks. However, another possible reason for that improvement is that the listeners in the accent condition developed strategies for understanding difficult speech, such as depending on the fact that the final word was always a noun, or putting more effort or attention into the task.

Experiment 2 was conducted to evaluate these explanations. Noise was added to the nonaccented sentences in the control condition to make them more difficult to understand. A signal-to-noise ratio (SNR) was chosen such that overall RTs to the nonaccented speech in noise were similar to RTs to the accented speech in the clear. Under these new conditions, results for block 4 would remain the same as for experiment 1 if the accent group is learning something specific
about how to process accented speech during exposure, rather than learning general strategies or increasing effort. A no accent condition was not included.

A. Method

1. Participants

Thirty-two University of Arizona undergraduates (22 females, 10 males) participated and were paid or given partial course credit. All were native speakers of American English and reported no hearing problems at the time of testing. They were not fluent in Spanish and did not have native Spanish-speaking parents. However, 11 participants had other family members or close friends with a Spanish accent. Data from one additional participant were excluded because the participant exceeded a 20% error rate. Participants were randomly assigned to the two conditions (16 each to accent and control in noise).

2. Stimulus materials

Materials were identical to those in experiment 1. Using Cool Edit 96 wave-editing software (Syntrillium Software, Phoenix, AZ), the amplitude of the nonaccented sentences was reduced to a mean of approximately 65 dB at the final word, and pink noise at 61 dB was added. The resulting mean SNR at the final word was approximately +4 dB.

3. Procedure

Experimental procedures and block counterbalancing were identical to experiment 1, except that listeners in the control group were warned that the first several sentences were in noise.

B. Results and discussion

1. Errors

Mean error rates for the experimental blocks are shown in Table I. Error rates for blocks 1–3 were higher for the control in noise group (9.38%) than for the accent group (5.73%). However, a 2 (group) × 3 (block: 1–3) × 4 (cbg) mixed ANOVA on the transformed error percentages (in RAUs) revealed only a marginal effect of group, F(1,24) = 3.75, p = 0.06. A one-way ANOVA on block 4 showed no significant difference between groups, F < 1.

2. Reaction time

Mean difference RTs for the experimental blocks are shown in Fig. 2. To test whether the noise made the control

FIG. 2. Experiment 2 mean difference reaction times (experimental − baseline) according to condition. Error bars indicate standard errors.

significant by a 4 (block) × 4 (cbg) mixed ANOVA for the accent condition, F(3,36) = 8.83. Planned contrasts showed that RT decreased significantly between block 1 (M = 223, SD = 165.55) and block 4 (M = 36, SD = 157.73), t(12) = 3.71, and between block 3 (M = 128, SD = 141.44) and block 4, t(12) = 3.18. Finally, a 2 (group) × 4 (cbg) ANOVA comparing the two voice conditions in block 4 showed that the accent group (M = 36, SD = 157.73) was significantly faster than the control in noise group (M = 143, SD = 121.69), F(1,24) = 5.14.

Experiment 2 replicated the effect of increased processing efficiency with short-term exposure to accented speech. RTs for the accent group again decreased across the four blocks, and were faster than those of the control in noise group in the last block of the experiment. Because the conditions were equated for difficulty, the assumption that the accent group developed general strategies for coping with difficult speech would also apply to the control group. Yet the control listeners still took longer to process the accented speech in block 4.

An important question regarding rapid adaptation to accented speech is the effect of previous long-term experience with the accent in question. As noted earlier, a large proportion of listeners in experiments 1 and 2 had close family members or friends with a Spanish accent. In order to determine whether the adaptation effect differed based on such experience, the data from the accent groups in the two experiments were combined, and the high experience listeners (n = 15) compared with the low experience listeners (n
condition more difficult, a 2 共group兲⫻3 共blocks: 1–3兲⫻4
共cbg兲 mixed ANOVA 共Huynh–Feldt corrected because the ⫽17). High experience listeners were defined as stated
assumption of sphericity was violated兲 was performed on the above, and low experience listeners were those who reported
difference RTs for the first three blocks. There was no main either never having personally known anyone with a Spanish
effect of group, F⬍1, suggesting that the two conditions accent or only having Spanish-accented acquaintances with
were equated in difficulty for effects on RT. There was a whom they spent little time. Inspection of the data in Table II
significant main effect of block, F(2,48)⫽5.63, but no group shows both subgroups improved across the four blocks. Fur-
⫻block interaction, F⬍1, showing that both group’s RTs ther, in block 1 the low experience subgroup was over 80 ms
decreased equally across the first three blocks. slower than the high experience subgroup, although this defi-
Mean RT for the accent group decreased by almost 200 cit was eliminated by block 4. But the trend indicating an
ms from block 1 to block 4. This change was statistically advantage for the high experience subgroup was not statisti-

3652 J. Acoust. Soc. Am., Vol. 116, No. 6, December 2004 C. M. Clarke and M. F. Garrett: Rapid adaptation to accent
TABLE II. Mean difference reaction times 共in ms; and standard deviations兲 whether this is because of the Chinese accent itself or the
for accent conditions according to experience with the accent 共subgroups are specific voice, which was rated as more accented than the
combined for experiments 1 and 2兲.
Spanish-accented voice in the previous accent judgment ex-
Block periment. Using six sentences per block was intended to in-
sure that we could detect adaptation to the Chinese-accented
Experience 1 2 3 4
voice.
Experiments 1 and 2 The control in clear condition was included to test once
Low 239 共137.16兲 146共149.90兲 103共106.97兲 17 共89.59兲
again the prediction that RTs would initially be slower for
High 157 共134.30兲 107共151.09兲 84共130.96兲 29共149.47兲
accented speech than for nonaccented speech. A retest of this
Experiment 3 prediction was needed because we discovered that the intel-
Low 143 共158.21兲 70共124.72兲 30共110.91兲 11共110.64兲 ligibility of the Spanish-accented target words in experi-
ments 1 and 2 may have been lower than intended: As noted
earlier, the accent group had a higher error rate than the other
cally reliable. A 2 共accent experience兲⫻4 共block兲⫻4 共cbg兲 groups in experiment 1. We also conducted a second intelli-
mixed ANOVA 共Huynh–Feldt corrected for nonsphericity兲 gibility screening in which the target words were presented
revealed a main effect of block, F(2.94,70.54)⫽17.32, but in the sentence context used in the experiment 共rather than in
no effect of accent experience, F⬍1, or interaction, isolation兲. With this method, several of the accented words
F(2.94,70.54)⫽1.40, ns. used in experiments 1 and 2 had intelligibility rates less than
Although previous accent experience was not of central 70%. Low intelligibility rates raise the possibility that, in
interest in this study, the subgroup analysis suggests that it experiments 1 and 2, the accent groups’ RTs included several
may affect how efficiently accented speech is perceptually guess responses, which might be responsible for the higher
processed, at least initially. However, the important result is mean RTs rather than the hypothesized slower processing of
that even those listeners with little prior experience with a intelligible accented words. For the present experiment, only
Spanish accent showed faster RTs with brief exposure. This target words that had greater than 70% intelligibility in the
indicates the processing improvement is due to on-line learn- experimental sentence context were selected.
ing, rather than previous knowledge. Experiment 3 further We predicted that RT would decrease across the four
investigates this hypothesis by testing adaptation effects for a blocks for the accent condition and that, in the last block, the
less familiar accent. accent group would be faster than both control groups. This
was based on the assumption that the effects of short-term
IV. EXPERIMENT 3 experience found in experiments 1 and 2 were not due to
Experiment 3 explored whether the effect of rapid adap- accessing a stored template for interpreting Spanish-accented
tation would be found with another and less familiar accent, speech, but instead resulted from on-line learning of phono-
specifically a Chinese accent. This accent is less likely to be logical patterns.
encountered by the participant population 共living in Tucson, A. Method
AZ兲 than the Spanish accent. It is possible that listeners can 1. Participants
quickly adapt to accented speech only if they are already
familiar with the accent. Otherwise, adaptation may require Eighty-four University of Arizona undergraduates 共44
more experience 共possibly on the order of hours as in previ- females, 40 males兲 participated and received partial course
ous training studies兲. If adaptation to an unfamiliar accent credit. All were native speakers of American English who
requires relatively long exposure, it would suggest that the reported no hearing disorders at the time of testing. They
adaptation effects found in experiments 1 and 2 were due to were not fluent in Chinese and did not have native Chinese-
a quick accessing and application of stored knowledge of a speaking parents. However, 20 participants had other family
generally familiar accent, rather than an on-line adaptation members or close friends with a Chinese accent. Data from
process. one additional participant were excluded because the partici-
Three conditions were tested: an accent condition, in pant reported a strategy of not looking at the screen until
which all four sentence blocks were produced by a Chinese- after each sentence ended. Participants were randomly as-
accented speaker, a control in clear condition 共similar to the signed to the three conditions 共28 each to accent, control in
control condition in experiment 1兲, and a control in noise clear, and control in noise兲.
condition 共similar to the control in noise condition in experi-
ment 2兲. In contrast to the first two experiments, 24 experi- 2. Stimulus materials
mental sentences 共six per block兲 were used rather than 16 A new set of 40 LP sentences was chosen from the Re-
共four per block兲 because a pilot study showed that RTs to the vised SPIN Test 共Bilger, 1984; Kalikow et al., 1977兲. The 24
Chinese-accented voice were slower overall than to the experimental sentences 共six per block兲 were selected based
Spanish-accented voice of the previous two experiments. on their intelligibility when produced by the Chinese- and
Also, although RTs became faster across the four blocks in nonaccented voices. Each of the sentence final words was
the pilot experiment, they were not significantly faster on the identified at greater than 70% accuracy in the LP sentence
fourth block than those of the control in noise group, sug- context by a separate group of listeners (M accented⫽94.32;
gesting that listeners might need more exposure to the M nonaccented⫽99.16).4 The target words from the experimen-
Chinese-accented speech to fully adapt to it. It is not clear tal sentences were familiar, monosyllabic nouns with a mean

J. Acoust. Soc. Am., Vol. 116, No. 6, December 2004 C. M. Clarke and M. F. Garrett: Rapid adaptation to accent 3653
frequency of 20 per million 共Kucera and Francis, 1967兲. Tar-
get words were never repeated within the experiment.
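The screening rule described above (each sentence-final word identified at greater than 70% accuracy in context, for both voices) amounts to a simple filter over per-item accuracies. The following is only a minimal sketch of that rule: the words, the accuracy values, and the function name `select_items` are hypothetical and are not the study's materials or data.

```python
from statistics import mean

# Hypothetical per-item identification accuracy (% correct) for each voice.
# These numbers are invented for illustration only.
screening = {
    "boat": {"accented": 96.0, "nonaccented": 100.0},
    "lamp": {"accented": 62.0, "nonaccented": 98.0},   # fails for the accented voice
    "coin": {"accented": 88.0, "nonaccented": 100.0},
    "hill": {"accented": 70.0, "nonaccented": 97.0},   # exactly 70% is not "greater than 70%"
}

THRESHOLD = 70.0  # criterion stated in the text: greater than 70% accuracy


def select_items(scores, threshold=THRESHOLD):
    """Keep items whose accuracy exceeds the threshold for every voice."""
    return [
        word
        for word, by_voice in scores.items()
        if all(acc > threshold for acc in by_voice.values())
    ]


selected = select_items(screening)
mean_accented = mean(screening[w]["accented"] for w in selected)
mean_nonaccented = mean(screening[w]["nonaccented"] for w in selected)
print(selected, mean_accented, mean_nonaccented)
```

Using a strict inequality mirrors the wording "greater than 70%"; whether borderline items were in fact excluded is not stated in the text.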
The experimental sentences were recorded by the same female native speaker of American English as in experiments 1 and 2 and by a female native speaker of Mandarin Chinese (age: 24, age of English acquisition: 12). The Chinese speaker’s accent was given a mean rating of 7.6 on a scale from 1 (no foreign accent) to 9 (strong foreign accent) in an earlier study. The practice and baseline sentences were produced by the same female native speaker as in experiments 1 and 2. The Chinese-accented sentences were recorded according to the procedures described in experiment 1. In order to match the nonaccented sentences with the highly intelligible Chinese-accented sentences, the native speaker of American English recorded additional sentences. These were recorded onto CD in a WhisperRoom sound isolation booth using a Shure SM57 Dynamic microphone, a Symetrix 302 microphone preamplifier, and an Alesis ML-9600 disc recorder (44.1 kHz, 16 bits).⁵ The stimulus file was downsampled to 22.05 kHz, and each sentence was copied into its own file, trimmed, and peak normalized to 90% of maximum amplitude resolution.

In order to control for differences in duration, the accented and nonaccented productions of each target word were digitally compressed or lengthened so they both equaled the mean duration of the two original productions. The same procedure was used to equate the sentence material up to the final word because this portion was produced approximately 250 ms slower on average by the accented voice. For the accented voice, the final word durations were modified by a mean of +4% (range: −10% to +19%), and the precursor portions by a mean of −7% (range: −16% to +3%). For the nonaccented voice, the final word durations were modified by a mean of −3% (range: −14% to +13%), and the precursor portions by a mean of +10% (range: −3% to +23%). Again, the duration manipulations produced little to no distortion, and the stimuli were not rescreened for intelligibility.

The mean sound level measured at the final word was approximately 74 dB for the Chinese-accented sentences and approximately 69 dB for the nonaccented sentences in the control in clear condition.⁶ For the control in noise condition, the mean amplitude was lowered to approximately 63 dB at the final word, and pink noise at 62 dB was added to each file, for a mean SNR of +1 dB. In a pilot study, this SNR resulted in RTs similar to those in the accent condition. Finally, for experimental trials in which the visual probe word did not match the corresponding target word, the probes differed from their targets by one phoneme in either the onset (four cases), vowel (four cases), or coda (four cases) position, and were similar to the target words in mean frequency (21 per million).

3. Procedure

Experimental procedures and counterbalancing were identical to the previous experiments except that experimental blocks consisted of six sentences (three yes trials, three no trials) instead of four.

B. Results and discussion

1. Errors

Mean error rates for the experimental blocks are shown in Table I. A 3 (group) × 3 (block: 1–3) × 4 (cbg) mixed ANOVA on the percent error data in RAUs revealed a main effect of group, F(2,72) = 45.86, and no effect of block or interaction. Planned contrasts showed a higher error rate for the control in noise group than for the accent group, t(72) = 7.42, and no difference between the accent and control in clear groups, t(72) = 1.53, ns. A 3 (group) × 4 (cbg) ANOVA on block 4 showed no significant difference among groups, F < 1.

2. Reaction time

Mean RTs for the experimental blocks are shown in Fig. 3. To verify that the reduction in SNR in the control in noise condition equated the RTs in the first three blocks with those of the accent group, and to test whether listeners were slower to respond to the accented voice than to the native voice (in the clear), a 3 (group) × 3 (blocks: 1–3) × 4 (cbg) mixed ANOVA was performed. There were significant main effects of block, F(2,144) = 36.92, and group, F(2,72) = 3.26, but no interaction, F < 1. Planned contrasts showed that the addition of noise in the control condition was successful in equating RTs for the control in noise (M = 104, SD = 119.59) and accent (M = 87, SD = 137.42) conditions, t(72) = 0.57, ns. In addition, the accent group was marginally slower than the control in clear group (M = 28, SD = 93.34), t(72) = 1.87, p = 0.06. The difference between the accent and control in clear groups may have been smaller than expected because the accent group adapted to the Chinese-accented voice within the 18 sentences of blocks 1–3. To equate this analysis with the corresponding analysis in experiment 1, the group difference for blocks 1 and 2 only (12 sentences) was tested. For the first two blocks, the accent group (M = 115, SD = 160.15) was significantly slower than the control in clear group (M = 45, SD = 106.28), t(72) = 2.01. RTs to the Chinese-accented speech were initially slower than to the native speech, despite the fact that the intelligibility of the Chinese-accented speech was high, and the error rates were similar for the accent and control in clear conditions. The results are again consistent with Munro and Derwing’s (1995b) findings that accented speech is processed more slowly than native speech even when it is highly intelligible.

FIG. 3. Experiment 3 mean difference reaction times (experimental − baseline) according to condition. For blocks 1–3, each mean is based on six trials. For block 4, means are based on two, four, and six trials. Error bars indicate standard errors.
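The dependent measure reported in these analyses is a difference score (experimental RT minus baseline RT), averaged within each block. The sketch below shows one way such scores could be computed for a single listener. It assumes that the baseline is summarized as the listener's mean baseline RT and that experimental trials are averaged within block; the function name `block_difference_rts` and all RT values are hypothetical.

```python
from statistics import mean


def block_difference_rts(baseline_rts, block_rts):
    """Return mean experimental RT minus mean baseline RT for each block (ms).

    Assumption: the baseline is a single per-listener mean; the text does not
    spell out the exact procedure in this section.
    """
    baseline = mean(baseline_rts)
    return {block: mean(rts) - baseline for block, rts in block_rts.items()}


# Invented RTs (ms) for one hypothetical listener; six trials per block.
baseline = [600, 620, 580, 600]
blocks = {
    1: [760, 740, 750, 730, 770, 750],
    2: [690, 670, 680, 700, 660, 680],
    3: [640, 620, 630, 650, 610, 630],
    4: [615, 625, 620, 610, 630, 620],
}

diff = block_difference_rts(baseline, blocks)

# A block mean restricted to the first k trials (e.g., two or four trials
# of block 4) only changes which trials enter the average.
first_two_block4 = mean(blocks[4][:2]) - mean(baseline)
print(diff, first_two_block4)
```

Subtracting a per-listener baseline removes overall differences in response speed between listeners, so that block-to-block changes reflect the speech condition rather than how fast a given participant tends to respond.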
RTs for the accent group decreased by just over 130 ms from block 1 to block 4. This change was statistically significant by a 4 (block) × 4 (cbg) mixed ANOVA (Huynh–Feldt corrected for nonsphericity), F(2.53,60.71) = 16.23. Planned contrasts showed that RT decreased significantly between block 1 (M = 151, SD = 181.45) and block 2 (M = 79, SD = 148.30), t(24) = 4.47, and between block 2 and block 3 (M = 30, SD = 119.60), t(24) = 2.43. Listeners responded faster to the Chinese-accented voice over the course of 24 sentences.

A one-way ANOVA on block 4 RTs tested whether the accent group was faster than the control groups. A significant effect of group was found, F(2,81) = 5.82, and planned contrasts showed the accent group (M = 20, SD = 109.60) was significantly faster than the control in clear group (M = 118, SD = 105.39), t(81) = 3.39, but was not significantly faster than the control in noise group (M = 60, SD = 111.86), t(81) = 1.39, ns. The predicted advantage for the accent group in block 4 was borne out in the comparison with the control in clear group but, unexpectedly, not in the comparison with the control in noise group.

One difference between this experiment and the previous experiments was that six sentences were included in block 4, rather than four. Given that the accent groups’ RTs decreased significantly after only a few sentences, specifically from block 1 to block 2, in both experiment 1 and the current experiment, the control groups in the current experiment may have adapted to the Chinese-accented speech within block 4. If so, averaging across more sentences in block 4 would attenuate the difference between the control and accent conditions. To investigate this possibility, the mean RTs for block 4 were recomputed based on only the first four sentences, making them equivalent to the means in the first two experiments. A one-way ANOVA on the four-trial means for block 4 revealed an effect of group, F(2,81) = 4.29. Planned contrasts again showed significantly faster times for the accent group (M = 19, SD = 129.02) than for the control in clear group (M = 118, SD = 122.85), t(81) = 2.87. In addition, the accent group was now marginally faster than the control in noise group (M = 86, SD = 135.54), t(81) = 1.94, p = 0.056.

Inspection of the mean RTs for each sentence in block 4 provides further insight into the situation. As shown in the bottom graph of Fig. 4, RTs for the accent and control in clear groups are relatively constant throughout the block, but the control in noise group seems to have adapted to the Chinese-accented voice within two or three sentences. Consistent with this hypothesis, a one-way ANOVA for means based on the first two sentences of block 4 again showed a significant effect of group, F(2,80) = 3.99, and planned contrasts now showed that the accent group (M = 15, SD = 142.95) was significantly faster than both the control in clear group (M = 132, SD = 207.43), t(80) = 2.50, and the control in noise group (M = 125, SD = 164.94), t(80) = 2.38. When only the first two sentences of block 4 were considered, the accent group was significantly faster than both control groups.

FIG. 4. Single-trial means of difference reaction times (experimental − baseline) in blocks 1 (top) and 4 (bottom) of experiment 3.

If adaptation could occur for the control in noise group within only a few sentences of exposure to the accented voice, the same pattern might also be seen for the accent group in the first block, when they were first exposed to the accented voice. Examination of the mean RTs for each sentence in block 1, shown in the top graph of Fig. 4, indeed suggests a similar pattern. The accent group appears to improve dramatically within the first three or four sentences.

Experiment 3 indicates that the adaptation effect found in experiments 1 and 2 for Spanish-accented speech also occurs during perception of a Chinese accent. The population sampled in this experiment has generally less opportunity to hear Chinese-accented speech in the ambient environment, in contrast to Spanish-accented speech, and few (24%) reported having a close friend or family member with a Chinese accent. To determine whether the adaptation effect in experiment 3 occurred for those listeners least familiar with the accent, a reanalysis was performed including only participants who reported either never having known anyone with a Chinese accent or only having Chinese-accented acquaintances with whom they spent little time (accent: n = 21; control in clear: n = 21; control in noise: n = 22; see

Table II). The same pattern of effects was found: the accent group sped up across the four blocks, F(2.89,49.08) = 8.78 (with Huynh–Feldt correction for nonsphericity), the control in clear group was slower than the accent group in block 4, t(61) = 3.22, and the control in noise group was slower than the accent group when the first two trials of block 4 were considered, t(61) = 2.03.

The analysis of only the less experienced listeners should be interpreted somewhat cautiously because the block order counterbalancing was not maintained when the more experienced participants were excluded. Nevertheless, it indicates that the effect was not specific to the experienced listeners alone. The findings with Chinese-accented speech are consistent with the idea that processing improves due to on-line perceptual learning of phonological patterns, rather than a precompiled phonological template. However, Chinese accents are not entirely novel to this participant population, so this possibility cannot be completely ruled out.

The control in noise group’s rapid RT decrease within block 4 indicates that adaptation to accented speech may occur even more quickly than the initial results of this study suggested. To test this more sharply, post hoc analyses were done for block 4 RT data from all three experiments using two-trial means (see Table III). For experiment 1, t-tests showed that there was no change in RT from the first two trials to the second two trials of block 4 for either the accent or control (in clear) conditions, t(15) = 1.76 (accent) and 0.09 (control), ns. For experiment 2, RTs for the accent group did not change, t(14) = 1.32, ns, but the control in noise group got significantly faster from the first two to the second two trials of block 4, t(15) = 2.63. In experiment 3, ANOVAs showed that neither the accent group’s nor the control in clear group’s RTs changed during block 4, F’s < 1. However, the control in noise group’s RTs decreased significantly, F(2,54) = 7.18. Planned contrasts indicated a significant decrease by the second two trials, t(27) = 2.51.

TABLE III. Mean difference reaction times (in ms; and standard deviations) in block 4 based on two-trial means. [n = number of participants. Means in the same row with different subscripts differ significantly (see text for type of analysis for each experiment).]

                                 Block 4 trials
Condition           n     1&2              3&4              5&6
Accent
  Experiment 1      16    −17 (59.27)      47 (131.23)
  Experiment 2      15    78 (154.34)      21 (197.55)
  Experiment 3      28    15 (142.95)      25 (148.76)      19 (131.42)
Control in clear
  Experiment 1      16    114 (142.70)     118 (165.34)
  Experiment 3      27    132 (207.43)     130 (171.28)     137 (169.39)
Control in noise
  Experiment 2      16    203a (184.23)    90b (120.29)
  Experiment 3      28    125a (164.94)    48b (152.88)     10b (124.69)

The results are generally consistent. For the accent groups in all three experiments, RT did not change during block 4, as expected given that listeners in this condition had already adapted to the accented voice. For the two control in clear conditions (experiments 1 and 3), RT did not change during block 4, but for the two control in noise conditions (experiments 2 and 3), RT significantly decreased during block 4. It is not clear why the control in clear groups did not adapt to the accented voice within the fourth block as the control in noise groups did. One possibility relates to task difficulty. Because the first three blocks were likely easier for the control in clear groups, they may have been putting less effort into the task. When the accented voice was introduced, they may have been caught off guard by the new difficulty. In contrast, listeners in the control in noise conditions may have already been making substantial effort and were readier to deal with the new perceptual challenge. Although this interpretation is post hoc, it suggests a role for attention in the process of adaptation to accented speech and is consistent with evidence that normalization to native voice characteristics consumes processing capacity (e.g., Sommers et al., 1994; for a review, see Nusbaum and Magnuson, 1997). On a methodological note, this finding also shows that measures of processing speed over time must be fine-grained enough to detect very rapid changes.

V. GENERAL DISCUSSION

This study demonstrated that listeners adapt very quickly to accented speech. Initial processing speed is slower for accented speech, but in all three experiments this deficit attenuated with less than one minute of experience. In some circumstances, adaptation required exposure to only two to four sentence-length utterances. The inclusion of control groups discounted the possibility that the effect was purely due to practice with the task or general strategies for handling difficult speech. Further, the extension of the adaptation effect to a less familiar accent is consistent with the hypothesis that the listeners learned the characteristics of the accented speech on-line.

One limitation of this study is that, in the control conditions, there was a change in voice as well as a change in accent, whereas the accent conditions had the same voice throughout. This confound is difficult to avoid. Ideally the same speaker would produce both the native and accented samples, but it is questionable whether one speaker can produce both "dialects" authentically. It is possible that a change in voice alone, apart from a change in accent, could induce a normalization process that could account for the slower RTs for the control groups. The only estimate of the magnitude of RT effects due to voice change using natural speech we are aware of is Mullennix et al.’s (1989) naming study. Mean naming latencies were 34 and 70 ms slower in mixed-talker conditions than in single-talker conditions. However, the usefulness of this estimate is questionable because the stimuli included male and female voices, and naming times are not directly comparable to the RT data of the present study. We see it as unlikely that a change from one female voice to another would cause the approximately 100-ms slow-down found in this study. Nevertheless, effects of this factor should be investigated in future work. The voice change could also cause a more global "shock" reaction, which might slow response times. Although listeners were warned that the voice would change at some point during the experiment, this explanation cannot be ruled out.

Another method for removing the confound might be to abstract linguistic units 关e.g., PARSYN 共Luce et al., 2000兲,
add a voice change to the accent condition by presenting a Shortlist 共Norris, 1994兲, TRACE 共McClelland and Elman,
different voice with the same accent in block 4. This experi- 1986兲兴.
ment, however, would test a subtly different question. A find- However, interspeaker variability includes more than
ing of generalization of learning from one voice to another differences in vowel space dimensions. In particular, it ex-
requires not only that the listener adjusts phonological pro- tends to higher levels of phonological representation, includ-
cessing to match the characteristics of the current speaker, ing phonetic context rules, syllable structure, and prosodic
but that those adjustments are sufficiently abstract to be ap- patterns. This is clearly true for accented speech, but also
plied in a new context 共taking voice as context兲. The ques- applies to idiolect and dialect differences among native
tion of generalization is very interesting, but we are currently speakers 共Klatt, 1988兲. Therefore, the traditional concept of
satisfied to establish only the first requirement. That is, it is speaker normalization should be expanded to include inters-
acceptable to us that the current data demonstrate only that peaker variability in complex phonological regularities, in
the listeners adapted to the specific accented voice they addition to simple acoustic properties, and must be integrated
heard. We suspect that adaptation to one voice would indeed as a critical and foundational aspect of spoken word recog-
generalize to a new voice. But adaptation to a single ac- nition.
cented voice is no less impressive even if generalization is Recent attention to perceptual learning in speech percep-
not found. The listener is still initially confronted with tion 共e.g., Norris et al., 2003; Pisoni, 1997兲 suggests some
speech patterns that deviate from native norms and adjusts promising lines for theory development. Theories that ac-
his or her perceptual processing to more effectively decode count for the adaptability of spoken word recognition have
them. potentially strong affinities with enriched conceptions of ba-
The findings of rapid adaptation to foreign-accented sic human learning capacities. A striking challenge in the
speech provide new evidence for the type of normalization integration of learning theories and language processing per-
first shown by Ladefoged and Broadbent 共1957兲, in which formance will be the need to accommodate such theory to
the acoustic-phonetic criteria for a vowel category were al- the rapidity with which effective change occurs. Further re-
tered based on the characteristics of a preceding sentence. search is needed to explore the characteristics of this remark-
We believe that a similar kind of ‘‘extrinsic’’ normalization able adaptability and uncover its boundary conditions. What
occurred in the present experiments 共Nearey, 1989兲. Speech are the consequences of rapid adaptation for long-term lin-
that deviated from native norms was evaluated more effi- guistic representations? Are the various aspects of speech
ciently when recent experience provided information about learned with a single learning mechanism or a variety? How
the systematic ways in which it departed from those norms. do the physical input and previous linguistic knowledge in-
A known problem for the extrinsic normalization hypothesis is explaining how the initial segments or words are correctly evaluated when there is no previous input upon which to calibrate. The approach suggested by Nearey (1989) in his discussion of native speech is the same one offered here. There are usually enough cues within a segment or syllable that identification can be quite accurate even with no previous experience with the speaker (Shankweiler et al., 1977). This is probably largely true even for accented speech. For example, within-category deviations from native phoneme prototypes can be identified and noted without any additional reference information. In addition, there are likely certain kinds of information that can be extracted from the speech signal at a low level, such as general vocal tract posture and speech rhythm. Once some of these properties of the speech are learned, they themselves can improve processing efficiency as well as bootstrap the learning of more complex phonological patterns. Finally, in most real world situations, higher level knowledge of the lexical, semantic, syntactic, and situational context can contribute to the perceptual learning of accented speech.

Rapid adaptation to foreign-accented speech is a clear demonstration of the remarkable flexibility of spoken word recognition. Traditional models of spoken word recognition do not address how this flexibility is achieved. Most assume phonetic context effects can be predicted and hard-wired, and that interspeaker variability can be solved at an early stage with intrinsic or extrinsic vocal-tract normalization. The models then typically focus on an architecture based on

teract to drive adaptation? Answers to these and related questions will form the foundation for better accounts of speech perception as well as accounts of how the speech domain relates to other cognitive capacities.

ACKNOWLEDGMENTS

This work was supported by an NSF Graduate Research Fellowship to CMC and by the Cognitive Science Program at the University of Arizona. We are grateful to Georgine Speranzo for assisting with data collection. We also thank Ken Forster, Rebecca Gomez, Mike Hammond, Paul Luce, David Pisoni, and Natasha Warner for helpful discussion of the ideas presented here and Paul Luce for useful comments on this manuscript.

1 To verify that the experimental groups did not differ systematically on the baseline measure, the baseline data in each of the three experiments were analyzed. No significant differences among groups were found.

2 Four of the native English speaker’s target words were not included in the earlier word identification test. These missing values were replaced by her mean for the purpose of a paired-items t-test comparing the percent correct identification for the two voices. The difference between voices was not statistically significant, t(15) = 1.33, ns.

3 Duration was manipulated using the pitch-synchronous overlap and add (PSOLA) algorithm (Moulines and Charpentier, 1990) provided in the Praat wave-editing program (Paul Boersma and David Weenink, University of Amsterdam). This algorithm uniformly modifies the duration of a waveform with minimal change in its pitch or spectral characteristics. For voiced portions of the signal, each pitch period is multiplied by a bell-shaped window and adjacent windows are overlapped and added according to the compression factor. For voiceless portions, the windows are spaced equally. The technique is widely used and results in high quality speech.
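The mean replacement and paired-items t-test described in footnote 2 can be illustrated with a minimal Python sketch. The function names and the four example scores are hypothetical and are not the study's data or analysis code; the sketch only assumes that a missing item score is marked as `None` and is replaced by the speaker's mean over her observed items before the paired comparison.

```python
import math
from statistics import mean, stdev


def impute_with_mean(scores):
    """Replace missing item scores (None) with the mean of the observed scores."""
    observed = [s for s in scores if s is not None]
    fill = mean(observed)
    return [fill if s is None else s for s in scores]


def paired_items_t(scores_a, scores_b):
    """Return (t, df) for a paired t-test computed over items."""
    if len(scores_a) != len(scores_b):
        raise ValueError("both voices need one score per item")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    # Standard error of the mean difference (sample SD, n - 1 denominator).
    se = stdev(diffs) / math.sqrt(n)
    return mean(diffs) / se, n - 1


# Hypothetical percent-correct scores for four target words (illustration only).
native = impute_with_mean([100.0, None, 90.0, 80.0])
accented = [90.0, 85.0, 90.0, 75.0]
t, df = paired_items_t(native, accented)
```

With 16 items, as implied by the reported t(15), the same function would return 15 degrees of freedom.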

4 A paired-items t-test comparing the percent correct identification for the two voices was statistically significant, t(23) = 2.42, p < 0.05, indicating the nonaccented words were more intelligible. It is difficult to exactly equate a non-native speaker’s intelligibility with a native speaker’s. However, the Chinese-accented target words were still highly intelligible: 20 of the 24 words (83%) were identified correctly by over 90% of listeners in the screening test.

5 Due to different recording circumstances, the new stimuli had a more tinny quality than the original sentence stimuli resulting from greater amplitude of the frequencies above approximately 3 kHz. In order to equate the sound quality, the new stimulus files were filtered using the Cool Edit 2000 software package. Amplitudes of the frequencies above 3 kHz were linearly reduced from 100% at 3 kHz to approximately 5% at 9.5 kHz to 0% at 10.5 kHz using a passive filter with an FFT size of 8192 and a Blackman windowing function. This produced a similar sound quality to the original recordings as judged by the first author.

6 Although the stimuli in the accent and control in clear conditions differed by 5 dB, the fact that the accent condition was louder predicts a perceptual advantage for the accented voice, which is opposite to the experimental prediction for the first three blocks.

Bilger, R. C. (1984). Manual for the Clinical Use of the Revised SPIN Test (Univ. of Illinois, Champaign, IL).
Bradlow, A. R., and Bent, T. (2003). "Listener adaptation to foreign-accented English," in Proceedings of the 15th International Congress of Phonetic Sciences, Barcelona, Spain, 2003, edited by M. J. Sole, D. Recasens, and J. Romero, pp. 2881–2884.
Clarke, C. M. (2000). "Perceptual adjustment to foreign-accented English," J. Acoust. Soc. Am. 107, 2856(A).
Duffy, S. A., and Pisoni, D. B. (1992). "Comprehension of synthetic speech produced by rule: A review and theoretical interpretation," Lang. Speech 35, 351–389.
Dupoux, E., and Green, K. P. (1997). "Perceptual adjustment to highly compressed speech: Effects of talker and rate changes," J. Exp. Psychol. Hum. Percept. Perform. 23, 914–927.
Forster, K. I., and Forster, J. C. (2003). "DMDX: A Windows display program with millisecond accuracy," Behav. Res. Methods Instrum. Comput. 35, 116–124.
Gass, S., and Varonis, E. (1984). "The effect of familiarity on the comprehensibility of nonnative speech," Lang. Learn. 34, 65–89.
Greenspan, S. L., Nusbaum, H. C., and Pisoni, D. B. (1988). "Perceptual learning of synthetic speech produced by rule," J. Exp. Psychol. Learn. Mem. Cogn. 14, 421–433.
Joos, M. (1948). "Acoustic phonetics," Lang. Suppl. 24(2), 1–136.
Kalikow, D. N., Stevens, K. N., and Elliott, L. L. (1977). "Development of a test of speech intelligibility in noise using sentence materials with controlled word predictability," J. Acoust. Soc. Am. 61, 1337–1351.
Keppel, G. (1982). Design and Analysis: A Researcher’s Handbook (Prentice–Hall, London), Chap. 8, pp. 144–168.
Klatt, D. H. (1988). "Review of selected models of speech perception," in Lexical Representation and Process, edited by W. D. Marslen-Wilson (MIT, Cambridge), pp. 201–262.
Kucera, F., and Francis, W. (1967). Computational Analysis of Present-Day American English (Brown U.P., Providence, RI).
Ladefoged, P., and Broadbent, D. E. (1957). "Information conveyed by vowels," J. Acoust. Soc. Am. 29, 98–104.
Lane, H. (1963). "Foreign accent and speech distortion," J. Acoust. Soc. Am. 35, 451–453.
Liberman, A. M., Cooper, F. S., Shankweiler, D. P., and Studdert-Kennedy, M. (1967). "Perception of the speech code," Psychol. Rev. 74, 431–461.
Luce, P. A., Goldinger, S. D., Auer, E. T., and Vitevitch, M. S. (2000). "Phonetic priming, neighborhood activation and PARSYN," Percept. Psychophys. 62, 615–625.
Luce, P. A., and McLennan, C. T. (in press). "Spoken word recognition: The challenge of variation," in The Handbook of Speech Perception, edited by D. B. Pisoni and R. E. Remez (Blackwell, Malden, MA).
McClelland, J. L., and Elman, J. L. (1986). "The TRACE model of speech perception," Cognit. Psychol. 18, 1–86.
McGarr, N. S. (1983). "The intelligibility of deaf speech to experienced and inexperienced listeners," J. Speech Hear. Res. 26, 451–458.
Moulines, E., and Charpentier, F. (1990). "Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones," Speech Commun. 9, 453–467.
Mullennix, J. W., Pisoni, D. B., and Martin, C. S. (1989). "Some effects of talker variability on spoken word recognition," J. Acoust. Soc. Am. 85, 365–378.
Munro, M. J., and Derwing, T. M. (1995a). "Foreign accent, comprehensibility, and intelligibility in the speech of second language learners," Lang. Learn. 45, 73–97.
Munro, M. J., and Derwing, T. M. (1995b). "Processing time, accent, and comprehensibility in the perception of native and foreign-accented speech," Lang. Speech 38, 289–306.
Nearey, T. M. (1989). "Static, dynamic, and relational properties in vowel perception," J. Acoust. Soc. Am. 85, 2088–2113.
Norris, D. (1994). "Shortlist: a connectionist model of continuous speech recognition," Cognition 52, 189–234.
Norris, D., McQueen, J. M., and Cutler, A. (2003). "Perceptual learning in speech," Cognit. Psychol. 47, 204–238.
Nusbaum, H. C., and Magnuson, J. (1997). "Talker normalization: Phonetic constancy as a cognitive process," in Talker Variability in Speech Processing, edited by K. Johnson and J. W. Mullennix (Academic, San Diego, CA), pp. 109–132.
Nygaard, L. C., and Pisoni, D. B. (1995). "Speech perception: New directions in research and theory," in Handbook of Perception and Cognition: Vol. 11. Speech, Language, and Communication, 2nd ed., edited by J. L. Miller and P. D. Eimas (Academic, San Diego, CA), pp. 63–96.
Nygaard, L. C., and Pisoni, D. B. (1998). "Talker-specific learning in speech perception," Percept. Psychophys. 60, 355–376.
Nygaard, L. C., Sommers, M. S., and Pisoni, D. B. (1994). "Speech perception as a talker-contingent process," Psych. Sci. 5, 42–46.
Palmeri, T. J., Goldinger, S. D., and Pisoni, D. B. (1993). "Episodic encoding of voice attributes and recognition memory for spoken words," J. Exp. Psychol. Learn. Mem. Cogn. 19, 309–328.
Pisoni, D. B. (1997). "Some thoughts on 'normalization' in speech perception," in Talker Variability in Speech Processing, edited by K. Johnson and J. W. Mullennix (Academic, San Diego, CA), pp. 9–32.
Pollatsek, A., and Well, A. D. (1995). "On the use of counterbalanced designs in cognitive research: A suggestion for a better and more powerful analysis," J. Exp. Psychol. Learn. Mem. Cogn. 21, 785–794.
Remez, R. E., Fellowes, J. M., and Rubin, P. E. (1997). "Talker identification based on phonetic information," J. Exp. Psychol. Hum. Percept. Perform. 23, 651–666.
Schmid, P. M., and Yeni-Komshian, G. H. (1999). "The effects of speaker accent and target predictability on perception of mispronunciations," J. Speech Lang. Hear. Res. 42, 56–64.
Shankweiler, D., Strange, W., and Verbrugge, R. (1977). "Speech and the problem of perceptual constancy," in Perceiving, Acting, and Knowing: Toward an Ecological Psychology, edited by R. Shaw and J. Bransford (Erlbaum, Hillsdale, NJ), pp. 315–345.
Sommers, M. S., Nygaard, L. C., and Pisoni, D. B. (1994). "Stimulus variability and spoken word recognition. I. Effects of variability in speaking rate and overall amplitude," J. Acoust. Soc. Am. 96, 1314–1324.
Studebaker, G. A. (1985). "A 'rationalized' arcsine transform," J. Speech Hear. Res. 28, 455–462.
van Wijngaarden, S. J. (2001). "Intelligibility of native and non-native Dutch speech," Speech Commun. 35, 103–113.
Weil, S. A. (2001). "Foreign accented speech: Encoding and generalization," J. Acoust. Soc. Am. 109, 2473(A).
Wingstedt, M., and Schulman, R. (1987). "Comprehension of foreign accents," in Phonologica 1984, edited by W. Dressler, H. Luschutzky, O. Pfeiffer, and J. Rennison (Cambridge U.P., Cambridge), pp. 339–345.
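
The high-frequency attenuation described in footnote 5 can be read as a piecewise-linear gain curve, sketched below in Python. This is an illustration under stated assumptions, not the Cool Edit 2000 implementation: the function name is hypothetical, "linearly reduced" is assumed to mean linear in amplitude percentage against frequency in Hz, and the FFT size and Blackman window mentioned in the footnote are not modeled.

```python
def gain_above_3khz(freq_hz):
    """Return the assumed amplitude gain (1.0 = unchanged) at a frequency in Hz."""
    # Breakpoints from footnote 5: 100% at 3 kHz, about 5% at 9.5 kHz, 0% at 10.5 kHz.
    points = [(3000.0, 1.0), (9500.0, 0.05), (10500.0, 0.0)]
    if freq_hz <= points[0][0]:
        return 1.0  # frequencies at or below 3 kHz are left unchanged
    if freq_hz >= points[-1][0]:
        return 0.0  # frequencies at or above 10.5 kHz are removed
    # Linear interpolation within the segment that contains freq_hz.
    for (f0, g0), (f1, g1) in zip(points, points[1:]):
        if f0 <= freq_hz <= f1:
            return g0 + (g1 - g0) * (freq_hz - f0) / (f1 - f0)


# Gains at a few example frequencies (values in Hz chosen for illustration).
example_gains = {f: gain_above_3khz(f) for f in (1000, 6250, 9500, 10000, 12000)}
```

Under these assumptions, a filter would multiply each spectral component by the gain at its frequency; whether the original software applied the slope in linear amplitude or in decibels is not stated in the footnote.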
