
Acoustic and perceptual similarity of North German and American English vowels


Winifred Strange a)
Ph.D. Program in Speech and Hearing Sciences, The City University of New York—Graduate School
and University Center, 365 Fifth Avenue, New York, New York, 10016-4309

Ocke-Schwen Bohn
English Department, Aarhus University, DK-8000 Aarhus C, Denmark

Sonja A. Trent and Kanae Nishi


Department of Psychology, University of South Florida, 4202 Fowler Avenue, Tampa, Florida 33620

(Received 30 September 2003; revised 26 January 2004; accepted 26 January 2004)


Current theories of cross-language speech perception claim that patterns of perceptual assimilation
of non-native segments to native categories predict relative difficulties in learning to perceive (and
produce) non-native phones. Cross-language spectral similarity of North German (NG) and
American English (AE) vowels produced in isolated hVC(a) (di)syllables (study 1) and in hVC
syllables embedded in a short sentence (study 2) was determined by discriminant analyses, to
examine the extent to which acoustic similarity was predictive of perceptual similarity patterns. The
perceptual assimilation of NG vowels to native AE vowel categories by AE listeners with no
German language experience was then assessed directly. Both studies showed that acoustic
similarity of AE and NG vowels did not always predict perceptual similarity, especially for “new”
NG front rounded vowels and for “similar” NG front and back mid and mid-low vowels. Both
acoustic and perceptual similarity of NG and AE vowels varied as a function of the prosodic context,
although vowel duration differences did not affect perceptual assimilation patterns. When duration
and spectral similarity were in conflict, AE listeners assimilated vowels on the basis of spectral
similarity in both prosodic contexts. © 2004 Acoustical Society of America.
[DOI: 10.1121/1.1687832]
PACS numbers: 43.71.Hw, 43.70.Kv, 43.71.Es, 43.70.Fq [RD] Pages: 1791–1807

a) Electronic mail: strangepin@aol.com

J. Acoust. Soc. Am. 115 (4), April 2004 0001-4966/2004/115(4)/1791/17/$20.00 © 2004 Acoustical Society of America 1791

I. INTRODUCTION

In recent years, there has been increased interest in cross-language comparisons of phonetic categories, growing out of the well-documented problems that adult second language (L2) learners have in acquiring a new phonological system. In his Speech Learning Model (SLM), Flege (1995) claims that continuing problems with “accented” production of phonetic segments can be attributed in large part to L2 learners’ representation of the L2 segments as equivalent to “similar” segments in the native language (L1). That is, if the L2 phones are sufficiently similar to L1 phones, they will be perceptually assimilated to those native categories, with the result that both L1 and L2 segments are produced differently from native monolingual speakers’ utterances. If, however, L2 phones are sufficiently dissimilar from any L1 category (i.e., “new”), the L2 learner will (eventually) establish distinct L1 and L2 phonetic categories, and production of the L2 segments will become more native-like.

In her Perceptual Assimilation Model (PAM), Best (1994, 1995) also invokes the concept of cross-language phonetic similarity to predict the relative difficulties that listeners will have in perceptual differentiation of non-native segmental contrasts. She describes several patterns of perceptual assimilation of L2 segments to L1 phonological categories, which are determined by the perceived phonetic similarity of L1 and L2 segments. Two L2 segments that are judged as equally “good” instances of a single L1 category (Single-Category pattern) will be most difficult to differentiate, while two L2 segments that are assimilated to two different L1 categories (Two-Category pattern) will be very easy to discriminate. In addition, contrasting L2 segments that differ in their judged goodness as instances of a single L1 category (Category-Goodness pattern) will yield intermediate levels of perceptual difficulty. Finally, if an L2 segment is sufficiently dissimilar from any L1 category, it may be considered an “uncategorizable” speech sound. When paired with another L2 phone that is phonetically similar enough to be categorized as an instance of an L1 category (i.e., it is categorizable), the two phones (Uncategorizable versus Categorizable) will be relatively easily discriminated.

According to both these models, then, the perceived similarity of segments in L1 and L2 is an important determinant of the pattern of initial perceptual problems and persistent learning difficulties adult L2 learners have in mastering the L2 phonological system. It is critical, therefore, that cross-language perceptual similarity be established, independent of identification or discrimination performance, in order to predict L2 learning difficulties more accurately. In the work of Flege and Best, as well as other researchers in the field, perceptual similarity has been inferred from (1) a comparison of impressionistic descriptions of the phonetic segments (e.g., Best and Strange, 1992), (2) transcriptions or reports from listeners about similarities between native and non-native segments (e.g., Best, Faber, and Levitt, 1996) or (3) cross-language comparisons of the acoustic structure of the non-native segments (e.g., Flege, 1987; Bohn and Flege, 1990). In more recent studies, perceptual similarity has been assessed directly, using a perceptual assimilation task in which listeners are presented non-native segments and asked to categorize them with respect to which native category they are most similar and to rate their “category goodness” as exemplars of the chosen categories (e.g., Bohn and Flege, 1990; Guion, Flege, Akahane-Yamada, and Pruitt, 2000; Strange, Akahane-Yamada, Kubo, Trent, Nishi, and Jenkins, 1998; Strange, Akahane-Yamada, Kubo, Trent, and Nishi, 2001).

In the study reported here, the phonetic similarity of North German (NG) and American English (AE) vowels was investigated. Results of acoustical analysis of a corpus of NG vowels were compared with data from a similar corpus of AE vowels. Perceptual similarity was assessed directly using the perceptual assimilation task in which native speakers of AE with no previous experience with German were asked to categorize and rate NG vowels as exemplars of AE vowel categories. Acoustic and perceptual similarity patterns were then compared to determine the extent to which perceptual similarity of sets of NG vowels was predictable from their context-specific similarity in spectral and temporal structure to AE vowels. Of additional interest was the extent to which acoustic and perceptual similarity varied as a function of prosodic context (citation-form lists vs sentences spoken at continuous speech rates).

The German vowel inventory consists of 14 distinctive monophthongs that include 7 tense–lax (long–short or close–open) pairs: the front unrounded high [iː-ɪ] and mid [eː-ɛ] vowels, the back rounded high [uː-ʊ] and mid [oː-ɔ] vowels, the front rounded high [yː-ʏ] and mid [øː-œ] vowels, and the low vowels [ɑː-a]. In North German (NG) dialects, high and mid tense versus lax vowel pairs are differentiated phonetically by both vocalic duration and tongue/jaw position, with the lax vowel of each pair lower (more open) and more centralized than the tense vowel; the low vowels are differentiated almost entirely by duration (see footnote 1). The standard American English (AE) inventory includes nine so-called monophthongs and two diphthongized nonrhotic vowels: a front-unrounded series [iː, ɪ, eɪ, ɛ, æː], a back-rounded series [uː, ʊ, oʊ, ɔː], and the low and mid-low back vowels [ɑː, ʌ]. While AE vowels differ phonetically in intrinsic duration, as indicated in their transcription above ([ː] = long), length is considered a secondary or redundant feature of vowel height or closeness (Peterson and Lehiste, 1960; Crystal and House, 1988a, 1988b; Hillenbrand, Clark, and Houde, 2000). Thus, North German and American English are similar in that vowels are distinguished phonetically by five levels of vowel height/openness and by backness. The vowel inventories differ in that lip rounding is redundant with backness in English, whereas it distinguishes unrounded and rounded front vowels in German. Finally, in North German, long and short vowels are realized in careful speech with larger relative duration differences than are long and short vowels in American English (Strange and Bohn, 1998). In both languages, vocalic duration also varies with vowel height such that high vowels are shorter than low vowels, with mid vowels intermediate in duration (Strange and Bohn, 1998; Strange et al., 1998).

According to traditional articulatory-phonetic descriptions, then, NG front rounded vowels do not have counterparts in the AE vowel inventory (they are “new” in Flege’s terminology). NG [a] and AE [ʌ] can be considered “similar” in tongue height and position and are both short monophthongs, while NG [ɔ] and AE [ɔː] are “similar” in tongue height, but differ in intrinsic duration. Finally, NG [eː, oː] are monophthongal long vowels, while their AE counterparts are diphthongized (and long) in most phonetic and prosodic contexts.

It is well known that speakers of languages that do not contrast front rounded vowels with either front unrounded or back rounded vowels have difficulty learning to perceive and produce these vowels. However, previous research has produced conflicting results with respect to native AE speakers’ ability to perceptually differentiate such contrasts in German and French. Polka (1995) reported native-like categorial discrimination by AE listeners of NG back versus front rounded high tense /u–y/, but less than native-like discrimination of the mid-high lax pair /ʊ-ʏ/; vowels were produced and presented in citation-form dVt syllables. In contrast, Gottfried (1984), and later Levy and Strange (2002), reported that [u–y], as spoken by Parisian French speakers, was very difficult for native English speakers to perceptually differentiate, even after years of experience speaking French (see also Rochet, 1995). Best et al. (1996) reported relatively poor discrimination by inexperienced English listeners of the Norwegian high front unrounded versus outrounded [i–y] in citation-form bV syllables, while Levy and Strange’s subjects did relatively well on French unrounded versus (out)rounded [i–y] in dVt syllables. Finally, both Gottfried and Levy and Strange reported significant differences in perceptual performance as a function of the syllabic and consonantal context in which the French front rounded vowels were produced and presented. Levy and Strange reported that inexperienced listeners had difficulty discriminating [i–y] in bVp syllables but not in dVt syllables, whereas the opposite pattern was true for the [u–y] contrast. (Contextual effects on perceptual assimilation will be addressed in a forthcoming paper.)

These discrepant results within and across studies and languages clearly indicate that the prediction of difficulty in the perception of non-native vowels by adult L2 learners, based on contrastive analyses at the level of abstract phonological features or impressionistic descriptions of phonetic segments, is doomed to failure (cf. Kohler, 1981). Across languages, vowels that are transcribed as “the same” can be very different acoustically (e.g., French front rounded [y] is more front acoustically than is German front rounded [y] [Strange et al., 2002]). Thus, it is necessary to establish cross-language similarities and differences of vowels at a level more closely associated with their phonetic realization in the particular languages under comparison, if we are to gain a better understanding of the perceptual problems facing second-language learners. Furthermore, the extent to which perceived phonetic similarity may be predicted from a comparison of cross-language context-specific acoustic similarity must be established. For instance, Flege and Hillenbrand (1984) reported that AE learners of French produced French [tu] more “authentically” than [ty], as judged by native French listeners. However, acoustic analysis showed that their [y] tokens were more similar to native French tokens than were [u] tokens. They inferred from these data that for AE speakers, French [u] was perceptually similar to its AE counterpart, while French [y] was perceived as a “new” vowel (i.e., was perceptually dissimilar from any AE vowel). However, no direct assessment of perceptual similarity was made, and no comparison of native French front and back rounded vowels with AE back and front vowels was included.

In the present study, a primary goal was the systematic evaluation of the acoustic and perceptual similarity of NG front rounded vowels relative to AE front unrounded and back rounded vowels. However, rather than investigating only these four NG vowels, both acoustical and perceptual comparisons included the entire inventory of 14 NG monophthongs in comparison with the 11 AE vowels. We reasoned that since vowels are perceived with respect to their relative positions in a speaker’s “vowel space,” it was important to assess cross-language acoustic and perceptual similarity of the complete vowel inventories of both languages. In acoustic analyses, questions about the relative spacing of vowels in the vowel space could be more accurately determined across languages, and issues about speaker normalization could be addressed. For the investigation of the perceptual assimilation of NG vowels by AE listeners, presenting the entire vowel inventory of each speaker would allow the listeners to “normalize” the speaker’s utterances. In addition, questions about the perception by AE listeners of “similar” and “identical” NG vowels could be answered.

In this study, cross-language acoustic and perceptual similarity of NG and AE vowels was determined for productions that were relatively uninfluenced by the surrounding consonantal context; i.e., with vowels considered to reflect “canonical vowel targets” for each language. The NG vowels were produced and presented in hVp syllables spoken in lists (citation-form syllables) (study 1) and in the same syllables produced and presented in short carrier sentences (study 2) (see footnote 2). Because short vowels do not occur in open syllables in English or in German, the CVC structure was chosen. The initial /h/ and voiceless labial stop /p/ are phonologically appropriate in the word-initial and word-final position, respectively, in both German and English and pronounced similarly in both languages. Thus, it was assumed that judgements of cross-language perceptual similarity of these syllables would be attributable to the vowels themselves, with little effect of the surrounding consonants on either their production or their perception.

Previous studies of the perception of German vowels (Polka, 1995; Polka and Bohn, 1996) and French vowels (Gottfried, 1984; Rochet, 1995) by AE listeners using materials in which the vowels were produced and presented in the coronal consonant context lead to the prediction that AE listeners will assimilate NG front rounded vowels to back rounded AE vowels. However, in many dialects of American English, back rounded high and mid-high vowels in coronal contexts have higher F2 frequencies than in noncoronal contexts; that is, they are allophonically “fronted” in coarticulation with coronal consonants (Stevens and House, 1963; Strange, 1989; Hillenbrand et al., 2001). Thus, NG front rounded vowels may be perceived as more similar to AE so-called back rounded vowels in alveolar contexts than when preceded or followed by noncoronal consonants.

Our second goal in this study was to examine the acoustic and perceptual similarity of the “similar” mid and mid-low vowels, NG [eː, ɛ, oː, ɔ] and AE [eɪ, ɛ, oʊ, ɔː]. Gottfried (1984) reported that French [i–e] and [e–ɛ] were difficult for native English speakers to discriminate when presented in isolation and in a tVt context (Gottfried, 1984). Previous acoustical studies of AE and NG vowels (Hillenbrand et al., 2001; Strange and Bohn, 1998; see also Steinlen and Bohn, 1999; Steinlen, 2002, for Southern British English and German vowels) suggest that there are differences in the realization of mid and mid-low vowels (i.e., their location in F1/F2 vowel space relative to high and low vowels), in addition to the differences in diphthongization (for [eː, oː]) and duration (for [ɔ]). If acoustical analyses revealed that NG mid long and mid-low short vowels were produced “higher” in vowel space relative to their AE counterparts, we might expect that AE listeners would assimilate mid and mid-low vowels to high and mid-high AE categories, respectively, at least some of the time.

Our final goal in this study was to investigate the relative contribution of vocalic duration to the perceptual similarity of NG vowels for AE listeners. Bohn (1995) has claimed that when non-native vowels have no spectral counterpart in the native language, listeners may attend more to temporal cues to distinguish them, even when vowel length is not phonologically contrastive in their native vowel system. In a previous study of Japanese listeners’ perceptual assimilation of AE vowels, Strange et al. (1998) reported that temporal assimilation patterns varied systematically with prosodic context. Japanese listeners heard long versus short AE vowels as more similar to long (two-mora) versus short (one-mora) Japanese vowels, respectively, when hVb syllables were produced and presented in sentence context, despite the fact that the acoustic durations of long and short AE vowels were very similar in both sets of materials (and the differences were smaller than for Japanese vowels). It was concluded that the rhythmic structure of the sentences allowed Japanese listeners to better “interpret” intrinsic duration differences of the AE vowels in relating them to their own vowel inventory. In the present study, it was hypothesized that long and short NG vowels would be perceptually assimilated to long and short AE vowels more consistently when presented in the sentence context than in the isolated syllable context.

II. ACOUSTIC AND PERCEPTUAL SIMILARITY OF NG AND AE VOWELS PRODUCED IN CITATION-FORM SYLLABLES

Study 1 examined a corpus of vowels produced by four adult male native speakers of NG in citation-form hVp syllables. An acoustical analysis was performed to provide descriptive data about average formant frequencies and relative durations of these “canonical” NG vowels. Discriminant analyses were then used to establish the extent to which the 14 vowels were differentiated spectrally (using formant frequencies as input parameters) and the contribution of relative duration to acoustic differentiation (adding duration as an input parameter). Acoustic similarity to an existing corpus of AE vowels produced in citation-form lists of hVba disyllables was then established using discriminant analyses in which AE vowels served as the input corpus and NG vowels as the test corpus. We were especially interested in the acoustic similarity of NG front rounded vowels to AE front and back vowels. Although the consonantal and syllable context differed somewhat from the NG corpus, it was presumed that the influence of final consonants (/p/ vs /b/) and syllable structure (monosyllable versus disyllable) across corpora on the “target” formant frequencies of the vowels would be comparable (and minimal), and that within-language relative vowel durations would be interpretable, despite the possible effects of final consonant voicing and syllable structure differences on absolute vocalic durations. On the basis of these cross-language acoustical comparisons, predictions were made with respect to how native AE listeners with no German-language experience would perceptually assimilate NG vowels to their native AE vowel categories.

A. Acoustic similarity of NG and AE vowels

1. Speakers and stimulus materials

Four male speakers of North German produced the NG stimulus corpus; all four had lived in northern Germany (Kiel, West Holstein, Hamburg, Lübeck) all their lives and had not resided in an English-speaking country for more than a few months. They were enrolled as students at Kiel University at the time of recording. The speakers ranged in age from 25 to 29 years old, and (as is typical of college students in Germany) all had studied English for at least nine years. They had also studied French in high school (2–4 years).

Recordings were made in a quiet room at Kiel University by the second author (a native speaker of German), with the first author in attendance, using a Sony ECM-939LT (stereo electret condenser microphone) connected to a Marantz PMD 420 cassette tape recorder (as a preamplifier) fed to a Sony DTC-P7 DAT tape deck (48 kHz sampling rate). The 14 nonsense syllables were printed individually on index cards in German orthography with an identifying number printed before each syllable. The speakers were given instructions and practice as needed to produce the number and the syllable, with a pause between, and falling intonation on the syllable. During practice, clear errors in vowel production were corrected, but minor dialectal variations were not commented on. Each speaker recorded three randomizations of 15 stimuli in which the last stimulus was the same as the first. This last utterance was discarded to eliminate any list-final differences in production. The first randomization was used primarily for practice and was not included in the final stimulus corpus unless a token in randomization 2 or 3 was deemed less acceptable as an exemplar of that vowel. After recording, a phonetically trained NG speaker listened to all of the tokens produced by each of the four speakers. The selected exemplars were all considered good tokens of the NG vowels by this listener. The final stimulus corpus included 28 stimuli (2 tokens × 14 vowels) for each of four speakers for a total of 112 stimuli. The stimuli were digitally transferred from a DAT recorder (Tascam DA-30 MK II) to a Power Macintosh 8100/100 computer via an Audiomedia II digital I/O sound card. Sound files were then downsampled to 22.01 kHz with 16-bit amplitude resolution and the identifying numbers were deleted from each file.

Four adult male native speakers of AE produced multiple instances of the 11 AE vowels in citation-form hVba context. (This corpus was the same as that used in a study of perceptual similarity of Japanese and AE vowels, Strange et al., 1998). Three of the speakers were residing at the University of South Florida at the time of testing; the remaining speaker was residing in Japan at the time of recording. For the three Florida participants, stimuli were recorded in an IAC chamber using a dynamic microphone (Panasonic WM-1325) fed to a DAT recorder (SONY TCD-D10). The speaker residing in Japan was recorded in an anechoic chamber, using a condenser microphone (SONY ECM-77) fed to a DAT recorder (SONY PCM-2500A,B). Speakers produced the 11 AE vowels in lists of hVba disyllables, each preceded by an identifying number. Each speaker produced 4 randomizations of 12 syllables, where the 12th utterance was identical to the first; this utterance was not used. Utterances from randomizations 2, 3, and 4 were used as stimulus tokens. Thus, for each speaker, there were 33 stimuli (3 instances × 11 vowels). The stimuli were recorded at 48 kHz, digitally transferred to a Power Macintosh 8100/100 computer via an Audiomedia II digital I/O sound card, downsampled to 22.01 kHz with 16-bit amplitude resolution, and then identifying numbers were deleted from the files.

2. Acoustic analysis

Temporal and spectral measurements were performed using SoundScope/16 1.44 (ppc)™ speech analysis software designed by GW Instruments, Inc. (© Copyright 1992 GWI, Somerville, MA 02143). Temporal measurements were made from an inspection of time-synchronized waveform and wideband spectrogram displays (300 Hz filter, 6 dB pre-emphasis) with LPC formant tracks superimposed. Vocalic duration was defined as beginning at the onset of voicing after the /h/ (the beginning of the first pitch period) and terminating at the end of the final pitch period before consonant closure (end of upper formant energy in the case of AE /b/, silence in the case of NG /p/). Spectral measures were derived from analysis windows at three relative temporal locations (25%, 50%, and 75%) within the vocalic nucleus; only data from the 50% windows (temporal midpoint of the vocalic nuclei) are reported here. LPC spectra (28 coefficients) were computed using a 25 ms Hamming window centered around the 50% location and values for the first three spectral peaks were tabulated. When LPC peak values reflected spurious peaks or missed formants (based on an inspection of the spectrogram and formant history), manual placement of the marker on wideband FFT spectra or, in rare cases, narrow band FFT was used to estimate formant values.

3. Results

Figure 1(A) displays the 112 NG stimuli as points in F1/F2 Bark space; ellipses surround all 8 instances of each vowel (2 instances × 4 speakers). Figure 1(B) displays the 132 stimuli of the AE corpus for comparison. In both these plots, long vowels are depicted by closed symbols; short vowels are depicted by open symbols. Table I presents the frequencies of the first three formants and the vocalic durations of each vowel, averaged over all speakers’ tokens (8 for NG; 12 for AE). German vowels are shown on the left; AE vowels on the right. Identical or similar vowels across languages are shown in the same rows to facilitate comparison.

FIG. 1. Formant 1/formant 2 (Bark) plots of North German (A) and American English (B) vowel corpora in Study 1. Citation-form (di)syllables.

As Fig. 1 shows, the NG front rounded vowels appear to be spectrally more similar to NG front unrounded vowels than to NG back rounded vowels. It is also apparent that the so-called mid long vowels [eː, øː, oː] were realized with relatively low F1 values, i.e., they were relatively “high” (and close to high long vowels) in the acoustic vowel space. In fact, F1 values for short so-called mid-high vowels [ɪ, ʏ, ʊ] were actually higher on average than the mid long vowels, although categories overlapped somewhat in formant values. Finally, NG long and short low vowels overlapped spectrally.

The AE vowel space showed different patterns of spectral similarity. Mid long and high–mid short vowels [eɪ, oʊ, ɪ, ʊ] overlapped in F1 values, but tended to be differentiated in F2 values. Relative to NG vowels, AE [eɪ, oʊ] appeared to be spectrally more differentiated from [iː, uː]. Other spectrally similar long and short vowels overlapped in vowel space [æː-ɛ, ɔː-ʌ, ɑː-ʌ]. Finally, [ɑː-ɔː] long vowels showed spectral overlap typical of many AE dialects.

In comparing the relative duration differences of NG and AE vowels shown in Table I, the short NG vowels were slightly shorter, on average, than the AE short vowels (NG = 80 ms; AE = 90 ms), whereas the long NG vowels were considerably longer than the AE vowels (153 ms vs 115 ms). Thus, as expected from phonological descriptions of German and English, the ratio of long to short vowels for NG (1.9) was substantially greater than the long/short ratio for AE vowels (1.3) for these citation-form utterances.

TABLE I. Average formant frequencies (Hz) and durations (ms) of North German (NG) and American English (AE) vowels in citation-form syllables: study 1. NG long/short vowel ratio = 1.9. AE long/short vowel ratio = 1.3.

NG      F1    F2    F3  Duration    AE      F1    F2    F3  Duration
iː     309  1986  2960       137    iː     312  2307  2917       100
ɪ      428  1800  2460        70    ɪ      486  1785  2573        86
eː     393  2010  2651       161    eɪ     472  2062  2660       122
ɛ      573  1738  2454        83    ɛ      633  1588  2553        91
                                    æː     730  1568  2519       123
yː     301  1569  1934       142
ʏ      428  1340  2137        85
øː     393  1388  2045       165
œ      559  1353  2277        93
uː     320   689  1978       129    uː     348   995  2374       104
ʊ      457   834  2368        71    ʊ      489  1148  2472        93
oː     415   683  2277       165    oʊ     500   909  2643       112
ɔ      589   893  2497        76    ɔː     678  1062  2678       132
ɑː     718  1146  2508       172    ɑː     753  1250  2596       109
a      710  1200  2409        82    ʌ      635  1189  2619        89
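The long/short duration ratios quoted for Table I can be checked directly from the tabulated means. The sketch below is only an illustration: the durations are copied from Table I, but the grouping into "long" and "short" sets is an assumption based on the length marks and diphthongs in the transcription, and the function name `long_short_ratio` is hypothetical.

```python
from statistics import mean

# Mean vocalic durations (ms) copied from Table I.
# Assumption: "long" = vowels transcribed with a length mark or as a diphthong.
NG_LONG = {"iː": 137, "eː": 161, "yː": 142, "øː": 165, "uː": 129, "oː": 165, "ɑː": 172}
NG_SHORT = {"ɪ": 70, "ɛ": 83, "ʏ": 85, "œ": 93, "ʊ": 71, "ɔ": 76, "a": 82}
AE_LONG = {"iː": 100, "eɪ": 122, "æː": 123, "uː": 104, "oʊ": 112, "ɔː": 132, "ɑː": 109}
AE_SHORT = {"ɪ": 86, "ɛ": 91, "ʊ": 93, "ʌ": 89}


def long_short_ratio(long_durations, short_durations):
    """Return (mean long, mean short, long/short ratio) for one language."""
    mean_long = mean(long_durations.values())
    mean_short = mean(short_durations.values())
    return mean_long, mean_short, mean_long / mean_short


for label, long_d, short_d in (("NG", NG_LONG, NG_SHORT), ("AE", AE_LONG, AE_SHORT)):
    m_long, m_short, ratio = long_short_ratio(long_d, short_d)
    print(f"{label}: long = {m_long:.0f} ms, short = {m_short:.0f} ms, ratio = {ratio:.1f}")
# NG: long = 153 ms, short = 80 ms, ratio = 1.9
# AE: long = 115 ms, short = 90 ms, ratio = 1.3
```

Under this grouping the means and ratios reproduce the values reported in the text (80 vs 90 ms for short vowels, 153 vs 115 ms for long vowels, ratios of 1.9 and 1.3).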

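To give a rough feel for what classifying NG vowels against an AE input corpus involves, the sketch below converts the Table I means to Bark and assigns a vowel to the nearest AE category mean in F1/F2/F3 space. It is a deliberately simplified stand-in, not the authors' procedure: the study ran linear discriminant analyses on token-level data, whereas this uses only the published means and plain Euclidean distance. The Hz-to-Bark formula is the Traunmüller (1990) approximation, which is an assumption here (the conversion the authors actually used is given in their footnote 3, which is not part of this section). The names `hz_to_bark` and `nearest_ae_vowel` are hypothetical.

```python
import math


def hz_to_bark(f_hz):
    """Hz -> Bark, Traunmüller (1990) approximation (assumed, see lead-in)."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


# Mean F1, F2, F3 (Hz) of the 11 AE vowels, copied from Table I.
AE_MEANS_HZ = {
    "iː": (312, 2307, 2917), "ɪ": (486, 1785, 2573), "eɪ": (472, 2062, 2660),
    "ɛ": (633, 1588, 2553), "æː": (730, 1568, 2519), "uː": (348, 995, 2374),
    "ʊ": (489, 1148, 2472), "oʊ": (500, 909, 2643), "ɔː": (678, 1062, 2678),
    "ɑː": (753, 1250, 2596), "ʌ": (635, 1189, 2619),
}

# Mean F1, F2, F3 (Hz) of the four NG front rounded vowels, copied from Table I.
NG_FRONT_ROUNDED_HZ = {
    "yː": (301, 1569, 1934), "ʏ": (428, 1340, 2137),
    "øː": (393, 1388, 2045), "œ": (559, 1353, 2277),
}


def to_bark(formants_hz):
    """Convert an (F1, F2, F3) tuple from Hz to Bark."""
    return tuple(hz_to_bark(f) for f in formants_hz)


AE_CENTERS_BARK = {vowel: to_bark(f) for vowel, f in AE_MEANS_HZ.items()}


def nearest_ae_vowel(formants_hz):
    """Return the AE vowel whose mean is closest in F1/F2/F3 Bark space."""
    point = to_bark(formants_hz)
    return min(AE_CENTERS_BARK, key=lambda v: math.dist(point, AE_CENTERS_BARK[v]))


for ng_vowel, formants in NG_FRONT_ROUNDED_HZ.items():
    print(ng_vowel, "->", nearest_ae_vowel(formants))
```

Because it ignores within-category variance and token-level spread, the output of this sketch should not be expected to match the classification percentages reported for the actual discriminant analyses.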
With respect to cross-language comparisons of spectral patterns, the averages for NG and AE vowels demonstrate considerable cross-language variability in the acoustic similarity of so-called identical and similar vowels. To illustrate this comparison, Fig. 2 displays the distributions, in F1/F2 Bark space (see footnote 3), of each NG vowel (filled symbols circumscribed by solid ellipses), superimposed on ellipses surrounding the 12 instances of each of the AE vowels (dashed lines). For clarity, the 7 long NG vowels are shown in Fig. 2(A), while the 7 short vowels are superimposed on the same AE data in Fig. 2(B), and the individual AE tokens are not plotted. As the figure and Table I show, the point vowels [iː, ɑː] were quite similar for the two corpora, suggesting that differences in other vowels were probably not due to overall differences in the vocal tract size and shape of these speakers.

FIG. 2. North German long vowels (A) and short vowels (B) superimposed on ellipses of the 11 AE vowel categories. Study 1.

In order to quantify within- and cross-language similarities in acoustic structure [including F3 values not shown in Figs. 2(A) and 2(B)], a series of linear discriminant analyses (Klecka, 1980) was performed. First, separate analyses of the NG corpus and the AE corpus were computed to quantify spectral differentiation. For each analysis, F1/F2/F3 frequency values (in Barks) served as input parameters to establish centers of gravity in formant space for the 11 AE vowel categories and 14 NG vowel categories, respectively. A second set of discriminant analyses was performed, in which the vocalic duration was added as an input parameter; again, separate analyses were performed for the NG and AE corpus. Percentages of the “correct” classification of the stimuli as the intended vowels, based on optimal parameter weightings established for the input set, were then computed.

For the NG corpus, the overall classification on the basis of spectral parameters alone (79% correct) reflected the considerable overlap of so-called mid long [eː, øː, oː] with so-called mid-high short vowels [ɪ, ʏ, ʊ] and of long and short low vowels [ɑː, a], as can be seen in Fig. 1(A). Correct classification ranged from 38% to 75% on these 8 vowels; the other 6 vowels were differentiated spectrally with 100% accuracy. When the duration was included as an additional input variable, the overall correct classification rose markedly (overall = 93%; range across 14 vowels = 63% to 100%); at least 7 of 8 tokens of all vowels except [oː] were correctly classified. The remaining errors included confusions of adjacent “height” categories [uː-oː, ʏ-œ, ʊ-ɔ] and adjacent short/long vowels [eː-ɪ, ɑː-a, oː-ʊ].

Discriminant analysis results for the 132-stimulus AE corpus, based on spectral parameters alone, yielded 86% overall correct classification as the speakers’ intended vowels (range across 11 vowels = 50%–100%). As Fig. 1(B) suggests, most confusions were between spectrally adjacent long/short vowel pairs [eɪ-ɪ, æː-ɛ, ɑː-ʌ, uː or oʊ-ʊ], and between [ɑː-ɔː]. When vocalic duration was added as an additional input parameter, the correct classification improved (overall = 92%, range across the 11 vowels = 83%–100%), with at least 11 of 12 instances of all vowels except [æː, ɔː] classified correctly. These within-language analyses support the conclusion that vowel duration is somewhat more important in differentiating spectrally similar vowels in NG than in AE. For both corpora, a few acoustic ambiguities remain due to speaker differences.

Of greatest interest was the cross-language discriminant analysis to establish the spectral similarity of AE and NG vowels. (Because the syllable structure and absolute durations differed for NG and AE corpora, the duration was not used in cross-language discriminant analysis.) The 112 stimuli of the NG corpus served as the test corpus and were classified with respect to AE vowel centers of gravity established for the 132-stimulus AE input corpus. This was accomplished in two steps. First, the 10 so-called identical and similar NG vowels [iː, ɪ, eː, ɛ, uː, ʊ, oː, ɔ, ɑː, a] were tested against the 11 AE vowel categories. Overall, only 56% of these NG vowels were classified as exemplars of their AE counterparts (range across vowels = 0%–100%). Clearly then, NG vowels transcribed with the same phonetic symbols did not necessarily coincide spectrally with their AE counterparts. Second, the 4 front rounded vowels [yː, ʏ, øː, œ] were classified with respect to the 11 AE vowel centers of gravity; no definition of the “correct” classification of these vowels was possible.

To specify further the cross-language acoustic similarity of particular NG and AE vowels (including the front rounded vowels of greatest interest), Table II presents classification results for each NG vowel, grouped into sets of front rounded vowels (top), mid and mid-low vowels (middle) and

1796 J. Acoust. Soc. Am., Vol. 115, No. 4, April 2004 Strange et al.: Similarity of German and American vowels
high, mid-high and low vowels (bottom). The AE category to which most tokens of each NG vowel were
assigned is given in the second column with the number of tokens so assigned (third column), while
the next two columns show the AE categories to which the remaining tokens were assigned. For this
analysis, the AE categories [ɑː-ɔː] were collapsed due to their considerable spectral and temporal
overlap in the comparison corpus and their perceptual confusability by many native AE listeners.

TABLE II. Acoustic similarity (F1/F2/F3 Bark values) of North German (NG) and American English
(AE) vowels: Study 1, syllables.

                 NG      Modal classification      Other categories
                 vowel   AE vowel   # of stimuli   AE vowel   # of stimuli
Front rounded    yː      iː         7              ɪ          1
                 øː      ɪ          4              ʊ          2
                                                   uː         2
                 ʏ       ɪ          4              ʊ          3
                                                   uː         1
                 œ       ɪ          3              ɛ          1
                         ʊ          3              ʌ          1
Mid and mid-low  eː      eɪ         4              iː         2
                                                   ɪ          2
                 oː      oʊ         4              uː         4
                 ɛ       ɛ          4              ɪ          4
                 ɔ       oʊ         6              ɔː         2
High             iː      iː         8
                 uː      uː         8
Mid-high         ɪ       ɪ          6              iː         1
                                                   eɪ         1
                 ʊ       oʊ         5              uː         3
Low              ɑː      ɑː-ɔː      4              ʌ          4
                 a       ɑː-ɔː      5              ʌ          3

Looking first at the front rounded vowels, results revealed that three of the four vowels were,
indeed, acoustically intermediate between front unrounded and back rounded AE vowels; half the
tokens of [ʏ, øː, œ] were classified as more similar to AE front vowels; half were classified as
AE back vowels. All tokens of NG [yː] were spectrally more similar to front than back AE vowels.
Thus, while front rounded NG vowels are more similar acoustically to front than back NG vowels
[see Fig. 1(A)], they are only slightly more similar to front than to back AE vowels. This is
because AE back vowels are acoustically “fronted” relative to NG back vowels (see Fig. 2).
Cross-language acoustic classification patterns for the mid and mid-low vowels (second set) verify
cross-language differences shown in Fig. 2. Half the tokens of 3 of the NG vowels [eː, oː, ɛ] were
classified as more similar to higher AE vowels, while 75% of the NG [ɔ] tokens were classified as
most similar to AE [oʊ]. This confirms earlier findings that mid and mid-low German vowels are
located “higher” in vowel space than are AE mid and mid-low vowels.
With respect to the high, mid-high, and low NG vowels, only [iː, uː] were unanimously assigned to
their AE counterparts. For the other 3 vowels, individual tokens were assigned to different AE
categories such that the modal assignment percentages varied from 50% to 75%. However,
inconsistency in the classification of NG [ɪ] and [ɑː, a] reflected spectral overlap of AE
categories [eɪ-ɪ] and [ɑː-ɔː-ʌ], respectively. Finally, NG [ʊ] was never classified as most
similar to AE [ʊ], being acoustically higher and less fronted than its AE counterpart.

4. Discussion

Results of the acoustic comparison of NG and AE vowels can be summarized as follows: (a) NG front
rounded vowels, while more similar acoustically to front unrounded vowels in German, were found to
be acoustically intermediate between front unrounded and back rounded AE vowels. (b) While NG back
rounded vowels were classified as most similar spectrally to back AE vowels, F2 values across
languages clearly reveal that AE back vowels were “fronted” relative to their NG counterparts even
in this noncoronal consonantal context. (c) Mid and mid-low NG vowels were acoustically “higher”
than their AE counterparts. (d) Whereas NG and AE front mid-high [ɪ] were very similar spectrally,
back mid-high [ʊ] differed considerably for NG and AE vowels. The spectral overlap across NG and
AE corpora for the high front and low back vowels [iː-ɑː] suggests that the differences in NG and
AE vowels summarized in (a)–(d) above were not due to cross-corpora speaker normalization
problems, but rather were due to cross-language differences in relative spacing of vowels in
F1/F2/F3 vowel space.

B. Perceptual similarity of NG and AE vowels

Based on the acoustic comparison of the NG stimuli with a set of AE vowels produced in a similar
consonantal context, we predicted the following perceptual assimilation patterns:

(1) NG front rounded [yː, ʏ, øː, œ] were expected to be assimilated inconsistently to front and
back AE vowel categories except for [yː], which would be considered more similar to AE [iː]. In
general, these vowels were predicted to be judged as very poor exemplars of any AE category,
reflecting the fact that they are phonologically noncontrastive in English and are acoustically
dissimilar from any AE vowel categories in noncoronal contexts.
(2) NG mid and mid-low vowels [eː, oː, ɛ, ɔ] were expected to be inconsistently assimilated to
their AE counterparts (and considered relatively “poor” exemplars), due to their being produced
with higher “target” values (lower F1 frequencies), and, for the long vowels, as monophthongs.
Similarly, NG [ʊ] was expected not to be assimilated to its AE counterpart, but rather heard as
more similar to higher back vowels.
(3) The NG vowels [iː, ɪ, aː, uː] were predicted to be consistently assimilated to their AE
counterparts. However, [uː] might be judged as a relatively poorer exemplar of its AE counterpart,
due to its being acoustically further “back” (lower F2 frequencies). NG [a] was expected to be
assimilated consistently to AE [ɑː-ɔː] or [ʌ], depending upon the extent to which listeners
attended to relative duration in making cross-language similarity judgments.

(4) Temporal assimilation patterns of other vowels were also of interest in this study. While
spectral and temporal acoustic similarity patterns led to the same perceptual assimilation
prediction for most NG vowels, [øː, ɔ, ʊ, a] provided a test of whether AE listeners would
categorize these vowels as more similar to the spectrally closest AE vowel or to the temporally
more similar AE vowel. If vowel duration had a significant effect on assimilation of these vowels,
then [øː] might be expected to be assimilated more often to AE long [iː, uː] than to AE [ɪ, ʊ],
while the remaining three NG vowels would be assimilated to AE short vowels [ʊ, ʌ]. However, since
these vowels were produced and presented as isolated syllables, previous research suggests that
the influence of vowel duration might not be significant in this study.

1. Method

a. Stimulus materials. The 112-stimulus corpus described above served as materials for the
perceptual assimilation test. Each speaker's productions were arranged into a separate listening
test in which the 28 stimuli (14 vowels × 2 tokens) appeared 4 times each (112 trials) in random
order in a block, with a 5 s response interval between presentations. The test consisted of two
such randomized blocks, for a total of 8 trials on each syllable, 16 trials for each NG vowel
produced by that speaker.
b. Listeners and procedures. Twelve native speakers of AE were recruited from introductory classes
in Psychology and Communication Sciences. Some had a beginning course in phonetic transcription,
but none were trained listeners. None had any experience with German, either in courses as a
foreign language, or with individuals who spoke German. The test stimuli were output via a DAT
recorder routed through a power amplifier (Tascam PA-20B) to headphones and presented to
individual listeners seated at the computer console placed inside an acoustic chamber.
Categorization and rating responses were obtained using an interactive HyperCard program that
displayed the 11 AE vowel category choices represented by key words heed, hid, hayed, head, had,
hud, hod, hawed, hoed, hood, who'd. These response alternatives appeared on the computer screen
with the IPA symbol for each vowel category listed above and a response “button” beside each word.
A stimulus was presented and listeners clicked on the button beside the AE word that contained the
vowel that was most similar to the vowel they heard during a 5 s response interval. Then the same
stimulus was repeated and a seven-point horizontal scale appeared below the response alternatives.
Listeners rated the goodness of the stimulus as an exemplar of the category they had chosen by
clicking on the scale. Instructions were to click on the 1 if the vowel was “very
foreign-sounding;” 7 if the vowel was “very English-sounding” and to choose an appropriate number
between 1 and 7 if the vowel was somewhere in-between.4
The experiment was administered in five sessions, usually on five separate days; a few listeners
completed two sessions on the same day with at least a one-hour break to combat possible fatigue
effects. Day 1 included a response familiarization procedure to train the listeners to use the
response alternatives consistently and to perform the task when listening to AE vowels. Listeners
completed 55 trials (five blocks of 11 AE vowels each) and were given feedback following each of
the first three blocks. The last two blocks were used as an assessment of task mastery. Listeners
who misidentified more than three AE vowel tokens, or misidentified both presentations of any one
vowel (except for [ɑː-ɔː] confusions),5 were not included in the study.
If the familiarization criterion was met, listeners returned for four test sessions of
approximately 90 min each. In each session, they were tested on a different NG speaker's
productions; speaker presentation order was counterbalanced across subjects. Before each test,
listeners were reminded to use the whole seven-point scale in judging the stimuli. Each test
session consisted of (a) 28 “listen only” stimuli for the listener to become familiar with the new
speaker; (b) two sample stimuli (one “foreign-sounding” [œ] and one “English-sounding” [iː]);
(c) 14 practice trials on which listeners categorized and rated the stimuli, but the responses
were not included in the analysis; and then (d) the two blocks of 112 stimuli for that speaker.

2. Results

For each of the 14 NG vowels, the frequency of selection of each response category was tallied
across all speaker conditions and all 12 listeners (16 trials/vowel × 12 listeners = 192 trials).
Frequencies are reported as percentages of opportunities (frequency/192); the modal (most
frequently selected) and the second most frequently selected response alternatives are reported in
Table III (columns 2 and 5) with their percentages (columns 3 and 6). Finally, the goodness
ratings assigned the modal and second most frequent response choices were tallied (again pooling
over all speaker conditions and listeners' responses) and the median ratings are reported (columns
4 and 7). Thus, the modal response percentages can be considered a measure of consistency of
perceptual assimilation to a particular native vowel category for the average AE listener, whereas
the median rating can be considered a measure of the judged goodness of the NG vowel as a member
of that AE category. If the NG vowel is perceived as an excellent exemplar of an AE category,
response consistency should be near 100% and goodness ratings should be close to the
English-sounding end of the scale (7). If a NG vowel is perceived as a poorer instance of a
particular AE vowel, then we would expect modal category consistency to be somewhat lower and the
median goodness rating to be closer to the “foreign-sounding” end of the scale (1). Finally, if a
particular NG vowel is perceived as very different from any AE vowel (uncategorizable in Best's
terms), we might expect low consistency within and across listeners in category assignment and
very low goodness ratings.
The NG vowels are organized in Table III in three clusters according to predictions about
assimilation, based on spectral similarity patterns: (1) front rounded vowels (acoustically
intermediate between front and back AE vowels); (2) mid and mid-low vowels and [ʊ] (which were
spectrally dissimilar from their AE counterparts); and (3) high, mid-high,

TABLE III. Perceptual Assimilation of NG Vowels to AE Categories: Study 1, Syllables.

                          Most frequent category       2nd most frequent category
                  NG      AE       %        Median     AE       %        Median
                  vowel   vowel    chosen   rating     vowel    chosen   rating
Front rounded     yː      uː       69       2          iː       24       1
                  øː      ʊ        37       1          uː       30       2
                  ʏ       ʊ        56       3          ʌ        20       2
                  œ       ʌ        62       5          ɛ        30       3
Mid               eː      iː       66       5          eɪ       23       4
                  oː      oʊ       89       5          ɔː       5        1
Mid-low           ɛ       ɛ        97       6          ɪ        1        5
                  ɔ       ɑː-ɔː    90       5          ʌ        8        5
Mid-high back     ʊ       oʊ       42       3          ɑː       22       2
High              iː      iː       97       7          ɪ        1        2
                  uː      uː       86       4          oʊ       6        4.5
Mid-high front    ɪ       ɪ        97       7          iː       1        5
Low               ɑː      ɑː-ɔː    98       6          ʌ        1        6.5
                  a       ɑː-ɔː    79       5          ʌ        12       6
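
The tallies reported in Table III can be illustrated with a short sketch. Given one (category, rating) pair per trial for a single NG vowel, it returns the two most frequently chosen AE categories, each with its percentage of trials and its median goodness rating. The function name and the example responses are hypothetical and are not data from the study.

```python
from collections import Counter
from statistics import median


def summarize_assimilation(responses):
    """Summarize categorization and rating responses for one non-native vowel.

    `responses` holds one (category, rating) pair per trial, with ratings on
    the 1 (foreign-sounding) to 7 (English-sounding) scale. Returns up to two
    (category, percent of trials, median rating) tuples, most frequent first.
    """
    total = len(responses)
    counts = Counter(category for category, _ in responses)
    summary = []
    for category, n in counts.most_common(2):
        ratings = [r for c, r in responses if c == category]
        summary.append((category, round(100 * n / total), median(ratings)))
    return summary


# Hypothetical responses for illustration only (10 trials for one vowel).
example = [("who'd", 2)] * 6 + [("heed", 1)] * 3 + [("hood", 3)]
print(summarize_assimilation(example))
```

Pooling trials over speakers and listeners before calling the function corresponds to the pooling described for Table III; running it per listener would instead expose the individual differences discussed in the text.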

and low vowels (which were spectrally similar to their AE counterparts).
First, the four front rounded vowels were, in general, not consistently assimilated to any one AE
vowel (37%–69%). All four vowels, however, were considered more similar to back AE vowels; pooled
over all four vowels and all AE back vowel response alternatives, the overall assimilation to back
vowels was 77%. This was true for all four speakers' productions; back vowel responses ranged from
68% to 84% across speakers. Front vowel assimilations were somewhat greater for [yː] (27%) and [œ]
(32%) than for the other two vowels (16% for both vowels). Except for [œ], median ratings
reflected the fact that listeners judged these vowels to be relatively poor exemplars of either
front or back AE vowel categories.
An inspection of individual listeners' perceptual assimilation patterns revealed considerable
individual differences in assimilation of NG front rounded vowels. Only 2 of the 12 listeners
assimilated all 4 front rounded NG vowels primarily to front AE categories (69% and 95%,
respectively, pooling over all AE front vowel responses). A third listener assimilated NG [yː, œ]
to front AE vowels on a majority of trials. For the remaining 9 listeners, modal assimilation
patterns for all 4 vowels were to back AE categories (front vowel assimilations: median = 6%,
range = 0%–17%, pooling over all 4 vowels and all 4 speakers).
Turning to the spectrally dissimilar NG vowels shown in the second cluster, it is apparent that
the mid back [oː] was assimilated as a moderately good exemplar of its AE counterpart (89%
consistency; median rating = 5), while the mid front [eː] was not heard as most similar to its AE
counterpart, but rather, was assimilated more often as a moderately good exemplar of AE [iː].
Individual listener data showed that only 3 of 12 listeners assimilated NG [eː] to AE [eɪ] on the
majority of trials. The other nine listeners assimilated it to [iː] or sometimes [ɪ]. When [ɑː-ɔː]
responses were collapsed,5 NG [ɔ] was assimilated relatively consistently as a moderately good
exemplar of its AE counterpart, while NG [ɛ] was very consistently assimilated as a very good
exemplar of its AE counterpart. Finally, NG [ʊ] was not assimilated to any one AE vowel on a
majority of trials and was judged to be quite foreign sounding. Indeed, no one speaker's instances
of NG [ʊ] were categorized as any one AE vowel on more than 53% of opportunities. Individual
listener data also revealed inconsistency across speakers and tokens in the assignment of this NG
vowel to AE categories. This NG vowel, then, was considered “uncategorizable” by the listeners in
this study.
As was predicted, the spectrally similar NG vowels [iː, ɪ, ɑː] were very consistently categorized
as most similar to their AE counterparts and judged to be excellent exemplars of those vowels.
Again as predicted, the remaining “point vowel” NG [uː] was assimilated a bit less consistently
and rated only a fair exemplar, reflecting its spectral dissimilarity from AE [uː]. Finally, NG
[a] was assimilated as a moderately to very good exemplar of three spectrally overlapping AE
vowels [ɑː, ɔː, ʌ].
With respect to temporal assimilation patterns, overall, long NG vowels were assimilated to long
AE categories 88% of the time (range = 46% to 99% across the 7 vowels), while short NG vowels were
assimilated to short AE vowels only 62% of the time (range = 9% to 99% across the 7 vowels). These
large ranges reflect the fact that the four vowels for which spectral and temporal similarity
patterns led to different predictions [øː, ʊ, ɔ, a] were all assimilated more often to the
spectrally closest AE vowel that differed in intrinsic duration. For the remaining 10 vowels,
temporal assimilation to the “correct” AE duration category averaged 95% (range = 91% to 99%).
Thus, we can conclude that when cross-language spectral similarity is in conflict with temporal
similarity, AE listeners' assimilation of non-native vowels is more influenced by spectral
similarity.
To summarize so far, 4 NG vowels [iː, ɪ, ɛ, ɑː] were assimilated very consistently as excellent
instances of particular native AE vowels (consistency >95%; median ratings 6–7), while another 4
NG vowels [uː, oː, ɔ, a] were consistently assimilated as somewhat less good exemplars of
particular AE vowels (consistency 79%–90%; median ratings 4–5). An additional 4 vowels
[eː, yː, ʏ, œ] were less consistently assimilated as fair to poor exemplars of particular

AE categories (consistency 56%–73%; median ratings 2–5), while the remaining two vowels [øː, ʊ]
were uncategorizable as any one AE vowel (consistency <50%; median ratings = 1–3).

C. Discussion

In comparing perceptual assimilation data (Table III) and acoustic data (Table II), it can be seen
that perceptual similarity patterns were not well predicted by cross-language spectral similarity
patterns for the front rounded vowels. While acoustically intermediate between front and back AE
vowels, these vowels were far more often assimilated to AE back vowel categories, albeit judged
very poor tokens of those categories. For the mid and mid-low front and back NG vowels, which have
“similar” counterparts in AE, we can see that, again, acoustic similarity patterns did not always
predict perceptual assimilation accurately. NG [oː, ɛ] were quite consistently assimilated to
their AE counterparts, despite acoustic differences in “height,” while the other vowels were often
assimilated to higher AE vowels, as predicted from acoustic similarity patterns. The high,
mid-high, and low NG and AE vowels [iː, ɪ, ɑː, uː] that were very similar acoustically in both
spectral and temporal structure were perceptually assimilated highly consistently and stimuli were
rated as very good exemplars of native categories. Of interest is that category goodness judgments
for NG [uː] appeared to reflect the phonetic differences from the somewhat fronted AE similar
vowel. Thus, we can at least tentatively conclude that goodness judgments reflect knowledge about
phonetic details in the native language that can be accessed in the perceptual assimilation task.
The results of this study demonstrate that neither abstract (context-independent) phonological
descriptions of NG and AE vowels, nor (context-dependent) acoustic comparisons adequately predict
perceptual assimilation patterns. One problem with this study, however, was that the citation-form
NG and AE vowels that were compared acoustically differed in the syllable structure (monosyllabic
versus disyllabic) and consonantal context. Furthermore, the productions of the NG vowels,
especially their intrinsic duration differences, may not reflect the acoustic structure of NG
vowels produced in the sentence context. Thus, study 2 was conducted to examine the perceptual and
acoustic similarity of NG and AE vowels in /hVC/ syllables when they were produced and presented
in short carrier sentences at a speaking rate more closely resembling continuous (casual) speech.

III. ACOUSTIC AND PERCEPTUAL SIMILARITY OF NG AND AE VOWELS IN SENTENCE CONTEXT

Study 2 was a replication of the first study, except that the NG vowel corpus was hVp syllables
produced in the sentence, “Ich habe hVp gesagt” by the same set of speakers with instructions to
“speak at the rate you would if you were speaking to a native German listener.” This corpus was
then compared to a corpus of AE vowels produced in hVb syllables in the sentence “I say the hVb on
the tape,” produced under similar instructions to “speak as if you were talking to a friend who
was a native English speaker.”
As in the first study, cross-language acoustic similarity of three sets of vowels was
investigated: (1) the NG front rounded vowels [yː, ʏ, øː, œ] relative to front and back AE vowels;
(2) the NG mid and mid-low vowels [eː, ɛ, oː, ɔ] and mid-high back [ʊ], which in study 1 were
found to be acoustically dissimilar from their AE counterparts; and (3) the remaining high,
mid-high front, and low vowels [iː, ɪ, uː, ɑː, a], which were acoustically similar across
languages in study 1. Of interest was the extent to which cross-language spectral similarity
differed from that established for “canonical” vowels spoken in citation-form utterances.
Perceptual assimilation tests to assess the extent to which patterns of phonetic similarity
differed for vowels in sentence context from those established for citation-form syllables were
then completed and results of context-specific acoustic and perceptual similarity patterns were
compared.

A. Acoustic Similarity of NG and AE vowels produced in sentence context

1. Speakers and stimulus materials

The same four NG speakers as in study 1 produced the stimulus materials during the same recording
session, using the same equipment as in study 1. The sentences containing the hVp syllables were
written on index cards and speakers read them at a rate simulating that of continuous speech.
After some initial practice, speakers produced three randomizations of 15 sentences (the final
sentence was not used). As before, the first randomization was used for further practice and was
not included in the final corpus unless an utterance in randomization 2 or 3 was rejected because
of a disfluency or difference in sentence prosody. The same phonetically trained German listener
transcribed the target vowels; all were considered good instances of the intended vowels.
The same four AE speakers who produced the hVba disyllables for comparison in study 1 produced a
set of hVb syllables imbedded in the carrier sentence that was similar in structure to the German
sentence in the number of syllables and position of the target vowel in the sentence. The
recording was done at the same time as for the citation corpus, using the same equipment and
procedures. (This corpus was also used to assess the Japanese perceptual assimilation of AE vowels
in Strange et al., 1998.) The final corpus consisted of three instances of each of 11 AE vowels
produced by each speaker. Acoustic analysis was performed using the same procedures as in study 1.

2. Results

Figure 3(A) displays the 8 instances of each NG vowel (2 instances × 4 speakers) plotted in F1/F2
Bark frequency space, with ellipses surrounding all tokens of each vowel. Figure 3(B) displays the
12 instances of each AE vowel (3 instances × 4 speakers). Table IV presents the average formant
frequencies and durations for NG vowels (on the left) and AE vowels (on the right). The duration
ratio of long to short vowels for each language is given in the heading.
First, it is immediately apparent that the temporal structure of the NG hVp syllables spoken in
sentence utterances differed markedly from those produced in isolated syllables.
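
As a quick consistency check on the ratios quoted in the Table IV heading, the following sketch recomputes them from the tabled mean durations. It assumes that each ratio is the mean duration of the long vowels divided by the mean duration of the short vowels. The IPA labels are a reading of the table's symbols, and treating [ɪ, ɛ, ʊ, ʌ] as the AE short vowels is likewise an assumption made for this sketch, not a statement from the study.

```python
from statistics import mean

# Mean vowel durations (ms) taken from Table IV, keyed by IPA symbol.
NG_LONG = {"iː": 84, "eː": 97, "yː": 84, "øː": 99, "uː": 84, "oː": 100, "ɑː": 115}
NG_SHORT = {"ɪ": 54, "ɛ": 65, "ʏ": 63, "œ": 74, "ʊ": 60, "ɔ": 60, "a": 64}
# Assumed grouping of the 11 AE vowels into long and short sets.
AE_LONG = {"iː": 108, "eɪ": 132, "æː": 147, "uː": 115, "oʊ": 126, "ɔː": 152, "ɑː": 125}
AE_SHORT = {"ɪ": 94, "ɛ": 98, "ʊ": 107, "ʌ": 98}


def long_short_ratio(long_ms, short_ms):
    """Ratio of the mean long-vowel duration to the mean short-vowel duration."""
    return mean(long_ms.values()) / mean(short_ms.values())


ng_ratio = long_short_ratio(NG_LONG, NG_SHORT)
ae_ratio = long_short_ratio(AE_LONG, AE_SHORT)
print(f"NG long/short ratio: {ng_ratio:.1f}")
print(f"AE long/short ratio: {ae_ratio:.1f}")
```

Under these assumptions the values round to 1.5 for NG and 1.3 for AE, in line with the table heading, and the NG group means round to 95 ms (long) and 63 ms (short).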

TABLE V. Results of within-context and cross-context discriminant analyses. Percentages reflect
“correct” classification as the intended vowels, based on spectral parameters alone (F1/F2/F3 in
Bark) shown in the left-hand columns, and spectral parameters plus vocalic duration, shown in the
right-hand columns.

A. North German Corpora          F1/F2/F3    F1/F2/F3 + duration
Sentence input–sentence test     80%         90%
Citation input–citation test     79%         93%
Citation input–sentence test     73%         78%

B. American English Corpora      F1/F2/F3    F1/F2/F3 + duration
Sentence input–sentence test     84%         94%
Citation input–citation test     86%         92%
Citation input–sentence test     79%         92%
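
The input-corpus and test-corpus logic behind Table V can be sketched as follows. This is a deliberately simplified stand-in, not the linear discriminant analysis used in the study: each test token is assigned to the nearest center of gravity by Euclidean distance in F1/F2/F3 Bark space, with no parameter weighting. The Hz-to-Bark conversion uses Traunmüller's (1990) approximation, which is an assumption here, and all function names are hypothetical.

```python
import math


def hz_to_bark(f_hz):
    """Hz to Bark via Traunmüller's (1990) approximation (assumed conversion)."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


def centers_of_gravity(tokens):
    """Mean Bark-scaled (F1, F2, F3) per vowel label in the input corpus."""
    grouped = {}
    for label, formants_hz in tokens:
        grouped.setdefault(label, []).append([hz_to_bark(f) for f in formants_hz])
    return {
        label: [sum(dim) / len(rows) for dim in zip(*rows)]
        for label, rows in grouped.items()
    }


def classify(formants_hz, centers):
    """Label of the center closest to the token (Euclidean distance in Bark)."""
    point = [hz_to_bark(f) for f in formants_hz]
    return min(centers, key=lambda label: math.dist(point, centers[label]))


def percent_correct(input_tokens, test_tokens):
    """Percentage of test tokens assigned to their intended vowel."""
    centers = centers_of_gravity(input_tokens)
    hits = sum(classify(f, centers) == label for label, f in test_tokens)
    return 100.0 * hits / len(test_tokens)


# AE sentence-context means (Hz) from Table IV, used here as stand-in tokens.
AE_MEANS = [
    ("iː", (303, 2336, 2961)), ("ɪ", (461, 1826, 2634)), ("eɪ", (423, 2175, 2722)),
    ("ɛ", (627, 1657, 2544)), ("æː", (714, 1645, 2456)), ("uː", (342, 1064, 2422)),
    ("ʊ", (495, 1202, 2492)), ("oʊ", (479, 933, 2571)), ("ɔː", (660, 1056, 2571)),
    ("ɑː", (754, 1234, 2609)), ("ʌ", (631, 1232, 2619)),
]

# Same corpus as input and test: every mean is its own nearest center.
print(percent_correct(AE_MEANS, AE_MEANS))
```

Passing a citation-form corpus as `input_tokens` and a sentence corpus as `test_tokens` would mirror the "citation input–sentence test" rows of the table, although the percentages from this simplified classifier should not be expected to match the reported discriminant-analysis results.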

The long/short duration ratio was smaller (1.5 vs 1.9), reflecting the fact that the long vowels
were shortened considerably more when produced in sentence context (mean duration = 95 ms vs 153
ms in isolated syllables), than were the short vowels (mean duration = 63 ms vs 80 ms in isolated
syllables). This pattern held for all four NG speakers' productions and all seven long/short vowel
pairs. In contrast, the long/short duration ratio for AE vowels was the same in sentence materials
as for citation materials and absolute durations were slightly longer for vowels in sentences than
in disyllables. Despite the reduction in the temporal distinctiveness of NG long and short vowels
as a function of prosodic context and the differences across languages in absolute durations (due,
in part, to the AE vowels being lengthened preceding voiced consonants), the relative duration
differences between NG long and short vowels were still reliably greater than for long and short
AE vowels. Thus, we can conclude that vowel length is phonetically more salient in NG than in AE,
despite the fact that in both languages, long/short vowel pairs (except for NG [ɑː-a]) differ in
target formant frequencies as well as in length.

FIG. 3. Formant 1/formant 2 (Bark) plots of North German (A) and American English (B) vowel
corpora in study 2. Syllables in sentences.

TABLE IV. Average formant frequencies (Hz) and durations (ms) of North German (NG) and American
English (AE) vowels in hVp/hVb syllables in sentences, Study 2. NG long/short vowel ratio = 1.5.
AE long/short vowel ratio = 1.3.

NG    F1     F2     F3     Duration    AE    F1     F2     F3     Duration
iː    317    1943   2971   84          iː    303    2336   2961   108
ɪ     428    1784   2462   54          ɪ     461    1826   2634   94
eː    382    2008   2697   97          eɪ    423    2175   2722   132
ɛ     597    1738   2471   65          ɛ     627    1657   2544   98
                                       æː    714    1645   2456   147
yː    306    1590   2061   84
ʏ     406    1348   2104   63
øː    409    1345   2051   99
œ     551    1364   2231   74
uː    344    710    2002   84          uː    342    1064   2422   115
ʊ     441    836    2398   60          ʊ     495    1202   2492   107
oː    427    727    2454   100         oʊ    479    933    2571   126
ɔ     610    966    2414   60          ɔː    660    1056   2571   152
ɑː    713    1173   2438   115         ɑː    754    1234   2609   125
a     713    1227   2395   64          ʌ     631    1232   2619   98

As in study 1, a series of linear discriminant analyses was performed to quantify within- and
cross-language acoustic similarity. First, separate analyses for NG and AE corpora were performed,
using F1/F2/F3 Bark frequencies as input parameters to establish within-language spectral
distinctiveness for NG and AE vowels. A second set of discriminant analyses were performed using
duration as an additional input parameter. Table V (A and B) presents the overall correct
classification results for these analyses in the top row of each section (sentence input–sentence
test). For comparison, data from study 1 (citation input–citation test) are included in the second
row for each language.
For the NG sentence corpus, the overall correct classification on the basis of spectral parameters
alone was 80% (50%–100% across the 14 individual vowels). Most confusions were of spectrally
similar long/short pairs [eː-ɪ], [ɑː-a], [øː-ʏ], [oː or uː-ʊ]. When the duration was included as
an input parameter, overall correct classification rose to 90% (63%–100% across vowels) with at
least 7 of 8 tokens of all vowels except [oː, øː, œ] correctly classified. This is slightly lower
than for study 1 and reflects greater variability within and across speakers in production of
vowels in sentences. When the NG vowels produced in sentences were evaluated against spectral
centers of gravity established for the citation utterances (i.e., using “canonical” spectral
values), overall correct classification for NG vowels produced in sentences was only 73% (38% to
100% across vowels), as shown in the third row of Table V(A). That is, spectral centers of gravity
of vowels varied across prosodic contexts such that 2 or

TABLE VI. Acoustic similarity 共F1/F2/F3 Bark兲 of NG and AE vowels:
Study 2—Sentences.

Most frequent 2nd most frequent

NG AE # of AE # of
vowel vowel stimuli vowel stimuli

Front rounded Ñb ( 6 |(
^ 2
Öb ( 4 É 2
* 1
Å 1
+ Éb 4 * 3
( 1
! * 4 ␧ 2
# 1
Éb 1
Mid |b |(
^ 5 ( 2
{b 1
Çb Ç(
^ 8
Mid-low ␧ ␧ 8
Å Äb-Åb 6 Ç*
^ 2
Mid-high back * Ç*
^ 7 Éb 1
High {b {b 7 |(
^ 1
Éb * 6 Ç*
^ 2
Mid-high front ( ( 5 |(
^ 3
Low Äb Äb-Åb 5 # 3
~ Äb-Åb 4 # 4

FIG. 4. North German long vowels (A) and short vowels (B) superimposed on ellipses of the 11 AE vowel categories. Study 2.

more tokens of 9 of the 14 vowels produced in sentence context were classified as more similar to spectrally adjacent "canonical" vowel targets. Nevertheless, when duration was included as a parameter, some of these confusions were resolved (78% correct classification overall; 38% to 100% across vowels), with at least 7 of 8 tokens of 10 of the vowels correctly classified on the basis of citation-form spectral centers of gravity and duration established on the basis of citation utterances.

These cross-context analyses reaffirm the finding that high, mid-high, and mid NG vowel spectral targets [iː, ɪ, eː], [yː, ʏ, øː], and [uː, ʊ, oː] are located quite close together in vowel space, and may be ambiguous across speakers and prosodic contexts. The marked decrease (90% vs 78%) in the correct classification rate for the cross-context analysis that included vocalic duration as a parameter reveals that durations of NG vowels in citation-form utterances were not representative of the temporal structure of vowels spoken in continuous speech. However, even in continuous speech, relative vowel duration served to acoustically differentiate spectrally similar NG vowels if continuous speech vowel durations were used to establish parameter weightings and centers of gravity (increase from 80% to 90% correct classification).

As shown in Table V(B), 84% of the AE vowel tokens produced in sentences were correctly classified as the intended vowels on the basis of spectral parameters alone (58% to 100% across the 11 vowels). Again, most confusions were of spectrally adjacent long/short pairs [eɪ-ɪ, æː-ɛ, ɑː-ʌ] and of [ɑː-ɔː]. When the duration was included as an input parameter to a new discriminant analysis, overall correct classification rose to 94% (83%–100%) with at least 11 of 12 tokens of all vowels except [ɔː, ʌ] correctly classified. When canonical (citation-form) spectral and temporal values were used to assess vowels produced in sentences (citation input–sentence test), overall correct classification decreased only slightly (92% overall), reflecting the fact that absolute and relative durations were similar across prosodic contexts for these corpora. Overall the correct classification of vowels produced in sentences on the basis of canonical spectral centers of gravity (79% overall) showed that AE vowel targets varied somewhat as a function of prosodic context, especially for AE back rounded vowels [ɔː, oʊ, uː].

Of greatest interest was the cross-language discriminant analysis, in which the 112-vowel NG sentence corpus was classified with respect to the parameter weightings and centers of gravity established for the AE sentence corpus, using F1/F2/F3 Bark frequencies as parameters. (Again, no cross-language analysis was performed with duration as an additional input value.)

Figure 4 illustrates the cross-language spectral overlap for long (A) and short (B) NG vowels, superimposed on ellipses, indicating AE vowel categories. Table VI presents the cross-language classification results for individual NG vowels. As in the previous study, NG vowels produced in sentence context varied considerably in their spectral similarity to particular AE vowels (50%–100% classification consistency across the 14 vowels). Overall, the front rounded NG vowels (upper set) were again spectrally intermediate between front and back AE vowels (47% classified as front,

53% classified as back), with [yː] consistently assigned to front AE vowels, [ʏ, œ] more similar to back AE vowels, and [øː] evenly split between front and back AE categories.

Turning next to the five vowels that were acoustically dissimilar to their AE counterparts in study 1 ([eː, ɛ, oː, ɔ, ʊ] shown in the middle set of the table), the data reveal some differences. All instances of both NG [ɛ, oː] were classified as acoustically most similar to the equivalent AE category. In addition, more tokens of NG [eː, ɔ] were classified as similar to their AE counterparts than in study 1, while NG [ʊ] was still more similar acoustically to AE [oʊ] than to AE [ʊ]. In general, then, these vowels were better matches to AE vowels when produced in sentence context than in citation-form syllables.

The last set of vowels (the lower set in the table) were acoustically very similar to their AE counterparts in study 1. Except for NG [uː], these vowels were most similar acoustically to the AE vowel transcribed as "identical" in sentence context as well, although there was still some acoustic ambiguity among the front vowels [iː, eː, ɪ] and the back vowels [uː, oː, ʊ], reflecting the relative closeness of NG high and mid long vowels on the one hand, and the acoustic overlap of AE [eɪ, ɪ] on the other hand. As in study 1, NG overlapping long and short low vowels [ɑː, a] were acoustically similar to AE overlapping categories [ɑː, ɔː, ʌ].

In summary, then, in comparison with the results of study 1, there were some differences in cross-language acoustic similarity patterns for NG and AE vowels produced in the sentence context. NG [ɛ, ɔ] were somewhat better fits to their AE counterparts in sentence context, and NG [ʊ] was even more similar to AE [oʊ]. In contrast, NG [iː, uː] were somewhat poorer fits to their AE equivalents in sentence utterances. Finally, the NG front rounded vowels tended to be classified as similar to AE back vowels slightly more often in sentence context (53% vs 44% for citation syllables). Despite these minor differences, however, predictions about perceptual assimilation patterns from acoustic similarity were the same as before: (1) front rounded vowels should be assimilated inconsistently to both front and back AE vowels; (2) NG mid long [eː], but not [oː], should be assimilated inconsistently to higher AE vowels, whereas NG [ʊ] should be assimilated to AE [oʊ or uː]; and (3) there might also be some inconsistency in the assimilation of NG vowels transcribed as "identical," especially NG [ɪ, uː].

B. Perceptual similarity of NG and AE vowels produced in sentence context

The NG sentence corpus was presented to native speakers of AE to assess directly how the NG vowels would be assimilated to native AE vowel categories. Questions of interest were as follows.

(1) To what extent are perceptual assimilation patterns (and differences in assimilation from study 1) predictable from the acoustic similarity of NG and AE vowels produced in similar carrier sentences? Of special interest was whether the perceptual similarity of front rounded NG vowels to front or back AE vowels corresponded more closely with acoustic similarity patterns than in study 1.
(2) Do patterns of perceptual assimilation of NG vowels differ from those found in study 1? Specifically, are any of the eight NG vowels that were considered good to excellent exemplars of AE categories when produced in citation-form syllables, assimilated less well when produced in sentence-length utterances? Conversely, are any of the six NG vowels that were considered poor exemplars or uncategorizable as any AE categories in syllables, better assimilated when produced in sentence-length utterances?
(3) Are long and short NG vowels assimilated to long and short AE vowels more consistently when they are produced and presented in sentence-length utterances? From our previous research on Japanese listeners' assimilation of AE vowels (Strange et al., 1998), it was expected that the presence of the rhythmic pattern of the sentence might perceptually enhance temporal differences in vowels such that temporal similarity to native categories might influence perceptual judgements to a greater extent.

1. Method

Twelve native speakers of AE, drawn from the same pool of undergraduate students in introductory Psychology and Communication Sciences & Disorders classes, served as listeners. Equipment and procedures were identical to those described in study 1, except that subjects were told to listen to the vowel in the target syllable of the sentence and were given familiarization and practice using the sentence materials.

2. Results and discussion

As in study 1, perceptual assimilation responses were tallied across all 12 listeners for each of the 14 vowels and frequencies of selection of each response alternative reported as percentages of opportunities. Median ratings of category goodness for each response alternative were also computed. Table VII reports the most frequently selected response alternative for each vowel, with percentages and median ratings (columns 2–4) and the second most frequently selected alternatives, with percentages and ratings (columns 5–7). For NG [ɑː] and [ɔ], the AE [ɑː] and [ɔː] response categories were collapsed (and percentages combined) because many listeners did not distinguish these vowels in their own dialect. Thus, for these vowels, the response alternatives listed in column 5 are the third most frequently selected alternative. The NG vowels are clustered into the same three sets as for study 1: (a) the front rounded vowels (top); (b) the mid and mid-low front and back vowels [eː, ɛ, oː, ɔ] and [ʊ] (middle); and (c) the high, mid-high front, and low vowels (bottom).

Two of the front rounded NG vowels [yː, œ] were quite consistently assimilated to AE [uː, ʌ], respectively, but judged as relatively poor exemplars of those categories. The other two front rounded vowels were also assimilated overwhelmingly to AE back vowels, but not consistently to any single AE category. Overall, the percentage of assimilations

TABLE VII. Perceptual assimilation of NG vowels to AE categories: Study 2 (Sentences).

                            Most frequent category          2nd most frequent category
                  NG        AE        %         Median      AE        %         Median
                  vowel     vowel     chosen    rating      vowel     chosen    rating

Front rounded     yː        uː        87        2           ʊ         7         2
                  øː        uː        43        3           ʊ         28        2
                  ʏ         ʊ         56        2           ʌ         24        3
                  œ         ʌ         80        4           ɛ         10        4
Mid               eː        eɪ        53        6           iː        30        6
                  oː        oʊ        81        5           uː        6         2
Mid-low           ɛ         ɛ         93        6           iː        1         6.5
                  ɔ         ɑː-ɔː     64        3           ʌ         23        3
Mid-high back     ʊ         oʊ        67        5           ʊ         10        2
High              iː        iː        91        7           uː        2         5
                  uː        uː        72        3           ʊ         11        2
Mid-high          ɪ         ɪ         87        7           ɛ         3         5
Low               ɑː        ɑː-ɔː     93        6           oʊ        1         6
                  a         ɑː-ɔː     74        5           ʌ         17        3
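The 75% categorization-consistency criterion that the text applies to Table VII can be checked directly against the "most frequent category" columns. The sketch below is illustrative only: the names are hypothetical, the vowel labels are written in IPA, and it applies just the consistency criterion, leaving aside the separate median-rating criterion mentioned in the text.

```python
# Most frequent AE response for each NG vowel, as read from Table VII:
# NG vowel -> (AE category, percent of responses, median goodness rating).
TABLE_VII_MODAL = {
    "yː": ("uː", 87, 2),
    "øː": ("uː", 43, 3),
    "ʏ": ("ʊ", 56, 2),
    "œ": ("ʌ", 80, 4),
    "eː": ("eɪ", 53, 6),
    "oː": ("oʊ", 81, 5),
    "ɛ": ("ɛ", 93, 6),
    "ɔ": ("ɑː-ɔː", 64, 3),
    "ʊ": ("oʊ", 67, 5),
    "iː": ("iː", 91, 7),
    "uː": ("uː", 72, 3),
    "ɪ": ("ɪ", 87, 7),
    "ɑː": ("ɑː-ɔː", 93, 6),
    "a": ("ɑː-ɔː", 74, 5),
}

# Consistency criterion quoted in the text (modal response below 75%).
CONSISTENCY_THRESHOLD = 75

# The six NG vowels described as inconsistently assimilated in study 1.
PREVIOUSLY_INCONSISTENT = ["eː", "ʊ", "yː", "ʏ", "øː", "œ"]


def ambiguous_vowels(table, threshold=CONSISTENCY_THRESHOLD):
    """Return NG vowels whose modal AE response falls below the threshold."""
    return [ng for ng, (_ae, pct, _rating) in table.items() if pct < threshold]


ambiguous = ambiguous_vowels(TABLE_VII_MODAL)
still_ambiguous = [v for v in PREVIOUSLY_INCONSISTENT if v in ambiguous]
print("below threshold:", ambiguous)
print("of the six previously inconsistent vowels:", still_ambiguous)
```

On the consistency criterion alone, four of the six previously inconsistent vowels ([eː, ʊ, ʏ, øː]) remain below 75%, which matches the four vowels the text names as still perceptually ambiguous in sentence context.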

to back AE vowel categories was 96% (89% to 98% across the 4 NG vowels). This pattern reflected performance on all four speakers' utterances. Further, no listener assimilated any of the four NG front rounded vowels to front unrounded AE categories a majority of the time. The assimilation of NG [œ] to AE [ɛ] ranged across listeners from 0% to 36% (median = 7%).

The second set of vowels varied considerably with respect to consistency in assimilation and judged goodness to AE counterparts. As in study 1, while NG [oː] was quite consistently perceived as a fair exemplar of its AE counterpart, NG [eː] was assimilated to AE [eɪ] more consistently than in study 1, but was still often perceived as more similar to AE [iː]; note, however, that the [eː] tokens were judged as relatively good exemplars of either [eɪ] or [iː]. NG [ɛ] was consistently assimilated as an excellent exemplar of its AE counterpart, while NG [ɔ] was assimilated as a relatively poor exemplar of AE [ɑː-ɔː] or [ʌ]. Finally, NG [ʊ] was most often assimilated as a fair exemplar of AE [oʊ], or as AE [ʊ, ʌ, ɑː].

Of the last set of vowels, NG [iː, ɑː] were very consistently perceived as excellent exemplars of their AE counterparts, while NG [uː] was less consistently assimilated as a poor exemplar of the acoustically more "fronted" AE [uː] or [ʊ]. As in study 1, NG [ɪ] was quite consistently perceived as an excellent exemplar of AE [ɪ], and NG [a] was assimilated more often as a fair exemplar of long AE [ɑː-ɔː] than to short AE [ʌ].

The pattern of perceptual assimilation in Table VII differed somewhat from that reported in study 1 (Table III). The first eight NG vowels [iː, ɪ, ɛ, ɑː, a, ɔ, oː, uː], which were perceived as good to excellent exemplars of equivalent AE categories in study 1, were somewhat less consistently assimilated to their AE counterparts in sentence context (82%) than in syllable context (92%). This was especially true for NG [ɔ] and [uː] that were perceived as relatively poor exemplars in the sentence context. Of the six NG vowels [eː, ʊ, yː, ʏ, øː, œ], which had yielded inconsistent assimilation patterns in study 1, four [eː, ʊ, yː, œ] were perceived more consistently when produced and presented in sentence context (72% vs 60% in syllable context), and NG [eː] actually changed from being assimilated more often to AE [iː] in syllables to [eɪ] when syllables were imbedded in sentences. However, four of these vowels [eː, ʊ, øː, ʏ] were still perceptually ambiguous (< 75% categorization consistency) and/or rated as poor exemplars of any AE vowel (Mdn rating < 4).

In this study, as before, perceptual assimilation patterns suggested that spectral similarity was more important than temporal similarity in determining to which AE categories NG vowels would be assimilated. NG long vowels were assimilated to AE long vowels on 89% of trials, while NG short vowels were assimilated to short AE vowels on only 61% of trials. The poorer temporal assimilation of short vowels was due to NG short [ʊ, ɔ, a] being assimilated more often to spectrally similar AE long [oʊ, ɔː, ɑː], respectively. Thus, temporal assimilation patterns were not consistently better here than in study 1.

As in study 1, acoustic similarity patterns were not predictive of perceptual assimilation of front rounded NG vowels to front and back AE vowels (upper sets of Tables VI and VII). In the sentence context, the front rounded vowels were even more consistently categorized as more similar to back rounded AE vowels, although goodness ratings indicated that they were heard as somewhat poorer exemplars of those categories than were NG back rounded vowels. For mid and mid-low NG vowels and [ʊ] (middle sets), acoustic similarity patterns predicted modal perceptual assimilation categories for all five vowels, although the minority acoustic and perceptual classifications sometimes differed. The fit between acoustic and perceptual similarity for these vowels was, in general, better when produced and presented in sentences than when produced and presented in citation-form syllables. Finally, for high, mid-high front, and low vowels (bottom sets), acoustic similarity patterns predicted modal perceptual assimilation categories for all vowels except NG [uː], although there were again some discrepancies between minority acoustic and perceptual classifications. The categorization of NG [uː] to its AE counterpart was better than was

predicted from spectral similarity patterns. However, poor goodness ratings may have reflected the listeners' perception of this vowel as quite different from their native [uː].

IV. GENERAL DISCUSSION

The results of these two studies suggest that neither context-dependent comparisons of vowel production (as specified by spectral similarity patterns) nor context-independent impressionistic descriptions of vowel inventories capture all the relevant information necessary to account for perceived similarities of vowels by non-native listeners. One weakness of this study was that the phonetic and syllabic contexts did vary somewhat across languages such that durational information could not be compared. However, research currently being completed in our laboratory which compares the acoustic and perceptual similarity of NG and AE vowels in exactly the same citation context (hVba) is yielding very similar results (Strange et al., 2002, 2004). Thus, we are confident that the discrepancies between cross-language acoustic similarity and perceived similarity patterns for front rounded vowels and for mid and mid-low vowels reported here were not due to these minor differences in consonantal/syllabic context.

Given these differences in acoustic and perceived similarity, the direct assessment of perceptual assimilation patterns will be necessary if we are to anticipate perceptual learning problems of L2 learners. Previous research on Japanese listeners' perceptual assimilation of AE vowels (Strange et al., 1998, 2001) showed that the perceived similarity of vowels may be influenced significantly by the prosodic context and the immediate phonetic context in which the vowels are produced and presented. A comparison of results of studies 1 and 2 also suggests some differences in the perceptual similarity of NG and AE vowels as a function of the prosodic context: (1) NG front rounded vowels were assimilated more consistently within and across listeners to back rounded AE vowels when produced and heard in a sentence context, despite their acoustic ambiguity in both contexts. (2) Except for [eː], the mid and mid-low NG vowels were perceived as less similar to their AE counterparts in the sentence context, despite the fact that they were more similar acoustically to AE vowels in that context. (3) The vowels [iː, ɑː] and [ɪ] were consistently perceived as similar to their AE counterparts in both prosodic contexts, while NG [uː] was less consistently assimilated to AE [uː] and judged a relatively poor exemplar, especially in the sentence context. These differences as a function of the prosodic context point to the importance of examining cross-language perceptual similarity with materials that more closely resemble continuous speech.

On the basis of the findings reported here, we can predict that native speakers of AE who are beginning to learn German as an L2 will have considerable difficulty with several NG vowel contrasts. As previous research has suggested (Polka, 1995; Polka and Bohn, 1996; see also Gottfried, 1984; Rochet, 1995 for French), NG front rounded vowels will be confused with back rounded vowels, even in non-coronal contexts, because both are assimilated perceptually to back rounded AE vowel categories. In this study, the NG vowels [yː, øː, uː] were all assimilated most often to AE [uː] or [ʊ], although [øː] was a very poor fit to both categories. According to the predictions of Best's PAM, we would expect that [yː-uː] would present the most difficulty (the single category type in sentences, category-goodness type in syllables), while [øː-uː] and [øː-yː] might be discriminated rather better (uncategorizable–categorizable type). The NG mid-high short vowels [ʏ, œ, ʊ] might show a similar pattern, although modal responses differed across these vowels. Thus, we might predict that these vowels would be easier to discriminate than the long vowels. Polka (1995) reported the opposite result; that is, AE listeners' discrimination of the short front versus back rounded pair was poorer than for the long pair. Note, however, that the speaker in Polka's study spoke a Southern German dialect. Since Northern and Southern dialects are considerably different in both the spectral and temporal structure of vowels, we cannot generalize across studies. We might also expect perceptual confusions among NG [øː-ʏ] because of their spectral overlap and the apparent lack of attention to vowel duration by AE listeners.

The NG low vowel pair [ɑː-a] would also be expected to cause perceptual difficulties for AE learners of German. These vowels are distinguished acoustically almost entirely by duration, which AE listeners apparently ignored in making perceptual assimilation judgments. In many contexts, these vowels would be moderately difficult to discriminate (Category Goodness Type) unless AE learners were to become aware of their distinctive temporal differences. Finally, the NG vowels [eː-iː] might be expected to cause moderate discrimination problems, even though these vowels are phonologically distinctive in both languages. Gottfried (1984) reported that AE learners of French had difficulty with this "native" contrast. In both German and French, [eː] is not diphthongized, and the present study replicated previous findings that NG [eː] and [iː] are acoustically more similar in the "target" formant structure than their AE counterparts. Perceptual assimilation results confirmed this pair as a Category Goodness type. It is interesting to note that while the NG pair [oː-uː] also overlap spectrally and are both monophthongal, they were more often perceptually assimilated to different AE categories (Two-Category Type). On this basis, we would predict that the front vowel contrast would be more difficult to learn than the back vowel contrast for AE learners of German. We would predict few problems with the perception of NG [iː, ɪ, ɛ, ɑː, uː]. However, NG [ɛ, uː] might be produced with an "accent" since target formant frequencies differ across languages.

In conclusion, the studies reported here document the cross-language phonetic similarity of North German and American English vowels, using both direct measures of perceptual similarity, and acoustic comparisons of AE and NG vowels produced in both citation-form (careful) and sentence (continuous, more casual) contexts. Perceptual assimilation results by AE speakers with no previous experience with German yielded patterns of cross-language similarity that differed from those predicted either from abstract impressionistic descriptions of the phonetic inventories or from context-dependent spectral similarity patterns. Perceptual assimilation of some NG vowels varied considerably with pro-

sodic context, suggesting that claims about cross-language perceptual similarity of vowels (and perhaps consonants as well) based on citation-form syllables may be limited in their generalizability to continuous speech materials. From these comparisons of acoustic and perceptual similarity of NG and AE vowels, we can conclude that listeners are able to make fine-grained phonetic judgments about the cross-language similarity of vowels if they are presented in citation-form materials or in sentences in which the vowels occur in a fixed consonantal context. Judgments of category goodness reflected listeners' awareness of the allophonic inappropriateness of the non-native vowels.

Future studies in which the immediate phonetic context varies unpredictably need to be performed to address several remaining issues: (1) how acoustic and perceptual similarity of NG and AE vowels vary as a function of consonantal context, and (2) whether category goodness judgments reflect an awareness of context-specific allophonic differences in cross-language similarity when the phonetic context varies from trial to trial (more similar to natural speaking situations). Answers to these questions will begin to address the nature of the underlying representations of native vowel categories and how those categories are accessed when listeners are making judgements about non-native vowels.

ACKNOWLEDGMENTS

This research was completed on a research grant to the first author from the NIH (NIDCD-00323). The authors wish to acknowledge the contributions of students and colleagues who helped with the evaluation of stimuli and analysis of results: Katherine Bielic, William Clarke III, Saratha Kumarasamy, Thorsten Piske, Melissa Sedda, David Thornton, and James J. Jenkins.

1 Tense and lax (close–open) German vowels will hereafter be referred to as long and short, respectively. The diacritic [ː] will be used to designate the long vowels.
2 Many studies of AE vowels have used hVd syllables, primarily because most of the vowels in this context form real English words. However, since the final /d/ may influence vowel tongue position, it was decided to use a labial final consonant that has minimal effects on "target" formant positions (Stevens and House, 1963; Hillenbrand, Clark, and Nearey, 2001; Strange et al., 2002).
3 The formula used to compute Barks was as follows: $13\arctan(0.76\,\mathrm{Hz}/1000) + 3.5\arctan\left[(\mathrm{Hz}/1000/7.5)^{2}\right]$.
4 Listeners were instructed that they could change their categorization response after the second presentation of each stimulus before judging it as native or foreign sounding. However, participants almost never chose this option. Thus, we can assume that their first categorization response reflected similarity of the second presentation as well as the first. Two identical presentations of each stimulus were included so that goodness judgments could be made immediately after hearing the stimulus rather than at a delay. In this way, we hoped that listeners' responses would more accurately reflect detailed phonetic comparison with "stored" representations of native vowel categories.
5 In many dialects of American English, the /ɑ-ɔ/ contrast is partially or completely neutralized. Thus, many of our listeners found it very difficult to differentiate these vowels in the familiarization materials and when using the key words to indicate their perceptual assimilation responses. We thus collapsed these response categories for all perceptual assimilation data analysis.

Best, C. T. (1994). "The emergence of native-language phonological influences in infants: A perceptual assimilation model," in The Development of Speech Perception: The Transition from Speech Sounds to Spoken Word, edited by J. Goodman and H. C. Nusbaum (MIT Press, Cambridge, MA), pp. 167–224.
Best, C. T. (1995). "A direct realist view of cross-language speech perception," in Speech Perception and Linguistic Experience: Issues in Cross-Language Research, edited by W. Strange (York Press, Timonium, MD), pp. 171–204.
Best, C. T., Faber, A., and Levitt, A. (1996). "Assimilation of non-native vowel contrasts to the American English vowel system," J. Acoust. Soc. Am. 99, 2602.
Best, C. T., and Strange, W. (1992). "Effects of language-specific phonological and phonetic factors on cross-language perception of approximants," J. Phonetics 20, 305–330.
Bohn, O.-S. (1995). "Cross-language speech perception in adults: First language transfer doesn't tell it all," in Speech Perception and Linguistic Experience: Issues in Cross-Language Research, edited by W. Strange (York Press, Timonium, MD), pp. 279–304.
Bohn, O.-S., and Flege, J. E. (1990). "Interlingual identification and the role of foreign language experience in L2 vowel perception," Appl. Psycholing. 11, 303–328.
Crystal, T. H., and House, A. S. (1988a). "Segmental duration in connected-speech signals: Current results," J. Acoust. Soc. Am. 83, 1553–1573.
Crystal, T. H., and House, A. S. (1988b). "Segmental duration in connected-speech signals: Syllabic stress," J. Acoust. Soc. Am. 83, 1574–1585.
Flege, J. E. (1987). "The production of 'new' and 'similar' phones in a foreign language: Evidence for the effect of equivalence classification," J. Phonetics 15, 47–65.
Flege, J. E. (1995). "Second language speech learning: Theory, findings, and problems," in Speech Perception and Linguistic Experience: Issues in Cross-Language Research, edited by W. Strange (York Press, Timonium, MD), pp. 233–277.
Flege, J. E., and Hillenbrand, J. (1984). "Limits of phonetic accuracy in foreign language speech production," J. Acoust. Soc. Am. 76, 708–721.
Gottfried, T. L. (1984). "Effects of consonantal context on the perception of French vowels," J. Phonetics 12, 91–114.
Guion, S. G., Flege, J. E., Akahane-Yamada, R., and Pruitt, J. C. (2000). "An investigation of current models of second language speech perception: the case of Japanese adults' perception of English consonants," J. Acoust. Soc. Am. 107, 2711–2724.
Hillenbrand, J. M., Clark, M. J., and Houde, R. A. (2000). "Some effects of duration on vowel recognition," J. Acoust. Soc. Am. 108, 3013–3022.
Hillenbrand, J. M., Clark, M. J., and Nearey, T. M. (2001). "Effects of consonant environment on vowel formant patterns," J. Acoust. Soc. Am. 109, 748–763.
Klecka, W. R. (1980). Discriminant Analysis (Sage Publication, Newbury Park, CA).
Kohler, K. J. (1981). "Contrastive phonology and the acquisition of phonetic skills," Phonetica 38, 213–226.
Levy, E., and Strange, W. (2002). "Effects of consonantal context on perception of French rounded vowels by American English adults with and without French language experience," J. Acoust. Soc. Am. 111, 2361.
Peterson, G. E., and Lehiste, I. (1960). "Duration of syllable nuclei in English," J. Acoust. Soc. Am. 32, 693–703.
Polka, L. (1995). "Linguistic influences in adult perception of non-native vowel contrasts," J. Acoust. Soc. Am. 97, 1286–1296.
Polka, L., and Bohn, O.-S. (1996). "A cross-language comparison of vowel perception in English-learning and German-learning infants," J. Acoust. Soc. Am. 100, 577–592.
Rochet, B. L. (1995). "Perception and production of second-language speech sounds by adults," in Speech Perception and Linguistic Experience: Issues in Cross-Language Research, edited by W. Strange (York Press, Timonium, MD), pp. 379–410.
Steinlen, A. (2002). "A cross-language comparison of the effects of consonantal context on vowels produced by native and non-native speakers," unpublished doctoral dissertation, Aarhus University, Denmark.
Steinlen, A., and Bohn, O.-S. (1999). "Acoustic studies comparing Danish vowels, British English vowels, and Danish-accented British English vowels," J. Acoust. Soc. Am. 105, 1097.
Stevens, K. N., and House, A. S. (1963). "Perturbation of vowel articulations by consonantal context: An acoustical study," J. Speech Hear. Res. 6, 111–128.
Strange, W. (1989). "Dynamic specification of coarticulated vowels spoken in sentence context," J. Acoust. Soc. Am. 85, 2135–2153.
Strange, W., and Bohn, O.-S. (1998). "Dynamic specification of coarticulated German vowels: perceptual and acoustical studies," J. Acoust. Soc. Am. 104, 488–504.
Strange, W., Akahane-Yamada, R., Kubo, R., Trent, S. A., Nishi, K., and Jenkins, J. J. (1998). "Perceptual assimilation of American English vowels by Japanese listeners," J. Phonetics 26, 311–344.
Strange, W., Akahane-Yamada, R., Kubo, R., Trent, S. A., and Nishi, K. (2001). "Effects of consonantal context on perceptual assimilation of American English vowels by Japanese listeners," J. Acoust. Soc. Am. 109, 1691–1704.
Strange, W., Weber, A., Levy, E., Shafiro, V., and Nishi, K. (2002). "Within- and across-language acoustic variability of vowels spoken in different phonetic and prosodic contexts: American English, North German, and Parisian French," J. Acoust. Soc. Am. 112, 2384.
Strange, W., Levy, E., and Lehnhoff, R. (2004). "Perceptual assimilation of French and German vowels by American English listeners: Acoustic similarity does not predict perceptual similarity," J. Acoust. Soc. Am. (abstract forthcoming).

