
Acoustic and perceptual similarity of Japanese and American English vowels a)

Kanae Nishi b) and Winifred Strange
Ph.D. Program in Speech and Hearing Sciences, City University of New York—Graduate School and University Center, New York, New York 10016

Reiko Akahane-Yamada and Rieko Kubo
Cognitive Information Science Laboratory, Advanced Telecommunications Research Institute International and ATR Learning Technology Corporation, Kyoto 619-0288, Japan

Sonja A. Trent-Brown c)
Department of Psychology, University of South Florida, Tampa, Florida 33620

(Received 2 March 2007; revised 31 March 2008; accepted 19 April 2008)

Acoustic and perceptual similarities between Japanese and American English (AE) vowels were investigated in two studies. In study 1, a series of discriminant analyses were performed to determine acoustic similarities between Japanese and AE vowels, each spoken by four native male speakers, using F1, F2, and vocalic duration as input parameters. In study 2, the Japanese vowels were presented to native AE listeners in a perceptual assimilation task, in which the listeners categorized each Japanese vowel token as most similar to an AE category and rated its goodness as an exemplar of the chosen AE category. Results showed that the majority of AE listeners assimilated all Japanese vowels into long AE categories, apparently ignoring temporal differences between 1- and 2-mora Japanese vowels. In addition, not all perceptual assimilation patterns reflected context-specific spectral similarity patterns established by discriminant analysis. It was hypothesized that this incongruity between acoustic and perceptual similarity may be due to differences in distributional characteristics of native and non-native vowel categories that affect the listeners' perceptual judgments. © 2008 Acoustical Society of America. [DOI: 10.1121/1.2931949]

PACS number(s): 43.71.Hw, 43.70.Kv, 43.71.Es, 43.70.Fq [MSS]    Pages: 576–588

I. INTRODUCTION

Studies of cross-language speech perception have documented that the ease with which listeners can perceive specific contrasts among speech sounds is shaped by their linguistic environment during the early years of life (e.g., Best et al., 1988; Werker and Tees, 1984). Such early language-specific shaping in speech perception leads to great difficulty in learning a new phonological system as adults.

In order to explain the complex processes involved in cross-language speech perception and to predict the difficulty in both perception and production experienced by adult second language (L2) learners, several models have been proposed. Among those, of particular interest here are the perceptual assimilation model (PAM) by Best (1995) and the speech learning model (SLM) by Flege (1995). Both PAM and SLM propose somewhat similar mechanisms in which the learners' native phonological system acts as a sieve in processing non-native phones. Both claim that the similarity between the phonetic inventories of the learners' native language (L1) and that of the L2 influences perception. In addition, both models suggest that differences in the phonetic realization (gestural and acoustic-phonetic details) of the "same" phonological segments in the two languages must be taken into consideration in establishing cross-language similarities, as well as the token variability within similar phonetic categories in the two languages—whether due to speaking styles, consonantal/prosodic contexts, or individual speakers.

Thus far, numerous studies of vowels have attempted to provide evidence for PAM and SLM, but the majority has focused on how listeners from an L1 with fewer vowel categories assimilate vowels from larger L2 inventories that include vowels that do not occur as distinctive categories in the L1. The present study examined perceptual assimilation of L2 vowels from a small inventory by listeners whose L1 included more vowels differing in "quality" (spectral characteristics), but for whom vowel length was not phonologically contrastive, namely, perception of Japanese vowels by American English (AE) listeners.

The subsequent four sections present the two models in more detail, briefly describe the vowel systems of AE and Japanese, provide summaries of previous studies that examined Japanese listeners' perception of AE vowels (Strange et al., 1998; 2001) and AE listeners' use of vowel duration in perceptual assimilation of German vowels, and present the design and the research questions for the present study.

a) Portions of this work were presented in "Perceptual assimilation of Japanese vowels by American English listeners: effects of speaking style," at the 136th meeting of the Acoustical Society of America, Norfolk, VA, October 1998.
b) Present address: Boys Town National Research Hospital, 555 North 30th Street, Omaha, NE 68131; electronic mail: nishik@boystown.org
c) Present address: Psychology Department, Hope College, 35 E. 12th Street, Holland, MI 49423.

A. Perceptual assimilation model and speech learning model

PAM (Best, 1995) has its basis in the direct realist view of speech perception and hypothesizes that listeners directly perceive articulatory gestures from information specifying those gestures in the speech signal. It predicts relative difficulty in perceiving distinctions between non-native phones in terms of perceptual assimilation to, and category goodness of, the contrasting L2 phones with respect to L1 categories. According to PAM, if contrasting L2 phones are perceptually assimilated to two different L1 categories (two-category pattern), their discrimination should be excellent. However, if they are assimilated to a single L1 category as equally good (or poor) instances (single-category pattern), they are predicted to be very difficult to discriminate. On the other hand, if two L2 phones are assimilated to a single L1 category but are judged to differ in their "fit" to that category (category-goodness pattern), their discrimination will be more accurate than for single-category pairs but not as good as for two-category pairs. If both phones are perceived as speech sounds but cannot be assimilated consistently to any L1 category (uncategorizable), their discrimination will vary depending on their phonetic similarity to each other and their perceptual similarity to the closest L1 categories. Finally, there can be cases where one L2 phone falls into an L1 category and the other falls outside the L1 phonological space (categorizable-uncategorizable pattern). In such a case, PAM predicts their discrimination to be very good.1

Although much of Flege and colleagues' work has focused on the accentedness of L2 productions by inexperienced and experienced L2 learners, his SLM (Flege, 1995) also considers the effects of cross-language phonetic similarities in predicting relative difficulties in learning both to perceive and to produce L2 phonetic categories. SLM hypothesizes that L2 learners initially perceptually assimilate L2 phones into their L1 phonological space along a continuum from "identical" through "similar" to "new" (equivalence classification), based on the L2 phones' phonetic similarities to L1 categories. If an L2 phone is "identical" to an L1 category, L1 patterns continue to be used, resulting in relatively little difficulty in perception and production. At the other extreme, if an L2 phone is phonetically "new" (very different from any L1 category), it will not be perceptually assimilated to any L1 category, and eventually a new category will be created that guides both perception and production of the L2 phone. Therefore, perception and production will be relatively accurate for "new" L2 phones after some experience with the L2. However, if an L2 phone is phonetically "similar" to an L1 category, it will continue to be assimilated as a member of that L1 category. According to SLM, this pattern may lead to persistent perception and production difficulties because, even though there may be a mismatch between the L1 and L2 phones, learners continue to use their L1 category in perception and production. If two L2 phones are assimilated as "identical" or "similar" to a single L1 category, discrimination is predicted to be difficult, just as for PAM.

Studies have shown that the relative difficulty that late L2 learners have in discriminating L2 consonant and vowel contrasts varies considerably due to the phonetic context (i.e., allophonic variation) and the token variability contributed by speakers, speaking rate, and speaking style differences (cf. Schmidt, 1996, for consonants; Gottfried, 1984, for vowels). This causes difficulties even for contrasts that may be distinctive in both languages but differ in phonetic detail across languages (cf. Pruitt et al., 2006, for perception of Hindi [d-t] and [dh-th] by AE listeners and Japanese listeners; Rochet, 1995, for perception of French vowels [i-y-u] by AE listeners and Brazilian Portuguese listeners). However, the majority of these reports are based on assimilation of more L2 categories into fewer L1 categories ("many-into-few" assimilation), and at present, it is not known whether PAM or SLM can account for the cases where learners are required to assimilate fewer L2 categories into a more differentiated L1 phonological inventory ("few-into-many" assimilation).

B. Japanese and American English vowel inventories

According to traditional phonological descriptions of the Japanese and AE vowel inventories, the two languages differ markedly in their use of quality (tongue, lip, and jaw positions) and quantity (duration of vocalic gestures) to differentiate vowels. Japanese has a "sparse" system with five distinctive vowel qualities [i, e, a, o, ɯ], which form five long (2-mora)-short (1-mora) pairs (Homma, 1992; Ladefoged, 1993; Shibatani, 1990). These vowels vary in height (three levels) and backness (front versus back). Thus, vowel quantity or duration is phonologically contrastive, and phonetically, long-short pairs are reported to be very similar in spectral structure (Hirata and Tsukada, 2004). (Hereafter, the Japanese vowels will be transcribed as short [i, e, a, o, ɯ] and long [ii, ee, aa, oo, ɯɯ].) Only the mid back vowels [o, oo] are rounded, and all vowels are monophthongal. In addition, the central and back vowels have distinctive palatalized forms [ja(a), jo(o), jɯ(ɯ)] in the majority of consonantal contexts.

In contrast to Japanese, AE has a relatively "dense" vowel system, described as including 10-11 spectrally distinctive, nonrhotic vowels [iː, ɪ, eɪ, ɛ, æː, ɑː, ʌ, ɔː, oʊ, ʊ, uː] that vary in height (five levels) and backness (front versus back), one rhotic vowel [ɝ], and three true diphthongs [aɪ, ɔɪ, aʊ]. The mid-low to high back vowels [ɔː, oʊ, ʊ, uː] are rounded, and [uː] can be palatalized in limited phonetic contexts (e.g., [ju] in "view, pew, cue") and is allophonically fronted in coronal contexts (Hillenbrand et al., 1995; Strange et al., 2007). Although to a lesser extent than in the true diphthongs, the mid front [eɪ] and mid back [oʊ] are diphthongized in slow speech and open-syllable contexts, and several others of the so-called monophthongs show "vowel-inherent spectral change" in some contexts and some dialects (Hillenbrand et al., 1995; Nearey, 1989; Stack et al., 2006). In many dialects of AE, the distinction between [ɑː, ɔː] has been neutralized to a low, slightly rounded [ɒː]. Finally, while vowel length is not considered phonologically distinctive in AE, phonetically, the "intrinsic duration" of AE vowels varies systematically (Peterson and Lehiste, 1960), with seven long vowels [iː, eɪ, æː, ɑː, ɔː, oʊ, uː] and four short vowels [ɪ, ɛ, ʌ, ʊ]. Vocalic duration also varies allophonically
as a function of the voicing of the following consonant, the syllable structure, and other phonetic and phonotactic variables (Klatt, 1976; Ladefoged, 1993).

C. Previous cross-language vowel perception studies

Although they differ in detail, both PAM and SLM hypothesize that L2 learners' perception and production error patterns can be predicted from the phonetic similarity between L1 and L2 phones, because they posit that L2 learners initially attempt to perceptually assimilate L2 phones to L1 categories. However, it is important to establish phonetic similarity independent of discrimination difficulties in order to make these claims noncircular. In some studies, Best and her colleagues describe phonetic similarity in terms of gestural features (degree and place of constriction by the tongue, lip postures, velar gestures), using rather abstract definitions of gestural characteristics (e.g., Best et al., 2001). In many of Flege's studies of vowels, acoustic-phonetic similarity has been established through comparisons of formant frequencies in an F1/F2 space (e.g., Fox et al., 1995; Flege et al., 1999). Best, Flege, and other researchers working within these theoretical frameworks have also employed techniques that directly assess the cross-language perceptual similarity of L2 phones, using either informal L1 transcriptions by non-native listeners or a cross-language identification or so-called perceptual assimilation task (see Strange, 2007, for a detailed description of these techniques). In our laboratory, listeners perform a perceptual assimilation task in which they are asked to label (using familiar key words) multiple tokens of each L2 vowel as most similar to (underlying representations of) particular L1 vowel categories, and to rate their "category goodness" as exemplars of the L1 category they chose on a Likert scale (e.g., 1 = very foreign sounding; 7 = very native sounding). This technique has been used successfully in previous studies that investigated perceptual assimilation of larger L2 vowel inventories by listeners with smaller L1 vowel inventories (Strange et al., 1998; 2001; 2004; 2005), but it has never been used to assess assimilation of L2 vowels by listeners from L1s with more vowels. The following is a summary of those previous assimilation studies.

Strange et al. (1998) examined the influence of speaking style and speakers (i.e., [hVb(ə)] syllables produced in lists [citation condition] versus spoken in continuous speech [sentence condition]) on the assimilation of 11 AE vowels [iː, ɪ, eɪ, ɛ, æː, ɑː, ʌ, ɔː, oʊ, ʊ, uː] by Japanese listeners living in Japan at the time of testing. Acoustic analysis showed that the duration ratios between long and short AE vowels were the same for both conditions (1.3), but the absolute durations of vowels in sentences were somewhat longer than those in citation form. The results of perceptual assimilation tests showed that none of the 11 AE vowels were assimilated extremely consistently (i.e., with >90% consistency within and across listeners) into a single Japanese category in either condition. Japanese listeners assimilated short AE vowels in both conditions primarily into 1-mora Japanese vowels (94% in citation versus 83% in sentence), whereas the proportion of long AE vowels assimilated to 2-mora Japanese vowels was twice as great in the sentence condition (85%) as in the citation condition (42%). This indicated that Japanese listeners were highly attuned to vowel duration when they were given a prosodic context in which to judge the (small) relative duration differences. As for spectral assimilation patterns, consistent patterns were observed for four vowels [iː, ɑː, ʊ, uː] in both conditions. Six vowels [ɪ, eɪ, ɛ, æː, oʊ, ɔː] were assimilated primarily to two Japanese spectral categories, but the patterns changed between citation and sentence conditions for the first five vowels, reflecting the influence of speaking style. Finally, [ʌ] was assimilated to more than two Japanese categories, suggesting its "uncategorizable" status.

In Strange et al. (2001), only sentence materials were used, but the vowels were produced in six consonantal contexts [b-b, b-p, d-d, d-t, g-g, g-k]. Similar to Strange et al. (1998), none of the 11 AE vowels were consistently assimilated into a single Japanese category, with the pattern varying with context and/or speaker for six vowels. Assimilation of long AE vowels to 2-mora Japanese categories also varied with speakers and contexts. More AE vowels were assimilated to 2-mora categories when followed by voiced consonants, indicating that Japanese listeners attributed the (lengthened) vocalic duration to the vowels rather than to the consonants.

Strange et al. (2004; 2005) explored AE listeners' assimilation of North German (NG) vowels. NG has 14 vowel qualities that form seven spectrally adjacent long-short pairs. However, the spectral overlap between the long-short pairs is less marked than for Japanese, and the duration differences between spectrally similar long-short vowels (ratios of 1.9 in citation and 1.5 in sentence materials) are smaller than for Japanese but greater than for AE. Results of perceptual assimilation tests showed that AE listeners categorized the 14 NG vowels primarily according to their spectral similarity to AE vowels and largely ignored vowel duration differences.

D. The present study

The present study is part of a series investigating the influence of token variability on cross-language vowel perception. Thus far, as described above, we have investigated many-into-few assimilation patterns by systematically manipulating sources of variation (e.g., speaking style and consonantal context). The present study is the first of the few-into-many assimilation studies and focused on the influence of speaking style. Other factors, such as consonantal context, speaking rate, and prosodic context, are left for future studies.

As in Strange et al. (1998), the effects of token variability were investigated using two utterance forms (i.e., citation and sentence) produced by multiple speakers, while phonetic context was held constant ([hVb(a)]). This specific consonantal context was chosen because, in both Japanese and AE, [h] and [b] have minimal coarticulatory influence on the spectral characteristics of the vocalic nucleus. Thus, the vowels in these materials should reflect their "canonical" spectral structure that, in turn, specifies their "articulatory targets." In addition, the CVCV disyllables are phonotactically appropriate in both languages, and the consonants are very similar. The acoustic measurements of the Japanese vowels are considered of archival value because no study has compared the
distributional characteristics of Japanese and American vowels in comparable materials, even though averages of spectral and duration measures for Japanese vowels have been reported (e.g., Han, 1962; Homma, 1981; Keating and Huffman, 1984; Hirata and Tsukada, 2004). While the AE corpus was partially described in our previous studies (Strange et al., 1998; 2004), the Japanese corpus has not been described in any published studies.

Based on the phonetic and phonological differences between the AE and Japanese vowel inventories described above, it was predicted that most of the Japanese vowels would spectrally overlap one or more AE vowels. Because AE has twice as many spectral categories as Japanese, the SLM would predict that all Japanese vowels should be assimilated into some AE category as "identical" or "similar" and none as "new," with relatively high category goodness ratings. PAM would predict that uncategorizable or categorizable-uncategorizable patterns should not be observed, but other patterns may occur as a function of token variability within a particular Japanese category, manifested as differences in category goodness ratings.

II. ACOUSTIC SIMILARITY OF JAPANESE AND AE VOWELS

This section reports the results of both within-language and cross-language acoustic comparisons. Linear discriminant analysis (cf. Klecka, 1980) was chosen as the quantitative method because its conceptual framework resembles the perceptual assimilation process. Discriminant analysis is a multidimensional correlational technique that establishes classification rules to maximally distinguish two or more predefined nominal categories (here, vowel categories) from one another. The classification rules are specified by linear combinations of input variables (i.e., acoustic measures, such as formant frequencies and vowel duration), with the weight for each variable determined so as to maximize separation of categories, using the centers of gravity of the categories and the within-category dispersion in the input set. By applying the classification rule established for the input set, membership of tokens in the input set can be re-evaluated (posterior classification), and the amount of overlap between categories can be quantified in terms of incorrect classification of tokens. It is also possible to classify a new set of data (test set) using the previously established classification rules.

In the present study, three types of discriminant analysis were performed. First, by performing the analysis with and without vowel duration as a variable, the contribution of spectral and duration cues in distinguishing vowel categories was evaluated for each language (within-language, within-condition analyses). Second, by using the classification rules for the citation materials to classify the sentence materials, the extent of changes in acoustic structure across speaking styles in each language was characterized (within-language, cross-condition analyses). Lastly, using the classification rules for the AE materials in each speaking style, the similarity between the spectral characteristics of Japanese and AE vowels was evaluated for the citation and sentence conditions separately (cross-language analyses).
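The classification logic just outlined can be sketched in a few lines. The following is an illustrative sketch only, not the analysis software actually used in the study; the scikit-learn toolkit and all array and variable names (e.g., citation_F1F2, sentence_labels) are assumptions introduced for the example.

    # Illustrative sketch of the three discriminant analyses (not the authors' code).
    # Assumes per-token arrays: F1/F2 in bark, vocalic duration in ms, and the
    # speaker-intended vowel labels; all names below are hypothetical.
    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    def classify(train_X, train_y, test_X=None):
        """Fit classification rules on an input set; return posterior classifications
        of the input set itself, or of a separate test set if one is given."""
        lda = LinearDiscriminantAnalysis()
        lda.fit(train_X, train_y)
        return lda.predict(train_X if test_X is None else test_X)

    def percent_correct(predicted, intended):
        return 100.0 * np.mean(np.asarray(predicted) == np.asarray(intended))

    # Within-language, within-condition: spectral parameters alone vs. spectra + duration.
    spectral_only = percent_correct(classify(citation_F1F2, citation_labels), citation_labels)
    with_duration = percent_correct(
        classify(np.column_stack([citation_F1F2, citation_dur]), citation_labels),
        citation_labels)

    # Within-language, cross-condition: citation tokens as input set, sentence tokens as test set.
    cross_condition = percent_correct(
        classify(citation_F1F2, citation_labels, test_X=sentence_F1F2), sentence_labels)

    # Cross-language: rules fit on the AE corpus, applied to the Japanese tokens of the same style.
    japanese_as_AE = classify(ae_citation_F1F2, ae_citation_labels, test_X=jp_citation_F1F2)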

A. Method

1. Speakers and stimulus materials

Four adult Japanese males (speakers 1-4; 32-39 yr old) served as speakers. All were native speakers of Tokyo dialect and had resided in Japan all of their lives. All had at least eight years of English language education beginning in the seventh grade, but in the Japanese school system little emphasis was placed on the spoken language. They had no training in phonetics and spoke very little English in their daily lives. They reported no hearing or speech problems.

Stimuli were five long/short pairs of Japanese vowels [ii/i, ee/e, aa/a, oo/o, ɯɯ/ɯ]. These vowels were embedded in a nonsense [hVba] disyllable and recorded in two speaking styles. First, the disyllables were read singly in lists (citation form), each preceded by an identifying number and a pause, and spoken with falling intonation. Second, each [hVba] disyllable was embedded in a carrier sentence (sentence form), Kore wa [hVba] desu ne [koɾewa hVba des(ɯ) ne], which translates to "This is [hVba], isn't it?" Each speaker produced four randomized lists of the 10 vowels for each form. Only the last three readings were used as stimuli. When problems with fluency or voice quality were detected, the token from the first reading was used as a replacement. Thus, a total of 240 tokens (10 vowels × 3 repetitions × 2 forms × 4 speakers) were included in the final analysis.

All tokens were recorded in an anechoic sound chamber at ATR Human Information Processing Research Laboratories in Kyoto, Japan, using a condenser microphone (SONY ECM-77s) connected to a DAT recorder (SONY PCM-2500A, B). Each speaker sat approximately 22 cm from the microphone. Speakers read the stimuli from randomized lists with identification numbers preceding each token. Speakers were instructed to read at their natural rate without exaggerating the target nonsense syllables. Recorded lists were digitally transferred to Audio Interchange File Format (AIFF) files and down-sampled from 48 to 22.05 kHz, using a Power Macintosh 8100/100 computer and SOUNDEDIT16 (Version 1.0.1, Macromedia). AIFF files for the individual tokens used in analysis and perceptual testing were created from the down-sampled files by deleting the identification numbers.

The AE corpus was produced and analyzed using similar techniques, except that it was recorded in a sound booth at the University of South Florida (USF). Four male speakers of general American English dialect each recorded four tokens for each of 11 vowels in the two speaking styles; three tokens were retained for analysis. Further details of the stimulus preparation for the AE stimuli are reported elsewhere (Strange et al., 1998).

2. Acoustic analysis

Acoustic analysis was performed using a custom-designed spectral and temporal analysis interface in SOUNDSCOPE/16 1.44 (PPC)™ speech analysis software (Copyright 1992, GW Instruments, Somerville, MA 02143). First, the vocalic duration of the target vowel was determined by locating the beginning and ending of the target vowel using both the spectrographic and waveform representations of the stimuli. The vowel onset was the first pitch period after the preceding voiceless segment [h], and the vowel offset was determined as the reduction of the waveform amplitude and the cessation of higher formants in the spectrogram indicating the start of the [b] closure; vocalic duration was calculated as the difference between these two time points. The first three formants (F1, F2, and F3, respectively) were measured at the 25%, 50%, and 75% temporal points of the vocalic duration. Only the measurements from the 50% point (midsyllable) are used in the discriminant analyses (average values for each vowel are reported in the Appendix). Linear predictive coding (LPC) spectra (14-pole, 1024-point) were computed over approximately three pitch periods (a 25 ms Hamming window) around the 50% point. When the cross-check with the spectrographic representation indicated that the LPC estimate of a formant reflected merged, spurious, or missed formants, the formant frequency was manually estimated from the wideband fast Fourier transform (FFT) representation of the windowed portion. In the very few cases where the wideband FFT also failed, a narrowband FFT was used.

B. Results

1. Within-language comparisons: Within and across speaking styles

Figure 1 presents the target formant frequency data (F1/F2 bark values at vowel midpoint) for the 120 tokens of Japanese vowels produced in citation form (left panel) and the 120 tokens produced in sentence form (right panel). Ellipses surround all 12 tokens of each vowel (4 speakers × 3 tokens); tokens of the long vowels are represented by filled circles, and tokens of the short vowels are represented by open circles.

[FIG. 1. Formant 1/formant 2 (bark) plots for ten Japanese vowels produced by four male speakers in [hVba] contexts in citation form (left panel) and sentence form (right panel): 12 tokens per vowel. Filled circles are for 2-mora vowels; open circles are for 1-mora vowels.]

As Fig. 1 shows, across the five long/short Japanese vowel pairs there was little spectral overlap in either speaking style, despite considerable acoustic variability within and across speakers. Deviant tokens were found for the low vowels: for [a], one each in the citation (by speaker 1) and sentence conditions (by speaker 2); for [aa], two in the citation condition and one in the sentence condition (both by speaker 2). However, since these tokens (and all others) were readily identified by a native Japanese listener as the intended vowels, they were considered extreme cases of token variability and were not eliminated from the analysis. Long/short pairs of the same vowel quality overlapped considerably, especially for the sentence materials. Thus, as predicted, duration appears to play an important role in separating these contrasting vowel categories.

The duration ratios of Japanese long to short vowels (L/S ratios) averaged 2.9 for both speaking styles (ranging from 2.6 to 3.4 across the five vowel pairs; see the Appendix for detail). That is, long vowels in both forms were, on average, three times as long as short vowels. Note that these L/S ratios for Japanese vowels are greater than those for the German vowels reported in Strange et al. (2004) (average L/S ratio = 1.9 for citation form, 1.5 for sentence form). As previously reported, average L/S ratios for AE vowels were markedly smaller than for Japanese vowels (1.3 for both citation and sentence forms).

In order to statistically evaluate the above observations and to quantify the spectral overlap among vowels and the contribution of duration in further differentiating vowel categories, two series of discriminant analyses were performed for each language corpus. The first analysis used spectral parameters alone (F1 and F2),2 and the second analysis used spectral parameters plus vocalic duration as input variables. In these analyses, "correct classification" was defined as the posterior classification of each token as the speaker-intended vowel category. For the formant values, as in the previous studies (Strange et al., 2004; 2005; 2007), only the measurements at vowel midpoint were entered. Table I presents the results for the Japanese citation and sentence forms (also depicted in Fig. 1) and for the comparable AE corpus. For both languages, the values in the first two rows represent the within-condition differentiation of vowels in the relevant input sets. The third row shows the results of the cross-condition analyses (citation as input set, sentence as test set).

TABLE I. Percent correct classification of within-form and cross-form discriminant analyses based on spectral parameters alone (F1/F2 in bark values; left column) and on spectral parameters plus vocalic duration (right column).

                                F1/F2 (%)    F1/F2 + duration (%)
  Japanese corpora
    Citation form                  77               98
    Sentence form                  65              100
    Sentence → citation            62               98
  American English corpora
    Citation form                  85               91
    Sentence form                  86               95
    Sentence → citation            78               89

As these overall correct classification rates indicate, the ten Japanese vowels were not well differentiated by spectral parameters alone, especially for the sentence materials. However, almost all of the misclassifications were within long/short vowel pairs (25 out of 28 cases for citation form and 41 out of 42 cases for sentence form). Therefore, when vowel duration was included as a parameter, classification improved to almost perfect. Thus, the Japanese five spectral categories are well differentiated even when variability due to speaker differences is included in the corpus, and with the use of duration, those five spectral categories are further separated into ten distinctive vowel categories.
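The measurement and conversion steps underlying Fig. 1 and Table I can be illustrated with a short sketch. It is a hedged illustration only, not the SoundScope procedure described above: it assumes the praat-parselmouth package for formant tracking, hand-marked vowel onset and offset times, and the Traunmüller (1990) formula for the hertz-to-bark conversion; the file name and times in the example call are hypothetical.

    # Illustrative sketch (not the authors' procedure): midpoint formant measurement
    # and Hz-to-bark conversion for one token.
    import parselmouth

    def hz_to_bark(f_hz):
        # Traunmüller (1990) critical-band-rate conversion (assumed here).
        return 26.81 * f_hz / (1960.0 + f_hz) - 0.53

    def midpoint_formants_bark(wav_path, vowel_onset, vowel_offset):
        """Return (F1, F2) in bark at the 50% temporal point, plus vocalic duration in ms."""
        sound = parselmouth.Sound(wav_path)
        formant = sound.to_formant_burg(maximum_formant=5000.0)  # ceiling for male speakers
        duration_ms = (vowel_offset - vowel_onset) * 1000.0
        t_mid = vowel_onset + 0.5 * (vowel_offset - vowel_onset)  # 50% point (midsyllable)
        f1 = formant.get_value_at_time(1, t_mid)
        f2 = formant.get_value_at_time(2, t_mid)
        return hz_to_bark(f1), hz_to_bark(f2), duration_ms

    # Example call with hypothetical segmentation times for one [hVba] token.
    f1_bark, f2_bark, dur_ms = midpoint_formants_bark("speaker1_hiiba_citation.wav", 0.112, 0.291)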

For the comparable analyses for the AE vowels, spectral differentiation (F1/F2 bark values alone) of the 11 vowels was somewhat better than for the Japanese vowels, reflecting the fact that phonetically long and short AE vowels differ spectrally. As reported in the Strange et al. (2004) study, many of the misclassifications of AE tokens included confusions between spectrally adjacent pairs [iː/eɪ, eɪ/ɪ, eɪ/ɛ, æː/ɛ, ɑː/ʌ, ɔː/oʊ, oʊ/ʊ, uː/ʊ] and between [ɑː] and [ɔː]. When duration was included as an input variable, correct classification rates for both forms improved, but to lesser degrees than for Japanese.

The next series of analyses was performed to quantify cross-condition similarity. When the sentence materials were classified using the parameter weights and centers of gravity established for the citation materials (F1/F2 bark values), only 62% of the Japanese vowels were correctly classified. However, when both spectral and temporal parameters were included, only two tokens were misclassified, as vowels of the same temporal categories ([ee] as [ii] and [ɯ] as [o]). Note that the absolute durations of vowels in sentences were, on average, 15% shorter than those in citation form. These results indicate that the spectral differences among the five vowel categories are maintained across speaking styles and that, despite the changes in vowel duration between the two speaking styles, the temporal differences between the 1- and 2-mora Japanese vowels are still large enough to differentiate the ten categories.

As for the cross-condition analysis of the AE vowels, it was predicted that the inclusion of duration in the analysis should help differentiate the 11 AE vowels in sentence form, but to a lesser degree than in Japanese, because there was less spectral overlap among AE vowels in each speaking style. As predicted, the cross-condition analysis revealed that the rate of correct classifications improved by 11% when duration was included, but 15 cases remained misclassified. These 15 cases include the following: six cases of long/short confusions (number of misclassified tokens): [ɪ] as [eɪ] (1), [ɛ] as [æː] (1), [ʌ] as [ɑː] (1), [oʊ] as [ʊ] (1), [uː] as [ʊ] (2); eight long/long confusions: [eɪ] as [iː] (2), [ɑː] as [ɔː] (4), between [ɔː] and [oʊ] (2); and one short/short confusion: [ʊ] as [ʌ].

2. Cross-language spectral similarity

Additional discriminant analyses were performed in which the spectral parameters for the 11 AE vowels served as the input set, and those for the ten Japanese vowels served as the test set. That is, these analyses determined the spectral similarity of Japanese vowels to AE vowels using the classification rules established for the AE vowel categories. Separate analyses were performed for the two speaking styles. Figure 2 displays the F1/F2 bark plots of the Japanese and AE vowels. Distributions of vowels in the two languages are indicated by the ellipses (solid for Japanese and dashed for AE) that surround all 12 tokens of each vowel. Individual Japanese tokens are represented by filled circles; for clarity, individual AE tokens are not shown. Panels on the left are for citation materials, and the panels on the right are for sentence materials. The two top panels are for long Japanese vowels, and the two bottom panels are for short Japanese vowels. The same dashed ellipses for the 11 AE vowels in each form are shown in the background. As can be seen, most Japanese mid and short, low vowels were higher (lower F1 values) on average3 and revealed greater within-category variability than the comparable AE vowels, while the front and back high vowels were spectrally quite similar across languages.

[FIG. 2. Formant 1/formant 2 (bark) plots for 2-mora (upper panels) and 1-mora (lower panels) Japanese vowels produced by four male speakers in [hVba] in citation form (left panels) and sentence form (right panels), superimposed on 11 AE vowels (dotted ellipses). Solid ellipses encircle all 12 tokens of a Japanese vowel; individual AE tokens are not plotted. Panels: A. Long Japanese vowels in citation form; B. Short Japanese vowels in citation form; C. Long Japanese vowels in sentence form; D. Short Japanese vowels in sentence form.]

In the cross-language discriminant analysis for the citation materials, the citation AE vowel corpus was used as the input set (F1/F2 bark values as input parameters); then all tokens of the Japanese citation corpus were classified using the rules established for these AE categories. The results are summarized in Table II, where the Japanese vowels are grouped in terms of duration (1-mora and 2-mora) and tongue height (high, mid, low), as shown in the first column. The third and fourth columns display the modal AE vowels and the number of Japanese vowel tokens (max = 12) classified as most similar to the respective AE category. In order to make the acoustic classification consistent with the dialect profile of the listeners who performed the perceptual assimilation task in study 2, and to make results comparable between studies, Japanese tokens classified into AE [ɑː] or [ɔː] were pooled ([ɑː-ɔː]). The last two columns show the classification of the remaining tokens.

TABLE II. Acoustic similarity (F1/F2 in bark values) of Japanese and American English (AE) vowels: [hVb(a)] syllables produced in citation form.

                       Modal classification          Other categories
  Japanese vowel       AE vowel   No. of tokens      AE vowel   No. of tokens
  2-mora
    High   ii          iː         12
           ɯɯ          uː         12
    Mid    ee          eɪ          7                 ɪ           5
           oo          oʊ         12
    Low    aa          ɑː-ɔː       9                 ʌ           3
  1-mora
    High   i           iː         12
           ɯ           uː          9                 ʊ           3
    Mid    e           ɪ           8                 eɪ          2
                                                     iː          1
                                                     ʊ           1
           o           oʊ          9                 uː          3
    Low    a           ʌ           7                 ɑː-ɔː       2
                                                     ɛ           1
                                                     oʊ          1
                                                     ʊ           1
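A compact way to produce a summary of this kind is to cross-tabulate the speaker-intended Japanese vowels against the AE categories assigned by the discriminant rules. The sketch below is illustrative only; it builds on the hypothetical variables of the earlier discriminant-analysis sketch (jp_citation_labels and japanese_as_AE are assumed names), and the pooling of AE [ɑː] and [ɔː] mirrors the treatment described above.

    # Illustrative tabulation of the cross-language classifications (not the authors' code).
    import pandas as pd

    classification = pd.DataFrame({
        "japanese_vowel": jp_citation_labels,        # intended Japanese vowel per token (assumed)
        "assigned_AE_category": japanese_as_AE,      # AE category from the AE-trained rules (assumed)
    })

    # Pool the AE [ɑː] and [ɔː] classifications, as in Table II.
    classification["assigned_AE_category"] = classification["assigned_AE_category"].replace(
        {"ɑː": "ɑː-ɔː", "ɔː": "ɑː-ɔː"})

    # Counts of Japanese tokens (max 12 per vowel) classified into each AE category.
    counts = pd.crosstab(classification["japanese_vowel"],
                         classification["assigned_AE_category"])

    # Modal AE classification for each Japanese vowel, as listed in the "Modal" column.
    modal = counts.idxmax(axis=1)
    print(counts)
    print(modal)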

For the citation materials, 52 out of 60 tokens of 2-mora vowels were classified as spectrally comparable long AE vowels [iː, eɪ, ɑː-ɔː, oʊ, uː]. Some tokens of the mid, front [ee] and low [aa] were spectrally similar to short AE categories ([ɪ] and [ʌ], respectively). In contrast, only 21 out of 60 tokens of 1-mora vowels were classified as spectrally comparable short AE vowels [ɪ, ɛ, ʌ, ʊ], suggesting that they are spectrally more similar to AE long vowels. The Japanese 1-mora mid, front and low vowels were classified into two or more spectrally adjacent AE vowels, indicating that they are intermediate among several AE categories.

Table III presents the results of the discriminant analysis (F1/F2 bark values) with the AE sentence materials as the input set and the Japanese sentence materials as the test set. The results resembled those for the citation materials: 47 out of 60 tokens of 2-mora Japanese vowels were classified into spectrally comparable long AE vowels, whereas only 12 out of 60 tokens of 1-mora vowels were classified as spectrally comparable short AE vowels. Similar to citation form, the high Japanese vowels [ii, i, ɯɯ, ɯ] were consistently classified as spectrally similar to the long AE high vowels [iː, uː], while the mid, front [ee, e] and low vowels [aa, a] were classified as most similar to two or more AE categories. The mid, back vowel [o] in this form was more consistently classified as AE [oʊ].

TABLE III. Acoustic similarity (F1/F2 in bark values) of Japanese and American English (AE) vowels: [hVb(a)] syllables produced in sentences.

                       Modal classification          Other categories
  Japanese vowel       AE vowel   No. of tokens      AE vowel   No. of tokens
  2-mora
    High   ii          iː         12
           ɯɯ          uː         12
    Mid    ee          ɪ           9                 eɪ          3
           oo          oʊ         12
    Low    aa          ɑː-ɔː       8                 ʌ           4
  1-mora
    High   i           iː         10                 eɪ          2
           ɯ           uː         12
    Mid    e           ɪ           8                 eɪ          3
                                                     ʊ           1
           o           oʊ         12
    Low    a           ɑː-ɔː       8                 ʌ           3
                                                     oʊ          1

C. Discussion

The results of the acoustic comparisons between Japanese and AE vowels suggested the following.

(1) Regardless of speaking style, Japanese vowels were consistently classified into five nonoverlapping spectral categories. These five spectral categories were further differentiated into short (1-mora) and long (2-mora) vowels by vowel duration. Altogether, classification was almost perfect for all ten vowel categories when F1 and F2 values at vowel midpoint and vowel duration were included in the analysis.

(2) AE vowels produced in both speaking styles were differentiated fairly well by mid-syllable formant frequencies, and some vowel pairs were further differentiated by vocalic duration. However, overall correct classification was not as good as for the Japanese vowels, indicative of a more crowded vowel space.

(3) Within-language cross-condition discriminant analyses revealed that both AE and Japanese vowels produced in sentences share similar spectral and temporal characteristics with the citation materials in this phonetic context, indicating that despite the effects of speaking style, the acoustic distinctions among vowel categories are maintained for both Japanese and AE.

(4) Cross-language discriminant analyses using only the spectral parameters indicated that the Japanese high vowels [ii, i], [ɯɯ, ɯ] and mid, back vowels [oo, o] in both forms were spectrally similar to the AE long vowels [iː], [uː], and [oʊ], respectively, while the mid, front and low Japanese vowels in both speaking styles were spectrally more intermediate between two or more AE categories. The Japanese 1-mora vowels [e, a] in citation form were especially variable with respect to their spectral similarity to AE vowels.

In the next section, the results from the perceptual assimilation experiment are presented and discussed in relation to the acoustic comparisons between Japanese and AE vowels presented in this section.

III. PERCEPTUAL ASSIMILATION OF JAPANESE VOWELS BY AE LISTENERS

A listening experiment was performed using the Japanese corpus analyzed in Sec. II. Native AE speakers with no experience with the Japanese language served as listeners and performed a perceptual assimilation task that involved two judgments: categorization of the Japanese vowels using AE vowel category labels and rating of the category goodness of the Japanese vowels as exemplars of the chosen AE vowel categories. Based on the results of the acoustic analyses, it was predicted that perceptual assimilation of the Japanese high
vowels [ii, i], [ɯɯ, ɯ] and mid, back vowels [oo, o] would not be affected by speaking style and that they would be perceived as good exemplars of the AE long vowels [iː], [uː], and [oʊ] unless AE listeners detected and responded on the basis of the considerable temporal differences between the 1- and 2-mora pairs. In comparison, perceptual assimilation of Japanese [e, ee] and [a, aa] was predicted to be less consistent (and possibly with poorer category goodness ratings) within and across listeners, reflecting the fact that, spectrally, these vowels straddled more than one AE category. In addition, effects of speaking style were expected to be observed for [ee, e, aa, a].

A. Method

1. Listeners

Twelve undergraduate students (two males and ten females, mean age = 26.2 yr) at USF served as the listeners for extra credit points. They were recruited from introductory phonetics or phonology courses offered in the Department of Communication Sciences and Disorders. Some of them had had at least one semester of phonetics by the time of testing. All were fluent only in American English, and none of them had lived in a foreign country for an extended length of time. All reported that they had normal hearing. Among these listeners, ten had lived in Florida for more than ten years, and two had lived in the northeastern United States for more than ten years. Four additional listeners were tested, but their data were excluded either because they failed to return for the second day of testing or because they were bilinguals.

2. Stimulus materials

Stimulus materials were the 240 tokens of the ten Japanese vowels (10 vowels × 4 speakers × 2 utterance forms × 3 repetitions) acoustically analyzed in study 1.

3. Apparatus and instruments

All listening sessions, including familiarization, were run individually in a sound booth in the Speech Perception Laboratory at USF. Stimuli were presented through headphones (STAX SR Lambda semi-panoramic sound electrostatic ear speaker). The listeners adjusted the sound volume for their comfort. For all sessions, stimulus presentation was controlled by a HYPERCARD 2.2 stack on a Macintosh Quadra 660AV computer with a 14 in. screen. The HYPERCARD stack used for the listening test had two components. The first component was for categorization and displayed 11 buttons labeled with phonetic symbols (IPA) for the 11 AE vowels [iː, ɪ, eɪ, ɛ, æː, ɑː, ʌ, ɔː, oʊ, ʊ, uː] and keywords in [hVd] context (heed, hid, hayed, head, had, hod, hud, hawed, hoed, hood, who'd, respectively). The second component appeared after the listener categorized a stimulus vowel and displayed a seven-point Likert scale (1 = foreign, 7 = English) on which the listeners judged the category goodness of the stimulus vowel in the chosen AE category.

4. Procedures

A repeated-measures design was employed in which each listener was presented all tokens from all four speakers in both utterance forms. All listeners were tested on two days: one for citation materials and the other for sentence materials. The order was counterbalanced across listeners.

a. Familiarization. Before testing on day 1, all listeners completed an informed consent form and a language background questionnaire. Then, a brief tutorial on the IPA and two task familiarization sessions were given. The tutorial on the IPA provided a brief description of the relationship between the IPA symbols for the 11 AE vowels and their sounds, using the 11 [hVd] keywords. A HYPERCARD stack provided audio and text explanations.

Task familiarization had two parts: categorization only, and categorization plus category goodness judgment. Stimuli for the categorization-only familiarization were [hə.C1VC2] disyllables, where the C1-C2 combinations were [b-d, b-t, d-d, d-t, g-d, g-t] and the V was one of the 11 AE vowels. These disyllables were embedded in a carrier sentence, "I hear the sound of [hə.C1VC2] some more." The fifth author recorded the 11 vowels in at least one of the C1-C2 contexts, and an additional 44 tokens were recorded by a male native speaker of AE who was not one of the speakers for the AE stimuli used in study 1. In the categorization-only part, these 55 tokens were presented in four blocks. The first block presented the 11 tokens from the female speaker, the next two blocks presented 11 tokens from the male speaker randomly selected from his 44 tokens, and the last block presented the remaining 22 tokens from the male speaker. The listeners were asked to indicate which of the 11 AE vowels they heard by clicking on one of the response buttons on the screen. Feedback was provided by the computer program after each categorization response. When a response was incorrect, the experimenter sitting beside the listener explained why the response was incorrect.

In the second part, familiarization for the category goodness judgment was given using 56 German sentence-form tokens randomly chosen from the total of 560 tokens (14 vowels × 4 speakers in 5 consonantal contexts) used in the Strange et al. (2005) study. A short task description was provided by a HYPERCARD stack, and the listeners practiced on the German tokens using the same interface as in the testing. In each trial, the listeners were asked to categorize a German vowel in terms of the 11 AE categories and then indicate its category goodness in the chosen AE category on the seven-point Likert scale (1 = foreign, 7 = English). No feedback was provided in this part.

b. Test. Each listener was tested on two days. On day 1, a testing session was given following the task familiarization. On day 2, the listeners were given only the testing session, which followed the same procedure as the testing on day 1. Half of the listeners heard the citation materials on day 1 and the sentence materials on day 2; the presentation order was reversed for the remaining listeners.

In a testing session, stimuli were blocked by speaker, and a total of four blocks of 120 trials (10 vowels × 3 tokens × 4 repetitions) were presented for a speaking style. Block order was counterbalanced across the listeners. An opportunity for a short break was provided after the second block. In each trial, the same stimulus was presented twice; after the first presentation of a stimulus, the listeners categorized the Japanese vowel by choosing one of the 11
AE responses, then the same stimulus was presented again, and the listener rated its category goodness in the chosen AE category on a seven-point Likert scale (1 = foreign, 7 = English). The listeners were allowed to change their categorization response before making a rating response but were discouraged from doing so. A new trial began after the rating response was completed; thus, testing was listener paced.

No specific instruction was given in familiarization or test regarding the use of vowel duration, the number of spectral categories in Japanese, or differences between the AE and Japanese vowel inventories, since the focus of the current study was to examine how naïve L2 listeners classify foreign speech sounds using L1 categories.

B. Results and discussion

First, for the category goodness responses, medians of the ratings across all listeners were obtained for all AE categories that received more than 10% of the 576 total responses for a Japanese vowel. As predicted from the results of the acoustic comparisons, this analysis yielded higher overall median ratings (6) for [ii, i], [ɯɯ, ɯ], and [oo, o] in both conditions and for [ee] in the sentence condition, and slightly lower median ratings (5) for [e, aa, a] in both conditions and for [ee] in the citation condition.

To analyze the categorization results, each listener's responses were transferred to a spreadsheet and confusion matrices were constructed for (1) individual speakers, (2) individual listeners, and (3) the two speaking styles. Examination of these matrices revealed some differences among listeners for some vowels (not associated with dialect) and between speaking styles.

Table IV presents the group data, in which categorization responses were pooled across the speakers, repetitions, and listeners. In order for the results to be consistent with the majority of the listeners' dialect profile, the AE response categories [ɑː] and [ɔː] were pooled as [ɑː-ɔː] in this analysis.4 Part (A) of the table displays the categorization for citation materials, and part (B) presents the results for sentence materials. The rows list the Japanese vowels grouped by the temporal categories (2-mora or 1-mora); the cells give percentages of responses, with the modal perceptual response listed first for each Japanese vowel (the spectrally most similar AE vowel for each Japanese vowel, as determined by the cross-language discriminant analyses presented in Sec. II, can be read from Tables II and III).

TABLE IV. Perceptual assimilation patterns: categorization responses, expressed as percentages of total responses summed over speakers and listeners, for citation-form (A) and sentence-form (B) materials. The modal AE response is listed first; smaller responses not attributable to a specific AE category are pooled as "other."

  (A) Citation form
    2-mora   ii: iː 99, other 1
             ee: eɪ 94, other 7
             aa: ɑː-ɔː 89, other 11
             oo: oʊ 99, other 2
             ɯɯ: uː 92, other 8
    1-mora   i: iː 95, other 5
             e: eɪ 76, ɪ 16, ɛ 8
             a: ɑː-ɔː 57, ʌ 39, other 3
             o: oʊ 95, other 4
             ɯ: uː 91, other 9

  (B) Sentence form
    2-mora   ii: iː 99
             ee: eɪ 97, other 3
             aa: ɑː-ɔː 96, other 4
             oo: oʊ 98, other 1
             ɯɯ: uː 96, other 4
    1-mora   i: iː 98, other 2
             e: eɪ 48, ɛ 28, ɪ 23, other 1
             a: ɑː-ɔː 77, ʌ 21, other 2
             o: oʊ 95, other 4
             ɯ: uː 89, other 10
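The pooling described above can be illustrated with a short sketch. This is not the original spreadsheet analysis; it assumes a trial-level table with hypothetical column names and shows how categorization responses might be collapsed into percentage matrices like Table IV and how median goodness ratings could be computed for cells exceeding the 10% response criterion.

    # Illustrative analysis sketch (not the original spreadsheet procedure).
    # Assumes `trials` is a pandas DataFrame with one row per trial and hypothetical
    # columns: 'style' (citation/sentence), 'jp_vowel', 'ae_response', 'goodness' (1-7).
    import pandas as pd

    def assimilation_matrix(trials, style):
        """Percentages of AE responses for each Japanese vowel, pooled over speakers,
        repetitions, and listeners, as in Table IV."""
        sub = trials[trials["style"] == style]
        counts = pd.crosstab(sub["jp_vowel"], sub["ae_response"])
        return 100.0 * counts.div(counts.sum(axis=1), axis=0)

    def median_goodness(trials, style, criterion=10.0):
        """Median goodness rating for each (Japanese vowel, AE response) cell that
        received more than `criterion` percent of the responses."""
        sub = trials[trials["style"] == style]
        pct = assimilation_matrix(trials, style).stack()
        med = sub.groupby(["jp_vowel", "ae_response"])["goodness"].median()
        return med[pct.reindex(med.index) > criterion]

    citation_matrix = assimilation_matrix(trials, "citation")
    citation_goodness = median_goodness(trials, "citation")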

both speaking styles were also assimilated to AE long cat- poral similarity to 1-mora 关e兴; other listeners may have cho-
egories 关ib, ub, o*兴, respectively 共overall percentages from sen 关␧兴 because 1-mora 关e兴 共53 ms兲 was temporally more
89% to 98%兲. These patterns were consistent with their spec- similar to 关␧兴 共98 ms兲 than 关e(兴 共132 ms兲. Still, it is not clear
tral similarity to AE categories, except for 关ee兴 in sentence as to why the majority of listeners responded with spectrally
context. This suggests that AE listeners disregarded the tem- and temporally less similar AE vowels in perceptually as-
poral differences between these 1- and 2-mora Japanese similating these Japanese vowels.
vowels and perceived them as equally good exemplars of
long AE vowels.
By contrast, the assimilation patterns for the other Japa-
nese vowels were not straightforward. First, contrary to pre- IV. GENERAL DISCUSSION
diction from acoustical similarity, Japanese 2-mora 关ee兴 did In the present study, the similarity of Japanese vowels to
not show any influence of speaking style and was perceived the AE vowel system was examined through acoustic com-
as most similar to AE 关e(兴 in both conditions. Even though parisons and perceptual assimilation tasks, using stimulus
some tokens of citation-form 关ee兴 were determined spectrally materials produced by four male speakers from each lan-
to be most similar to AE [ɪ] (5 out of 12 tokens, see Table II), the influence of token variability was not observed in its assimilation pattern (94% to [eɪ]). As for sentence-form [ee], 9 out of 12 tokens were spectrally similar to AE [ɪ] (see Table III), but AE listeners were not influenced by token variability or spectral similarity, and almost unanimously assimilated it to AE [eɪ]. However, for sentence-form [ee], the average vocalic duration was 140 ms, which was closer to that of AE [eɪ] (132 ms) than of [ɪ] (94 ms). Therefore, although AE listeners ignored vowel duration for [ii, i, ɯɯ, ɯ, oo, o], temporal similarity may have contributed to AE listeners’ perceptual assimilation of Japanese [ee].

As expected, perceptual assimilation patterns for 1-mora [a] and [e] in both forms included more than one AE category. The modal responses were long AE vowels ([ɑː-ɔː] and [eɪ], respectively), but relatively large proportions of other responses were given ([ʌ] for [a] and [ɪ, ɛ] for [e]). As indicated in Table IV, these perceptual assimilation patterns were not entirely predictable from spectral similarity comparisons. Even when vocalic duration was considered, these assimilation patterns, unlike that of sentence-form [ee], suggest the influence of other factors.

In order to account for the less consistent overall assimilation patterns observed for [e] and [a] in both speaking styles, further examination was performed to discover whether these patterns were due to individual differences across AE listeners or to within-listener inconsistency. The results revealed that a few listeners consistently made nonmodal responses for these vowels, and their responses can be classified into either spectra-based or duration-based categorizations. Spectra-based patterns were found for citation-form [e] and [a]: three listeners (2, 3, 11) made 83% of the [ɪ] responses for citation-form [e]. Similarly, for citation-form [a], 52% of the [ʌ] responses were made by three listeners (1, 4, 10). Although sentence-form [a] was spectrally most similar to AE [ɑː-ɔː], three listeners (1, 2, 10) perceived this vowel as most similar to the AE short vowel [ʌ], suggesting the influence of duration. The response pattern for sentence-form [e] can be considered partially spectra-based. Six listeners made 68% of the nonmodal responses ([ɪ] or [ɛ]; see footnote 5). Of these nonmodal responses, one listener (1) responded with AE [ɪ] and [ɛ] equally often (44%); two listeners (2, 3) made [ɪ] responses most often, whereas [ɛ] was the modal response for the three remaining listeners (8, 10, 11). The [ɪ] response was consistent with the spectral, and possibly temporal, similarity between sentence-form [e] and AE [ɪ].

…language. Consonantal context was held constant and CV and VC coarticulatory effects were considered minimal, but the target vowels were produced in citation form and in a carrier sentence to examine the influence of token variability associated with speaking style as well as individual speakers.

Results of the acoustic analysis (Sec. II) showed that Japanese has five distinctive spectral vowel categories that were further separated into five long-short pairs. In contrast, although the 11 AE categories were differentiated fairly well by spectral measures, the inclusion of stimulus duration improved classification only slightly. Token variability due to speaking style was small for both Japanese and AE. Cross-language acoustic comparisons using only spectral measures (F1 and F2 at vowel midpoint) indicated that Japanese vowels [ii, i, ɯɯ, ɯ, oo, o] in both speaking styles were most similar to long AE vowels [iː, uː, oʊ], respectively, whereas classification of [ee, e, aa, a] straddled more than one AE category. Thus, it was predicted that, if AE listeners rely primarily on the spectral cues, the perceptual assimilation patterns should reflect these acoustic similarity patterns, with highly consistent categorization of long and short [ii/i, oo/o, ɯɯ/ɯ] but less consistent assimilation of long and short [ee/e, aa/a].

Perceptual assimilation results revealed that the five 2-mora Japanese vowels [ii, ee, aa, oo, ɯɯ] produced in both speaking styles were consistently assimilated to comparable long AE vowels [iː, eɪ, ɑː-ɔː, oʊ, uː], respectively. Therefore, according to SLM (Flege, 1995), Japanese 2-mora vowels can be considered perceptually identical or highly similar to AE long vowels. On the other hand, the Japanese 1-mora vowels showed more varied patterns. The high vowels [i, ɯ] and the mid back vowel [o] were consistently assimilated to AE vowels [iː, uː, oʊ], respectively. This is in accordance with the results of spectral comparisons, and suggests that AE listeners perceived these Japanese vowels as identical or highly similar to their spectrally most similar AE long vowels and disregarded their shorter duration. Considering that their 2-mora counterparts [ii, ɯɯ, oo] were also assimilated to these same AE categories, the assimilation patterns of these long/short Japanese pairs can be considered examples of a “single-category pattern” in PAM (Best, 1995).

As for the remaining 1-mora vowels [e, a], the responses from a few listeners were accounted for by cross-language acoustic similarity (spectral, temporal, or both). On the other hand, the remaining AE listeners assimilated these Japanese
vowels into the same AE categories [eɪ, ɑː-ɔː] as for their 2-mora counterparts and as equally good exemplars, suggesting a “single-category” assimilation pattern. Unlike the minority responses, these responses could not be explained by cross-language acoustic similarity patterns.

Then what would explain these majority responses? Recall that the perceptual task used in the present study has been used successfully to assess assimilation of many L2 vowels by listeners from L1s with smaller vowel inventories (Strange et al., 1998; 2001; 2004; 2005). In those studies, cross-language discriminant analyses predicted perceptual assimilation patterns well for most, but not all non-native vowels that had counterparts in the L1. However, for some “new” vowels, such as the front rounded vowels of German that are not distinctive in AE, context-specific spectral similarity did not predict perceptual similarity patterns by naïve AE listeners. Since all five Japanese vowel qualities can be considered to be present in AE, it was hypothesized in the present study that the same perceptual assimilation task should effectively reveal perceptual similarity patterns based primarily on cross-language spectral similarity. The present results suggested otherwise in some cases. For these failures, at least two explanations can be offered. One concerns the method of acoustic comparison, and the other is associated with the assumptions of the perceptual task.

As Hillenbrand et al. (1995) and others have shown in AE, formant trajectories for many spectrally adjacent short/long vowel pairs tend to be in the opposite directions in the vowel space, with the lax vowels (including long [æ]) moving toward more central positions, while tense vowels move toward peripheral positions in vowel space. These formant movements have been shown to affect vowel perception by AE listeners (cf. Nearey and Assmann, 1986; Nearey, 1989; Strange et al., 1983). Therefore, even though both long and short Japanese vowels tend to be monophthongal, if they are spectrally similar to AE [eɪ, ɑː-ɔː] at some time point other than the vocalic midpoint, AE listeners’ perceptual assimilation might be influenced by such similarity. Furthermore, as PAM hypothesizes, if listeners perceive articulatory gestures in the acoustic signal, similarity in the formant trajectories may explain the perceptual results. In order to test this hypothesis, Japanese 1-mora vowels [e] and [a] were subjected to a series of cross-language discriminant analyses using F1 and F2 values at three time points, namely, 25%, 50%, and 75% points of vowel duration. Separate analyses were performed for each speaking style using four combinations of three time points: 25% only, 75% only, 25% + 75%, and 25% + 50% + 75%. Results showed that none of the four combinations yielded different results from those reported in Tables II and III. Therefore, although there might be some other subtle cues that have influenced perceptual assimilation, it was deemed reasonable to exclude inadequacy or insufficiency of the present acoustic comparison in failing to predict perceptual assimilation patterns for these vowels.
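To make the procedure concrete, the following minimal sketch shows how a cross-language discriminant analysis of this kind can be set up. It is an illustration only, not the analysis reported here: it assumes scikit-learn's LinearDiscriminantAnalysis as the classifier and uses randomly generated placeholder tokens in place of the measured F1/F2 values.

    # Schematic cross-language discriminant analysis (illustration only;
    # placeholder data, not the measured productions analyzed in the study).
    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    rng = np.random.default_rng(0)

    AE_CATEGORIES = ["i:", "I", "eI", "E", "ae:", "A:", "O:", "oU", "U", "u:", "^"]
    TIME_INDEX = {"25%": 0, "50%": 1, "75%": 2}   # indices into the sampled trajectory

    def make_tokens(n_per_cat, categories):
        """Generate fake (F1, F2) values at 25%, 50%, and 75% of vowel duration."""
        tokens, labels = [], []
        for k, cat in enumerate(categories):
            target = np.array([300.0 + 40.0 * k, 2300.0 - 120.0 * k])  # placeholder vowel target
            for _ in range(n_per_cat):
                traj = target + rng.normal(0.0, 30.0, size=(3, 2))     # 3 time points x (F1, F2)
                tokens.append(traj)
                labels.append(cat)
        return np.array(tokens), np.array(labels)

    def classify_japanese(ae_tokens, ae_labels, jp_tokens, points):
        """Train on AE tokens at the chosen time points; assign each Japanese
        token to its most similar AE category."""
        idx = [TIME_INDEX[p] for p in points]
        X_ae = ae_tokens[:, idx, :].reshape(len(ae_tokens), -1)
        X_jp = jp_tokens[:, idx, :].reshape(len(jp_tokens), -1)
        lda = LinearDiscriminantAnalysis().fit(X_ae, ae_labels)
        return lda.predict(X_jp)

    ae_tokens, ae_labels = make_tokens(12, AE_CATEGORIES)      # 12 tokens per AE category
    jp_tokens, _ = make_tokens(12, ["e", "a"])                  # placeholder Japanese 1-mora tokens

    for combo in [("25%",), ("75%",), ("25%", "75%"), ("25%", "50%", "75%")]:
        assigned = classify_japanese(ae_tokens, ae_labels, jp_tokens, combo)
        print(combo, dict(zip(*np.unique(assigned, return_counts=True))))

Any classifier trained only on the native-language tokens would serve the same purpose; linear discriminant analysis is used in the sketch simply because discriminant analysis is the method named in the text.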
Turning to the second hypothesis, considering that both [e] and [a] were assimilated primarily to AE categories [eɪ, ɑː-ɔː] that are closer to the periphery of the vowel space than the spectrally more similar [ɪ, ʌ] are (see Fig. 2), the present results suggest the possible existence of other mechanisms that are not currently included in PAM or SLM. For example, although it was not intended to make such a claim, the present results resemble the “peripherality bias” that Polka and Bohn (2003) suggested to account for asymmetries in infant vowel discrimination. They summarized the results of their own studies and others and showed that the asymmetries in vowel perception by infants exhibit a bias for better discrimination in the category-change paradigm when the stimuli changed from a more central to a more peripheral vowel (e.g., [ʊ] to [o], [ɛ] to [æ], [y] to [u], etc.). They concluded that this bias toward peripheral vowels may have an important role as perceptual anchors for language acquisition. Assuming this bias is present into adulthood and guides non-native speech learning, the present results for Japanese 1-mora vowels, including [e, a], might be explained by this perceptual bias toward more peripheral vowels as follows. Judging from the acoustic similarity and the observed perceptual assimilation patterns, it can be hypothesized that AE listeners compared the incoming 1-mora Japanese vowels against both long and short AE vowels, but, due to their lack of attunement to duration cues and the peripherality bias, AE listeners chose more peripheral vowels as most similar to the Japanese vowels (see footnote 6).

Finally, the present study compared Japanese and AE vowels produced in a relatively uncoarticulated [hVb] context in both citation and sentence conditions. More research is underway that compares how cross-language acoustic and perceptual similarities may vary when vowels are coarticulated in different consonantal contexts, at different speaking rates, and in different prosodic environments. Research on AE listeners’ perceptual assimilation of North German and Parisian French vowels (Strange et al., 2005) suggests that perceptual assimilation patterns often reflect context-independent patterns of acoustic similarity. For instance, AE listeners assimilate front, rounded vowels to back AE vowels, rather than to front, unrounded AE vowels, even in contexts in which back AE vowels are not fronted. However, since back AE vowels are fronted in coronal consonantal contexts (whereas front, unrounded AE vowels are never backed) (Strange et al., 2007), front, rounded vowels can be considered allophonic variations of back AE vowels by AE listeners. In citation-form utterances, AE listeners were able to indicate that these front “allophones” were inappropriate in noncoronal contexts by rating them as poor exemplars of AE back vowels. However, in the sentence condition, front, rounded vowels were considered “good” instances of back AE vowels even when surrounded by labial consonants. Thus, while context-specific spectral similarity did not predict assimilation of front, rounded vowels in noncoronal contexts, context-independent spectral similarity relationships did. Further research on acoustic and perceptual similarity of Japanese and AE vowels is needed to determine whether context-independent spectral similarity patterns might account for perceptual assimilation of Japanese [e, a], as well as the remaining 1-mora vowels.
V. CONCLUSIONS
The present study set out to test whether the perceptual task used in previous studies was also useful in assessing perceptual similarity of non-native vowels from a small vowel inventory by listeners from a relatively large vowel inventory. As in previous studies, the results showed that cross-language acoustic similarity may not always accurately predict perceptual assimilation of non-native vowels. Thus, direct assessments of perceptual similarity relationships will be better predictors of discrimination difficulties by L2 learners according to PAM and SLM. It was hypothesized that the inconsistency between acoustic and perceptual similarity results may suggest the existence of comparison processes that are influenced by listeners’ knowledge about the distributional characteristics of phonetic segments in their native language. Under some stimulus and task conditions, the listeners may be able to compare phonetically detailed aspects of non-native segments, while in others, they may resort to a phonological level of analysis in making cross-language similarity judgments. Further research is needed to investigate under what conditions AE listeners may be able to utilize the large duration differences in Japanese 1-mora and 2-mora vowels when making perceptual similarity judgments and when attempting to differentiate these phonologically contrastive vowel pairs. In general, the findings of the present studies, as well as previous experiments on AE listeners’ perceptual assimilation of German vowels, suggest that vocalic duration differences are often ignored in relating non-native vowels to native categories.

ACKNOWLEDGMENTS

This research has been supported by NIDCD under Grant No. DC00323 to W. Strange and by JSPS under Grant No. 17202012 to R. Akahane-Yamada. The authors are grateful to Katherine Bielec, Mary Carroll, Robin Rodriguez, and David Thornton for their support with HyperCard stack programming, subject running, and data analysis.
APPENDIX: AVERAGE FORMANT FREQUENCIES AND DURATIONS OF JAPANESE AND AMERICAN ENGLISH VOWELS

Mean F1, F2, and F3 (Hz) and vocalic duration (ms) for the Japanese (J) and American English (AE) vowels, by speaking style.

Citation form
J     F1    F2    F3    Duration     AE    F1    F2    F3    Duration
ii    295   2243  3123  138          iː    312   2307  2917  100
ee    443   1982  2563  154          eɪ    472   2062  2660  122
aa    709   1175  2343  165          ɑː    753   1250  2596  109
                                     ɔː    678   1062  2678  132
oo    423   732   2416  154          oʊ    500   909   2643  112
ɯɯ    322   1139  2288  146          uː    348   995   2374  104
                                     æː    730   1568  2519  123
i     317   2077  3027  51           ɪ     486   1785  2573  86
e     437   1785  2430  57           ɛ     633   1588  2553  91
a     615   1182  2289  53           ʌ     635   1189  2619  89
o     430   805   2375  54
ɯ     349   1171  2302  47           ʊ     489   1148  2472  93
Long/short duration ratio: J = 2.9; AE = 1.3

Sentence form
J     F1    F2    F3    Duration     AE    F1    F2    F3    Duration
ii    299   2189  3198  116          iː    303   2336  2961  108
ee    445   1925  2638  140          eɪ    423   2175  2722  132
aa    687   1146  2337  138          ɑː    754   1234  2609  125
                                     ɔː    660   1056  2571  152
oo    434   796   2400  127          oʊ    479   933   2571  126
ɯɯ    319   1128  2361  128          uː    342   1064  2422  115
                                     æː    714   1645  2456  147
i     312   2076  3115  37           ɪ     461   1826  2634  94
e     464   1770  2407  53           ɛ     627   1657  2544  98
a     672   1134  2288  49           ʌ     631   1232  2619  98
o     423   776   2345  47
ɯ     348   1069  2343  38           ʊ     495   1202  2492  107
Long/short duration ratio: J = 2.9; AE = 1.3
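Two of the summary values referred to in the text can be recovered directly from these means. The short sketch below is an added illustration, not an analysis from the study itself: it recomputes the long/short duration ratios from the citation-form means and, as a crude single-point stand-in for the discriminant comparisons, the Euclidean F1-F2 distances showing that 1-mora [e] lies much closer to AE [ɪ] than to AE [eɪ].

    # Illustration only: recompute summary quantities from the citation-form
    # appendix means (the Euclidean-distance comparison is a simplification
    # added here, not a computation reported in the paper).
    import math

    # Japanese vowel durations (ms), citation form
    jp_long = {"ii": 138, "ee": 154, "aa": 165, "oo": 154, "uu": 146}
    jp_short = {"i": 51, "e": 57, "a": 53, "o": 54, "u": 47}

    # AE vowel durations (ms), citation form
    ae_long = {"i:": 100, "eI": 122, "A:": 109, "O:": 132, "oU": 112, "u:": 104, "ae:": 123}
    ae_short = {"I": 86, "E": 91, "^": 89, "U": 93}

    def ratio(long_d, short_d):
        """Mean long duration divided by mean short duration."""
        return (sum(long_d.values()) / len(long_d)) / (sum(short_d.values()) / len(short_d))

    print(f"Japanese long/short ratio: {ratio(jp_long, jp_short):.1f}")   # ~2.9
    print(f"AE long/short ratio:       {ratio(ae_long, ae_short):.1f}")   # ~1.3

    # F1-F2 distance of Japanese 1-mora [e] to AE [I] vs AE [eI], citation form.
    jp_e = (437, 1785)
    ae_I = (486, 1785)
    ae_eI = (472, 2062)

    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])

    print(f"[e] to [I]:  {dist(jp_e, ae_I):.0f} Hz")    # ~49 Hz
    print(f"[e] to [eI]: {dist(jp_e, ae_eI):.0f} Hz")   # ~279 Hz

With these means, the duration ratios come out to roughly 2.9 and 1.3, and [e] lies about 49 Hz from [ɪ] but about 279 Hz from [eɪ] in the F1-F2 plane, consistent with the spectral-similarity statements in the text.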
1. In addition to these cases of assimilation of speech sounds to L1 categories, PAM includes a case of categorization of L2 sounds to nonspeech categories (i.e., nonassimilable), where discrimination can range from good to very good, based on the psychoacoustic salience of the acoustic patterns.
2. The preliminary analyses for Japanese vowels did not reveal considerable differences between two-formant and three-formant analyses. Therefore, all discriminant analyses were performed using only F1 and F2 values.
3. The lower F1 values for Japanese mid and low vowels were deemed to reflect cross-language differences rather than differences in vocal tract size between the two speaker groups, based on the acoustic similarity between Japanese [ii] and AE [iː].
4. According to Labov et al. (2006), differentiation between [ɑː, ɔː] by AE speakers in the Tampa area is variable overall, although the two vowels are produced distinctively before [n] and [t].
5. In some dialects of American English, [ɪ] and [ɛ] may be merged (Labov et al., 2006). However, the dialectal profiles of these listeners indicate no such relationship with their response patterns.
6. It should be pointed out that the present study is considerably different from the infant studies in methodological respects. Using synthetic stimuli, Johnson et al. (1993) found that phonetic targets for native vowels are similar to the hyperarticulated forms, but the existence of a peripherality bias for speaker-produced stimuli by adult listeners has not yet been confirmed in any published study.

Best, C. T. (1995). “A direct realist view of cross-language speech perception,” in Speech Perception and Linguistic Experience: Issues in Cross-Language Research, edited by W. Strange (York Press, Timonium, MD), pp. 171–204.
Best, C. T., McRoberts, G. W., and Goodell, E. (2001). “Discrimination of non-native consonant contrasts varying in perceptual assimilation to the listener’s native phonological system,” J. Acoust. Soc. Am. 109, 775–794.
Best, C. T., McRoberts, G. W., and Sithole, N. N. (1988). “The phonological basis of perceptual loss for non-native contrasts: Maintenance of discrimination among Zulu clicks by English-speaking adults and infants,” J. Exp. Psychol. Hum. Percept. Perform. 14, 345–360.
Flege, J. E. (1995). “Second language speech learning: Theory, findings, and problems,” in Speech Perception and Linguistic Experience: Issues in Cross-Language Research, edited by W. Strange (York Press, Timonium, MD), pp. 233–277.
Flege, J. E., MacKay, I. R. A., and Meador, D. (1999). “Native Italian speakers’ perception and production of English vowels,” J. Acoust. Soc. Am. 106, 2973–2987.
Fox, R. A., Flege, J. E., and Munro, M. J. (1995). “The perception of English and Spanish vowels by native English and Spanish listeners: A multidimensional scaling analysis,” J. Acoust. Soc. Am. 97, 2540–2551.
Gottfried, T. L. (1984). “Effects of consonant context on the perception of French vowels,” J. Phonetics 12, 91–114.
Han, M. S. (1962). Japanese Phonology: An Analysis Based upon Sound Spectrograms (Kenkyusha, Tokyo).
Hillenbrand, J., Getty, L. A., Clark, M. J., and Wheeler, K. (1995). “Acoustic characteristics of American English vowels,” J. Acoust. Soc. Am. 97, 3099–3111.
Hirata, Y., and Tsukada, K. (2004). “The effects of speaking rate and vowel length on formant movements in Japanese,” in Proceedings of the 2003 Texas Linguistics Society Conference, edited by A. Agwuele, W. Warren, and S.-H. Park (Cascadilla Proceedings Project, Somerville, MA), pp. 73–85.
Homma, Y. (1981). “Durational relationships between Japanese stops and vowels,” J. Phonetics 9, 273–281.
Homma, Y. (1992). Acoustic Phonetics in English and Japanese (Yamaguchi-shoten, Tokyo).
Johnson, K., Flemming, E., and Wright, R. (1993). “The hyperspace effect: Phonetic targets are hyperarticulated,” Language 69, 505–528.
Keating, P. A., and Huffman, M. K. (1984). “Vowel variation in Japanese,” Phonetica 41, 191–207.
Klatt, D. H. (1976). “Linguistic uses of segmental duration in English: Acoustic and perceptual evidence,” J. Acoust. Soc. Am. 59, 1208–1221.
Klecka, W. R. (1980). Discriminant Analysis (Sage, Newbury Park, CA).
Labov, W., Ash, S., and Boberg, C. (2006). Atlas of North American English: Phonology and Phonetics (Mouton de Gruyter, Berlin).
Ladefoged, P. (1993). A Course in Phonetics, 3rd ed. (Harcourt Brace, Orlando, FL).
Nearey, T. M. (1989). “Static, dynamic, and relational properties in vowel perception,” J. Acoust. Soc. Am. 85, 2088–2113.
Nearey, T. M., and Assmann, P. F. (1986). “Modeling the role of inherent spectral change in vowel identification,” J. Acoust. Soc. Am. 80, 1297–1308.
Peterson, G. E., and Lehiste, I. (1960). “Duration of syllable nuclei in English,” J. Acoust. Soc. Am. 32, 693–703.
Polka, L., and Bohn, O.-S. (2003). “Asymmetries in vowel perception,” Speech Commun. 41, 221–231.
Pruitt, J. S., Jenkins, J. J., and Strange, W. (2006). “Training the perception of Hindi dental and retroflex stops by native speakers of American English and Japanese,” J. Acoust. Soc. Am. 119, 1684–1696.
Rochet, B. L. (1995). “Perception and production of second-language speech sounds by adults,” in Speech Perception and Linguistic Experience: Issues in Cross-Language Research, edited by W. Strange (York Press, Timonium, MD), pp. 379–410.
Schmidt, A. M. (1996). “Cross-language identification of consonants. Part 1. Korean perception of English,” J. Acoust. Soc. Am. 99, 3201–3221.
Shibatani, M. (1990). The Languages of Japan (Cambridge University Press, New York).
Stack, J. W., Strange, W., Jenkins, J. J., Clarke, W. D., III, and Trent, S. A. (2006). “Perceptual invariance of coarticulated vowels over variations in speaking style,” J. Acoust. Soc. Am. 119, 2394–2405.
Strange, W. (2007). “Cross-language phonetic similarity of vowels: Theoretical and methodological issues,” in Language Experience in Second Language Speech Learning: In Honor of James Emile Flege, edited by O.-S. Bohn and M. J. Munro (John Benjamins, Amsterdam), pp. 35–55.
Strange, W., Akahane-Yamada, R., Kubo, R., Trent, S. A., and Nishi, K. (2001). “Effects of consonantal context on perceptual assimilation of American English vowels by Japanese listeners,” J. Acoust. Soc. Am. 109, 1691–1704.
Strange, W., Akahane-Yamada, R., Kubo, R., Trent, S. A., Nishi, K., and Jenkins, J. J. (1998). “Perceptual assimilation of American English vowels by Japanese listeners,” J. Phonetics 26, 311–344.
Strange, W., Bohn, O.-S., Nishi, K., and Trent, S. A. (2005). “Contextual variation in the acoustic and perceptual similarity of North German and American English vowels,” J. Acoust. Soc. Am. 118, 1751–1762.
Strange, W., Bohn, O.-S., Trent, S. A., and Nishi, K. (2004). “Acoustic and perceptual similarity of North German and American English vowels,” J. Acoust. Soc. Am. 115, 1791–1807.
Strange, W., Jenkins, J. J., and Johnson, T. L. (1983). “Dynamic specification of coarticulated vowels,” J. Acoust. Soc. Am. 74, 695–705.
Strange, W., Weber, A., Levy, E., Shafiro, V., Hisagi, M., and Nishi, K. (2007). “Acoustic variability of German, French, and American vowels: Phonetic context effects,” J. Acoust. Soc. Am. 122, 1111–1129.
Werker, J. F., and Tees, R. C. (1984). “Cross-language speech perception: Evidence for perceptual reorganization during the first year of life,” Infant Behav. Dev. 7, 49–63.