(Shport, 2019) Perception of Vietnamese Back Vowels Contrasting in Rounding by English Listeners

Journal of Phonetics 73 (2019) 8–23
Journal of Phonetics
journal homepage: www.elsevier.com/locate/Phonetics
Research Article
Perception of Vietnamese back vowels contrasting in rounding by English listeners
Irina A. Shport
Department of English, Louisiana State University, Baton Rouge, 70803 LA, USA
Article info
Article history:
Received 12 September 2017
Received in revised form 9 December 2018
Accepted 11 December 2018
Available online 8 January 2019
Keywords:
Rounding
Back vowels
Vietnamese
Cross-language associations
Vowel discrimination
Perceptual Assimilation Model
Natural Referent Vowel framework

Abstract
The perception of back vowels contrasting in rounding has not previously been examined in major theoretical frameworks of cross-language speech perception. In two experiments, Southern U.S. English speakers naïve to the contrast categorized the Vietnamese vowels [u o ɯ ɤ] in terms of their native vowel categories and identified oddball vowels in triads representing the contrasts [u]-[o], [ɯ]-[u], [ɯ]-[ɤ], and [o]-[ɤ]. The relationship between vowel categorization and discrimination was more accurately predicted when predominant and secondary categorization patterns were taken into account. Group results showed that Vietnamese [ɤ] and [u o] were perceived as being most similar to English /ʌ/ and /oʊ/, respectively, whereas [ɯ] did not have a predominant categorization. Discrimination was not significantly more accurate in vowel pairs contrasting in rounding than in vowel pairs contrasting in height. It was more accurate, however, in less-to-more peripheral vowel presentation order than in the opposite direction. This asymmetry was even observed in the [o]-[ɤ] pair, in which each member assimilated to a different native category. Collectively, the findings suggest that multiple-category membership and individual variability among listeners are to be considered in vowel perception. Acoustic-phonetic similarity between vowels may be a better predictor than category membership in naïve listeners.
© 2018 Elsevier Ltd. All rights reserved.
E-mail address: ishport@lsu.edu
https://doi.org/10.1016/j.wocn.2018.12.003
0095-4470/© 2018 Elsevier Ltd. All rights reserved.

1. Introduction

1.1. Factors influencing cross-language perception of vowels

Adult listeners are highly attuned to the phonemes of their native language (L1). This attunement comes at the expense of discriminating certain segments that are contrastive in non-native languages (L2) but are irrelevant to phonological contrasts or socially meaningful variation in L1. Patterns of L2 segment perception at initial stages of listener exposure to an L2 have been explained in terms of perceived similarity between L1 and L2 segments in the Perceptual Assimilation Model (PAM, Best, 1995); universal vowel perception biases in the Natural Referent Vowel framework (NRV, Polka & Bohn, 2003, 2011); and the similarity between dialect-specific L1 and L2 segments, as well as individual differences in acoustic cue weighting, in the Second Language Linguistic Perception model (L2LP, Escudero & Boersma, 2004; Escudero, Benders, & Lipski, 2009). The current study adopts PAM's methodology to address the question of how L2 phone categorization shapes the perception of non-low back vowels contrastive in rounding in adult listeners whose native language lacks this contrast.

Previous PAM-framed studies on vowels included pairs of back vowels contrastive in height but not in rounding (Levy, 2009; Tyler, Best, Faber, & Levitt, 2014). To address this gap, the present study examined the perception of Vietnamese vowels [u o ɯ ɤ] by native speakers of Southern U.S. English (SUSE). Vowel categorization and vowel discrimination experiments were conducted, as both examinations are necessary for generating predictions in the PAM (Best, 1995; Tyler et al., 2014). In addition to testing PAM with a different vowel set, the choice of [u o ɯ ɤ] extends the PAM in that it requires consideration of factors that are typically examined in the L2LP and NRV frameworks. While the present study was not designed to directly test predictions of these other models, factors such as dialectal variation in the realization of English /u o/ (Labov, Ash, & Boberg, 2006), individual differences in perception (Tyler et al., 2014), and a universal perception bias involving /u o/ in comparison to unrounded non-low back vowels (Lisker, 1988; Polka & Bohn, 2011; Schwartz, Abry, Boë, Ménard, & Vallée, 2005; Stevens, 1972) were expected to
influence the perception of Vietnamese [u o ɯ ɤ]. These factors were considered here in relation to PAM predictions.

1.2. Assessing acoustic similarity in English and Vietnamese non-low back vowels

All speech perception models emphasize the importance of acoustic–phonetic detail in L2-to-L1 vowel mapping (for an overview of major theories, see Bohn, 2018). Acoustic comparisons between L2 and L1 vowels must be dialect-specific, because the acoustic realization of the same vowel may differ substantially across regional varieties of a language (Labov et al., 2006). Regional differences in a native language and the knowledge of a standard variety influence the perception of non-native vowels (Chládková & Podlipský, 2011; Escudero, Simon, & Mitterer, 2012; Williams & Escudero, 2015). In the present study, participants were recruited in the Southern U.S., where SUSE is the predominant dialect. In what follows, acoustic comparisons in terms of rounding, formants, and duration are made between non-low back vowels in SUSE and [u o ɯ ɤ] in Vietnamese.

Rounding in non-low back vowels makes it possible for the second formant (F2) to achieve a minimum value in a region where the first formant (F1) is relatively stable (Stevens, 1972). This proximity of the two formants has been described as formant convergence or focalization, resulting in more prominent spectral peaks in rounded vowels than in their unrounded counterparts. This property of rounded non-low back vowels may increase their psychoacoustic salience, leading to their relatively high frequency of occurrence across the world's languages (Maddieson, 2013), to a perception bias of categorizing back vowels as rounded (Lisker, 1988), and to a perception bias in their use as natural referents in vowel discrimination (Polka & Bohn, 2003, 2011; Schwartz et al., 2005).

Among English non-low back vowels, /u ʊ o ɔ/ are rounded and only /ʌ/ is unrounded. Although unrounded realizations of /ʊ/ have been reported (Cruttenden, 2014; Ladefoged & Johnson, 2015), articulatory work on rounding in /ʊ/ has not been conducted. In principle, rounding may be lost over time, as attested by the development of Middle English /ʊ/ into /ʌ/ (Cruttenden, 2014). However, in the absence of articulatory data, it is not known whether the perception of unrounded /ʊ/ in some native listeners stems from a reduction in lip rounding or from a reduction in lip pursing while lip protrusion is maintained. Importantly for this study, rounding in non-low back vowels is not contrastive in any English variety. Therefore, the question arises of how back vowels contrastive in rounding are perceived by English listeners. Vietnamese is ideally suited to address this question, as it has the typologically rare unrounded back vowels /ɯ ɤ/ along with rounded /u o/.

The degree of acoustic similarity between native and non-native vowels is largely determined by spectral characteristics, especially F1 and F2 patterns. Many English vowels are characterized by vowel-inherent spectral change, as demonstrated by diphthongization and complex vowel trajectories that distinguish regional varieties of English (Fridland, Kendall, & Farrington, 2014; Jacewicz, Fox, & Salmons, 2011; Nearey & Assman, 1986). The formant dynamics of Vietnamese vowels have not previously been investigated. Because vowel-inherent spectral change may influence the perception of non-native Vietnamese vowels, their F1/F2 trajectories were examined here to inform the background for the current study.

Productions of Vietnamese words containing the [u o ɯ ɤ] vowels were recorded here by a female native speaker of Central Vietnamese. Productions of English words containing the SUSE vowels spoken by 27 female native speakers were obtained from Chung and de Mahy (2017). Like the participants in the current study, Chung and de Mahy's participants were raised in the state of Louisiana and attended Louisiana State University. In both studies, three monosyllabic words with the same vowel were repeated by each speaker three times; vowels were segmented in Praat (Boersma & Weenink, 2017); formant measurements at the 20%, 50%, and 80% temporal points in each vowel were extracted using the same script; and the measurements were checked for uncharacteristic values. In Fig. 1, the average F1/F2 trajectories of vowels are plotted together by language. The most peripheral values of English vowels are connected by dotted lines to delineate the average vowel space of the female SUSE speakers. A dotted circle outlines the space occupied by the [u o ɯ ɤ] vowels of the female Vietnamese speaker whose productions were used to create the stimuli for the current study.

Fig. 1 shows that F1/F2 changes within each vowel are smaller in Vietnamese than in English. Vietnamese rounded [u o] have a lower F2 than the substantially fronted /u o/ of SUSE. Unrounded [ɯ ɤ] are acoustically close to the /o/, /ʊ/, and /ʌ/ categories in SUSE. Vietnamese [ɤ] occupies space close to the endpoint of the English /ʌ/ trajectory; it can also be described as having F1/F2 values intermediate between English /ʊ/ and /ʌ/. Vietnamese [ɯ] occupies the F1/F2 space close to the second part of the diphthongized English /o/ trajectory (/oʊ/ hereafter). (These two vowels may be better differentiated by F3.) Vietnamese [ɯ] is often mid-centralized in Hanoi Vietnamese and transcribed as […] or [ɨ] (Kirby, 2011). However, in the recordings used in the current study, [ɯ] was not that centralized.

Fig. 1. Mean F1 and F2 values of Vietnamese vowels produced by a female native speaker of Central Vietnamese (in orange) and of English vowels produced by 27 adult female speakers of Southern U.S. English (in blue, Chung & de Mahy, 2017). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
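The measurement scheme above (formants sampled at 20%, 50%, and 80% of each segmented vowel) and the formant-convergence property of rounded vowels can be made concrete with a short Python sketch. It is illustrative only and is not the Praat script used in the study: the function names are hypothetical, the trajectory-length index of vowel-inherent spectral change is an assumption (one common way to summarize F1/F2 movement), and the example trajectory values are invented. The F2-F1 figures use the midpoint means reported in Table 1. `math.dist` requires Python 3.8 or later.

```python
import math

# Temporal sampling points named in the text (proportions of vowel duration).
SAMPLE_POINTS = (0.20, 0.50, 0.80)

# Midpoint F1/F2 means (Hz) for the Vietnamese stimuli, copied from Table 1.
TABLE1_MEANS = {
    "u": (452, 785),
    "o": (530, 915),
    "ɯ": (490, 1514),
    "ɤ": (658, 1481),
}


def measurement_times(onset_ms, offset_ms, points=SAMPLE_POINTS):
    """Times (ms) at which formants are sampled within a segmented vowel."""
    duration = offset_ms - onset_ms
    return [onset_ms + p * duration for p in points]


def trajectory_length(track):
    """Assumed spectral-change index: summed Euclidean F1/F2 distance (Hz)
    between successive sampling points (e.g. the 20%, 50%, 80% values)."""
    return sum(math.dist(a, b) for a, b in zip(track, track[1:]))


def f2_minus_f1(f1_hz, f2_hz):
    """Smaller values mean F1 and F2 are closer (more formant convergence)."""
    return f2_hz - f1_hz


if __name__ == "__main__":
    print(measurement_times(100, 400))  # 20%, 50%, 80% of a 300 ms vowel
    # Hypothetical (F1, F2) values at 20%, 50%, 80%; not study data.
    print(trajectory_length([(400, 1000), (430, 1040), (460, 1080)]))
    gaps = {v: f2_minus_f1(*fm) for v, fm in TABLE1_MEANS.items()}
    print(gaps)  # {'u': 333, 'o': 385, 'ɯ': 1024, 'ɤ': 823}
```

The F2-F1 values reproduce the contrast reported for the stimuli: the rounded vowels [u o] show far smaller F2-F1 gaps than the unrounded [ɯ ɤ].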
Lastly, vowel duration must be considered in cross-language comparisons. Relatively long L2 vowels may be perceived as acoustically similar to L1 low vowels, tense vowels, or diphthongs rather than to L1 high vowels, lax vowels, or monophthongs. Note, however, that the duration difference between tense and lax vowels is reduced in SUSE as compared to other North American English dialects (Clopper, Pisoni, & De Jong, 2005; Fridland et al., 2014). In Vietnamese, vowel duration is not contrastive, with the exception of [ɤ] in closed syllables (Kirby, 2011). In the present study, all Vietnamese words had a consonant–vowel syllable structure. Therefore, the variation in Vietnamese vowel length was assumed not to be substantial enough to trigger a change in L2 vowel categorizations provided by SUSE listeners.

1.3. Assessing perceived similarity through vowel categorizations

Determining the degree of similarity on the basis of acoustic comparisons may be problematic because vowel categories are defined along multiple acoustic dimensions, and individual listeners may prioritize some dimensions over others in phonetic categorization (Escudero et al., 2009, among others). Consider, for example, Vietnamese [u]. It is rounded, as are English /u/, /oʊ/, /ɔ/, and /ʊ/. It is characterized by a relatively small vowel-inherent spectral change, similar to English /u/, /ʊ/, and /ɔ/ (Fig. 1). It is less fronted, similar to English /ɔ/. Lastly, it is of a height similar to the endpoint of English /oʊ/. These comparisons along several acoustic dimensions do not yield a simple answer as to which English category Vietnamese [u] is most similar to. If we consider the vowel characteristics described in the previous section collectively, Vietnamese rounded [u o] may be perceived as similar to English /u/, /oʊ/, /ʊ/, or /ɔ/; Vietnamese unrounded [ɯ] may be perceived as similar to English /ʊ/ or /oʊ/; and Vietnamese [ɤ] as similar to English /ʊ/ or /ʌ/. Thus, acoustic similarity between native and non-native vowels influences, but does not solely determine, the perceived similarity between them (Bohn, 2018). Furthermore, L2 vowel perception may also be influenced by tone and segmental context (Gottfried, 1984; Levy, 2009; Yu, 2010). Considerable variation in perceptual assimilations should therefore be expected within a listener and across listeners of similar L1 background.

In the PAM framework, perceived similarity is measured directly and holistically by asking listeners to judge the degree of similarity between an L2 phone and different L1 phonemes (for a review, see Levy, 2009). A criterion of 50–90% consistency across listener judgments typically determines whether L2 vowels are categorized (C) or uncategorized (U) in terms of L1 categories (Harnsberger, 2001; Tyler et al., 2014). The choice of the criterion threshold depends on the categorization patterns observed in a specific data set and on a researcher's approach (see the discussion in Harnsberger, 2001). For example, when American English listeners perceive French [y] as being similar to English /i/ and /u/ in 51% and 33% of trials, respectively (Tyler et al., 2014), [y] is considered categorized under the 50% threshold but uncategorized under the 70% threshold. A low-threshold criterion helps to capture patterns in variable responses, which may be due to the complexity of the L1 vowel's acoustic structure, acoustic variation in L2 phones, or individual differences in acoustic cue weighting. In the above example, two categorization patterns emerge: French [y] may be perceived as a member of the /i/ category or of the /u/ category. A high-threshold criterion helps to minimize multiple category membership, but it also dramatically increases the number of L2 vowels considered uncategorized. If substantial variability in L2 vowel categorizations is expected, then the adoption of a low-threshold criterion is warranted. In the current study, the 50% consistency criterion was adopted, and an even lower, above-chance threshold was considered.

Perceived similarity is also influenced by universal perceptual biases that are rooted in phonetic processing. One such bias is the use of focalized vowels (often peripheral in the articulatory/acoustic vowel space) as anchors for the perception of other vowels in the system. In the NRV framework, evidence for this bias is found in vowel discrimination tasks (Polka & Bohn, 2003, 2011; Schwartz et al., 2005), but it may also manifest itself in categorization tasks. For example, front vowels tend to be perceived as unrounded and (non-low) back vowels as rounded (Lisker, 1988), a bias supported by typological frequencies (Maddieson, 2013). Furthermore, listeners may assimilate L2 vowels not to the spectrally most similar vowels in their native language, but to peripheral vowels in a similar articulatory/acoustic space. For example, Japanese [e] and [a] are assimilated not to spectrally similar English /ɪ/ and /ʌ/, but to peripheral /eɪ/ and /ɑː ɔ/ (Nishi, Strange, Akahane-Yamada, Kubo, & Trent-Brown, 2008).

1.4. Predicting discrimination of back vowels

L2 vowel perception is often assessed in terms of discrimination accuracy, that is, one's ability to distinguish two vowels. According to the PAM, discrimination accuracy may be predicted from patterns of perceived similarity between L2 phones and L1 categories observed in vowel categorization (Best, 1995). Specifically, these patterns are classified as one of five assimilation types, as illustrated below by examples from Tyler et al. (2014, Tables 2 and 3). To illustrate all five assimilation types, the 60% categorization consistency criterion is applied here to Tyler et al.'s data, which were originally classified using the 70% criterion.

(1) Two-Category (TC) assimilation: Each L2 vowel in a pair is consistently labeled as a different L1 vowel (e.g., Norwegian [i]-[ʉ] labeled as English /i/-/u/).
(2) Uncategorized-Categorized (UC) assimilation: Each L2 vowel in a pair is frequently labeled as a different L1 vowel; one label represents less than 60% of listener responses, whereas the other label is consistently applied (e.g., French [œ]-[ø] labeled as English /ʌ/-/u/).
(3) Uncategorized-Uncategorized (UU) assimilation: Two L2 vowels are frequently labeled as the same or different L1 vowels, but both labels represent less than 60% of listener responses (e.g., Thai [ɯ]-[ɤ] labeled as English /ʌ/ or Thai [y]-[ø] labeled as English /i/-/ɝ/).
(4) Category-Goodness (CG) assimilation: Two L2 vowels are consistently labeled as the same L1 vowel, but their goodness-of-fit ratings for that label differ (e.g., Norwegian [i]-[y] labeled as English /i/ by some listeners).
(5) Single-Category (SC) assimilation: Two L2 vowels are consistently labeled as the same L1 vowel, and their goodness-of-fit ratings for that label are similar (e.g., Norwegian [i]-[y] labeled as English /i/ by certain other listeners).
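The consistency criterion and the assimilation types listed above, together with the PAM discrimination ranking (TC = UC > CG > UU > SC) and an overlap score in the spirit of Levy (2009), can be sketched in Python. This is a simplified illustration under stated assumptions, not the procedure of Tyler et al. (2014), Levy (2009), or the present study: the function names are hypothetical; responses are assumed to be pooled proportions per L1 label for each L2 vowel; the UC rule does not check that the two labels differ; a pair whose members share a categorized modal label is returned as "SC/CG" because separating SC from CG requires goodness-of-fit ratings; and the overlap score assumes that the smaller proportion is summed over every shared L1 label. The only real figures are the French [y] proportions cited above (51% /i/, 33% /u/), with the remaining 16% pooled as "other" for illustration.

```python
# PAM ranking of predicted discrimination, 1 = easiest: TC = UC > CG > UU > SC
PREDICTED_RANK = {"TC": 1, "UC": 1, "CG": 2, "UU": 3, "SC": 4}


def modal_label(props):
    """Most frequent L1 response and its proportion for one L2 vowel."""
    label = max(props, key=props.get)
    return label, props[label]


def is_categorized(props, threshold=0.5):
    """Categorized (C) if the modal label reaches the consistency threshold."""
    return modal_label(props)[1] >= threshold


def assimilation_type(props_a, props_b, threshold=0.5):
    """Simplified pair label: TC, UC, UU, or 'SC/CG' (needs goodness ratings)."""
    cat_a = is_categorized(props_a, threshold)
    cat_b = is_categorized(props_b, threshold)
    if cat_a and cat_b:
        same = modal_label(props_a)[0] == modal_label(props_b)[0]
        return "SC/CG" if same else "TC"
    if cat_a or cat_b:
        return "UC"
    return "UU"


def overlap_score(props_a, props_b):
    """Assumed Levy-style overlap: sum of the smaller proportion per shared label."""
    shared = props_a.keys() & props_b.keys()
    return sum(min(props_a[k], props_b[k]) for k in shared)


if __name__ == "__main__":
    french_y = {"i": 0.51, "u": 0.33, "other": 0.16}
    print(is_categorized(french_y, 0.5), is_categorized(french_y, 0.7))  # True False

    # Hypothetical proportions for illustration only (not study data).
    vowel_a = {"oʊ": 0.60, "u": 0.40}
    vowel_b = {"oʊ": 0.55, "ɔ": 0.45}
    vowel_c = {"ʌ": 0.45, "ʊ": 0.30, "ɝ": 0.25}
    print(assimilation_type(vowel_a, vowel_b), overlap_score(vowel_a, vowel_b))
    print(assimilation_type(vowel_a, vowel_c), overlap_score(vowel_a, vowel_c))

    types = ["SC", "TC", "UU", "CG", "UC"]
    print(sorted(types, key=PREDICTED_RANK.get))  # ['TC', 'UC', 'CG', 'UU', 'SC']
```

With a 50% threshold French [y] counts as Categorized, and with a 70% threshold it does not, mirroring the example in the text. The overlap score is continuous, so it can rank pairs that receive the same nominal label.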
These five types of assimilation capture perceived similarity on a nominal scale. The following discrimination accuracy is predicted, from easiest to most difficult: TC = UC > CG > UU > SC (Best, 1995; Bohn, 2018). Levy (2009) developed an alternative approach to capturing perceived similarity by comparing the frequencies of French vowel categorizations in terms of English vowel categories and calculating an overlap score for each target French vowel pair (e.g., [y]-[œ], [y]-[e], [y]-[o]). The overlap is defined as the smaller percentage of responses when two members of an L2 vowel pair are labeled as similar to the same L1 vowel. Overlap scores are used to rank non-native vowel contrasts in terms of discrimination difficulty predicted from perceptual assimilation patterns, but without reference to goodness-of-fit ratings. Levy's (2009) method does not necessarily differentiate between the PAM's UC and TC assimilations. Furthermore, this method does not capture within-category differences indexed by goodness-of-fit ratings, which are necessary to differentiate between CG and SC assimilations. However, it does allow researchers to quantify assimilation on an interval scale, independently of the categorization consistency criterion chosen a priori.

Because the current study examined back vowels contrastive in rounding, the PAM's predictions of vowel discrimination accuracy might be modulated by the universal perceptual bias discussed in Section 1.3. In the NRV framework, this bias is used to explain why the discriminability of non-native vowels varies depending on the order of vowel presentation (Bohn, 2018; Polka & Bohn, 2003, 2011). Specifically, when phonemic categories have not been established in a listener (e.g., in an infant's L1 or in a beginning learner's L2), a change from a more peripheral vowel to a less peripheral vowel is more difficult to detect than a change in the opposite direction. For example, Dutch L1 listeners exposed to Southern British English exhibit less accurate discrimination in /ɒ/-/ʌ/ trials than in /ʌ/-/ɒ/ trials; English L1 listeners exposed to Dutch are less accurate in /u/-/y/ and /ʊ/-/ʏ/ trials than when the same vowels are presented in reverse order (Polka & Bohn, 2011). This phonetic processing bias is reduced or absent when phonemic processing is engaged (e.g., in listeners with substantial language experience) or when general auditory processing is engaged (e.g., in discrimination tasks with a short inter-stimulus interval of 400–500 ms) (Polka & Bohn, 2011).

The more peripheral vowel must be determined for each vowel contrast. For example, /e/ is more peripheral than /ɪ/ but less peripheral than /i/. Recall that the relative psychoacoustic salience of vowels such as /i/, /æ/, /a/, /u/, and /y/ is attributed to formant frequency convergence in a narrow spectral region (Polka & Bohn, 2003, 2011; Schwartz et al., 2005). Focalization may be a more accurate term for this formant convergence than peripherality, because the latter invokes an image of a two-dimensional F1/F2 space, as in Fig. 1, which captures only two-formant frequency convergence. Focalization, on the other hand, describes a multi-dimensional space defined by at least three formants, which is essential for vowels such as /y/ (Schwartz et al., 2005). In the current study, these two terms are used interchangeably, but three-formant acoustic characteristics are considered.

When the concept of referent vowels is applied to the PAM, a perceptual bias would be expected in the discrimination of vowel pairs that are likely to involve phonetic, within-category processing, that is, SC, CG, or UU assimilations (Tyler et al., 2014). In TC and UC assimilations, listeners are presumed to rely more on phonological information, as they consistently associate the vowels with different L1 categories. Tyler et al. (2014) tested this cross-model prediction and found that the less-to-more peripheral vowel order in a trial facilitated discrimination accuracy only in SC assimilations, shown by a slim majority of English listeners for the Norwegian [i]-[y] contrast. The researchers proposed that a relatively small number of CG assimilations and at-ceiling discrimination accuracy for contrasts showing the UU assimilation might have obscured the referent vowel bias in these two assimilation types, which involve phonetic processing.

1.5. Study rationale and hypotheses

The present study examined the perception of non-low back vowels [u o ɯ ɤ] by English listeners in two types of tasks, with the outcomes of the categorization experiment serving to generate predictions for the discrimination experiment. A combination of these two experiments allowed for direct testing of PAM predictions (Best, 1995) and an examination of some NRV predictions (Polka & Bohn, 2003, 2011). These examinations have not previously been done on back vowels contrasting in rounding.

In the categorization experiment, the goal was to investigate the perceptual assimilation patterns of Vietnamese vowels [u o ɯ ɤ] by SUSE listeners naïve to the rounding contrast in back vowels. Rounding was expected to play a crucial role in cross-language associations due to focalization in rounded back vowels (Lisker, 1988; Nishi et al., 2008; Polka & Bohn, 2011; Schwartz et al., 2005; Stevens, 1972). SUSE listeners were asked to categorize these four vowels in terms of their native vowel categories /u oʊ ʊ ɔ ʌ ɝ ɑ/. Their responses were expected to indicate the role of vowel rounding and height in cross-language assimilations, given that English does not have unrounded [ɯ ɤ] and that rounded [u o] are more fronted in SUSE than in Vietnamese. Specifically, categorization responses can show whether unrounded [ɯ] and [ɤ] are associated with spectrally similar /ʊ/ and /ʌ/ (Fig. 1) or with the natural referent vowels /oʊ/ and /ɔ/. As for the SUSE /u/ category, it is unclear whether /u/ may serve as a referent vowel. This vowel is very fronted in SUSE (Fig. 1) and consequently less focalized, unless it is followed by /l/ (e.g., pool, tool), a phonological environment in which fronting is blocked. If knowledge of this phonologically conditioned variation, or knowledge of an English dialect with less fronted /u/, influences L2 vowel perception (Williams & Escudero, 2015), then SUSE listeners may use /u/ as a referent vowel.

In the discrimination experiment, the research questions were whether vowel assimilation patterns established in the categorization experiment can serve as accurate predictors of vowel discrimination accuracy and whether rounded vowels were perceptually more salient than other vowels. To address these questions, English speakers made odd-man-out judgments for words with [u o ɯ ɤ] presented in triads. These judgments were used to classify cross-language assimilation patterns (Best, 1995; Harnsberger, 2001; Tyler et al., 2014). These classifications were used to test PAM predictions based
on the expected qualitative ranking from easiest to most difficult, namely TC = UC > CG > UU > SC (Best, 1995; Bohn, 2018), and on the quantitative difficulty ranking in terms of overlap scores (Levy, 2009).

Several possible asymmetries in discrimination were additionally examined because the presence or absence and the presentation order of a rounded vowel in a triad were expected to influence the accuracy and speed of listeners' responses. First, although the rounding contrast does not exist in SUSE, it may still be relatively easy to discriminate because rounded back vowels are more perceptually salient than unrounded back vowels, which may enhance the contrast for listeners (Lisker, 1988; Stevens, 1972). Thus, the discrimination of vowels contrasting in rounding, [u]-[ɯ] and [o]-[ɤ], was expected to be more accurate than the discrimination of vowels contrasting in height, [u]-[o] and [ɯ]-[ɤ]. Second, discrimination has been found to be easier from a less focalized to a more focalized vowel than in the other direction (Polka & Bohn, 2003, 2011). Thus, more accurate discrimination was expected in [ɯ]-[u] and [ɤ]-[o] trials than in [u]-[ɯ] and [o]-[ɤ] trials. (See, however, the caveat regarding the reduced focalization of English /u/ described above.) This asymmetry was expected to be more prominent in SC assimilations than in other assimilation types (Tyler et al., 2014).

Lastly, a listener's previous experience with the rounding contrast, even if the contrast is absent in L1, may influence vowel categorization and discrimination. Therefore, any such previous experience was taken into account in the analyses of listener responses.

2. Experiment 1: Vowel categorization

2.1. Method

2.1.1. Participants

Fifty native American English speakers (31 females, 19 males; age 18–25) were recruited at Louisiana State University. The majority grew up in Southern U.S. states as defined by the dialect regions in the Atlas of North American English (Labov et al., 2006). Three participants grew up in non-Southern states: New York (PR, female), Florida (MZ, male), and Arizona (BH, female). At the time of the study, PR had resided in Louisiana for 2.4 years, MZ for 6 years, and BH for 10 years. It was assumed that all participants were highly familiar with the Southern variety of U.S. English and that most of them spoke SUSE on a daily basis.

All participants had a minimal bilateral hearing range of 250–8000 Hz as tested at 25 dB on site. They were naïve to Vietnamese but had studied other foreign languages. The data of one participant were eliminated because she had been exposed to unrounded back vowels through her language studies (allophone [ɤ] in Mandarin and phoneme /ɯ/ in Korean). Exposure to a rounding contrast in front vowels through learning French (seventeen participants) or German (one participant) was not a data-eliminating criterion. Thus, data from 49 participants were included in the analyses and coded for experience with a rounding contrast in vowels.

2.1.2. Materials

Twelve Vietnamese stimulus words, three words representing each of the vowels [u o ɯ ɤ], were selected in consultation with a female native speaker of Vietnamese (Table 1). Each word had a consonant–vowel structure, in which the initial consonant was either [t] or [tʰ]. In each trio of words with the same vowel, two words had a level tone and one word had a falling tone. These words were spoken in modal voice, which is not associated with durational differences across Vietnamese tones (Kirby, 2011).

The consultant recorded three tokens of each word in a sound-attenuated laboratory room at a sampling rate of 44,100 Hz with 16-bit resolution on a mono channel. Recordings were normalized for amplitude and spliced into separate audio files. Vowel formants, duration, and fundamental frequency (f0) were examined in Praat (Boersma & Weenink, 2017). Given that duration may serve as an acoustic cue (e.g., in tense/lax vowel contrasts) and thus influence L2 phone categorization (Chládková & Podlipský, 2011; Escudero et al., 2009), vowel durations were compared across words. Durations ranged from 281 to 366 ms and were not systematically influenced either by aspiration in a preceding consonant or by the tone type (Table 1).

Table 1
Vowel characteristics in twelve Vietnamese stimulus words, averaged across three repetitions of each word. Formant values and fundamental frequency were measured at vowel midpoints. Only eight level-tone words were used in the discrimination experiment.

Vowel  Word   Gloss            Tone     F0 (Hz)  Duration (ms)  F1 (Hz)  F2 (Hz)  F3 (Hz)
[u]    tu     'knockup'        Level    266      327            483      775      3014
       thu    'autumn'         Level    245      311            484      851      3089
       tù     'prison'         Falling  178      292            390      730      2868
       Mean                                      310            452      785      2990
[ɯ]    tư     'four'           Level    271      327            515      1632     3148
       thư    'letter'         Level    248      306            492      1479     2790
       từ     'word'           Falling  185      335            464      1433     3112
       Mean                                      323            490      1514     3017
[o]    tô     'bowl'           Level    238      311            536      962      3071
       thô    'coarse'         Level    247      290            563      963      3268
       thồ    'to transport'   Falling  186      366            490      820      3061
       Mean                                      322            530      915      3133
[ɤ]    tơ     'silk'           Level    248      317            630      1562     3144
       thơ    'poem'           Level    250      281            694      1484     3173
       thờ    'to worship'     Falling  199      356            651      1397     3116
       Mean                                      318            658      1481     3144

As expected, f0 decreased in all words, but at the vowel midpoint it was lower in words with a falling tone than in words with a level tone (Kirby, 2011). The f0 difference between the vowels' starting and ending points was 13–30 Hz in level-tone words and 35–52 Hz in falling-tone words. Table 1 also shows that, as expected, F1 and F2 were much closer in rounded vowels (an F2-F1 difference of 333 Hz in [u] tokens and 385 Hz in [o] tokens) than in unrounded vowels (1024 Hz in [ɯ] tokens and 823 Hz in [ɤ] tokens) (Stevens, 1972). The relative convergence of F1 and F2 in [u o] corresponds to vowel focalization as compared to [ɯ ɤ] (Polka & Bohn, 2011; Schwartz et al., 2005). While there was some overlap in F3 values and in F1 values, especially in vowels [u] and [ɯ], the vowel profiles defined by F2-F1 and F3-F1 were non-overlapping and distinct from each other.

2.1.3. Procedure

A total of thirty-six stimuli were used in the categorization test: four vowels × three words × three tokens of each word. The stimuli were presented in random order for two kinds of auditory evaluation. First, participants were asked to categorize the vowel in each stimulus as an instance of an English vowel category in a seven-alternative forced-choice task. The seven response categories were represented by the keywords GOOSE for /u/, GOAT for /oʊ/, HAWK for /ɔ/, PUT for /ʊ/, BUS for /ʌ/, POT for /ɑ/, and NURSE for /ɝ/. In a pilot study, a separate group of 21 participants was asked to categorize native English vowels using these keywords. Their accuracy was similar to previously reported results (Jacewicz & Fox, 2012): 93–99% accuracy for words with /u ʊ ɝ ʌ/ and 40–58% accuracy for words with /ɔ ɑ/. The same group of pilot participants was also asked to categorize the target Vietnamese vowels using a larger number of keywords, corresponding to all SUSE vowel categories. Only seven of these keywords were used at least once by the pilot group. Therefore, only these seven were adopted in the current experiment.

[…] on a scale from 1 (bad example) to 7 (very good example). After the rating was selected, the 'ok' button appeared on the screen, and the participant moved to the next trial by clicking it. The inter-trial interval was one second, but the response time was unlimited. In total, 1764 categorization responses were collected: four vowels × nine trials × 49 listeners.

2.2. Results

To determine assimilation patterns that would serve as a predictor of discrimination accuracy in Experiment 2, categorization responses from Experiment 1 were analyzed by Vietnamese vowel. All statistical analyses were conducted in SPSS (IBM Corp., 2016); the p = 0.05 level of significance was adopted. If the same keyword was chosen in 50% or more of the vowel categorizations, the vowel was considered Categorized; otherwise it was considered Uncategorized. Based on this approach, the assimilation type of a vowel pair was labeled as TC, UC, or UU. If both vowels in a pair were categorized at least 50% of the time as the same English vowel category, then the goodness-of-fit ratings for that category's responses were examined in a Mann-Whitney test to determine whether the assimilation type was SC or CG. In addition to the categorical labeling of the assimilation patterns in the [u]-[o], [ɯ]-[ɤ], [ɯ]-[u], and [o]-[ɤ] vowel pairs, interval-scale overlap scores for these vowel pairs were computed.

2.2.1. Average categorization responses

Proportions of individual selections of English keywords were calculated out of the nine categorization trials per vowel for every listener. When compiled, these individual scores totaled 343 data points per vowel (seven response categories × 49 listeners). Table 2 shows average categorization responses by the listeners. Under the 50% consistency criterion, the [u o ɤ] vowels were Categorized: both [u] and [o] were perceived as similar to the English /oʊ/ category, and [ɤ] was perceived as similar to the English /ʌ/ category. In contrast, the [ɯ] vowel was Uncategorized. For each of the Vietnamese vowels, however, two or three English keywords were chosen above the chance level of 14.3%. Considering all above-
Each participant performed the test individually in the labo- chance categorizations, [u] was perceived as similar to
ratory. The test was preceded by a short training session con- /u/ and /oʊ/; [o] was perceived as similar to /oʊ/and /ɔ/; [ɤ]
sisting of four trials with Vietnamese words different from the was perceived as similar to /ʌ/ and /ʊ/; and [ɯ] was perceived
ones used in the experiment. Experimental trials were pre- as similar to /ʌ/, /ʊ/, and /u/.
sented through headphones via Praat interface (Boersma & The categorizations in each row of Table 2 were further
Weenink, 2017). Upon hearing a Vietnamese word, partici- examined in four separate Kruskal-Wallis tests (one per vowel)
pants responded by clicking one of the seven keyword buttons to provide a statistical analysis of observed patterns. Kruskal-
on a computer screen. Then, immediately after, they rated the Wallis tests were used because they accommodate more than
token for goodness-of-fit of the just-identified English category two response categories as compared to other non-parametric
Table 2
Vietnamese vowel categorizations, averaged over 343 individual scores per vowel. Responses above the chance level of 14.3% are in bold. Average goodness-of-fit ratings are in
parentheses (the higher the rating, the better the fit).
Vietnamese Vowel % Response

ʊ
/u/ GOOSE /o / GOAT /ɔ/ HAWK /ʊ/ PUT /ʌ/ BUS /ɑ/ POT /ɝ/ NURSE
[u] 25 (4.1) 58* (4.5) 8 (4.2) 5 (2.9) 1 (3.5) 2 (3.3) 1 (3.0)
[o] 1 (2.5) 59** (4.6) 20 (4.8) 6 (3.3) 2 (2.8) 11 (4.3) 1 (3.0)
[ɯ] 22 (3.7) 10 (4.0) 1 (1.8) 32 (3.4) 26 (3.3) 2 (2.6) 7 (2.9)
[ɤ] 0 (2.0) 1 (3.7) 4 (2.8) 18 (4.6) 70** (5.1) 3 (3.3) 3 (3.9)
Note. In each row, asterisks indicate significant difference in Mann-Whitney tests with Dunn-Bonferroni corrections between the above-chance proportions of responses at the p = 0.001
level (**) and at the p = 0.05 level (*).
tests. Kruskal-Wallis tests on proportions of all response types GOAT, respectively. The assimilation type for [ɯ]-[ɤ] may be
yielded significant outcomes: [H(6) = 201.51, p < 0.001] for [u], CG, in which [ɤ] phones are perceived as more similar to BUS
[H(6) = 156.30, p < 0.001] for [o], [H(6) = 111.14, p < 0.001] for than [ɯ] phones. The assimilation type for [o]-[ɤ] was still TC,
[ɯ], and [H(6) = 216.49, p < 0.001] for [ɤ]. Listeners’ prefer- in which [ɤ] phones were associated with BUS and [o] phones
ences in the above-chance responses only (see the asterisk were associated with either GOATor HAWK above chance level.
notations in Table 2) were examined in Mann-Whitney pairwise Thus, if discrimination predictions are based on above-chance
comparison tests with the Dunn-Bonferroni corrections for mul- associations (i.e., the SC < CG < TC discrimination accuracy
tiple comparisons. These examinations showed that [u] was ranking), then the most challenging Vietnamese contrasts for
categorized more frequently as GOAT than GOOSE English listeners should be [u]-[o] and [ɯ]-[u]; the [ɯ]-[ɤ] con-
[U = 52.36, p = 0.047], [o] was categorized more frequently trast should be less challenging; and the [o]-[ɤ] contrast should
as GOAT than HAWK [U = 88.44, c001], and [ɤ] was catego- be the least challenging.
rized more frequently as BUS than PUT [U = 86.57, Lastly, to account for all categorization responses in a par-
p < 0.001]. Because both [u] and [o] were predominantly cate- ticular Vietnamese vowel pair, an overlap score was calculated
gorized as GOAT, their goodness-of-fit ratings for the GOAT as the sum of per-category overlaps when two vowels were
responses were compared in a Mann-Whitney test and found categorized as similar to one or more English vowel category
to be not significantly different at the 0.05 level. (Levy, 2009). This approach has the advantage of accounting
As for [ɯ] categorizations, the Kruskal-Wallis tests showed for more than one pattern of responses on an interval scale,
that the differences between selections of PUT-BUS, PUT- which is particularly useful when the responses are distributed
GOOSE, and BUS-GOOSE keywords were not significant at in a bimodal (e.g., 25%–58% for GOOSE-GOAT categoriza-
the 0.05 level. Goodness-of-fit ratings were consequently exam- tions of [u]) or multimodal (e.g., 22%–32%–26% for GOOSE-
ined and also found to be not significantly different at the 0.05 PUT-BUS categorizations of [ɯ]) manner, which suggests a
level in any of the PUT-BUS, PUT-GOOSE, and BUS-GOOSE multiple category membership. When the overlap score metric
comparisons. These results confirmed that [ɯ] was not assimi- was used for quantification of perceptual assimilation, Viet-
lated to any English vowel category in terms of frequency of namese vowel pairs could be ordered along a continuum from
selected responses and goodness-of-fit judgments. most to least similar. Refer to Table 2 for an illustration of the
Taken together, this set of analyses suggests the following overlap score calculation. Vietnamese [u] and [o] vowels were
L2-to-L1 assimilation types for the Vietnamese vowel pairs: a assimilated to English /u/ in 25.3% and 1.4% of all responses,
SC assimilation for [u]-[o], an UC assimilation for [ɯ]-[u] and respectively. The overlap here (i.e., the portion that overlaps
[ɯ]-[ɤ], and a TC assimilation for [o]-[ɤ] (Best, 1995). If discrim- between 25.3% and 1.4%) was 1.4%, the smaller percentage
ination predictions are based on predominant associations (i.e., of the two. Next, the overlap in assimilations of the same two
the SC < TC = UC discrimination accuracy ranking), then the vowels to English /oʊ/ was 57.8%. In this manner, the overlap
most challenging contrast for native English listeners should in assimilations of these two vowels was determined for each
be [u]-[o]. The other three contrasts should be equally easy to of the available English categories. An overall overlap score
distinguish. However, multiple category membership was was calculated as the sum of per-category overlaps. Thus,
observed in above-chance keyword selections (see responses the overlap score for Vietnamese [u] and [o] was 75.5%: 1.4
in bold in each row of Table 2). Although [ɯ] was determined + 57.8 + 7.7 + 5.4 + 0.9 + 1.8 + 0.5.
to be Uncategorized, the PUT, BUS, and GOOSE keywords The calculations showed that the overlap scores in
were chosen at the above-chance level of 14.3%. Although, responses of 49 listeners were: 80% for [u]-[o]; 52% for
[u o ɤ] were determined to be Categorized, each of these vowels [ɯ]-[ɤ]; 41% for [ɯ]-[u]; and 17% for [o]-[ɤ]. The higher the
had a secondary, above-chance association with another cate- score, the higher the percentage of errors that was expected
gory as well. These observations suggest individual differences in vowel discrimination (Levy, 2009). Thus, the discrimination
in L2-to-L1 categorizations, which may influence the prediction accuracy ranking based on the overlap scores was expected
of discrimination accuracy in L2 vowels pairs. to be [u]-[o] < [ɯ]-[ɤ] < [ɯ]-[u] < [o]-[ɤ].
To account for these secondary associations, Mann-
Whitney tests comparing same-category responses selected 2.2.2. Individual categorization responses
at the above-chance level for pairs of Vietnamese vowels were Multimodal distributions of categorization responses sug-
conducted with the following results. The BUS response was gested that listeners varied substantially in their categorization
selected more frequently for [ɤ] than for [ɯ] [U = 52.49, strategies, and this individual variation could have been
p < 0.001], and the [ɤ]-[ɯ] goodness-of-fit ratings to BUS were obscured in group-level assimilation patterns. The total number
significantly different [U = 128.41, p < 0.001]. Neither the differ- of individual responses per vowel was relatively small, only nine.
ence between categorizations of [u] and [ɯ] as GOOSE, nor Under the 50% criterion, a Vietnamese vowel was considered
the difference between categorizations of [u] and [o] as GOAT Categorized when it was judged to be similar to a particular Eng-
was significant at the 0.05 level. When goodness-of-fit ratings lish keyword in five trials or more; otherwise, it was considered
for these vowel pairs were examined in Mann-Whitney tests, Uncategorized. Some individual responses had a bimodal distri-
neither of the tests yielded significant results. bution of 5-4 judgments splits between two English keywords
Taken together, this set of analyses suggests that L2-to-L1 (6% of all trials) and 4-4-1 splits among three English keywords
association patterns were not exactly the same for any of the (1% of all trials). Bimodal categorization may be a strategy
vowels, even in cases when the predominant associations were adopted by some listeners. However, a larger number of individ-
similar, as in [u o] assimilating to /oʊ/. The assimilation type for ual responses than nine would be necessary to reliably confirm
[ɯ]-[u] and [u]-[o] may be SC, associating with GOOSE and the existence of this strategy within a listener.
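The group-level criteria and overlap scores described in Section 2.2.1 reduce to a few lines of arithmetic. The Python sketch below applies them to the rounded percentages in Table 2. The function names are illustrative, not the study's code, and because Table 2 reports rounded values, the scores differ slightly from those computed on unrounded proportions. The SC versus CG decision depends on a Mann-Whitney test of goodness-of-fit ratings, so the sketch leaves it open as "SC/CG".

```python
# Sketch (assumption: rounded Table 2 percentages stand in for the raw proportions).
KEYWORDS = ["GOOSE", "GOAT", "HAWK", "PUT", "BUS", "POT", "NURSE"]
CHANCE = 100 / len(KEYWORDS)  # 14.3% for seven response options

TABLE2 = {
    "u": [25, 58, 8, 5, 1, 2, 1],
    "o": [1, 59, 20, 6, 2, 11, 1],
    "ɯ": [22, 10, 1, 32, 26, 2, 7],
    "ɤ": [0, 1, 4, 18, 70, 3, 3],
}

def modal_category(percentages, threshold=50.0):
    """Keyword chosen in >= threshold % of responses, or None (Uncategorized)."""
    best = max(range(len(percentages)), key=lambda i: percentages[i])
    return KEYWORDS[best] if percentages[best] >= threshold else None

def above_chance(percentages):
    """Keywords selected more often than chance (1/7)."""
    return [k for k, p in zip(KEYWORDS, percentages) if p > CHANCE]

def assimilation_type(a, b):
    """TC, UC, or UU under the 50% criterion; a shared category is left as 'SC/CG'."""
    cat_a, cat_b = modal_category(TABLE2[a]), modal_category(TABLE2[b])
    if cat_a is None and cat_b is None:
        return "UU"
    if cat_a is None or cat_b is None:
        return "UC"
    return "TC" if cat_a != cat_b else "SC/CG"

def overlap_score(a, b):
    """Levy-style overlap: sum over keywords of the smaller of the two percentages."""
    return sum(min(x, y) for x, y in zip(TABLE2[a], TABLE2[b]))

for pair in [("u", "o"), ("ɯ", "ɤ"), ("ɯ", "u"), ("o", "ɤ")]:
    print(pair, assimilation_type(*pair), overlap_score(*pair))
```

Run on the rounded values, this labels [u]-[o] as SC/CG with a score of 76 (the worked example in the text, based on unrounded proportions, gives 75.5), [ɯ]-[ɤ] and [ɯ]-[u] as UC with scores of 51 and 42, and [o]-[ɤ] as TC with a score of 17.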
Table 3
Individual categorizations of Vietnamese vowels determined by a 50% consistency criterion (≥5 responses in the same category out of nine trials). The last column (UC) represents uncategorized responses, which were distributed over several English categories with no category chosen in five or more trials.

Vietnamese   Categorized as English (number of listeners)
vowel        /u/   /oʊ/  /ɔ/   /ʊ/   /ʌ/   /ɑ/   /ɝ/   UC
[u]          7     30    4     1     –     –     –     7
[o]          –     33    6     –     –     4     –     6
[ɯ]          10    –     –     14    7     –     3     15
[ɤ]          –     –     2     5     38    –     –     4
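At the individual level, the 50% criterion used for Table 3 amounts to five or more identical keywords out of nine trials, and the pair labels that follow from it are what Table 4 tallies. A minimal Python sketch of that per-listener labeling is given below. The listener responses are invented for illustration, and because SC and CG differ only in goodness-of-fit rather than in categorization, the sketch does not separate them.

```python
from collections import Counter

CRITERION = 5  # 50% consistency at the individual level: five or more of nine trials

def individual_category(responses, min_count=CRITERION):
    """Keyword a listener chose in at least min_count trials, or None (Uncategorized)."""
    keyword, count = Counter(responses).most_common(1)[0]
    return keyword if count >= min_count else None

def pair_type(responses_a, responses_b):
    """Assimilation type for one listener and one vowel pair ('SC/CG' when categories match)."""
    cat_a, cat_b = individual_category(responses_a), individual_category(responses_b)
    if cat_a is None and cat_b is None:
        return "UU"
    if cat_a is None or cat_b is None:
        return "UC"
    return "TC" if cat_a != cat_b else "SC/CG"

# Hypothetical responses (nine trials per vowel) from three listeners; NOT data from the study.
listeners = [
    {"o": ["GOAT"] * 7 + ["HAWK"] * 2, "ɤ": ["BUS"] * 8 + ["PUT"]},
    {"o": ["GOAT"] * 4 + ["HAWK"] * 4 + ["POT"], "ɤ": ["BUS"] * 6 + ["PUT"] * 3},
    {"o": ["GOAT"] * 5 + ["HAWK"] * 4, "ɤ": ["PUT"] * 4 + ["BUS"] * 4 + ["HAWK"]},
]

# Tally assimilation types across listeners for the [o]-[ɤ] pair, as in one column of Table 4.
tally = Counter(pair_type(l["o"], l["ɤ"]) for l in listeners)
print(tally)
```

Note that a 5-4 split (third listener, [o]) still counts as Categorized under this criterion, whereas a 4-4-1 split does not.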
Table 4
A summary of individual assimilation patterns in 49 listeners by Vietnamese vowel contrast, based on individual categorizations of Vietnamese vowels at the 50% consistency criterion.

Assimilation   Contrast
type           [o]-[ɤ]   [ɯ]-[u]   [ɯ]-[ɤ]   [u]-[o]
TC             40        24        21        14
UC             8         15        17        7
UU             1         3         1         3
SC             –         5         2         22
CG             –         2         8         3
Note: TC = Two-Category, UC = Uncategorized-Categorized, UU = Uncategorized-Uncategorized, SC = Single-Category, CG = Category Goodness.

Individual categorizations of Vietnamese vowels are shown in Table 3. As expected, a great deal of inter-listener variability was observed. Table 3 shows that, except for [ɯ], the majority of listeners exhibited strong preferences in their categorization of the target Vietnamese vowels: 61% of listeners categorized Vietnamese [u] as most similar to English /oʊ/, 67% also categorized [o] as /oʊ/, and 78% categorized [ɤ] as most similar to /ʌ/. The Vietnamese [ɯ] vowel was uncategorized by 31% of the listeners and categorized as /ʊ/ by 29%, as /u/ by 20%, and as /ʌ/ by 14%. Responses to [ɯ] were also more variable with respect to rounding than responses to the other vowels. An examination of listener IDs showed that the listeners who picked the less popular keywords were not the same across all four Vietnamese vowel categorizations, suggesting that none of the listeners was an outlier with an idiosyncratic categorization strategy.

A tally of individual assimilations is shown in Table 4. The [o]-[ɤ] pair showed TC, UC, and UU patterns of assimilation (compared to just the TC pattern in the group-level analysis). The three other vowel pairs showed all five types of assimilation in individual listeners. Both TC and UC assimilations were frequent in the [ɯ]-[u] and [ɯ]-[ɤ] vowel pairs, suggesting that the difference between TC and UC may be obscured in group-level analyses. Furthermore, the UU assimilation type is also obscured when only average data are considered.

In addition to individual categorization responses and assimilation patterns, individual overlap scores were calculated for each of the four contrasts (see Appendix). The overlap scores ranged from 0% to 56% for TC assimilations, 0–78% for UC assimilations, 22–78% for UU assimilations, 56–78% for CG assimilations, and 56–89% for SC assimilations.

2.2.3. Factors influencing categorization
While the analyses above were used to derive predictions for Experiment 2, the last set of analyses was exploratory, examining which factors might influence the selection of keywords. A multinomial logistic regression analysis was conducted on individual categorization responses for each Vietnamese vowel, with the seven English keywords as the levels of the dependent variable. Predictor variables included four stimulus variables: vowel duration, as well as F1, F2, and F3 measured at the vowel midpoint and converted to the Bark scale using the formula proposed in Traunmüller (1990). Predictor variables also included two listener variables: previous exposure to the rounding contrast (a) in front vowels, through learning L2 French or German, and (b) in back vowels, through taking the discrimination task first in the present study. Based on the expectation that relatively more peripheral (but acoustically close) vowels may serve as referent vowels in perception, GOAT was used as the reference response category in the multinomial regressions for [u o ɯ], and HAWK in the regression for [ɤ]. Significant contributions of predictor variables in each of the models are shown in Table 5.

Table 5
Significant effects of predictor variables at the p ≤ 0.05 level in four multinomial logistic regressions on listener responses by Vietnamese vowel. The reference category (shaded in the table) was HAWK for [ɤ], and GOAT for [u o ɯ].

Overall, regression models with six predictor variables fitted the response data better than intercept-only models: [χ²(36) = 188.08, p < 0.001] for [u]; [χ²(36) = 86.93, p < 0.001] for [o]; [χ²(36) = 180.44, p < 0.001] for [ɯ]; and [χ²(36) = 98.29, p < 0.001] for [ɤ]. The significance of individual predictors varied depending on the Vietnamese vowel presented to listeners. For example, almost none of the predictors contributed significantly to differentiating listeners' responses to [ɤ] with /ɔ/ as the reference category. On the other hand, most predictors were significant in listeners' responses to [u] with /oʊ/ as the reference category. These results show that the listeners relied on different (and complex) sets of acoustic cues when selecting their answers for each of the Vietnamese vowels.

The results also show that both listener variables played a role in the selection of some keywords. Listeners who had previous exposure to a rounding contrast in front or back vowels categorized [u] more frequently as GOAT than GOOSE. This means that [u] and [o] in these listeners were likely to show an SC assimilation pattern to GOAT. They also categorized [ɯ] more frequently as BUS than GOAT, which suggests that they may be more sensitive to the rounding contrast in Vietnamese back vowels than listeners without such experience. This also means that [ɯ] and [ɤ] in these listeners were likely to show an SC or a CG assimilation pattern to BUS, whereas [ɯ]-[u] were likely to show a TC assimilation.

3. Experiment 2: Vowel discrimination
In this experiment, discrimination was tested for four Vietnamese vowel contrasts with regard to predictions based
on the assimilation types and overlap scores for each of the Vietnamese vowel pairs determined in Experiment 1.

Table 6
Discrimination accuracy in four Vietnamese contrasts predicted from (a) assimilation types under the 50% categorization consistency criterion (Best, 1995); (b) assimilation types under the above-chance criterion of 14.3%; and (c) overlap scores (Levy, 2009). Discrimination accuracy is expected to rank TC = UC > CG > SC and to correlate negatively with the overlap scores.

Vietnamese   (a) Predominant   (b) Above-chance   (c) Overlap    Discrimination
contrast     assimilation      assimilation       score (%)      accuracy % (SD)
[o]-[ɤ]      TC /oʊ/-/ʌ/       TC /oʊ/-/ʌ/        17             90 (11.5)
[ɯ]-[ɤ]      UC /?/-/ʌ/        CG /ʌ/             52             89 (12.8)
[ɯ]-[u]      UC /?/-/oʊ/       SC /u/             41             83 (19.1)
[u]-[o]      SC /oʊ/           SC /oʊ/            80             84 (13.3)
Note: TC = Two-Category, UC = Uncategorized-Categorized, SC = Single-Category, CG = Category Goodness. The symbol /?/ indicates the lack of a predominant assimilation pattern for Vietnamese [ɯ].

3.1. Method

3.1.1. Participants
Of the 49 participants in the categorization task, 30 completed an additional discrimination task (17 females, 13 males; age 18–25), including eight speakers who self-reported beginning or intermediate proficiency in French (see Appendix).

3.1.2. Stimuli and task
Vowel discrimination in four minimal pairs ([u]-[o], [ɯ]-[ɤ], [o]-[ɤ], and [u]-[ɯ]) was tested in an oddity task (Flege, 2003; Macmillan & Creelman, 2005). Each of these pairs differed in one phonological feature: [u]-[o] and [ɯ]-[ɤ] differed in vowel height; [o]-[ɤ] and [u]-[ɯ] differed in rounding. To create the test, eight Vietnamese words with level tones were selected: two words with each of the [u], [ɯ], [o], and [ɤ] vowels (see Table 1). Three tokens (productions) of each word were used to construct test trials in the categorical discrimination format, in which stimuli were presented in triads to test a single contrast, but no two stimuli in a triad were exactly the same. In each trial of the oddity task (e.g., [tu1]-[tɯ1]-[tu2]), the participant heard three stimuli, two of which were different tokens of a word representing one vowel category (e.g., [tu]), and one of which was a token of a word representing another vowel category. In this example, the second stimulus was the oddball because it contained a vowel different from that of the first and third stimuli.

The serial position of the oddball was not fixed, which increased the overall difficulty of the task (Flege, 2003). The oddball stimulus occurred with equal frequency in all three possible positions in a triad, resulting in six trial types for each vowel pair A-B: ABB and BAA (first stimulus was the oddball), ABA and BAB (second stimulus was the oddball), and AAB and BBA (third stimulus was the oddball). The number of oddball trials for each vowel pair was 24: two words per oddball vowel (e.g., [tɯ], [tʰɯ]) × two tokens representing the other vowel category (e.g., [tu1], [tu2]) × six trial types. In these trials, a response was considered correct if the oddball was chosen. It was assumed that the listeners would minimize within-category differences (e.g., [tu1] versus [tu2]) and
maximize between-category differences (e.g., [tu1] versus [tɯ2]) by basing their decisions on contrastive acoustic cues. Chance performance was 1/3.

In addition to the oddball trials, the test also included catch trials, which consisted of three physically different productions of the same word. Catch trials were used to further encourage participants to respond to phonetically relevant (and not merely auditorily detectable) differences between the stimuli (Flege, 2003). The participants were naïve to Vietnamese at the onset of the study, but it was expected that they would start discerning Vietnamese vowel contrasts as the experiment progressed. In other words, their sensitivity to differences between words containing different vowels was expected to increase, while their sensitivity to variation among different productions of the same word was expected to decrease. A statistical method for assessing individual participants' response bias in the oddity task is not currently available (Macmillan & Creelman, 2005), but the inclusion of catch trials may reduce potential response biases (Flege, 2003).

3.1.3. Procedure
The participants took the categorization test and the discrimination test in counterbalanced order; that is, half of them completed Experiment 1 first. The discrimination test consisted of 144 trials (four vowel pairs × 24 oddball trials + four vowels × 12 catch trials), which were presented in random order. On each trial, participants were asked to select the stimulus that they perceived as containing a vowel different from the other two. They clicked the 'first,' 'second,' or 'third' response button on a computer screen to indicate the serial position of the oddball stimulus. Participants were instructed to select the 'none' button if they thought that all three stimuli in a triad had the same vowel, analogous to the English examples bat-bat-bet versus bat-bat-bat. They were instructed to respond as quickly and accurately as possible; however, response time (RT) was not limited. After a response button was selected, the participant moved to the next trial by clicking an 'ok' button. The inter-stimulus and inter-trial intervals were 1.5 s. Relatively long intervals were used to decrease the likelihood of auditory-based judgments and increase demands related to phonetic encoding (Flege, 2003; Polka & Bohn, 2011). Participants' responses and RTs were recorded. RT was measured from the beginning of the first stimulus sound.

Table 7
Discrimination accuracy (%) in four Vietnamese contrasts by trial type. Each contrast had 720 responses (six trial types × four trials each × 30 listeners).

Contrast     Trial type                          Effect of      Mean
[A], [B]     ABB   BAA   BAB   ABA   BBA   AAB   trial type     accuracy
[o], [ɤ]     88    93    93    85    98    83    p = 0.002      90
[ɯ], [ɤ]     84    93    91    83    89    91    p = 0.089      89
[u], [ɯ]     77    82    80    85    92    83    p = 0.034      83
[u], [o]     78    84    87    83    77    93    p = 0.022      84
Note: Accuracies in triads where the vowel category changes from a less peripheral to a more peripheral vowel ([ɤ] to [o], [ɯ] to [u]) are underlined.

3.2. Results

3.2.1. Vowel discrimination in oddball trials
The first set of analyses was used to determine whether there were accuracy and RT differences in the discrimination of the four Vietnamese contrasts. Listeners' responses in oddball trials constituted 2880 data points (96 trials × 30 listeners). Average discrimination accuracy by contrast is shown in the last column of Table 6, where the predictions based on Experiment 1 results are summarized in columns 2–4 for convenience.

Binary accuracy data (correct or incorrect response in each trial) were fitted to a generalized mixed-effects model with a logit link function and binomial distribution (IBM Corp., 2016). Parameters included the following fixed factors and their interactions: contrast (4), oddball vowel (4), position of the oddball vowel in a triad (3), experience with the rounding contrast via foreign language learning (2), and experience with the rounding contrast via taking the categorization test first (2). Random factors (with random slopes) included listener and trial number. This analysis yielded significant effects of contrast [F(3,2856) = 4.55, p = 0.003] and oddball vowel [F(3,2856) = 3.47, p = 0.016], as well as an interaction between the oddball vowel and its position in a triad [F(6,2856) = 3.98, p = 0.001]. The effect of listener was also significant [Z = 2.87, p = 0.004]; other effects and interactions were not.

The significant effect of contrast was further investigated in pairwise comparisons with Bonferroni corrections for multiple comparisons. All contrasts were compared to [u]-[o], which
had the highest overlap score and was predicted to be the most challenging for listeners. As expected, the [u]-[o] contrast was discriminated with lower accuracy than the [ɯ]-[ɤ] contrast [t = 2.17, p = 0.030] or the [o]-[ɤ] contrast [t = 2.83, p = 0.005]. The difference between [u]-[o] and [ɯ]-[u], however, was not significant. This result suggests that not every rounding contrast (e.g., [ɯ]-[u]) was discriminated more accurately than a height contrast (e.g., [u]-[o]), although for the [o]-[ɤ] and [ɯ]-[ɤ] contrasts this prediction was borne out. The percentage of correct responses by contrast is shown in Tables 6 and 7.

The significant effect of the oddball vowel was further examined in pairwise comparisons in which all vowels were compared to [u], the most peripheral of the target Vietnamese vowels. The comparisons showed that when [u] was the oddball, contrasts were discriminated with lower accuracy than when [o] or [ɤ] was the oddball [t = 2.26, p = 0.024] and [t = 2.86, p = 0.004], respectively. The difference between [u] and [ɯ] was not significant.

The significant interaction between the oddball vowel and its position in a triad was further examined in separate logistic regression analyses by contrast. To exploit the task design and simplify the model, trial type was used as a fixed factor instead of the oddball vowel and position factors (Table 7). Because the listener experience factors and the trial number factor were not significant in the omnibus regression analysis described above, they were excluded from the analyses by contrast. The outcomes of these analyses were as follows. First, the analysis of discrimination accuracy for [o]-[ɤ] yielded significant effects of trial type [F(5,714) = 3.81, p = 0.002] and listener [Z = 2.22, p = 0.026]. The [ɤ]-[ɤ]-[o] trials were discriminated with the highest accuracy, 98%, which was significantly different from the 88% accuracy in the [o]-[ɤ]-[ɤ] trials [t = 2.72, p = 0.007]. The differences among other trial types were not significant. Second, the analysis for [ɯ]-[ɤ] did not show a significant effect of trial type; however, the effect of listener was significant [Z = 2.22, p = 0.028]. Third, the analysis for [ɯ]-[u] yielded significant effects of trial type [F(5,714) = 2.42, p = 0.034] and listener [Z = 2.81, p = 0.005]. The [ɯ]-[ɯ]-[u] trials were discriminated with the highest accuracy, 92%, which was significantly different from the 77% accuracy in the [u]-[ɯ]-[ɯ] trials [t = 2.34, p = 0.001]. The differences among other trial types were not significant. Fourth, the analysis for [u]-[o] yielded a significant effect of trial type [F(5,714) = 2.65, p = 0.022] but not of listener. The [u]-[u]-[o] trials were discriminated with the highest accuracy, 93%, which was significantly different from the 78% accuracy in the [u]-[o]-[o] trials [t = 2.30, p = 0.003].

In Table 7, accuracies in triads where the vowel category changes from less peripheral to more peripheral ([ɤ] to [o] and [ɯ] to [u]) are underlined. In line with predictions from the Natural Referent Vowel model (Polka & Bohn, 2011), English listeners performed slightly better in such triads when the oddball vowel was in the last position (i.e., the BBA trial type) than in triads where the direction of the change was reversed (ABB or AAB trial types).

In addition to discrimination accuracy, log-transformed RTs were examined in similar analyses, except that the generalized linear mixed model had an identity link and normal distribution (IBM Corp., 2016). Prior to statistical analyses, outliers that did not fall within M ± 2.5 SD of the distribution of log-transformed RTs (5.1% of the data) were removed. A regression analysis yielded one significant result, an effect of the position of the oddball vowel in a triad [F(2,2746) = 8.78, p < 0.001]. Specifically, when the target vowel was in triad-final position, its discrimination was faster than when it was in triad-initial or triad-medial position [t = −2.46, p = 0.014] and [t = −3.98, p < 0.001], respectively. Not surprisingly, the effect of listener was also significant [Z = 3.27, p = 0.001]. Because the effect of the oddball vowel and its interaction with position in a triad were not significant, further analysis of RTs was not conducted.

3.2.2. The relationship between category overlap and discrimination
The next set of analyses examined whether assimilation patterns (a discrete variable with five levels: TC, UC, UU, SC, and CG) and overlap scores (a continuous variable) predict discrimination accuracy. The proportions of correct discrimination responses were calculated for each contrast and individual. These proportions were submitted to two generalized mixed-effects models (identity link, normal distribution). In the first model, assimilation type was the independent variable; in the second model, overlap score was the independent variable. In both models, vowel pair (4) was an additional fixed effect (including its interaction with the independent variable of interest), and listener was a random effect (with slopes).

A regression analysis on proportions of listeners' accuracy in oddball trials with assimilation type as the independent variable yielded significant effects of assimilation type [F(4,99) = 8.91, p < 0.001], vowel pair [F(3,99) = 4.99, p = 0.003], and their interaction [F(9,99) = 3.23, p = 0.002]. The effect of listener was also significant [Z = 2.36, p = 0.018]. Pairwise comparisons showed that accuracy was lower in SC assimilations than in TC [t = −6.42, p < 0.001], UC [t = −4.97, p < 0.001], and UU assimilations [t = −3.85, p = 0.002]. Furthermore, accuracy was lower in CG assimilations than in TC assimilations [t = −3.12, p = 0.017]. Fig. 2 shows that the overall accuracy ranking by assimilation type, from easiest to most difficult, was TC = UC = UU > CG = SC. However, this ranking mostly reflected the relationship between discrimination accuracy and assimilation type in the [ɯ]-[u] vowel pair. In the [u]-[o] pair, such a ranking was not observed. In the other two pairs, SC and/or CG assimilations are missing; therefore, a ranking among the five assimilation types cannot be established.

Fig. 2. Mean discrimination accuracy in oddball trials by vowel pair and assimilation type. Error bars indicate ±1 standard error.
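The trial counts in Sections 3.1.2 and 3.1.3 follow directly from the oddity design, and the scoring rule is simple enough to state in code. The Python sketch below enumerates the six trial types, checks that the oddball is balanced across serial positions, maps each trial type to its oddball vowel, and scores a response. It is an illustration under stated assumptions (responses coded as 0-based positions; a 'none' response coded as None and scored as incorrect), not the software used in the study.

```python
from collections import Counter

# Six trial types for a pair A-B (Section 3.1.2); the letter that occurs once is the oddball.
TRIAL_TYPES = ["ABB", "BAA", "ABA", "BAB", "AAB", "BBA"]
PAIRS = [("u", "o"), ("ɯ", "ɤ"), ("o", "ɤ"), ("u", "ɯ")]

def oddball_position(trial_type):
    """0-based serial position of the stimulus whose vowel occurs only once in the triad."""
    return next(i for i, v in enumerate(trial_type) if trial_type.count(v) == 1)

def oddball_vowel(trial_type, pair):
    """Map the odd letter (A or B) to the vowel it stands for in this pair."""
    return pair["AB".index(trial_type[oddball_position(trial_type)])]

def is_correct(trial_type, response):
    """Correct only if the chosen position (0, 1, 2) is the oddball's; None ('none') is wrong."""
    return response == oddball_position(trial_type)

# The oddball is balanced across serial positions: each position occurs in two trial types.
position_counts = Counter(oddball_position(tt) for tt in TRIAL_TYPES)

# Counts reported in the text: 2 oddball words x 2 context tokens x 6 trial types per pair,
# plus 12 catch trials per vowel.
ODDBALL_PER_PAIR = 2 * 2 * len(TRIAL_TYPES)
CATCH_PER_VOWEL = 12
total_trials = len(PAIRS) * ODDBALL_PER_PAIR + 4 * CATCH_PER_VOWEL
CHANCE = 1 / 3  # three response positions in oddball trials

print(position_counts, ODDBALL_PER_PAIR, total_trials)
print(oddball_vowel("BBA", ("o", "ɤ")), is_correct("BBA", 2))
```

With these definitions, the BBA triad for the [o]-[ɤ] pair is [ɤ]-[ɤ]-[o], the trial type with the highest accuracy (98%) in Table 7, and 24 oddball trials per pair across 30 listeners reproduce the 2880 oddball responses analyzed above.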
A regression analysis on proportions of listeners’ accuracy in oddball trials, with overlap score as an independent variable, yielded a significant effect of overlap only [F(1,108) = 17.24, p < 0.001]. The effect of listener was also significant [Z = 2.39, p = 0.017]. An unexpected finding was that there was no effect of vowel pair on discrimination accuracy in this model.

A two-tailed correlation analysis of discrimination accuracy scores for each contrast in Experiment 2 and overlap scores for each contrast in Experiment 1 yielded a significant correlation of moderate strength [ρ = −0.386, p < 0.001]. This result indicates a negative relationship between the two variables: the higher the overlap score in the perception of two non-native vowels, the less accurately the listeners discriminated them.

4. General discussion

The goal of the present study was to examine cross-language perception of non-low back rounded and unrounded vowels by listeners naïve to a rounding contrast in back vowels. Predictions of non-native phone categorization and discrimination were based mainly on the PAM approach to cross-language speech perception (Best, 1995). Predictions arising from perceptual biases involving rounded back vowels were also examined (Lisker, 1988; Polka & Bohn, 2003, 2011; Schwartz et al., 2005; Stevens, 1972).

4.1. Acoustic similarity as a predictor of cross-language associations

The findings in the current study were consistent with previous research in that acoustic similarity was one, but not the only, factor determining perceived similarity between vowels. Based on a comparison of acoustic characteristics of Central Vietnamese [u o ɯ ɤ] vowels and vowel categories in Southern U.S. English in terms of rounding and F1/F2 values (Fig. 1), one might predict the following L2-to-L1 assimilation patterns: [u] assimilated to /oʊ/; [o] to /oʊ/; [ɯ] to /ʊ/ or /oʊ/; and [ɤ] to /ʌ/ or /ʊ/. Predominant categorization responses showed that, at the group level, three predicted L2-to-L1 assimilation patterns were borne out: [u] assimilated to /oʊ/, [o] to /oʊ/, and [ɤ] to /ʌ/. The [ɯ] vowel did not show a predominant assimilation to any SUSE vowel category.

When above-chance responses were considered at the group level, no two Vietnamese vowels were perceived in the same way: [u] was assimilated to /oʊ/ and /u/, [o] to /oʊ/ and /ɔ/, [ɤ] to /ʌ/ and /ʊ/, and [ɯ] to /ʊ/, /ʌ/, and /u/ (Table 2). Thus, the multiple category membership effect describes the current data well (see Harnsberger, 2001). This finding supports further exploration of the acoustic similarity between L2 phones and L1 categories in terms of acoustic cue weighting, beyond determining the “closest” L2 phone – L1 category pairings.

Although the current study was not designed to investigate acoustic cue weighting in vowel perception, acoustic characteristics of the Vietnamese vowel stimuli were shown to contribute differently to the perception of the [u o ɯ ɤ] vowels (Table 5). The cue weighting approach is further supported by individual variability in perception.

Both Vietnamese [u] and [o] were predominantly associated with /oʊ/. This SC assimilation is not surprising because these Vietnamese vowels are (a) acoustically close to each other and (b) acoustically closest to SUSE /oʊ/ (Fig. 1). In particular, the perception of [u] as /oʊ/ may be driven by the acoustic characteristics of /u/ in the listeners’ L1 dialect, SUSE, in which /u/ is fronted. This finding is consistent with previous research showing that regional differences in L1 influence the perception of L2 vowels, and thus that cross-language comparisons need to be dialect-specific (Chládková & Podlipský, 2011; Escudero et al., 2012).

Even though Vietnamese [ɯ] is acoustically closest to English /ʊ/ and /oʊ/, it did not show a predominant association with either of these vowels. One might posit that the difference in rounding between [ɯ], on the one hand, and /ʊ/ and /oʊ/, on the other, limited such associations. However, F2 and F3, the vowel formants that index rounding, were not significant predictors of listeners' choice of the keywords with English vowels similar to [ɯ], except for the significant F2 effect in the GOOSE versus GOAT choice. Both of these keywords, however, have rounded vowels (Table 5). This result casts doubt on whether the mapping of [ɯ] onto /ʊ/ or /oʊ/ was limited by differences in rounding. One explanation of why [ɯ] was not associated with the /oʊ/ category may be that differences in formant trajectories, for example, the diphthongization of /oʊ/, set these vowels apart, a conclusion also supported by the significant role of vowel duration in predicting listeners' choice of BUS versus GOAT in [ɯ] categorizations (Table 5). An explanation of why [ɯ] was not predominantly associated with the /ʊ/ category may rely on the variability in the perception of /ʊ/-rounding (Cruttenden, 2014; Ladefoged & Johnson, 2015), as related to the articulation of rounding (i.e., the degree of lip pursing versus lip protrusion).

Above-chance but not predominant assimilations of [ɯ] did include the /ʊ/, /ʌ/, and /u/ categories. Possible explanations for why these three SUSE vowel categories do not invoke consistent (predominant) associations with Vietnamese [ɯ] include (a) the quality of rounding in /ʊ/ and free /ʊ/-/u/ variation in some words, (b) the much lower vowel height of /ʌ/, and (c) the fronting of /u/ in SUSE. Thus, the uncategorized status of Vietnamese [ɯ] for native SUSE listeners may be attributed to acoustic–phonetic detail in the realization of English close and close-mid back vowels. Acoustic comparisons between L2 and L1, however, have limited explanatory power because the relationship between segment acoustics and segment perception by listeners is not straightforward (Bohn, 2018).

4.2. Perceived similarity in non-low back vowels

Unlike articulatory/acoustic similarity, perceived L2-to-L1 similarity must first be established in categorization experiments. The best example of the difference between acoustic similarity and perceived similarity is the case of Vietnamese [ɯ] perception described above. Other relevant results are summarized in Tables 2 and 3.

Previous research suggested that back unrounded vowels such as [ɯ] and [ɤ] may be perceptually less salient than their rounded counterparts, and listeners may be biased to perceive them as rounded (Lisker, 1988; Stevens, 1972). Results from the current study did not provide strong evidence for this
hypothesis. The [ɤ] vowel was categorized most consistently as similar to /ʌ/, which is neither rounded nor focalized. Only two listeners categorized it predominantly as /ɔ/, a vowel that is rounded and more focalized than /ʌ/ (Table 3). As for the [ɯ] vowel, the inconsistency of its categorizations at the group level was described in the previous section. None of the /ʊ/, /ʌ/, or /u/ vowels are focalized in SUSE.

In summary, SUSE listeners were not biased to associate Vietnamese [ɯ] and [ɤ] with the more peripheral /oʊ ɔ/ categories rather than with the less peripheral /u/, /ʊ/, and /ʌ/ categories. An extension of the NRV approach to vowel categorization, in which peripheral/focalized vowels act as natural referent vowels and, perhaps, are categorized more consistently (Polka & Bohn, 2003, 2011), was also not supported. Vietnamese [u o] were categorized more consistently than [ɯ]. However, the categorization of unrounded, non-peripheral [ɤ] was the most consistent among the four vowels, which is inconsistent with a possible extension of the NRV model to categorization.

While a bias towards perceiving back vowels as rounded was not observed, the majority of listeners appeared to use rounding as a major acoustic cue in their judgments. For example, except for one listener, the responses to [u] were limited to the rounded /u oʊ ɔ/. Similarly, the responses to [o] were limited to the rounded /oʊ ɔ/, except for four listeners whose /ɑ/-responses may arguably be attributed to the widespread /ɑ/-/ɔ/ merger in U.S. English. Except for two listeners, the responses to [ɤ] were limited to unrounded /ʊ ʌ/. Only unrounded [ɯ] was perceived as similar to both rounded and unrounded categories: 24 listeners classified it as /u/ or /ʊ/, and seven listeners classified it as /ʌ/.

Vowel height appeared to play a modest role in categorizations. Close vowels [u ɯ] were categorized as similar to English close /u/, close-mid /ʊ oʊ/, and open-mid /ʌ/. Close-mid vowels [o ɤ] were categorized as similar to close-mid /ʊ oʊ/ and open-mid /ɔ ʌ/.

Lastly, listeners’ responses suggested that their previous experience with an L2 that has a rounding contrast in front vowels, or previous short exposure to the target Vietnamese vowels via the discrimination experiment, influenced their perception of back close vowels contrasting in rounding, that is, [u] and [ɯ]. The eighteen SUSE listeners who had studied French and German tended to associate [u] and [ɯ] with different SUSE categories, /oʊ/ and /ʌ/. These associations maintain the rounded/unrounded contrast, show a TC assimilation, and thus are likely to be well discriminated by bilingual listeners. This finding suggests that there may be a correlation between previous experience with a given contrast in one language, for example, rounded/unrounded front vowels in French, and the application of this generalized contrast experience to phones in another language (e.g., rounded/unrounded back vowels in Vietnamese). Similar findings have been reported for the cross-language application of another generalized contrast experience, a manner-of-articulation distinction between coronal stops and fricatives (de Jong, Silbert, & Park, 2009). This generalized experience was shown to explain the correlation between perceptual accuracy in different L2 English segments that involve the same stop/fricative contrast as in L1 Korean.

4.3. Discrimination of non-native back vowels

The PAM approach was used in the current study to examine the relationship between L2 vowel categorization and discrimination (Best, 1995; Harnsberger, 2001; Tyler et al., 2014). Vietnamese vowel categorizations were used to determine assimilation types in the Vietnamese vowel contrasts, which in turn were used to test the following ranking of discrimination accuracy for vowel pairs, from easiest to most difficult: TC = UC > CG > UU > SC (Best, 1995; Bohn, 2018).

As summarized in Table 6, under the 50% consistency criterion, the discrimination accuracy ranking was expected to be [o]-[ɤ] = [ɯ]-[ɤ] = [ɯ]-[u] > [u]-[o]. Under the above-chance criterion, the ranking was expected to be [o]-[ɤ] > [ɯ]-[ɤ] > [ɯ]-[u] = [u]-[o]. The observed ranking was [o]-[ɤ] = [ɯ]-[ɤ] > [ɯ]-[u] = [u]-[o].

The most important difference between the expected and observed rankings concerns the [ɯ]-[u] contrast. Discrimination accuracy for this contrast was not higher than for [u]-[o] and was not similar to that for [ɯ]-[ɤ], as predominant assimilation patterns would lead us to expect. Accuracy for this contrast was better predicted by the above-chance categorization criterion, which classified both [ɯ]-[u] and [u]-[o] as SC assimilations. This was likely because the above-chance criterion accounts for both predominant and secondary assimilations.

When overlap scores were used to examine the relationship between L2 vowel categorization and discrimination (Levy, 2009), the discrimination accuracy ranking was expected to be [o]-[ɤ] > [ɯ]-[u] > [ɯ]-[ɤ] > [u]-[o]. This prediction was partially supported, but the [ɯ]-[u] contrast did not fit the expected order (Table 6). In summary, discrimination accuracy predictions based on either assimilation patterns or overlap scores were accurate for L2 vowel pairs in which the two members were predominantly associated with different L1 categories (TC) or with the same L1 category (SC). When perceived similarity between two L2 vowels was intermediate (as indexed by UC assimilations or medium-range overlap scores), not all predictions were borne out.

It is difficult to compare assimilation patterns in the current study with previous research because the choice of L2 contrasts and the L1 background of listeners vary across studies. The only comparable study in the PAM framework is Tyler et al. (2014), in which assimilation of the Thai [ɯ]-[ɤ] contrast was examined in English listeners. Reported assimilation types for this Thai contrast were TC (1 listener), UC (7 listeners), CG (1 listener), and UU (4 listeners). Assimilation types for the Vietnamese [ɯ]-[ɤ] contrast in the current study were TC (21 listeners), UC (17 listeners), CG (8 listeners), SC (2 listeners), and UU (1 listener). Thus, the distribution of assimilation types for the same contrast differs between these studies. Whether this difference is better explained by different categorization consistency criteria (70% in Tyler et al., 50% here), by a difference in the phonetic realization of [ɯ ɤ] in Thai and Vietnamese, or by a difference in the L1 dialects of listeners (Northeast U.S. in Tyler et al., Southern U.S. here) is unclear. However, this comparison shows that the same predominant assimilation patterns for the same phonological contrast cannot be assumed.
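The prediction procedure described in Section 4.3, in which categorization responses determine an assimilation type for each pair and the types are then ordered as TC = UC > CG > UU > SC, can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's analysis code: the function names are hypothetical, a vowel counts as categorized when one L1 category receives at least the threshold share of its responses (0.5 for the 50% criterion; whether the study used "at least" or "more than" is an assumption), and CG is not separated from SC because that distinction requires goodness ratings.

```python
from collections import Counter
from typing import Dict, List, Optional

# Tiers follow the ranking quoted in the text, easiest first:
# TC = UC > CG > UU > SC (TC and UC share a tier).
TIER = {"TC": 0, "UC": 0, "CG": 1, "UU": 2, "SC": 3}


def predominant(responses: List[str], threshold: float = 0.5) -> Optional[str]:
    """Return the L1 category chosen on at least `threshold` of the trials,
    or None if the L2 vowel is uncategorized under that criterion.
    Exact ties resolve to the category Counter encountered first."""
    if not responses:
        return None
    category, count = Counter(responses).most_common(1)[0]
    return category if count / len(responses) >= threshold else None


def assimilation_type(cat_a: Optional[str], cat_b: Optional[str]) -> str:
    """PAM-style label for a pair from its members' predominant categories.
    Separating CG from SC needs goodness ratings, so both come out as SC."""
    if cat_a is None and cat_b is None:
        return "UU"
    if cat_a is None or cat_b is None:
        return "UC"
    return "TC" if cat_a != cat_b else "SC"


def predicted_ranking(pair_types: Dict[str, str]) -> List[List[str]]:
    """Group vowel pairs into tiers of predicted accuracy, easiest first."""
    tiers: Dict[int, List[str]] = {}
    for pair, kind in pair_types.items():
        tiers.setdefault(TIER[kind], []).append(pair)
    return [tiers[t] for t in sorted(tiers)]


if __name__ == "__main__":
    # Group-level predominant categories under the 50% criterion, as reported
    # in Section 4.1; [ɯ] had no predominant category.
    group = {"u": "oʊ", "o": "oʊ", "ɤ": "ʌ", "ɯ": None}
    pairs = [("o", "ɤ"), ("ɯ", "u"), ("ɯ", "ɤ"), ("u", "o")]
    types = {f"[{a}]-[{b}]": assimilation_type(group[a], group[b]) for a, b in pairs}
    print(types)
    print(predicted_ranking(types))
```

Fed the group-level predominant categories reported in Section 4.1 ([u] and [o] to /oʊ/, [ɤ] to /ʌ/, [ɯ] uncategorized), the sketch labels [o]-[ɤ] as TC, [ɯ]-[u] and [ɯ]-[ɤ] as UC, and [u]-[o] as SC, reproducing the expected ranking [o]-[ɤ] = [ɯ]-[ɤ] = [ɯ]-[u] > [u]-[o] under the 50% criterion.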
Each of the target L2 vowels in this study was associated with more than one L1 category. That is, the vowels showed multiple category membership, for example, the overall tendency to associate Vietnamese [o] either with /oʊ/, as 33 listeners did, or with /ɔ/, as six listeners did. When individual assimilations were tallied by vowel pair, each of the L2 contrasts showed multiple assimilation patterns: all five assimilation types occurred for the [u]-[o], [ɯ]-[u], and [ɯ]-[ɤ] contrasts, and three occurred for the [o]-[ɤ] contrast (Table 4). Thus, this report provides additional support for the suggestion that individuals vary in their assimilation of vowels and that individual assimilation patterns should inform the PAM model (Tyler et al., 2014).

In addition to the relationship between L2 vowel categorization and discrimination, three possible asymmetries in the discrimination of non-low back vowels were investigated. First, it was hypothesized that the discrimination of vowels contrasting in rounding ([u]-[ɯ] and [o]-[ɤ]) would be more accurate than the discrimination of vowels contrasting in height ([u]-[o] and [ɯ]-[ɤ]) (Lisker, 1988; Stevens, 1972). Results showed that the [u]-[o] contrast was discriminated with the lowest accuracy, but this accuracy was not significantly different from that for [u]-[ɯ]. The [o]-[ɤ] contrast was discriminated with the highest accuracy, but this accuracy was not significantly different from that for [ɯ]-[ɤ]. Collectively, these results suggest that vowel pairs contrasting in rounding are not easier to discriminate than vowel pairs contrasting in height. Therefore, the absence of the rounding contrast in L1 back vowels does not necessarily impede listeners' ability to discriminate this contrast in L2.

Second, discrimination was hypothesized to be easier in the direction from a less focalized to a more focalized vowel (Polka & Bohn, 2003, 2011). Two vowel pairs were indeed discriminated in this manner: discrimination accuracy was higher in [ɤ]-[ɤ]-[o] than in [o]-[ɤ]-[ɤ] trials, and in [ɯ]-[ɯ]-[u] than in [u]-[ɯ]-[ɯ] trials. These findings are in line with the predictions of the NRV model (Polka & Bohn, 2003, 2011). Note that the assimilation pattern in the [o]-[ɤ] vowel pair may be classified as UC or SC at the group level (Table 6), and both the [o]-[ɤ] and [ɯ]-[u] vowel pairs exhibited TC assimilation in the majority of listeners (Table 4). Thus, a less-to-more focalized vowel asymmetry can be observed in TC assimilations. This result suggests that the NRV asymmetry is not limited to vowel pairs exhibiting SC assimilations (and perhaps CG and UU assimilations), as proposed in Tyler et al. (2014). This asymmetry can also be observed in L2 vowel pairs that “cross a native phonological boundary,” such as the TC assimilation type.

Third, given experience with [u o] vowels in L1 and a corresponding lack of experience with [ɯ ɤ], one might hypothesize that SUSE listeners would be more sensitive to differences in the [u]-[ɯ] and [o]-[ɤ] pairs, which contain a “familiar” vowel, than to the difference in the [ɯ]-[ɤ] pair, where both vowels are unfamiliar (Ettlinger & Johnson, 2009). In the current study, however, discrimination accuracy for the [u]-[ɯ] vowel pair was among the lowest, while that for the [ɯ]-[ɤ] pair was not. There was no interaction between the oddball vowel and the contrast in which the vowel was presented. For example, when [u] was the oddball vowel, discrimination accuracy was relatively low, whether this vowel was presented alongside [o] or alongside [ɯ]. When [ɤ] was the oddball vowel, discrimination accuracy was relatively high, whether this vowel was presented alongside [o] or alongside [ɯ]. No difference in reaction times was observed for any contrast or vowel presentation order, whether the more “familiar” vowel, [u] or [o], was presented first in a triad or not. Thus, the asymmetry reported in Ettlinger and Johnson (2009) was not found here. Taken together, these results suggest that overall familiarity with vowel categories in L1 does not increase listeners’ discrimination sensitivity when the differences in phonetic realization are substantial, for example, the difference between fronted /u/ in SUSE and back /u/ in Vietnamese, or the difference between diphthongized /oʊ/ in SUSE and monophthongal /o/ in Vietnamese. In naïve listeners, vowel discrimination may be based on the acoustic characteristics of tokens rather than on their perceived phonetic category or their phonological category.

One interesting outcome of the discrimination experiment was that listeners made their decisions faster when an oddball vowel was in the triad-final position than when it was in other positions. The lack of interaction between vowel presentation order and vowel identity suggests that listeners attended faithfully to all three tokens in a triad. When they heard that the last token was an oddball, they made their judgments immediately. When the last token was not perceived as an oddball, they deliberated longer, perhaps “replaying” the triad in their heads. Note that accuracy and speed of decision-making may not align, because decisions made relatively fast are not necessarily the most accurate.

5. Conclusions and limitations

The present study contributes to research on cross-language speech perception by investigating the perception of non-low back vowels contrasting in rounding, which have not been examined before. The relationship between non-native vowel categorization and discrimination was examined mainly in the framework of the Perceptual Assimilation Model (Best, 1995). In addition, predictions specific to either vowel categorization or discrimination in other theoretical frameworks were examined in this typologically rare set of [u o ɯ ɤ] vowels (Maddieson, 2013). Based on the multiple lines of investigation in the present study, we can conclude that the perception of non-low back vowels is subject to multiple-category membership: each target vowel has its own “perceptual profile,” and no two profiles are identical, even for Vietnamese [u o], which have very similar acoustic characteristics and are predominantly perceived as most similar to /oʊ/ by Southern U.S. English listeners. This multiple-category membership and the large individual variability among listeners are important for describing and predicting vowel perception.

Naïve listeners’ perception seems to be guided more by the acoustic characteristics of vowels and by phonetic processing biases than by native phonological categories or contrasts. In the current study, these guiding factors included acoustic similarity between Vietnamese [u] and [o], /u/-fronting in SUSE, assimilations between rounded non-native vowels and rounded native vowels, and a bias towards the less-to-more peripheral vowel order in discrimination tasks. Listeners' attention to vowel rounding is evident in predominant assimilation patterns between rounded non-native and native vowels ([u]-/u oʊ/; [o]-/oʊ ɔ/) and between some unrounded non-native and native vowels ([ɤ]-/ʌ ʊ/). However, some
unrounded non-native vowels like [ɯ] may be associated with both rounded and unrounded native categories, a pattern possibly attributable to variable (between dialects and among speakers) phonetic realizations of those categories. Variation in English /u/ has been suggested to influence cross-language vowel perception (Levy, 2009). In general, it cannot be concluded that naïve listeners are biased to perceive unrounded non-native vowels as rounded, or biased to discriminate contrasts in rounding better than contrasts in other features such as height (for example, [o]-[ɤ] versus [ɯ]-[ɤ]).

This study has several limitations. First, in both experiments, only one talker’s voice was used to record the Vietnamese stimuli; therefore, the results may be specific to that talker’s voice. Second, limited data were collected from each participant: 36 responses in the categorization experiment and 96 responses in the discrimination experiment. This limitation was partially offset by a relatively large number of participants (49 in the first experiment and 30 in the second). The current data represent a wide range of individual responses in vowel perception tasks. The design and generalizability of the results could be improved by including multi-talker stimuli and a larger number of stimulus words and/or repetitions.

Additionally, the discrimination experiment was relatively easy for participants to complete. Overall, discrimination accuracy was high in all four contrasts, although not quite at ceiling (i.e., over 95% accuracy). This relative ease might be related to the choice of target contrasts and to the nature of the task, which largely involves auditory processing, and it might have obscured the effect of listener experience on non-native vowel perception. Introducing a longer inter-stimulus interval, and thus encouraging phonetic processing (Flege, 2003), did not make the task more difficult, but introducing multi-talker stimuli or stimuli in noise bubbles perhaps could.

Lastly, conclusions regarding the role of L2 experience and experiment order in L2 vowel categorization are tentative because the number of participants differed between the experienced and inexperienced groups, preventing more rigorous analyses. A detailed examination of acoustic cue weighting and listener variables in categorizations of [u o ɯ ɤ] was beyond the scope and methodology of the current study but is worth investigating in future research.

Acknowledgments

My gratitude is extended to Tâm Nguyễn at the Vietnam National University-HCMC and Nina Nguyen at Louisiana State University for consultations in Vietnamese and stimulus recording, and to Marybeth Lima for her comments and suggestions. The author thanks the reviewers and the associate editor for their constructive and detailed feedback.

Funding

This research was supported by a 2015 Manship Summer Research Award from the College of Humanities & Social Sciences Funding, Louisiana State University, United States.

Appendix
Individual assimilation patterns and overlap scores in Experiment 2 participants. Each contrast cell gives assimilation type | overlap score (%).

Listener  Residence  L2 exp.  Order         [o]-[ɤ]   [ɯ]-[u]   [ɯ]-[ɤ]   [u]-[o]
JD        LA                  Categ. first  TC | 0    SC | 90   TC | 11   TC | 0
MBD       LA                  Categ. first  TC | 0    UC | 60   UC | 44   TC | 22
MB        TX, LA              Discr. first  TC | 0    CG | 60   TC | 0    TC | 33
TL        LA                  Categ. first  TC | 33   SC | 78   TC | 0    TC | 33
BA        LA                  Categ. first  TC | 0    UC | 56   UC | 22   TC | 44
BH        AZ, LA              Categ. first  TC | 0    UC | 56   UC | 33   TC | 44
BS        LA                  Categ. first  TC | 11   UC | 11   UC | 44   TC | 44
HF        TX, LA              Categ. first  TC | 0    TC | 22   TC | 22   TC | 44
JL        LA                  Discr. first  UU | 56   TC | 44   CU | 33   CU | 44
MP        LA         French   Categ. first  CU | 0    TC | 0    CU | 78   TC | 44
NW        VA, LA              Categ. first  TC | 0    UU | 22   UC | 33   UC | 44
PR        NY, LA              Categ. first  UC | 33   UU | 56   UC | 44   UU | 44
TC        LA                  Categ. first  TC | 11   SC | 56   TC | 22   TC | 44
VB        LA                  Categ. first  CU | 11   CG | 67   CU | 11   TC | 44
DC        LA                  Categ. first  UC | 44   CU | 56   CG | 78   UU | 56
EB        LA         French   Categ. first  TC | 0    CU | 33   TC | 33   UC | 56
EBD       LA                  Discr. first  UC | 33   CU | 56   TC | 11   UU | 56
KA        LA         French   Discr. first  TC | 0    UC | 22   UC | 56   TC | 56
KG        LA         French   Discr. first  CU | 44   UU | 56   UU | 78   UC | 56
MZ        FL, LA              Categ. first  TC | 0    CU | 33   TC | 0    UC | 56
AHD       LA         French   Categ. first  TC | 22   TC | 0    CG | 78   SC | 56
CB        LA                  Categ. first  TC | 56   UC | 67   UC | 56   TC | 67
DCO       LA                  Discr. first  TC | 11   TC | 56   TC | 0    CG | 67
DG        LA                  Discr. first  TC | 0    TC | 33   TC | 33   SC | 67
BR        LA         French   Discr. first  TC | 0    UC | 33   UC | 44   SC | 78
GN        LA                  Discr. first  TC | 0    TC | 44   TC | 0    SC | 78
MBA       LA         French   Discr. first  TC | 0    TC | 11   TC | 11   SC | 78
MWO       LA                  Categ. first  TC | 0    TC | 0    TC | 22   SC | 78
LD        LA         French   Discr. first  TC | 0    TC | 0    TC | 11   SC | 89
LG        VA                  Discr. first  TC | 11   TC | 22   TC | 22   SC | 89
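The overlap scores in the table above index how similarly a listener categorized the two vowels of a contrast (Levy, 2009). A minimal Python sketch follows, assuming one common formulation: each vowel's responses are converted to percentages per L1 category, the smaller percentage is taken for every category both vowels received, and these minima are summed. Treat this as an assumption rather than the exact procedure used in the study; the function name and the example responses are hypothetical.

```python
from collections import Counter
from typing import List


def overlap_score(responses_a: List[str], responses_b: List[str]) -> float:
    """Percentage overlap between the categorization responses to two L2 vowels.

    Assumed formulation (after Levy, 2009): convert each vowel's responses to
    percentages per L1 category, take the smaller percentage for every
    category both vowels received, and sum those minima (0 = no shared
    categories, 100 = identical response distributions).
    """
    if not responses_a or not responses_b:
        raise ValueError("each vowel needs at least one response")
    pct_a = {cat: 100 * n / len(responses_a) for cat, n in Counter(responses_a).items()}
    pct_b = {cat: 100 * n / len(responses_b) for cat, n in Counter(responses_b).items()}
    shared = pct_a.keys() & pct_b.keys()
    return float(sum(min(pct_a[cat], pct_b[cat]) for cat in shared))


if __name__ == "__main__":
    # Hypothetical listener with nine responses per vowel (not study data).
    u_resp = ["oʊ"] * 6 + ["u"] * 3
    o_resp = ["oʊ"] * 8 + ["ɔ"] * 1
    print(round(overlap_score(u_resp, o_resp)))  # 67
```

If each vowel received nine categorization responses (36 responses across four vowels, assuming an even split), scores change in steps of about 11 percentage points, consistent with most of the values in the table.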
References

Best, C. T. (1995). A direct realist view of cross-language speech perception. In W. Strange (Ed.), Speech perception and linguistic experience: Issues in cross-language research (pp. 171–204). Timonium, MD: York Press.
Boersma, P., & Weenink, D. (2017). PRAAT: Doing phonetics by computer (version 6.0.26).
Bohn, O.-S. (2018). Cross-language and second language speech perception. In E. M. Fernandez & H. S. Cairns (Eds.), Handbook of psycholinguistics (pp. 213–239). John Wiley & Sons, Inc. https://doi.org/10.1002/9781118829516.
Chládková, K., & Podlipský, V. J. (2011). Native dialect matters: Perceptual assimilation of Dutch vowels by Czech listeners. Journal of the Acoustical Society of America, 130(4), EL186–EL192. https://doi.org/10.1121/1.3629135.
Chung, H., & de Mahy, L. A. (2017). Vowel acoustic characteristics of Southern White English produced by speakers from New Orleans area, Louisiana. Journal of the Acoustical Society of America, 142, 2681. https://doi.org/10.1121/1.5014777.
Clopper, C. G., Pisoni, D. B., & De Jong, K. (2005). Acoustic characteristics of the vowel systems of six regional varieties of American English. Journal of the Acoustical Society of America, 118(3), 1661–1676. https://doi.org/10.1121/1.2000774.
Cruttenden, A. (2014). Gimson's pronunciation of English (8th ed.). Routledge.
de Jong, K. J., Silbert, N. H., & Park, H. (2009). Generalization across segments in second language consonant identification. Language Learning, 59, 1–31.
Escudero, P., Benders, T., & Lipski, S. C. (2009). Native, non-native and L2 perceptual cue weighting for Dutch vowels: The case of Dutch, German, and Spanish learners. Journal of Phonetics, 37, 452–465.
Escudero, P., & Boersma, P. (2004). Bridging the gap between L2 speech perception research and phonological theory. Studies in Second Language Acquisition, 26, 551–585.
Escudero, P., Simon, E., & Mitterer, H. (2012). The perception of English front vowels by North Holland and Flemish listeners: Acoustic similarity predicts and explains cross-linguistic and L2 perception. Journal of Phonetics, 40(2), 280–288. https://doi.org/10.1016/j.wocn.2011.11.004.
Ettlinger, M., & Johnson, K. (2009). Vowel discrimination by English, French and Turkish speakers: Evidence for an exemplar-based approach to speech perception. Phonetica, 66, 222–242. https://doi.org/10.1159/000298584.
Flege, J. E. (2003). A method for assessing the perception of vowels in a second language. In E. Fava & A. Mioni (Eds.), Issues in clinical linguistics (pp. 19–44). Padua, Italy: UniPress.
Fridland, V., Kendall, T., & Farrington, C. (2014). Durational and spectral differences in American English vowels: Dialect variation within and across regions. Journal of the Acoustical Society of America, 136(1), 341–349. https://doi.org/10.1121/1.4883599.
Gottfried, T. L. (1984). Effects of consonant context on the perception of French vowels. Journal of Phonetics, 12, 91–114.
Harnsberger, J. D. (2001). On the relationship between identification and discrimination of non-native nasal consonants. Journal of the Acoustical Society of America, 110, 489–503. https://doi.org/10.1121/1.1371758.
IBM Corp. (2016). IBM SPSS Statistics for Macintosh (version 24.0). Armonk, NY: IBM Corp.
Jacewicz, E., & Fox, R. A. (2012). The effects of cross-generational and cross-dialectal variation on vowel identification and classification. Journal of the Acoustical Society of America, 131, 1413–1433. https://doi.org/10.1121/1.3676603.
Jacewicz, E., Fox, R. A., & Salmons, J. (2011). Cross-generational vowel change in American English. Language Variation and Change, 23, 45–86. https://doi.org/10.1017/S0954394510000219.
Kirby, J. P. (2011). Illustration of the IPA: Vietnamese (Hanoi Vietnamese). Journal of the International Phonetic Association, 41(3), 381–392.
Labov, W., Ash, S., & Boberg, C. (2006). Atlas of North American English: Phonetics, phonology, and sound change. New York: Mouton de Gruyter.
Ladefoged, P., & Johnson, K. (2015). A course in phonetics (7th ed.). Cengage Learning.
Levy, E. S. (2009). On the assimilation-discrimination relationship in American English adults’ French vowel learning. Journal of the Acoustical Society of America, 126(5), 2670–2682. https://doi.org/10.1121/1.3224715.
Lisker, L. (1988). Interpreting vowel “quality”: The dimension of rounding. Journal of the Acoustical Society of America, 83(Suppl. 1), S83.
Macmillan, N. A., & Creelman, C. D. (2005). Detection theory: A user’s guide (2nd ed.). New Jersey: Lawrence Erlbaum Associates.
Maddieson, I. (2013). Front rounded vowels. In M. S. Dryer & M. Haspelmath (Eds.), The world atlas of language structures online. Leipzig: Max Planck Institute for Evolutionary Anthropology.
Nearey, T. M., & Assman, P. F. (1986). Modeling the role of inherent spectral change in vowel identification. Journal of the Acoustical Society of America, 80(5), 1297–1308. https://doi.org/10.1121/1.394433.
Nishi, K., Strange, W., Akahane-Yamada, R., Kubo, R., & Trent-Brown, S. A. (2008). Acoustic and perceptual similarity of Japanese and American English vowels. Journal of the Acoustical Society of America, 124(1), 576–588. https://doi.org/10.1121/1.2931949.
Polka, L., & Bohn, O.-S. (2003). Asymmetries in vowel perception. Speech Communication, 41, 221–231.
Polka, L., & Bohn, O.-S. (2011). Natural Referent Vowel (NRV) framework: An emerging view of early phonetic development. Journal of Phonetics, 39(4), 467–478. https://doi.org/10.1016/j.wocn.2010.08.007.
Schwartz, J.-L., Abry, C., Boë, L.-J., Ménard, L., & Vallée, N. (2005). Asymmetries in vowel perception, in the context of the Dispersion-Focalisation Theory. Speech Communication, 45(4), 425–434. https://doi.org/10.1016/j.specom.2004.12.001.
Stevens, K. N. (1972). The quantal nature of speech: Evidence from articulatory-acoustic data. In P. B. Denes & E. E. David (Eds.), Human communication: A unified view (pp. 51–56). New York: McGraw-Hill.
Traunmüller, H. (1990). Analytical expressions for the tonotopic sensory scale. Journal of the Acoustical Society of America, 88, 97–100. https://doi.org/10.1121/1.399849.
Tyler, M. D., Best, C. T., Faber, A., & Levitt, A. G. (2014). Perceptual assimilation and discrimination of non-native vowel contrasts. Phonetica, 71(1), 4–21. https://doi.org/10.1159/000356237.
Williams, D., & Escudero, P. (2015). Influences of listeners' native and other dialects on cross-language vowel perception. Frontiers in Psychology. https://doi.org/10.3389/fpsyg.2014.01065.
Yu, A. C. L. (2010). Tonal effects on perceived vowel duration. In C. Fougeron, B. Kuhnert, M. Imperio, & N. Valle (Eds.), Laboratory phonology (Vol. 10, pp. 151–168). Berlin: Mouton de Gruyter.

(Shport, 2009) Perception of Vietnamese Back Vowels Contrasting in Rounding by English Listeners

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

(Shport, 2009) Perception of Vietnamese Back Vowels Contrasting in Rounding by English Listeners

Uploaded by

Copyright:

Available Formats

Journal of Phonetics 73 (2019) 8–23

Contents lists available at ScienceDirect

Perception of Vietnamese back vowels contrasting in rounding by English

1. Introduction categorization shapes the perception of non-low back vowels

Vowel Word Tone F0 (Hz) Duration (ms) F1 (Hz) F2 (Hz) F3 (Hz)

Vietnamese Vowel % Response

Vietnamese Vowel Categorized as English (Number of Listeners) UC

Table 4 selection of keywords. A multinomial logistic regression analy-

2.2.3. Factors inﬂuencing categorization 3. Experiment 2: Vowel discrimination

Vietnamese Contrast Assimilation Overlap Score %(c) Discrimination Accuracy % (SD)

Contrast Trial Type Effect of Mean

Individual assimilation patterns and overlap scores in Experiment 2 participants.

Listener Residence L2 Exp. Order Assimilation Type | Overlap Score (%)

You might also like