
Frequency Effects on Perceptual Compensation for Coarticulation

Alan C. L. Yu1, Ed King1, Morgan Sonderegger2


1 Phonology Laboratory, Department of Linguistics, University of Chicago
2 Department of Computer Science, University of Chicago
aclyu@uchicago.edu, etking@uchicago.edu, morgan@cs.uchicago.edu

Abstract
Errors in compensating perceptually for the effects of coarticulation in speech have been hypothesized to be one of the major sources of sound change in language. Little research has elucidated the conditions under which such errors might take place. Using the selective adaptation paradigm, this paper reports the results of a series of experiments testing for an effect of frequency on the likelihood of listeners' perceptual compensation for coarticulation. The results suggest that perceptual compensation might be attenuated (which might result in hypocorrection) or exaggerated (i.e., hypercorrection) depending on the relative frequency of the categories being perceived in their specific coarticulated contexts.
Index Terms: perceptual compensation, sound change, selective adaptation.

1. Introduction
A fundamental property of speech is its tremendous variability. Much research has shown that human listeners take such variability into account in speech perception [1, 2, 3, 4, 5, 6]. Beddor and colleagues [7], for example, found that adult English and Shona speakers perceptually compensate for the coarticulatory anticipatory raising of /a/ in a C/i/ context and the anticipatory lowering of /e/ in a C/a/ context: both English and Shona listeners report hearing more /a/ in the context of a following /i/ than in the context of a following /a/.

Many scholars have hypothesized that a primary source of systematic sound change in language is error in perceiving the intended speech signal [8, 9]. That is, errors in listeners' classification of speakers' intended pronunciations, if propagated, might result in systematic changes in the sound systems of all speaker-listeners within the speech community. Ohala [8], in particular, argues that hypocorrective sound change (e.g., assimilation and vowel harmony) obtains when a contextual effect is misinterpreted as an intrinsic property of the segment (i.e., an increase in false positives in sound categorization). For example, an ambiguous /a/ token might be erroneously categorized as /e/ in the context of a following /i/ if listeners fail to take into account the anticipatory raising effect of /i/. If enough /a/ exemplars are misidentified as /e/, a pattern of vowel harmony might emerge: the language will show a prevalence of mid vowels before /i/ and low vowels before /a/. On the other hand, a hypercorrective sound change (e.g., dissimilation) emerges when the listener erroneously attributes intended phonetic properties to contextual variation (i.e., an increase in false negatives in sound identification). In this case, an ambiguous /e/ might be misclassified as /a/ in the context of a following /i/. If enough /e/ exemplars are misidentified as /a/ when followed by /i/, a dissimilatory pattern of only low vowels before high vowels and non-low vowels before low vowels might emerge.

While much effort has gone into identifying the likely sources of such errors [10, 11, 8, 12, 9], little is known about the source of regularity in listener misperception that leads to the systematic nature of sound change. That is, why would random and haphazard misperception in individuals' percepts lead to systematic reorganization of the sound system within the individual and within the speech community? The present study demonstrates that the likelihood of listeners adjusting their categorization patterns contextually (i.e., perceptual compensation) may be affected by the frequencies of the sound categories occurring in specific contexts. In particular, the present study expands on Beddor et al.'s work [7] on perceptual compensation for vowel-to-vowel coarticulation in English, showing that the way English listeners compensate perceptually for the effect of regressive coarticulation from a following vowel (either /i/ or /a/) depends on the relative frequency of the coarticulated vowels (i.e., the relative frequency of /a/ and /e/ appearing before /i/ or /a/).

The idea that category frequency information affects speech perception is not new. Research on selective adaptation has shown that repeated exposure to a particular speech sound, say /s/, shifts the identification of ambiguous sounds, say sounds half-way between /s/ and /ʃ/, away from the repeatedly presented sound and towards the alternative [13, 14, 15]. In perceptual learning studies, repeated exposure to an ambiguous sound, say an /s/-/f/ mixture, in /s/-biased lexical contexts induces retuned perception such that subsequent ambiguous sounds are heard as /s/ even in lexically neutral contexts [16, 17]. The experiments reported below extend Beddor et al.'s [7] findings by presenting three groups of participants with the same training stimuli but varying the frequency with which they hear each token. The purpose of the present study is to demonstrate that contextually-sensitive category frequency information can induce selective adaptation effects in perceptual compensation and to examine the implications of such effects for theories of sound change.

2. Methods
2.1. Stimuli

The training stimuli consisted of CV1CV2 disyllables, where C is one of /p, t, k/, V1 is either /a/ or /e/, and V2 is either /a/ or /i/. To avoid any vowel-to-vowel coarticulatory effect in the training stimuli, a phonetically-trained native English speaker (the second author) produced each syllable in isolation (/pa/, /pe/, /pi/, /ta/, /te/, /ti/, /ka/, /ke/, /ki/). The disyllabic training stimuli were assembled by splicing together the appropriate syllables and were resynthesized with a consistent intensity and pitch profile to avoid a potential confound of stress. The test stimuli consisted of two series of /pV1pV2/ disyllables where V2 is either /a/ or /i/. The first syllable, /pV1/, is a 9-step continuum resynthesized in Praat by varying F1, F2, and F3 in equidistant steps between the speaker's /pa/ and /pe/ syllables; the original /pa/ and /pe/ syllables serve as the endpoints of the continuum.
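To make the continuum construction concrete, here is a minimal sketch of the equidistant-step interpolation in Python. The endpoint formant values below are hypothetical placeholders, not the speaker's measured values, and the actual resynthesis was carried out in Praat rather than with this code.

```python
import numpy as np

# Hypothetical endpoint formant values in Hz (F1, F2, F3); the real values
# were measured from the second author's /pa/ and /pe/ recordings.
F_PA = np.array([750.0, 1200.0, 2500.0])  # assumed /pa/ endpoint
F_PE = np.array([550.0, 1800.0, 2600.0])  # assumed /pe/ endpoint

N_STEPS = 9  # step 1 = original /pa/, step 9 = original /pe/

# Interpolate F1, F2, and F3 in equal increments between the endpoints.
continuum = [F_PA + (F_PE - F_PA) * i / (N_STEPS - 1) for i in range(N_STEPS)]

for step, (f1, f2, f3) in enumerate(continuum, start=1):
    print(f"step {step}: F1={f1:.0f} Hz, F2={f2:.0f} Hz, F3={f3:.0f} Hz")
```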

2.2. Participants and procedure

The experiment consisted of two parts: exposure and testing. Subjects were assigned randomly to one of three exposure conditions. One group was exposed to CeCi tokens four times more often than CeCa tokens, and to CaCa tokens four times more often than CaCi ones (the HYPER condition). The second group was exposed to CeCa tokens four times more often than CeCi tokens, and to CaCi tokens four times more often than CaCa ones (the HYPO condition). The final group was exposed to equal numbers of /e/ and /a/ vowels preceding /i/ and /a/ (the BALANCED condition). See Table 1 for a summary of the frequency distribution of the exposure stimuli. The exposure stimuli were presented over headphones automatically in random order in E-Prime in a sound-proof booth. During the exposure phase, subjects performed a phoneme monitoring task in which they were asked to press a response button when a word contained a medial /t/. Each subject heard the block of 360 exposure tokens three times, with a short break following each block. A total of forty-eight students at the University of Chicago, all native speakers of American English, participated in the experiment for course credit or a nominal fee. Eleven subjects took part in the HYPO condition; sixteen subjects each participated in the HYPER and BALANCED conditions. During the testing phase, subjects performed a 2-alternative forced-choice task: they listened to a randomized set of test stimuli and were asked to decide whether the first vowel sounded like /e/ or /a/.
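For concreteness, the following sketch assembles one randomized 360-token exposure block per condition, matching the counts in Table 1. The token labels, and the assumption that the two consonants in a disyllable are identical, are ours for illustration only.

```python
import random

CONSONANTS = ["p", "t", "k"]

# Per-condition counts of each (V1, V2) pairing, from Table 1; each count
# is divided evenly across the three consonants.
COUNTS = {
    "BALANCED": {("e", "i"): 90,  ("e", "a"): 90,  ("a", "i"): 90,  ("a", "a"): 90},
    "HYPER":    {("e", "i"): 144, ("e", "a"): 36,  ("a", "i"): 36,  ("a", "a"): 144},
    "HYPO":     {("e", "i"): 36,  ("e", "a"): 144, ("a", "i"): 144, ("a", "a"): 36},
}

def exposure_block(condition, seed=0):
    """Build one randomized 360-token exposure block for a condition."""
    rng = random.Random(seed)
    tokens = []
    for (v1, v2), n in COUNTS[condition].items():
        for c in CONSONANTS:
            # Assumes the same consonant in both syllables (e.g., "papi").
            tokens.extend([f"{c}{v1}{c}{v2}"] * (n // len(CONSONANTS)))
    rng.shuffle(tokens)
    assert len(tokens) == 360
    return tokens

print(exposure_block("HYPER")[:5], "...")
```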

Figure 1: Interaction between VOCALIC CONTEXT and STEP. Probability of /a/ response (y-axis, 0.0-1.0) by continuum step (x-axis), for following /i/ vs. /a/ contexts. The predictor variables were back-transformed to their original scales in the figure.

3. Analysis
Subjects' responses (i.e., subjects' /a/ response rates) were modeled using mixed-effects logistic regression. The model contains four fixed variables: TRIAL (1-180), CONTINUUM STEP (1-9), EXPOSURE CONDITION (balanced, hyper, hypo), and VOCALIC CONTEXT (/a/ vs. /i/). The model also includes three two-way interactions: VOCALIC CONTEXT x STEP, STEP x CONDITION, and VOCALIC CONTEXT x CONDITION. In addition, the model includes by-subject random slopes for TRIAL. A likelihood ratio test comparing a model with a three-way VOCALIC CONTEXT x STEP x CONDITION interaction term against one without it shows that the added three-way interaction does not significantly improve model log-likelihood (χ² = 3.253, df = 2, Pr(>χ²) = 0.1966). Table 2 summarizes the parameter estimates for all fixed effects in the model, as well as their estimated standard errors SE(β), the associated Wald z-scores, and significance levels. To reduce collinearity, scalar variables were centered and categorical variables were sum-coded.
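The reported likelihood-ratio p-value can be reproduced from the χ² statistic and degrees of freedom alone; a quick check (the two degrees of freedom correspond to the two extra parameters the three-way interaction adds, given sum coding of the three-level CONDITION factor):

```python
from scipy.stats import chi2

lr_stat = 3.253  # likelihood-ratio statistic reported in the text
df = 2           # extra parameters added by the three-way interaction

p_value = chi2.sf(lr_stat, df)  # survival function, i.e. 1 - CDF
print(f"Pr(>chi^2) = {p_value:.4f}")  # ~0.1966, not significant
```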

Consistent with Beddor et al.'s findings, continuum step and vocalic context are both significant predictors of /a/ response. That is, listeners reported hearing less and less /a/ from the /a/-end of the continuum to the /e/-end, and they heard more /a/ when the target vowel was followed by /i/ than when it was followed by /a/. Specifically, the odds of hearing /a/ before /i/ are 1.3 times the odds before /a/. The significant interaction between STEP and VOCALIC CONTEXT suggests that the vocalic context effect differs depending on where the test stimulus lies along the /a/-/e/ continuum. As illustrated in Figure 1, the effect of vocalic context is largest around steps 4-6, while identification is close to ceiling at the two endpoints of the continuum regardless of vocalic context.

Of particular interest here is the significant interaction between exposure condition and vocalic context. Figure 2 illustrates this interaction clearly: the effect of vocalic context on /a/ response is influenced by the nature of the exposure data. When the exposure data contain more CaCa and CeCi tokens than CaCi and CeCa tokens (the hyper condition), listeners report hearing more /a/ in the /i/ context than in the /a/ context, relative to the response rates after the balanced condition, where the frequencies of CaCa, CeCi, CaCi, and CeCa tokens are equal. In the hypo condition, where listeners heard more CaCi and CeCa tokens than CaCa and CeCi ones, listeners reported hearing less /a/ in the /i/ context than in the /a/ context, the opposite of what is observed in both the balanced and hyper conditions. The model also shows a significant interaction between CONDITION and STEP. As illustrated in Figure 3, the slope of the identification function is steepest after the hyper condition and shallowest after the hypo condition.

4. Discussion and conclusion


The present study shows that the classification of vowels in different prevocalic contexts is influenced by the relative frequency distribution of the relevant vowels in specific contexts. For example, when /a/ frequently occurs before /a/, listeners are less likely to identify subsequent instances of ambiguous /a/-/e/ vowels as /a/ in the same context; listeners report hearing more /e/ before /a/ if CaCa exemplars outnumber CeCa exemplars. Likewise, when /a/ occurs frequently before /i/, listeners reduce their rate of identifying /a/ in the same context.

Table 1: Stimulus presentation frequency during the exposure phase (C = /p, t, k/).

Type        CeCi   CeCa   CaCi   CaCa
BALANCED      90     90     90     90
HYPER        144     36     36    144
HYPO          36    144    144     36


Table 2: Estimates for all predictors in the analysis of listener responses in the identification task.

Predictor                                   Coef.     SE(β)      z         p
Intercept                                  -0.0096    0.0674    -0.14     0.8867
TRIAL                                      -0.0008    0.0009    -0.87     0.3825
STEP                                       -0.9260    0.0185   -49.98   < 0.001 ***
VOCALIC CONTEXT = a                        -0.2690    0.0316    -8.51   < 0.001 ***
CONDITION = hyper                          -0.0594    0.0934    -0.64     0.5251
CONDITION = hypo                            0.0126    0.0950     0.13     0.8942
STEP x VOCALIC CONTEXT = a                  0.0372    0.0170     2.19   < 0.05 *
STEP x CONDITION = hyper                    0.0481    0.0247     1.95     0.0514
STEP x CONDITION = hypo                    -0.2631    0.0296    -8.89   < 0.001 ***
VOCALIC CONTEXT = a x CONDITION = hyper     0.0311    0.0432     0.72     0.4717
VOCALIC CONTEXT = a x CONDITION = hypo     -0.4146    0.0477    -8.70   < 0.001 ***
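As a worked example of reading Table 2, the odds ratio of roughly 1.3 reported in the analysis (hearing /a/ before /i/ versus before /a/) can be recovered by exponentiating the VOCALIC CONTEXT coefficient. This sketch assumes the ratio is taken as exp(|β|); under a ±1 sum-coding convention the between-level contrast would instead be 2β, so the exact mapping depends on the coding used.

```python
import math

beta_context = -0.2690  # VOCALIC CONTEXT = a coefficient from Table 2

# exp(|beta|) ~ 1.31, matching the odds ratio of ~1.3 reported in the text.
# With +/-1 sum coding the between-level contrast would be exp(2 * |beta|).
print(f"odds ratio ~ {math.exp(abs(beta_context)):.2f}")
```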

Figure 2: Interaction between VOCALIC CONTEXT and CONDITION. Probability of /a/ response (y-axis, 0.0-1.0) by condition (Hypo, Balanced, Hyper), for following /i/ vs. /a/ contexts.

Figure 3: Interaction between CONDITION and STEP. Probability of /a/ response (y-axis, 0.0-1.0) by continuum step (x-axis) for each condition.

Listeners report hearing more /e/ before /i/ if CaCi tokens are more prevalent than CeCi tokens. These results suggest that listeners exhibit selective adaptation when frequency information about the target sounds varies in a context-specific fashion. That is, repeated exposure to an adaptor (the more frequent variant) results in heightened identification of the alternative.

This finding has serious implications for models of sound change that afford a prominent role to listener misperception as a source of the variation that leads to change. To begin with, subjects in the hyper exposure condition exhibit what can be interpreted as hypercorrective behavior. That is, speech tokens that were classified as /a/ in the balanced condition were classified as /e/ in the hyper condition when V2 = /a/; likewise, sounds that were classified as /e/ in the balanced condition were treated as /a/ in the hyper condition when V2 = /i/. If this type of hypercorrective behavior persists, the pseudo-lexicon of the made-up language our subjects experienced would gradually develop a prevalence of disyllabic words that do not allow two low vowels or two non-low vowels in consecutive syllables. This would represent a state of vocalic height dissimilation, not unlike the pattern found in the Vanuatu languages [18]. On the other hand, listeners in the hypo exposure condition exhibit what could be interpreted as hypocorrective behavior. That is, tokens that were classified as /e/ in the balanced condition were classified as /a/ in the hypo condition when V2 = /a/; likewise, vowels heard as /a/ in the balanced condition were heard as /e/ in the hypo condition when V2 = /i/.

If this type of reclassification persists, listeners in the hypo condition would develop a pseudo-lexicon in which vowels in disyllabic words must agree in lowness, and a state of vocalic height harmony would obtain, similar to many cases found in the Bantu languages of Africa [19].

Another ramification of the present findings for listener-misperception models of sound change concerns the role of the conditioning environment. Such models often attribute misperception to listeners failing to detect the contextual information properly and thus failing to normalize properly for the effect of context on the realization of the sound in question. Here, our findings establish that systematic failure of perceptual compensation takes place despite the presence of the coarticulatory source; perceptual compensation failure is interpreted here as whenever the context-specific identification functions deviate from the canonical identification functions observed in the balanced condition. This finding echoes earlier findings that perceptual compensation may only be partial under certain circumstances [5]. Taken together, these findings suggest that failure to compensate perceptually for coarticulatory influence need not be the result of not detecting the source of coarticulation; listeners may fail to take proper account of the effect that coarticulatory contexts have on speech production and perception.

It is worth pointing out in closing that selective adaptation effects have generally been attributed to adaptors fatiguing specialized linguistic feature detectors [13].


This suggests that the neural mechanism subserving speech perception may eventually recover from adaptor fatigue, and that selective adaptation effects might dissipate. There is indeed some evidence that selective adaptation effects are temporary [20]. The short-lived nature of selective adaptation raises doubts about its implications for sound change, since sound change requires that the influencing factors persist. Additional research is underway to ascertain the longitudinal effects of selective adaptation. Such data will provide much-needed information regarding the significance of selective adaptation effects for speech perception and sound change.


5. Acknowledgements
This work was partially supported by National Science Foundation Grant BCS-0949754.

6. References
[1] V. Mann, "Influence of preceding liquid on stop-consonant perception," Perception & Psychophysics, vol. 28, no. 5, pp. 407-412, 1980.
[2] V. A. Mann and B. H. Repp, "Influence of vocalic context on perception of the [ʃ]-[s] distinction," Perception & Psychophysics, vol. 28, pp. 213-228, 1980.
[3] J. S. Pardo and C. A. Fowler, "Perceiving the causes of coarticulatory acoustic variation: consonant voicing and vowel pitch," Perception & Psychophysics, vol. 59, no. 7, pp. 1141-1152, 1997.
[4] A. Lotto and K. Kluender, "General contrast effects in speech perception: effect of preceding liquid on stop consonant identification," Perception & Psychophysics, vol. 60, no. 4, pp. 602-619, 1998.
[5] P. Beddor and R. A. Krakow, "Perception of coarticulatory nasalization by speakers of English and Thai: Evidence for partial compensation," Journal of the Acoustical Society of America, vol. 106, no. 5, pp. 2868-2887, 1999.
[6] C. Fowler, "Compensation for coarticulation reflects gesture perception, not spectral contrast," Perception & Psychophysics, vol. 68, no. 2, pp. 161-177, 2006.
[7] P. S. Beddor, J. Harnsberger, and S. Lindemann, "Language-specific patterns of vowel-to-vowel coarticulation: acoustic structures and their perceptual correlates," Journal of Phonetics, vol. 30, pp. 591-627, 2002.
[8] J. Ohala, "The phonetics of sound change," in Historical Linguistics: Problems and Perspectives, C. Jones, Ed. London: Longman Academic, 1993, pp. 237-278.
[9] J. Blevins, Evolutionary Phonology: The Emergence of Sound Patterns. Cambridge: Cambridge University Press, 2004.
[10] J. Ohala, "Sound change is drawn from a pool of synchronic variation." Berlin: Mouton de Gruyter, 1989, pp. 173-198.
[11] ——, "The phonetics and phonology of aspects of assimilation," in Papers in Laboratory Phonology I: Between the Grammar and the Physics of Speech, J. Kingston and M. Beckman, Eds. Cambridge: Cambridge University Press, 1990, vol. 1, pp. 258-275.
[12] ——, "Towards a universal, phonetically-based, theory of vowel harmony," in Proc. ICSLP, Yokohama, vol. 3, pp. 491-494, 1994.
[13] P. Eimas and J. Corbit, "Selective adaptation of linguistic feature detectors," Cognitive Psychology, vol. 4, pp. 99-109, 1973.
[14] P. D. Eimas and J. L. Miller, "Effects of selective adaptation of speech and visual patterns: Evidence for feature detectors," in Perception and Experience, H. L. Pick and R. D. Walk, Eds. N.J.: Plenum, 1978.
[15] A. G. Samuel, "Red herring detectors and speech perception: In defense of selective adaptation," Cognitive Psychology, vol. 18, pp. 452-499, 1986.
[16] D. Norris, J. M. McQueen, and A. Cutler, "Perceptual learning in speech," Cognitive Psychology, vol. 47, no. 2, pp. 204-238, 2003.
[17] A. G. Samuel and T. Kraljic, "Perceptual learning for speech," Attention, Perception, & Psychophysics, vol. 71, no. 6, pp. 1207-1218, 2009.
[18] J. Lynch, "Low vowel dissimilation in Vanuatu languages," Oceanic Linguistics, vol. 42, no. 2, pp. 359-406, 2003.
[19] F. B. Parkinson, "The representation of vowel height in phonology," PhD dissertation, Ohio State University, 1996.
[20] J. Vroomen, S. van Linden, M. Keetels, B. de Gelder, and P. Bertelson, "Selective adaptation and recalibration of auditory speech by lipread information: dissipation," Speech Communication, vol. 44, pp. 55-61, 2004.
