
Dialect variation and formant frequency: The American English vowels revisited

Robert Hagiwara
Waisman Center & Department of Communicative Disorders, University of Wisconsin-Madison, Madison, Wisconsin 53706

(Received 9 February 1996; accepted for publication 27 February 1997)

Vowel production data collected from 15 southern Californian English-speaking monolinguals are compared with data reported by Hillenbrand et al. [J. Acoust. Soc. Am. 97, 3099-3111 (1995)] and Peterson and Barney [J. Acoust. Soc. Am. 24, 175-184 (1952)]. Recordings were made of nine women and six men producing multiple repetitions of /i, ɪ, e, ɛ, æ, u, ʊ, o, ɑ, ʌ, ɝ/ in three consonant contexts. The frequencies of the first three formants were measured by simultaneous comparison of wideband spectrograms, narrow-band FFT spectral slices, and LPC spectra taken at vowel center, or steady state where available. The southern Californian data are seen to differ greatly from those described by Peterson and Barney (1952) and Hillenbrand et al. (1995). © 1997 Acoustical Society of America. [S0001-4966(97)04906-0]

PACS numbers: 43.70.Fq, 43.70.Hs [AL]

Reference to formant frequency data from a wide variety of American English dialects is useful in forming and testing theories of vowel features, of the relation between speech production and perception, and of the progress of sociolinguistic change, as well as in providing adequate information for building speech technology applications in recognition and synthesis, and in establishing dialect-appropriate norms in clinical speech therapy. However, there is a dearth of such data. The purpose of this study is to illustrate the variability observed across American English dialects in the domain of steady-state formant frequency, and to provide, in the absence of more extensive data from southern Californian English, a verifiable and testable set of formant frequency norms for adult men and women. Vowel data from college-aged southern Californian speakers will be seen to diverge from similar data reported for northern Midwesterners (Hillenbrand et al., 1995), and from "General American" speakers (Peterson and Barney, 1952). This letter is also an appeal to other researchers to produce similar, local studies of American English, with a view to cooperatively producing an acoustic atlas of American English dialects as indicated by formant frequency.

The data in this study comprise a subset of the data reported in Hagiwara (1995), which contains a fuller description of the complete corpus than will be included in this letter. Undergraduate students at UCLA were asked to respond to a Speaker Survey Form if they were willing to participate in a phonetic study of /r/ in dialects of American English. Of the respondents to the survey, 15 were selected for their similarity in age (18-26), geographic background, and gross socioeconomic and educational indicators. They represent a relatively unmarked, middle-class, suburban population.
They include Anglo-Americans, African Americans, and Asian Americans, but appear to represent as unified a speech community as can reasonably be studied without imposing predetermined sociometric boundaries on a target group of speakers. That is, to the degree that southern California American English is a dialect within which a certain amount of ethno-social variation is to be expected, these 15 appear to represent the region. Each was compensated $10 U.S. for participating in the study.

Sixty-nine monosyllabic words were selected to illustrate the plain nonrhoticized vowels and three allophones of /r/ in southern California English. Thirty words illustrate plain vowels in three consonantal environments: /b_t/, /t_k/, and /h_d/. Three exemplify syllabic /ɝ/ in the same environments. These 33 words together form the database for the present study. Table I lists the words used. Only real English words and familiar proper nouns were used; where a word of the appropriate phonological shape did not exist, a word as close in shape as possible to the target was substituted, as with "put." The word "hoed" was respelled as a proper noun, "Hode," to avoid its relatively odd-looking spelling. Each word was presented in the frame "Cite ___ twice." This frame was selected to provide citation form pronunciations, but in a continuous speech stream. The symmetrical, nonflapping coronal environment was necessary for a simultaneous study conducted with the same speakers. Each word/frame was included three times in random order in a single recording script.

Each speaker was recorded reading from the script in a sound-treated room on professional quality equipment. The subjects' speech was digitized from the audio cassette tape of the recording session at 10 kHz using the Kay Elemetrics Computerized Speech Laboratory. Frequencies were measured for the first three formants (F1, F2, F3) of each syllabic nucleus in the words illustrating syllabic /ɝ/ and other vowels. Formant frequencies were determined by simultaneous evaluation of several transforms of the signal. These included wideband spectrograms and narrow-band FFT spectra averaged over a 30-ms window through the steady-state portion of the vowel, if there was one.
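The window placement and spectral slice described here can be sketched roughly as follows. This is only an illustration under stated assumptions, not the Computerized Speech Laboratory procedure: the vowel and steady-state boundaries are hypothetical hand-labeled sample indices, a single plain DFT frame stands in for the averaged narrow-band FFT, and the function names are invented for the example. Only the 10-kHz rate and the 30-ms window come from the text.

```python
import cmath
import math

SAMPLE_RATE_HZ = 10_000  # digitization rate given in the text
WINDOW_MS = 30           # analysis window length given in the text


def window_bounds(vowel, steady=None, rate=SAMPLE_RATE_HZ, window_ms=WINDOW_MS):
    """Return (first, stop) sample indices of the analysis window.

    vowel and steady are (start, end) pairs in samples; both are
    hypothetical hand-labeled inputs. The window is centered on the
    steady state when one is supplied, otherwise on the vowel itself.
    """
    n = rate * window_ms // 1000            # 300 samples at 10 kHz
    start, end = steady if steady is not None else vowel
    first = max(0, (start + end) // 2 - n // 2)
    return first, first + n


def magnitude_spectrum(frame, rate=SAMPLE_RATE_HZ):
    """Hamming-window a frame and return [(frequency_hz, magnitude), ...]."""
    n = len(frame)
    tapered = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
               for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):             # bins from 0 Hz up to Nyquist
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(tapered))
        spectrum.append((k * rate / n, abs(acc)))
    return spectrum


def strongest_peaks(spectrum, count=2):
    """Return the frequencies of the `count` largest local maxima, ascending."""
    peaks = [(mag, freq)
             for (_, prev), (freq, mag), (_, nxt)
             in zip(spectrum, spectrum[1:], spectrum[2:])
             if mag > prev and mag > nxt]
    return sorted(freq for _, freq in sorted(peaks, reverse=True)[:count])
```

With a 300-sample frame the bins are about 33 Hz apart, so a slice like this can only suggest where spectral peaks lie; it does not replace the cross-checking against spectrograms and LPC spectra that the study relied on.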
If no steady state was present, the 30-ms window was placed in the center of the vowel. Bandwidths for spectrograms and narrow-band spectra were varied to achieve the best formant
1997 Acoustical Society of America 655

0001-4966/97/102(1)/655/4/$10.00

Downloaded09Sep2011to147.52.9.76.RedistributionsubjecttoASAlicenseorcopyright;seehttp://asadl.org/journals/doc/ASALIB-home/info/terms.jsp

TABLE I. Words illustrating the 11 vowels of Southern Californian English.

Vowel   b_t      t_k    h_d
/i/     beat     teak   heed
/ɪ/     bit      tick   hid
/e/     bate     take   hate
/ɛ/     bet      tech   head
/æ/     bat      tack   had
/u/     boot     duke   hoot
/ʊ/     put      took   hood
/o/     boat     toke   Hode
/ɑ/     bought   tock   hod
/ʌ/     but      tuck   hut
/ɝ/     Bert     Turk   herd
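As a rough illustration of how a randomized recording script of this kind could be assembled, the sketch below repeats each Table I word three times, shuffles the list, and embeds each item in the frame sentence. The function name, the seed parameter, and the use of Python's random module are assumptions made for the example; the original script also contained the remaining 36 words of the 69-word corpus, which are not listed in this letter.

```python
import random

# The 33 words of Table I, in table order (b_t, t_k, h_d for each vowel).
WORDS = [
    "beat", "teak", "heed", "bit", "tick", "hid", "bate", "take", "hate",
    "bet", "tech", "head", "bat", "tack", "had", "boot", "duke", "hoot",
    "put", "took", "hood", "boat", "toke", "Hode", "bought", "tock", "hod",
    "but", "tuck", "hut", "Bert", "Turk", "herd",
]

FRAME = "Cite {} twice."  # frame sentence described in the text


def build_script(words, repetitions=3, seed=None):
    """Return frame sentences with each word repeated and randomly ordered.

    seed is a hypothetical convenience so that a script can be regenerated.
    """
    rng = random.Random(seed)
    items = [word for word in words for _ in range(repetitions)]
    rng.shuffle(items)
    return [FRAME.format(word) for word in items]
```

For the 33 words above, `build_script(WORDS, seed=1)` yields 99 sentences, each word appearing three times.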

resolution for each token. Generally, bandwidths were 200 Hz wideband and 59 Hz narrow band for men, and 293/59 Hz for women. Also included were an LPC formant history superimposed over the spectrogram and/or an LPC slice taken in the center of the FFT window superimposed over the FFT spectrum. The number of LPC poles used varied between 10 and 14 depending on the sex of the speaker, the number of formants visible in the spectrogram, and whether or not more poles were needed to resolve two formants which were close together. By using multiple views of the spectral properties of the signal, a certain degree of on-line checking was introduced during the measurement process, resulting in greater confidence in the measurements. Moreover, well-known problems such as discerning low-frequency F1s in higher pitched voices, or the apparent merger of close formants in LPC analysis, as well as the problem of locating formants in voices that exhibit consistent spectral eccentricities (unexpected zeroes, unusually strong harmonics, etc.) were more easily avoided.

The results are summarized in Table II. The southern Californian /u/ and /ʊ/ have higher F2 values than /o/. This is partly the result of the "duke" context, in which /u/ is accompanied by a front on-glide. However, even discounting the fronting influences of the "duke" context, the F2 of /u/ still averages around 1500 Hz for women and 1300 Hz for men, still quite a bit higher than /o/ or /ɑ/. The "duke" context, and to a lesser extent the "boot" context, were included to add breadth to the production database. In the near-citation forms produced in this study, speakers sometimes produced fully rounded allophones of back vowels. However, in casual speech, southern Californians rarely produce fully rounded vowels. The combined centrality and unrounding observed in the southern Californian speakers result in a vowel space which resembles a parallelogram more than the traditional trapezoidal or triangular figures usually associated with vowel spaces. This is illustrated in Fig. 1.

While several American dialects have been studied in detail in their socio-historical contexts (see, for instance, Labov, 1994, and references therein for some discussion), and these often make use of acoustic measurements, there are relatively few well-known studies of vowel formants in American English cited in the phonetics literature. Of these, the classic study by Peterson and Barney (1952) is probably the most cited. The recent paper by Hillenbrand et al. (1995), being paradigmatically similar to the Peterson and Barney study, although conducted over a much larger corpus in a different dialect, is also a very important work.

Peterson and Barney (1952, PB hereafter) was a study with the primary objective of showing that information obtained by spectrographic analysis of speech was useful in characterizing vowel quality. The vowels /i, ɪ, ɛ, æ, ɑ, ɔ, ʊ, u, ʌ, ɝ/ were included. Fundamental frequency, F1, F2, and F3 were measured during steady-state portions of /hVd/ productions by 33 men, 28 women, and 15 children. In general, little can be said about the dialectal affiliations of the speakers involved, or of the dialectal homogeneity of the group as a whole. "Two of the speakers were born outside the United States and a few others spoke a foreign language before learning English. Most of the women and children grew up in the Middle Atlantic speech area. The male speakers represented a much broader regional sampling of the United States, the majority of them spoke General American." (PB, p. 177). The acoustic data were studied in two ways. First, spectrographic analysis was used to determine formant frequencies; analysis of these data suggested that static formant frequencies were good at differentiating vowel quality. Second, the data were used in a listening test, where listeners were asked to identify the word uttered.
For the most part, correct identification was achieved, despite the less than perfect listening

TABLE II. Formant averages for 15 southern Californian speakers of English. Units are Hz. Standard deviations are in parentheses. W = women, M = men.

Vowel   F1 (W)       F1 (M)      F2 (W)       F2 (M)       F3 (W)       F3 (M)
/i/      362 (36)    291 (31)    2897 (176)   2338 (205)   3495 (239)   2920 (219)
/ɪ/      467 (62)    418 (36)    2400 (151)   1807 (85)    3187 (262)   2589 (117)
/e/      440 (48)    403 (43)    2655 (187)   2059 (138)   3252 (277)   2690 (166)
/ɛ/      808 (167)   529 (68)    2163 (195)   1670 (67)    3065 (285)   2528 (143)
/æ/     1017 (134)   685 (105)   1810 (131)   1601 (59)    2826 (231)   2524 (271)
/u/      395 (48)    323 (31)    1700 (364)   1417 (215)   2866 (225)   2399 (248)
/ʊ/      486 (115)   441 (40)    1665 (166)   1366 (122)   2926 (261)   2446 (173)
/o/      516 (130)   437 (37)    1391 (212)   1188 (118)   2904 (263)   2430 (216)
/ɑ/      997 (102)   710 (97)    1390 (99)    1221 (69)    2743 (201)   2405 (175)
/ʌ/      847 (154)   574 (80)    1753 (140)   1415 (88)    2989 (264)   2496 (211)
/ɝ/      477 (82)    429 (40)    1558 (170)   1362 (79)    1995 (347)   1679 (91)
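The F2 pattern discussed in the text can be checked directly against the Table II means. The sketch below is only an illustration: it keys the F2 averages by the /b_t/ words of Table I, on the assumption that the columns of Table II follow the same vowel order as Table I, and the helper names are invented for the example.

```python
# /b_t/ keywords from Table I, assumed to match the column order of Table II.
KEYWORDS = ["beat", "bit", "bate", "bet", "bat", "boot",
            "put", "boat", "bought", "but", "Bert"]

# Mean F2 values in Hz, copied from Table II.
F2_WOMEN = [2897, 2400, 2655, 2163, 1810, 1700, 1665, 1391, 1390, 1753, 1558]
F2_MEN = [2338, 1807, 2059, 1670, 1601, 1417, 1366, 1188, 1221, 1415, 1362]


def by_keyword(values):
    """Pair each formant mean with its Table I keyword."""
    return dict(zip(KEYWORDS, values))


def f2_gap(table, vowel, reference):
    """Return how far (in Hz) one vowel's mean F2 lies above another's."""
    return table[vowel] - table[reference]


women = by_keyword(F2_WOMEN)
men = by_keyword(F2_MEN)

# The vowels of "boot" and "put" compared with the vowel of "boat".
for label, table in (("women", women), ("men", men)):
    for vowel in ("boot", "put"):
        print(label, vowel, f2_gap(table, vowel, "boat"), "Hz above 'boat'")
```

On these means the vowel of "boot" lies 309 Hz (women) and 229 Hz (men) above that of "boat" in F2, and the vowel of "put" lies 274 Hz and 178 Hz above it.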


J. Acoust. Soc. Am., Vol. 102, No. 1, July 1997

Robert Hagiwara: Letters to the Editor


FIG. 1. Women's (filled circles) and men's (open squares) vowel centers from the present study. Vowels are joined by lines in order starting at the upper right /i, ɪ, ɛ, æ, ɑ, ʊ, u/. Not joined on the line is /ʌ/. Other vowels were eliminated for visual clarity and to facilitate comparison. Units are Hertz, plotted in a Bark scale.
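The caption notes that frequencies in Hertz are plotted on a Bark scale. The letter does not say which Hertz-to-Bark conversion was used, so the sketch below assumes one common approximation (Traunmüller's 1990 formula) purely for illustration; the function names are likewise invented for the example.

```python
def hz_to_bark(f_hz):
    """Convert frequency in Hz to Bark (Traunmüller 1990; assumed here)."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


def bark_to_hz(z):
    """Invert hz_to_bark, e.g. to label Bark-spaced axes in Hertz."""
    return 1960.0 * (z + 0.53) / (26.28 - z)


def plot_point(f1_hz, f2_hz):
    """Return (x, y) = (F2 in Bark, F1 in Bark) for an F1-by-F2 vowel plot."""
    return hz_to_bark(f2_hz), hz_to_bark(f1_hz)


# Women's mean for the vowel of "beat" in Table II: F1 = 362 Hz, F2 = 2897 Hz.
x, y = plot_point(362, 2897)
print(round(x, 2), round(y, 2))
```

Tick marks placed at equal Bark intervals and labeled with `bark_to_hz` give axes that read in Hertz while keeping the auditory spacing.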

FIG. 3. Women's (filled circles) and men's (open squares) vowel centers from Hillenbrand et al. (1995), plotted in the same space as Fig. 1. Vowels are joined by lines in order starting at the upper right /i, ɪ, ɛ, æ, ɑ, ʊ, u/. In these data, /ʌ/ accidentally fell on the line between /ɑ/ and /ʊ/.

conditions. Where confusions occurred, they appeared to follow reasonable patterns (/ɑ/-/ɔ/ confusions, etc.). The PB vowel spaces are illustrated in Fig. 2, which presents the PB F1 and F2 averages (see PB, Table II, p. 183) plotted to the same scale as Fig. 1. The men's and women's productions are superimposed in the same plot. As in Fig. 1, the vowels represented are /i, ɪ, ɛ, æ, ɑ, ʊ, u/ (joined by the lines) and /ʌ/. Figure 2 illustrates the classic trapezoidal form of the "General American" vowel space defined by the point vowels. Recognizing a number of limitations in the PB study, Hillenbrand et al. (1995; HGCW hereafter) collected a new corpus, using productions from 45 men, 48 women, and 46 children. These speakers were carefully screened for dialect; all were from the northern Midwest area. HGCW included F4 measurements, as well as measurements from multiple

FIG. 2. Women's (filled circles) and men's (open squares) vowel centers from Peterson and Barney (1952), plotted in the same space as Fig. 1. Vowels are joined by lines in order starting at the upper right /i, ɪ, ɛ, æ, ɑ, ʊ, u/. Not joined on the line is /ʌ/.

points during the vowel. They also included /e/ and /o/ vowels, excluded from the PB study. LPC formant histories were superimposed over a wideband spectrogram, and adjustments made until the first four formants were adequately resolved by the LPC. Formant frequencies were then extracted from the LPC peaks. Listening tests were also performed with the data, and a variety of analyses offered.

The HGCW vowel centers (see HGCW, Table V, p. 3103) are plotted in Fig. 3. Note the extreme raising of /æ/ relative to its position in PB or the present study, and the centrality of /ɑ/. This configuration (along with the position of /ɔ/, not included in Fig. 3) is the result of a clockwise chain shift among the low vowels typical of the northern Midwest (the Northern Cities Shift, as described in sum by Labov, 1994, and many sources therein). With the raising of /æ/ out of the low front corner of the space, and the low, central position of /ɑ/, this configuration of point vowels suggests a triangle, and is quite distinct, both in outline and in composition, from either of the two previous figures.

Comparing the present study's data in Fig. 1 with the PB data in Fig. 2, the obvious differences are in the F2 frequencies of the back and central vowels. As noted earlier, the canonical back vowels /u/ and /ʊ/ are typically unrounded in Californian speech. Thus they appear with F2 frequencies more characteristic of the central space in the PB data in Fig. 2. Further, the canonical central vowel /ʌ/ has a much higher F2 in the southern California data than in PB. The low vowels /ɑ/ and /æ/ are approximately 200 Hz higher in F1 in the Californian women than in PB's women speakers. The men have similar F1 values for these vowels in the two studies.

The most obvious difference between the HGCW vowels in Fig. 3 and the PB ones is in the position of the vowels transcribed /æ/ and /ɑ/.
/ɑ/ in the HGCW space has moved into a central position relative to its position in the PB space; the /æ/ vowel has been fronted and raised. The /ʌ/ vowel, in contrast, is almost identical in the two studies. This is also true of /ɝ/, which was included in all studies under discussion, but not represented graphically in Figs. 1-3. More

subtly, the /u/ and /ʊ/ vowels in PB are extremely back (have extremely low F2s). While still low, the HGCW F2 values for /u/ and /ʊ/ are slightly higher than in PB.

The configuration in Fig. 3 indicates a well-established Northern Cities Shift, such as described recently by Labov (1994, summarizing sources therein). The Northern Cities Shift dialects are characterized by a centralized reflex of /ɑ/ (more properly /~/), and a raised /æ/, among other changes, that have been observed to varying degrees in the cities of Detroit, Chicago, and Cleveland, as well as upstate New York. HGCW described their speakers as being raised in southern Michigan and other areas in the upper Midwest. Thus it is fair to assume that, even if they are not speakers of Northern Cities Shift dialects, they are from areas heavily influenced by them. Thomas (1958) refers to the relative centrality of /ɑ/ in his Northern Central dialects, which include the northern Midwest, the same region studied in HGCW.

The PB study, conducted at Bell Laboratories in New Jersey, used speakers from a variety of places, and even some who spoke languages other than English natively. However, the women speakers in that study are identified as primarily from the Middle Atlantic region. Thomas (1958) indicates that the Middle Atlantic region includes most of New Jersey, and specifically excludes upstate New York and the northern Midwest.

Comparing Figs. 1 and 3 reveals as many differences between HGCW's vowel space and the southern Californian space as between HGCW and PB. This clearly demonstrates considerable variation between contemporaneous regional variants of American English. The purpose of this discussion is merely to demonstrate that "American English" is an amorphous entity at best, and that there are considerable regional and also social differences, particularly in urban centers (Labov, 1994).
In discussions of vowel production, references to "General American" are not as informative as references to data from specific dialects. Throughout this report, an effort has been made to characterize the southern Californian vowel space and the spaces reported in HGCW and PB without reference to an arbitrary standard. In their paper, HGCW point out that the PB results are often regarded as the definitive set of static formant frequencies describing the American English vowels, serving as a comparator with other languages and pathological speech, target values in speech synthesis, prototype values, etc. Such a view clearly misrepresents the intent of PB, and belies the reality of the whole of American English, even perhaps "General American." At best, the PB results are a profile of a specific dialect at a specific time in the history of that dialect (HGCW, p. 3108).

Implicit in HGCW's discussion is the need for more such profiles. HGCW provide one, with the purpose not only of replicating the PB study but also of determining how dynamic information can be used to discriminate vowels. The data in the present study represent another profile of a different dialect. They were collected originally with the intention of providing baseline F3 measurements against which lowered F3 in American /r/ could be compared (Hagiwara, 1995). However, as this paper argues, the resulting information about the form of the southern Californian vowel space is interesting in its own right, particularly as it provides a counterpoint to the view of the PB or the HGCW formant values as somehow representative of the state of American English as a whole.

As HGCW point out, large studies consume time, money, and scholarly resources. Documentation of every American dialect on such a scale is obviously beyond the scope of any single researcher or research group, especially considering that in the best case it would include more than static formant frequency information. However, studies of a dozen or so speakers are well within the scope of most researchers, perhaps most students, and, in the absence of studies of large numbers of speakers, would be better than nothing. If the results of many such studies were combined, they would fill a significant void in objective descriptions of American English, and help pinpoint areas where the greater resources of larger or more expansive studies may most profitably be employed. Such investigations would not only lead to a better understanding of American English vowels, but also yield a database of general interest to phonetics, sociolinguistics, speech training, and speech technology.

ACKNOWLEDGMENTS

The author would like to thank Peter Ladefoged, James Hillenbrand, and two anonymous JASA reviewers for comments on earlier versions of this letter.

Hagiwara, R. (1995). "Acoustic realizations of American /r/ as produced by women and men," UCLA Ph.D. dissertation. Also UCLA Working Papers in Phonetics 90.
Hillenbrand, J., Getty, L. A., Clark, M. J., and Wheeler, K. (1995). "Acoustic characteristics of American English vowels," J. Acoust. Soc. Am. 97, 3099-3111.
Labov, W. (1994). Principles of Linguistic Change, Volume 1: Internal Factors (Blackwells, Cambridge).
Peterson, G. E., and Barney, H. L. (1952). "Control methods used in a study of the vowels," J. Acoust. Soc. Am. 24, 175-184.
Thomas, C. K. (1958). An Introduction to the Phonetics of American English (Ronald, New York), 2nd ed.
