Speech Rhythm in World Englishes: The Case of Hong Kong

JANE SETTER
University of Reading
Reading, England

This study investigated syllable duration as a measure of speech rhythm in the English spoken by Hong Kong Cantonese speakers. A computerised dataset of Hong Kong English speech amounting to 4,404 syllables was used. Syllable durations were measured, analysed statistically, and then compared with measurements of 1,847 syllables from an existing corpus of British English speakers. It was found that, although some similarities existed, the Hong Kong English speakers showed smaller differences than the British English speakers in the relative durations of tonic, stressed, unstressed, and weakened syllables. This result is discussed with regard to potential intelligibility problems, possible transfer of speech rhythm features from Cantonese to English, and implications for language teaching professionals.

In considering nonnative patterns of English speech, two paths are generally pursued: segmental and suprasegmental. This article focuses on the suprasegmental features of language, that is, those aspects of pronunciation that concern units larger than individual speech sounds. Speech rhythm is one such suprasegmental aspect. English speech rhythm in older native varieties like British and American English is often described as stress timed, which, in basic terms, means that the start of each stressed syllable is said to be equidistant in time from the start of the next stressed syllable. This kind of rhythm contrasts with that of syllable-timed languages (e.g., French, Spanish, Cantonese), in which the start of each syllable is said to be equidistant in time from the start of the next. Instrumental studies have, in fact, found very little difference between languages thought of as typically stress timed and typically syllable timed (Roach, 1982; Dauer, 1983), and Cauldwell (2002) goes so far as to describe English as irrhythmical. Whether or not these descriptions stand up to instrumental scrutiny, they do seem to have some psychological importance for speakers of the languages so described. English spoken with a syllable-timed rhythm can be difficult for speakers of stress-timed accents to understand (Anderson-Hsieh & Venkatagiri, 1994). Tajima, Port, and Dalby (1997) demonstrated that when a Mandarin Chinese or Taiwanese speaker’s speech was manipulated to match the syllable timing of a native American English speaker, and vice versa, the Chinese speaker’s speech improved in intelligibility by up to 25% and the American English speaker’s speech worsened in intelligibility by up to 25%, showing that the use of more native-like timing patterns considerably improves intelligibility for native speakers of stress-timed varieties of English. This result indicates that the acquisition of stress-timed English speech rhythm by nonnative speakers is important in some contexts, for example, where a nonnative speaker may be interacting with a native speaker of an older, stress-timed variety such as British or American English. Adams (1979) suggests that a learner’s failure to use appropriate syllable timing when producing utterances in English, producing instead “an anomalous rhythm which seriously impairs the total intelligibility of their utterance” (p. 87), results in communicative failure, with both parties to the act of communication at a loss to explain what has happened and what was intended. This matter has not eluded researchers, materials writers, and teachers (see, e.g., Anderson-Hsieh, Johnson, & Koehler, 1992; Anderson-Hsieh & Venkatagiri, 1994; Chela-Flores, 1998; Gilbert, 1984; Taylor, 1981; Wong, 1987), but speech rhythm and other suprasegmental features of speech do not seem to be the easiest for teachers or learners to tackle. Indeed, rhythm is considered by some to be the single most difficult feature of English for nonnative speakers to learn (Taylor, 1981). It should be noted that this article assumes interactions between native speakers of stress-timed varieties of English and nonnative speakers of English.
Jenkins (2000), for example, considering interactions among nonnative speakers, does not include speech rhythm in her lingua franca core, although she does agree that, based on evidence from her own research, it appears to be crucial to lengthen stressed and tonic syllables to improve intelligibility in English. This study was based largely on suggestions arising from Dauer (1983). Although Dauer shows, by examining interstress intervals in several languages, that there is no instrumental evidence for the stress-timed/syllable-timed dichotomy in speech production, she admits that a so-called syllable-timed language like Spanish and a stress-timed language like English do sound different rhythmically. She looks to other features for an explanation, considering syllable structure, vowel reduction, and stress/accent. Concerning syllable structure, Dauer (1983) finds that stress-timed languages tend to have a greater variety of syllable types. In addition, open syllables such as consonant-vowel (CV) syllables are found to predominate in Spanish and French, whereas English has much more variation among syllable types. Dauer also finds that “there is a strong tendency for ‘heavy’ syllables … to be stressed and ‘light’ syllables … to be unstressed” in stress-timed languages (p. 55). Syllable weight is determined by what happens at the end of a syllable: heavy syllables usually contain consonants in coda position, although those containing a long vowel or diphthong may also be analysed as heavy. English certainly allows heavy syllables, with up to three consonants at the beginning of a syllable and four in syllable-final position, whereas Cantonese, the first language (L1) of the speakers in this study, maximally permits CVC, with the final consonant restricted to either an unreleased voiceless bilabial, alveolar, or velar stop [p t k] or one of the nasals [m n ŋ]. Based on Dauer’s suggestions, it might be predicted that Cantonese is less likely than English to be a stress-timed language. Dauer also notes that in Arabic and Thai, considered to be stress-timed languages, stressed syllables are more likely to be heavy. It is not only the structure of the syllable but also its composition that has a bearing on stress. Dauer (1983) claims that 92% of the unstressed CV syllables in the English text she analysed were made up of a consonant plus a weak vowel, a type which tends to be inherently short, whereas the stressed CV syllables contained strong vowels, which tend to be longer. Turning to vowel reduction, Dauer (1983) claims that stress-timed languages often have a “separate and more restricted set of vowels to choose from in unstressed syllables” (p. 57), whereas syllable-timed languages tend not to have reduced vowel variants in unstressed syllables; rather, reduction results in the elimination of whole syllables.
For example, weak syllables in English contain /ə ɪ ʊ/ or a syllabic consonant, and the number of syllables in a word is preserved (except in contracted forms, like I’m for I am); in Spanish, by contrast, “a sequence of adjacent vowels often becomes reduced to a single vowel or is pronounced as a single syllable” (p. 57). Cantonese permits syllable weakening in only an extremely restricted number of instances (Bauer & Benedict, 1997). Finally, Dauer (1983) examines stress, claiming that, whereas stress-timed languages tend to have stress at the lexical or word level, syllable-timed languages usually either have no lexical stress or, where it does exist, realise accent by pitch contour variation. Cantonese is a tone language, in which each syllable has a specific pitch contour assigned to it. In conclusion, Dauer (1983) asks whether we are justified in using the terms stress timed and syllable timed at all, if it is the case that syllable structure, vowel reduction, and word stress, rather than aspects of timing, make a language nearer to one category or the other. Preferring the term stress-based, as used by both Allen (1975) and O’Connor (1973), she suggests, as did Roach (1982), a continuum on which languages may be placed according to how stress based their rhythm is, with Japanese as the least stress based and English the most (Dauer, 1983, p. 60). So, although instrumental studies have proven either dismissive or, at best, inconclusive about the physical existence of stress timing and syllable timing, even those undertaking the instrumental studies mentioned earlier admit that the languages under discussion sound either stress timed or syllable timed, enough so to be able to suggest a continuum on which these languages can be placed. Therefore, the labels stress timed and syllable timed are used throughout this study. In addition, Dauer (1983) makes a good case that factors other than differences in interstress intervals (or the lack thereof) make languages sound more or less stress based; these factors are syllable structure, vowel reduction, and word stress or accent. The difficulty that nonnative speakers of English from language backgrounds with different rhythmical types experience in acquiring stress-timed English speech rhythm has implications for intelligibility, as demonstrated in investigations of Englishes similar to that spoken in Hong Kong. Low, Grabe, and Nolan (2000) studied the temporal features of Singapore English, a Southeast Asian English which has been recognised as having native speakers. Using the pairwise variability index (PVI), which they developed, they compared vowel quality and vowel duration in Singapore English with those in British English. They demonstrated that Singapore English speakers do not reduce vowels in weak syllables to the same extent that British English speakers do.
This practice can be expected to contribute to the rhythmic differences between Singapore English and British English, the implication being that Singapore English will be difficult for speakers of British English to understand.
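The PVI is not used in the present study, but a short sketch may clarify what such an index measures. The function below is a minimal illustration of the normalised PVI (nPVI) as it is commonly stated in the rhythm literature: the absolute difference between each pair of successive durations is divided by the mean of that pair, the results are averaged over all pairs, and the average is multiplied by 100. The function name and the duration values are hypothetical examples, not data from Low, Grabe, and Nolan (2000).

```python
def npvi(durations):
    """Normalised pairwise variability index for a sequence of durations (e.g., vowels, in ms).

    Each successive pair contributes |d_k - d_(k+1)| divided by the mean of the pair,
    which removes the effect of overall speaking rate; the contributions are averaged
    over the m - 1 pairs and scaled by 100.
    """
    if len(durations) < 2:
        raise ValueError("nPVI needs at least two durations")
    pair_terms = [
        abs(a - b) / ((a + b) / 2)
        for a, b in zip(durations, durations[1:])
    ]
    return 100 * sum(pair_terms) / len(pair_terms)


# Hypothetical vowel durations (ms): an even sequence versus a strongly alternating one.
print(npvi([100, 100, 100, 100]))  # 0.0
print(npvi([50, 150, 50, 150]))    # 100.0
```

Higher values indicate greater alternation between long and short vowels, the pattern generally associated with full vowels contrasting with reduced ones; because each difference is normalised by its pair, a uniformly faster or slower speaker does not by that fact alone receive a different score.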

English in Hong Kong is described by Li (1999) as a “value added” language (p. 97), meaning that the ability to communicate effectively in English is perceived by the speaker as bringing socioeconomic advantages. Because of the economic and business environment in Hong Kong, speakers of Hong Kong English may interact with other speakers whose English could be classified as having stress-timed rhythm. For Hong Kong English speakers, then, speech rhythm is certainly a feature of English pronunciation worthy of study.

Even on casual listening, it is clear that the speech rhythm of Hong Kong English is very different from that of varieties with a stress-timed rhythm.

This study aims to investigate speech rhythm among speakers of Hong Kong English. Syllable duration was selected for investigation because, together with pitch, loudness, and vowel quality, it is an important factor in determining syllable stress in English and must therefore contribute to the perceived rhythmical properties of the language. An additional reason to study syllable duration is that it is thought to be a highly learnable and teachable feature of word and rhythmic stress (see, e.g., Gilbert, 1984; Chela-Flores, 1994, 1998; Halliday, 1989). This study focuses on weakened, unstressed, stressed, and tonic syllables. Because this is a study of English as a second language, transfer effects from the learners’ first language, Cantonese, may contribute to the perceived rhythm of Hong Kong English; in particular, the Hong Kong English data may contain fewer instances of weakened syllables. The hypothesis was that the rhythm of Hong Kong English differs from that of British English because Hong Kong English has smaller differences in the relative durations of weakened, unstressed, stressed, and tonic syllables. The use of the term Hong Kong English does not attribute to this variety any special status as an official new variety of English. As far as I am aware, and certainly at the time of undertaking this study, there are no native speakers of Hong Kong English, as there are of Singapore or Indian English. Hong Kongers do not speak English with each other outside contrived situations, such as classes at tertiary-level educational establishments and conversations, including business dealings, where someone is present who speaks English but not Cantonese.

The relative differences in duration between weakened, unstressed, stressed, and tonic syllables were measured to test the hypothesis that the rhythm of Hong Kong English differs from that of British English because Hong Kong English has smaller differences in the relative durations of each type of syllable. The hierarchy of syllable stressing (weakened, unstressed, stressed, and tonic) was derived and developed for this research from studies such as Bolinger (1965) and Klatt (1975), which indicate that stressed (including tonic) syllables are longer than unstressed (including weakened) syllables in spoken discourse, and from teaching materials such as Gilbert (1984) and Chela-Flores (1998), which support such an approach. In the sentence The book I bought had a blue front, assuming no prior context, the main stress falling on the last word, and rhythmic beats occurring on book, bought, blue, and front, front is tonic, with a falling tone; book, bought, and blue are stressed; I and had are unstressed; and the and a are weakened. The item had is in fact the main verb and could therefore be stressed, in which case blue would probably be unstressed to maintain the overall rhythm; had could certainly not be weakened. It should be noted that any item could be stressed, depending on context. Applying the stress hierarchy to the first rhythmic pattern described, a British English speaker could be expected to produce the single-syllable words book, bought, blue, and front with a longer average duration than the single-syllable words the, I, had, and a, with front being particularly long because it is tonic, and the and a being particularly short because they are weakened.

The Hong Kong English Data
Data from 20 Hong Kong Cantonese speakers of English were used in this study. At the time of data collection, all participants were students in the third and final year of their studies at the Hong Kong Polytechnic University. Recordings were made over a 3-year period, from 1996 to 1999. The 10 female and 10 male students from whom the data were collected fall roughly into two groups: those studying for language degrees and those studying nonlanguage subjects. Students following language degrees at the Hong Kong Polytechnic University are assessed in English language skills as part of their degree, whereas, at the time of data collection, those following nonlanguage programmes had to take classes in English but were not required to pass English to be awarded a degree. The students whose speech was analysed for this study came from three departments of the university: Chinese and Bilingual Studies, specifically the Bachelor of Arts (Honors) in Language and Communication; Building and Real Estate, the Bachelor of Science (Honors) in Building Surveying; and Building Services Engineering, the Bachelor of Engineering (Honors) in Building Services Engineering. The Hong Kong Polytechnic University is an English-medium institution, which means that, except for students studying another language, all classes should take place in English. In reality, a good deal of tuition takes place in Cantonese, especially in nonlanguage subjects.

Because the study focuses on the rhythm of Hong Kong English, research based on word lists, possibly the most convenient method for collecting large amounts of data, is inappropriate. Instead, cassette tape recordings were made of students giving presentations in class. This method has the advantage of providing a dataset comprising a large amount of monologue from a number of different speakers. It is for this reason that conversational data were not considered for this study; although conversation is potentially the most natural kind of speech, it was felt that it might not have yielded a suitably large quantity of connected speech from any one speaker and would certainly have involved interruptions and overlap from other speakers. Moreover, the participants would not normally speak English to each other, so any spoken English data collected at all are bound to be contrived to some extent. One criticism of using data generated from class presentations is that the delivery might be stilted, or less than natural, because of the scripted nature of the task. However, as third-year students, the participants were all skilled in-class presenters and for the most part did not need to adhere closely to a script. Students used cue cards as an aide-mémoire during their presentations, as well as overhead transparencies. Because it was an assessed task, students may well have rehearsed their presentations. In addition, most students were presenting either on their final year projects, material with which they were more than familiar, or on a passion or hobby of theirs. Therefore, the data used in this study can be considered a reasonably accurate representation of the participants’ English connected speech. The topics covered in the data are presented in Table 1; each participant is labelled f for female or m for male.
A purely subjective score of how stress timed or syllable timed each speaker sounds, based on my expert opinion as a phonetician, is given in the column marked Rhythm; a rating of 1 means that a speaker sounds stress timed and a rating of 5 that the speaker sounds syllable timed. I wish to emphasise that this score is entirely subjective. Participants were tape-recorded using a personal stereo cassette recorder (Sony Walkman™ model WM-R707) with a lapel microphone clipped to either a lapel or the collar of their clothing. The participants were fully aware that they were being recorded and had given their permission for the recordings to be used as data for study purposes.

TABLE 1 List of Participants’ Presentation Topics and Author’s Impression of Rhythmic Type

Female speaker files                        Rhythm (1–5)
f01: To be a good manager                   2
f02: Intercultural communication            3
f03: Personal space                         4
f04: Interview follow-up                    5
f05: Advertisements                         3
f06: Wording of advertisements              4
f07: Nonverbal behaviour                    3
f08: AIDS                                   3
f09: SCMP versus People’s Daily             4
f10: Goal setting                           3

Male speaker files                          Rhythm (1–5)
m01: Property & housing market              3
m02: Ceramic tiles                          4
m03: Safety (demolition)                    3
m04: Site supervisor motivation             4
m05: Interest risk                          5
m06: Pollution problems                     5
m07: Job satisfaction                       4
m08: Bamboo scaffolding                     3
m09: Industrial accidents                   5
m10: Poling contractors                     4

Data Processing
The speech collected was analysed by converting the recordings to a machine-readable sound signal and measuring the duration of syllables using specialist computer software on a PC platform. Speech from the cassette recordings was sampled at a rate of 16,000 samples per second (16 kHz, 16-bit mono PCM) and then labelled on computer by the author. The software used to label the data in this study was Speech Filing System (SFS; for the latest edition, see Phonetics & Linguistics, UCL, 2004), developed for research purposes at the Department of Phonetics and Linguistics, University College London. With SFS, speech data may be labelled in a number of ways. For the purposes of this study, a broad phonetic segmental transcription was used, although it included glottal stops, nasalisation, vocalised /l/, and aspiration where strong. The software then allows the user to generate a file containing the duration of each sound segment, measured in samples. This number was converted into milliseconds (ms) by dividing it by 16 (because 16,000 samples = 1,000 ms). Syllable durations were calculated from this information and were then analysed and compared with the SCRIBE data.
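The conversion from samples to milliseconds, and the step from segment durations to syllable durations, can be sketched as follows. This is a minimal illustration, assuming that each labelled segment has already been assigned to a syllable and that a syllable’s duration is the sum of the durations of its segments; the Segment structure, its field names, and the sample counts are hypothetical and do not reproduce the SFS file format. The same functions cover the 20 kHz SCRIBE data if rate_hz is changed to 20,000.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One labelled sound segment (hypothetical structure, not the SFS label format)."""
    label: str       # broad phonetic label, e.g. "ʊ"
    n_samples: int   # segment duration counted in samples
    syllable: int    # index of the syllable the segment has been assigned to


def samples_to_ms(n_samples: int, rate_hz: int) -> float:
    """Convert a duration in samples to ms (at 16 kHz this divides by 16, at 20 kHz by 20)."""
    return n_samples * 1000 / rate_hz


def syllable_durations_ms(segments: list[Segment], rate_hz: int) -> list[float]:
    """Sum segment durations per syllable, then convert each total to milliseconds."""
    totals: dict[int, int] = {}
    for seg in segments:
        totals[seg.syllable] = totals.get(seg.syllable, 0) + seg.n_samples
    return [samples_to_ms(totals[i], rate_hz) for i in sorted(totals)]


# Invented sample counts for "the book", as if recorded at 16 kHz.
the_book = [
    Segment("ð", 800, 0), Segment("ə", 960, 0),
    Segment("b", 1200, 1), Segment("ʊ", 2400, 1), Segment("k", 1600, 1),
]
print(syllable_durations_ms(the_book, 16_000))  # [110.0, 325.0]
```

Summing in samples and converting once per syllable avoids accumulating rounding error from converting each short segment separately.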

The British English Data
The British English data used for this study were drawn from the SCRIBE corpus (see Spencer, 1990). SCRIBE is a corpus of British English speakers from four main areas of the United Kingdom: the Southeast (speakers with received pronunciation or a southern standard British English accent), Glasgow, Leeds, and Birmingham. The aim was to record and annotate the speech of 30 speakers from each set performing a number of different spoken tasks, which include reading several different sets of sentences, reading a passage, and undertaking a map task to elicit free speech. In selecting appropriate material for comparison, it was necessary to decide which speech task performed by the British English speakers was most closely comparable to the Hong Kong English data; in this instance, it was decided to use the read passage. The passage takes little more than 2 minutes to read aloud and describes advances in sailing technology from the time of the Vikings to the present day. This passage, rather than the free speech task, was chosen for comparison because the Hong Kong English speakers, in giving presentations that may have been rehearsed, with the aid of note cards, were performing a task more similar in many respects to passage reading than to free speech. Five speakers were taken from the SCRIBE material, one female and four male, all from the Southeast set. The choice of speakers was restricted by the availability of comparable transcriptions, because only one female and five male speakers from this region had been transcribed using a broad phonetic transcription. The passage is divided into four paragraphs of just over 30 seconds each. Approximately 1 minute of speech was taken from each of the four male speakers, two of whom read the first two paragraphs and the other two the last two paragraphs. Because there was only one female speaker for whom a broad phonetic transcription was available, her reading of the entire passage was used in this study. Speech in the SCRIBE corpus was sampled at a rate of 20,000 samples per second (20 kHz) and labelled using suitable speech analysis software.
That software produces label files in a slightly different format from that of SFS, so the data were manipulated on computer to make them comparable. In addition, the segmental durations derived from sampling at 20 kHz were divided by 20 to give durations in milliseconds (20,000 samples = 1,000 ms).

In order to calculate the duration of the syllables in the data, it is first necessary to syllabify the data. This was achieved using the maximal onsets approach adopted in Roach, Hartman, and Setter (2006) for syllabifying the entries in the seventeenth edition of the English Pronouncing Dictionary. In its most basic form, maximal onsets means that, “where possible, syllables should be divided in such a way that as many consonants as possible are assigned to the beginning of the syllable to the right” (p. xiii), assuming a linear transcription in which speech is transcribed from left to right. The rules for syllabification were based on what is permissible in the citation form of a monosyllabic word in English. Long vowels and diphthongs were permitted to be syllable final, but short vowels were not, because no monosyllabic English word in RP or southern standard British English ends with one of the short vowels /ɪ e æ ʌ ɒ/ or /ʊ/. There are, however, exceptions among short vowels in the case of unstressed syllables. Schwa is always weak and can therefore occur in syllable-final position; unstressed /ɪ/ and /ʊ/ also occur in weakened syllables in English and were therefore afforded the same structural status when weakened. In this system, photography, for example, is syllabified /fə.tɒg.rə.fi/, and educate /ed.jʊ.keɪt/. The nonphonemic vowel symbols [i] and [u] were used as the counterparts of unstressed /ɪ/ and /ʊ/, respectively, either when followed by a vowel (e.g., react /ri'ækt/; influential /ɪnflu'enʃəl/) or when appearing word finally in unstressed positions (e.g., happy /'hæpi/). This is in line with current practice in transcribing British English, as demonstrated in Roach et al. (2006) and Wells (2000). It should be noted, however, that the use of the symbols /i/ and /u/ is based on native speaker intuitions of vowel quality in the positions mentioned and that the symbols have no phonemic validity. Concerning consonants, British English monosyllables permit up to three consonants initially and four consonants finally, in restricted combinations (Roach, 2000).
All consonants in the consonantal inventory of British English, with the exception of /ŋ/, may occur in initial position. In final position in British English, the approximants /r w j/ and the fricative /h/ are not permitted. However, according to the maximal onsets rule, consonants belonging to the end of words in connected speech may be syllabified as initials when the speech is broken down into syllables. For example, if the maximal onsets rule is applied, cats and dogs is likely to become /kæt.sn̩.dɒgz/, and forced in two will be divided as /fɔː.stɪn.tuː/ in connected speech. In syllabifying the Hong Kong English data, it was found that maximal onsets was in some cases difficult to apply, because many syllables that would usually be weakened in British English connected speech were pronounced with a vowel that was not weakened. For example, collapse of any part is produced by speaker m03 as /kɒlæpsɒvenipɑːt/, rather than /kəlæpsəvenipɑːt/. Adhering strictly to maximal onsets in this case would require collapse of any part to be divided as /kɒl.æp.sɒv.en.i.pɑːt/; however, it was felt that for Hong Kong English speakers, a short vowel in syllable-final position is entirely possible, as long as the syllable is unstressed. In other words, unstressed short vowels in syllable-final position in Hong Kong English are treated as having a status similar to that of /ɪ/ and /ʊ/ in British English. In fact, Jenkins (2000) positively encourages this approach with regard to English as an international language. This interpretation leads to the following division of syllables: /kɒ.læp.sɒ.ven.i.pɑːt/, which is comparable to the likely British English version, /kə.læp.sə.ven.i.pɑːt/. This approach, together with others mentioned below, was adopted to cope with the data in this study and is not intended to imply that Hong Kong English speakers have overt rules about syllabification. Other matters arose during syllabification. One was that Hong Kong English has many phonetically nasalised vowels, where Cantonese speakers of English lower the velum in anticipation of a nasal consonant which is present in the target phonology but not necessarily realised with a full oral closure (Walmsley, 1997). Where a vowel was nasalised in anticipation of a final nasal consonant but no final nasal consonant was pronounced, the syllable was treated as containing a final nasal consonant. For example, speaker m06 produces construction industry as [k nsrÙk ò ̃ind stri], where the vowel in the third syllable [ò ̃] is nasalised; this syllable is treated as ending with a nasal consonant. A second issue concerns final dark and syllabic /l/. As Hung (2000) notes, dark and syllabic /l/ are frequently realised as vowels by Hong Kong English speakers. Where a dark /l/ was very clearly realised as a vowel, it was transcribed as a vowel. Finally, the Hong Kong English data contain a large amount of glottal stopping.
This feature can prevent the linking associated with connected English speech. Where the glottal stop is clearly not a realisation of another consonant and appears in prevocalic position (e.g., speaker m09’s the accident is realised with a glottal stop at the beginning of accident), it is not included as part of the syllable measurement. This rule was also applied to the British English data to make sure the treatment was comparable. The British English data are much more straightforward to syllabify, and in no cases were maximal onsets violated to cope with a speaker’s idiolect.
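The syllabification rules described above can be sketched in code. This is a minimal illustration under stated assumptions, not the procedure actually used for the study: the onset inventory is a small hypothetical subset of English onsets, stress is marked by a leading ˈ on the nucleus token, and the flag unstressed_short_ok is an assumed stand-in for the treatment of unstressed short vowels adopted for the Hong Kong English data. When a checked vowel would otherwise end a syllable, the sketch simply moves one consonant into its coda, assuming the remaining consonants still form a legal onset.

```python
# Simplified inventories; tokens are strings and a leading "ˈ" marks a stressed nucleus.
NUCLEI = {
    "ɪ", "e", "æ", "ʌ", "ɒ", "ʊ", "ə", "i", "u",
    "iː", "ɑː", "ɔː", "uː", "ɜː",
    "eɪ", "aɪ", "ɔɪ", "əʊ", "aʊ", "ɪə", "eə", "ʊə",
    "n\u0329", "l\u0329",  # syllabic /n/ and /l/
}
CHECKED = {"ɪ", "e", "æ", "ʌ", "ɒ", "ʊ"}  # short vowels that may not end a syllable...
WEAK_OK = {"ɪ", "ʊ"}                      # ...unless they are unstressed /ɪ/ or /ʊ/
SINGLE_ONSETS = "p b t d k g f v θ ð s z ʃ ʒ h tʃ dʒ m n l r w j".split()  # all but /ŋ/
LEGAL_ONSETS = (
    {()} | {(c,) for c in SINGLE_ONSETS}
    | {("s", "t"), ("s", "t", "r"), ("t", "r"), ("g", "r"), ("p", "l"), ("d", "j")}
)


def base(tok):
    """Strip the stress mark from a token."""
    return tok.lstrip("ˈ")


def may_end_syllable(tok, unstressed_short_ok):
    """True if this nucleus may stand at the end of a syllable (i.e., needs no coda)."""
    vowel, stressed = base(tok), tok.startswith("ˈ")
    if vowel not in CHECKED:
        return True
    return not stressed and (vowel in WEAK_OK or unstressed_short_ok)


def syllabify(phones, unstressed_short_ok=False):
    """Divide a phoneme sequence by maximal onsets, giving checked vowels a coda."""
    nuclei = [i for i, p in enumerate(phones) if base(p) in NUCLEI]
    cuts = [0]
    for a, b in zip(nuclei, nuclei[1:]):
        cluster = phones[a + 1:b]
        # Smallest coda (n consonants) that leaves a legal onset = maximal onset.
        k = next(n for n in range(len(cluster) + 1) if tuple(cluster[n:]) in LEGAL_ONSETS)
        if k == 0 and cluster and not may_end_syllable(phones[a], unstressed_short_ok):
            k = 1  # keep one consonant as the coda of the checked vowel
        cuts.append(a + 1 + k)
    cuts.append(len(phones))
    return ".".join("".join(base(p) for p in phones[s:e]) for s, e in zip(cuts, cuts[1:]))


print(syllabify(["f", "ə", "t", "ˈɒ", "g", "r", "ə", "f", "i"]))  # fə.tɒg.rə.fi
print(syllabify(["ˈe", "d", "j", "ʊ", "k", "eɪ", "t"]))           # ed.jʊ.keɪt
collapse_of_any_part = ["k", "ɒ", "l", "ˈæ", "p", "s", "ɒ", "v", "ˈe", "n", "i", "p", "ˈɑː", "t"]
print(syllabify(collapse_of_any_part))                            # kɒl.æp.sɒv.en.i.pɑːt
print(syllabify(collapse_of_any_part, unstressed_short_ok=True))  # kɒ.læp.sɒ.ven.i.pɑːt
```

With the flag off, the sketch reproduces the strictly maximal division of collapse of any part; with it on, it gives the division adopted for the Hong Kong English speakers.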

Assigning Syllables to Stress Type
I assigned the syllables to categories in the stress hierarchy by auditory/perceptual analysis, that is, by listening to the speech in its continuous form and deciding which syllable belonged to which category, based on my experience of both varieties of English and my expertise as a phonetician. The categories were weakened (1), unstressed (2), stressed (3), and tonic (4). A sample was checked for verification by another phonetician with less experience of Hong Kong English; no objective measure of interrater reliability was carried out, however.

Tables 2 and 3 give an overview of syllable duration, in milliseconds (ms), for the two varieties. As previously stated, the Hong Kong English data comprised 4,404 syllables and the British English data 1,847 syllables. Tables 2 and 3 show immediately that, overall, syllables were longer in Hong Kong English than in British English: the mean syllable duration was 244.39 ms for the Hong Kong English speakers and 190.99 ms for the British English speakers (all figures in this section are rounded to two decimal places where appropriate, with some rounding resulting in one decimal place only). The British English syllables were shorter even though the British English speakers were performing a reading task in which their speech tempo was reasonably slow and precise. The standard deviations, however, were relatively similar: 104.6 for the Hong Kong English speakers and 109.21 for the British English speakers. The distributions for both sets of data were normal, and an alpha level of .01 was used for all statistical tests. The syllables were divided into the four categories outlined earlier (weakened, unstressed, stressed, and tonic), and these categories were used in the data analysis. It was assumed that tonic syllables would be the longest in duration, followed by stressed, unstressed, and then weakened syllables; the findings support this assumption. Descriptive statistics for Hong Kong English and British English are given in Tables 4 and 5, respectively (1 = weakened, 2 = unstressed, 3 = stressed, 4 = tonic).
TABLE 2 Descriptive Statistics for All Syllables: Hong Kong English

                      N       Minimum   Maximum   Mean     Std. deviation
Duration (ms)         4,404   22.38     759.38    244.39   104.60
Valid N (listwise)    4,404



TABLE 3 Descriptive Statistics for All Syllables: British English

                      N       Minimum   Maximum   Mean     Std. deviation
Duration (ms)         1,847   18.00     687.00    190.99   109.21
Valid N (listwise)    1,847
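Because Tables 4 and 5 give the number of syllables and the mean duration at each stress level, the overall means and the proportion of syllables at each level can be recovered from them. The sketch below assumes only the rounded figures printed in those tables, so the recovered means are approximate; the variable and function names are illustrative.

```python
# (N, mean duration in ms) per stress level, from Tables 4 and 5.
# Stress levels: 1 = weakened, 2 = unstressed, 3 = stressed, 4 = tonic.
HONG_KONG = {1: (849, 195.34), 2: (1922, 220.78), 3: (960, 282.47), 4: (673, 319.38)}
BRITISH = {1: (643, 129.95), 2: (498, 150.30), 3: (408, 246.83), 4: (298, 314.23)}


def overall_mean(table):
    """N-weighted mean of the per-level means (approximate, as those means are rounded)."""
    total_n = sum(n for n, _ in table.values())
    return sum(n * mean for n, mean in table.values()) / total_n


def percentages(table):
    """Share of all syllables falling at each stress level, in percent."""
    total_n = sum(n for n, _ in table.values())
    return {level: 100 * n / total_n for level, (n, _) in table.items()}


for name, table in (("Hong Kong English", HONG_KONG), ("British English", BRITISH)):
    shares = ", ".join(f"{lvl}: {pct:.2f}%" for lvl, pct in percentages(table).items())
    print(f"{name}: mean {overall_mean(table):.2f} ms; {shares}")
```

Run on these figures, the weighted means come to about 244.39 ms for Hong Kong English and about 190.99 ms for British English, and the unstressed and weakened shares match the percentages discussed for the two varieties.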

Figure 1 shows the difference between the two varieties. The Hong Kong English data are represented by the upper solid line (L1 = 1 in the key) and the British English data by the lower dashed line (L1 = 2). On the x (horizontal) axis, 1 = weakened syllables, 2 = unstressed syllables, 3 = stressed syllables, and 4 = tonic syllables; the y (vertical) axis gives average duration in milliseconds. Because syllables in the Hong Kong English data were considerably longer overall than those in the British English data, it might be anticipated that syllables in every category would be significantly longer in the Hong Kong English data, but this is not the case. Figure 1, in which a curvilinear relationship between stress and duration emerges, clearly shows that this group of Hong Kong English speakers maintained differences in length across the four stress levels but did not maintain them to the same degree as the British English speakers studied; the ratio is different. Independent samples t-tests for each category show significant differences between the two groups for weakened, unstressed, and stressed syllables (p < 0.001 in each case), but no significant difference in the duration of tonic syllables (p = 0.536, equal variances not assumed). This finding is consistent with Figure 1. The ratios of mean syllable duration (British English : Hong Kong English) are as follows: weakened syllables 1:1.5; unstressed syllables 1:1.47; stressed syllables 1:1.14; tonic syllables 1:1.02.

TABLE 4 Syllable Duration According to Stress Level: Hong Kong English Data

Duration (ms)
Stress level     N       Minimum   Maximum   Mean     Std. deviation
1 weakened       849     33.19     637.75    195.34   100.09
2 unstressed     1,922   22.38     669.38    220.78   90.29
3 stressed       960     94.06     697.88    282.47   91.22
4 tonic          673     72.25     759.38    319.38   107.36

TABLE 5 Syllable Duration According to Stress Level: British English Data

Duration (ms)
Stress level     N       Minimum   Maximum   Mean     Std. deviation
1 weakened       643     18        453       129.95   71.97
2 unstressed     498     20        599       150.30   77.52
3 stressed       408     73        553       246.83   71.32
4 tonic          298     99        687       314.23   124.76

FIGURE 1 Line Plot of Syllable Duration According to Stress Level in Hong Kong English and British English

A further feature revealed by the descriptive statistics, one that may have influenced the perceived rhythm of Hong Kong English, was the much greater proportion of unstressed but not weakened syllables in the Hong Kong English data, as Figure 2 shows. Although the Hong Kong English and British English data had similar percentages of stressed and tonic syllables, the Hong Kong English data had far more unstressed than weakened syllables: 43.64% of Hong Kong English syllables were unstressed and 19.3% weakened, compared with 26.96% unstressed and 34.81% weakened in the British English data.
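The independent samples t-tests reported above (equal variances not assumed, i.e., Welch's t-test) and the duration ratios can be approximated from the rounded summary statistics in Tables 4 and 5 alone. The sketch below does this with the Python standard library only: it computes Welch's t, the Welch-Satterthwaite degrees of freedom, and a two-sided p value obtained by integrating Student's t density with Simpson's rule. It is an illustration from summary data, not the original analysis, which used the syllable-level measurements, and the function names are hypothetical. With these inputs the tonic comparison gives p of roughly 0.54, close to the reported 0.536, and the other three comparisons give p far below 0.001.

```python
import math

# category -> (N, mean ms, SD ms), transcribed from Tables 4 and 5.
HK = {"weakened": (849, 195.34, 100.09), "unstressed": (1922, 220.78, 90.29),
      "stressed": (960, 282.47, 91.22), "tonic": (673, 319.38, 107.36)}
BE = {"weakened": (643, 129.95, 71.97), "unstressed": (498, 150.30, 77.52),
      "stressed": (408, 246.83, 71.32), "tonic": (298, 314.23, 124.76)}


def welch_t(n1, m1, s1, n2, m2, s2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2  # squared standard errors
    t = (m1 - m2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


def t_pdf(x, df):
    """Student's t density, computed via log-gamma to avoid overflow."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p(t, df, steps=2000):
    """P(|T| >= |t|): integrate the density over [0, |t|] with Simpson's rule."""
    b = abs(t)
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3  # P(0 <= T <= |t|)
    return max(0.0, 1.0 - 2.0 * central)  # clamp tiny negative rounding error


if __name__ == "__main__":
    for cat in HK:
        (n1, m1, s1), (n2, m2, s2) = HK[cat], BE[cat]
        t, df = welch_t(n1, m1, s1, n2, m2, s2)
        print(f"{cat:<10} BE:HK = 1:{m1 / m2:.2f}  t = {t:6.2f}  "
              f"df = {df:6.1f}  p = {two_sided_p(t, df):.3g}")
```

Numerical integration is used here only to keep the sketch dependency-free; a statistics library's t distribution would serve equally well.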

The line plot in Figure 1 is telling about rhythmic stress in Hong Kong English: weakened and unstressed syllables are not as short as those in the British English speech data, but tonic syllables are very similar in length. The degree to which syllable durations differ across stress levels in Hong Kong English therefore contrasts sharply with British English. For the Hong Kong English pattern to mirror that of the British English speakers, once the overall difference in syllable length is taken into account, the lines would have to be parallel, not convergent. The lines, although similar in form, are clearly not parallel, and the only point at which the two varieties show no statistically significant difference is tonic syllables (4 on the x axis). At each of the other three points, the ratio between the two varieties becomes progressively smaller, but the Hong Kong English data remain significantly different from the British English data.

However, it would be overly simplistic to conclude that the difference in rhythmic pattern between Hong Kong English speakers and British English speakers depends only on differences in relative syllable duration across stress categories. Figure 2 clearly shows that the Hong Kong English speech has a much greater proportion of unstressed syllables than the British English speech, which contains more weakened syllables, and this fact will affect the perceived rhythm of Hong Kong English. It was noted earlier that syllables in Hong Kong English were longer on average than those in British English. This difference could be due to speaking rate, and no attempt has been made in this study to normalise the data for differences in participants' speaking rate, unlike, for example, Low et al. (2000). However, speaking rate should not affect the relative durations of syllables and so should have no bearing on the ratios between categories. In addition, because data were drawn from a fairly large number of Hong Kong English speakers (20 in total), it is hoped that speaking rate would be reasonably consistent, at least for this group of speakers and the task they were performing (i.e., giving a presentation).

FIGURE 2 Proportion of Syllables According to Stress Level for Hong Kong English and British English

Relative syllable duration at different levels of stressing may be a key factor in determining the perceived rhythm of a language. This belief arises directly from Dauer's (1983) observations concerning differences in vowel reduction across languages that exhibit different rhythmic types, or that lie toward one end or the other of a stress-based continuum, and it was the basis of the hypothesis explored in this study. The results support this hypothesis.
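The speaking-rate argument above assumes that a change of tempo stretches or compresses all syllables by roughly the same factor. Under that simplifying assumption, any measure built from ratios of durations is unaffected by rate. The Python sketch below illustrates this with two such measures: each syllable's duration relative to the speaker's own mean, and the normalised Pairwise Variability Index (nPVI), the rate-normalised rhythm measure used by Low et al. (2000). The function names and syllable durations are hypothetical, chosen only for illustration. In real speech, tempo changes do not affect all syllables equally, so this shows the logic of the argument rather than proving it.

```python
def npvi(durations):
    """Normalised Pairwise Variability Index of a sequence of durations.

    nPVI = 100 / (m - 1) * sum over k of |d_k - d_{k+1}| / ((d_k + d_{k+1}) / 2)
    """
    if len(durations) < 2:
        raise ValueError("nPVI needs at least two durations")
    pairs = zip(durations, durations[1:])
    total = sum(abs(a - b) / ((a + b) / 2) for a, b in pairs)
    return 100 * total / (len(durations) - 1)


def relative_durations(durations):
    """Divide each duration by the speaker's own mean (a simple rate normalisation)."""
    mean = sum(durations) / len(durations)
    return [d / mean for d in durations]


if __name__ == "__main__":
    # Hypothetical syllable durations (ms) for one utterance, and the same
    # utterance spoken uniformly 1.5 times more slowly.
    normal = [120.0, 260.0, 95.0, 310.0, 140.0]
    slower = [1.5 * d for d in normal]
    print(round(npvi(normal), 2), round(npvi(slower), 2))  # prints 87.09 87.09
    print(relative_durations(normal))
    print(relative_durations(slower))  # same list, up to float rounding
```

Both measures cancel a constant scaling factor, which is why uniform differences in speaking rate leave them unchanged.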
Figure 1 clearly shows that, although this group of Hong Kong English speakers maintained differences in length across the four stress levels (weakened, unstressed, stressed, and tonic), they did not maintain them to the same degree as the British English speakers. Vowel reduction, or lack thereof, is one of Dauer's (1983) criteria for how languages differ rhythmically; we can therefore expect the differences in the patterns of vowel and syllable reduction shown in Figure 1 to make Hong Kong English sound rhythmically different from British English. This situation could be seen as similar to Low et al.'s (2000) finding for syllable nuclei in Singapore English. The descriptive statistics revealed a feature of equally high importance to the perceived rhythm of Hong Kong English: the much greater number of unstressed but not weakened syllables in the Hong Kong English data (Figure 2). This result is tied to Dauer's (1983) criterion of vowel reduction, and it could be seen as similar to Low and Grabe's (1999) “lack of ‘deprominencing’” (p. 49) in Singapore English. Although the Hong Kong English and British English data have similar proportions of stressed and tonic syllables, the Hong Kong English data have far more unstressed than weakened syllables: 43.64% of Hong Kong English syllables are unstressed and 19.3% weakened, compared with 26.96% unstressed and 34.81% weakened in the British English data. Therefore, more syllables in Hong Kong English appear with a full vowel rather than a schwa or syllabic consonant; they are, in effect, less weak, and so lack deprominencing. If a language transfer stance is adopted, Dauer's observation that syllable-based languages do not show the same patterns of vowel reduction supports the view that Hong Kong English is likely to sound syllable based rather than stress based, given these speakers' preference for unstressed over weakened syllables. Because the pattern of strong and weak syllables seems to be important in native speakers' perception of stress-based languages (see, e.g., Adams, 1979; Anderson-Hsieh et al., 1992; Cutler, 1993; Fear et al., 1995), the lack of deprominencing in the Hong Kong English data suggests that these speakers may be less intelligible to native speakers of English than their British English counterparts are. Native speakers of English may find these Hong Kong English speakers harder to understand because the predictability of English speech rhythm, which Buxton (1983) notes to be “relevant to perceptual processing” (p. 120), is somewhat lacking in their speech.

What, then, might account for the difference between the Hong Kong English and British English patterns? One possible cause is L1 transfer. Cantonese is described as a syllable-timed language, in part because it has an extremely restricted number of instances where syllable weakening is possible (Bauer & Benedict, 1997).
This restricted syllable weakening could mean that Cantonese speakers of English do not demonstrate native-like patterns of English stress timing because they transfer to the L2 their L1 patterns of syllable timing, in which each syllable contains a full vowel and syllables are typically not subject to weakening. Such transfer would, of course, go hand in hand with other features of the L1 syllable, all of which might contribute to the perceived syllable-timed sound of the L2. Another explanation involves the difference in how English and Chinese are represented graphically. The Chinese writing system is not alphabetic but is often described as pictographic or ideographic. Outside of alphabetic representations of Chinese, such as Pinyin for Mandarin, no claim is made that the form of a character systematically depicts the pronunciation of the syllable it represents (although a phonetic element may be present). English, on the other hand, is represented alphabetically: letters corresponding to the sounds of a word are presented in linear left-to-right order, reflecting the order in which those sounds are produced. However, English is notorious for being difficult to spell because its grapheme-phoneme correspondence is inconsistent and is therefore often a poor guide to pronunciation. Luke and Richards (1982) have commented on the more frequent occurrence of full vowels in syllables that are neither stressed nor tonic in Hong Kong English and ascribed it to the influence of English orthography. Similarly, Brown (1988) mentions the same phenomenon in Singapore English and suggests spelling pronunciation (pronouncing each vowel with its full value as represented in the spelling) as a possible cause. The preference for unstressed rather than weakened syllables may therefore be not so much a matter of L1 transfer as of habits developed when learning to read in the L2. It is also possible that a combination of L1 transfer and L2 reading habits is responsible.

To conclude, the Hong Kong English speakers in this study show smaller differences in the duration of weakened, unstressed, stressed, and tonic syllables than the British English speakers, as well as a much higher proportion of unstressed to weakened syllables. These two factors combine to affect the perceived rhythm of Hong Kong English speech.

Having reported on this study of speech rhythm in speakers of Hong Kong English, I hope it is clear that I value the relative stressing of syllables in a stream of speech and believe that teaching the kind of English speech rhythm thought to exist in British English has obvious importance and rewards for learners. Like Chela-Flores (1994, 1998) and Gilbert (1984), I advocate work on syllable duration as a way of teaching and learning speech rhythm because, as this study shows, syllables at different stress levels in Hong Kong English differ less in duration from one another than they do in British English; this smaller contrast contributes to the lack of deprominencing that can make Hong Kong English difficult to follow. More native-like speech rhythm could improve matters for British, American, or Australian visitors, for example, whether commercial or recreational, who are not used to the syllable-timed patterns of Hong Kong English, resulting in better transactions for all concerned. The controversy over whether the terms stress timed and syllable timed are useful for pedagogy, however, rumbles on. Cauldwell (2002), based on his own research, concludes that the use of these terms in fact obstructs our understanding of how spontaneous speech works and that they should therefore be abandoned altogether in teaching and learning theories and materials. But although the influence of research into how stress- and syllable-timed languages are actually produced is growing in English language teaching circles, sensible research will continue to focus on the importance of, and mechanisms behind, appropriate stressing to make messages clear. For Marks (1999), the use of rhythmical structures such as rhymes in the classroom is valid insofar as it
provides a convenient framework for the perception and production of a number of characteristic features of English pronunciation which are often found to be problematic for learners: stress/unstress (and therefore the basis for intonation), vowel length, vowel reduction, elision, compression, pause (between adjacent stresses). (p. 198)

Although stress timing may itself fall out of favour as a description of the rhythm of English, skilful identification of some key aspects of the theory, and of how they help make messages clear, is useful for pedagogical purposes.
Jane Setter is a lecturer in phonetics at the University of Reading, Reading, England. She has also worked in Hong Kong and Japan. Jane is co-editor with Peter Roach and James Hartman of the seventeenth edition of Daniel Jones’s English Pronouncing Dictionary and joint coordinator of IATEFL’s Pronunciation Special Interest Group.

