Speed Good

Bruschke Lab ADI 2K5 Page 1 of 6

Any listener can easily adjust to speech rates two to three times the “normal” rate. JANSE, ’03 [Esther; Ph.D. @ Utrecht Institute of Linguistics OTS; “Production and Perception of Fast Speech”] Listeners can adapt to very fast rates of speech. They can quite easily learn to understand speech which is compressed to rates that are much faster than can ever be attained in natural fast speech. In the Introduction Chapter, the question was raised whether this fact provides a challenge to the Motor theory of speech perception. The central claim of the Motor theory is that “to perceive an utterance, then, is to perceive a specific pattern of intended gestures”. But what then, if what listeners perceive cannot possibly be a pattern of intended gestures produced by a human speaker? For the perception of synthetic speech, Liberman & Mattingly (1985) claim that synthetic speech will be treated as speech if it contains sufficiently coherent phonetic information. In their view, “it makes no difference that the listener knows, or can determine on auditory grounds, that the stimulus was not humanly produced; because linguistic perception is informationally encapsulated and mandatory, he will hear synthetic speech as speech” (p.28). Consequently, the fact that people can listen to speech which is time-compressed to much faster rates than can be produced by human speakers is not a strong argument against the Motor theory. Time-compressed speech is still sufficiently phonetically coherent to be perceived as speech. Listeners will only have to perform a time-scaling step in order to derive the original gestures. In normal everyday speech, speaker and listener tune in to each other. Listeners need to adapt to the speaker’s voice characteristics and dialect or regional accent. On the speaker’s side, speakers adapt their speech to the requirements of the communicative situation (Lindblom 1990; Nooteboom & Eefting 1994). An example of this type of co-operative behaviour is accentuation and deaccentuation. Accentuation is used by the speaker to guide the listener’s attention to new and informative words in the speech stream, whereas given or more redundant information is usually deaccented. Likewise, speech rate can also be varied according to contextual redundancy.
Speakers may have to speak relatively slowly and carefully when they are conveying new information, but they can use a relatively fast speech rate when they are, e.g., recapitulating what they have just said. However, this pact between speaker and listener does not hold for time-compressed speech. Now the listener is presented with a global speech rate which is much faster than the speaker intended. In this chapter we hope to give some insight into how listeners deal with these uncooperative situations.

In order to adapt to strongly time-compressed speech (two to three times the original rate), listeners need only a small amount of training (Pallier et al. 1998). When adapting to time-compressed speech, listeners are assumed to learn to make acoustic transformations on the signal in order to derive the correct speech segments and words.

Listeners only need a short amount of time to comprehend faster time-compressed speech. JANSE, ’03 [Esther; Ph.D. @ Utrecht Institute of Linguistics OTS; “Production and Perception of Fast Speech”] Subjects need only a limited amount of speech material to show significant improvement, or even plateau performance, in the identification of highly time-compressed speech. At the same time, the adaptation effect does not seem to be lasting. This proves the flexibility of the speech perception mechanism: listeners tune in to a fast speech rate quickly, but once they are no longer presented with time-compressed speech, they gradually lose the initial adaptation to it. Secondly, the results disprove Foulke’s (1971) point that successful processing of heavily time-compressed speech is possible as long as the listener has enough processing time. The duration of a segment’s steady-state portion, and also the care of articulation of the segment itself, determine whether the segment can be identified after strong time compression. Obviously, there is a limit to what listeners can adapt to: the segmental intelligibility of speech, as measured in the identification of the nonwords, remains rather low at these heavy time-compression rates.

Studies show that much faster speech is still intelligible. JANSE, ’03 [Esther; Ph.D. @ Utrecht Institute of Linguistics OTS; “Production and Perception of Fast Speech”]
The duration study described in the previous section shows that the prosodic pattern at word level is made more pronounced with increasing speech rate. These production data then lead to the expectation that the intelligibility of time-compressed speech will be improved if its temporal organisation is closer to that of natural fast speech.

Experiments in our laboratory have shown that speech remains intelligible at rates that are much faster than can ever be attained in natural fast speech. Speech that is time-compressed to the fastest rate which human speakers can achieve is still almost perfectly intelligible. It would seem reasonable to evaluate the perceptual effects of applying fast speech timing to time-compressed speech at the fast rate which is produced by the speakers. However, the perceptual effects of more natural fast speech patterns will first be established for a much faster rate of speech. There are two reasons for this. First, a practical reason is that intelligibility of artificially time-compressed speech is very high, even at rates twice the normal rate. This ceiling effect would make any intelligibility differences between linearly time-compressed and nonlinearly time-compressed speech difficult to find. Second, a more fundamental reason is that the role of prosody is expected to become more important as the listening situation becomes more difficult. The information carried by the more salient prosodic pattern might be exploited in difficult listening situations. For these two reasons, the rules of fast speech timing were extrapolated to even faster rates.

Adaptation to faster speech is caused by several thoroughly researched factors. JANSE, ’03 [Esther; Ph.D. @ Utrecht Institute of Linguistics OTS; “Production and Perception of Fast Speech”] Dupoux & Green (1997) speculate that the adjustment to time-compressed speech may be the result of two processes operating simultaneously: a short-term adjustment to local speech rate parameters, and a longer-term, more permanent, perceptual learning process. The studies by Pallier et al. (1998), Altmann & Young (1993), and Sebastián-Gallés, Dupoux, Costa & Mehler (2000) were set up to investigate the mechanisms that are responsible for these adaptation effects. They argued that adaptation to time-compressed speech may also involve certain phonological processes. Subjects who were trained with time-compressed sentences in a foreign language which was phonologically similar to their own native language (e.g., Spanish-speaking subjects adapted to Italian or Greek) showed an adaptation effect when they were subsequently presented with time-compressed sentences of their own language. Even though they did not understand the language they had been presented with first, they still performed better when presented with sentences of their own language in comparison to subjects who had not had any training at all. More importantly, these listeners also performed better than listeners who had been trained with a language that was phonologically distant from their own language (e.g., Spanish-speaking subjects adapted to English or Japanese). The authors argue that certain pairs of languages show transfer of adaptation, while others do not. This suggests that adaptation does not rely on raw acoustic properties, but on phonological or rhythmic properties. This is in line with the distinction between different broad language classes, as laid out by, amongst others, Abercrombie (1967). In his study, stress-timed languages (such as English, Dutch and German), which are said to exhibit nearly equal intervals between stresses or rhythmic feet, are distinguished from syllable-timed languages (such as Italian and Spanish), which display near isochrony between successive syllables (a third category are mora-timed languages, such as Japanese). More recent work on rhythmic differences between languages has refined this dichotomy between stress-timed and syllable-timed languages. Dauer (1983) observed that the rhythmic classes differ with respect to, amongst others, syllable type inventory and spectral vowel reduction. Ramus, Nespor & Mehler (1999) argue that the proportion of vocalic intervals in an utterance (%V) is the best acoustic correlate of rhythm class: stress-timed languages having, on average, a lower %V than syllable-timed languages. Low, Grabe & Nolan (2000) propose a pairwise variability index (the mean absolute difference between successive pairs of vowels, combined with a normalisation procedure for speaking rate) to capture rhythmic differences between languages or between language varieties. Importantly, a number of studies have shown that languages within such a class show particular language processing mechanisms (Cutler & Mehler 1993; Cutler, Mehler, Norris & Segui 1986; Ramus et al. 1999). Processing a language that belongs to the same class as one’s native language should then be easier than processing a more distant language.
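
Note on the rhythm measures named in the card above: the two metrics (%V and the pairwise variability index) can be illustrated with a minimal Python sketch. This is not taken from Janse or from the cited papers. The normalised PVI below uses the commonly given form (each successive difference divided by the mean of the pair, averaged, times 100), which is assumed here to match the card's "normalisation procedure for speaking rate". The function names and all durations are hypothetical, invented for illustration only.

```python
def percent_v(vocalic_ms, consonantal_ms):
    """%V: share of total utterance duration made up of vocalic intervals."""
    total = sum(vocalic_ms) + sum(consonantal_ms)
    if total <= 0:
        raise ValueError("utterance must have positive total duration")
    return 100.0 * sum(vocalic_ms) / total


def npvi(durations_ms):
    """Normalised pairwise variability index over successive intervals.

    Assumed form: mean of |a - b| / ((a + b) / 2) over successive pairs,
    multiplied by 100. Dividing by the pair mean is what makes the index
    insensitive to overall speaking rate.
    """
    if len(durations_ms) < 2:
        raise ValueError("need at least two intervals")
    pairs = zip(durations_ms, durations_ms[1:])
    terms = [abs(a - b) / ((a + b) / 2.0) for a, b in pairs]
    return 100.0 * sum(terms) / len(terms)


# Hypothetical vowel durations in milliseconds (not measured data).
even_vowels = [80, 85, 78, 82]       # successive vowels of similar length
uneven_vowels = [140, 45, 150, 50]   # long and short vowels alternating

print("nPVI, even vowels:  ", round(npvi(even_vowels), 1))
print("nPVI, uneven vowels:", round(npvi(uneven_vowels), 1))

# Halving every duration (speech twice as fast) leaves the index unchanged.
halved = [d / 2 for d in uneven_vowels]
print("rate-invariant:", abs(npvi(uneven_vowels) - npvi(halved)) < 1e-9)
```

The last check is the point most relevant to this file: because each difference is scaled by the local pair mean, uniformly speeding up an utterance does not change its rhythm score under this assumed formula.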

Don’t believe the hype. Studies show that speech rate adaptation is simply an issue of the listener tuning in. JANSE, ’03 [Esther; Ph.D. @ Utrecht Institute of Linguistics OTS; “Production and Perception of Fast Speech”] Altmann and Young (1993) showed that adaptation also occurs when listeners are trained with time-compressed (phonotactically legal) nonwords. Subjects who had been trained with time-compressed nonsense sentences (sentences in which all content words had been replaced by nonsense words) performed equally well on (meaningful) test sentences as subjects who had been trained with time-compressed meaningful sentences.
In this thesis, we will not be concerned with the exact level at which adaptation takes place. Adaptation is assumed to take place at some pre-lexical level, whether phonological or not. The fact that adaptation has been shown to be fast (Dupoux & Green 1997; Pallier et al. 1998) suggests that it is not an explicit learning procedure, but rather a quick process of tuning in. It seems reasonable to assume that lexical redundancy plays an important role in the perception of time-compressed speech. The more degraded the segmental information is, the more one has to rely on extra non-segmental information. Most of the studies employing time-compressed speech have used meaningful sentences as test material. Thus, listeners could make use of both the segmental information and the non-segmental sources of information. The present study was set up to examine segmental intelligibility and the effect of lexical redundancy separately. By disentangling these two factors, we hope to shed more light on the mechanisms underlying the robustness of the speech perception mechanism.

Listeners’ inability to cope with extremely strongly time-compressed speech has been ascribed to a limit on storage capacity by Foulke (1971). According to Foulke, complete processing of speech is possible as long as there is processing time available in between stretches of highly time-compressed speech. This would mean that the identifiability of the time-compressed representations is not so much at stake, but that problems mainly arise because of the lack of processing time and because some words actually fall out of the crowded memory.

Spreading boosts short-term memory, which is key to education and remembering everyday tasks. PSYCHOLOGY TODAY, ’92 [October 1992 (report of the results of the Raine et al study)] "If friends criticize you for talking too fast, at least they can't also accuse you of having a bad memory. Speech rate is a strong index of short term memory span... 'Therefore, the faster you can talk, the greater your short-term memory,' says Adrian Raine, PhD, a University of Southern California psychologist. The link has been established for adults for some time, Raine reports in Child Development. Now, he and his colleagues find the correlation holds for kids as well, a finding that promises short-term payoff in the classroom and long-term payoff in life. Short-term memory is the power behind recall of phone numbers, directions, and other everyday tasks. It is also the foundation of arithmetic and reading skills... That raises the possibility that speech-training may be a short-cut to achievement." (p.14)

Talking faster increases memory, preventing losses with age. HULME + MACKENZIE, ’92 [Charles & Susie. (1992). Working Memory and Severe Learning Difficulties. Hillsdale, USA: Lawrence Erlbaum Associates. Pg 45]

results are striking in that the same linear function relating recall to speech rate fits the results for all age groups. Subjects of different ages in this study all recalled, on average, as much as they could say in roughly 1.5 seconds. Increases in memory span with age are seen to be very closely related to changes in speech rate with age. Thus the results of these different studies are remarkably clear and consistent. The dramatic improvements in serial recall performance with increasing age are closely and quantitatively related to changes in speech rate. In terms of the articulatory loop theory, which gave impetus to these studies, the length of the loop appears to remain constant across different ages; more material is stored in this system because it can be spoken and so rehearsed more rapidly. These results, relating developmental increases in speech rate to increases in short-term memory efficiency, lead quite directly to a simple causal theory: That increases in memory span with age depend upon increases in speech rate.
Needless to say, however, such a theory is not necessitated by the findings. The findings are essentially correlational; as children get older their speech rate increases and in line with this so does their memory performance. It could be that both these changes depend upon some other factor. The obvious way to test this causal theory is to conduct a training study. If short-term memory depends upon speech rate, if we can successfully train children to speak faster, then this should, according to the theory, lead to a corresponding increase in short-term memory. (p.45)
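
Note on the arithmetic in the card above: the relationship it describes (subjects recall about as much as they can say in roughly 1.5 seconds) can be written out as a short Python sketch. This is a reading of the card's own summary, not a model published in that form by the authors. The function name, the constant name, and the example speech rates are hypothetical, and the card itself stresses that the evidence is correlational.

```python
# "Roughly 1.5 seconds" in the card; treated here as an approximate constant.
REHEARSAL_WINDOW_S = 1.5


def predicted_span(speech_rate_items_per_s, window_s=REHEARSAL_WINDOW_S):
    """Estimated memory span: items that can be spoken within the window."""
    if speech_rate_items_per_s < 0:
        raise ValueError("speech rate cannot be negative")
    return speech_rate_items_per_s * window_s


# Hypothetical speech rates in items per second (not data from the study).
for rate in (2.0, 3.0, 4.0):
    span = predicted_span(rate)
    print(f"{rate:.1f} items/s -> about {span:.1f} items recalled")
```

Under this reading the window stays fixed across ages, so any difference in predicted span comes entirely from the speech rate, which is the "simple causal theory" the card goes on to qualify.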

TURN: Working memory is critical to literacy and math --- which is key to keeping your GPA up, which is key to you staying in debate. HULME + MACKENZIE, ’92 [Charles & Susie. (1992). Working Memory and Severe Learning Difficulties. Hillsdale, USA: Lawrence Erlbaum Associates. Pg 45]
"In its broadest sense, working memory refers to the use of temporary storage mechanisms in the performance of more complex tasks. So, for example, in order to read and understand prose, we must be able to hold incoming information in memory. This is necessary in order to compute the semantic and syntactic relationships among successive words, phrases, and sentences and so construct a coherent and meaningful representation of the meaning of the text. This temporary storage of information during reading is said to depend on working memory. In this view the ability to understand prose will depend on, among other things, the capacity of a person’s working memory system. Such temporary storage of information is obviously necessary for the performance of a wide variety of other tasks apart from reading, such as mental arithmetic (Hitch, 1978) and verbal reasoning (Baddeley & Hitch, 1974)."

Speed is critical to linguistic abilities. STINE, WINGFIELD, + POON, ’86 [Elizabeth L., Arthur, & Leonard. “How much and how fast: Rapid processing of spoken language in later adulthood.” Psychology and Aging, vol. 1, no. 4, 303-311]

"At a very fast rate, several things must be accomplished. The various processes required to recode linguistic stimuli into meaning have been articulated for both spoken language (Just & Carpenter, 1980; Marslen-Wilson & Tyler, 1980) and written text (Kintsch & vanDijk, 1978; J. Miller & Kintsch, 1980). There must be some initial phase in which the stimulus is encoded, physical features (visual or acoustic) are extracted, and lexical access is achieved (Just & Carpenter, 1980). Next, the language content must be parsed into meaningful idea units in which relationships are determined among words (Kintsch & vanDijk, 1978). These relationships are typically represented in terms of propositions consisting of a predicate and one or more arguments that are related by the predicate. Third, relationships between idea units of the text must be established in order to construct overall structural coherence in the text. Finally, the text must be related to and integrated with world knowledge. Although such processes would undoubtedly have to work in both a top-down and bottom-up fashion, the output at each of these stages would have to be held in an online working memory for an effective integration of meaning."