
Journal of Personality and Social Psychology

1979, Vol. 37, No. 5, 715-727

Effects of Pitch and Speech Rate on Personal Attributions


William Apple, Columbia University
Lynn A. Streeter, Bell Laboratories, Murray Hill, New Jersey
Robert M. Krauss, Columbia University

In three experiments, subjects listened to recordings of male speakers answering two interview questions and rated the speakers on a variety of scales. The recordings had been altered so that the pitch of the speakers' voices was raised or lowered by 20% or left at its normal level, and speech rate was expanded or compressed by 30% or left at its normal rate. The results provided clear evidence that listeners use these acoustic properties in making personal attributions to speakers. Speakers with high-pitched voices were judged less truthful, less emphatic, less "potent" (smaller, thinner, faster), and more nervous. Slow-talking speakers were judged less truthful, less fluent, and less persuasive and were seen as more "passive" (slower, colder, passive, weaker) but more "potent." However, the effects of the acoustic manipulations on personal attributions also depended on the particular question that elicited the response.

We wish to thank John Lin and Nina Macdonald for their valuable assistance. We are also grateful to Myron Wish, Nancy Morency, and four anonymous reviewers for their helpful comments.

Requests for reprints should be sent to Robert M. Krauss, Department of Psychology, 400 Schermerhorn Hall, Columbia University, New York, New York 10027.

Copyright 1979 by the American Psychological Association, Inc. 0022-3514/79/3705-0715$00.75

Human speech provides a listener with at least two sources of information: a verbal channel, encoding the message's linguistic content, and a vocal channel, conveying paralinguistic information by variations in pitch, speech rate, loudness, and the like.

One important type of information communicated via the vocal channel concerns a speaker's affective state. The vocal characteristics associated with the expression of emotion are beginning to be understood (see, for example, Fairbanks, 1940; Hecker, Stevens, von Bismarck, & Williams, 1968; Williams & Stevens, 1972). It is now reasonably well established that stressful situations raise the voice's fundamental frequency (the number of glottal pulses per second) and that "active" emotions such as anger and fear tend to be reflected in increased mean pitch and pitch variance, whereas "low energy" states such as sorrow and indifference are associated with a lower mean pitch and a slower speech rate.

Given that emotional states do differ reliably in their paralinguistic expression, to what extent do listeners use these vocal cues in judging the immediate affective state or more enduring personality traits of a speaker? An early investigation of the noncontent aspects of speech by Allport and Cantril (1934) demonstrated that listeners could judge, at better than chance levels, a speaker's age and at least some personality characteristics from voice alone: In four of six experiments, speakers' scores on a test of ascendance-submission (Allport's A-S reaction study) were judged with significant accuracy. However, reviewing much of the voice-attribution work, Kramer (1963) concluded that more common than accuracy in such judgment studies was the finding of "vocal stereotypes." That is, certain voices were reliably, though sometimes incorrectly, judged as belonging to certain personality types.

Unfortunately, as Kramer noted, the description of the vocal parameters that underlie such stereotypes leaves much to be desired. While not all the earlier research on voice attributions can be faulted on these grounds, studies attempting to specify critical stimulus dimensions often have neglected or confounded important paralinguistic cues. For example, in reporting two recent field experiments, Miller, Maruyama, Beaber, and Valone (1976) concluded that the persuasiveness of a communication can be directly related to the rate at which it is delivered. More rapid speech was found to be more persuasive, presumably because a fast talker is viewed as more credible. Although Miller et al. have ruled out certain alternative explanations (such as the effect of having limited opportunity to counterargue against a rapid presentation), the stimulus materials used in their experiments warrant closer examination. In both experiments the persuasive messages were recorded by the same speaker at either a slow or a fast speech rate. This was accomplished "by simply instructing the speaker to practice delivering the same speech as rapidly and slowly as possible while controlling his level of enthusiasm and involvement" (Miller et al., 1976, p. 618). The stimulus recordings in the two conditions do, of course, differ in speech rate, but it is quite likely that they differ in other respects as well. In natural speech, such vocal parameters as amplitude, pitch, and rate tend to covary. For example, rapid speech is likely to be louder and higher pitched than normal speech (Black, 1961). Consequently, it is quite possible that subjects in the Miller et al. study were responding to pitch and/or loudness cues as well as to rate.

Because it is so difficult to assess, in a controlled way, the contribution of various vocal parameters to the attribution process using natural speech, a number of workers have attempted to deal with the problem by using nonspeech stimuli. For example, Scherer (1974) presented listeners with simple tone sequences generated on a Moog synthesizer, in which a minimal set of acoustic cues (pitch level and variation, amplitude level and variation, and tempo) were varied factorially; listeners rated the emotional quality of these tone sequences on a set of semantic differential scales. Scherer found that judgments were most influenced by tempo and pitch variations: Fast tempo led to attributions of highly active and potent emotions (e.g., interest, anger, and happiness) and slow tempo to attributions of sadness, disgust, and boredom; extreme pitch variation with rising contours produced ratings of highly pleasant, active, and potent emotions (happiness, surprise, interest). Other parameters influenced the perception of specific emotional qualities. While Scherer's method effectively deals with the confounding of vocal qualities in natural speech and suggests the effect that independently varied acoustic cues can exert, in view of the artificial nature of the stimuli used, it is not clear how directly the results can be generalized to the processing of speech.

Other investigators, using natural speech, have achieved a degree of control over stimulus materials by means of editing techniques. For example, investigators have added or removed such speech disfluencies as filled pauses and repetitions from stimulus audiotapes and asked listeners to judge speakers' credibility and other personal attributes (Lay & Burron, 1968; Miller & Hewgill, 1964). Typically, highly hesitant or disfluent speakers are assigned relatively undesirable personality traits, and their communications are judged to be low in credibility.

While the methods thus far described have all afforded some degree of stimulus control, recent developments in speech synthesis technology permit investigators to vary independently one or more parameters of natural speech. This approach has been exploited by Brown and his colleagues (Brown, Strong, & Rencher, 1973, 1974; Smith, Brown, Strong, & Rencher, 1975), who have manipulated speech rate, mean fundamental frequency, and fundamental frequency variance, using a computer-based analysis-synthesis system. Two major personality dimensions (termed "competence" and "benevolence") have emerged from factor analyses of judgments of such manipulated speech. Generally speaking, higher pitch seems to result in a speaker being judged less competent and less benevolent (Brown et al., 1974), whereas faster speech rates produce judgments of higher
competence but yielded an inverted-U relationship on the benevolence dimension (Smith et al., 1975).

The studies of Brown and his associates make an important contribution to our understanding of the significance that listeners ascribe to vocal qualities. Nevertheless, they are not free of methodological problems. They typically have relied on a small number of stimulus voices, all of whose acoustic permutations are presented to the same raters. For example, the Brown et al. (1974) study used only two adult male speakers uttering the same sentence ("We were away a year ago"), which was then manipulated to fill the 27 cells of a 3 × 3 × 3 factorial design. It is not clear that judgments based on hearing 54 repetitions of the same content bear much similarity to the kinds of everyday personality ascriptions listeners make under more natural conditions. Additionally, the particular parametric values used in these studies are problematic. Brown et al. (1974) reported that in the high-pitch condition, the original fundamental frequency was multiplied by a factor of 1.8. Although they did not report average fundamental frequency values for their two speakers, assuming that they fell into the normal range for males, Brown et al.'s high-pitch manipulation would have raised a male voice into the female range. Similar criticisms apply to their choice of speech-rate scale factors.

The present article reports an exploratory series of experiments designed to demonstrate the effects of two acoustic parameters (average fundamental frequency and speech rate) on judgments of several state and trait variables. It was hoped that by using a large number of speakers and naturally produced utterances, together with more conservative values for acoustic alteration, and by independently manipulating the two parameters factorially, we could overcome the methodological problems noted above.

Experiment 1

Experiment 1 derives from a finding reported by Streeter, Krauss, Geller, Olson, and Apple (1977), who demonstrated that the mean fundamental frequency of speakers' voices increased when they lied, relative to a truth-telling baseline, and that this effect was stronger when the speaker had been motivated to lie effectively. In addition, Streeter et al. found that although judges' ratings of truthfulness were essentially uncorrelated with pitch level, there was a significant negative correlation between judged truthfulness and average pitch for listeners who heard the speech after it had been passed through a content filter, a device that destroys intelligibility without affecting vocal features of the utterances. Apparently, the natural pitch increments during lying (on the average about 3 Hz) were too small to affect the judgments of listeners who had content available but were taken into account by listeners who could not understand the responses' verbal content. However, since pitch increments and speech rate decrements were correlated in their data, Streeter et al. could not rule out the possibility that listeners were attending to rate differences rather than pitch differences between true and false utterances. From the finding of Miller et al. (1976), one would expect slower speech to result in lower perceived speaker credibility. In Experiment 1 we assessed the effects of these two variables on truthfulness judgments.

Method

Stimulus materials. Forty male Columbia College undergraduates, all native speakers of English, answered questions for use as stimulus materials. They were individually recorded in a sound-isolated booth on high-quality audio equipment, and they received course credit for their participation. Each speaker answered six standard questions dealing with his opinion on a range of topics. The questions were administered in a fixed order, and speakers were instructed to answer all questions honestly and frankly. They were told to give brief answers, but to give more than a yes-or-no response. All speakers were debriefed concerning how their interviews might be used and signed informed-consent releases permitting the later use of the recordings.

Two of the six interview questions were selected for use in the experiment. One asked the subject's opinion of college admissions quotas designed to favor minority groups (Question 3); the other asked the subject what he would do if he suddenly won or inherited a large sum of money (Question 6). These two questions were chosen because both elicited a diversity of responses and because the two differed

on a dimension of personal salience for the respondents. (The question of quotas is a matter of genuine concern and frequent discussion among Columbia undergraduates.) Twenty-seven speakers, who gave responses that were not excessively long and represented the spectrum of opinions on the two questions, were selected.

Each of the 27 speakers was then randomly assigned to one of nine cells of a 3 (rate: slow, unmanipulated, fast) × 3 (pitch: low, unmanipulated, high) completely crossed factorial design, with three speakers in each cell. All speech material was digitized and analyzed by the linear predictive coding (LPC) method of Atal and Hanauer (1971). (The LPC analysis calculates 14 parameters every 10 msec. Twelve of these parameters represent pseudoarea functions of the vocal tract; the other two are values for amplitude and fundamental frequency. The advantage of LPC analysis is that such variables as rate, pitch, and amplitude can be manipulated without changing other voice parameters. In addition, since the parameters are derived from the original speech, the quality of the resulting synthesis is high.) The two responses of each speaker were manipulated on a DDP-224 computer using an interactive program (Nakatani, Note 1) that displays the parameters as functions of time and allows the user to manipulate any or all of the 14 parameters as well as to linearly expand or compress the time base. The altered utterances can then be synthesized and recorded on audiotape.

The scale factors chosen for the low- and high-pitch manipulations were 80% and 120%, respectively, of the speaker's unmanipulated fundamental frequency. The values chosen for the slow and fast speech rate manipulations were 130% expansion and 70% compression, respectively, of the utterance's time base, resulting in speech rates that were 77% and 143% of the unmanipulated rates.1 These scale values were chosen because the resulting speech still sounded natural and the acoustic properties remained more or less within the normal range of values. (See Hanley, 1951; Mysak, 1959; Peterson & Barney, 1952; and Terango, 1966, for data on pitch and Goldman Eisler, 1968, for normative speech rate data.) The mean premanipulation fundamental frequency and speech rate for our speakers were 109.79 Hz (SD = 14.82) and 3.27 syllables/sec (SD = .66). Following manipulation, all utterances were resynthesized to produce a stimulus audiotape of the answers to the quota question (in one random order) and a tape of the answers to the money question (in a different order). The tapes were formatted to allow 15 sec of silence between each response.

1 The rate manipulation algorithm linearly expanded or compressed consonant and vowel information alike. In natural speech, rate variation tends to be reflected more in changes of vowel duration than in consonant duration. However, the rate manipulation used resulted in reasonably natural-sounding speech. It may not be immediately apparent why 70% compression of an utterance's time base results in a speech rate that is 143% of the original. Consider a response consisting of n syllables spoken over s seconds of time; that utterance's rate would be n/s syllables/sec. Seventy-percent compression changes the rate to n/(.7s), or 1.43 n/s, a faster speech rate that is 143% of the original. Likewise, a 130% expansion of an utterance's time base results in a slower speech rate that is 77% of the original rate.

Procedure. Twenty undergraduates, 14 males and 6 females, were paid for their participation.2 Raters were run in groups of three to seven. They were told that the purpose of the study was to determine how well people can tell, from the sound of the speaker's voice, whether someone is lying or telling the truth. They were informed that approximately half the speakers had been instructed to tell the truth, whereas the remaining half had been told to lie, that is, to give answers that did not correspond to their actual beliefs or feelings.

2 One additional rater was run, but his data were discarded after he expressed some suspicion regarding possible splicing of the tapes. No other raters voiced any suspicion that the stimuli had been altered.

The two stimulus tapes were played over high-quality audio equipment with the order of tape presentation counterbalanced. A 500-Hz warning tone preceded and followed each answer by 1 sec. Raters were given 15 sec in which to rate the truthfulness of the answer on a 7-point scale ranging from "not at all truthful" (1) to "entirely truthful" (7). Because of variations in the quality of the recordings, raters were also told that recordings had been made using a variety of equipment in several different environments.3 The experimental session lasted about an hour, and there was a 5-min rest period between tapes. Participants in the three rating studies were sent a report describing the purpose and detailing some of the findings of the study several weeks after its conclusion.

3 There is some quality variation across speakers in the LPC synthesis; in particular, speakers who tend to mumble and have a high degree of nasalization seem to suffer the greatest quality degradation.

Results

Some of the premanipulation characteristics of the stimulus materials are summarized in Table 1. Note that the more involving topic, college admissions quotas, resulted in significantly longer, slower, and higher pitched responses. Since the quota topic always preceded the money topic in the original order, we cannot rule out the possibility that these differences are due to serial position effects.
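The scale factors described in the Method section (and the arithmetic in footnote 1) can be checked with a few lines of code. The following is a minimal sketch, not part of the original procedure: the function names are hypothetical, and the only inputs are the scale factors and premanipulation means quoted in the text. It assumes, as footnote 1 does, that multiplying the time base by a factor k changes the speech rate by a factor of 1/k.

```python
# Illustrative check of the time-base and pitch scale factors quoted in the text.
# Function names are hypothetical; the values come from the Method section.

def rate_after_time_scaling(time_scale: float) -> float:
    """Relative speech rate after the time base is multiplied by time_scale.

    n syllables over s seconds gives n/s syllables/sec; scaling the time
    base to time_scale * s gives n / (time_scale * s), i.e. 1 / time_scale
    times the original rate.
    """
    if time_scale <= 0:
        raise ValueError("time_scale must be positive")
    return 1.0 / time_scale


def scaled_pitch(f0_hz: float, pitch_scale: float) -> float:
    """Fundamental frequency after multiplying by pitch_scale."""
    return f0_hz * pitch_scale


MEAN_F0_HZ = 109.79   # mean premanipulation fundamental frequency (Hz)
MEAN_RATE = 3.27      # mean premanipulation speech rate (syllables/sec)

for label, time_scale in [("fast, 70% compression", 0.70),
                          ("slow, 130% expansion", 1.30)]:
    relative = rate_after_time_scaling(time_scale)
    print(f"{label}: {relative:.0%} of the original rate, "
          f"about {MEAN_RATE * relative:.2f} syllables/sec at the mean rate")

for label, pitch_scale in [("low pitch", 0.80), ("high pitch", 1.20)]:
    print(f"{label}: {scaled_pitch(MEAN_F0_HZ, pitch_scale):.1f} Hz at the mean F0")
```

Applied to the group means, the sketch only illustrates the size of the manipulations; in the study the factors were applied to each speaker's own utterances.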

Table 1
Premanipulation Characteristics of Stimulus Tapes

                                 Quota tape         Money tape
                                 M        SD        M        SD
Response length (syllables)      111.1    40.7      84.6     42.7     3.89**
Response time (sec)              36.8     13.8      25.6     13.2     4.68**
Speech rate (syllables/sec)      3.0      .50       3.4      .74      2.12*
Fundamental frequency (Hz)       112.2    14.9      107.4    14.6     5.60***

Note. N = 27 segments per tape.
* p < .05, two-tailed. ** p < .01, two-tailed. *** p < .001, two-tailed.

However, it seems more plausible to regard these differences as reflections of differences in our subjects' personal involvement with the two questions and/or the cognitive complexity of the answers they called for (see Goldman Eisler, 1968, and Williams & Stevens, 1969).

To check the adequacy of the random assignment, premanipulation speech rate (syllables/sec) and average pitch were subjected to analyses of variance (3 rate levels × 3 pitch levels × 2 questions) with speakers nested within the rate and pitch factors. In addition to the between-questions differences already noted, the analysis of the premanipulation pitch failed to reveal other significant effects. However, there was a marginally significant difference in premanipulation speech rate across the three assigned rate conditions, F(2, 18) = 3.00, p < .08, primarily because of some slower speakers' random assignment to the slow rate condition. Therefore, prior to all further analyses, we adjusted the raw data for covariation on speakers' premanipulation speech rate and fundamental frequency: The appropriate beta weights were derived from linear regression of the data collapsed across raters. This adjustment has the virtue of controlling for spurious rate and question effects in the analyses of variance.4 Having thus transformed each dependent variable, we computed min-F ratios (and their approximate degrees of freedom) for the analyses of variance. The F″ statistic we report (Winer, 1971, pp. 375-378) treats both speakers and raters as random effects, permitting simultaneous generalization over both groups.5

4 To ensure that our subjects' ratings were not affected by variations in acoustic quality across our nine experimental conditions, we had six undergraduates rate the 54 recorded segments for intelligibility and correlated these ratings with our subjects' ratings of truthfulness. The two sets of ratings were not significantly correlated (r = .15). Similarly, to ensure that response content was well distributed across conditions, we had 12 undergraduates rate from a transcript how pro- or antiquota (for the quota question) or generous or selfish (for the money question) each response was. Their ratings were then subjected to a 3 (pitch levels) × 3 (rate levels) analysis of variance. For neither question were significant main effects or interactions found.

5 In cases where the min-F value (F″) is marginal, we will also report the conventional F ratios for raters (considering speakers as a fixed effect) and speakers (considering raters as a fixed effect). These values represent less conservative tests.

A 3 (pitch levels) × 3 (rate levels) × 2 (questions) analysis of variance was performed on the mean truthfulness ratings with repeated measures on the last factor. Premanipulation speech rate and premanipulation fundamental frequency were used as covariates, and all means reported below have been adjusted for these covariates. A significant main effect was found for the pitch manipulation, F″(3, 34) = 3.37, p < .05, and a marginally significant effect for rate, F″(3, 51) = 2.26, p < .10. The mean truthfulness ratings for the three pitch conditions (going from low to high pitch) were 4.62, 4.40, and 4.09, respectively, indicating that lower pitch enhanced credibility. The corresponding means for the rate manipulation were, going from slow to fast rate, 4.10, 4.63, and 4.37; the unmanipulated rate was judged most credible and the slow rate least credible.

No significant effect was found for the

Pitch × Rate interaction. However, there was a nearly significant Pitch × Question effect, F″(2, 32) = 3.02, p < .10; raters' F(2, 38) = 10.25, p < .001; speakers' F(2, 18) = 3.84, p < .05. For the quota question, truthfulness ratings and pitch were curvilinearly related; for the money question, low-pitched speakers were judged as most truthful. The two sets of means are shown in Figure 1.6

6 The corresponding analysis performed on the intelligibility ratings revealed a significant effect only for the rate manipulation, with fastest speech suffering greatest loss in intelligibility.

Figure 1. Average rated truthfulness plotted as a function of pitch condition (low, normal, high) for each of the two question topics. (Horizontal axis: pitch condition; legend: quota, money.)

Discussion

The results demonstrate that the acoustic manipulations performed on the speech stimuli affected judgments of truthfulness. Consistent with the findings of Streeter et al. (1977), judges rated high-pitched voices as less truthful than lower pitched voices. Perhaps listeners perceived high pitch to be an indication of stress and attributed such stress to attempted deception. The pitch manipulations used, although not extreme enough to place voices outside the normal male pitch range, were evidently large enough to produce attributions of lying from naive listeners. It will be recalled that the smaller, naturally occurring pitch increments accompanying deception in the Streeter et al. experiment did not evoke such attributions, except in the filtered listening condition, in which verbal content was unintelligible.

While the rate effect was only marginally significant, it appears that the effect of rate on truthfulness judgments is not linear; rather, the pattern is an inverted-U function, with slower speech perceived as least credible. These results are consistent with the findings of Miller et al. (1976), who demonstrated that a faster speaker was perceived as more intelligent, knowledgeable, and objective than a slower speaker. Since Miller et al. used only two levels of speech rate, it is, of course, not possible to establish the effect of intermediate rates on credibility judgments in their study. It is relevant, however, that Brown and coworkers have reported a similar inverted-U relationship between manipulated speech rate and their "benevolence" dimension, on which the adjective pair sincere-insincere loads significantly (see Smith, Brown, Strong, & Rencher, 1975). To the extent that ratings of sincerity correspond to this study's truthfulness measure, the two results are consistent.

The most plausible explanation of the Question × Pitch interaction (Figure 1) is that listeners took question content into consideration in making their truthfulness judgments: When listening to a potentially "loaded" topic (quota question), raters were willing to call both low- and normal-pitched voices more truthful than high-pitched voices. For the less involving question (money), raters were willing to call only low voices more truthful. This interaction argues for a pitch threshold, above which deception is signaled to raters. Such a threshold would interact with response content, so that for an emotionally involving topic, it would be set higher than for a less involving topic. With an emotionally involving topic, some of the vocally reflected stress can be attributed to the topic; given a noninvolving topic, the high-pitch responses are likely to be attributed to attempted deception. However, since only two questions were used, further research is needed to test this content-attribution hypothesis.

The truthfulness ratings reflect judgment

processes that give rise to attributions of a speaker's transient state. If listeners had not been told that certain speakers were lying, differences in the acoustically manipulated variables might have been seen as enduring vocal properties reflecting stable personal predispositions. Such properties have been referred to by the linguist Trager (1958) as the "voice set" and involve "the physiological and physical peculiarities resulting in the patterned identification of individuals as ... persons of a certain sex, age, state of health, body build, rhythm state" (p. 4). The voice set, therefore, acts as a relatively permanent background against which transient vocal changes are superimposed. In the absence of situational factors (e.g., the possibility that the speaker was lying) that could explain the voice qualities produced by our acoustic manipulations, listeners would be likely to ascribe such qualities to the voice set. How such variables affect person perception and contribute to vocal stereotypes was explored in Experiment 2.

Experiment 2

Method

Stimulus materials. The quota and money tapes from Experiment 1 were used.

Procedure. Eleven college students, nine males and two females, were paid for their participation as raters. The procedure was essentially the same as that used in Experiment 1, with the following differences: Raters were instructed that the study's purpose was to investigate how listeners form impressions of speakers from the things they say as well as from the way they say them. Accordingly, raters were told to focus both on content and delivery when making their judgments.

The speaker of each recorded segment was rated on nine bipolar adjective pairs taken from the semantic differential (Osgood, Suci, & Tannenbaum, 1957); scales were chosen that had high loadings on one of Osgood et al.'s semantic space factors and relatively low loadings on the other two. The scales for the evaluation factor were sour-sweet, awful-nice, and bad-good. Scales for the potency factor were thin-thick, small-large, and weak-strong. Those for the activity factor were slow-fast, cold-hot, and passive-active.

A warning tone followed each recorded segment, signaling judges to begin making the nine ratings on 7-point scales. (The second adjective of each pair was scored as 7.) Instructions stressed that ratings should reflect the listener's impression of the speaker and not agreement or disagreement with the content of the answer. Subjects were told that all speakers had answered the questions truthfully.

Results

Means of the nine scales were computed for each recorded segment, and the intercorrelation matrix was factor analyzed using principal factoring followed by a varimax rotation. A three-factor solution accounted for 84.5% of the total variance and roughly corresponded to the three dimensions of Osgood et al.'s (1957) semantic space. We chose scales loading greater than .60 (in absolute value) as representative of the respective factors. Using that cutoff, Factor 1 consisted of all three activity scales (slow-fast, cold-hot, and passive-active) as well as the strong-weak scale; it accounted for 54.4% of the variance. Factor 2 was a pure evaluation dimension (with only the three evaluative scales loading appreciably: sour-sweet, awful-nice, bad-good); it accounted for 20.2% of the variance. The third factor consisted of two potency scales (thin-thick, small-large) and an activity scale (slow-fast); it accounted for 9.9% of the variance.

Each rater's data were reduced to three factor scores weighting the original scales by the factor loadings; the factor scores were adjusted for covariates (as in Experiment 1) and entered into univariate analyses of variance of the same design as that used in Experiment 1.

For Factor 1, a significant rate effect, F″(2, 32) = 10.49, p < .001, and a nearly significant Rate × Question interaction, F″(2, 31) = 3.02, p < .10, were obtained; raters' F(2, 20) = 9.72, p < .01; speakers' F(2, 18) = 3.87, p < .05. The configuration of means is shown in Figure 2. In both cases, slow speakers were perceived as less active.

No significant main effects or interactions were found for Factor 2. For Factor 3, significant main effects were found for rate, F″(2, 36) = 12.87, p < .001, and pitch, F″(2, 33) = 7.94, p < .01. In both cases the relationship was monotonic, with increasing pitch and rate resulting in judgments of decreasing potency. No other significant effects were found.
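The results above report F″ alongside conventional by-rater and by-speaker F ratios. As a rough illustration of how two such ratios can be combined, the sketch below uses the widely used min F′ lower bound and its approximate denominator degrees of freedom. This is an assumption for illustration only: the article cites Winer (1971) for F″ and does not spell out its computation, so the function name and formula choice here are hypothetical, and the value returned is not expected to reproduce the reported F″ exactly.

```python
# Sketch of a min F' computation from two conventional F ratios.
# Assumption: the standard min F' formula; the article's own F'' computation
# (Winer, 1971) is not described in enough detail to reproduce here.

def min_f_prime(f_raters: float, df_raters: float,
                f_speakers: float, df_speakers: float) -> tuple[float, float]:
    """Return (min F', approximate denominator df).

    f_raters / df_raters:     F ratio and denominator df treating raters as random.
    f_speakers / df_speakers: F ratio and denominator df treating speakers as random.
    """
    if f_raters <= 0 or f_speakers <= 0:
        raise ValueError("F ratios must be positive")
    value = (f_raters * f_speakers) / (f_raters + f_speakers)
    df_denominator = (f_raters + f_speakers) ** 2 / (
        f_raters ** 2 / df_speakers + f_speakers ** 2 / df_raters
    )
    return value, df_denominator


# Rate x Question interaction for Factor 1, using the F ratios quoted above.
value, df_den = min_f_prime(f_raters=9.72, df_raters=20,
                            f_speakers=3.87, df_speakers=18)
print(f"min F' = {value:.2f}, approximate denominator df = {df_den:.1f}")
```

With the quoted ratios, the approximate denominator degrees of freedom round to 31, the value reported for the interaction, while the min F′ value itself comes out somewhat below the reported F″. That gap is consistent with min F′ being a lower bound, but the source does not confirm how its statistic was derived.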

Figure 2. Averages for the activity factor plotted as a function of rate condition (slow, normal, fast) and question topic. [Plot not reproduced: horizontal axis, rate condition (slow, normal, fast); vertical axis ticks from 5.5 to 12.5.]

Discussion

These results extend the findings of Experiment 1 to judgments of more stable speaker dispositions. Men speaking in higher pitched voices were perceived as less potent (smaller, thinner, slower) and slow-speaking men were perceived as more passive (slower, colder, more passive, weaker) and more potent.

These findings are to some extent consistent with correlational evidence provided by Scherer, Koivumaki, and Rosenthal (1972). In their experiment, listeners rated taped segments, taken from a recorded play, on semantic differential scales similar to the ones used here, as well as on scales reflecting the segments' acoustic properties (e.g., bass-treble, soft-loud). Unlike the present study, raters judged the emotion portrayed and not their impression of the speaker, and a variety of listening conditions were used to degrade semantic content and prosodic features. Nevertheless, Scherer et al. found a marginally significant relationship between pitch rating (bass-treble) and potency ratings (strong-weak) paralleling the main effect reported above: Lower pitched speech was placed toward the "stronger" pole in the potency dimension of the emotional-meaning space. In contrast to our results, they also found a marginal correlation between articulation rate (slow-fast) and potency ratings: Segments with slower speech were rated as "weaker." Not surprisingly, Scherer et al. also found that segments with faster speech were heavily loaded on the activity dimension.

Our findings are likewise consistent with the results of Brown et al. (1974). Although the methodological differences previously noted preclude direct comparison, Brown found that high fundamental frequency decreased competence ratings, a scale probably related to our potency dimension.

The Rate × Question interaction (Figure 2) again suggests the influence that content exerts on raters' judgments: When speakers talked about quotas, ratings on the passive-active dimension were linear with manipulated rate; for the money question, they were not.

Experiment 3

Experiment 3 returned this work to the area of judgments of the speaker's affective state. Impressions of nervousness, emphaticness, seriousness, fluency, and persuasiveness illustrate how these acoustic variables serve to convey a speaker's self-presentation under conditions in which raters believe that answers are being given honestly. These state ratings (with the exception of persuasiveness) were chosen because Krauss, Geller, and Olson (Note 2) found significant correlations between them and truthfulness ratings in a previous study of deception interactions.

Method

Stimulus materials. The quota and money tapes from Experiment 1 were used.
Procedure. Ten college-student subjects, two males and eight females, were paid to rate all segments on five state variables: fluency, emphaticness, persuasiveness, nervousness, and seriousness. The rating scales spanned 7 points, with 1 indicating that the smallest amount of a variable was judged and 7 indicating that the largest amount was judged. For example, anchors of the nervousness scale were "Not at all nervous" and "Very nervous." Subjects were encouraged to adopt their own criteria for all ratings; no external standards were given. Instructions were virtually identical to those used in Experiment 2. Again, listeners were asked to take into consideration both the content of an answer and the manner in which it was delivered.
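
As a minimal sketch of how ratings gathered under this procedure might be reduced to one score per segment (segment means are the unit analyzed in the Results), the Python fragment below averages each segment's 7-point ratings across raters. The segment labels, the rating values, and the `segment_means` helper are hypothetical illustrations, not data or code from the study.

```python
from statistics import mean

# Hypothetical ratings: segment -> state variable -> one 1-7 score per rater.
# These numbers are invented for illustration only.
example_ratings = {
    "speaker01_quota": {"nervousness": [2, 3, 2, 4], "fluency": [6, 5, 6, 5]},
    "speaker01_money": {"nervousness": [5, 4, 6, 5], "fluency": [3, 4, 3, 2]},
}


def segment_means(ratings):
    """Average each segment's ratings across raters, separately per variable."""
    means = {}
    for segment, by_variable in ratings.items():
        means[segment] = {}
        for variable, scores in by_variable.items():
            # The scales described in the text run from 1 (smallest amount)
            # to 7 (largest amount); reject anything outside that range.
            if any(not 1 <= score <= 7 for score in scores):
                raise ValueError(f"{segment}/{variable}: rating outside 1-7")
            means[segment][variable] = mean(scores)
    return means


if __name__ == "__main__":
    for segment, by_variable in segment_means(example_ratings).items():
        print(segment, by_variable)
```

Averaging over raters first yields a single value per segment and variable, which is the form of score a segment-level analysis would take as input; the sketch makes no attempt to reproduce the covariance analyses themselves.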

All other procedural details were identical to those of Experiment 2, except that this study was run at Bell Laboratories, using students from a number of colleges who were home on summer vacation.

Results

Segment means were computed for all state measures and analyzed using covariance analyses as described above. Table 2 summarizes the findings. Note that all variables with the exception of seriousness show main effects for the rate manipulation. These effects are all of the inverted-U type with the normal (unmanipulated) speakers judged most fluent, persuasive, and so forth, and slow speakers judged lowest on these scales. (Ratings of nervousness go in the direction opposite to the other three scales.)

Table 2
Min-F' Values for the Five State Variables

State variable   Rate (R) effect   Pitch (P) effect   Question (Q) × R   Q × P
Persuasiveness   3.93**            2.94*              3.54**
Fluency          6.03***                              4.07**             2.73*
Emphaticness     6.66***                              3.12**             3.33**
Nervousness      6.16***           5.90***
Seriousness                                                              4.60**

*p < .10. **p < .05. ***p < .01.

Only nervousness yielded a significant main effect for pitch; the pitch manipulation was marginally significant for persuasiveness. Rated nervousness increased with higher pitch, whereas rated persuasiveness decreased. In addition, ratings of emphaticness and seriousness showed significant Question × Pitch interactions. The shape of these interactions, shown in Figure 3, is quite similar to the corresponding effect on judged truthfulness (Figure 1). For the quota question, only the highest pitched group was "underrated" on emphaticness and seriousness, whereas for the money question, both the normal- and high-pitched groups were "underrated." The fluency measure showed a comparable interaction.

Figure 3. Average emphaticness and seriousness ratings plotted as a function of pitch condition (low, normal, high) and question topic. [Plot not reproduced: four lines (emphatic quota, emphatic money, serious quota, serious money); horizontal axis, pitch condition (low, normal, high); vertical axis ticks from 3.0 to 5.5.]

There were Question × Rate interactions for fluency, persuasiveness, and emphaticness. All interactions had a similar shape; for the quota question only the slowest group suffered low ratings, while for the money question both the slow and fast groups received low ratings. The effect for seriousness judgments was marginally significant, but of the same shape.

Discussion

These results again demonstrate the effect acoustic variables have on person perception processes. Decreasing speech rate has a particularly deleterious effect on a speaker's perceived persuasiveness, fluency, and emphaticness. Similarly, increased pitch lowers ratings of persuasiveness and increases greatly the impression of nervousness.

Our findings also suggest that context plays a role in the attribution process, as evidenced by the question interactions. When a speaker

answered the quota question, higher pitch levels could be discounted by attributing them to the stressfulness of the topic. Similarly, it may have been inappropriate on the money topic to talk too slowly (perhaps because of the topic's relative simplicity) or too rapidly (perhaps because rapid speech is perceived as an attempt to be inappropriately serious or persuasive). On the other hand, rapid speech on the quota question might have been attributed to the speaker's conviction regarding his argument.

General Discussion

The three experiments taken as a whole provide clear evidence that acoustic properties of a message have considerable impact on judgments of a variety of state and trait variables. The impressions of high-pitched or slow-talking speakers seem particularly negative. For example, men with high-pitched voices are judged less truthful, less persuasive, weaker, and more nervous. Similarly, slow-talking men are judged to be less truthful, fluent, emphatic, serious, and persuasive, and more passive, although they are also seen as more potent.

By using a large number of speakers and factorially varying speech rate and pitch, we have a reasonable measure of confidence in the validity of our findings. However, since no female voices were used, we cannot generalize these results to women; it is conceivable that these same acoustic variables would produce different effects on perceptions of women speakers.

Message context, presumably mediated by the two question topics, was also demonstrated to influence attribution processes. However, since there was only one question of each type, what follows must remain somewhat speculative. The question interactions suggest an interpretation along the lines of Kelley's (1971) discounting principle. Kelley suggests that the more factors a situation contains (any one of which might plausibly have resulted in an observed outcome), the less likely is any one factor to be perceived as the cause of that outcome. With fewer possible causes present, the cause-to-effect attribution is more compelling. We hoped that our acoustic manipulations would be potent enough to affect speaker state or trait attributions. However, it is evident that listeners may have, at least partially, attributed our acoustic alterations to the question topic. The quota question called for an answer that was both complex and emotionally involving for our college-student speakers and raters. As Table 1 shows, before any manipulations were done, the quota answers were longer and slower (suggesting greater cognitive complexity; see Goldman Eisler, 1968) and higher pitched (suggesting greater stressfulness; see Hecker et al., 1968). Because of this, slower and higher pitched answers might have been perceived as more appropriate for quota responses than for money responses.⁷ But even when premanipulation speech rate and pitch were covaried, the question interactions remained, supporting the discounting principle. For example, higher pitch levels seemed to be discounted on ratings of truthfulness, fluency, emphaticness, and seriousness for speakers answering the quota, but not the money, question.

⁷ Subsequently, in connection with another study, we had 12 undergraduates rate a long list of potential interview topics (including the two used in the present study) as to how stressful and how complex they would be for a typical Columbia undergraduate to discuss. Subjects judged the quota question to be significantly more stressful and more complex than the money question.

It should be noted that manipulating fundamental frequency by multiplying by a scale factor as we did has the effect of multiplying the variance of the fundamental frequency by the square of the scale factor. Thus, high-pitched segments were both high-pitched and high pitch-variance segments, and vice versa. Thus, we cannot rule out the interpretation that the pitch effects observed could be pitch-variance effects. However, this interpretation seems unlikely considering the findings of Brown et al. (1974), who did the appropriate factorial experiment and found that increased mean fundamental frequency lowered judgments of speakers' competence and benevolence, while decreased variance also lowered these ratings. Thus, it seems that average pitch and pitch variance affect judgments on

a wide range of scales in opposite ways. In the present study, in which both pitch and pitch variance were positively correlated, the pitch-correlated variance could only have attenuated the effects of average pitch.

Given that the experimental manipulations used here affected the perception of speakers' personality and state, it remains to be explained why these particular data patterns were observed. Miller et al. (1976), for example, conclude that the effect of speech rate on message persuasiveness is mediated by the effect that variable has on the perception of a speaker's credibility. This, they assert, is a "less rationalistic view" of attitude change than other interpretations (e.g., change mediated by comprehension effects or counterargument disruption, two hypotheses their experiments ruled out).

Certainly such a conclusion would be justified if it were the case that variations in voice quality bore no relation to the actual internal state or predisposition of the speaker. However, there is considerable evidence that stressful situations do produce discernible changes in voice quality (Fairbanks, 1940; Hecker et al., 1968; Williams & Stevens, 1969, 1972), and it seems reasonable to assume that listeners in the present study used such variations appropriately to infer a speaker's state from the quality of his speech. For example, Hecker et al. demonstrated that task-induced stress raised the fundamental frequency of those speakers who did not talk more softly under stress. In our experiment, attributions of increased nervousness to higher-pitched speakers are quite "rationalistic," given the similarity of our pitch manipulations to the effect of real-life stress. Similarly, Streeter et al. (1977) demonstrated that pitch increments accompany deceptive responses; listeners' truthfulness judgments in Experiment 1 appropriately reflect this relationship.

The effects of speech rate can be similarly interpreted in light of Goldman Eisler's (1968) finding that rate and the cognitive complexity of the topic were negatively related. Listeners may have assumed that lying increases speakers' cognitive load, resulting in slower rates. Unpublished speech rate data from the Streeter et al. (1977) study lend plausibility to this argument. A marginally significant interaction (p < .08) indicated that subjects did, in fact, speak more slowly when lying than when telling the truth, provided they had been given instructions engaging their motivation to lie effectively. However, speakers not receiving such instructions spoke more rapidly when lying.

Judgments of what is or is not "rationalistic" are probably to a large extent matters of personal preference, but it does seem to us that a listener would be ill-advised to ignore reliable information concerning a speaker's internal state in evaluating the speaker's message, especially when the internal state seems incongruent with the situation or with the message's content.

Furthermore, the significant Question × Manipulation interactions for ratings of state variables suggest some qualifications on the findings of Miller et al. (1976). Fast speakers are not always more persuasive; talking too quickly in response to the money question produced lower persuasiveness ratings than did responses at a normal rate. Apparently, listeners take more into account than meets the ear; at least, more than simply the acoustic data.

Evidence of veridicality for vocally based attributions of enduring personality traits is less firm. We have found no reliable data to indicate that fast talkers actually are more active people or that higher pitched men are weaker than their lower pitched counterparts. Apart from studies of psychiatric patients, much of the work in this area deals with the traits of introversion-extraversion and dominance. Mallory and Miller (1958) found small but significant negative correlations between judged pitch and rate and the dominance scale of the Bernreuter Personality Inventory. While these findings support our results on judged potency along the pitch dimension (Experiment 2), it is possible that Mallory and Miller's acoustic judgments are either inaccurate or subject to biasing effects from other sources, since raters judged speakers in a live situation. Furthermore, our results for judged potency on the speech-rate variable appear opposite to Mallory and

Miller's. A more recent study (Ramsay, 1966) found that the speech of subjects classified as extraverts on the Eysenck Personality Inventory had longer unbroken phonation times and shorter silences than those of introverts across a variety of speaking tasks; no data on speech rates were presented.

The acoustic stimulus, of course, contains more information than average pitch and rate. In addition, there is sequential information (provided by intonation contours and duration pattern), loudness, and variability of both pitch and loudness over time. There is also voice quality information (e.g., "breathy" or "raspy" voices) that may not be so readily specified in terms of physical parameters. All of these factors can be expected to enter into the person perception process via stereotypes with larger or smaller kernels of truth.

On none of the measures we examined was the Rate × Pitch interaction statistically significant. The median F interaction was 1.12, close to its expected value under the null hypothesis. This absence of interaction argues for an additive model, in which pitch and rate exert independent effects on listeners' judgments. It remains to be seen whether support for such a model will continue as the role of additional vocal factors is explored.

Reference Notes

1. Nakatani, L. H. SYNLOG: An interactive system for manipulating speech. Unpublished manuscript, Bell Laboratories, Murray Hill, N.J., 1976.
2. Krauss, R. M., Geller, V., & Olson, C. T. Modalities and cues in the detection of deception. Paper presented at the meeting of the American Psychological Association, Washington, D.C., September 1976.

References

Allport, G. W., & Cantril, H. Judging personality from voice. Journal of Social Psychology, 1934, 5, 37-55.
Atal, B. S., & Hanauer, S. L. Speech analysis and synthesis by linear prediction of the speech wave. Journal of the Acoustical Society of America, 1971, 50, 637-655.
Black, J. W. Relationships among fundamental frequency, vocal sound pressure, and rate of speaking. Language and Speech, 1961, 4, 196-199.
Brown, B. L., Strong, W. J., & Rencher, A. C. Perceptions of personality from speech: Effects of manipulations of acoustical parameters. Journal of the Acoustical Society of America, 1973, 54, 29-35.
Brown, B. L., Strong, W. J., & Rencher, A. C. Fifty-four voices from two: The effects of simultaneous manipulations of rate, mean fundamental frequency, and variance of fundamental frequency on ratings of personality from speech. Journal of the Acoustical Society of America, 1974, 55, 313-318.
Fairbanks, G. Recent experimental investigations of vocal pitch in speech. Journal of the Acoustical Society of America, 1940, 11, 457-466.
Goldman Eisler, F. Psycholinguistics: Experiments in spontaneous speech. London: Academic Press, 1968.
Hanley, T. D. An analysis of vocal frequency and duration characteristics of selected samples of speech from three American dialect regions. Speech Monographs, 1951, 18, 78-93.
Hecker, M. H. L., Stevens, K. N., von Bismarck, G., & Williams, C. E. Manifestations of task-induced stress in the acoustic speech signal. Journal of the Acoustical Society of America, 1968, 44, 993-1001.
Kelley, H. H. Attribution in social interaction. Morristown, N.J.: General Learning Press, 1971.
Kramer, E. Judgment of personal characteristics and emotions from nonverbal properties of speech. Psychological Bulletin, 1963, 60, 408-420.
Lay, C. H., & Burron, B. F. Perception of the personality of the hesitant speaker. Perceptual and Motor Skills, 1968, 26, 951-956.
Mallory, E. B., & Miller, V. R. A possible basis for the association of voice characteristics and personality traits. Speech Monographs, 1958, 25, 255-260.
Miller, G. R., & Hewgill, M. A. The effect of variations in nonfluency on audience ratings of source credibility. Quarterly Journal of Speech, 1964, 50, 36-44.
Miller, N., Maruyama, G., Beaber, R. J., & Valone, K. Speed of speech and persuasion. Journal of Personality and Social Psychology, 1976, 34, 615-624.
Mysak, E. Pitch and duration characteristics of older males. Journal of Speech and Hearing Research, 1959, 2, 46-54.
Nakatani, L. H. Computer-aided signal handling for speech research. Journal of the Acoustical Society of America, 1977, 61, 1057-1062.
Osgood, C. E., Suci, G. J., & Tannenbaum, P. H. The measurement of meaning. Urbana: University of Illinois Press, 1957.
Peterson, G. E., & Barney, H. L. Control methods used in a study of the vowels. Journal of the Acoustical Society of America, 1952, 24, 175-184.
Ramsay, R. W. Personality and speech. Journal of Personality and Social Psychology, 1966, 4, 116-118.
Scherer, K. R. Acoustic concomitants of emotional dimensions: Judging affect from synthesized tone sequences. In S. Weitz (Ed.), Nonverbal communication. New York: Oxford University Press, 1974.

Scherer, K. R., Koivumaki, J., & Rosenthal, R. Minimal cues in the vocal communication of affect: Judging emotions from content-masked speech. Journal of Psycholinguistic Research, 1972, 1, 269-285.
Smith, B. L., Brown, B. L., Strong, W. J., & Rencher, A. C. Effects of speech rate on personality perception. Language and Speech, 1975, 18, 145-152.
Streeter, L. A., Krauss, R. M., Geller, V., Olson, C., & Apple, W. Pitch changes during attempted deception. Journal of Personality and Social Psychology, 1977, 35, 345-350.
Terango, L. Pitch and duration characteristics of the oral reading of males on a masculinity-femininity dimension. Journal of Speech and Hearing Research, 1966, 9, 590-595.
Trager, G. L. Paralanguage: A first approximation. Studies in Linguistics, 1958, 13, 1-12.
Williams, C. E., & Stevens, K. N. On determining the emotional state of pilots during flight: An exploratory study. Aerospace Medicine, 1969, 40, 1369-1372.
Williams, C. E., & Stevens, K. N. Emotions and speech: Some acoustical correlates. Journal of the Acoustical Society of America, 1972, 52, 1238-1250.
Winer, B. J. Statistical principles in experimental design (2nd ed.). New York: McGraw-Hill, 1971.

Received January 18, 1978

Corrections to Koretzky, Kohn, and Jeger


In the article "Cross-Situational Consistency Among Problem Adolescents:
An Application of the Two-Factor Model" by Martin B. Koretzky, Martin
Kohn, and Abraham M. Jeger (Journal of Personality and Social Psychology,
1978, Vol. 36, No. 9, pp. 1054-1059), there are errors in two correlations re-
ported in the first paragraph on page 1058. The Factor I and Factor II scores
for the Koretzky (1976) study cited there should be reversed. The correct cor-
relation for Factor I is .40, and for Factor II it is .60. Thus, the sentence
should read, "Consistency correlations between classroom and residence settings
were even stronger in this experiment for Factor II (r = .60) and were also
respectable for Factor I (r = .40)."
Martin B. Koretzky's affiliation was erroneously given as the Veterans Ad-
ministration Hospital, Bronx, New York. His affiliation at the time of the
original research was the State University of New York at Stony Brook.
