
Difference limens for fundamental frequency contours in sentences
M.S. Harris
Psychology Department, Kean College of New Jersey, Union, New Jersey 07083
N. Umeda
Department of Linguistics, New York University, 10 Washington Place, New York, New York 10003
(Received 4 May 1984; accepted for publication 18 December 1986)
Difference limens (DLs) for fundamental frequency (Fo) of naturally spoken sentences were
studied. The experiments can be classified into two major categories. In the first category the
fundamental frequency of a portion of sentences of 2 to 3 s in duration was manipulated. The
second set of experiments used very short sentences ("The subject verb") in which the Fo of the
entire sentence was manipulated. Across experiments, sentences of comparable length yielded
similar DLs, except when the Fo was abruptly shifted within a continuous voicing period.
However, the DLs did vary significantly as a function of stimulus complexity and speaker. The
range of DLs obtained in this series of experiments was between 10 and 50 times greater than
that found with sustained synthetic vowels.
PACS numbers: 43.71.Es, 43.66.Fe
INTRODUCTION
Numerous studies have investigated the sensitivity of
human listeners to differences in frequency of pure tones
(e.g., Harris, 1952; Rosenblith and Stevens, 1953). With
pure tones it is very easy to control all aspects of the stimuli.
Harris (1952) and others have shown that, for pure tones,
frequency difference limens of less than 1 Hz are not uncom-
mon. The actual DLs were dependent on the subjects used,
the methodology employed, and the "loudness level" (LL)
of the tones.
Generalization from pure tones to speech is, of course,
impossible. Flanagan and Saslow (1958), in a now classic
study, reported results of an experiment in which they mea-
sured the DL for fundamental frequency (Fo) of synthetic
vowel stimuli with steady Fo. The average DL for six highly
trained subjects across a number of vowels was 0.32 Hz.
They claimed that their listeners were able to make slightly
more acute discriminations of changes in Fo of vowels than
of pure tones of equivalent frequency and level. Klatt
(1973), using the synthetic vowel /e/, compared the DLs
for steady Fo, a ramp Fo, and a steep rate of Fo change, claim-
ing that the synthetic vowel with steady Fo did not have the
"dynamic qualities characteristic of speech" (p. 8). The DL
for the steady Fo was 0.3 Hz, which was very similar to that
found by Flanagan and Saslow. For the ramp Fo the DL was
2 Hz, and for the "steep rate of change of Fo" (32 Hz/s) the
DL was 4 Hz. Klatt assumed that the steep rate of change of
Fo was closer to speech in quality. The results from Klatt's
study suggest that with speechlike stimuli with changing Fo,
the DL for Fo may be an order of magnitude higher than it is
for stimuli with steady Fo.
There are studies which have used more complex stimu-
li than those mentioned above, such as consonant-bound
vowels (Mermelstein, 1978) and multisyllabic number
words ('t Hart, 1974, 1981). However, no data from natu-
rally spoken sentences are available. The current experi-
ments were designed to fill this gap. Although we were aware
of the many difficulties involved in conducting a frequency
DL experiment using complex speech stimuli, we felt that
such an investigation was necessary because of the dearth of
relevant psychoacoustic data.
The current investigation consists of four experiments.
Untrained subjects were used in all of the experiments in
order to obtain a large number of subjects. In the first two
experiments, subjects discriminated changes in Fo of about
750-ms duration inside a sentence of approximately 2000-ms
duration. In the last two experiments, shorter sentences
(600-800 ms) were used, and the Fo of the entire sentence
was raised or lowered. Prior to these four experiments, a
pilot study was conducted to see if phonemic information
affected the DL for the Fo of a sentence. In this pilot study
subjects participated in two sessions, one with all linguistic
information in the sentence intact and the other with phone-
mic information of the same sentence destroyed by changing
the value of all LPC coefficients. No significant difference
was obtained between the results from the two sessions, and,
therefore, sentences with phonemic information intact were
used throughout the four experiments.
I. GENERAL METHODS
Different stimuli were used in each of the four experi-
ments in this study. Since the stimulus preparation proce-
dures were identical and the experimental procedures were
the same in all experiments, general procedures are dis-
cussed in this section. Table I summarizes the differences
between experiments.
The procedure for making stimuli was as follows: First,
sentences were recorded on analog tape and then digitized
on a Data General Eclipse computer. Fundamental frequency
values of the sentence were obtained using a parallel processing
technique (Gold and Rabiner, 1969). An LPC vocoder
program (Atal and Hanauer, 1971) was used to
manipulate Fo values: The sentence was analyzed first and
then resynthesized with original or with shifted Fo values
(e.g., +3 Hz, -10 Hz, etc.).

1139 J. Acoust. Soc. Am. 81 (4), April 1987 0001-4966/87/041139-07500.80 @ 1987 Acoustical Society of America 1139
Redistribution subject to ASA license or copyright; see http://acousticalsociety.org/content/terms. Download to IP: 14.139.82.6 On: Tue, 05 Aug 2014 04:49:22

TABLE I. Summary of differences between experiments.

Exp.    No. of sentences  No. of speakers       Sentence length  Portion of Fo change    Control of Fo change                       Step of Fo change in Hz  No. of listeners
I       5                 1 (MH)                2-3 s            middle of the sentence  none                                       5                        20
II      4                 2 (MH, PB)            2-3 s            middle of the sentence  change starts and ends at stop consonants  5                        19
III(a)  4                 1 (MH)                600-800 ms       entire sentence         none                                       3                        17
III(b)  1                 3 (RC, DM, JW)        600-800 ms       entire sentence         none                                       2                        19
IV      1                 4 (MH, RC, DM, JW)    600-800 ms       entire sentence         none                                       3 for MH, 2 for others   19

Only male speakers were used. A stimulus consisted of a
sentence presented twice, once with original Fo values and
the other time with shifted Fo values (including a zero shift
condition) for the designated portion of the sentence. In oth-
er words, the whole Fo contour during that portion was
transposed upward or downward by the specified amount.
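
The transposition described above can be illustrated with a minimal Python sketch. It assumes the Fo contour is available as a list of per-frame values in Hz, with 0 marking unvoiced frames. The function name `shift_f0_segment`, the frame indexing, and the example numbers are illustrative assumptions; this is not the LPC vocoder program used in the study.

```python
def shift_f0_segment(f0_hz, start, end, shift_hz):
    """Return a copy of f0_hz with frames start..end-1 transposed by shift_hz.

    Unvoiced frames (assumed here to be coded as 0) are left untouched,
    so the shape of the contour inside the segment is preserved.
    """
    shifted = list(f0_hz)
    for i in range(start, end):
        if shifted[i] > 0:
            shifted[i] = shifted[i] + shift_hz
    return shifted


# Hypothetical contour in Hz; 0 marks an unvoiced frame.
contour = [120, 124, 0, 130, 128, 122, 118]
raised = shift_f0_segment(contour, start=1, end=5, shift_hz=5)
print(raised)  # [120, 129, 0, 135, 133, 122, 118]
```

A shift of 0 Hz returns the contour unchanged, which corresponds to the zero shift condition.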
The position of the standard sentence and the Fo-shifted
one in the pair was counterbalanced, so that the standard
sentence appeared in the first position the same number of
times as it did in the second position in a listening session.
The presentation of all of the stimuli in the session was in
random order.
Subjects in each experiment were either college or high
school students and were either paid or given course credit in
an introductory psychology class. All reported having no
hearing problems. They were tested in groups of five to seven
in a large Industrial Acoustics Corporation sound booth.
They were instructed to put on a set of headphones and told
that they would hear a certain number of pairs of sentences.
They were to judge for each pair whether the second sen-
tence contained any portion that was higher or lower in pitch
than the first. They were asked to guess if they were not sure.
The within-pair interval was 0.55 s, and the between-pair
interval was 2.55 s.
Prior to the actual test sentences, they heard several sen-
tence pairs in order to familiarize themselves with the type of
stimuli and the testing situation.
TABLE II. List of sentences and their DLs for experiment I. The portion
whose Fo was changed is in italics. The DL is the 75% correct point on the
psychometric function.

                                                      DL (Hz)
Sentence                                              Down    Up
(1) I am not going to get rid of my pet.              7.5     < 5
(2) We must all live together.                        11.5    8
(3) It is an unusual situation, I admit.              14.5    16
(4) To begin with, pigs are very beautiful animals.   11.5    15
(5) The ordinary folk left no trace.                  10      8.5

II. EXPERIMENTS WITH LONGER SENTENCES
A. Experiment I
1. Method
a. Stimuli. A male speaker MH read a number of sen-
tences taken from a variety of texts. Five sentences were cho-
sen for use in this experiment; they are listed in Table II. The
Fo of a portion of each sentence (in parentheses) was
changed in 5-Hz steps in both increasing and decreasing di-
rections. Figure 1 is an example of the type of changes that
were made. There were four changes of 5 Hz each in the
upward direction and four changes in the downward direc-
tion from the original Fo contour. The dashed line above it
represents a 5-Hz parallel increase in the portion of the sen-
tence chosen for change. The dashed line below represents
the same change in the downward direction. The duration of
the changed portion was between 625 and 825 ms depending
upon the sentence. All changes were made in a medial por-
tion of the sentence that included a stressed word. Two stim-
ulus tapes were created on which pairs of sentences were
recorded in random order. Each possible pair appeared four
times on each tape, twice with the standard first and twice
with the standard second. Each tape contained 180 pairs of
sentences. The stimuli on each tape had a different random
order but were otherwise identical.
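
The tape size follows from the design: five sentences, nine Fo conditions (four steps down, the zero shift, four steps up), and four presentations of each pair. The sketch below rebuilds such a list in Python as a check on the count of 180. The tuple layout, the function name `build_tape`, and the seeds are assumptions made for illustration only.

```python
import random

sentences = [1, 2, 3, 4, 5]                        # sentence numbers from Table II
shifts_hz = [-20, -15, -10, -5, 0, 5, 10, 15, 20]  # four steps down, zero shift, four steps up
orders = ["standard_first", "standard_second"]
repeats_per_order = 2                              # twice in each order, four times in all


def build_tape(seed):
    """Return a randomly ordered list of (sentence, shift, order) pairs."""
    pairs = [
        (sentence, shift, order)
        for sentence in sentences
        for shift in shifts_hz
        for order in orders
        for _ in range(repeats_per_order)
    ]
    random.Random(seed).shuffle(pairs)
    return pairs


tape1 = build_tape(seed=1)  # hypothetical seeds; the two tapes differ only in order
tape2 = build_tape(seed=2)
print(len(tape1))  # 180
```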
b. Procedure. Twenty college students were tested in two
sessions, half receiving tape 1 first and half receiving tape 2
first. The sessions were separated by 1 week.

FIG. 1. Illustration of the type of Fo contour changes made in the sentences.
The dashed lines represent 5-Hz upward and downward changes from the
original contour. [Plot: Fo contour of the sentence "The ordinary folk left no trace." against time (ms).]

The psychometric functions for both upward and down-
ward changes from the original Fo were plotted and the DLs
were determined from the functions. The DL was defined as
the 75% correct point (obtained by linear interpolation)
which has been traditionally used when the method of con-
stant stimuli is employed. The data were analyzed using a
two-way repeated-measures analysis of variance in which
the effect of the direction of change and differences among
sentences were examined.
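
As a minimal sketch of the threshold definition above, the following Python function finds the 75% correct point by linear interpolation between the two tested shifts that bracket it. The function name `dl_75` and the percent correct values are hypothetical and are not data from this study.

```python
def dl_75(shifts_hz, percent_correct, criterion=75.0):
    """Interpolate the shift (Hz) at which percent correct reaches `criterion`.

    shifts_hz must be in increasing order. Returns None when no pair of
    adjacent tested shifts brackets the criterion, that is, when the
    function never reaches it or already exceeds it at the smallest shift.
    """
    points = list(zip(shifts_hz, percent_correct))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 < criterion <= y1:
            return x0 + (criterion - y0) * (x1 - x0) / (y1 - y0)
    return None


# Hypothetical psychometric function sampled in 3-Hz steps.
shifts = [3, 6, 9, 12, 15]
scores = [52, 61, 70, 85, 91]
print(dl_75(shifts, scores))  # 10.0
```

A return value of None corresponds to the cases reported in the tables only as bounds, such as "< 5" or "> 15".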
The results of the first session were compared to those of
the second session using a t test in order to determine
whether there was any practice effect.
2. Results
Figure 2 shows the psychometric functions for the five
sentences. Each point in the figure represents the mean of 8
judgments for all 20 subjects (the mean of 160 judgments).
The five sentences and their respective DLs are shown in
Table II. The results of a 2 × 5 factorial analysis of variance
indicated that there was no significant effect of direction of
change. That is, there were no differences between the upward
and downward shifts of Fo. There were, however, significant
differences among sentences, F(4,76) = 10.38, p < 0.01.
The possibility of a practice effect was examined by
comparing subjects' percent correct scores of the first experi-
mental session with those of the second experimental ses-
sion. A t test for correlated measures was performed which
indicated that there was no significant difference, t(19) < 1.
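
For readers unfamiliar with the t test for correlated measures, a minimal Python sketch of the statistic follows. It uses only the standard library. The function name `paired_t` and the scores are hypothetical; with 20 listeners the degrees of freedom would be 19, as in the value reported above.

```python
import math
import statistics


def paired_t(first, second):
    """Return (t, degrees of freedom) for a correlated-measures t test."""
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Hypothetical percent correct scores for four listeners in two sessions.
session1 = [70, 72, 68, 75]
session2 = [71, 74, 71, 75]
t, df = paired_t(session1, session2)
print(f"t({df}) = {t:.2f}")
```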
B. Experiment II
A possible factor that resulted in DL differences among
sentences in experiment I might be the location of the end-
points of Fo changes in the sentence. That is, the beginning or
end of the Fo change occurring on a continuously voiced
portion may provide an extra cue, resulting in a smaller DL
than if the beginning or end abutted a devoiced portion.
Therefore, in this experiment, the beginning and the end of
the change occurred nominally during the silent portion of a
stop consonant in the sentence.
1. Method
a. Stimuli. The stimuli used in experiment II consisted
of four sentences (see Table III) none of which was used in
experiment I. The four sentences were read and recorded by
two speakers, MH and PB. MH was also the speaker in ex-
periment I. These recordings were subjected to exactly the
same analysis and synthesis procedures used in experiment I.
All four sentences were between 2 and 3 s in duration, and
the beginning and the end of the Fo change occurred nomi-
nally during the closure portion of a stop consonant inside
each sentence. The Fo change varied in duration depending
on the sentence, as in experiment I. By controlling the starting
and ending point of the changes, it was hoped that any
onset or offset cue (i.e., an abrupt change in continuous Fo
movement) would be eliminated. An example of a sentence
and the location of the Fo change are shown in Fig. 3.

FIG. 2. Psychometric functions for the up and down changes for each sentence
in experiment I. [Plot: percent correct against frequency difference in Hz, from -20 to 20.]

FIG. 3. Fundamental frequency plot for a sentence "The water was buoyant
and cold" in experiment II. Vertical lines indicate region of change. [Plot: horizontal axis is time (ms).]

b. Procedure. In this experiment, 3-Hz steps were used
because in one sentence of experiment I the DL was less than
5 Hz. There were 5 changes made, so that the greatest differ-
ence between stimuli was 15 Hz. In order to simplify the
procedure, no "down" condition was used, because the re-
sults of experiment I indicated that direction of change was
not a significant determinant of the DLs obtained.
Thirty-eight paid high school seniors listened to 192
pairs of sentences. Nineteen subjects heard speaker MH and
19 subjects heard speaker PB. Each possible pair was pre-
sented eight times in a counterbalanced random order.
Psychometric functions for both speakers were plotted
and the DLs were determined. The average percent correct
scores were analyzed using a 2 × 4 mixed design analysis of
variance to assess the effects of speakers and sentences.
2. Results
The psychometric functions for each speaker are shown
in Fig. 4. The most notable finding in comparing these two
sets of functions is the average percent correct for the two
speakers. In fact, retaining our operational definition of DL,
the curves for speaker PB do not reach threshold with the
exception of sentence 4. The DLs for speaker MH ranged
from 10.5 Hz to over 15 Hz (8% to 11% of the average Fo for
speaker MH). The mean of the DLs for MH was larger than
that found in experiment I.
The results of the analysis of variance indicated a signifi-
cant difference in the listeners' performance as a function of
speaker, F(1,36) = 5.67, p < 0.05. There was no significant
difference between sentences and no significant interaction
effect. The sentences and the obtained DLs for both speakers
are shown in Table III. The lack of significant differences
between sentences was probably due to the control exercised
in the creation of the stimuli used in this experiment.
FIG. 4. Psychometric functions for sentences of experiment II. Functions
are shown for results of both speakers. [Plots, one panel each for speakers MH and PB: percent correct against frequency difference in Hz, from 3 to 15.]

TABLE III. Sentences used in experiment II and their respective DLs for
both speakers. The portion whose Fo was changed is in italics.

                                                    DL in Hz
Sentence                                            MH       PB
(1) Then in the quiet water I turned and floated.   > 15     > 15
(2) Whence came this extraordinary people.          10.5     > 15
(3) He turned back to his map in irritation.        11.25    > 15
(4) The water was buoyant and cold.                 12.75    12

C. Discussion of experiments I and II
The most significant finding in experiments I and II was
the magnitude of the DLs. The range of DLs for the five
sentences in experiment I was from 5 to 16 Hz and was even
higher (10 to 15 Hz for MH and higher than 15 Hz for PB)
in experiment II. That is to say, our subjects exhibited DLs
for natural sentences that were 20 times larger than those for
steady synthetic vowels with comparable Fo (Flanagan and
Saslow, 1958) and two to four times larger than those for
steady synthetic vowels with linearly descending Fo (Klatt,
1973). The most characteristic feature of natural speech is
that Fo never stays in a steady state or changes linearly.
Rather, it has small perturbations within a large range of
movements. Therefore, it is reasonable to assume that subjects
are less sensitive to Fo differences in naturally spoken
sentences than to linearly changing Fo in synthetic vowels. It
must be remembered, however, that the subjects used in our
experiments were untrained, in contrast to the subjects of
earlier studies.
Another interpretation for the large DLs in our study is
that the subjects were not told where in the sentence they
were supposed to compare the height of Fo in the pair of
stimuli and, therefore, they had no obvious place in the sen-
tence to anchor their judgment. The larger DLs of experi-
ment II suggest that this interpretation is plausible. That is,
the onset and offset of Fo manipulation in the sentence were
controlled in experiment II but not in experiment I. In other
words, some of the sentences used in experiment I contained
an extra cue for the location of onset or offset of Fo manipula-
tion, this being the discontinuity in Fo contour. This cue was
absent in all sentences in experiment II. The difference may
have made the DLs for speaker MH in experiment II higher
than those obtained in experiment I.
The speaker difference in experiment II was large and
statistically significant. This finding poses additional questions.
Is this difference due to the difference in acoustic signals
attributable to the speaker or to the analysis and synthesis
technique employed in the study? MH's voice had rather
monotonous pitch, while in PB's voice, pitch varied in a larg-
er range. At the same time, the LPC process is known to
create distortion, and the degree of distortion is speaker de-
pendent. In fact, MH's voice after the LPC process retained
its original quality better than PB's voice.
The experiments described in the following sections
were concerned with the questions raised in experiments I
and II. Experiment III (a) was designed to answer the ques-
tion of whether simpler stimuli would yield smaller DL val-
ues, and experiment III(b) addressed the question of
whether the speaker difference was reliable.
III. EXPERIMENTS WITH SHORTER SENTENCES
A. Experiment III
1. Experiment III(a)
a. Stimuli. In the current experiment, changes in Fo that
encompassed the entire sentence served as the stimuli. The
sentences were short and composed of three monosyllabic
words, that is "the," a noun as subject, and a verb (e.g., "the
train moved"). Four such simple sentences were read by one
speaker, MH. The procedure for the preparation of the stim-
uli was identical to the previous experiments. The Fo of the
entire sentence was changed in 3-Hz steps in the increasing
direction only. There were a total of five steps.
b. Procedure. Seventeen high school students participat-
ed in the experiments. Each subject judged 288 pairs of sen-
tences. The data were analyzed using an analysis of variance
to examine if there is a significant difference between sen-
tences.
c. Results. Figure 5 shows the psychometric functions of
the four sentences for speaker MH. An average DL of 4.8 Hz
was obtained. This value is far smaller than any DL obtained
in experiments I and II. Analysis of the results also revealed
no significant differences between the sentences.

FIG. 5. Psychometric functions for speaker MH in experiment III(a). [Plot: horizontal axis is frequency difference in Hz, from 3 to 15.]

2. Experiment III(b)
a. Stimuli. Since there was no statistically significant
difference in the DLs of the four sentences, one of the four,
"the boy talked," was used in this experiment. Three new
male speakers were used. The experimental procedure was
identical to experiment III(a) except that the Fo was
changed in 2-Hz steps. The maximum range of Fo change
was 10 Hz.
A stimulus tape consisting of pairs of sentences was
created for each of the three speakers. Each possible pair
appeared 24 times on each tape. Each tape contained 144
pairs of the sentence.
b. Procedure. Nineteen college students were tested over
three sessions during each of which they heard the tape of
one speaker. The order of administration of the tapes was
counterbalanced so that each third of the listeners heard a
different order of speakers. The experiment was conducted
over a 3-week period. The listeners heard one speaker each
week.
The DLs were analyzed using an analysis of variance to
examine the hypothesis that there is a significant difference
in DL between speakers. The psychometric function for each
speaker was plotted and the DLs were determined from
those functions.
c. Results. Figure 6 shows the psychometric functions
for the three speakers. Each point in the figure represents the
mean of 24 judgments for all 19 listeners (the mean of 456
judgments). It is clear from inspection of the figure that the
performance of the listeners varied as a function of speakers.
The difference in DL between speakers was statistically sig-
nificant, F(2,54) = 9.84, p < 0.01. Individual comparisons
between the mean DLs for the three speakers showed that
the overall significance was due to the difference between
speaker JW and the other two speakers. The DL for speaker
JW was 3.25 Hz, for RC the DL was 7.0 Hz, and for DM it
was greater than 10 Hz.

FIG. 6. Psychometric functions for the three speakers in experiment III(b). [Plot: one curve each for speakers JW, RC, and DM; horizontal axis is frequency difference in Hz, from 0 to 10.]

The results of this experiment show a large difference in
listeners' ability to judge fundamental frequency changes in
a sentence as a function of the speaker. It is interesting to
note that if the data from experiment III(a) were included,
the findings would be even more intriguing. In experiment
III(a), in which the same sentence was one of the stimuli
used, the DL for speaker MH was 4 Hz for that sentence.
Examination of the results for all speakers showed fairly
acute frequency discrimination for two of the speakers (JW,
3.25 Hz and MH, 4.0 Hz). The other two showed substantially
poorer discrimination (7.0 Hz for RC and greater than
10 Hz for DM). The question that arises is: What is there in
the speech of a particular speaker that makes it easy or difficult
for a listener to detect Fo changes? It is possible that
there is some acoustic characteristic that is common to the
speech of MH and JW that is not present in the speech of the
other two speakers or vice versa. Further investigation is
required to determine what this characteristic might be.
B. Experiment IV
The fourth experiment was designed to determine the
effect of the computer processing system on the DL for the
speakers used in experiment III. The stimuli used in experi-
ments III (a) and (b) were further processed using an algo-
rithm for spectral correction of LPC processed speech devel-
oped by Malah (1981, 1982). The testing procedure was
identical to that described above. The two experiments were
compared in order to see if there were significant DL differ-
ences as a function of the processing method. Nineteen paid
volunteers listened to these stimuli. These listeners had not
participated in any of the previous experiments.
a. Results. The mean DL values for each speaker were
compared across experiments III and IV. The t tests were
performed in order to determine if any significant change in
DLs occurred as a function of the processing method. The
only significant difference was for speaker RC. For his
speech, with the spectral correction method, there was a
considerable decrease in DL, t(35) = 2.01, p < 0.05. For the
spectrally corrected speech of experiment IV, the DL for
speaker DM was significantly higher than for either of the
other two speakers. (MH's results could not be included in
the across speaker statistical comparisons, because the stim-
uli of his speech were changed in 3-Hz steps, and that of the
other three speakers in 2-Hz steps.)
C. Discussion of experiments III and IV
The results obtained from experiments III and IV are as
follows: (1) A significant speaker difference in Fo DL was
observed; and (2) the spectral correction of LPC processed
speech (Malah, 1981, 1982) decreased the DL for only one
of our four speakers.
Malah's technique of restoring the spectral envelope of
LPC-processed speech (i.e., restoring higher frequency
components) brings back the quality of the original voice to
an extent, but not to a satisfactory degree. It is premature to
conclude that the decrease in DL values in RC's speech, in
the spectrally corrected version, is the consequence of the
restoration of higher frequency components (this conclu-
sion would of necessity imply that higher frequency compo-
nents play an active role in the judgment of Fo). Since there is
no convenient technique to manipulate Fo contour of spoken
sentences other than the LPC vocoder, at present the question
concerning the influence of voice quality on the DL
judgments remains unanswered.
Large differences in DL values among speakers may
also be attributed to the difference in acoustic characteristics
of their speech. The Fo contours display large differences
among our speakers. All of them display a rise immediately
after /b/ in "boy" until near the end of the vowel, and a fall
through the word "talked." Table IV summarizes the average
Fo, the amount and the ratio of the rise and the fall,
together with the length of the utterance (excluding /kt/ at
the end of the sentence), and the Fo range for each speaker.
The rise for any speaker is far less steep than the fall. However,
the rise is nearly twice (for MH) to more than six times
(for DM) steeper than the steep Fo change (32 Hz/s) for a
synthetic vowel in Klatt's (1973) study. The surprising fact
is that the DLs of our two speakers (MH and JW) with this
short simple sentence exhibit values as small as those in
Klatt's study.

TABLE IV. Fundamental frequency characteristics of the four talkers in the utterance "The boy talked."

                  Vowel in BOY                                         Vowel in TALKED
Talker  Average   Lowest Fo   Highest Fo   Amount of   Duration   Highest Fo   Lowest Fo   Amount of   Duration   Total Fo
        Fo (Hz)   (Hz)        (Hz)         rise (Hz)   (s)        (Hz)         (Hz)        fall (Hz)   (s)        range (Hz)
MH      126       126         134          8           0.14       142          115         27          0.16       27
JW      125       121         146          25          0.25       137          105         32          0.18       41
RC      126       129         143          14          0.17       141          102         39          0.24       41
DM      119       104         139          35          0.18       147          96          51          0.18       51

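
The slope comparison with Klatt's 32 Hz/s can be rechecked from the Table IV values. The Python sketch below assumes, as a simplification, that each rise or fall spans the full vowel duration listed in the table, so that rate is amount divided by duration. The dictionary layout and the function name `slopes` are illustrative.

```python
KLATT_STEEP_RATE_HZ_PER_S = 32.0  # steep rate of Fo change cited from Klatt (1973)

# talker: (rise in Hz, duration of vowel in "boy" in s,
#          fall in Hz, duration of vowel in "talked" in s), from Table IV
table_iv = {
    "MH": (8, 0.14, 27, 0.16),
    "JW": (25, 0.25, 32, 0.18),
    "RC": (14, 0.17, 39, 0.24),
    "DM": (35, 0.18, 51, 0.18),
}


def slopes(talker):
    """Return (rise rate, fall rate) in Hz/s, assuming the change spans the vowel."""
    rise, rise_dur, fall, fall_dur = table_iv[talker]
    return rise / rise_dur, fall / fall_dur


for talker in table_iv:
    rise_rate, fall_rate = slopes(talker)
    ratio = rise_rate / KLATT_STEEP_RATE_HZ_PER_S
    print(f"{talker}: rise {rise_rate:.0f} Hz/s ({ratio:.1f} x 32 Hz/s), "
          f"fall {fall_rate:.0f} Hz/s")
```

Under this assumption the rise for MH comes out at about 1.8 times 32 Hz/s and the rise for DM at just over six times, and every talker's fall is steeper than his rise.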
The hypothesis of 't Hart (1981), that the size of the DL
is relative to the average fundamental frequency, might be
true if the ratio of the average fundamental frequencies is as
large as or larger than 2 to 1. However, we have to reject the
hypothesis when the difference in the average fundamental
frequencies between speakers is small. Our result shows that
the speaker with the lowest average Fo demonstrated the
largest DL, and that the DL value varies with the same aver-
age Fo. Since Fo changes its direction and amount constantly
in the sentence stimulus, it is impossible to tell if linguistic
information helps listeners' judgments, or if listeners tried to
anchor their judgments at the point where Fo change be-
comes minimal (for example, at the point where Fo changes
from rise to fall). In other words, no simple comparison
between the DLs of uniform stimuli, such as those in Klatt's
study, and DLs of changing stimuli, as in the present study,
can be made.
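
The argument about average Fo can be made concrete with the values reported for the sentence "The boy talked." The Python sketch below pairs each talker's average Fo from Table IV with his DL from experiment III and expresses the DL as a percentage of average Fo. DM's DL was reported only as greater than 10 Hz, so 10.0 is used as a lower bound; the names `speakers` and `relative_dl` are illustrative.

```python
# talker: (average Fo in Hz from Table IV, DL in Hz from experiment III)
# DM's DL was reported only as "greater than 10 Hz", so 10.0 is a lower bound.
speakers = {
    "JW": (125, 3.25),
    "MH": (126, 4.0),
    "RC": (126, 7.0),
    "DM": (119, 10.0),
}


def relative_dl(talker):
    """Return the DL as a percentage of the talker's average Fo."""
    avg_f0, dl = speakers[talker]
    return 100 * dl / avg_f0


for talker, (avg_f0, dl) in speakers.items():
    print(f"{talker}: average Fo {avg_f0} Hz, DL {dl} Hz, "
          f"DL/Fo = {relative_dl(talker):.1f}%")
```

Average Fo differs by at most 7 Hz across the four talkers, while the DL as a share of average Fo ranges from under 3% to more than 8%.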
In any case, it seems reasonable to assume that, for sentence-level
stimuli, the ease of Fo change judgments is inversely
associated with the amount (or steepness) of Fo perturbation
within the stimulus. Examination of Table IV
shows that this is true in most cases. One exception is that
MH and JW exhibit very similar DLs, though MH's Fo per-
turbation is smaller than JW's. This may be explained by two
facts. First, JW is the slowest speaker and MH is a rather fast
speaker. Listeners may have had more time in judging JW's
reading than MH's. (If this assumption is true, DM's reading
contains a twofold difficulty: the Fo change is the largest
and the utterance is the shortest.) Second, MH's DLs
may, in fact, have been smaller. His utterances were processed
in 3-Hz steps and many of them received DLs less
than 3 Hz.
Results of Fo DL experiments for sentence stimuli can-
not be directly compared with those obtained from experi-
ments with simpler stimuli. However, it is interesting to note
that some of our results--with the simplest stimuli (i.e.,
three-word sentences) by some speakers whose Fo excursion
was relatively small--yielded DLs comparable with Klatt's
study, which used a synthetic vowel with steep Fo move-
ments.
Little can be said about how linguistic information plays
a role in DL judgments. For example, we assumed that Fo of
stressed syllables, at the sentence level, may have received
more attention by listeners than unimportant syllables. But
more complex stimuli than those used in our experiments
will be required to test this assumption.
ACKNOWLEDGMENT
This work was carried out at Bell Laboratories, Murray
Hill, NJ, with the assistance of Ann-Marie Quinn.
Atal, B. S., and Hanauer, S. L. (1971). "Speech analysis and synthesis by
linear prediction of the speech wave," J. Acoust. Soc. Am. 50, 637-655.
Flanagan, J. L., and Saslow, M. G. (1958). "Pitch discrimination for syn-
thetic vowels," J. Acoust. Soc. Am. 30, 435-442.
Gold, B., and Rabiner, L. R. (1969). "Parallel processing techniques for
estimating pitch periods of speech in the time domain," J. Acoust. Soc.
Am. 46, 442-448.
Harris, J. D. (1952). "Pitch discrimination," J. Acoust. Soc. Am. 24, 750-
755.
Klatt, D. H. (1973). "Discrimination of fundamental frequency contours
in synthetic speech: implications for models of speech perception," J.
Acoust. Soc. Am. 53, 8-16.
Malah, D. (1981). "Efficient spectral matching of the LPC residual sig-
nal," Proc. IEEE ICASSP 3, 1288-1291.
Malah, D. (1982). "Cepstral residual vocoder for improved quality trans-
mission at 4-8 K bits," Proc. IEEE ICASSP 1, 622-625.
Mermelstein, P. (1978). "Difference limens for formant frequencies of
steady-state and consonant-bound vowels," J. Acoust. Soc. Am. 63, 572-
580.
Rosenblith, W. A., and Stevens, K. N. (1953). "On the DL for frequency,"
J. Acoust. Soc. Am. 25, 980-985.
't Hart, J. (1974). "Discriminability of the size of pitch movements in
speech," IPO Ann. Prog. Rep. 9, 56-63.
't Hart, J. (1981). "Differential sensitivity to pitch distance, particularly in
speech," J. Acoust. Soc. Am. 69, 811-821.