Umeå University, Department of Philosophy and Linguistics
PHONUM 9 (2003), 157-160
Available online at http://www.ling.umu.se/fonetik2003/

Individual use of acoustic parameters in read and spontaneous speech
Snefrid Holm
Department of Language and Communication Studies, Norwegian University of Science and
Technology

This study investigates acoustic differences between read and spontaneous
speech. The parameters investigated are mean f0, f0-range, intensity-range
and formants. Two slightly different types of speech material both show that
speakers use different parameters when changing from read to spontaneous
speech. Speakers also use the same parameters in different ways.

1. Introduction

The manner in which we speak varies constantly as a result of many factors, for instance the
degree of formality of the situation, the speaker's knowledge about the listener(s) and the
speaker's wish to make a particular impression upon the listener(s) (Eskénazi, 1993). These
different manners of speaking are called speaking styles.
The two speaking styles which have been investigated most often are read and
spontaneous speech. Read and spontaneous speech are often perceptually very distinct from
each other and there must therefore be measurable differences between them. Parameters that
have been shown to vary between these two speaking styles are of syntactic, semantic,
acoustic and paralinguistic nature. The aim of this study is to investigate acoustic differences
between read and spontaneous speech.

2. Experimental procedure

2.1. Speech material


Two different types of speech material have been investigated. Part A consisted of
recordings of spoken telephone-numbers. There were forty-one speakers older than eight
years of age and of both sexes. The speakers were from the northern, western and eastern
parts of Norway. The speech material had been collected by the Norwegian Telecom (Amdal
& Ljøen, 1995) and was used by permission. Each speaker telephoned an operator and first
read a list of twelve telephone numbers (lists had been sent to the speakers beforehand), then
pretended to use an automatic telephone service in order to be transferred to two numbers
that he knew by heart. This kind of spontaneous speech can thus be sub-labeled memorized
spontaneous speech. Between ten and twelve read utterances as well as two spontaneous
utterances were recorded from each of the forty-one speakers. Files were digitized with a
sampling rate of 8 kHz and a bandwidth of 300-3400 Hz.
Part B consisted of recordings of two dialogs. Each dialog was a discussion about a pizza
menu between two people who knew each other. The spontaneous speech in Part B could be
sub-labeled conversational spontaneous speech. The two dialogs were orthographically
transcribed and each of the participants later returned to read his own lines. Seven read and
seven spontaneous utterances were obtained from each of the four speakers. Files were
digitized with a sampling rate of 44.1 kHz.

2.2. Parameters and method


The parameters investigated were f0-mean, f0-range, intensity-range and formants (F1 and
F2). The program Praat (Boersma & Weenink, 2002) was used for measurements of f0 and
intensity. The program Signalyze (Keller, 1994) was used for measuring formants.
In order to investigate f0-range, the pitch-curve of each utterance was extracted and the
smallest f0-value was subtracted from the largest f0-value. f0-mean is the mean value for an
entire utterance (f0-mean and f0-range were measured in semitones). Intensity-range (in dB)
was measured by displaying the intensity-curve of each utterance. Then the intensity at the
midpoint of each vocalic syllable-peak was measured and the smallest intensity-value was
subtracted from the greatest intensity-value. Formants (F1 and F2 in Hz) were measured in
selected vowels. The formants were measured at the midpoint of the target of the vowel as
judged from the formant transitions in a spectrogram.
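
The range and mean measures described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure actually run in Praat: the function names, the 100 Hz semitone reference and the convention that a value of 0 marks an unvoiced frame are all hypothetical. The paper also does not say whether f0-mean was averaged before or after conversion to semitones; the sketch averages the semitone values.

```python
import math

REF_HZ = 100.0  # assumed semitone reference; the paper does not state one


def hz_to_semitones(f_hz, ref_hz=REF_HZ):
    """Convert a frequency in Hz to semitones relative to ref_hz."""
    return 12.0 * math.log2(f_hz / ref_hz)


def voiced_frames(f0_track_hz):
    """Keep only voiced frames (assumption: 0 marks an unvoiced frame)."""
    return [f for f in f0_track_hz if f > 0]


def f0_range_semitones(f0_track_hz):
    """Largest f0 value minus smallest f0 value of one utterance, in semitones."""
    voiced = voiced_frames(f0_track_hz)
    return hz_to_semitones(max(voiced)) - hz_to_semitones(min(voiced))


def f0_mean_semitones(f0_track_hz):
    """Mean f0 of one utterance, averaged over the semitone values."""
    voiced = voiced_frames(f0_track_hz)
    return sum(hz_to_semitones(f) for f in voiced) / len(voiced)


def intensity_range_db(syllable_peak_db):
    """Greatest minus smallest intensity measured at the syllable-peak midpoints."""
    return max(syllable_peak_db) - min(syllable_peak_db)
```

Note that the f0-range in semitones does not depend on the chosen reference, since the reference cancels out in the subtraction; only f0-mean is affected by it.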

3. Results

3.1. Across speakers


The acoustic parameters were compared between the read and spontaneous speech. The
results are shown in Table 1. The f0-mean was significantly greater in spontaneous speech.
This effect was present only in Part B and only for the group of speakers as a whole and for
the subgroup women, not for the men separately. f0-range was greatest in spontaneous
speech, but only for the subgroup women in Part A as well as for the group of speakers as a
whole in Part B. Intensity-range was greatest in spontaneous speech for the group of
speakers as a whole in Part A, but not for women and men separately. As to formants,
there were no significant effects at all.
Thus effects were observed only for some parameters and for some subcategories of
speakers. This was true for both Part A and Part B. Moreover the effects were not consistent
across the two types of material.

Table 1. Effects of speaking style on the parameters investigated in both Part A and Part B.
t-tests for correlated samples. > parameter greatest in spontaneous speech, < parameter greatest
in read speech. * p ≤ 0.05, ** p ≤ 0.01 and *** p ≤ 0.001. Empty space = no significant
difference.

                   Part A                     Part B
Speakers           All      Women    Men      All      Women    Men
f0-mean                                       > ***    > ***
f0-range                    > *               > *
Intensity-range    > *
Formants
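
The t-test for correlated samples named in the caption of Table 1 can be illustrated with a minimal sketch. It assumes that the paired observations are one read-speech value and one spontaneous-speech value per speaker, which the paper does not spell out; the function name `paired_t` is hypothetical. The sketch returns only the t statistic and the degrees of freedom, and the p-value would then be looked up in a t-distribution table or computed with a statistics package.

```python
import math
from statistics import mean, stdev


def paired_t(read_values, spont_values):
    """Return (t, degrees_of_freedom) for a t-test for correlated samples.

    Each position holds the two values of one speaker (assumed pairing).
    A positive t means larger values in spontaneous speech.
    """
    if len(read_values) != len(spont_values):
        raise ValueError("both lists need one value per speaker")
    diffs = [s - r for r, s in zip(read_values, spont_values)]
    n = len(diffs)
    if n < 2:
        raise ValueError("at least two speakers are required")
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1
```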
3.2. Within speakers

For each speaker and for each parameter there is a mean in read speech (MR) and a mean in
spontaneous speech (MS). If a speaker's MS is greater than his MR, then he has a tendency
towards greater values in spontaneous speech. However, he may still have some read utterances
that go against this tendency. If this speaker has no read utterances with a greater value than
his MS, then he is perfectly consistent in his tendency towards greater values in spontaneous
speech. If the speaker has two read utterances (out of ten or twelve) with a greater value than
his MS, then he is not perfectly consistent, but he still has a high degree of consistency.
Strong consistency must mean that the speaker uses the parameter in question as part of
his strategy for changing between read and spontaneous speech. It is convenient to
differentiate between consistent and non-consistent speakers. For this purpose I set an
arbitrary boundary for consistency at two exceptions from the speaker’s own tendency. This
means that speakers with no more than two utterances against their own tendencies are
regarded as consistent. With this criterion for consistency, almost every speaker (Part A:
forty out of forty-one; Part B: all four speakers) was consistent and thus had a strategy
regarding at least one of the investigated parameters (except for formants).
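
The consistency criterion can be written out as a short sketch for one speaker and one parameter. The function names are hypothetical. The text only describes the case where MS is greater than MR, so the mirrored rule for speakers whose MR is greater (counting read utterances that fall below MS) is an assumption, as is the handling of the case where MS equals MR.

```python
from statistics import mean

MAX_EXCEPTIONS = 2  # the arbitrary boundary for consistency set in the text


def count_exceptions(read_values, spont_values):
    """Return (tendency, exceptions) for one speaker and one parameter.

    tendency is "spontaneous" if MS > MR, otherwise "read".
    """
    mr = mean(read_values)   # MR: mean in read speech
    ms = mean(spont_values)  # MS: mean in spontaneous speech
    if ms > mr:
        # As in the text: a read utterance above MS goes against the tendency.
        return "spontaneous", sum(1 for v in read_values if v > ms)
    # Assumed mirror case: a read utterance below MS goes against the tendency.
    return "read", sum(1 for v in read_values if v < ms)


def is_consistent(read_values, spont_values):
    """True if the speaker has no more than MAX_EXCEPTIONS exceptions."""
    _, exceptions = count_exceptions(read_values, spont_values)
    return exceptions <= MAX_EXCEPTIONS
```

With ten read utterances and two spontaneous utterances, as in Part A, a speaker with one read utterance above his MS would be counted as consistent with a tendency towards greater values in spontaneous speech.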
Figure 1 shows the individual strategies regarding the parameter f0-mean in Part A. The
pattern in this figure is highly representative of all parameters and of the speech material in
both Part A and Part B.

[Figure 1: bar chart. Horizontal axis: number of exceptions from tendency (6, 5, 4, 3, 2, 1 and
0 excep.). Vertical axis: scale from 0 to 1.8. Numbers in bars, in extracted order:
2 1 3 4 3 1 1 3 4 9 3 6 1.]

Figure 1. Individual strategies regarding f0-mean in Part A. Shaded bars = MR greatest.
Empty bars = MS greatest. Numbers in bars show number of speakers with the given amount
of exceptions from their own tendencies.

Figure 1 shows that consistent speakers (bars to the right) have the greatest parameter value
in either read or spontaneous speech. The direction of the tendency thus seems to be
unimportant. The consistent speakers also have large differences between the speaking styles
compared to the non-consistent speakers.
4. Discussion and conclusion

This study has shown that the speakers use different strategies when changing between read
and spontaneous speech. Individual strategies imply that a) different speakers use different
parameters and b) different speakers use the same parameters in different ways when
changing between the speaking styles.
It was also found that consistent speakers had large differences between their MS and
MR. Inconsistency will necessarily yield a small difference between the styles, but it is not a
mathematical necessity that consistency will yield a large difference between the styles.
Consistent speakers could have had either large or small differences between read and
spontaneous speech. That the differences between the speaking styles are large for the
consistent speakers perhaps means that the parameters are used in such a way that the
difference between read and spontaneous speech becomes perceptually distinct.
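
The point that consistency does not by itself imply a large difference can be illustrated with invented numbers. The two speakers below are hypothetical and are not taken from the recordings; both have MS greater than MR and no read utterance above their MS, yet their style differences differ by an order of magnitude.

```python
from statistics import mean


def style_difference(read_values, spont_values):
    """Return (MS - MR, exceptions) for a speaker whose MS is greater than MR."""
    ms = mean(spont_values)
    mr = mean(read_values)
    exceptions = sum(1 for v in read_values if v > ms)
    return ms - mr, exceptions


# Hypothetical values (for example f0-mean in semitones), invented for illustration.
read_utterances = [10.0, 10.1, 9.9, 10.0]
small_diff, small_exc = style_difference(read_utterances, [10.3, 10.3])
large_diff, large_exc = style_difference(read_utterances, [14.0, 14.2])

print(round(small_diff, 2), small_exc)  # perfectly consistent, small difference
print(round(large_diff, 2), large_exc)  # perfectly consistent, large difference
```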
Although the parameters investigated in this study may be used individually, many other
parameters are of course used in a more general manner. Through informal listening to the
recordings, I got the impression that segments have a much greater tendency to be shortened
as well as lengthened in spontaneous speech as compared to read speech.
Acoustic studies of read versus spontaneous speech have been conducted by many
researchers, but different studies tend to yield different conclusions. The confusing results
from previous studies may to some extent be due to a) the use of very few speakers, b) the
variation within the category of spontaneous speech and c) the supposition that the speakers
as a group behave in the same way. I know of only one other study that supports the theory
of individual strategies (Eskénazi, 1992). That study used six speakers.
None of the speech material used in this study was collected in order to investigate the
speakers individually. To further test the theory of individual strategies, it is necessary to
obtain a large amount of data in both speaking styles for a large number of speakers.

5. References

Amdal, I. & Ljøen, H. (1995) TABU.0 - en norsk telefontaledatabase. Scientific Report, the
Norwegian Telecom.
Boersma, P. & Weenink, D. (2002) Praat - a system for doing phonetics by computer.
Eskénazi, M. (1992) Changing speech styles: strategies in read speech and casual and careful
spontaneous speech, Proceedings ICSLP 1992, 1, 755-758.
Eskénazi, M. (1993) Trends in speaking styles research, Proceedings Eurospeech 1993, 1,
501-505.
Keller, E. (1994) Signalyze - signal analysis for speech and sound.
