Acoust. Sci. & Tech. 43, 3 (2022) ©2022 The Acoustical Society of Japan
Acoustic expression of emotions in vocal performance: Vibrato variability in emotional singing styles
Liu JieYing, Toru Kamekawa and Atsushi Marui
Graduate School of Music, Tokyo University of the Arts, 1–25–1 Senju, Adachi-ku, Tokyo, 120–0034 Japan
(Received 5 October 2021, Accepted for publication 22 January 2022)
Keywords: Singing, Opera, Emotional feature, Vibrato, Spectral centroid
e-mail: liu miriam@yahoo.com.tw
[doi:10.1250/ast.43.201]

1. Introduction

Opera singers adopt different singing styles to express a wide array of emotions and display acoustic features while the audience judges their emotive capabilities. Despite significant breakthroughs in AI singing synthesis technology, the encoding of emotional expression remains a topic that needs further discussion. In famous opera arias, the emotions in singing are usually closely related to the lyrics. However, a number of researchers have reported that emotions can be judged from a single note [1]. Furthermore, some studies have shown that singers can convey emotions such as “Anger, Fear, Joy, and Sadness” by virtue of differences in vibrato [2].

Vibrato has been studied in detail since Seashore [3] conducted his first observations, including the rate of the vocal modulations (number of regular pulsations in pitch per second) and their extent (half the maximum-to-minimum fundamental frequency fluctuation in vibrato). His study showed that vibrato was a rapid (4 to 7 Hz) modulation of the pitch, found in all opera singers to varying degrees [4]. Another study by Prame [5] found that the average rate of frequency modulation (FM) of vibrato in singing was 6.5 Hz, and that the vibrato extent ranged from 34 to 123 cents.

Although some studies have measured the rate and extent of vibrato, scientific understanding of the initiation time of vibrato (the delay from the onset of a stable note to the onset of vibrato) has been fairly rare. More data are still required to verify the relationships among the initiation time of vibrato, the rate and extent of vibrato in singing, and emotional expression. Moreover, what constitutes a “good” vibrato is still ambiguous to some extent. For example, according to some studies, “natural” use of vibrato should fill the whole note [6], while some researchers argue that vibrato oscillation should be “lower than” the whole note [7].

Sundberg [8] investigated the relationships between vibrato and pitch or volume; the relationships between emotions and vibrato seem worthy of more thorough analysis. A reasonable hypothesis would be that emotional involvement influences vibrato. The purpose of this study is to analyse and compare the vibrato of each emotion using different emotional expressions, to verify whether the feature quantities of vibrato differ when the singer expresses different emotions, and to examine how a singer’s emotional interpretation affects acoustic parameters in the singing vocalizations. Are the resulting acoustic patterns of vibrato enough to reflect the emotions of a singer?

For this purpose, we collected notes sung in twenty-four emotional singing styles (the twenty emotions used in Saito and Nakamura [9], plus four types of “Neutral” emotions). Consulting Robinson [10], we divided these twenty-four emotions into three groups: “Neutral (4),” “Positive (9),” and “Negative (11)” (Table 1).

Regarding the selection of the acoustic features of note duration, vibrato onset delay, rate and extent, we referred to the research of Johnson-Read [11]. We also observed changes in the spectral centroid during the analysis, and therefore added it as an acoustic feature.

2. Method

Design: There are two design methods for investigating hypotheses. One is to recruit a singer or singers and request them to perform in a specified manner, while the other is to use existing, respected commercial recordings of performances. We decided to recruit a singer for the recording of the experiment, in order to examine the performance methods and effects of different emotions with the same vocal cords and the same pitch, because interference factors such as the years of practice and voice conditions of various singers may reduce the control over the performance conditions of different emotions. As described in the Introduction, we investigated a total of twenty-four single continuous notes and conducted an acoustic analysis on them. There were four types of neutral notes, nine different types of positive notes, and eleven different types of negative notes.

Stimuli: Sound stimuli were recorded in Studio B on the Senju campus (Tokyo University of the Arts). ProTools HD 8 was used for recording/editing. The recording microphone was a unidirectional Neumann U87Ai, and the signal was recorded in mono at 48 kHz/24 bit.

One recruited singer (a soprano from the vocal department of the university, with twenty years of singing experience) expressed complex emotions. After one of the authors showed the singer twenty-four emotional words (Table 1), the singer expressed the twenty-four emotions by singing a C major scale (one octave, C4 to C5) on the syllable /la/ for about ten seconds. Before each performance, the singer listened to the pitch of the piano to keep in tune. Then, the singing rhythm was adjusted according to a metronome at a tempo of 92 bpm. Finally,
after recording each emotion several times, the singer selected the most appropriate version of each emotion by listening and comparing. When singing the major scale, the singer could still adjust her performance in the first few notes, while the eighth note was the most stable and the most emotionally expressive. This note was therefore selected for investigating the acoustic features, so that the selected notes for the twenty-four kinds of emotions were in the same position and had comparable intonation.

Procedures: For acoustic measurement, the last note at the end of the continuous scale was analysed. The vibrato was extracted using the spectrogram in PRAAT [12], with a sampling rate of 48 kHz and a Fast Fourier Transform (FFT) window size of 2,048 samples. If the difference between adjacent peaks in pitch was larger than a predefined threshold (set to 6 Hz), the position was determined to exhibit vibrato. The vibrato extent formula of Zhang [13] was used for the calculation.

Figure 1 shows the F0 of “Anger” at the end note, where the positions of the note start, the first vibrato peak, and the last vibrato peak are illustrated. It also shows the vibrato interval, which is determined by the positions of the red spots. We analysed the characteristic quantities of the following seven acoustic features. Parentheses ( ) indicate units and square brackets [ ] indicate abbreviations.

Note Duration (ms) [DURATION]: From the beginning to the end of a note.

Vibrato Rate (Hz) [RATE]: The vibrato rate was measured by identifying each complete vibrato period, delimited by consecutive peaks; the total number of these periods was counted and then divided by their total duration, which produced a vibrato rate in hertz.

Vibrato Extent (cents) [EXTENT]: The vibrato extent was estimated by reading the difference between the frequencies of adjacent peaks in the continuous vibrato area. In Fig. 1, the positions of the red points are determined as the peaks (the mean values for vibrato rate and extent were measured from the first peak until the final peak of the vibrato cycle). In Eq. (1), $pk_j$ is the extent value of each peak point, and $J$ is the total number of peak values. FM vibrato extent is expressed in cents.

$$\mathrm{Extent}_{\mathrm{FM}} = \frac{1}{2}\,\frac{1}{J-1}\sum_{j=1}^{J-1} 1200\log_2\left|pk_{j+1} - pk_j\right| \qquad (1)$$

Vibrato Onset Delay (ms) [DELAY]: The delay was measured from the initiation of the note until the first conclusive peak of the vibrato cycle.

Fig. 1 Vibrato interval from the emotion “Anger.”

Mean Energy Intensity (dB) [INTENSITY]: Energy intensity listed in PRAAT is the RMS amplitude of the signal, which is related to (but different from) the perceived “loudness.” The mean value was measured from the beginning to the end of the note.

Mean Pitch (Hz) [PITCH]: The mean pitch value was measured from the beginning to the end of a note.

Spectral Centroid (Hz) [CENTROID]: It is calculated as the weighted mean of the frequencies present in the signal, where the weights are the normalized energy of each frequency component. It indicates the location of the centroid of the spectrum, which is related to the perceptual impression of brightness.

3. Results

Table 1 shows the acoustic features of the twenty-four emotional voices. Vibrato “RATE” had M = 5.45 Hz (SD = 0.75, n = 24) and “EXTENT” had M = 64.5 cents (SD = 30.90, n = 24); “RATE” ranged from 3.20 to 7.35 Hz, and “EXTENT” ranged from 20 to 151 cents. The spectral “CENTROID” (M = 1,566.37 Hz, SD = 288.41, n = 24) ranged from 923.6 to 2,110.9 Hz. The average “RATE” and “EXTENT” of vibrato are similar to those of previous studies [3–5]. Among them, the lower “RATE” and “EXTENT” values of “Serenity” (3.20 Hz, 20 cents) may be caused by the longer “DELAY” of vibrato (1,641 ms). As shown in Fig. 2, the vibrato of “Serenity” is unstable, and is irregular or non-existent.

The average note “DURATION” is 2,151 ms, and the average vibrato starting time (onset “DELAY”) is 595.5 ms. It is worth noting that the note “DURATION” of “Joy” is much longer (3,936 ms), and the corresponding vibrato starting time is 703 ms. The ratio of vibrato starting time to note duration for “Joy” is 703/3,936 = 0.179, which is lower than the average ratio of 0.269. This means that, relative to the note duration, the vibrato for “Joy” began earlier than average; it could also be interpreted as the singer having prolonged the note in the emotion of “Joy.”

The data analysis examines the importance of the seven acoustic features for the twenty-four emotions using a principal component analysis (PCA). The biplot of the first and second principal components (varimax rotation) is shown in Fig. 3. The black points represent the distribution of each emotion, the arrows represent the directions of the seven features, and the colour represents the degree of cos² (squared cosine, squared coordinates). The ellipses represent different groupings.

The first principal component has a variance of 2.37, explaining 33.9% (2.37/7) of the total variance. The second principal component has a variance of 1.95, explaining 27.9% (1.95/7) of the total variance. More than 90.8% of the variance is contained in the first four principal components.

The results show that “EXTENT-DELAY” is the first component. “EXTENT” points in the positive direction of the horizontal axis, whereas “DELAY” points in the negative direction. “INTENSITY” is the second component. “INTENSITY” and “CENTROID” were closely similar. The expression of “Joy, Sadness, Anger, Rage” can be observed to a more significant extent. The “INTENSITY” is
Table 1 One-note analyses of the singer’s duration, vibrato onset delay, rate and extent, intensity, pitch, and spectral centroid for twenty-four kinds of emotions, divided into three groups (neutral, positive, and negative) from top to bottom.

Group     Emotion          Duration (ms)  Delay (ms)  Rate (Hz)  Extent (cents)  Intensity (dB)  Pitch (Hz)  Centroid (Hz)
Neutral   Expressionless   1,237          672         5.81       28              69.5            526.58      1,496.1
          Serenity         1,941          1,641       3.20       20              71.9            530.26      1,130.7
          Calm             2,819          1,163       4.80       32              66.6            525.52      1,210.1
          Peace            2,005          191         5.37       33              70.2            522.54      1,546.0
Positive  Cheerfulness     2,368          1,439       5.68       34              74.8            521.14      1,741.9
          Lovesickness     2,219          1,254       4.95       32              61.8            524.88      1,211.3
          Adoration        2,283          849         5.43       62              61.8            514.04      1,382.9
          Thankfulness     2,325          763         5.05       52              72.9            532.16      1,598.3
          Enjoyment        2,475          917         5.15       42              71.5            516.96      1,648.2
          Jauntiness       2,901          1,341       5.26       57              74.7            526.1       1,713.9
          Palpitation      2,016          445         5.43       68              71.7            534.6       1,572.8
          Joy              3,936          703         5.00       77              72.7            530.6       1,820.4
          Passion          2,432          744         5.43       96              72.3            523.1       1,692.2
Negative  Envy             1,579          215         6.09       64              75.6            521.22      1,682.2
          Anger            1,536          114         4.58       60              75.6            532.26      2,078.6
          Annoyance        2,059          128         7.35       83              73.8            524.12      1,794.3
          Fear             1,579          79          6.25       151             67.0            537.18      1,287.9
          Rage             1,952          218         5.26       115             75.5            538.96      2,110.9
          Pity             1,397          70          5.55       75              55.5            527.98      923.6
          Apprehension     1,856          81          6.17       57              67.5            530.9       1,347.6
          Glumly           1,515          256         5.74       63              77.3            525.86      1,793.6
          Shame            1,739          411         6.32       45              62.7            511.04      1,290.5
          Terror           2,453          241         5.95       95              66.5            531.72      1,693.7
          Sadness          3,003          357         5.05       107             72.1            526.72      1,825.3
Mean of all voices         2,151.04       595.5       5.45       64.5            70.06           263.25      1,566.37
(SD)                       (595.99)       (474.98)    (0.75)     (30.90)         (5.31)          (3.32)      (288.41)
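
As a quick consistency check on Table 1, the following sketch recomputes the mean note duration, the mean onset delay, and the delay-to-duration ratios quoted in the Results from the tabulated values. The variable names are illustrative, and averaging the per-note ratios (rather than dividing the two column means) is an assumption about how the reported average ratio was obtained.

```python
from statistics import mean

# Values copied from Table 1, in table order (Neutral, Positive, Negative).
emotions = [
    "Expressionless", "Serenity", "Calm", "Peace",
    "Cheerfulness", "Lovesickness", "Adoration", "Thankfulness", "Enjoyment",
    "Jauntiness", "Palpitation", "Joy", "Passion",
    "Envy", "Anger", "Annoyance", "Fear", "Rage", "Pity",
    "Apprehension", "Glumly", "Shame", "Terror", "Sadness",
]
duration_ms = [
    1237, 1941, 2819, 2005,
    2368, 2219, 2283, 2325, 2475, 2901, 2016, 3936, 2432,
    1579, 1536, 2059, 1579, 1952, 1397, 1856, 1515, 1739, 2453, 3003,
]
delay_ms = [
    672, 1641, 1163, 191,
    1439, 1254, 849, 763, 917, 1341, 445, 703, 744,
    215, 114, 128, 79, 218, 70, 81, 256, 411, 241, 357,
]

# Column means as reported in the last rows of Table 1.
mean_duration = mean(duration_ms)
mean_delay = mean(delay_ms)

# Onset delay as a fraction of note duration, per emotion.
ratios = {e: d / n for e, d, n in zip(emotions, delay_ms, duration_ms)}
joy_ratio = ratios["Joy"]

# Assumption: the "average ratio" is the mean of the per-note ratios.
mean_ratio = mean(list(ratios.values()))

print(f"mean duration = {mean_duration:.2f} ms")
print(f"mean delay    = {mean_delay:.1f} ms")
print(f"Joy ratio     = {joy_ratio:.3f}")
print(f"mean ratio    = {mean_ratio:.3f}")
```

Under this reading, the vibrato of “Joy” starts later than average in absolute terms (703 ms against 595.5 ms) but earlier as a fraction of the note.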
Fig. 2 ‘‘Vibrato’’ of twenty-four kinds of emotions.
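
The vibrato measures defined in the Method section (RATE, EXTENT, and onset DELAY) can be sketched on an F0 contour as follows. This is a minimal illustration, not the authors' PRAAT procedure or the toolbox of Zhang [13]: the function names, the synthetic test contour, the use of both maxima and minima as "peaks", and the conversion of each peak-to-trough swing to cents via the frequency ratio are all assumptions made for the example. Only the 6 Hz threshold and the half-swing definition of extent come from the text.

```python
import math

def local_extrema(f0):
    """Indices of local maxima and minima of an F0 contour (Hz)."""
    idx = []
    for i in range(1, len(f0) - 1):
        is_max = f0[i - 1] < f0[i] >= f0[i + 1]
        is_min = f0[i - 1] > f0[i] <= f0[i + 1]
        if is_max or is_min:
            idx.append(i)
    return idx

def vibrato_features(f0, frame_rate, threshold_hz=6.0):
    """Return (rate_hz, extent_cents, delay_ms), or None if no vibrato.

    Assumes the note starts at frame 0 and that the vibrato region is
    contiguous (no gaps between the detected half-cycles).
    """
    ext = local_extrema(f0)
    # Adjacent extrema whose F0 difference exceeds the threshold count as vibrato.
    pairs = [(a, b) for a, b in zip(ext, ext[1:])
             if abs(f0[b] - f0[a]) > threshold_hz]
    if not pairs:
        return None
    first, last = pairs[0][0], pairs[-1][1]
    # Each pair is half a vibrato period.
    rate_hz = (len(pairs) / 2) / ((last - first) / frame_rate)
    # Half the mean peak-to-trough swing, expressed in cents (assumed reading).
    swings = [abs(1200 * math.log2(f0[b] / f0[a])) for a, b in pairs]
    extent_cents = 0.5 * sum(swings) / len(swings)
    # Delay from note start to the first conclusive extremum.
    delay_ms = 1000 * first / frame_rate
    return rate_hz, extent_cents, delay_ms

# Synthetic note: 0.5 s of steady C5, then 2 s of 5 Hz vibrato, +/-50 cents.
frame_rate = 1000          # F0 frames per second (hypothetical)
base_hz, onset = 523.25, 500
f0 = [
    base_hz if i < onset
    else base_hz * 2 ** (50 / 1200 * math.sin(2 * math.pi * 5 * (i - onset) / frame_rate))
    for i in range(2500)
]
rate, extent, delay = vibrato_features(f0, frame_rate)
print(f"rate = {rate:.2f} Hz, extent = {extent:.1f} cents, delay = {delay:.0f} ms")
```

On real recordings the F0 track would first need smoothing and octave-error removal, which this sketch omits.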
stronger, and the “Neutral” group had a longer “DELAY” and a narrower “EXTENT.”

Fig. 3 The principal component analysis biplot indicates the loading of each variable (arrows), correlation and the scatter plot of the emotions (points). The length of the arrows approximates the variance of each variable, whereas the angles between the variables indicate their correlations. Dim1: the first principal component. Dim2: the second principal component.

4. Discussion

Figure 3 illustrates the relationship of the seven acoustic features. It shows that “EXTENT” and “PITCH” were positively correlated, pointing in the same direction. In contrast, “DELAY” points in the opposite direction to the vibrato features (“EXTENT” and “RATE”). This indicates that when the vibrato onset “DELAY” was later, the “RATE” and “EXTENT” were smaller.

Collecting the sound of only one singer, or of a few singers, is far from enough to establish how emotion is expressed vocally. However, analysing the same singer is suitable for better controlling the variables and identifying the differences in the acoustic characteristics.

5. Conclusion

Our study analysed twenty-four emotions expressed by one vocalist, each recorded as a single note with vibrato. In addition, acoustic analysis of the vocalist’s singing was carried out to study the vibrato parameters of the singer at the same pitch and pronunciation, as well as the differences between the acoustic features.

The analysis conducted in this study led to the following conclusions:

1. When a singer expresses different emotions at the same pitch, the vibrato parameters change markedly. This indicates that the emotional involvement of the singer may influence vibrato. We found that “DELAY” and “EXTENT” are important features of the vibrato. As a variable overlooked in previous studies, “DELAY” may be one of the skills used by singers to arouse audience emotions when singing.

2. PCA showed that the first two principal components explain 61.8% of the total variance. “EXTENT-DELAY” was the first component and “INTENSITY” was the second. The “Neutral” group was characterized by a longer “DELAY” time. Compared with the “Positive” group, the “Negative” group had a larger vibrato “EXTENT” and an earlier vibrato onset “DELAY.” The results suggest that there is a correlation between the features of vibrato and vocal expression.

As indicated by previous research, emotions can be encoded using several acoustic features, a finding supported by our results. A good understanding of the components of vocal expression can help singers and vocal teachers convey emotions more accurately in their singing. This result provides suggestions and directions for future study on vibrato. Future studies should aim to replicate these results on measuring changes in vibrato so as to synthesize sounds full of emotions.

References

[1] M. Sherman, “Emotional character of the singing voice,” J. Exp. Psychol., 11, 495–497 (1928).
[2] J. Sundberg and T. D. Rossing, “The science of the singing voice,” J. Acoust. Soc. Am., 87, 462–463 (1990).
[3] C. E. Seashore, The Vibrato, Studies in the Psychology of Music (University of Iowa, Iowa City, 1932), pp. 30–37.
[4] M. Baroni and L. Finarelli, “Emotions in spoken language and in vocal music,” Proc. 3rd Int. Conf. Music Perception and Cognition, pp. 343–345 (1994).
[5] E. Prame, “Measurement of the vibrato rate of ten singers,” J. Acoust. Soc. Am., 96, 1979–1984 (1994).
[6] R. Miller, Singing Schumann: An Interpretive Guide for Performers (Oxford University Press, Oxford, 1999).
[7] D. Katok, The Versatile Singer: A Guide to Vibrato & Straight Tone (City University of New York, New York, 2016).
[8] J. Sundberg, “Acoustic and psychoacoustic aspects of vocal vibrato,” in Vibrato, P. Dejonckere, M. Hirano and J. Sundberg, Eds. (Singular Publishing, San Diego, 1995), pp. 35–62.
[9] T. Saito and T. Nakamura, “Hierarchical structure of the categories of Japanese emotion,” Kyushu Univ. Psychol. Res., 4, 95–99 (2003).
[10] D. L. Robinson, “Brain function, emotional experience and personality,” Neth. J. Psychol., 64, 152–167 (2009).
[11] L. Johnson-Read, “Performing lieder: Expert perspectives and comparison of vibrato and singer’s formant with opera singers,” J. Voice, 29, 645.e15–32 (2015).
[12] P. Boersma, “PRAAT, a system for doing phonetics by computer,” Glot Int., 5, 341–345 (2002).
[13] M. Zhang, “A Matlab-based signal processing toolbox for characterization and analysis of musical vibrato,” J. Audio Eng. Soc., 65, 408–422 (2017).