
EVALUATING EXPRESSIVE PIANO PERFORMANCES WITH PSYCHOACOUSTIC FITNESS FUNCTION


Jonatas Manzolli Iracele Livero Jose Fornari
NICS - UNICAMP NICS - UNICAMP NICS - UNICAMP
jonatas@nics.unicamp.br iracele@nics.unicamp.br fornari@nics.unicamp.br

ABSTRACT

The musician's ability to develop expressive performances is one of the most important musical skills, and it is also one of the major criteria used to evaluate a musician's interpretation. Traditionally, expressive performances are evaluated only by the personal judgment of musically trained professionals, which is, however, subject to personal preferences and misconceptions. We present here a new method to automatically evaluate expressive piano performances using a Psychoacoustic Fitness Function (PFF) based on three psychoacoustic measurements: a) loudness, b) pitch and c) spectrum magnitude. In this paper we used the PFF to evaluate the dynamic development of four important piano touches that are known to be common technical tools in expressive piano performance: intensity, legato, staccato and rhythmic pulsation (or simply pulse). The method is derived from the Evolutionary Sound Synthesis method, ESSynth [6]; more specifically, from the fitness function used in ESSynth's selection process. Given a set of pianist recordings taken as the reference in expressiveness, the Target set, we used the PFF to evaluate, by comparison with the Target, the expressiveness of twelve pianist recordings for each of the four piano touches. The results presented below are enough to convince us that this method, as it is further developed, may turn out to be an important contribution to the evaluation of expressive musical performances.

1. INTRODUCTION

The problem of evaluating the expressiveness of piano performances is long dated, going back at least to [2]. It is an interesting problem and has been addressed through several different approaches. Lately, Artificial Intelligence (AI) techniques have also been applied to it. As a few examples, Widmer studied the use of machine learning techniques to understand how expressive music performances are produced [15], and Goebl studied the role of timing and intensity in the production and perception of melody in expressive piano performances [16].

We have also studied AI methods, such as Evolutionary Computation (EC) and Neuroinformatics, to simulate compositional environments and thus create new methods for sound synthesis. In [6] we presented the evolutionary sound synthesis method, ESSynth, a method for sound synthesis based on the Darwinian theory of natural selection. In ESSynth there is reproduction and selection of waveforms, which play the role of individuals in a population. In [7] we introduced the software named Vox Populi, which uses evolutionary computation methods for algorithmic musical composition. Together with Wasserman we created Roboser, a live musical composition method based on synthetic emotions [14].

In particular, the interest of this paper derives from [6], where we presented ESSynth. In [5] this method was extended to support the manipulation of perceptually meaningful sonic features described by three psychoacoustic parameters: loudness, pitch and spectrum magnitude. In this extension, the evaluation of synthesized sounds was done by the arithmetic mean of the three PFFs: loudness, pitch and spectrum.

After studying applications of AI, and more specifically of EC methods such as ESSynth, we decided to investigate their potential for analysis (or evaluation) as well as synthesis, as in the application described in this work, where we present a method for evaluating expressive piano performances.

2. THE FOUR PIANO TOUCHES

Despite having a wide range of parameters that could be evaluated, as we approached the problem of evaluating expressive piano performances we decided to do so by evaluating a set of four piano touches: Pulse, Legato, Staccato and Intensity.

Pulse refers to the pianist's ability to keep control of the rhythm, exploring it as a way of infusing a rhythmic discourse into the performance.

Legato is the ability to play melodies (sequential single notes) and harmonies or clusters (simultaneous notes) in a way that connects them, so that their slopes of intensity are lowered and less perceptible.

Staccato, in opposition to Legato, is the ability to separate, or spread in time, all notes and clusters so that, in the musical discourse, they are presented as separated as possible.

Intensity refers to the ability to control the variation of strength of each musical entity (note or cluster) by controlling the velocity with which the pianist's fingers hit the piano keys.

In this paper we take the evaluation of a pianist on these four piano touches as directly proportional to the evaluation of that pianist's expressive performance. The four piano touches are evaluated through the three PFFs (loudness, pitch and spectrum), whose arithmetic mean is the final expressive performance evaluation.

3. MEASURING PIANISTIC SONORITY WITH PFF

The PFF is part of the ESSynth method, where it is responsible for the selection of waveforms. In order to explain the concept of the PFF we first have to overview the basics of the ESSynth method. It is made of three structures:
• B(n), the population set, in its n-th generation. The first
generation population is B(0). This one is made of
individuals that will reproduce and be selected.
• T, the target set, made of the reference individuals.
• f, the fitness function, that selects the best individual.

The best individual in the n-th generation is $w^*_n$: the individual belonging to B(n) that is nearest to T. In each generation a new $w^*_n$ is sought and sent to the system output as the synthesized sound.

[Diagram: the population B(n) undergoes Crossover and Mutation; the fitness function f performs the Fitness Evaluation of B(n) against the target set T; the best individuals $w^*_1, w^*_2, w^*_3, \ldots, w^*_n$ are sent to the Output as waveforms.]

Figure 1. Basic ESSynth diagram.
The sound segments, or waveforms, are named individuals and are represented by $w_r^n$, meaning the r-th individual in the n-th generation population.

As we see it, the most relevant psychoacoustic measures within the individuals are those covering the sonic perception of intensity, frequency, and partials (the magnitude spectrum composition). They are, respectively: loudness, pitch and spectrum. ESSynth represents the individual's genotype with these three psychoacoustic parameters, as seen in the equation below:

$$g_r^n = \left( l_r^n(t),\; p_r^n(t),\; s_r^n(f) \right) \qquad (1)$$
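The paper does not detail how the three curves are extracted from audio. As a minimal sketch in Python, assuming simple stand-ins for the psychoacoustic models (per-frame RMS energy for loudness, autocorrelation-based fundamental frequency for melodic pitch, and averaged FFT magnitude for spectrum; all names and parameters here are ours, for illustration only), a genotype could be computed as follows:

import numpy as np

def genotype(wave, sr, frame=2048, hop=1024):
    """Sketch of a genotype g = (l(t), p(t), s(f)) for one waveform.
    RMS, autocorrelation f0 and mean FFT magnitude are stand-ins for
    the paper's psychoacoustic measures, not the authors' models."""
    loud, pitch = [], []
    spec = np.zeros(frame // 2 + 1)
    n_frames = 0
    for start in range(0, len(wave) - frame, hop):
        x = wave[start:start + frame]
        loud.append(np.sqrt(np.mean(x ** 2)))         # loudness curve l(t)
        ac = np.correlate(x, x, mode='full')[frame - 1:]
        lo = sr // 1000                               # cap f0 search near 1 kHz
        lag = lo + int(np.argmax(ac[lo:frame // 2]))  # strongest periodicity
        pitch.append(sr / lag)                        # pitch curve p(t)
        spec += np.abs(np.fft.rfft(x))                # spectrum magnitude s(f)
        n_frames += 1
    return np.array(loud), np.array(pitch), spec / max(n_frames, 1)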
The genotype $g_r^n$ can be seen as one element of the vector space G, the space of psychoacoustic curves. G is a Cartesian product of three spaces of continuous functions, as follows:

$$G = L \times P \times S \qquad (2)$$

where the spaces of functions are: L for loudness, P for pitch and S for spectrum. Figure 2 depicts the correspondence between the individual and its genotype.

We now define the genotype distance between two individuals. Let $c_a$ and $c_b$ be two psychoacoustic curves of the same type (loudness, pitch or spectrum), extracted respectively from the waveforms $w_a$ and $w_b$. The Euclidean distance between $c_a$ and $c_b$ is given by:

$$d_c(c_a, c_b) = \sqrt{\sum_{k=1}^{N} \left[ c_a(k) - c_b(k) \right]^2} \qquad (3)$$

Equation (3) is the psychoacoustic distance from which the PFF evaluation of each specific psychoacoustic parameter comes.

Given the genotypes $g_a = (l_a, p_a, s_a)$ and $g_b = (l_b, p_b, s_b)$, both elements of G, it is possible to define the distance between them as:

$$D(g_a, g_b) = \frac{1}{3} \left[ d_L(l_a, l_b) + d_P(p_a, p_b) + d_S(s_a, s_b) \right] \qquad (4)$$

Equation (4) is the genotype distance, given by the arithmetic mean of the three psychoacoustic distances. Let $G^{(n)} = \{ g_1^n, g_2^n, \ldots, g_M^n \}$ be the n-th genotype generation, associated with the population set B(n) of M individuals, and let $G = \{ g_1, g_2, \ldots, g_Q \}$ be the genotype target set, extracted from T. The distance between the sets $G^{(n)}$ and $G$ is defined as:

$$D_H(G^{(n)}, G) = \min_{1 \le a \le M,\; 1 \le j \le Q} D(g_a^n, g_j) \qquad (5)$$

This distance gives the best individual in the Population in comparison with the individuals in the Target set. If the distance $D_H$ is zero, then that individual within the Population is equal to one within the Target set. The PFF is given simply by the normalization of all the population individuals' distances, so that they range from zero (in case the individual is a clone of one belonging to T) to one (the individual most different from the individuals in T).
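Under the same assumptions as the sketch above, equations (3)-(5) and the final normalization translate almost directly into code. Two details the paper leaves open are handled here by assumption: curves of unequal length are truncated to a common N before equation (3) is applied, and only the total distance is normalized, whereas the tables in Section 4 suggest that Dl, Dp and De were each normalized separately as well:

import numpy as np

def curve_distance(ca, cb):
    """Euclidean psychoacoustic distance of equation (3)."""
    n = min(len(ca), len(cb))            # assumption: truncate to a common N
    return np.sqrt(np.sum((ca[:n] - cb[:n]) ** 2))

def genotype_distance(ga, gb):
    """Genotype distance of equation (4): the arithmetic mean of the
    loudness, pitch and spectrum curve distances."""
    return np.mean([curve_distance(a, b) for a, b in zip(ga, gb)])

def pff(population, target):
    """Normalized PFF: 0 for a clone of a Target individual, 1 for the
    individual most distant from the Target set. Each individual's raw
    score is its distance to the nearest Target genotype, the
    per-individual analogue of equation (5)."""
    d = np.array([min(genotype_distance(g, t) for t in target)
                  for g in population])
    return d / d.max() if d.max() > 0 else d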
[Diagram: the waveform $w_r^n$ maps to the genotype $g_r^n$, made of the psychoacoustic curves: Loudness l(t), Pitch p(t), Spectrum s(f).]

Figure 2. The genotype representation of an individual.


4. RESULTS
We recorded twelve pianists, each one performing four pre-selected piano pieces. Each piece was chosen to be a good representation of one particular piano touch: Bela Bartok's “Bear Dance” for Pulse, Chopin's “Nocturne op. 9 n. 2” for Legato, Bela Bartok's “Jeering Song” for Staccato, and Katunda's “Estro Africano 1” for Intensity. These recordings are available to be heard and downloaded (in MP3 audio format) at:
http://www.nics.unicamp.br/~fornari/epp.
The pianists were recorded under the same conditions (same day, same room, same piano and same recording equipment). Three of them were professional pianists who recorded versions that we considered to embody the desired aspects of expressive piano performance for each piano piece. These three groups of four recordings were then taken as the Target sets, the reference for the PFF evaluation. The other nine groups of recordings made up the Population sets, whose expressiveness was to be evaluated by the PFF. Following the ESSynth terminology, each recording is named an individual, so we had four Populations and four corresponding Target sets, one per piano piece, each representing one particular piano touch. As a minor variation, we placed all twelve individuals in the Population, including the three that also belong to the Target. This was done to check that the PFF evaluation of the individuals that are clones of the ones in the Target set would be zero. We then proceeded by calculating the genotype of each individual and measuring its genotype distance to the Target set. For each Population we then normalized the twelve measurements, which gave us the PFF evaluation. The results are shown in the next four figures (Figures 3-6); each depicts four curves, from top to bottom: the PFF evaluation for loudness, pitch, spectrum magnitude, and the total PFF.
The individuals are digital audio files recorded at a sampling rate of 44.1 kHz, with 16 bits of resolution and one channel (mono).
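As a concrete driver for this procedure, reusing the genotype and pff sketches from Section 3 (the file names and the use of the soundfile reader are our assumptions, not part of the original experiment):

import soundfile as sf  # assumed here for reading the recordings

# Hypothetical file layout: the twelve mono recordings of one touch group.
files = [f"pulse_{i:02d}.wav" for i in range(1, 13)]
genos = []
for name in files:
    wave, sr = sf.read(name)
    genos.append(genotype(wave, sr))      # sketch from Section 3

target = [genos[0], genos[3], genos[10]]  # individuals 1, 4 and 11
scores = pff(genos, target)               # 0 for the Target clones
for i, s in enumerate(scores, start=1):
    print(f"individual {i:2d}: PFF = {s:.4f}")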

Figure 3. PFF evaluation for the “Pulse” recording group. The Target set here is made of the individuals 1, 4 and 11.

The best individual is #5, a recording of a pianist playing with more pulsation inference, although not as regular as the ones in the Target set. The most distant one, #3, plays with pulse but keeps the sustain pedal pressed, which may be why the algorithm ranked it as the most distant. Listening to the recordings, though, we would pick #9 as the most distant one.

Figure 4. PFF evaluation for the “Legato” recording group. The Target set here is made of the individuals 1, 4 and 12.

The best individual here is #2, which in fact is a recording of a pianist playing with great legato (linking the notes as much as possible), as the ones in the Target set do. The most distant individual is #10, a recording of a pianist playing staccato (detaching the notes as much as possible), which is the opposite of legato.
Figure 5. PFF evaluation for the “Staccato” recording group. The Target set here is made of the individuals 1, 4 and 11.

The algorithm picked #2 as the best individual. Listening to it and comparing it with the individuals in the Target set, we conclude that #2 is in fact the one that presents the better Staccato, although its recording has some mistaken notes. In contrast, individual #10, the most distant one, is a recording of a pianist playing with the sustain pedal pressed, which makes the Staccato impossible to perceive, although this pianist played the piece with no mistaken notes.

Figure 6. PFF evaluation for the “Intensity” recording group. The Target set here is made of the individuals 1, 11 and 12.

The best individual, #8, is a recording of a pianist playing with great variations of intensity (from pianissimo to fortissimo), as the ones in the Target do. The most distant one, #5, is a recording with almost no intensity variation, the only exception being its fading-out ending.

Table 1 shows the best individuals (by their number within the Population) according to each PFF evaluation: Dl (loudness PFF), Dp (pitch PFF), De (spectrum PFF) and D (total PFF).

Table 1. The best individuals of each group of recordings.

             Pulse       Legato      Staccato    Intensity
Target set   1, 4, 11    1, 4, 12    1, 4, 11    1, 11, 12
Dl           5           2           9           2
Dp           10          2           2           2
De           5           7           5           8
D            5           2           2           8

The next four tables show the PFF for each individual within the four groups of recordings. The value for the best individual in each column is marked with an asterisk (*).

PULSE

individual   Dl        Dp        De        D
1            0         0         0         0
2            0.7998    0.0568    0.6826    0.5858
3            0.8243    1.0000    0.8028    1.0000
4            0         0         0         0
5            0.6865*   0.0557    0.5374*   0.4871*
6            0.7075    0.0823    0.5737    0.5190
7            0.7102    0.0588    1.0000    0.6734
8            0.9130    0.0681    0.7705    0.6667
9            0.7412    0.0578    0.6242    0.5417
10           0.7937    0.0536*   0.7660    0.6141
11           0         0         0         0
12           1.0000    0.0699    0.8248    0.7212

LEGATO

individual   Dl        Dp        De        D
1            0         0         0         0
2            0.5920*   0.2334*   0.4543    0.4690*
3            0.7487    0.4359    0.5599    0.6393
4            0         0         0         0
5            0.7548    0.5565    0.5282    0.6742
6            0.7691    0.3651    0.4353    0.5752
7            0.9377    0.9071    0.2542*   0.7692
8            1.0000    0.4974    1.0000    0.9153
9            0.6238    0.3853    0.6652    0.6136
10           0.8475    1.0000    0.8811    1.0000
11           0.6474    0.3541    0.6134    0.5918
12           0         0         0         0
STACCATO

individual   Dl        Dp        De        D
1            0         0         0         0
2            0.6339    0.5758*   0.6622    0.5762*
3            0.8036    0.7478    0.7575    0.7484
4            0         0         0         0
5            0.6407    0.8733    0.3728*   0.8734
6            0.6477    0.8093    0.6239    0.8095
7            0.6906    1.0000    0.4920    1.0000
8            1.0000    0.9352    0.5778    0.9360
9            0.5352*   0.6212    0.6525    0.6214
10           0.7980    0.8112    1.0000    0.8116
11           0         0         0         0
12           0.9744    0.7194    0.8089    0.7205
INTENSITY

individual   Dl        Dp        De        D
1            0         0         0         0
2            0.5766*   0.0446*   1.0000    1.0000
3            0.8191    0.0548    0.9781    0.9781
4            0.9496    0.2611    0.6516    0.6516
5            0.7476    1.0000    0.7935    0.7935
6            1.0000    0.0532    0.6910    0.6910
7            0.8483    0.0478    0.6868    0.6868
8            0.6413    0.0510    0.4518*   0.4518*
9            0.8428    0.0627    0.8257    0.8257
10           0.9501    0.0483    0.5738    0.5738
11           0         0         0         0
12           0         0         0         0
5. DISCUSSION AND CONCLUSION

We have presented here a method for the automatic evaluation of expressive piano performances based on the psychoacoustic measurement of four piano touches: pulse, legato, staccato and intensity. This evaluation is based on what we called the PFF evaluation: the normalized genotype distance between the genotype of each individual within the Population and the Target set, the genotypes of individuals previously selected as the reference in expressiveness. This method accomplishes two important tasks: 1) it keeps human sensibility and choice in the decision loop, through the choice of which individuals will belong to the Target and will therefore influence the selection of the best individual among the Population; 2) the selection of the best individual is automatic and therefore shields the evaluation process from human factors that do not benefit it, such as personal preferences, prejudices, misconceptions and biases.

As we have said, the PFF evaluation comes from ESSynth's selection process. ESSynth also has another process, called reproduction, which uses genetic operators such as crossover and mutation to create new individuals that inherit sonic features from their predecessors: the offspring between the population individuals and the best individual. We believe that the reproduction process of ESSynth can also be applied to the expressive piano performance problem, not to evaluate it, however, but as a way to manipulate the expressive performance. Of course this is not a trivial task, and it ought to be thoroughly studied in order to take this research in that direction. It may open a new field of research in sound synthesis, bringing about new evolutionary methods that generate not just new waveforms but also new performances. This is an original feature of EC methods, as they are evolutionary (dynamic in time) rather than deterministic (e.g. additive synthesis, FM synthesis and so forth).
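As an illustration of what such a manipulation could look like, the sketch below applies one-point crossover and noise mutation to genotype curves. These operators are our simplified stand-ins, not the actual ESSynth operators defined over waveforms in [6], and they assume curves of equal length:

import numpy as np

rng = np.random.default_rng()

def crossover(curve_a, curve_b, cut=None):
    """One-point crossover: splice two psychoacoustic curves."""
    if cut is None:
        cut = int(rng.integers(1, len(curve_a)))
    return np.concatenate([curve_a[:cut], curve_b[cut:]])

def mutate(curve, strength=0.05):
    """Perturb a curve with low-level noise, a simple stand-in for mutation."""
    return curve + strength * curve.std() * rng.standard_normal(len(curve))

def next_generation(population, target):
    """One reproduction step as the text describes it: offspring between
    each population individual and the best individual."""
    best = population[int(np.argmin(pff(population, target)))]
    return [tuple(mutate(crossover(c, b)) for c, b in zip(ind, best))
            for ind in population]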
Another aspect that deserves deeper explanation regards the PFF pitch evaluation. Pitch is a psychoacoustic parameter related to the perception of the fundamental harmonic of sounds. In fact, only a narrow group of sounds has a defined pitch: melodic sounds, such as the ones produced by piano playing. We developed our pitch algorithm to extract pitch from melodies (single notes played in sequence) rather than from harmony (several notes playing simultaneously). We have not yet studied an extension of our algorithm to calculate harmonic pitch, but we have used the melodic pitch algorithm to calculate the pitch PFF for the piano recordings, knowing that they carry both melodic and harmonic pitch information. Further study is therefore necessary to conclude whether or not the pitch PFF is valid for both melodic and harmonic pitch. It would also be interesting to compare the PFF evaluation of audio recordings with that of MIDI files, since the latter have their actual pitch information listed in terms of musical notes.

To conclude, we would like to say that EC methods applied to audio and music seem to us a viable way of opening a new field of investigation, one that may lead to the exploration of subtle perceptual sonic nuances that so far have been impossible to analyze, transform or synthesize.
6. REFERENCES

[2] Allen, F. J. 1913. Pianoforte touch. Nature 91(2278), 424-425.
[3] Askenfelt, A., Galembo, A., Cuddy, L. E. 1998. On the acoustics and psychology of piano touch and tone. Journal of the Acoustical Society of America 103(5, Pt. 2), 2873.
[4] Bresin, R., Battel, G. 2000. Articulation strategies in
expressive piano performances. Journal of New
Music Research. 29 (3), 211-224.
[5] Fornari, J. 2003. A Síntese Evolutiva de Segmentos Sonoros. PhD Dissertation, Faculty of Electrical Engineering, State University of Campinas (UNICAMP), Brazil.
[6] Manzolli, J., Maia Jr., A., Fornari, J. and Damiani, F. 2001. The evolutionary sound synthesis method. Proceedings of the Ninth ACM International Conference on Multimedia, Ottawa, Canada, 585-587. ISBN 1-58113-394-4.
[7] Moroni, A., Manzolli, J., Von Zuben, F. and Gudwin, R. 2000. Vox Populi: An Interactive Evolutionary System for Algorithmic Music Composition. Leonardo Music Journal, MIT Press, Vol. 10, 49-54.
[8] Moroni, A., von Zuben, F. and Manzolli, J. 2002. ArTbitration. Leonardo Music Journal, MIT Press, Vol. 11, 45-55.
[9] Richerme, C. 1996. A técnica pianística: uma abordagem científica. São João da Boa Vista: Air Musical, 27-28.
[10] Repp, B. H. 1993. Some empirical observations on sound level properties of recorded piano tones. Journal of the Acoustical Society of America 93(2), 1136-1144.
[11] Repp, B. H. 1996. Patterns of note onset asynchronies in expressive piano performances. Journal of the Acoustical Society of America 100(6), 3917-3932.
[12] Shaffer, L. H. 1981. Performances of Chopin, Bach and Bartók: Studies in motor programming. Cognitive Psychology 13, 326-376.
[13] Tro, J. 1998. Micro dynamics deviation as a measure of musical quality in piano performances? In Proceedings of the 5th International Conference on Music Perception and Cognition (ICMPC5), August 26-30, edited by S. W. Yi. Western Music Research Institute, Seoul National University, Seoul, Korea.
[14] Wasserman, K. C., Eng, K., Verschure, P. F. M. J., Manzolli, J. 2003. Live Soundscape Composition Based on Synthetic Emotions. IEEE MultiMedia 10(4), 82-90.

[15] Widmer, G. 2001. Using AI and machine learning to study expressive music performance: Project survey and first report. AI Communications 14(3), 149-162.

[16] Goebl, W. 2003. The Role of Timing and Intensity in the Production and Perception of Melody in Expressive Piano Performance. PhD Dissertation, Institut für Musikwissenschaft, Karl-Franzens-Universität, Graz, Austria.