[Figure 1 diagram: Population set B(n), Target set T, fitness f, Evaluation, Crossover, Mutation; output waveforms w*1, w*2, w*3, ..., w*n]

Figure 1. Basic ESSynth diagram.

The sound segments, or waveforms, are named individuals and represented by $w_r^n$, which means the r-th individual in the n-th generation population.

As we see it, the most relevant psychoacoustic measures within the individuals are those covering the sonic perception of intensity, frequency and partials (magnitude spectrum composition). They are, respectively: loudness, pitch and spectrum.

ESSynth represents the individual's genotype with these three psychoacoustic parameters, as seen in the equation below:

$$g_r^n = \left( l_r^n(t),\; p_r^n(t),\; s_r^n(f) \right) \quad (1)$$

The genotype $g_r^n$ can be seen as one element within the space of vectors G, the space of psychoacoustic curves. G is a Cartesian product of three spaces of continuous functions, as follows:

$$G = L \times P \times S \quad (2)$$

where the spaces of functions are: L for loudness, P for pitch and S for spectrum.

The next figure depicts the correspondence between the individual and its genotype.

[Figure: waveform $w_r^n$ and its genotype $g_r^n$, given by the psychoacoustic curves: loudness l(t), pitch p(t), spectrum s(f)]

Given two psychoacoustic curves $c_a$ and $c_b$, sampled at N points, the distance between them is:

$$d_c(c_a, c_b) = \sum_{k=1}^{N} \left( c_a(k) - c_b(k) \right)^2 \quad (3)$$

Equation (3) is the psychoacoustic distance from which the PFF evaluation of each specific psychoacoustic parameter comes.

Given the genotypes $g_a = (l_a, p_a, s_a)$ and $g_b = (l_b, p_b, s_b)$, both elements of G, it is possible to define the distance between them as:

$$D(g_a, g_b) = \frac{1}{3}\left[ d_L(l_a, l_b) + d_P(p_a, p_b) + d_S(s_a, s_b) \right] \quad (4)$$

Equation (4) is the genotype distance, given by the arithmetic mean of the three psychoacoustic distances. Being $G^{(n)} = \{ g_1^n, g_2^n, \ldots, g_M^n \}$ the n-th genotype generation, associated with the population set B(n) with M individuals, and $G = \{ g_1, g_2, \ldots, g_Q \}$ the genotype target set, extracted from T, the distance between the sets $G^{(n)}$ and $G$ is defined as:

$$D_H(G^{(n)}, G) = \min_{\substack{1 \le a \le M \\ 1 \le j \le Q}} D(g_a^n, g_j) \quad (5)$$

This distance gives the best individual in the Population, in comparison with the individuals in the Target set. If the distance $D_H$ is zero, then an individual within the Population is equal to one within the Target set. The PFF is given simply by the normalization of all the population individuals' distances, so that they go from zero (in case the individual is a clone of one belonging to T) to one (the individual most different from the individuals in T).
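Equations (3)-(5) and the PFF normalization can be condensed into a short sketch. This is an illustrative reading only, assuming each psychoacoustic curve is a plain Python list sampled at the same N points; the actual ESSynth implementation may sample, weight or window the curves differently.

```python
def curve_distance(ca, cb):
    """Psychoacoustic curve distance, equation (3): the sum of
    squared sample-wise differences between two curves."""
    assert len(ca) == len(cb)  # both curves sampled at the same N points
    return sum((x - y) ** 2 for x, y in zip(ca, cb))

def genotype_distance(ga, gb):
    """Genotype distance, equation (4): the arithmetic mean of the
    loudness, pitch and spectrum curve distances. A genotype here is a
    (loudness, pitch, spectrum) tuple of curves, as in equation (1)."""
    return sum(curve_distance(a, b) for a, b in zip(ga, gb)) / 3.0

def pff(population_genotypes, target_genotypes):
    """Per-individual PFF: the distance to the nearest Target genotype
    (the inner minimization of equation (5), taken per individual),
    normalized so the scores run from 0 (a clone of T) to 1."""
    raw = [min(genotype_distance(g, t) for t in target_genotypes)
           for g in population_genotypes]
    top = max(raw) or 1.0  # avoid dividing by zero if all are clones
    return [d / top for d in raw]
```

With toy two-sample curves, a clone of a Target genotype scores 0 and the most distant individual scores 1, matching the role of the zero rows and the 1.0000 entries in the tables below.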
The next four tables show the PFF for each individual within the four groups of recordings. The values for the best individuals are underlined.
Figure 5. PFF evaluation for the "Staccato" recording group. The Target set here is made of the individuals 1, 4 and 11.

The algorithm picked #2 as the best individual. Listening to it and comparing it with the individuals in the Target set, we come to the conclusion that #2 is in fact the one that presents the better Staccato, although its recording has some mistaken notes. In contrast, individual #10, the most distant one, is a recording of a pianist playing with the sustain pedal pressed, which makes the Staccato impossible to perceive, although this pianist played the piece with no mistaken notes.

4.1.2.1.1 PULSE

individual   Dl       Dp       De       D
1            0        0        0        0
2            0.7998   0.0568   0.6826   0.5858
3            0.8243   1.0000   0.8028   1.0000
4            0        0        0        0
5            0.6865   0.0557   0.5374   0.4871
6            0.7075   0.0823   0.5737   0.5190
7            0.7102   0.0588   1.0000   0.6734
8            0.9130   0.0681   0.7705   0.6667
9            0.7412   0.0578   0.6242   0.5417
10           0.7937   0.0536   0.7660   0.6141
11           0        0        0        0
12           1.0000   0.0699   0.8248   0.7212
4.1.2.1.2 LEGATO

individual   Dl       Dp       De       D
1            0        0        0        0
2            0.5920   0.2334   0.4543   0.4690
3            0.7487   0.4359   0.5599   0.6393
4            0        0        0        0
5            0.7548   0.5565   0.5282   0.6742
6            0.7691   0.3651   0.4353   0.5752
7            0.9377   0.9071   0.2542   0.7692
8            1.0000   0.4974   1.0000   0.9153
9            0.6238   0.3853   0.6652   0.6136
10           0.8475   1.0000   0.8811   1.0000
11           0.6474   0.3541   0.6134   0.5918
12           0        0        0        0

Figure 6. PFF evaluation for the "Intensity" recording group. The Target set here is made of the individuals 1, 11 and 12.

The best individual, #8, is a recording of a pianist playing with great variations of intensity (from pianissimo to fortissimo), like the ones in the Target. The most distant one, #5, is a recording with almost no intensity variation, the only exception being the termination, which fades out.

The next table shows the best individuals (by their number within the Population) according to each PFF evaluation: Dl (loudness PFF), Dp (pitch PFF), De (spectrum PFF) and D (total PFF).
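Reading the best individual off one of these tables can be sketched as below. This is a minimal illustration only, assuming, as in the tables, that a total PFF of 0 marks a clone of a Target individual, and that clones are excluded from the ranking; the values shown are the first rows of the PULSE table.

```python
# PFF values (Dl, Dp, De, D) for a few individuals, taken from the
# PULSE table; individual 1 is a clone of a Target individual.
pff_table = {
    1: (0.0, 0.0, 0.0, 0.0),
    2: (0.7998, 0.0568, 0.6826, 0.5858),
    3: (0.8243, 1.0000, 0.8028, 1.0000),
}

def best_individual(table):
    """Pick the lowest total PFF (column D) among non-clones."""
    candidates = {i: row[3] for i, row in table.items() if row[3] > 0.0}
    return min(candidates, key=candidates.get)

print(best_individual(pff_table))  # prints 2
```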
4.1.2.1.3 STACCATO

individual   Dl       Dp       De       D
1            0        0        0        0
2            0.6339   0.5758   0.6622   0.5762
3            0.8036   0.7478   0.7575   0.7484
4            0        0        0        0
5            0.6407   0.8733   0.3728   0.8734
6            0.6477   0.8093   0.6239   0.8095
7            0.6906   1.0000   0.4920   1.0000
8            1.0000   0.9352   0.5778   0.9360
9            0.5352   0.6212   0.6525   0.6214
10           0.7980   0.8112   1.0000   0.8116
11           0        0        0        0
12           0.9744   0.7194   0.8089   0.7205

4.1.2.1.4 INTENSITY

individual   Dl       Dp       De       D
1            0        0        0        0
2            0.5766   0.0446   1.0000   1.0000
3            0.8191   0.0548   0.9781   0.9781
4            0.9496   0.2611   0.6516   0.6516
5            0.7476   1.0000   0.7935   0.7935
6            1.0000   0.0532   0.6910   0.6910
7            0.8483   0.0478   0.6868   0.6868
8            0.6413   0.0510   0.4518   0.4518
9            0.8428   0.0627   0.8257   0.8257
10           0.9501   0.0483   0.5738   0.5738
11           0        0        0        0
12           0        0        0        0

5. DISCUSSION AND CONCLUSION

We have presented here a method for the automatic evaluation of expressive piano performance, based on the psychoacoustic measurement of four piano touches: pulse, legato, staccato and intensity. This evaluation is based on what we called the PFF evaluation: the normalized genotype distance between the genotypes of each individual within the Population and those of the Target set, i.e. the genotypes of individuals previously selected as the reference in expressiveness. This method accomplishes two important tasks: 1) it keeps human sensibility and choice in the decision loop, by deciding which individuals will belong to the Target and will therefore influence the selection of the best individual among the Population; 2) the selection of the best individual is automatic, and therefore shields away human matters not beneficial to the evaluation process, such as personal preferences, prejudices, misconceptions and biases.

As we have said, the PFF evaluation comes from ESSynth's selection process. ESSynth also has another process, called reproduction, which uses genetic operators such as crossover and mutation to create new individuals that inherit sonic features from their predecessors: the offspring between the population individuals and the best individual. We believe that the reproduction process of ESSynth can also be applied to the expressive piano performance problem, not to evaluate but as a way to manipulate the expressive performance. Of course this is not a trivial task, and it ought to be thoroughly studied in order to take this research in that direction. It may open a new field of research in sound synthesis, bringing about new evolutionary methods that not only generate new waveforms but also generate new performances. This is a particularly original feature of EC methods, as they are evolutionary (dynamic in time) rather than deterministic (e.g. additive synthesis, FM synthesis and so forth).

Another aspect that deserves deeper explanation is the PFF pitch evaluation. Pitch is known to be a psychoacoustic parameter related to the perception of the fundamental harmonic of sounds. In fact, only a narrow group of sounds have a defined pitch: the melodic sounds, such as the ones produced by piano playing. However, we developed our pitch algorithm to extract pitch from melodies (single notes played in sequence) rather than from harmony (several notes playing simultaneously). We have not yet studied an extension of our algorithm to calculate harmonic pitch, but we have used the melodic pitch algorithm to calculate the pitch PFF for the piano playing recordings, knowing that they carry both melodic and harmonic pitch information. Further study is therefore necessary to conclude whether or not this pitch PFF is valid for both melodic and harmonic pitch. It would be interesting to compare the PFF evaluation from audio recordings with one from MIDI, since MIDI files have the actual pitch information listed in terms of musical notes.

To conclude, we would like to say that EC methods applied to audio and music seem to us a viable way of opening a new field of investigation, one that may lead to the exploration of subtle perceptual sonic nuances that so far have been impossible to analyze, transform or synthesize.
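The reproduction idea raised above (offspring between a population individual and the best individual, via crossover and mutation) is not specified in detail here, so the following is only a rough sketch: the point-wise mixing weight `alpha` and the mutation `amount` are invented parameters, and the real ESSynth operators act on waveforms and genotypes rather than on bare lists of samples.

```python
import random

def crossover(curve_a, curve_b, alpha=0.5):
    # Offspring curve as a point-wise weighted mix of two parent curves
    # (alpha is an assumed mixing weight, not taken from the paper).
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(curve_a, curve_b)]

def mutate(curve, amount=0.05, seed=None):
    # Perturb each sample by a small uniform random amount.
    rng = random.Random(seed)
    return [c + rng.uniform(-amount, amount) for c in curve]

best_curve = [0.0, 1.0, 0.5]   # e.g. a loudness curve of the best individual
other_curve = [1.0, 0.0, 0.5]  # a curve from another population individual
child = mutate(crossover(best_curve, other_curve))
```

A child produced this way would then be evaluated by the same PFF machinery, closing the evolutionary loop.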