approaches. Lopez de Mantaras et al. [6] report on SaxEx, a performance system capable of generating expressive solo saxophone performances in Jazz. One limitation of their system is that it is incapable of explaining the predictions it makes and it is unable to handle melody alterations, e.g. ornamentations.
Ramirez et al. [11] have explored and compared diverse machine learning methods for obtaining expressive music performance models for Jazz saxophone that are capable of both generating expressive performances and explaining the expressive transformations they produce. They propose an expressive performance system based on inductive logic programming which induces a set of first order logic rules that capture expressive transformations both at an inter-note level (e.g. note duration, loudness) and at an intra-note level (e.g. note attack, sustain). Based on the theory generated by the set of rules, they implemented a melody synthesis component which generates expressive monophonic output (MIDI or audio) from inexpressive MIDI melody descriptions.
With the exception of the work by Lopez de Mantaras et al. and Ramirez et al., most of the research in expressive performance using machine learning techniques has focused on classical piano music, where the tempo of the performed pieces is often not constant. These works have concentrated on global tempo and loudness transformations, whereas we are interested in note-level tempo and loudness transformations.
Nevertheless, the use of expressive performance models, either automatically induced or manually generated, for identifying musicians has received little attention in the past. This is mainly due to two factors: (a) the high complexity of the feature extraction process required to characterize expressive performance, and (b) the question of how to use the information provided by an expressive performance model for the task of performance-based performer identification. To the best of our knowledge, the only group working on performance-based automatic performer identification is the group led by Gerhard Widmer. Saunders et al. [13] apply string kernels to the problem of recognizing famous pianists from their playing style. The characteristics of performers playing the same piece are obtained from changes in beat-level tempo and beat-level loudness. From such characteristics, general performance alphabets can be derived, and pianists' performances can then be represented as strings. They apply both kernel partial least squares and Support Vector Machines to this data.
Stamatatos and Widmer [16] address the problem of identifying the most likely music performer, given a set of performances of the same piece by a number of skilled candidate pianists. They propose a set of very simple features for representing stylistic characteristics of a music performer that relate to a kind of 'average' performance. A database of piano performances of 22 pianists playing two pieces by Frédéric Chopin is used. They propose an ensemble of simple classifiers derived by both subsampling the training set and subsampling the input features. Experiments show that the proposed features are able to quantify the differences between music performers.

3 MELODIC DESCRIPTION

First of all, we perform a spectral analysis of a portion of sound, called an analysis frame, whose size is a parameter of the algorithm. This spectral analysis consists of multiplying the audio frame with an appropriate analysis window and performing a Discrete Fourier Transform (DFT) to obtain its spectrum. In this case, we use a frame width of 46 ms, an overlap factor of 50%, and a Kaiser-Bessel 25 dB window. Then we compute a set of low-level descriptors for each spectrum: energy and an estimation of the fundamental frequency. From these low-level descriptors we perform a note segmentation procedure. Once the note boundaries are known, the note descriptors are computed from the low-level values. The main low-level descriptors used to characterize note-level expressive performance are instantaneous energy and fundamental frequency.

Energy computation. The energy descriptor is computed in the spectral domain, using the values of the amplitude spectrum at each analysis frame. In addition, energy is computed in different frequency bands as defined in [5], and these values are used by the algorithm for note segmentation.
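As a rough Python sketch of this analysis chain (assuming a mono signal x at sample rate sr; the Kaiser beta value only approximates the 25 dB sidelobe specification, and the band edges are illustrative rather than the bands defined in [5]):

import numpy as np

def frame_descriptors(x, sr, frame_ms=46.0, overlap=0.5,
                      band_edges=(0.0, 500.0, 2000.0, 8000.0)):
    """Frame-wise spectral analysis: window each frame, take its DFT and
    derive energy descriptors from the amplitude spectrum. 46 ms frames,
    50% overlap and a Kaiser(-Bessel) window follow the text; beta = 1.33
    roughly matches a 25 dB sidelobe level, and the band edges are
    illustrative, not those of [5]."""
    n = int(sr * frame_ms / 1000.0)          # frame length in samples
    hop = max(1, int(n * (1.0 - overlap)))   # hop size (50% overlap)
    win = np.kaiser(n, 1.33)                 # Kaiser analysis window
    freqs = np.fft.rfftfreq(n, d=1.0 / sr)
    frames = []
    for start in range(0, len(x) - n + 1, hop):
        spectrum = np.abs(np.fft.rfft(x[start:start + n] * win))
        total = float(np.sum(spectrum ** 2))  # frame energy
        bands = [float(np.sum(spectrum[(freqs >= lo) & (freqs < hi)] ** 2))
                 for lo, hi in zip(band_edges[:-1], band_edges[1:])]
        frames.append({"energy": total, "band_energy": bands})
    return frames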
Fundamental frequency estimation. For the estimation of the instantaneous fundamental frequency we use a harmonic matching model derived from the Two-Way Mismatch (TWM) procedure [7]. For each fundamental frequency candidate, mismatches between the generated harmonics and the measured partial frequencies are averaged over a fixed subset of the available partials. A weighting scheme is used to make the procedure robust to the presence of noise or the absence of certain partials in the spectral data. The solution presented in [7] employs two mismatch error calculations. The first is based on the frequency difference between each partial in the measured sequence and its nearest neighbor in the predicted sequence. The second is based on the mismatch between each harmonic in the predicted sequence and its nearest partial neighbor in the measured sequence. This two-way mismatch helps to avoid octave errors by applying a penalty to partials that are present in the measured data but are not predicted, and also to partials whose presence is predicted but which do not actually appear in the measured sequence. The TWM procedure also has the benefit that the effect of any spurious components or partials missing from the measurement can be counteracted by the presence of uncorrupted partials in the same frame.
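The following is a much-simplified Python sketch of the two-way mismatch idea: each f0 candidate is scored by combining the predicted-to-measured and measured-to-predicted frequency mismatches, and the candidate with the lowest total error wins. The amplitude-dependent weighting of the full procedure in [7] is omitted, and the rho weight is only indicative.

import numpy as np

def twm_error(f0, partials, n_harmonics=10, rho=0.33):
    """Simplified two-way mismatch error for one f0 candidate.
    partials: array of measured partial frequencies (Hz).
    The amplitude weighting of the full TWM procedure [7] is omitted."""
    partials = np.asarray(partials, dtype=float)
    harmonics = f0 * np.arange(1, n_harmonics + 1)   # predicted series
    # predicted-to-measured: each harmonic vs. nearest measured partial
    err_pm = np.mean(np.min(np.abs(harmonics[:, None] - partials[None, :]),
                            axis=1) / harmonics)
    # measured-to-predicted: each partial vs. nearest predicted harmonic
    err_mp = np.mean(np.min(np.abs(partials[:, None] - harmonics[None, :]),
                            axis=1) / partials)
    # penalizing both directions is what discourages octave errors
    return err_pm + rho * err_mp

def estimate_f0(partials, candidates):
    """Return the candidate fundamental with the lowest mismatch error."""
    errors = [twm_error(f0, partials) for f0 in candidates]
    return candidates[int(np.argmin(errors))]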
Each note is thus characterized by the tuple

(Pitch, Dur, PrevPitch, PrevDur, NextPitch, NextDur, Nar1, Nar2, Nar3).
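In Python terms, this representation might be rendered as a named tuple; the field types below are assumptions, since the text does not fix how the Narmour groups Nar1-Nar3 are encoded:

from typing import NamedTuple

class NoteContext(NamedTuple):
    """A note and its melodic context, mirroring the tuple above. Types
    are assumptions; in particular the encoding of the Narmour groups
    Nar1-Nar3 is not fixed by the text."""
    pitch: int         # MIDI pitch of the note
    dur: float         # score duration (e.g. in beats)
    prev_pitch: int    # pitch of the previous note
    prev_dur: float    # duration of the previous note
    next_pitch: int    # pitch of the following note
    next_dur: float    # duration of the following note
    nar1: str          # Narmour group labels the note belongs to
    nar2: str
    nar3: str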
4.2 Algorithm

We are ultimately interested in obtaining a classification function F of the following form:

F(MelodyFragment(n1, ..., nk)) → Performers

where MelodyFragment(n1, ..., nk) is the set of melody fragments composed of notes n1, ..., nk, and Performers is the set of possible performers to be identified. For each performer i to be identified we induce an expressive performance model Mi predicting his/her timing and energy expressive transformations.
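Read this way, identification amounts to model selection: each performer's induced model is asked how well it accounts for the observed deviations in a fragment, and the best-fitting model names the performer. The Python sketch below is a hypothetical rendering of that loop; the predict_deviations method and the squared-error fit measure are illustrative stand-ins, not the scoring rule used in this work.

def identify_performer(fragment, models):
    """F(MelodyFragment) -> Performer, realized as best-fitting model.
    models maps performer name -> induced expressive model M_i; the
    predict_deviations method and the squared-error fit measure are
    illustrative stand-ins."""
    def fit(model):
        errors = []
        for note in fragment:
            pred_stretch, pred_energy = model.predict_deviations(note)
            errors.append((pred_stretch - note.stretch) ** 2 +
                          (pred_energy - note.energy) ** 2)
        return sum(errors) / len(errors)
    return min(models, key=lambda name: fit(models[name]))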
The models are induced with Tilde, a first-order logic extension of the C4.5 decision tree algorithm: instead of testing attribute values at the nodes of the tree, Tilde tests logical predicates. This provides the advantages of both propositional decision trees (i.e. efficiency and pruning techniques) and the use of first order logic (i.e. increased expressiveness). The musical context of each note is defined by the predicates context and narmour. context specifies the note features described above and narmour specifies the Narmour groups to which a particular note belongs, along with its position within a particular group. Expressive deviations in the performances are encoded using the predicates stretch and dynamics. stretch specifies the stretch factor of a given note with regard to its duration in the score, and dynamics specifies the mean energy of a given note. The temporal aspect of music is encoded via the predicates pred and succ. For instance, succ(A, B, C, D) indicates that the note in position D in the excerpt indexed by the tuple (A, B) follows note C.
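As a concrete and purely illustrative rendering in Python, an excerpt could be compiled into ground facts over these predicates as follows; the arities and argument order are assumptions, except for succ, whose convention is taken from the example above:

def encode_excerpt(piece, fragment, notes):
    """Compile one excerpt into ground facts over the predicates in the
    text. Arities and argument order are illustrative assumptions,
    except succ, whose convention follows the example above."""
    facts = []
    for pos, note in enumerate(notes):
        facts.append(("context", piece, fragment, pos, note["features"]))
        facts.append(("narmour", piece, fragment, pos, note["narmour"]))
        facts.append(("stretch", piece, fragment, pos, note["stretch"]))
        facts.append(("dynamics", piece, fragment, pos, note["energy"]))
        if pos > 0:
            # succ(A, B, C, D): the note at position D in the excerpt
            # indexed by the tuple (A, B) follows note C
            facts.append(("succ", piece, fragment, pos - 1, pos))
    return facts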
6 CONCLUSION

In this paper we focused on the task of identifying performers from their playing style using note descriptors extracted from audio recordings. In particular, we concentrated on identifying violinists playing Irish popular pieces (Irish jigs). We characterized performances by representing each note in the performance by a set of inter-note features representing the context in which the note appears. We then induced an expressive performance model for each of the performers and presented a successful performer identification case study. The results obtained seem to indicate that the inter-note features presented contain sufficient information to identify the studied set of performers, and that the machine learning method explored is capable of learning performance patterns that distinguish these performers. This paper presents preliminary work, so there is further work in several directions. Our immediate plans are to extend our database and to test different distance measures for updating the performer scores in our algorithm. We also plan to evaluate our algorithm considering melody fragments of different sizes and to evaluate the predictive power of timing expressive variations relative to energy expressive variations.

7 REFERENCES

[3] Friberg, A., Bresin, R., and Fryden, L. (2000). Music from Motion: Sound Level Envelopes of Tones Expressing Human Locomotion. Journal of New Music Research 29(3): 199-210.

[8] McNab, R.J., Smith, Ll.A., and Witten, I.H. (1996). Signal Processing for Melody Transcription. SIG working paper, vol. 95-22.

[9] Mitchell, T.M. (1997). Machine Learning. McGraw-Hill.

[10] Narmour, E. (1990). The Analysis and Cognition of Basic Melodic Structures: The Implication-Realization Model. University of Chicago Press.

[11] Ramirez, R., Hazan, A., Maestre, E., and Serra, X. A Data Mining Approach to Expressive Music Performance Modeling. In Multimedia Data Mining and Knowledge Discovery. Springer.

[12] Repp, B.H. (1992). Diversity and Commonality in Music Performance: an Analysis of Timing Microstructure in Schumann's 'Traumerei'. Journal of the Acoustical Society of America 104.

[13] Saunders, C., Hardoon, D., Shawe-Taylor, J., and Widmer, G. (2004). Using String Kernels to Identify Famous Performers from their Playing Style. Proceedings of the 15th European Conference on Machine Learning (ECML 2004), Pisa, Italy.