Chapter I
Content-Based Indexing of
Symbolic Music Documents
Nicola Orio
University of Padova, Italy
Abstract
Indexing is the core component of most information retrieval systems, because it allows for a compact representation of the content of a collection of documents, aimed at efficient and scalable access and retrieval. Indexing techniques can also be extended to music, provided that significant descriptors are computed from music documents. These descriptors can be defined as the “lexical units” of music; they depend on the dimensions that are taken into account (melody, harmony, rhythm, timbre) and are related to the way listeners perceive music. This chapter describes some relevant aspects of the indexing of symbolic music documents, giving a review of its basic concepts and going into more detail about some key aspects, such as the consistency with which candidate index terms are perceived by listeners, the effectiveness of alternative approaches to computing indexes, and how individual indexing schemes can be combined by applying data fusion approaches.
Copyright © 2008, IGI Global, distributing in print or electronic forms without written permission of IGI Global is prohibited.
Content-Based Indexing of Symbolic Music Documents
not be directly employed for other media. On the other hand, these specific techniques should be integrated whenever different media are present in an individual item. The core information retrieval (IR) techniques based on statistics and probability theory may be more generally employed outside the textual case and within specific nontextual application domains, like music. This is because the underlying models, such as the vector-space and the probabilistic models, are likely to describe fundamental characteristics shared by different media, languages and application domains (Sparck Jones & Willett, 1997).

The requirement for music content-based IR has been stressed for many years within the research area of music information systems as well. The developments in the representation of music “suggest a need for an information retrieval philosophy directed toward non-text searching and eventual expansion to a system that encompasses the full range of information found in multimedia documents”, as stressed by McLane (1996). As IR has dealt with the representation and the disclosure of content from its early days (van Rijsbergen, 1979), it is natural to consider that IR techniques should be investigated to evaluate their application to music retrieval. Concluding his survey, McLane stressed that “what has been left out of this discussion, and will no doubt be a topic for future study, is the potential for applying some of the standard principles of text information retrieval to music representations”.

Since 1996, many approaches have been applied to music access, browsing, retrieval, and personalization, both proposing original techniques tailored to the music domain and adapting IR techniques (Downie, 2003). Many approaches to music retrieval are related to the field of digital libraries (Bainbridge, Nevill-Manning, Witten, Smith & McNab, 1999; Agosti, Bombi, Melucci & Mian, 2000). Because of their multimedia and multidisciplinary nature, digital libraries may profit from results in music indexing and retrieval. In particular, projects on the preservation and dissemination of cultural heritage can be the result of the combination of digital library and information indexing techniques (Ferrari & Haus, 1999). Yet most of the projects involving digital libraries, such as Harmonica (2006), are still based on bibliographic values rather than on indexing document contents, meaning that research on content-based approaches is required.

Indexing

One of the main components of an IR system is indexing (Baeza-Yates & Ribeiro-Neto, 1999). Indexing can be defined as “the process of analyzing the informational content of records of knowledge and expressing the informational content in the language of the indexing system. It involves: (1) selecting indexable concepts in a document; and (2) expressing these concepts in the language of the indexing system (as indexing items)” (Borko & Bernier, 1978). An indexing system is composed of a number of automatic procedures that allow for the organization of document contents, and for their access, retrieval and dissemination.

Indexes are used as guidance towards items in a collection of documents. In particular, the fact that indexes can be ordered, stored in complex data structures, and accessed with fast techniques such as hashing functions or tree searches allows for efficient retrieval of documents in a collection. The effectiveness of indexes is part of everyday life, as when looking up a word in a dictionary or looking for the content of a book through its list of relevant names and concepts (which is precisely called an “index”). Because it allows fast access to a synthetic description of document content, indexing allows for the scalability of an IR system. Efficient data structures, such as inverted files (Baeza-Yates & Ribeiro-Neto, 1999), have been proposed to connect the indexes, which are used for retrieval, to the documents, which are of interest to the user.

Many approaches to music retrieval are based on online searches, where the user’s query is
compared with the documents in the collection using approximate string matching. For example, approximate string matching has been proposed in one of the earliest papers on music retrieval (Ghias, Logan, Chamberlin & Smith, 1995), while Dynamic Time Warping has been proposed in Hu and Dannenberg (2002). Statistical approaches have been proposed as well, in particular Markov chains (Birmingham, Dannenberg, Wakefield, Bartsch, Bykowski & Mazzoni, 2001) and hidden Markov models (Shifrin, Pardo, Meek & Birmingham, 2002). The advantage of these approaches is that the difference between the query and the documents can be modeled, considering explicitly all the possible mismatches. Thus very high retrieval effectiveness can be achieved. On the other hand, all these techniques require that the string representing the query be matched against all the documents in the collection, giving a complexity that is linear in the number of documents in the collection. Scalability to large collections of millions of documents then becomes an issue.

For this reason alternative approaches have been proposed that take advantage of indexing (Doraisamy & Rüger, 2004; Downie & Nelson, 2000; Melucci & Orio, 2004; Pienimäki, 2002). Moreover, other IR techniques can be applied to music retrieval. For instance, Hoashi, Matsumoto and Inoue (2003) applied relevance feedback to a melodic retrieval task, with the main goal of personalizing the results. The metaphor of navigation inside a collection of documents, which corresponds to document browsing, has also been proposed (Blackburn & DeRoure, 1998). On the other hand, indexing is also widely used to retrieve or recognize music in audio format, in particular for audio fingerprinting and audio watermarking techniques (Cano, Batlle, Kalker & Haitsma, 2005).

This chapter describes some aspects of content-based indexing, as opposed to metadata indexing, giving a review of its basic concepts and going into more detail about some key aspects, such as the consistency with which candidate index terms are perceived by listeners, the effectiveness of alternative approaches to computing indexes, and how individual indexing schemes can be combined by applying data fusion approaches.

Metadata vs. Content-Based Indexing

The first problem that arises when choosing an indexing scheme for a music collection regards the most effective representation of document content, in particular whether documents have to be described by external metadata or directly by a synthetic representation of their content. Both approaches have positive and negative aspects. Metadata usually requires extensive manual work for retrieving external information on the documents and for representing in a compact way most of the subtleties of document content; this increases the cost of indexing and does not guarantee consistency when different documents are indexed by different people. Automatic computation of metadata based on external resources has been proposed in systems for collaborative filtering aimed, for example, at recommendation systems, but the results are in terms of similarity between documents and are biased by the presence of scattered data (Stenzel & Kamps, 2005). At the state of the art they do not seem suitable for a retrieval task. Content-based indexing is carried out starting from a set of features extracted automatically from the document itself, and it is the main focus of this chapter.

Metadata

For most media, such as images and video, the choice of textual metadata proved to be particularly effective. Textual metadata as a tool to describe and index music is a natural choice that has been made for centuries (Dunn & Mayer, 1999). In general metadata, especially in the form
of semantic labels, are a very compact representation of the document content, because they can summarize a complete document with few keywords. Though not the main subject of this discussion, it is hence worth spending a few words on music indexing through metadata and on its limitations for an efficient and effective retrieval task. A number of music digital libraries are accessible through the use of metadata. For instance, Cantate (2006) and Musica (2006) allow users to access choral music using metadata and lyrics. Another project based on the use of metadata is Jukebox (Harvell & Clark, 1995).

As for many other media, music metadata addresses different characteristics of document content. In the particular case of music, it can be roughly divided into three categories:

• Bibliographic values: Author’s name, performer’s name (in the case of audio recordings), title, year of publication, editor, cataloguing number.
• General information on document content: Time and key signatures, musical form, structure, music genre, orchestration.
• Additional available information: Lyrics and, if applicable, related documents that create a context for the music work (e.g., a drama, a movie, a poem).

The search through bibliographic values can be very effective in terms of retrieval effectiveness, even if in this case a database approach to match exact values in predefined fields would be more suitable than an IR approach. On the other hand, the user is required to have a good knowledge of the music domain, being able to clearly describe the documents of interest. In the case of tonal Western music, the title of a music work is often nondescriptive, describing part of the general information on document content, and it is typical to have titles such as “Sonata in D flat, Op. 5” or “Fugue”. The title can be based on some music features also in other music genres: terms like “bossa”, “waltz”, and “blues” appear often in jazz compositions with that particular feature, and terms like “jig” and “reel” are often part of the title of the respective dances in the Irish music tradition.

General information is often too generic to be a good discriminator between different music works. For example, in tonal music there are only 21 major and 21 minor different tonalities, while thousands of compositions of tonal Western music can be labelled with the term “cantata” or “concerto”, and the same applies to terms such as “up tempo” or “slow” for pop and rock genres. The genre information itself groups together hundreds of thousands of different works. Another problem that arises with metadata on general information is that the terminology is not consistent across genres and historical periods. For example, the term “sonata” has different meanings for Baroque and Romantic repertoires and the term “ballad” refers to different characteristics in jazz and in folk music. This kind of metadata can be useful to refine the description of a music information need, but it can hardly be used to completely define it. Moreover, a preliminary study on users’ information needs (Lee & Downie, 2004) showed that users are interested in retrieving songs by their specific content.

Additional information in the form of lyrics, when present, can be particularly useful to describe an information need, yet in this case the retrieval of music documents becomes an application of textual IR. Contextual information, such as the movie where a particular soundtrack has been used, or the poem that inspired a particular composition, can be very helpful as well to describe a user information need. In many cases the information need is motivated by the contextual information itself (for example, a user may be searching for the theme song of a TV series or for the music of a known ballet), yet this kind of contextual information applies only to a small percentage of music documents.
What is normally missing in music metadata is a textual description of the document content other than its musical structure. This situation is peculiar to the music language and is due to the fact that music, unlike text, images, speech, video or 3D models, is not aimed at describing something with a known semantics. This is probably the main limitation of the use of metadata for music indexing, and it is the motivation for the number of content-based approaches proposed in recent years, compared to textual metadata approaches. Moreover, it has to be considered that music representation is aimed at giving directions to performers and, at least for Western music, is biased by the characteristics of music scores, which allow only a limited representation of high-level characteristics (Middleton, 2002).

Music Dimensions

Music has a multidimensional nature. Rhythm, melody and harmony are all well-known dimensions that capture distinctive features of a music document. These dimensions are conveyed explicitly by music scores and recognized easily by listeners of audio recordings, and can be defined as the canonical dimensions of music, because they are used extensively by music theorists and musicologists as tools to describe, analyze, and study music works. Another perceptually relevant music dimension is timbre, which is related to the quality of sounds and is conveyed only by audio recordings. Yet timbre is a multidimensional feature by itself, and can be described using a set of continuous parameters, such as spectral power energy or Mel-Cepstrum Coefficients, and by perceptually based features such as spectral centroid, roughness, and attack time. As stated by Krumhansl (1989), timbre remains a difficult dimension to understand and represent, though studies have been carried out on the perception of timbre similarity (Berenzweig, Logan, Ellis & Whitman, 2004). The discussion on content-based music indexing will be limited to canonical dimensions, in particular the ones that may have a symbolic representation, which is more suitable for the creation of an index of a music collection. For any chosen dimension, the indexing scheme has to be based on a suitable definition of the particular lexical units of the dimension and their representation. A taxonomy of the characteristics of music and their potential interest for users is reported in Lesaffre, Leman, Tanghe, De Baets, De Meyer and Martens (2003).

The representation of the melody can build upon traditional score representation, which is based on the drawing of a sequence of notes, each one with a given pitch and a duration relative to the tempo of the piece. This symbolic representation is particularly suitable for indexing, provided that the melodic lexical units are highlighted. This is a difficult task even for musicians and music scholars; the results of a perceptual study on manual segmentation are presented in a following section. The representation of rhythm can be considered as a variation of melodic representation, where pitch information can be discarded or substituted with the information of the particular percussive instrument that plays each rhythmic element. The indexing of the harmonic dimension can likewise be based on common chord representation. In this case there are alternative representations, from figured bass to functional harmony and chord names. An overview of chord representations, aimed at their annotation, is presented in Harte, Sandler, Abdallah and Gómez (2005). The segmentation of chords into their lexical units can be based on notions of harmony, including modulations, cadences, and the use of particular chord progressions in different music genres.

The analysis of different dimensions and their representation as building blocks of music documents may be of interest also for musicologists, composers and performers. To this end, it is interesting to cite Humdrum (Huron, 1995), which apart from retrieval allows a number of manipulations for analyzing music scores.
Figure 1. Possible outcomes of the lexical analysis of a simple melody; the bars in grey are alternative segmentations in melodic lexical units
Even after a sequence of features has been extracted automatically from a music document, lexical analysis of music documents remains a difficult task. The reason is that music language lacks explicit separators between candidate index terms for all of its dimensions. Melodic phrases are not delimited by particular signs or sounds that express the presence of a boundary between two phrases. The same applies to harmonic progressions and rhythmic patterns. In all these cases there is no additional symbol that expresses the ending of a lexical unit and the beginning of the next one. This is not surprising, because the very concept of lexical unit is borrowed from the textual domain, and it is not part of the traditional representation of music documents. Even if there is a wide consensus in considering music as a structured organization of different elements, and not just a pure sequence of sounds, there was no historical need to represent this aspect directly. Music is printed for musicians, who basically need the information to create a correct performance, and who could infer the presence of basic elements from the context. Different approaches have been proposed for lexical analysis, considering musical patterns (Hsu, Liu & Chen, 1998), main themes (Meek & Birmingham, 2003), or musical phrases (Melucci & Orio, 1999).

The consistency between musicians in performing the lexical analysis of some monophonic written scores has been investigated in a perceptual study, which is presented in the next section. The lexical analysis of music documents is still an open problem, both in terms of musicological analysis, because alternative theories have been presented, and in terms of indexing and retrieval effectiveness. As an example, Figure 1 reports possible segmentations of the same musical excerpt.

Stop-Words Removal

Many words that are part of a textual document have only a grammatical function, and do not express any semantics. In most languages articles, conjunctions, prepositions and so on can be deleted without substantially affecting the comprehension of the text. Moreover, if after indexing a document is described by a simple list of terms in a given order (e.g., alphabetical), the fact that the document contained a particular conjunction does not give any additional information on the document content. These words can thus be removed, or stopped (hence the term “stop-words”), from the output of the lexical analysis without affecting the overall performance of an indexing system. Given that stop-words of this kind are very frequent in textual documents, their removal improves system performance in terms of the storage needed for the indexes and thus of the computational cost of retrieval. For any particular language a list of stop-words, named a stop-list, can be derived from a priori knowledge of the grammatical rules.

Stop-words removal can be applied also to words that, though carrying a meaning that could be used to describe the document content, are extremely frequent inside a collection of documents. For example, the lexical analysis of a set of documents on music processing will very likely
show that all documents contain words such as “music”, “computer”, “note”, “algorithm”, and so on. Moreover, these words are probably evenly distributed across the collection, and their contribution to specifying the content of a particular document with respect to the others is very low. Also in this case, a collection-dependent stop-list can be created, and words belonging to the stop-list can be ignored in subsequent phases of document indexing. The stop-list can be computed automatically by analyzing a representative sample of the collection, adding to the stop-list all the words that consistently appear in all (or in a high percentage) of the analyzed documents. Clearly, this kind of analysis would highlight also the words that have a grammatical function and no semantics, as described above, thus a two-step removal of stop-words can be avoided. Nevertheless, the designer of an IR system can choose to remove only the frequent and uninformative words, keeping the ones that are only frequent.

It is difficult to state whether or not a musical lexical unit has a meaning in order to create a priori a stop-list of musical lexical units that can be ignored during indexing. It is preferable to face the problem by considering how well a particular unit discriminates between different music documents. For instance, in the case of indexing of melodic intervals, a lexical unit of two notes that form a major second is likely to be present in almost all of the documents, and is thus not a good index in the case of a collection of “cantate” of tonal Western music, and probably for any collection of music documents. A single major chord is unlikely to be a good discriminator as well. Depending on the particular set of features used to index a music collection, the designer of the indexing and retrieval engine can make a number of choices about the possible stop-list of lexical units.

The choice of the particular stop-list to use, if any, could be driven by both musicological and computational motivations and by the characteristics of the music collection itself. A statistical analysis of the distribution of lexical units across documents may highlight the potential stop-words that can be used. It has to be noted that this approach is not usually exploited in the literature of music indexing and retrieval. The term “stop-list” is quite infrequent in music retrieval, and the common approach is to select the parameters carefully to avoid the computation of lexical units that are believed to be uninformative about the document content. What is important for this discussion is to highlight the fact that not all the lexical units are equally informative about the document content and its differences from other documents in the collection (which is the aim of term weighting, described below) and that some lexical units may be totally uninformative, as a sort of background noise.

Many words, though different in the way they are spelled, can be considered as different variants that stem from a common morphological root. This is the case of the English words “music”, “musical” (adjective and substantive), “musicology”, “musician”; the number of variants may increase if singular and plural forms are taken into account, together with the gender information (which does not apply to English but applies to most European languages) and other possible variants which are peculiar to some languages. Moreover, in many languages verbs are conjugated, that is, the root of the verb is varied depending on mood, person and tense. Thus a textual document may contain different word variants, which are identified as different by lexical analysis but share a similar meaning. Intuitively, it can be considered that a textual document could be relevant for a given information need even if it does not contain the
exact term chosen by the user for his query, but it contains other variants. For example, a document that contains many times the words “computing”, “compute”, and “computation” is likely to address the subject of “computers” even if this exact word is missing. On the other hand, many words, even though stemming from the same root, evolved to express different meanings.

The basic idea of stemming is to conflate into a single index all the words that have slightly different meanings but stem from a common morphological root. The positive effects are a generalization of the concepts, which are carried by the stems and not by the single words, and a lower number of index terms. The higher generalization is expected to improve recall, at a probable cost of lowering precision. There are different approaches to automatic stemming, depending on the morphology of the languages and on the techniques used. The research on stemming is still very active, in particular for languages other than English that have a rich morphological structure, with derivations expressed by prefixes, infixes and suffixes.

Application to the Music Domain

The idea behind stemming is that two indexes may be different but can be perceived or considered as similar. Analogously, two musical lexical units may be slightly different, yet listeners can perceive them as almost identical, or confuse one with the other when recalling from memory, or consider that they play a similar role in the musical structure. For instance, two identical rhythmic patterns played with a different tempo and small variations in the actual onset time, two musical phrases that differ only in one interval that turns from major to minor, and chord progressions where one chord is substituted by another with a similar function, as is routinely done in jazz music, are all situations where stemming may become useful. In practice, all the perceptually similar variants could be conflated into a common stem.

It can be noted that the analogue of stemming is regularly carried out in many approaches to music retrieval, where it is normally referred to as feature quantization. The main motivation of feature quantization in music processing is probably related to the fact that each feature extraction process is error prone: quantization partially overcomes this problem if erroneous measurements are mapped to the same quantized value as the correct one. For example, because pitch detectors are known to produce octave errors, a solution that has often been proposed in the literature is to represent only the name of the notes, with any accidentals, and not their actual octave (Birmingham et al., 2001). Automatic chord detection from polyphonic audio signals is still very error prone, thus quantization to a fixed number of chords (for example, triads only) may help remove part of the measurement noise. Quantization can also be useful when the automatic detection is reliable but it is known in advance that the signal itself may have variations, as in the case of note onsets and note durations, which vary even across performances of the same score.

Quantization can be useful also as a stemming procedure. It is well known that many compositions are based on a limited amount of music material, which is presented and then varied and developed during the piece. In this case, the conflation of different thematic variations into a single index will improve recall, because the user may choose any of these variations to express the same information need. Quantization can be carried out on any music dimension, and at different levels. Table 1 shows possible approaches to the quantization of melodic intervals, some of them already proposed in the literature, from the most fine-grained to the coarsest. Figure 2 gives a graphical representation of the amount of information that is lost through quantization, in particular when melodic or rhythmic information is quantized to a single level and thus discarded.
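
The conflation of slightly different lexical units through quantization can be sketched as follows. The code maps melodic intervals to two of the levels listed in Table 1 (semitones and direction) and shows that two phrases differing only in one interval that turns from major to minor are distinct at the semitone level but share the same coarse representation. The function names, the use of MIDI note numbers and the two example phrases are hypothetical, chosen only for illustration.

```python
def to_semitones(pitches):
    """Intervals in semitones between consecutive MIDI pitches."""
    return tuple(b - a for a, b in zip(pitches, pitches[1:]))


def to_direction(intervals):
    """Coarsest pitch quantization: up (U), down (D), same (S)."""
    return "".join("U" if i > 0 else "D" if i < 0 else "S" for i in intervals)


# Two hypothetical phrases: the second replaces a major third (4 semitones)
# with a minor third (3 semitones).
phrase_major = [60, 64, 67, 67, 65]
phrase_minor = [60, 63, 67, 67, 65]

semi_major = to_semitones(phrase_major)  # (4, 3, 0, -2)
semi_minor = to_semitones(phrase_minor)  # (3, 4, 0, -2)

print(semi_major == semi_minor)                            # False
print(to_direction(semi_major), to_direction(semi_minor))  # UUSD UUSD

# Alphabet sizes for intervals within an octave, unison included.
print(len(range(-12, 13)))   # 25 semitone symbols
print(len({"U", "D", "S"}))  # 3 direction symbols
```

At the semitone level the two phrases would be indexed under different terms, while at the direction level they are conflated into one, which illustrates the trade-off between precision and recall described in the text.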
Table 1. Number of different indexes when quantization is applied to ascending and descending intervals within an octave (including unison)

Quantization Level                                    #symbols
Cents                                                 2401
Semitones: 0, +1, +2, …                               25
Music intervals: unison, second, third, …             15
Perceptual intervals: unison, small, medium, large    9
Direction: up, down, same                             3

As for stemming, quantization may improve recall because more documents may contain a quantized lexical unit. The increase in recall usually corresponds to a lowering in precision, because a quantized lexical unit is usually more generic and describes a user information need less precisely. From a computational point of view, quantization may also speed up retrieval, because the decrease in the number of different symbols used as building blocks of the index corresponds to a decrease in the computational cost of performing the matching. This characteristic has been exploited in a system for melodic retrieval (Ghias et al., 1995) where only three levels have been used. The authors reported that, in order to overcome the problem of generic information needs, the user had to provide more complete information, in particular using long melodic excerpts to create his query.

A similar approach to quantization can be carried out on rhythmic information. In this case it has to be noted that the score representation itself is a quantized version of possible performances, because it is not expected that onset times and durations are played using the exact values computed from the beats per minute (when that happens the performance sounds mechanical). In the case of scores that are transcriptions of pre-existing performances, the transcriber chooses the level of quantization as a compromise between the readability of the score and the precision of the reported times. Rhythmic quantization for index conflation can be carried out with an approach similar to melodic quantization, from milliseconds to very coarse levels (such as the levels long and short). A number of approaches to melodic indexing do not take into account note durations, but are based only on pitch information, and they can be considered as a limit case where there is only one level of quantization for note onsets and durations.
Figure 2. Graphical representation of the information loss when increasing levels of quantization of
pitch and duration are applied (from top: original score, removal of rhythm, pitch quantization, removal
of pitch)
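
Rhythmic quantization can be sketched in the same spirit. Assuming that performed note durations are given in seconds and the tempo in beats per minute, the first function below snaps each duration to the nearest multiple of a grid expressed as a fraction of a beat, and the second applies a very coarse two-level scheme (long and short). The grid resolution, the one-beat threshold and the example values are assumptions made for illustration only.

```python
def quantize_durations(durations_sec, bpm, grid=0.25):
    """Snap durations (seconds) to the nearest multiple of `grid` beats."""
    beat_sec = 60.0 / bpm
    quantized = []
    for d in durations_sec:
        steps = max(1, round(d / beat_sec / grid))  # at least one grid step
        quantized.append(steps * grid)
    return quantized


def long_short(durations_beats, threshold=1.0):
    """Two-level quantization: 'L' if at least `threshold` beats, else 'S'."""
    return "".join("L" if d >= threshold else "S" for d in durations_beats)


# Hypothetical performance at 120 bpm (one beat = 0.5 s); the score would read
# quarter, eighth, eighth, half.
performed = [0.52, 0.23, 0.27, 0.97]
beats = quantize_durations(performed, bpm=120)
print(beats)              # [1.0, 0.5, 0.5, 2.0]
print(long_short(beats))  # LSSL
```

With a sixteenth-note grid the small timing deviations of the performance are absorbed and the durations written in the score are recovered; the two-level scheme discards most of the remaining detail, in the same way as the direction level does for pitch.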
Figure 3. Parallel processing of documents and queries aimed at retrieval of potentially relevant documents

involve the application of noise reduction and pitch tracking techniques (de Cheveigné & Baskind, 2003), in the case of audio queries, followed by an approach to segmentation that can be carried out with the same technique used for document segmentation or with different techniques tailored to the peculiarities of the queries. Figure 3 represents the parallel processing of documents and queries aimed at computing the RSV for ranking relevant documents.

Efficiency and Effectiveness

It can be argued that all these steps, although useful for indexing textual documents, are not necessary for a music retrieval task that can be solved directly within an approximate string matching framework, as mentioned in the introduction of this chapter. For instance, the main melodies of music documents can be represented by arrays of symbols, where the number of different symbols depends on the kind of quantization applied to melodic and rhythmic information. The user's query is normally an excerpt of a complete melody and thus can undergo the same representation. Retrieval can then be carried out by measuring the similarity (or the distance) between the two strings, ranking the retrieved documents by decreasing similarity to (or increasing distance from) the query. The complexity of these techniques is linear in the size of the document collection, because all the documents have to be matched against the query.

Indexing techniques do not require this exhaustive comparison, and in fact the main motivation behind indexing is its efficiency and scalability, also for very large collections of documents. Let us consider each index term as a pointer to the list of the documents that contain it. It is assumed that the number of documents in each list is small compared to the number of documents in the collection, apart from stop-words, which are usually not used as index terms. This assumption is surely true for textual documents, but it also applies to music documents because melodies have different thematic material. Index terms can then be stored in efficient data structures, such as hash tables that can be accessed in constant time or binary search trees that can be accessed in logarithmic time.

The efficiency gained through indexing is partly offset by a loss in retrieval effectiveness. The main issue is that, in order to be efficient, access to the data structure requires an exact match between document and query index terms. While this assumption is reasonable for textual documents, because the user is expected to spell the words of the query correctly, in the music domain there are many sources of mismatch that may affect retrieval effectiveness. A melodic query can either contain errors, due to imprecise recall of the melody, or be a different variant of a particular theme. These differences may affect the way index terms are computed from the query and the way they are represented. For this reason, some peculiar aspects of music document indexing are addressed in more detail.
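The idea of an index term as a pointer to the list of documents that contain it can be sketched with a hash table. This is a minimal illustration, assuming that documents and queries have already been segmented into hashable index terms; the function names and the simple match-count score are hypothetical and stand in for a real RSV.

```python
from collections import defaultdict


def build_index(documents):
    """Build an inverted index: index term -> set of document ids.

    `documents` maps a document id to an iterable of index terms.
    """
    index = defaultdict(set)
    for doc_id, terms in documents.items():
        for term in terms:
            index[term].add(doc_id)
    return index


def retrieve(index, query_terms):
    """Rank documents by the number of query terms they match exactly.

    Only the postings lists of the query terms are visited, so documents
    sharing no term with the query are never examined.
    """
    scores = defaultdict(int)
    for term in set(query_terms):
        for doc_id in index.get(term, ()):
            scores[doc_id] += 1
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical documents described by contour-like index terms.
docs = {"a": ["UUD", "UDS"], "b": ["UDS", "SSD"]}
index = build_index(docs)
print(retrieve(index, ["UDS", "SSD"]))
```

The lookup requires exact matches: a single wrong symbol in a query term means that term contributes nothing, which is the effectiveness issue raised above.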
Table 2. Complete list of the music works used for the segmentation test; for each score the length in bars is reported.

No.  Title                                 Bars
J. S. Bach
1    Sinfonia Cantata no. 186, Adagio      7
2    Orchestral Suite no. 3, Aria          6
3    Orchestral Suite no. 2, Bourrée       13
4    Cantata BWV 147, Choral               26
5    Preludium no. 9, BWV 854              8
L. van Beethoven
6    Symphony no. 5, 4th movement          22
7    Symphony no. 7, 1st movement          21
8    Sonata no. 14, 3rd movement           12
9    Sonata no. 7, Minuetto                17
10   Sonata no. 8, Rondò                   18
F. Chopin
11   Ballade no. 1, op. 23                 11
12   Impromptu op. 66, 2nd movement        16
13   Nouvelle Etude no. 3                  21
14   Waltz no. 7                           16
15   Waltz no. 9                           17
W. A. Mozart
16   Concerto no. 1, K313                  10
17   "Don Giovanni", Aria                  18
18   "Le Nozze di Figaro", Aria            10
19   Sonata no. 11, K331                   18
20   Sonata no. 9, K310                    22

concerning: played musical instrument, years of music practice, expertise on music analysis and knowledge of the proposed melodic excerpts.

The major direction given to the subjects was an operative definition of lexical units, which were expressed as "the musical phrases, or musical gestures, in which a melody can be divided during its performance, playing a similar role of words in the spoken language." The example annexed to the test showed some possible musical phrases, while stating that subjects may disagree with the particular choices. The instructions suggested using two different graphic signs to be drawn at the end of a musical phrase, a single and a double bar, indicating respectively the presence of a normal or of a strong boundary between lexical units.

The test package was given to the subjects, who had complete freedom in carrying out the test. There was no maximum time for returning the completed tests. Moreover, subjects were allowed to help themselves by playing the excerpts on their instrument, and to correct previous choices. Even though it was roughly estimated that the test would take about 20 minutes for each excerpt, subjects kept the test at home for more than one month. This was because almost all of them claimed that the task of segmenting the excerpts was tiring and time-consuming.

Analysis of the Results

The first, quite surprising, result was that more than half of the subjects followed the given instructions only partially and indirectly provided useful feedback. As reported in the previous section, subjects were asked to put a marker, by drawing a single or double bar, between two subsequent notes of the score to highlight the presence of a boundary in the melodic surface. Hence, the instructions implicitly assumed that melodic lexical units do not overlap.

Some subjects (8 out of 17) disregarded this assumption and invented a new sign (different among subjects, but with the same meaning) that clearly indicated that some notes were both the last of a lexical unit and the first of the next one. This result implies that, at least for these subjects, the concept of melodic contour cannot be applied unless we take into account the fact that contours may overlap by at least one note. Another result is that subjects very seldom highlighted the presence of a strong boundary by drawing a double marker. Double markers represent 4.5% of the overall number of markers (including the ones used for overlapping phrases), thus preventing a
quantitative analysis of strong separators between musical phrases. This result can be partially explained considering that, in most cases, musical excerpts were too short to allow the presence of strong separators. It is likely that the musicologist who suggested which music works should be used for the test decided to truncate each excerpt in coincidence with the first strong separator.

A visual representation of the position and number of markers along the score helped in visualizing the subjects' behavior. Arrays of weights have been assigned to each subject, one for each score; the elements of the arrays correspond to the spaces between two subsequent notes. The following weighting rules were applied for array a_s[i] of subject s, where the index i indicates the position in the score, m(i) is the kind of marker (if any) at position i in the score, M_n is a normal marker and M_o an overlapping marker:

\[
a_s[i] =
\begin{cases}
1 & \text{if } m(i) = M_n \\
0.5 & \text{if } m(i) = M_o \text{ or } m(i+1) = M_o \\
0 & \text{elsewhere}
\end{cases}
\]

For each score, the sum of all the weights assigned by each subject has been computed, allowing for the representation of each score as a histogram (or a curve) where note numbers are on the X-axis and weight sums are on the Y-axis. Peaks in the histograms correspond to positions where many of the subjects put a marker, while low values, that is, the noise between peaks, correspond to positions where subjects disagreed.

Figure 4 reports two representative histograms of subjects' choices. Excerpt No. 3 (on the left) shows that subjects substantially agreed in putting markers in a single position, which corresponded to the only three-quarters note in a continuous flow of one-octave notes, while their concordance is very low even though all of them perceived the presence of boundaries (weights are nonzero). An opposite behavior can be observed for excerpt No. 15, where there is a high concordance among subjects around particular notes, corresponding to long notes with a duration about four times that of the surrounding notes. An important characteristic, highlighted by Figure 4, is that peaks are often surrounded by positions with high concordance: most of the subjects perceived the presence of a boundary in the melodic contour, but they often disagreed by judging a given note either as the last of a phrase, or as the first of the next one (or both in case of overlapping markers).
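The weighting rules and the per-score histogram can be sketched as follows. This is an illustration only: the marker codes `"n"` and `"o"`, the dictionary encoding of a subject's markers, and the function names are assumptions, not the format used in the study.

```python
NORMAL, OVERLAP = "n", "o"  # hypothetical codes for M_n and M_o


def weight_array(markers, length):
    """Return a_s for one subject and one score.

    `markers` maps a position i to NORMAL or OVERLAP; positions without
    a marker are absent. `length` is the number of positions in the score.
    """
    a = [0.0] * length
    for i in range(length):
        if markers.get(i) == NORMAL:
            a[i] = 1.0
        elif markers.get(i) == OVERLAP or markers.get(i + 1) == OVERLAP:
            a[i] = 0.5
    return a


def histogram(arrays):
    """Sum the weight arrays of all subjects, position by position."""
    return [sum(column) for column in zip(*arrays)]


subject_1 = weight_array({2: NORMAL, 5: OVERLAP}, 7)
subject_2 = weight_array({2: NORMAL}, 7)
print(subject_1)
print(histogram([subject_1, subject_2]))
```

An overlapping marker spreads its weight over two adjacent positions, which is one way peaks in the histogram end up flanked by smaller nonzero values.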
Figure 4. Frequency by which subjects highlighted a boundary between lexical units for Excerpt No. 3
(left) and Excerpt No. 15 (right)
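Weight arrays of this kind also allow a pairwise comparison of subjects, as in the quantitative analysis of this study. The sketch below assumes one particular normalization, the inner product P[s,t] divided by the mean of P[s,s] and P[t,t]; this is an assumption chosen so that identical segmentations give distance 0 and segmentations with no marker in common give distance 1, and it should not be read as the exact formula used by the author.

```python
def inner(a, b):
    """Inner product P[s,t] of two weight arrays of equal length."""
    return sum(x * y for x, y in zip(a, b))


def distance(a_s, a_t):
    """Distance between two segmentations of the same score.

    Assumed normalization: D = 1 - 2 P[s,t] / (P[s,s] + P[t,t]).
    Two empty segmentations are treated as identical (distance 0).
    """
    denominator = inner(a_s, a_s) + inner(a_t, a_t)
    if denominator == 0:
        return 0.0
    return 1.0 - 2.0 * inner(a_s, a_t) / denominator


def distance_matrix(arrays):
    """Symmetric matrix of distances between all pairs of subjects."""
    return [[distance(a, b) for b in arrays] for a in arrays]


a1 = [0.0, 1.0, 0.0, 0.5, 0.5]
a2 = [1.0, 0.0, 0.0, 0.0, 0.0]
print(distance(a1, a1), distance(a1, a2))
```

A matrix built this way can be passed to standard cluster analysis or multidimensional scaling routines.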
Quantitative analysis has been carried out by computing a distance measure between subjects. To this end a symmetric matrix of distances D between pairs of segmentations made by the subjects has been computed for each excerpt, according to the formula:

\[
D[s,t] = 1 - \frac{2\,P[s,t]}{P[s,s] + P[t,t]}
\quad \text{with} \quad
P[s,t] = a_s^{T} \cdot a_t
\]

Hence D[s,t] = 0 means that the judgments of subjects s and t are perfectly equal and D[s,t] = 1 means that the judgments of subjects s and t do not have any marker in common. Cluster analysis and multidimensional scaling have been carried out using the proposed distance function, highlighting that the group of subjects was uniform, without any cluster of subjects.

A feature of interest for application in the information retrieval domain is the typical length of lexical units. The average length varied considerably depending on the subject and on the excerpt. Yet none of the subjects indicated a lexical unit of length one. Furthermore, only two subjects indicated lexical units two notes long, while for four subjects the minimum length of a lexical unit was three notes. The rest of the subjects indicated a minimum length between four and five notes. On the other hand, subjects did not show the same agreement regarding the maximum length of musical phrases. Apart from subject No. 11, who indicated a musical phrase of 38 notes in excerpt No. 4 (clearly indicating the reasons for this choice, which therefore cannot be considered an error), the maximum length of musical phrases lies between 8 and 18 notes.

Results of the Perceptual Study

The results of the perceptual study showed that subjects agree on perceiving a boundary between lexical units only when there are strong cues. In particular, the presence of long notes surrounded by short ones seems to give the strongest evidence of a boundary between two lexical units. In other cases, like the example shown in Figure 4 for excerpt No. 3, different strategies can be applied by subjects in defining the presence of a boundary between lexical units. The fact that cluster analysis did not highlight any particular group of subjects suggests that subjects changed their strategies according to the excerpt to be segmented, but no trend can be highlighted.

These results show that melodic segmentation is a complex task, and that the concept of lexical unit is not as well defined as it is for text where, at least for most Western languages, the organization of sentences in words, and the existence of clear separators between them, allows for an easy computation of indexing terms. It has to be considered that the perceptual study has been carried out using only melodic information, and results could be different for other dimensions. For instance, the segmentation of the harmony may take advantage, at least for musicians and musicologists, of the theory on chord progressions and cadences, while the segmentation of rhythm may be carried out considering that rhythmic patterns tend to repeat almost exactly, allowing for an easier identification and subsequent segmentation.

An Experimental Comparison of Melodic Segmentation Techniques

Given that music is a continuous flow of events without explicit separators, automatic indexing needs to rely on automatic segmentation techniques, that is, techniques that automatically detect the lexical units of music documents. Different strategies of melodic segmentation can be applied, each one focusing on particular aspects of music information. A study has been carried out on the effectiveness, in terms of retrieval performance, of different approaches to segmentation. The study has been limited to melodic segmentation, because as already stressed melody is the
most used dimension in music retrieval. Another interesting comparison of approaches to music retrieval has been presented in Hu and Dannenberg (2002), where the focus was on alternative representations for a dynamic programming approach, from the points of view of both retrieval effectiveness and computational cost. In the presented study the computational costs of the tested approaches were comparable, and thus they are not reported.

The organization of a number of evaluation campaigns by the research community working on the different aspects of music access, retrieval, and feature extraction (IMIRSEL, 2006), which started in 2005 (preceded in 2004 by an evaluation effort on audio analysis), will increasingly allow for the comparison of different approaches to music indexing, using standard collections (Downie, Futrelle & Tcheng, 2004).

Approaches to Melodic Segmentation

The approaches to music segmentation can be roughly divided into two main groups: the ones that highlight the lexical units using only the document content, and the ones that exploit prior information about music theory and perception. Four different approaches, two for each group, have been tested.

Fixed-Length Segmentation (FL)

The simplest segmentation approach consists of the extraction from a melody of subsequences of exactly N notes, called N-grams (Downie & Nelson, 2000). N-grams may overlap, because no assumption is made on the possible starting point of a theme, nor on the possible repetitions of relevant music passages. The strength of this approach is its simplicity, because it relies neither on assumptions drawn from theories of music composition or perception, nor on the analysis of complete melodies. The exhaustive computation of FL units is straightforward, and can be carried out in linear time. The idea underlying this approach is that the effect of musically irrelevant N-grams will be compensated by the presence of all the musically relevant ones. It is common practice to choose small values for N, typically from 3 to 7 notes, because short units give higher recall, which is considered more significant than the consequent loss in precision. Fixed-length segmentation can be extended to polyphonic scores, with the aim of extracting all relevant monophonic tokens from concurrent multiple voices (Doraisamy & Rüger, 2004).

Data-Driven Segmentation (DD)

Segmentation can be performed considering that typical passages of a given melody tend to be repeated many times (Pienimäki, 2002). The repetitions can simply be due to the presence of different choruses in the score or can be related to the use of the same melodic material along the composition. Each sequence that is repeated at least K times (normally twice) is usually called a pattern, and is used for the description of a music document. This approach is called data-driven because patterns are computed only from the document data without exploiting knowledge on music perception or structure. This approach can be considered as an extension of the N-grams approach, because DD units can be of any length, with the limitation that they have to be repeated inside the melody; subpatterns that are included in longer patterns are discarded if they have the same multiplicity. Patterns can be computed from different features, like pitch or rhythm, each feature giving a different set of DD units to describe document content. Patterns can be truncated by applying a given threshold, to reduce the size of the index and to achieve a higher robustness to local errors in the query (Neve & Orio, 2004). The extension to polyphonic scores can be carried out similarly to the FL approach.
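Fixed-length segmentation, as described above for the FL approach, amounts to sliding a window of N notes over the melody. A minimal sketch, assuming a melody is a list of (pitch, duration) pairs; the function name and the example values are hypothetical.

```python
def ngrams(melody, n=3):
    """Return all overlapping subsequences of exactly n notes.

    A melody of L notes yields L - n + 1 units (none if L < n), so for a
    fixed n the cost grows linearly with the melody length.
    """
    return [tuple(melody[i:i + n]) for i in range(len(melody) - n + 1)]


# Hypothetical melody: (MIDI pitch, duration in beats).
melody = [(60, 1.0), (62, 0.5), (64, 0.5), (65, 1.0), (67, 2.0)]
for unit in ngrams(melody, n=3):
    print(unit)
```

Because consecutive units share N-1 notes, a single wrong note in a query corrupts at most N of its units and leaves the others intact.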
Figure 5. Graphical representation of different automatic segmentation (from the top: PB, a statistical
approach not tested, and MO)
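The data-driven (DD) idea described earlier, keeping sequences repeated at least K times and discarding subpatterns that have the same multiplicity as a longer pattern containing them, can be sketched by brute force. This is not the algorithm of the cited works: the function name, the length bounds, and the counting of overlapping occurrences are assumptions made to keep the example short.

```python
from collections import Counter


def repeated_patterns(melody, k=2, min_len=2, max_len=5):
    """Return {pattern: multiplicity} for patterns repeated at least k times.

    Occurrences may overlap. `max_len` plays the role of a truncation
    threshold on pattern length.
    """
    counts = Counter()
    for n in range(min_len, max_len + 1):
        for i in range(len(melody) - n + 1):
            counts[tuple(melody[i:i + n])] += 1
    patterns = {p: c for p, c in counts.items() if c >= k}

    def contained(short, long):
        """True if `short` occurs as a contiguous run inside `long`."""
        return any(long[i:i + len(short)] == short
                   for i in range(len(long) - len(short) + 1))

    kept = {}
    for p, c in patterns.items():
        subsumed = any(len(q) > len(p) and cq == c and contained(p, q)
                       for q, cq in patterns.items())
        if not subsumed:
            kept[p] = c
    return kept


# Hypothetical sequence of quantized symbols (e.g. pitch intervals).
print(repeated_patterns([1, 2, 3, 9, 1, 2, 3, 8, 1, 2]))
```

In the example, (2, 3) is dropped because it occurs exactly as often as (1, 2, 3), while (1, 2) is kept because it occurs once more on its own.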
The effect of alternative approaches to segmentation is shown in Figure 5, where the lexical units highlighted by different algorithms are graphically shown. The algorithms are the ones included in the MidiToolbox (Eerola & Toiviainen, 2004) and correspond, from the top, to PB, to a probabilistic approach not tested in the present study, and to MO.

The comparison has been carried out according to the Cranfield model for information retrieval. A test collection of popular music has been created with 2310 MIDI files as music documents. MIDI is a well-known standard for the representation of music documents that can be synthesized to create audible performances (Rothstein, 1991). MIDI is becoming obsolete both as a format for the representation of music to be listened to, because of the widespread diffusion of compressed audio formats such as MP3, and as a format for representing notated music, because of the creation of new formats for analyzing, structuring and printing music (Selfridge-Field, 1997). The availability of large collections of music files in MIDI is the main reason why this format is still widely used for music retrieval experiments.

From the collection of MIDI files, the channels containing the melody have been extracted automatically and the note durations have been normalized; the highest pitch has been chosen as part of the melody for polyphonic channels (Uitdenbogerd & Zobel, 1998). After preprocessing, the collection contained complete melodies with an average length of 315.6 notes. A set of 40 queries, with an average length of 9.7 notes, has been created as recognizable examples of both choruses and refrains of 20 randomly selected songs. Only the theme from which the query was taken was considered as relevant, following a query-by-example paradigm where the example is an excerpt of a particular work that needs to be retrieved. This assumption simplifies the creation of the relevance judgments, which can be built automatically. Alternatively, relevance judgments can be created using a pool of experts who may find that more than one document is relevant to a particular query (Typke, den Hoed, de Nooijer, Wiering & Veltkamp, 2005). The initial queries did not contain errors and had a length that allowed for a clear recognition of the main theme. The robustness to errors has been tested by modifying note pitch and duration, while the effect of query length has been tested by shortening the original queries.

Table 3. Main characteristics of the index terms obtained from the different segmentation techniques

                          FL      DD       PB      MO
Average length            3.0     4.8      4.0     3.6
Average units/document    52.1    61.9     43.2    45.0
Number of units           70093   123654   70713   67893

Table 3 shows the main characteristics of lexical units, and thus of the index terms, extracted with the segmentation approaches, giving a preliminary idea of how each segmentation approach describes the document collection. The values reported in the table have been computed with the following experimental setup: FL has been computed with N-grams of three notes; DD has been computed applying a threshold of five notes; PB and MO have been computed using the algorithms presented in Eerola and Toiviainen (2004). For these four approaches, units were sequences of couples of values, pitch and duration, and the index is built with one entry for each different sequence.

The approaches gave comparable results in terms of average length of lexical units, which is about three to four notes, and also in the average number of different units per document. This behavior is different from the results given by the perceptual study on manual segmentation,
Figure 6. Retrieval effectiveness of the different approaches depending on the number of errors added
to the query (left) and on the shortening of query length (right)
[Figure 6: two line plots of average precision (vertical axis, 0.3 to 1) for FL, DD, PB, MO, and FUS; horizontal axes: errors in the query (correct, 1 error, 2 errors, 3 errors) and query length (100% to 50% of the original, in steps of 10%).]
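Since only one document is relevant to each query in this setup, the average precision of a single query reduces to the reciprocal of the rank at which that document is retrieved. The sketch below illustrates this; the function names are hypothetical, and scoring a query that misses the relevant document as 0 is a common convention assumed here, not a detail stated in the text.

```python
def average_precision(ranked_ids, relevant_id):
    """Average precision of one query with a single relevant document.

    Equals 1/rank of the relevant document, or 0.0 if it is not retrieved.
    """
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id == relevant_id:
            return 1.0 / rank
    return 0.0


def mean_average_precision(runs):
    """Mean over queries; `runs` is a list of (ranked_ids, relevant_id)."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


def found_within(runs, k):
    """Fraction of queries whose relevant document is in the top k."""
    return sum(1 for r, rel in runs if rel in r[:k]) / len(runs)


# Hypothetical runs: relevant document at rank 1, at rank 2, not found.
runs = [(["a", "b"], "a"), (["b", "a"], "a"), (["c"], "a")]
print(mean_average_precision(runs), found_within(runs, 1))
```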
not retrieve at all the relevant document in 10% of the queries. This is a negative aspect of PB, due to the fact that its units do not overlap and, for short queries, it may happen that none of the note sequences matches the segmented units. A similar consideration applies also to MO, but this effect seems to be bounded by the fact that MO units are shorter.

The performances of the different approaches depending on the presence of errors in the query are shown on the left of Figure 6, which reports the average precision of the approaches. Apart from FL, the segmentation techniques showed a clear drop in performance, even when a single error was introduced. In particular, PB and MO showed a similar negative trend, almost linear with the number of errors. It is interesting to note that DD, even if its performance is almost comparable to FL in the case of a correct query, degraded faster. The average precision depending on query length is shown on the right of Figure 6. Similar considerations can be made on the trends of the different approaches. PB and MO had a similar behavior, and also in this case FL was the one with the best performance. It can be noted that, when the queries are moderately shortened, the average precision of FL and DD is almost constant. The drop in performance appears earlier, and more markedly, for DD than for FL.

From the analyses, it appears that simple approaches to segmentation, which have redundant information through overlapping units, give better performance than approaches based on music perception or music theory. Moreover, fixed-length segmentation was more robust to errors in the queries and to short queries than data-driven segmentation. From these results, it seems that for music indexing an approach that does not filter out any information improves recall without degrading precision. The good performance of FL may also be due to the fact that it has the shortest average length of index terms (actually, being N-grams, the average corresponds to the length of all the index terms), and hence local perturbations due to errors in the query do not affect a high number of index terms.

The fact that a simple approach to melodic segmentation such as FL outperforms all the others, which are based on content-specific characteristics, is somewhat counterintuitive. For this reason, a number of experiments have been carried out in order to highlight the best configuration of the parameters for each approach. The results reported in Table 4 and Figure 6 are the best ones achieved by each approach. It has to be noted that the overall performances are biased by the particular implementation of the different segmentation algorithms, and this is particularly true for PB and MO. The aim of the study was not to state which is the best approach, but to compare the experimental results of different implementations using a common testbed.

In addition to the results of the segmentation algorithms, Figure 6 also reports the average precision of a fifth approach, named FUS, which with this particular setting outperforms all the others in terms of robustness to errors and short queries. The approach is discussed in the next section.

Parallel Indexes

Up to this point, the discussion has been carried out assuming that only one index is built on a document collection, possibly using a combination of features. In general, this is a reasonable approach, because the creation of an index file is computationally costly, and may require a considerable amount of storage. On the other hand, different indexes may capture different characteristics of a document collection, whose usefulness may depend on the user information need, on the way the query is created, and on the approach to evaluation of retrieved documents carried out by the user. The presence of a number of alternative indexing schemes can be exploited by running a number of parallel retrieval sessions
on the different indexing schemes, obtaining a number of ranked lists of potentially relevant documents, and combining the results in a single ranked list using some strategy. The approach is named data fusion or collection fusion, where the latter term more precisely addresses the problem of combining the results from indexing schemes built on different (and potentially non-overlapping) collections of documents.

Collection fusion techniques are quite popular in Web metasearch engines, which are services for the automatic parallel querying of a number of ordinary Web search engines, where overall results are presented in a single ranked list (Lee, 1997). The advantages of a metasearch engine are a higher coverage of the Web pages, which is the union of the coverage of the single search engines, and improvements of the retrieval effectiveness in terms of recall, because more documents are retrieved, and in terms of precision, because multiple pieces of evidence of the relevance of some documents are available. The crucial point in the development of a collection (or data) fusion technique is the way different ranked lists are fused together. A number of constraints have to be considered for typical collection fusion applications, namely the indexing schemes of the different search engines are not known; there is a different coverage of the overall set of documents; the individual RSVs, or similarity scores, may not be known by the metasearch engine; if known, the RSVs may be expressed in different scales and have different statistical distributions. For this reason, some techniques have been proposed using the only information that is surely available: the rank of each retrieved document for each search engine (Fox & Shaw, 1994).

Most of these constraints do not hold when the parallel indexes are built within the same retrieval system, because there is complete control on each weighting scheme and on the range and distribution of each RSV, which can be obtained using the same retrieval engine run on the different indexes. Even if this aspect is more related to retrieval rather than indexing of music documents, it is worth mentioning an experiment on data fusion of alternative indexing schemes.

Fusion of Different Melodic Descriptors

Even when a single dimension is used to extract content descriptors, there are a number of choices that have to be made on the way lexical units are computed that affect the effectiveness of an indexing scheme. Let us consider the common situation in which the melodic information is used as content descriptor, using an example of a complete evaluation of music indexing schemes.

The first choice in music indexing is how lexical units are computed, as described in the previous section. In the running example, the DD approach is used (data-driven, where lexical units are computed using the pattern analysis approach presented in the preceding section) because it gives high retrieval performance while allowing for different lengths of the index terms. The second step consists of choosing whether to use absolute or relative features. The third step regards the levels of quantization that have to be applied to each feature, which may range from one single level, meaning that the feature is not used in practice, to as many levels as the possible values, meaning that no quantization is applied. Table 5 represents the different combinations of time and pitch information of melodic lexical units; the three cells marked with an acronym in bold are the ones that have been used in the experiment on data fusion, while the two cells marked with "---" highlight combinations that do not make sense.

As shown in Table 5, three indexing schemes have been used: PIT, which uses only relative pitch information, with N=9 levels of quantization of melodic intervals; IOI, which uses only absolute duration information, with N=11 levels of quantization of exact durations; and BTH, which uses both relative pitch and absolute duration. Having used
Table 5. Possible combinations of duration and pitch information, according to the absolute or relative representation and to the levels of quantization

                              Duration
                      Abs.                Rel.
                   1     N     ∞       1     N     ∞
Pitch   Abs.   1   ---   IOI
               N
               ∞
        Rel.   1                       ---
               N   PIT   BTH
               ∞
the DD approach, lexical units may be different from one index to the other, because IOI patterns may not correspond to PIT or BTH patterns, and vice versa.

It can be noted that any combination reported in the table, possibly varying the quantization, can be used to index music documents from melodic information. An extensive evaluation of the retrieval effectiveness of every combination of choices, and of their merging with data fusion techniques, has not been carried out yet. The individual indexing schemes can be fused in any combination. In the presented evaluation, two data fusion approaches have been tested: Fuse2, which merges the results from PIT and IOI, and Fuse3, which merges all three indexing schemes.

Experimental Evaluation

The effect of data fusion has been tested on a small test collection of popular music, which has been created using 107 Beatles' songs in MIDI format downloaded from the Web. As for any test collection, documents may contain errors. In a preprocessing step, the channels containing the melody have been extracted automatically and the note durations have been normalized; in case of polyphonic scores, the highest pitch has been chosen as part of the melody. After preprocessing, the collection contained 107 complete melodies with an average length of 244 notes, ranging from 89 notes for the shortest melody to 564 for the longest. Indexes were built on complete melodies, because repetitions are important for the DD approach to melodic segmentation. A set of 40 queries has been created by randomly selecting 20 themes in the dataset and using the first notes of the chorus and of the refrain. The initial note and the length of each query were chosen to have recognizable motifs that could be considered representative of real users' queries. The queries had an average length of 9.75 notes, ranging from 4 to 21 notes. Only the theme from which the query was taken was considered as relevant.
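When all parallel indexes are queried by the same retrieval engine, their scores are comparable and can be merged directly. A minimal sketch of such a fusion, assuming each indexing scheme returns a dictionary of document scores (RSVs) on a common scale and that a document missing from a list contributes 0; the function name and the example scores are hypothetical.

```python
def fuse(score_lists, weights=None):
    """Fuse per-index RSVs into a single ranked list.

    `score_lists` is a list of dicts mapping doc_id -> RSV, one per
    indexing scheme. With no weights given, all schemes count equally,
    so the fused score is the average of the individual scores.
    """
    if weights is None:
        weights = [1.0 / len(score_lists)] * len(score_lists)
    fused = {}
    for weight, scores in zip(weights, score_lists):
        for doc_id, rsv in scores.items():
            fused[doc_id] = fused.get(doc_id, 0.0) + weight * rsv
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical scores from a pitch-based and a duration-based index.
pit = {"a": 0.9, "b": 0.4}
ioi = {"a": 0.2, "b": 0.5, "c": 0.6}
for doc_id, score in fuse([pit, ioi]):
    print(doc_id, round(score, 2))
```

Passing unequal weights lets one feature dominate, for instance when rhythm matters more than pitch contour for a given query.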
Results are shown in Table 6, which reports the average precision (Av.Prec.), the percentage of queries that returned the relevant document within the first k positions (with k in {1, 3, 5, 10}), and the percentage that did not retrieve the relevant document at all ("not found"). As can be seen, IOI gave the poorest results, even though for 90% of the queries the relevant document was among the first three retrieved. The highest average precision using a single feature was obtained by BTH, with the drawback of an on-off behavior: either the relevant document is the first retrieved or it is not retrieved at all (2.5% of the queries). PIT gave good results, with all the queries finding the relevant document among the first three documents.

The first interesting result is that Fuse2 gave an improvement with respect to the separate features, IOI and PIT, with an average precision of 0.96, hence with values comparable to BTH and without the drawback of not retrieving the relevant document for 2.5% of the queries. It is worth noting that, even though the retrieval effectiveness of IOI is very low compared to PIT, the combination of the two in a fused ranked list gave an improvement of the recognition rate (the relevant document retrieved at top rank) of 3%. It could be expected that adding BTH in the data fusion would not give further improvements, since BTH is already a combination of the first two. The set of BTH patterns is a subset of the union of the sets of IOI and PIT patterns, and it can be shown that the BTH set includes the intersection of the IOI and PIT sets, because of the choice of not considering subpatterns that have the same multiplicity as longer ones. Given these considerations, it is clear that BTH does not introduce new patterns with respect to IOI and PIT. Yet, as can be seen from the column labeled Fuse3 in Table 6, the use of all three features reduced the drawbacks of the three single rankings. This result can be explained considering that BTH had different tfidf weights, which were somewhat more selective than those of simple IOI and PIT and which gave a very high score to the relevant document in case of a good match.

In these experiments, and with this particular setup, the best results for Fuse2 and Fuse3 were obtained by assigning equal weights to the single RSVs, thus computing the final similarity as the average of the individual similarities. Yet data fusion can also be used to let the user refine the query by manually selecting the dimensions and the features that are most relevant to the user's information need. For instance, if the peculiarity of a song lies in the rhythm of the melody rather than in the pitch contour, the user may choose a data fusion strategy that emphasizes this characteristic. Data fusion also increases robustness to errors in the query and to short queries, as shown in Figure 6 for the experiments on the comparison of different segmentation techniques. In this case, the values reported for FUS are obtained by fusing the individual results of the four techniques.

The drawback of data fusion techniques is that they require creating parallel indexing schemes, carrying out parallel retrievals, and finally fusing the results together. Nevertheless, the results are encouraging and are worth testing extensively. A possible complete scheme for indexing, retrieval, and data fusion is depicted in Figure 7.

Table 6. Retrieval effectiveness using single indexing schemes and data fusion approaches

             IOI      PIT      BTH      Fuse2    Fuse3
Av.Prec.     0.74     0.93     0.98     0.96     0.98
= 1          57.5%    87.5%    97.5%    92.5%    95.0%
≤ 3          90.0%    100%     97.5%    100%     100%
≤ 5          95.0%    100%     97.5%    100%     100%
≤ 10         97.5%    100%     97.5%    100%     100%
not found    0%       0%       2.5%     0%       0%
Figure 7. Main components of a complete music retrieval engine where multiple indexing schemes are
combined with a data fusion technique
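The equal-weight fusion reported for Fuse2 and Fuse3, and the option of letting the user weight one dimension more than another, can both be sketched as a weighted average of the RSVs produced by each indexing scheme. The Python example below is a minimal illustration under stated assumptions: the RSVs of the different schemes are taken to be already normalized to a comparable range, a document missing from one ranking is given a score of zero for that scheme, and the function names, song identifiers, and scores are hypothetical rather than taken from the experiments.

```python
from typing import Dict, List, Mapping, Optional


def fuse_rsvs(
    rsvs_by_scheme: Mapping[str, Mapping[str, float]],
    weights: Optional[Mapping[str, float]] = None,
) -> Dict[str, float]:
    """Fuse per-scheme retrieval status values into one score per document.

    rsvs_by_scheme maps a scheme name (e.g. "PIT") to {document_id: RSV}.
    With weights=None every scheme counts equally, so the fused score is
    the plain average of the individual similarities.
    """
    schemes = list(rsvs_by_scheme)
    if not schemes:
        raise ValueError("at least one indexing scheme is required")
    if weights is None:
        weights = {s: 1.0 for s in schemes}
    total = sum(weights[s] for s in schemes)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    # Assumption: a document absent from a scheme's ranking scores 0 there.
    documents = set().union(*(rsvs_by_scheme[s].keys() for s in schemes))
    return {
        d: sum(weights[s] * rsvs_by_scheme[s].get(d, 0.0) for s in schemes) / total
        for d in documents
    }


def ranked(fused: Mapping[str, float]) -> List[str]:
    """Return document ids by decreasing fused score (ties broken by id)."""
    return sorted(fused, key=lambda d: (-fused[d], d))


# Hypothetical normalized RSVs for one query under two schemes.
scores = {
    "PIT": {"song_a": 0.9, "song_b": 0.5},
    "IOI": {"song_a": 0.3, "song_b": 0.6, "song_c": 0.8},
}
equal_ranking = ranked(fuse_rsvs(scores))
# -> ['song_a', 'song_b', 'song_c']
rhythm_ranking = ranked(fuse_rsvs(scores, weights={"PIT": 1.0, "IOI": 3.0}))
# -> ['song_c', 'song_b', 'song_a']
```

In this toy example, giving more weight to IOI moves the document with the strongest rhythmic match to the top, which is the kind of user-driven refinement described in the text. The cost is also visible in the structure of the code: one ranking per indexing scheme must be available before the fusion step can run.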
Cambouropoulos, E. (1997). Musical rhythm: A Ferrari, E. & Haus, G. (1999). The musical archive
formal model for determining local boundaries. information system at Teatro alla Scala. In Pro-
In E. Leman (Ed.), Music, gestalt and computing ceedings of the IEEE International Conference
(pp. 277-293). Berlin: Springer-Verlag. on Multimedia Computing and Systems (Vol. 2,
pp. 817-821).
Cano, P., Batlle, E., Kalker, T. & Haitsma, J.
(2005). A review of audio fingerprinting. Journal Fox, E. A. & Shaw, J .A. (1994). Combination of
of VLSI Signal Processing, 41, 271-284. multiple searches. In The Second Text REtrieval
Conference, TREC-2 (pp. 243-249).
Cantate (2006). Computer access to notation and
text in music libraries. Retrieved May 17, 2007, Ghias, A., Logan, J., Chamberlin, D. & Smith, B.
from http://projects.fnb.nl/cantate/ C. (1995). Query by humming: Musical informa-
tion retrieval in an audio database. In Proceedings
de Cheveigné, A. & Baskind, A. (2003). F0
of the ACM Conference on Digital Libraries (pp.
extimation. In Proceedings of Eurospeech (pp.
231-236).
833-836).
Gómez, E. & Herrera, P. (2004). Estimating the
Doraisamy, S. & Rüger, S. (2004). A polyphonic
tonality of polyphonic audio files: Cognitive
music retrieval system using N-grams. In Proceed-
versus machine learning modelling strategies. In
ings of the International Conference on Music
Proceedings of the International Conference on
Information Retrieval (pp. 204-209).
Music Information Retrieval (pp. 92-95).
Downie, S. & Nelson, M. (2000). Evaluation of a
Harmonica (2006). Accompanying action on
simple and effective music information retrieval
music information in libraries. Retrieved May 17,
method. In Proceedings of the ACM International
2007, from http://projects.fnb.nl/harmonica/
Conference on Research and Development in
Information Retrieval (pp. 73-80). Harte, C., Sandler, M., Abdallah, S. & Gómez,
E. (2005). Symbolic representation of musical
Downie, J. S. (2003). Music information retrieval.
chords: A proposed syntax for text annotations.
Annual Review of Information Science and Tech-
In Proceedings of the International Conference
nology, 37, 295-340.
on Music Information Retrieval (pp. 66-71).
Downie, J. S., Futrelle, J. & Tcheng, D. (2004). The
Harvell, J. & Clark, C. (1995). Analysis of the
international music information retrieval systems
quantitative data of system performance. Deliv-
evaluation laboratory: Governance, access and secu-
erable 7c, LIB-JUKEBOX/4-1049: Music across
rity. In Proceedings of the International Conference
borders. Retrieved May 17, 2007, from http://www.
on Music Information Retrieval (pp. 9-14).
statsbiblioteket.dk/Jukebox/edit-report-1.html
Dunn, J. & Mayer, C. (1999). VARIATIONS: A
Hoashi, K., Matsumoto, K. & Inoue, N. (2003).
Digital Music Library System at Indiana Uni-
Personalization of user profiles for content-based
versity. In Proceedings of ACM Conference on
music retrieval based on relevance feedback. In
Digital Libraries (pp. 12-19).
Proceedings of the ACM International Conference
Eerola, T. & Toiviainen, P. (2004). MIR in Mat- on Multimedia (pp. 110-119).
lab: The Midi Toolbox. In Proceedings of the
International Conference on Music Information
Retrieval (pp. 22-27).
Content-Based Indexing of Symbolic Music Documents
Hsu, J.-L., Liu, C. C. & Chen, A. L. P. (1998). Effi- Lerdhal, F. & Jackendoff, R. (1983). A generative
cient repeating pattern finding in music databases. theory of tonal music. Cambridge: The MIT Press.
In Proceeding of the International Conference
Lesaffre, M., Leman, M., Tanghe, K., De Baets,
on Information and Knowledge Management
B., De Meyer, H. & Martens, J.-P. (2003). User-
(pp. 281-288).
dependent taxonomy of musical features as a
Hu, N. & Dannenberg, R. B. (2002). A comparison conceptual framework for musical audio-mining
of melodic database retrieval techniques using technology. In Proceedings of the Stockholm
sung queries. In Proceedings of the ACM/IEEE Music Acoustics Conference (pp. 635-638).
Joint Conference on Digital Libraries (pp. 301-
McLane, A. (1996). Music as information. In M.
307).
E. Williams (Ed.), Arist (Vol. 31, pp. 225-262).
Humdrum. The Humdram toolkit: Software for American Society for Information Science.
music research. Retrieved May 17, 2007, from
Meek, C. & Birmingham, W. (2003). Automatic
http://www.music-cog.ohio-state.edu/Humdrum/
thematic extractor. Journal of Intelligent Informa-
Huron D. (1995). The Humdrum toolkit: Reference tion Systems, 21(1), 9-33.
manual., Menlo Park, CA: Center for Computer
Melucci, M. & Orio, N. (1999). Musical informa-
Assisted Research in the Humanities.
tion retrieval using melodic surface. In Proceed-
IMIRSEL (2006). The international music in- ings of the ACM Conference on Digital Libraries
formation retrieval system evaluation laboratory (pp. 152-160).
project. Retrieved May 17, 2007, from http://www.
Melucci, M. & Orio, N. (2004). Combining
music-ir.org/evaluation/
melody processing and information retrieval
Krumhansl, C. L. (1989). Why is musical timbre techniques: Methodology, evaluation, and system
so hard to understand? In S. Nielsen and O. Olsson implementation. Journal of the American Society
(Eds.), Structure and perception electroacoustic for Information Science and Technology, 55(12),
sound and music (pp. 45-53). Amsterdam, NL: 1058-1066.
Elsevier.
Middleton, R. (2002). Studying popular music.
Lavrenko, V. & Pickens, J. (2003). Polyphonic Philadelphia: Open University Press.
music modeling with random fields. In Proceed-
Moen, W. E. (1998). Accessing distributed cul-
ings of the ACM International Conference on
tural heritage information. Communications of
Multimedia (pp. 120-129).
the ACM, 41(4), 45-48.
Lee, J. H. (1997). Analysis of multiple evidence
Musica. The international database of choral
combination. In Proceedings of the ACM Interna-
repertoire. Retrieved May 17, 2007, from http:
tional Conference on Research and Development
//www.musicanet.org/
in Information Retrieval (pp. 267-275).
Narmour, E. (1990). The analysis and cognition of
Lee, J. H. & Downie, J. S. (2004). Survey of music
basic melodic structures. Chicago, MI: University
information needs, uses, and seeking behaviours:
of Chicago Press.
Preliminary findings. In Proceedings of the In-
ternational Conference on Music Information
Retrieval (pp. 441-446).
Content-Based Indexing of Symbolic Music Documents
Neve, G. & Orio, N. (2004). Indexing and retrieval Stenzel, R. & Kamps, T. (2005). Improving
of music documents through pattern analysis and content-based similarity measures by training
data fusion techniques. In Proceedings of the a collaborative model. In Proceedings of the
International Conference on Music Information International Conference on Music Information
Retrieval (pp. 216-223). Retrieval (pp. 264-271).
Pienimäki, A. (2002). Indexing music database Tenney, J. & Polansky, L. (1980). Temporal gestalt
using automatic extraction of frequent phrases. perception in music. Journal of Music Theory,
In Proceedings of the International Conference 24(2), 205-241.
on Music Information Retrieval (pp. 25-30).
TREC. Text REtrieval conference home page. Re-
Rothstein, J. (1991). MIDI: A comprehensive trieved May 17, 2007, from http://trec.nist.gov/
introduction. Madison, WI: A-R Editions.
Typke, R., den Hoed, M., de Nooijer, J., Wiering,
Selfridge-Field, E. (1997). Beyond MIDI: The F. & Veltkamp, R.C. (2005). A ground truth for
handbook of musical codes. Cambridge: The half a million musical incipits. Journal of Digital
MIT Press. Information Management, 3(1), 34-39.
Shifrin, J., Pardo, B., Meek, C. & Birmingham, W. Uitdenbogerd, A. & Zobel, J. (1998). Manipula-
(2002). HMM-based musical query retrieval. In tion of music for melody matching. In Proceed-
Proceedings of the ACM/IEEE Joint Conference ings of the ACM Conference on Multimedia (pp.
on Digital Libraries (pp. 295–300). 235-240).
Sparck Jones, K. & Willett, P. (1997). Readings van Rijsbergen, C. J., (1979). Information retrieval
in information retrieval., San Francisco: Morgan (2nd ed.). London: Butterworths.
Kaufmann.