Content-Based Indexing of Symbolic Music Documents

Chapter I
Content-Based Indexing of
Symbolic Music Documents
Nicola Orio
University of Padova, Italy

Abstract

Indexing is the core component of most information retrieval systems, because it allows for a compact representation of the content of a collection of documents, aimed at efficient and scalable access and retrieval. Indexing techniques can also be extended to music, provided that significant descriptors are computed from music documents. These descriptors can be defined as the “lexical units” of music; they depend on the dimensions that are taken into account (melody, harmony, rhythm, timbre) and are related to the way listeners perceive music. This chapter describes some relevant aspects of indexing of symbolic music documents, giving a review of its basic concepts and examining in more detail some key aspects, such as the consistency with which candidate index terms are perceived by listeners, the effectiveness of alternative approaches to compute indexes, and how individual indexing schemes can be combined by applying data fusion approaches.

Introduction

The core problem of Information Retrieval (IR) is to effectively retrieve documents that convey content relevant to the user’s information needs. Effective and efficient techniques have been developed to index, search, and retrieve documents from collections of hundreds of thousands, or millions, of textual items. The most consolidated results have been obtained for collections of documents and user queries in textual form and in the English language. In order to provide content-based multimedia access, the development of new techniques for indexing, searching, and retrieving multimedia documents has been the focus of many researchers in IR. Research projects in digital libraries, and specifically those carried out in the cultural heritage domain, have shown that the integrated management of diverse media (text, audio, image, video) is a necessary step (Moen, 1998). As stressed in Sparck Jones and Willett (1997), the problem with content-based access to multimedia data is twofold. On the one hand, each medium requires specific techniques that cannot be directly employed for other media. On the other hand, these specific techniques should be integrated whenever different media are present in an individual item. The core IR techniques based on statistics and probability theory may be more generally employed outside the textual case and within specific nontextual application domains, like music. This is because the underlying models, such as the vector-space and the probabilistic models, are likely to describe fundamental characteristics shared by different media, languages, and application domains (Sparck Jones & Willett, 1997).

Copyright © 2008, IGI Global, distributing in print or electronic forms without written permission of IGI Global is prohibited.

The requirement for music content-based IR has also been stressed for many years within the research area of music information systems. The developments in the representation of music “suggest a need for an information retrieval philosophy directed toward non-text searching and eventual expansion to a system that encompasses the full range of information found in multimedia documents”, as stressed by McLane (1996). As IR has dealt with the representation and the disclosure of content since its early days (van Rijsbergen, 1979), it is natural to consider that IR techniques should be investigated to evaluate their application to music retrieval. In concluding his survey, McLane stressed that “what has been left out of this discussion, and will no doubt be a topic for future study, is the potential for applying some of the standard principles of text information retrieval to music representations”. Since 1996, many approaches have been applied to music access, browsing, retrieval, and personalization, both proposing original techniques tailored to the music domain and adapting IR techniques (Downie, 2003). Many approaches to music retrieval are related to the field of digital libraries (Bainbridge, Nevill-Manning, Witten, Smith & McNab, 1999; Agosti, Bombi, Melucci & Mian, 2000). Because of their multimedia and multidisciplinary nature, digital libraries may profit from results in music indexing and retrieval. In particular, projects on the preservation and dissemination of cultural heritage can result from the combination of digital library and information indexing techniques (Ferrari & Haus, 1999). Yet most projects involving digital libraries, such as Harmonica (2006), are still based on bibliographic values rather than on indexing document contents. Research on content-based approaches is therefore still required.

Indexing

One of the main components of an IR system is indexing (Baeza-Yates & Ribeiro-Neto, 1999). Indexing can be defined as “the process of analyzing the informational content of records of knowledge and expressing the informational content in the language of the indexing system. It involves: (1) selecting indexable concepts in a document; and (2) expressing these concepts in the language of the indexing system (as indexing items)” (Borko & Bernier, 1978). An indexing system is composed of a number of automatic procedures that allow for the organization of document contents, and for their access, retrieval, and dissemination.

Indexes are used as guidance towards items in a collection of documents. In particular, indexes can be ordered, stored in complex data structures, and accessed with fast techniques such as hashing functions or tree searches. Together, these properties allow for efficient retrieval of documents in a collection. The effectiveness of indexes is part of everyday life, for instance when looking up a word in a dictionary. Another example is looking for the content of a book through its list of relevant names and concepts, which is precisely called an “index”. Because it allows fast access to a synthetic description of document content, indexing allows for the scalability of an IR system. Efficient data structures, such as inverted files (Baeza-Yates & Ribeiro-Neto, 1999), have been proposed to connect the indexes, which are used for retrieval, to the documents, which are of interest to the user.

Many approaches to music retrieval are based on online searches, where the user’s query is
compared with the documents in the collection using approximate string matching. For example, approximate string matching was proposed in one of the earliest papers on music retrieval (Ghias, Logan, Chamberlin & Smith, 1995), while Dynamic Time Warping was proposed in Hu and Dannenberg (2002). Statistical approaches have been proposed as well, in particular Markov chains (Birmingham, Dannenberg, Wakefield, Bartsch, Bykowski & Mazzoni, 2001) and hidden Markov models (Shifrin, Pardo, Meek & Birmingham, 2002). The advantage of these approaches is that the difference between the query and the documents can be modeled, explicitly considering all the possible mismatches. Thus very high retrieval effectiveness can be achieved. On the other hand, all these techniques require that the string representing the query be matched against every document in the collection. The resulting complexity is linear in the number of documents. Scalability to large collections of millions of documents then becomes an issue.

For this reason, alternative approaches have been proposed that take advantage of indexing (Doraisamy & Rüger, 2004; Downie & Nelson, 2000; Melucci & Orio, 2004; Pienimäki, 2002). Moreover, other IR techniques can be applied to music retrieval. For instance, Hoashi, Matsumoto and Inoue (2003) applied relevance feedback to a melodic retrieval task, with the main goal of personalizing the results. The metaphor of navigation inside a collection of documents, which corresponds to document browsing, has also been proposed (Blackburn & DeRoure, 1998). Indexing is also widely used to retrieve or recognize music in audio format, in particular for audio fingerprinting and audio watermarking techniques (Cano, Batlle, Kalker & Haitsma, 2005).

This chapter describes some aspects of content-based indexing, as opposed to metadata indexing. It reviews the basic concepts and examines some key aspects in more detail:
• the consistency with which candidate index terms are perceived by listeners;
• the effectiveness of alternative approaches to compute indexes;
• how individual indexing schemes can be combined by applying data fusion approaches.

Metadata vs. Content-Based Indexing

The first problem that arises when choosing an indexing scheme for a music collection concerns the most effective representation of document content. In particular, documents can be described by external metadata or directly by a synthetic representation of their content. Both approaches have positive and negative aspects. Metadata can represent in a compact way most of the subtleties of document content, but it usually requires extensive manual work to retrieve external information on the documents. This increases the cost of indexing, and it does not guarantee consistency when different documents are indexed by different persons. Automatic computation of metadata based on external resources has been proposed in collaborative filtering systems aimed, for example, at recommendation. However, the results are expressed as similarity between documents and are biased by the presence of scattered data (Stenzel & Kamps, 2005). At the current state of the art, these approaches do not seem suitable for a retrieval task. Content-based indexing is carried out starting from a set of features extracted automatically from the document itself, and it is the main focus of this chapter.

Metadata

For most media, such as images and video, textual metadata has proved to be particularly effective. Textual metadata has been a natural choice for describing and indexing music for centuries (Dunn & Mayer, 1999). In general, metadata, especially in the form
of semantic labels, are a very compact representation of the document content, because they can summarize a complete document with a few keywords. Although metadata is not the main subject of this discussion, it is worth spending a few words on music indexing through metadata and on its limitations for an efficient and effective retrieval task. A number of music digital libraries are accessible through metadata. For instance, Cantate (2006) and Musica (2006) allow users to access choral music using metadata and lyrics. Another project based on the use of metadata is Jukebox (Harvell & Clark, 1995).

As for many other media, music metadata addresses different characteristics of document content. In the particular case of music, it can be roughly divided into three categories:

• Bibliographic values: Author’s name, performer’s name (in the case of audio recordings), title, year of publication, editor, cataloguing number.
• General information on document content: Time and key signatures, musical form, structure, music genre, orchestration.
• Additional available information: Lyrics and, if applicable, related documents that create a context for the music work (e.g., a drama, a movie, a poem).

Searching through bibliographic values can be very effective. In this case, though, a database approach that matches exact values in predefined fields would be more suitable than an IR approach. On the other hand, the user is required to have a good knowledge of the music domain and to be able to clearly describe the documents of interest. In tonal Western music, the title of a music work is often nondescriptive and reports only part of the general information on document content. Titles such as “Sonata in D flat, Op. 5” or “Fugue” are typical. In other music genres, too, the title can be based on some music features. Terms like “bossa”, “waltz”, and “blues” often appear in the titles of jazz compositions with that particular feature. Similarly, terms like “jig” and “reel” are often part of the titles of the corresponding dances in the Irish music tradition.

General information is often too generic to be a good discriminator between different music works. For example, in tonal music there are only 21 major and 21 minor tonalities, while thousands of compositions of tonal Western music can be labelled with the term “cantata” or “concerto”. The same applies to terms such as “up tempo” or “slow” for pop and rock genres. Genre information itself groups together hundreds of thousands of different works. Another problem with metadata on general information is that the terminology is not consistent across genres and historical periods. For example, the term “sonata” has different meanings for Baroque and Romantic repertoires. Likewise, the term “ballad” refers to different characteristics in jazz and in folk music. This kind of metadata can be useful to refine the description of a music information need, but it can hardly be used to define it completely. Moreover, a preliminary study on users’ information needs (Lee & Downie, 2004) showed that users are interested in retrieving songs by their specific content.

Additional information in the form of lyrics, when present, can be particularly useful to describe an information need. In this case, however, the retrieval of music documents becomes an application of textual IR. Contextual information can also be very helpful in describing a user’s information need. Examples are the movie where a particular soundtrack has been used, or the poem that inspired a particular composition. In many cases the information need is motivated by the contextual information itself. For instance, a user may be searching for the theme song of a TV series or for the music of a known ballet. Yet this kind of contextual information applies only to a small percentage of music documents.
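
A small sketch can illustrate the contrast between two kinds of search. Exact-match search on bibliographic values can isolate a single work. Filtering on general information, by contrast, discriminates poorly between works. The record fields follow the three metadata categories listed above. The field names, the placeholder records and the `exact_match` helper are hypothetical, introduced only for illustration. In the example, a query on author and title returns one work, while a filter on the musical form alone still leaves most of the collection.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MusicRecord:
    # Bibliographic values (hypothetical field names)
    author: str
    title: str
    year: Optional[int] = None
    # General information on document content
    key: Optional[str] = None
    form: Optional[str] = None
    # Additional available information
    lyrics: Optional[str] = None


def exact_match(records, **criteria):
    """Database-style retrieval: keep records whose fields equal every criterion."""
    return [r for r in records
            if all(getattr(r, name) == value for name, value in criteria.items())]


# Toy collection with placeholder data, not a real catalogue.
collection = [
    MusicRecord("Composer A", "Sonata in D flat, Op. 5", key="D flat major", form="sonata"),
    MusicRecord("Composer B", "Fugue", key="C minor", form="fugue"),
    MusicRecord("Composer C", "Fugue", key="G minor", form="fugue"),
    MusicRecord("Composer D", "Fugue", key="C minor", form="fugue"),
]

# A precise bibliographic query isolates a single work...
print(len(exact_match(collection, author="Composer B", title="Fugue")))  # 1
# ...whereas a generic label is a weak discriminator.
print(len(exact_match(collection, form="fugue")))  # 3
```

The approach only works if the user already knows the exact field values. Content-based indexing removes this requirement.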

What is normally missing in music metadata is a textual description of the document content other than its musical structure. This is a peculiarity of the music language: unlike text, images, speech, video, or 3D models, music is not aimed at describing something with a known semantics. This is probably the main limitation of metadata for music indexing. It also explains why many content-based approaches have been proposed in recent years, compared with textual metadata approaches. Moreover, music representation is aimed at giving directions to performers. At least for Western music, it is also biased by the characteristics of music scores, which allow only a limited representation of high-level characteristics (Middleton, 2002).

Music Dimensions

Music has a multidimensional nature. Rhythm, melody, and harmony are all well-known dimensions that capture distinctive features of a music document. Music scores convey these dimensions explicitly, and listeners of audio recordings recognize them easily. They can be defined as the canonical dimensions of music, because music theorists and musicologists use them extensively to describe, analyze, and study music works. Another perceptually relevant music dimension is timbre, which is related to the quality of sounds and is conveyed only by audio recordings. Yet timbre is itself a multidimensional feature. It can be described by a set of continuous parameters, such as spectral power energy or Mel-Cepstrum Coefficients. It can also be described by perceptually based features such as spectral centroid, roughness, and attack time. As stated by Krumhansl (1989), timbre remains a difficult dimension to understand and represent, though studies have been carried out on the perception of timbre similarity (Berenzweig, Logan, Ellis & Whitman, 2004). The discussion on content-based music indexing will therefore be limited to the canonical dimensions. In particular, it will focus on those that may have a symbolic representation, which is more suitable for creating an index of a music collection. For any chosen dimension, the indexing scheme has to be based on a suitable definition of the lexical units of that dimension and of their representation. A taxonomy of the characteristics of music and of their potential interest for users is reported in Lesaffre, Leman, Tanghe, De Baets, De Meyer and Martens (2003).

The representation of the melody can build upon traditional score representation. A score is a sequence of notes, each with a given pitch and a duration relative to the tempo of the piece. This symbolic representation is particularly suitable for indexing, provided that the melodic lexical units are highlighted. This is a difficult task even for musicians and music scholars; the results of a perceptual study on manual segmentation are presented in a following section. The representation of rhythm can be considered a variation of the melodic representation. In it, pitch information can be discarded or replaced by the particular percussive instrument that plays each rhythmic element. The indexing of the harmonic dimension can likewise be based on common chord representations. In this case there are alternative representations, from figured bass to functional harmony and chord names. An overview of chord representations, aimed at their annotation, is presented in Harte, Sandler, Abdallah and Gómez (2005). The segmentation of chords into lexical units can be based on notions of harmony. These include modulations, cadences, and the use of particular chord progressions in different music genres.

The analysis of different dimensions, and of their representation as building blocks of music documents, may also interest musicologists, composers, and performers. To this end, it is worth citing Humdrum (Huron, 1995), which, apart from retrieval, allows a number of manipulations for analyzing music scores.
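
A short sketch makes the score-based representation of melody and rhythm described above more concrete. For illustration only, it assumes that each note is encoded as a MIDI pitch number with a duration in beats. The `Note` type and the helper names are hypothetical. The melodic view keeps the signed pitch intervals between consecutive notes, so transposed versions of a melody produce the same sequence. The rhythmic view discards pitch, as the text suggests, and keeps the ratios between consecutive durations.

```python
from fractions import Fraction
from typing import List, NamedTuple


class Note(NamedTuple):
    pitch: int          # MIDI note number (assumption: 60 = middle C)
    duration: Fraction  # duration in beats, relative to the tempo of the piece


def melodic_intervals(melody: List[Note]) -> List[int]:
    """Signed intervals in semitones between consecutive notes."""
    return [b.pitch - a.pitch for a, b in zip(melody, melody[1:])]


def duration_ratios(melody: List[Note]) -> List[Fraction]:
    """Rhythmic view: pitch is discarded, only relative durations are kept."""
    return [b.duration / a.duration for a, b in zip(melody, melody[1:])]


# Illustrative fragment: C4 D4 E4 C4 with durations of 1, 1, 2 and 1 beats.
melody = [Note(60, Fraction(1)), Note(62, Fraction(1)),
          Note(64, Fraction(2)), Note(60, Fraction(1))]

print(melodic_intervals(melody))  # [2, 2, -4]
print(duration_ratios(melody))    # [Fraction(1, 1), Fraction(2, 1), Fraction(1, 2)]
```

Either sequence, or both combined, can serve as the feature sequence from which lexical units are then extracted.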

Basic Concepts of Indexing

Early models and experiments on textual information retrieval date back to the 1970s. Textual information retrieval, which for years has simply been called information retrieval (IR) tout court, has a long history in which many different approaches have been tried and tested. Since indexing is one of the core elements of an IR system, many approaches have been proposed to optimize it in terms of computational cost, memory storage, and retrieval effectiveness. Most importantly, these approaches have been extensively tested and validated experimentally on standard test collections, in particular in the framework of the Text REtrieval Conference (TREC). For this reason, the main ideas underlying textual indexing will be reviewed, together with possible applications to music indexing.

Textual indexing is based on four main subsequent steps, which are respectively:

• Lexical analysis
• Stop-words removal
• Stemming
• Term weighting

Existing IR systems may not follow all these steps. In particular, the effectiveness of stemming has often been debated, at least for languages with a simple morphology such as English.

Lexical Analysis

The first step of indexing analyzes the content of a document in order to find its candidate index terms. In textual documents, index terms are the words that form the document, so lexical analysis corresponds to parsing the document to highlight its individual words. Lexical analysis is straightforward in European languages, where blanks, commas, and dots are clear separators between two subsequent words. Some particular cases need attention, however. For example, an acronym whose letters are separated by dots has to be treated as a single term, not as a sequence of one-letter terms. A lexical analyzer for these languages can be built with regular expressions and normally poses only implementation issues. In other languages, such as Chinese and Japanese, written text is not necessarily divided into terms by special characters. There, the grouping of ideograms into words has to be inferred from the context. Automatic lexical analysis for these languages is nontrivial, and it has been a research area for years.

Application to the Music Domain

As discussed in the previous section, the first issue in music indexing is the choice of the dimensions to be used as content descriptors. This choice also influences the approach to lexical analysis. For instance, if rhythm is used to index music documents, the attack times of the different notes have to be automatically detected and filtered. This is an easy task for symbolic documents, and it gives good results for documents in audio format as well. If harmony is used to compute indexes, on the other hand, lexical analysis has to rely on complex techniques for the automatic extraction of chords from a polyphonic music document. This is still an error-prone task, especially for audio documents, even though encouraging results have been obtained (Gómez & Herrera, 2004). The automatic extraction of high-level features from symbolic and audio music formats is a very interesting area, studied by a very active research community, but it is beyond the aims of this discussion. For simplicity, a sequence of features is assumed to be already available. This sequence describes some high-level characteristics of a music document, related to one or more of its dimensions. Feature extraction is also assumed to be affected by errors, which should be taken into account when designing the indexing scheme.
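
Once a feature sequence is available, as assumed above, lexical analysis reduces to deciding how to cut that sequence into index terms. The sketch below is minimal. It uses overlapping fixed-length n-grams of pitch intervals as lexical units, a simple segmentation chosen here only because it needs no knowledge of phrase boundaries. The units are stored in an inverted file that maps each term to the documents containing it. A query is then answered by looking up its own units rather than by scanning every document in the collection. The function names, the value of `n` and the toy interval sequences are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Term = Tuple[int, ...]


def ngrams(intervals: List[int], n: int = 3) -> List[Term]:
    """Overlapping fixed-length lexical units over an interval sequence."""
    return [tuple(intervals[i:i + n]) for i in range(len(intervals) - n + 1)]


def build_inverted_index(docs: Dict[str, List[int]], n: int = 3) -> Dict[Term, Set[str]]:
    """Inverted file: each lexical unit points to the documents that contain it."""
    index: Dict[Term, Set[str]] = defaultdict(set)
    for doc_id, intervals in docs.items():
        for term in ngrams(intervals, n):
            index[term].add(doc_id)
    return index


def search(index: Dict[Term, Set[str]], query: List[int], n: int = 3) -> List[Tuple[str, int]]:
    """Rank documents by the number of distinct query units they share."""
    scores: Dict[str, int] = defaultdict(int)
    for term in set(ngrams(query, n)):
        for doc_id in index.get(term, ()):  # only postings of query terms are visited
            scores[doc_id] += 1
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Toy collection of interval sequences (semitones), purely illustrative.
docs = {
    "doc1": [2, 2, -4, 2, 2, -4, 4, 1, 2],
    "doc2": [0, 7, 2, -2, -2, -1, -2],
    "doc3": [2, 2, 1, 2, 2, 2, 1],
}
index = build_inverted_index(docs)
print(search(index, [2, 2, -4, 4]))  # [('doc1', 2)]
```

Fixed-length units ignore musical boundaries entirely. The alternative segmentations discussed in this chapter try to recover those boundaries.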

Figure 1. Possible outcomes of the lexical analysis of a simple melody; the bars in grey are alternative segmentations into melodic lexical units

Lexical analysis of music documents remains difficult even after a sequence of features has been extracted automatically. The reason is that the music language lacks explicit separators between candidate index terms in all of its dimensions. Melodic phrases are not delimited by particular signs or sounds that mark a boundary between two phrases. The same applies to harmonic progressions and rhythmic patterns. In all these cases, no additional symbol marks the end of one lexical unit and the beginning of the next. This is not surprising, because the very concept of lexical unit is borrowed from the textual domain and is not part of the traditional representation of music documents. There is wide consensus that music is a structured organization of different elements, not just a pure sequence of sounds. Even so, there was never a historical need to represent this structure directly. Music is printed for musicians, who basically need the information to create a correct performance and who can infer the presence of basic elements from the context. Different approaches have been proposed for lexical analysis, based on musical patterns (Hsu, Liu & Chen, 1998), main themes (Meek & Birmingham, 2003), or musical phrases (Melucci & Orio, 1999).

The consistency between musicians in performing the lexical analysis of some monophonic written scores has been investigated in a perceptual study, which is presented in the next section. The lexical analysis of music documents is still an open problem. This holds for musicological analysis, where alternative theories have been presented, and also for indexing and retrieval effectiveness. As an example, Figure 1 reports possible segmentations of the same musical excerpt.

Stop-Words Removal

Many words in a textual document have only a grammatical function and do not express any semantics. In most languages, articles, conjunctions, prepositions, and similar words can be deleted without substantially affecting the comprehension of the text. Suppose also that, after indexing, a document is described by a simple list of terms in a given order (e.g., alphabetical). In that case, knowing that the document contained a particular conjunction gives no additional information on its content. Such words can thus be removed, or “stopped” (hence the term “stop-words”), from the output of the lexical analysis without affecting the overall performance of an indexing system. Stop-words of this kind are very frequent in textual documents. Removing them therefore reduces the storage needed for the indexes, and with it the computational cost of retrieval. For any particular language, a list of stop-words, called a stop-list, can be derived from a priori knowledge of the grammatical rules.

Stop-words removal can also be applied to words that are extremely frequent inside a collection of documents, even though they carry a semantics that could describe the document content. For example, the lexical analysis of a set of documents on music processing will very likely
show that all documents contain words such as “music”, “computer”, “note”, “algorithm”, and so on. Moreover, these words are probably evenly distributed across the collection. Their contribution to distinguishing the content of one document from the others is therefore very low. Here, too, a collection-dependent stop-list can be created, and words belonging to it can be ignored in subsequent phases of document indexing. The stop-list can be computed automatically by analyzing a representative sample of the collection. All the words that consistently appear in all (or in a high percentage) of the analyzed documents are added to it. Clearly, this kind of analysis would also highlight the words that have a grammatical function and no semantics, as described above, so a two-step removal of stop-words can be avoided. Nevertheless, the designer of an IR system can choose to remove only the words that are both frequent and uninformative, keeping those that are merely frequent.

Application to the Music Domain

It is difficult to state whether or not a musical lexical unit has a meaning, and hence to create a priori a stop-list of musical lexical units to be ignored during indexing. It is preferable to approach the problem by considering how well a particular unit discriminates between different music documents. For instance, consider the indexing of melodic intervals. A lexical unit of two notes that form a major second is likely to be present in almost all of the documents. It is thus not a good index for a collection of “cantate” of tonal Western music, and probably not for any collection of music documents. A single major chord is unlikely to be a good discriminator either. Depending on the particular set of features used to index a music collection, the designer of the indexing and retrieval engine can make a number of choices about the possible stop-list of lexical units.

The choice of the particular stop-list to use, if any, could be driven by musicological and computational motivations, as well as by the characteristics of the music collection itself. A statistical analysis of the distribution of lexical units across documents may highlight the potential stop-words. This approach, however, is not usually exploited in the literature on music indexing and retrieval. The term “stop-list” is quite infrequent in music retrieval. The common approach is instead to select the parameters carefully, so as to avoid computing lexical units that are believed to be uninformative about the document content. The important point for this discussion is that not all lexical units are equally informative. They differ both in what they say about a document’s content and in how they distinguish it from other documents in the collection, which is the aim of term weighting, described below. Some lexical units may even be totally uninformative, acting as a sort of background noise.

Stemming

Many words, though spelled differently, can be considered variants that stem from a common morphological root. This is the case of the English words “music”, “musical” (adjective and noun), “musicology”, and “musician”. The number of variants increases further when singular and plural forms are taken into account. Gender information adds more, as it does not apply to English but does apply to most European languages, and so do other variants that are peculiar to some languages. Moreover, in many languages verbs are conjugated; that is, the form of the verb varies with mood, person, and tense. Thus a textual document may contain different word variants that lexical analysis identifies as different, although they share a similar meaning. Intuitively, a textual document could be relevant for a given information need even if it does not contain the
exact term chosen by the user for his query, but it It can be noted that the analogous of stem-
contains other variants. For example, a document ming is regularly carried out in many approaches
that contains many times the words “computing”, to music retrieval, and it is normally addressed
“compute”, and “computation” is likely to address as feature quantization. The main motivation
the subject of “computers” even if this exact word of feature quantization in music processing is
is missing. On the other hand, many words, even probably related to the fact that each feature
though stemming from the same root, evolved to extraction process is error prone: quantization
express different meanings. partially overcomes this problem if erroneous
The basic idea of stemming is to conflate into a single index all the words that have slightly different meanings but stem from a common morphological root. The positive effects are a generalization of the concepts, which are carried by the stems rather than by the single words, and a lower number of index terms. The higher generalization is expected to improve recall, probably at the cost of lower precision. There are different approaches to automatic stemming, depending on the morphology of the language and on the techniques used. Research on stemming is still very active, in particular for languages other than English that have a rich morphological structure, with derivations expressed by prefixes, infixes and suffixes.

Application to the Music Domain

The idea behind stemming is that two indexes may be different yet be perceived, or considered, as similar. Analogously, two musical lexical units may be slightly different, yet listeners can perceive them as almost identical, confuse one with the other when recalling from memory, or consider that they play a similar role in the musical structure. For instance, two identical rhythmic patterns played at different tempos and with small variations in the actual onset times, two musical phrases that differ only in one interval that turns from major to minor, and chord progressions where one chord is substituted by another with a similar function, as is routinely done in jazz music, are all situations where stemming may become useful. In practice, all the perceptually similar variants could be conflated into a common stem.

measurements are reported to the same quantized value of the correct one. For example, because pitch detectors are known to produce octave errors, a solution that has often been proposed in the literature is to represent only the names of the notes, with their possible accidentals, and not their actual octave (Birmingham et al., 2001). Automatic chord detection from polyphonic audio signals is still very error-prone, thus quantization to a fixed number of chords (for example, triads only) may help remove part of the measurement noise. Quantization can also be useful when automatic detection is reliable but it is known in advance that the signal itself may vary, as in the case of note onsets and note durations, even for performances of the same score.

Quantization can also be useful as a stemming procedure. It is well known that many compositions are based on a limited amount of musical material, which is presented and then varied and developed during the piece. In this case, the conflation of different thematic variations into a single index will improve recall, because the user may choose any of these variations to express the same information need. Quantization can be carried out on any music dimension, and at different levels. Table 1 shows possible approaches to the quantization of melodic intervals, some of them already proposed in the literature, from the finest to the coarsest. Figure 2 gives a graphical representation of the amount of information that is lost through quantization, in particular when melodic or rhythmic information is quantized into a single level and thus discarded.
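The octave-independent representation mentioned above can be illustrated with a short Python sketch. It assumes, for illustration only, that notes are given as MIDI note numbers (60 being middle C) and that accidentals are spelled with sharps; the helper names `pitch_name` and `to_pitch_classes` are hypothetical and are not taken from Birmingham et al. (2001).

```python
# Note names with sharps for accidentals (an illustrative spelling choice).
PITCH_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def pitch_name(midi_note: int) -> str:
    """Quantize a MIDI note number to its note name, discarding the octave."""
    return PITCH_NAMES[midi_note % 12]


def to_pitch_classes(melody: list[int]) -> list[str]:
    """Map a melody given as MIDI note numbers to octave-free note names."""
    return [pitch_name(note) for note in melody]


# A correct transcription and one where the third note (G4 = 67) was
# detected an octave too low (G3 = 55).
correct = [60, 64, 67, 72]
octave_error = [60, 64, 55, 72]
print(to_pitch_classes(correct))  # ['C', 'E', 'G', 'C']
print(to_pitch_classes(correct) == to_pitch_classes(octave_error))  # True
```

Because both sequences map to the same symbols, an octave error introduced by a pitch detector no longer prevents a match. The price is the information loss discussed above: for instance, an ascending fifth and a descending fourth between the same two note names become indistinguishable.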

Table 1. Number of different indexes when quantization is applied to ascending and descending intervals within an octave (including unison)

Quantization level                                    # symbols
Cents                                                 2401
Semitones: 0, +1, +2, …                               25
Music intervals: unison, second, third, …             15
Perceptual intervals: unison, small, medium, large    9
Direction: up, down, same                             3

As for stemming, quantization may improve recall, because more documents may contain a quantized lexical unit. The increase in recall usually corresponds to a decrease in precision, because a quantized lexical unit is usually more generic and describes a user information need less precisely. From a computational point of view, quantization may also speed up retrieval, because a decrease in the number of different symbols used as building blocks of the index corresponds to a decrease in the computational cost of matching. This characteristic has been exploited in a system for melodic retrieval (Ghias et al., 1995) where only three levels were used. The authors reported that, in order to overcome the problem of generic information needs, the user had to provide more complete information, in particular using long melodic excerpts to create his query.

A similar approach to quantization can be carried out on rhythmic information. In this case, it has to be noted that the score itself is a quantized version of possible performances, because onset times and durations are not expected to be played using the exact values computed from the beats per minute (when that happens, the performance sounds mechanical). In the case of scores that are transcriptions of pre-existing performances, the transcriber chooses the level of quantization as a compromise between the readability of the score and the precision of the reported times. Rhythmic quantization for index conflation can be carried out with an approach similar to melodic quantization, from milliseconds to very coarse levels (such as just the two levels long and short). A number of approaches to melodic indexing do not take note durations into account and are based only on pitch information; they can be considered as a limit case where there is only one level of quantization for note onsets and durations.

Figure 2. Graphical representation of the information loss when increasing levels of quantization of pitch and duration are applied (from top: original score, removal of rhythm, pitch quantization, removal of pitch)
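Two of the levels in Table 1 can be sketched in a few lines of Python, assuming (for illustration only) a melody given as MIDI note numbers. The semitone level clamps intervals larger than an octave to ±12, an assumption made here to keep the 25-symbol alphabet of Table 1; the direction level uses the three symbols up, down and same, the kind of coarse alphabet exploited by Ghias et al. (1995). The function names are hypothetical.

```python
def intervals(melody: list[int]) -> list[int]:
    """Intervals in semitones between consecutive notes (MIDI numbers)."""
    return [b - a for a, b in zip(melody, melody[1:])]


def quantize_semitones(interval: int) -> int:
    """Semitone level: intervals beyond an octave are clamped to -12..+12."""
    return max(-12, min(12, interval))


def quantize_direction(interval: int) -> str:
    """Direction level: 'U' (up), 'D' (down) or 'S' (same)."""
    if interval > 0:
        return "U"
    if interval < 0:
        return "D"
    return "S"


melody = [60, 62, 64, 64, 59, 79]  # the last leap spans more than an octave
steps = intervals(melody)
print(steps)                                          # [2, 2, 0, -5, 20]
print([quantize_semitones(i) for i in steps])         # [2, 2, 0, -5, 12]
print("".join(quantize_direction(i) for i in steps))  # UUSDU

# Alphabet sizes for intervals within an octave, as counted in Table 1.
within_octave = range(-12, 13)
print(len({quantize_semitones(i) for i in within_octave}))  # 25
print(len({quantize_direction(i) for i in within_octave}))  # 3
```

Moving from the semitone level to the direction level shrinks the alphabet from 25 to 3 symbols, which is what makes matching cheaper and index terms more generic, as discussed above.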
Term Weighting

The last phase of an indexing process is related to one main consideration: index terms do not describe the content of a document to the same extent. It has already been mentioned that stop-words, which are frequent inside the collection, are not good descriptors because they do not allow documents to be differentiated. On the other hand, it can be argued that a particular set of terms that are peculiar to a particular document, in which they are extensively used, are very good descriptors of that document because they allow its exact identification. Clearly, the importance of a term in describing a document varies along a continuum that ranges from totally irrelevant to totally relevant.

For textual documents it has been proposed that the frequency at which a word appears in a document is directly proportional to its relevance, while the frequency at which it appears in the collection is inversely proportional to its relevance. These considerations gave birth to a very popular weighting scheme, called term frequency-inverse document frequency, in short tfidf. There are a number of different variants of this scheme, which share the same principles:

• Term frequency is computed, for each term t and each document d, from a monotonically increasing function of the number of occurrences of t in d.
• Inverse document frequency is computed, for the set of documents d_t that belong to collection C and contain at least one occurrence of term t, from a monotonically decreasing function of the size of d_t normalized by the size of C.

A widely used implementation of the two monotonic functions for the computation of tfidf is reported in the following formula:

$$w_{t,d} = \mathrm{tf}_{t,d} \cdot \mathrm{idf}_t = \frac{\mathrm{freq}(t \in d)}{\max_l \mathrm{freq}(l \in d)} \times \log \frac{|C|}{|d_t|}$$

A special case of term weighting can be found in binary weighting, which is normally used when Boolean searches are carried out. In binary weighting an index term has a weight of false if it does not appear in the document, true otherwise. Retrieval is carried out as the solution of a Boolean expression, where the values true or false correspond to the value of the proposition “the term t belongs to document d” and are combined with Boolean operators (that is, “and”, “or”, “not”) in order to create complex queries. Binary weighting is still very popular because it is easy to implement and allows for great expressivity in describing the user information need through a query.

Application to the Music Domain

If a musical lexical unit, for any chosen dimension, appears frequently inside a given document, it is very likely that listeners will remember it. Moreover, a frequent lexical unit can be part of the musical material that is proposed and developed by the composer, or can also be part of the composer’s personal style. Finally, frequent lexical units have a good chance of being part of a user query. Thus, term frequency seems to be a reasonable choice also for music documents. On the other hand, a lexical unit that is very common inside a collection of documents can be related to the style of a thematic collection (the chord progression of blues songs, the accent on the upbeat in reggae music), can correspond to a simple musical gesture (a repeated note, a major scale), or can be the most used solution for particular passages (the descending bass connecting two chords, a seventh chord introducing a modulation). Moreover, a user may not use frequent lexical units as parts of a query, because it is clear that they will not address any particular document. Thus, inverse document frequency seems to be a reasonable choice as well.

Yet, some care has to be taken with a direct application of a tfidf weighting scheme to music indexing, because of the evident difference between textual and musical communication. One thing worth mentioning is that users access the two media very differently. In particular, music documents are accessed many times by users, who may choose not to listen to the complete song, but only to a part of it. Moreover, it is common practice for radio stations to broadcast only the parts of songs with the sung melody, skipping the intro and the coda, and fading out during long guitar solos. The computation of the relative importance with which a lexical unit describes a document should also deal with these aspects. Moreover, listeners are likely to remember, and to use in their queries, the part of the song where the title is sung, which thus becomes more relevant regardless of its frequency inside the document and inside the collection. Yet, there have been very few studies that investigate the best weighting scheme for music indexing, and in many cases a direct implementation of tfidf (such as the one presented in this section) is used.

It is important to note that the possibility of giving different weights to lexical units is an important difference between information retrieval and approaches based on recognition, such as approximate string matching techniques. The former allows documents to be ranked depending on the relevance of their lexical units as content descriptors, while the latter ranks documents depending on the degree to which an excerpt of each document matches the query. In other words, with recognition-based approaches a good match with an almost irrelevant excerpt may give a higher rank than a more approximate match with a highly relevant excerpt. It could be advisable to extend weighting approaches also to methods other than indexing. To this end, a mixed approach of indexing with approximate matching has been proposed in Basaldella and Orio (2006), where each index term was represented by a statistical model and the final weight of each index term of the query was computed by combining the tfidf scheme with the probability with which it was generated by the model.

Retrieval Techniques

Once indexes have been built through the four steps described earlier, and both the collection of documents and the user query have been indexed, it is possible to perform retrieval. It is important to note that the query also has to be analyzed and indexed in order to retrieve relevant documents, because the similarity between the query and the documents is computed using indexes only. Different approaches can be applied to retrieval; the most intuitive one, which has been extensively applied in the experiments reported in the following sections, is the Vector-Space Model (VSM). According to the VSM, both documents and queries are represented as K-variate vectors of descriptor weights w_{t,d}, where K is the total number of unique descriptors or indexes. Then, document d_i is represented as d_i = (w_{i1}, …, w_{iK}), while query q is represented as q = (q_1, …, q_K). The weights w_{t,d} of index term t within document d are computed according to the tfidf scheme already described. Query descriptor weights are usually binary values, that is, q_t = 1 if term t occurs within query q, and 0 otherwise.

The retrieval status value (RSV) is the cosine of the angle between the query vector and the document vector. That is:

$$RSV(d, q) = \cos(\vec{d}, \vec{q}) = \frac{\vec{d} \cdot \vec{q}}{|\vec{d}|\,|\vec{q}|}$$

where \vec{d} and \vec{q} are the vectorial representations of the document d and the query q respectively, and |x| is the norm of vector x. As the cosine function normalizes the RSV to the query and document lengths, long documents have the same chance of being retrieved as short ones.

In order to be comparable, both documents and queries need to be transformed. This process usually corresponds to the segmentation of music documents into their lexical units, and to a more complex query processing. The latter can

involve the application of noise reduction and pitch tracking techniques (de Cheveigné & Baskind, 2003) in the case of audio queries, followed by segmentation, which can be carried out with the same technique used for document segmentation or with different techniques tailored to the peculiarities of the queries. Figure 3 represents the parallel processing of documents and queries aimed at computing the RSV for ranking relevant documents.

Figure 3. Parallel processing of documents and queries aimed at retrieval of potentially relevant documents

Efficiency and Effectiveness

It can be argued that all these steps, although useful for indexing textual documents, are not necessary for a music retrieval task, which can be solved directly within an approximate string matching framework, as mentioned in the introduction of this chapter. For instance, the main melodies of music documents can be represented by arrays of symbols, where the number of different symbols depends on the kind of quantization applied to melodic and rhythmic information. The user’s query is normally an excerpt of a complete melody and thus can undergo the same representation. Retrieval can then be carried out by measuring the similarity (or the distance) between the two strings, ranking the retrieved documents in order of decreasing similarity (or increasing distance) to the query. The complexity of these techniques is linear in the size of the document collection, because all the documents have to be matched against the query.

Indexing techniques do not require this exhaustive comparison; in fact, the main motivation behind indexing is its efficiency and scalability, also for very large collections of documents. Let us consider each index term as a pointer to the list of the documents that contain it. It is assumed that the number of documents in each list is small compared to the number of documents in the collection, apart from stop-words, which are usually not used as index terms. This assumption is surely true for textual documents, but it also applies to music documents, because different melodies have different thematic material. Index terms can then be stored in efficient data structures, such as hash tables, which can be accessed in constant expected time, or balanced binary search trees, which can be accessed in logarithmic time.

The efficiency of indexing is, to some extent, counterbalanced by a loss in retrieval effectiveness. The main issue is that, in order to be efficient, access to the data structure requires an exact match between document and query indexes. While this assumption is reasonable for textual documents, because the user is expected to spell the words of the query correctly, in the music domain there are many sources of mismatch that may affect retrieval effectiveness. A melodic query can either contain errors, due to imprecise recall of the melody, or be a different variant of a particular theme. These differences may affect the way index terms are computed from the query and the way they are represented. For this reason, some peculiar aspects of music document indexing are addressed in more detail.
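The weighting, retrieval and indexing steps discussed so far can be tied together in a toy Python sketch. It assumes documents that have already been segmented into lexical units, represented here by made-up string tokens. The inverted index is a Python dict (a hash table) mapping each index term to the documents that contain it, so that only documents sharing at least one term with the query are scored. Weights follow the tfidf formula given earlier (term frequency normalized by the most frequent term, times the logarithm of |C|/|d_t|; the natural logarithm is an arbitrary choice that does not change the cosine ranking), query weights are binary, and documents are ranked by the cosine RSV. This is a minimal illustration, not the system used in the experiments reported in this chapter.

```python
import math
from collections import Counter, defaultdict


def build_index(docs: dict[str, list[str]]):
    """Inverted index term -> {doc_id: tfidf weight}, plus document vector norms."""
    counts = {d: Counter(units) for d, units in docs.items()}
    doc_freq = Counter(t for c in counts.values() for t in c)  # |d_t| per term
    n_docs = len(docs)  # |C|
    index: dict[str, dict[str, float]] = defaultdict(dict)
    norms: dict[str, float] = {}
    for d, c in counts.items():
        if not c:  # an empty document gets a zero vector
            norms[d] = 0.0
            continue
        max_freq = max(c.values())
        squared = 0.0
        for t, f in c.items():
            w = (f / max_freq) * math.log(n_docs / doc_freq[t])
            index[t][d] = w
            squared += w * w
        norms[d] = math.sqrt(squared)
    return index, norms


def retrieve(query: list[str], index, norms) -> list[tuple[str, float]]:
    """Rank documents by cosine RSV, with binary query weights."""
    q_terms = {t for t in query if t in index}  # q_t = 1 for these terms
    if not q_terms:
        return []
    q_norm = math.sqrt(len(q_terms))
    dot: dict[str, float] = defaultdict(float)
    for t in q_terms:
        for d, w in index[t].items():  # only the posting list of t is visited
            dot[d] += w
    ranked = [(d, s / (norms[d] * q_norm)) for d, s in dot.items() if norms[d] > 0]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


docs = {
    "doc1": ["A", "B", "C", "A"],
    "doc2": ["B", "C", "D"],
    "doc3": ["E", "F", "E"],
}
index, norms = build_index(docs)
for doc_id, rsv in retrieve(["B", "C"], index, norms):
    print(doc_id, round(rsv, 3))
```

With this toy collection, doc2 is ranked above doc1, and doc3, which shares no term with the query, is never visited. Note that the retrieval step relies on exact matches between query and index terms, which is exactly the limitation discussed above for music queries.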

An Experiment on the Perception of Lexical Units

It is normally assumed that listeners can divide the dimensions that form the music flow into their lexical units, depending on the characteristics of the music structure (Lerdahl & Jackendoff, 1983; Narmour, 1990). This means that listeners are assumed to be able to single out one or more dimensions of interest and to segment them. Segmentation can be considered the process by which listeners recognize the boundaries of lexical units, according to a number of perceptually and culturally based strategies. Given these assumptions, it is not clear to what degree listeners agree in recognizing the exact positions of these boundaries. A similar situation applies to other application domains of media segmentation, like text, image, and video segmentation. For instance, experiments on manual text segmentation showed that subjects might have different concepts of the meaning of a textual segment, and thus recognize boundaries at different locations, or not agree at all about the presence of a given boundary.

A perceptual study has been carried out on a number of subjects to verify the degree of consistency of manual melodic segmentation. Melodic information has been used as the preferred dimension for the segmentation task, even if it has to be noted that melody also carries information about rhythm and harmony that can be inferred, at least by experienced musicians and musicologists. This choice is motivated by the fact that most approaches to music retrieval are based on the melodic dimension, with few exceptions such as those exploiting harmonic information (Lavrenko & Pickens, 2003).

Experimental Setting

An expert musicologist was asked to select 20 melodic excerpts that he considered representative of the styles of 4 well-known composers of tonal Western music, namely Bach, Mozart, Beethoven and Chopin. The criterion for the choice was to have a good sampling of different kinds of melodic structure, in which all the different cues that listeners may use for segmenting melodies were present.

Each melodic excerpt was transcribed on a separate sheet, without reporting composer and composition names. The excerpts had different tempos and different time and key signatures, which were all maintained in the transcriptions. Moreover, the transcribed melodies had different lengths, ranging from 7 to 26 bars and from 36 to 192 notes. The musicologist indicated the length of each excerpt, which depended on the melodic structure and on the length of the main theme. Finally, each sheet had four lines for comments, in case the subjects wished to describe the reasons for a particular choice. The complete list of the music works from which the excerpts were taken is reported in Table 2. Another short melodic excerpt was used as a graphical example of the segmentation task.

A group of 17 subjects participated in the experiment. Subjects were asked to perform the segmentation task directly on the paper where the music scores were printed. They were provided with the melodic excerpts plus one page of instructions on their task. The instructions also included a short explanation of the motivation of the research work and its application to music retrieval. All subjects were professional or semi-professional musicians. The choice of including only musicians in the group is due to one main consideration: a musician is able to relate directly to the written score, without the need for someone else’s performance. This avoids the possible bias in the recognition of melodic contours introduced by an intermediate interpretation. Moreover, musicians are more familiar with the concept of phrase in music. Besides the segmentation task, subjects were required to give some information about their background in music

concerning: played musical instrument, years of music practice, expertise in music analysis and knowledge of the proposed melodic excerpts.

The main direction given to the subjects was an operative definition of lexical units, which were described as “the musical phrases, or musical gestures, in which a melody can be divided during its performance, playing a similar role of words in the spoken language.” The example annexed to the test showed some possible musical phrases, while stating that subjects might disagree with the particular choices. The instructions suggested using two different graphic signs to be drawn at the end of a musical phrase, a single and a double bar, respectively indicating the presence of a normal or of a strong boundary between lexical units.

The test package was given to the subjects, who had complete freedom in carrying out the test. There was no time limit for returning the completed tests. Moreover, they were allowed to help themselves by playing the excerpts on their instrument, and to correct previous choices. Even if it was roughly estimated that the test would take about 20 minutes for each excerpt, subjects kept the test at their homes for more than one month. This was because almost all of them claimed that the task of segmenting the excerpts was tiring and time-consuming.

Table 2. Complete list of the music works used for the segmentation test; for each score, the length in bars is reported

No.  Title                                 Bars
J. S. Bach
1    Sinfonia Cantata no. 186, Adagio      7
2    Orchestral Suite no. 3, Aria          6
3    Orchestral Suite no. 2, Bourrée       13
4    Cantata BWV 147, Chorale              26
5    Preludium no. 9, BWV 854              8
L. Van Beethoven
6    Symphony no. 5, 4th movement          22
7    Symphony no. 7, 1st movement          21
8    Sonata no. 14, 3rd movement           12
9    Sonata no. 7, Minuetto                17
10   Sonata no. 8, Rondò                   18
F. Chopin
11   Ballade no. 1, op. 23                 11
12   Impromptu op. 66, 2nd movement        16
13   Nouvelle Etude no. 3                  21
14   Waltz no. 7                           16
15   Waltz no. 9                           17
W. A. Mozart
16   Concerto no. 1, K313                  10
17   “Don Giovanni”, Aria                  18
18   “Le Nozze di Figaro”, Aria            10
19   Sonata no. 11, K331                   18
20   Sonata no. 9, K310                    22

Analysis of the Results

The first, quite surprising, result was that more than half of the subjects followed the given instructions only partially, and in doing so provided indirectly useful feedback. As reported in the previous section, subjects were asked to put a marker, by drawing a single or double bar, between two subsequent notes of the score to highlight the presence of a boundary in the melodic surface. Hence, the instructions carried the implicit assumption that melodic lexical units do not overlap.

Some subjects, that is, 8 out of 17, disregarded this assumption and invented a new sign (different among subjects, but with the same meaning) that clearly indicated that some notes were both the last of a lexical unit and the first of the next one. This result implies that, at least for these subjects, the concept of melodic contour cannot be applied unless we take into account the fact that contours may overlap by at least one note. Another result is that subjects very seldom highlighted the presence of a strong boundary by drawing a double marker. Double markers represent 4.5% of the overall number of markers (including also the ones used for overlapping phrases), thus preventing a

quantitative analysis of strong separators between musical phrases. This result can be partially explained considering that, in most cases, the musical excerpts were too short to allow the presence of strong separators. It is likely that the musicologist who suggested which music works should be used for the test decided to truncate each excerpt at the first strong separator.

A visual representation of the position and number of markers along the score helped in visualizing the subjects’ behavior. Arrays of weights have been assigned to each subject, one for each score; the elements of the arrays correspond to the spaces between two subsequent notes. The following weighting rules were applied to array a_s[i] of subject s, where the index i indicates the position in the score, m(i) is the kind of marker (if any) at position i, and M_n and M_o denote a normal and an overlapping marker, respectively:

$$a_s[i] = \begin{cases} 1 & \text{if } m(i) = M_n \\ 0.5 & \text{if } m(i) = M_o \text{ or } m(i+1) = M_o \\ 0 & \text{elsewhere} \end{cases}$$

For each score, the sum of all the weights assigned by each subject has been computed, allowing for the representation of each score as a histogram (or a curve) where note numbers are on the X-axis and sums of weights are on the Y-axis. Peaks in the histograms correspond to positions where many of the subjects put a marker, while low values, that is, the noise between peaks, correspond to positions where subjects disagreed. Figure 4 reports two representative histograms of subjects’ choices. Excerpt No. 3 (on the left) shows that subjects substantially agreed on a single position, which corresponded to the only dotted half note in a continuous flow of eighth notes, while elsewhere their concordance is very low, even though all of them perceived the presence of boundaries (weights are nonzero). An opposite behavior can be observed for excerpt No. 15, where there is a high concordance among subjects around particular notes, corresponding to long notes with a duration about four times that of the surrounding notes. An important characteristic, highlighted by Figure 4, is that peaks are often flanked by positions with high concordance: most of the subjects perceived the presence of a boundary in the melodic contour, but they often disagreed in judging a given note either as the last of a phrase, or as the first of the next one (or both, in the case of overlapping markers).

Figure 4. Frequency with which subjects highlighted a boundary between lexical units for Excerpt No. 3 (left) and Excerpt No. 15 (right)
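The weighting rules for a_s[i] and the resulting histogram can be sketched in Python as follows. The input format is an assumption made for illustration: each subject’s segmentation of a score is a list with one entry per position, holding "N" for a normal marker, "O" for an overlapping marker, or None. Positions past the end of the score are treated as unmarked, and when both the first and the second rule apply, the first one (weight 1) is used; both choices are the sketch’s own, since the rules above do not address these cases.

```python
def weights(marks: list) -> list[float]:
    """Array a_s of one subject: 1 for a normal marker at i, 0.5 if an
    overlapping marker is at i or i + 1, 0 elsewhere."""
    a = []
    for i, m in enumerate(marks):
        nxt = marks[i + 1] if i + 1 < len(marks) else None
        if m == "N":
            a.append(1.0)
        elif m == "O" or nxt == "O":
            a.append(0.5)
        else:
            a.append(0.0)
    return a


def histogram(all_marks: list[list]) -> list[float]:
    """Sum, position by position, the weight arrays of all subjects."""
    arrays = [weights(marks) for marks in all_marks]
    return [sum(column) for column in zip(*arrays)]


subjects = [
    [None, None, "N", None, None, None],
    [None, None, "N", None, None, "N"],
    [None, None, None, "O", None, None],
]
print(weights(subjects[2]))  # [0.0, 0.0, 0.5, 0.5, 0.0, 0.0]
print(histogram(subjects))   # [0.0, 0.0, 2.5, 0.5, 0.0, 1.0]
```

In this toy example the peak at position 2 reflects that two subjects put a normal marker there and the third placed an overlapping marker right after it, while the isolated value at position 5 corresponds to a single subject.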

Quantitative analysis has been carried out by computing a distance measure between subjects. To this end, a symmetric matrix D of distances between pairs of segmentations made by the subjects has been computed for each excerpt, according to the formula:

$$D[s,t] = 1 - \frac{2\,P[s,t]}{P[s,s] + P[t,t]}, \qquad \text{with} \quad P[s,t] = \mathbf{a}_s^{T} \cdot \mathbf{a}_t$$

Hence D[s,t] = 0 means that the judgments of subjects s and t are perfectly equal, and D[s,t] = 1 means that the judgments of subjects s and t do not have any marker in common. Cluster analysis and multidimensional scaling have been carried out using the proposed distance function, showing that the group of subjects was uniform, without any cluster of subjects.

A feature of interest for application in the information retrieval domain is the typical length of lexical units. The average length varied considerably depending on the subject and on the excerpt. Yet, none of the subjects indicated a lexical unit of unitary length. Furthermore, only two subjects indicated lexical units two notes long, while for four subjects the minimum length of a lexical unit was three notes. The rest of the subjects indicated a minimum length of four or five notes. On the other hand, subjects did not show the same agreement regarding the maximum length of musical phrases. Apart from subject No. 11, who indicated a musical phrase of 38 notes in excerpt No. 4 (clearly stating the reasons for this choice, which therefore cannot be considered an error), the maximum length of musical phrases is within the range of 8 to 18 notes.

Results of the Perceptual Study

The results of the perceptual study showed that subjects agree on perceiving a boundary between lexical units only when there are strong cues. In particular, the presence of long notes surrounded by short ones seems to give the strongest evidence of a boundary between two lexical units. In other cases, like the example shown in Figure 4 for excerpt No. 3, subjects may apply different strategies in defining the presence of a boundary between lexical units. The fact that cluster analysis did not highlight any particular group of subjects suggests that subjects changed their strategies according to the excerpt to be segmented, but no trend can be highlighted.

These results show that melodic segmentation is a complex task, and that the concept of lexical unit is not as well defined as it is for text where, at least for most Western languages, the organization of sentences into words, and the existence of clear separators between them, allows for an easy computation of indexing terms. It has to be considered that the perceptual study has been carried out using only melodic information, and results could be different for other dimensions. For instance, the segmentation of harmony may take advantage, at least for musicians and musicologists, of the theory of chord progressions and cadences, while the segmentation of rhythm may be carried out considering that rhythmic patterns tend to repeat almost exactly, allowing for easier identification and subsequent segmentation.

An Experimental Comparison of Melodic Segmentation Techniques

Given that music is a continuous flow of events without explicit separators, automatic indexing needs to rely on automatic segmentation techniques, that is, techniques that automatically detect the lexical units of music documents. Different strategies of melodic segmentation can be applied, each one focusing on particular aspects of music information. A study has been carried out on the effectiveness, in terms of retrieval performance, of different approaches to segmentation. The study has been limited to melodic segmentation because, as already stressed, melody is the

most used dimension in music retrieval. Another interesting comparison of approaches to music retrieval has been presented in Hu and Dannenberg (2002), where the focus was on alternative representations for a dynamic programming approach, from the points of view of both retrieval effectiveness and computational cost. In the study presented here, the computational costs of the tested approaches were comparable, and thus they are not reported.

The organization of a number of evaluation campaigns by the research community working on the different aspects of music access, retrieval, and feature extraction (IMIRSEL, 2006), which started in 2005 (preceded in 2004 by an evaluation effort on audio analysis), will increasingly allow for the comparison of different approaches to music indexing using standard collections (Downie, Futrelle & Tcheng, 2004).

Approaches to Melodic Segmentation

The approaches to music segmentation can be roughly divided into two main groups: those that highlight the lexical units using only the document content, and those that exploit prior information about music theory and perception. Four different approaches, two for each group, have been tested.

Fixed-Length Segmentation (FL)

The simplest segmentation approach consists of the extraction from a melody of subsequences of exactly N notes, called N-grams (Downie & Nelson, 2000). N-grams may overlap, because no assumption is made on the possible starting point of a theme, nor on the possible repetitions of relevant music passages. The strength of this approach is its simplicity, because it is based neither on theories of music composition or perception, nor on the analysis of complete melodies. The exhaustive computation of FL units is straightforward, and can be carried out in linear time. The idea underlying this approach is that the effect of musically irrelevant N-grams will be compensated by the presence of all the musically relevant ones. It is common practice to choose small values for N, typically from 3 to 7 notes, because short units give higher recall, which is considered more significant than the consequent decrease in precision. Fixed-length segmentation can be extended to polyphonic scores, with the aim of extracting all relevant monophonic tokens from concurrent voices (Doraisamy & Rüger, 2004).

Data-Driven Segmentation (DD)

Segmentation can be performed considering that typical passages of a given melody tend to be repeated many times (Pienimäki, 2002). The repetitions can simply be due to the presence of different choruses in the score or can be related to the use of the same melodic material throughout the composition. Each sequence that is repeated at least K times (normally twice) is usually called a pattern, and is used for the description of a music document. This approach is called data-driven because patterns are computed only from the document data, without exploiting knowledge of music perception or structure. This approach can be considered an extension of the N-gram approach, because DD units can be of any length, with the limitation that they have to be repeated inside the melody; subpatterns that are included in longer patterns are discarded if they have the same multiplicity. Patterns can be computed from different features, like pitch or rhythm, each feature giving a different set of DD units to describe document content. Patterns can be truncated by applying a given threshold, to reduce the size of the index and to achieve a higher robustness to local errors in the query (Neve & Orio, 2004). The extension to polyphonic scores can be carried out similarly to the FL approach.
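Both content-based approaches can be sketched in Python on a melody encoded as a sequence of symbols (here, semitone intervals). The `ngrams` function extracts all overlapping FL units in linear time. The `repeated_patterns` function is a deliberately naive stand-in for the pattern extraction of the cited works, written only to show the two rules stated above: a sequence is a pattern if it occurs at least K times (occurrences are counted with possible overlaps), and a subpattern is discarded when it is contained in a longer pattern with the same multiplicity. It enumerates every subsequence, so its cost grows quickly with melody length and it is meant for short examples only; the function names and the `min_len` parameter are hypothetical.

```python
from collections import Counter


def ngrams(seq: list[int], n: int) -> list[tuple[int, ...]]:
    """FL units: all overlapping subsequences of exactly n symbols."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]


def contains(long: tuple, short: tuple) -> bool:
    """True if `short` occurs as a contiguous subsequence of `long`."""
    m = len(short)
    return any(long[i:i + m] == short for i in range(len(long) - m + 1))


def repeated_patterns(seq: list[int], k: int = 2, min_len: int = 2) -> dict[tuple, int]:
    """DD units (naive sketch): subsequences of at least min_len symbols that
    occur at least k times, minus those contained in a longer pattern that
    has the same number of occurrences."""
    counts = Counter(
        tuple(seq[i:j])
        for i in range(len(seq))
        for j in range(i + min_len, len(seq) + 1)
    )
    frequent = {p: c for p, c in counts.items() if c >= k}
    return {
        p: c
        for p, c in frequent.items()
        if not any(len(q) > len(p) and cq == c and contains(q, p)
                   for q, cq in frequent.items())
    }


melody_intervals = [2, 2, -4, 5, 2, 2, -4, 1]  # semitone intervals
print(ngrams(melody_intervals, 3))
print(repeated_patterns(melody_intervals))  # {(2, 2, -4): 2}
```

Here the only DD unit is the three-interval sequence repeated twice; its two-interval subpatterns occur with the same multiplicity and are therefore discarded, whereas FL segmentation with N = 3 returns all six trigrams, including the musically irrelevant ones.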

Perception-Based Segmentation (PB)

Melodies can be segmented according to theories of human perception. Listeners have the ability to segment the unstructured auditory stream into smaller units, which may correspond to melodic phrases, motifs or musical gestures. Even if listeners may disagree on the exact location of the boundaries between subsequent units, as highlighted by the perceptual experiment described above, it is likely that perceptually-based units are good descriptors of document content, because they capture melodic information that appears to be relevant for users. The ability to segment the auditory stream may vary depending on the listeners' level of musical training and on their knowledge of the rules of music theory. Yet a number of strategies can be generalized to all listeners, in particular those related to the detection of clear changes in the melodic flow, such as large pitch intervals or long note durations. This behavior can be partially explained by the principles of Gestalt psychology. Computational approaches for the automatic emulation of listeners' behavior have been proposed by music theorists (Tenney & Polansky, 1980). PB units do not overlap and are based on the pitch and duration of the notes of monophonic melodies.

Musicological-Oriented Segmentation (MO)

Another approach to segmentation is based on knowledge of music theory, in particular for classical music. According to music theorists, music is based on the combination of musical structures (Lerdahl & Jackendoff, 1983; Narmour, 1990), even if its actual notation may lack clear representations of such structures. Yet these structures can be inferred by applying a number of rules, and part of the analysis of a composition consists in their identification. It is likely that the same approach can be extended to less structured music, such as popular or ethnic music. It is assumed that a hierarchical relationship exists among music structures, from musical phrases at the lower level to movements at the higher level. MO units are computed by analyzing the musical score, applying rules for structure identification and segmenting the score into units that correspond to low-level structures. The computation of MO units should in principle use the global information of the score, but an algorithm has been proposed that uses only local information and gave results comparable to more complex ones (Cambouropoulos, 1997). Structures may overlap in principle, but current implementations do not take this possibility into account.
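Both PB and MO produce non-overlapping units by placing boundaries where the melodic flow changes markedly. The sketch below is a simplified boundary detector in the spirit of local-boundary models such as Cambouropoulos (1997); it is not the MIDI Toolbox implementation used in the experiments. Its choices are assumptions: the degree of change between successive intervals is |a - b| / (a + b), scaled by the interval size; pitch and inter-onset evidence get equal weights; a boundary is placed at a local peak above a fixed threshold; and melodies are assumed to have no rests, so each inter-onset interval equals the note duration. The names `segment` and `boundary_strengths` are hypothetical.

```python
from typing import Sequence


def _change(a: float, b: float) -> float:
    """Degree of change between two successive non-negative interval values."""
    return abs(a - b) / (a + b) if a + b > 0 else 0.0


def boundary_strengths(values: Sequence[float]) -> list[float]:
    """Boundary strength on each interval, normalised so that the maximum is 1."""
    n = len(values)
    strengths = []
    for i, x in enumerate(values):
        left = _change(values[i - 1], x) if i > 0 else 0.0
        right = _change(x, values[i + 1]) if i < n - 1 else 0.0
        strengths.append(x * (left + right))
    top = max(strengths, default=0.0)
    return [s / top for s in strengths] if top > 0 else strengths


def segment(pitches: Sequence[int], durations: Sequence[float],
            w_pitch: float = 0.5, w_ioi: float = 0.5,
            threshold: float = 0.5) -> list[list[tuple[int, float]]]:
    """Cut a monophonic melody into non-overlapping (pitch, duration) units.

    A boundary is placed after note i when the combined strength of the
    interval between notes i and i+1 is a local peak above `threshold`.
    """
    if not pitches:
        return []
    pitch_iv = [abs(b - a) for a, b in zip(pitches, pitches[1:])]
    ioi = list(durations[:-1])  # assumes no rests: inter-onset interval = duration
    combined = [w_pitch * p + w_ioi * t
                for p, t in zip(boundary_strengths(pitch_iv), boundary_strengths(ioi))]
    cuts = [i + 1 for i, s in enumerate(combined)
            if s >= threshold
            and (i == 0 or s > combined[i - 1])
            and (i == len(combined) - 1 or s >= combined[i + 1])]
    notes = list(zip(pitches, durations))
    bounds = [0, *cuts, len(notes)]
    return [notes[a:b] for a, b in zip(bounds, bounds[1:])]


# A leap of a fifth after a long note yields a single boundary.
print(segment([60, 62, 64, 65, 72, 71, 69, 67], [1, 1, 1, 2, 1, 1, 1, 2]))
# [[(60, 1), (62, 1), (64, 1), (65, 2)], [(72, 1), (71, 1), (69, 1), (67, 2)]]
```

Because each note belongs to exactly one unit, a short query that straddles a boundary may share no complete unit with the document, which is the weakness of non-overlapping segmentation that the retrieval experiments point out for PB.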
Figure 5. Graphical representation of different automatic segmentations (from the top: PB, a statistical approach not tested, and MO)

The effect of alternative approaches to segmentation is shown in Figure 5, where the lexical units highlighted by different algorithms are represented graphically. The algorithms are those included in the MIDI Toolbox (Eerola & Toiviainen, 2004) and correspond, from the top, to PB, to a probabilistic approach not tested in the present study, and to MO.

Table 3. Main characteristics of the index terms obtained from the different segmentation techniques

                          FL      DD       PB      MO
Average length            3.0     4.8      4.0     3.6
Average units/document    52.1    61.9     43.2    45.0
Number of units           70093   123654   70713   67893
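The three rows of Table 3 can be read off the inverted index built from a segmented collection. A minimal sketch, assuming each document is given as the list of lexical units produced by one segmentation approach. The chapter does not say whether the average length is taken over index entries or over unit occurrences; averaging over index entries is our assumption, and `index_statistics` is a hypothetical name.

```python
from collections import defaultdict


def index_statistics(collection: dict[str, list[tuple]]):
    """Build an inverted index and compute the three rows of Table 3.

    `collection` maps each document id to the lexical units produced by one
    segmentation approach (units are tuples of note features).
    """
    index: dict[tuple, set[str]] = defaultdict(set)  # unit -> documents containing it
    distinct_per_doc = []
    for doc_id, units in collection.items():
        distinct = set(units)
        distinct_per_doc.append(len(distinct))
        for unit in distinct:
            index[unit].add(doc_id)
    n_entries = len(index)
    stats = {
        # Assumption: length averaged over index entries, not over occurrences.
        "average length": sum(len(u) for u in index) / n_entries if n_entries else 0.0,
        "average units/document": (sum(distinct_per_doc) / len(distinct_per_doc)
                                   if distinct_per_doc else 0.0),
        "number of units": n_entries,  # one index entry per different unit
    }
    return stats, dict(index)


collection = {
    "song1": [("C", "D", "E"), ("D", "E", "C"), ("C", "D", "E")],
    "song2": [("C", "D", "E"), ("E", "G")],
}
stats, index = index_statistics(collection)
# Three index entries of lengths 3, 3 and 2 (average 8/3); two distinct units per song.
print(stats)
```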
Characteristics of the Index Terms

The comparison has been carried out according to the Cranfield model for information retrieval. A test collection of popular music has been created with 2310 MIDI files as music documents. MIDI is a well-known standard for the representation of music documents that can be synthesized to create audible performances (Rothstein, 1991). MIDI is becoming obsolete both as a format for the representation of music to be listened to, because of the widespread diffusion of compressed audio formats such as MP3, and as a format for representing notated music, because of the creation of new formats for analyzing, structuring and printing music (Selfridge-Field, 1997). The availability of large collections of music files in MIDI is the main reason why this format is still widely used for music retrieval experiments.

From the collection of MIDI files, the channels containing the melody have been extracted automatically and the note durations have been normalized; for polyphonic channels, the highest pitch has been chosen as part of the melody (Uitdenbogerd & Zobel, 1998). After preprocessing, the collection contained complete melodies with an average length of 315.6 notes. A set of 40 queries, with an average length of 9.7 notes, has been created as recognizable examples of both choruses and refrains of 20 randomly selected songs. Only the theme from which the query was taken was considered relevant, following a query-by-example paradigm where the example is an excerpt of a particular work that needs to be retrieved. This assumption simplifies the creation of the relevance judgments, which can be built automatically. Alternatively, relevance judgments can be created using a pool of experts, who may find that more than one document is relevant to a particular query (Typke, den Hoed, de Nooijer, Wiering & Veltkamp, 2005). The initial queries did not contain errors and had a length that allowed for a clear recognition of the main theme. Robustness to errors has been tested by modifying note pitches and durations, while the effect of query length has been tested by shortening the original queries.

Table 3 shows the main characteristics of the lexical units, and thus of the index terms, extracted with the segmentation approaches, giving a preliminary idea of how each segmentation approach describes the document collection. The values reported in the table have been computed with the following experimental setup: FL has been computed with N-grams of three notes; DD has been computed applying a threshold of five notes; PB and MO have been computed using the algorithms presented in Eerola and Toiviainen (2004). For all four approaches, units were sequences of pairs of values (pitch and duration), and the index is built with one entry for each different sequence.

The approaches gave comparable results in terms of the average length of lexical units, which is about three to four notes, and also in terms of the average number of different units per document. This behavior differs from the results of the perceptual study on manual segmentation,
which for many subjects gave a minimum length of lexical units of about four to five notes. Another interesting feature is the average number of different lexical units per document, which ranges from 43.2 for PB to 61.9 for DD. Given that these values are computed for complete music documents, even if only on the melodic line, music indexing is based on a very compact description of document contents, at least compared with the indexing of textual documents, which even in the case of short documents have hundreds of different index terms. The last row reports the number of different lexical units, which corresponds to the number of entries in the index file. As can be seen, segmentation with overlapping units of different lengths (DD) has the drawback of increasing the index size and the memory requirements.

Retrieval Effectiveness

All the retrieval experiments have been carried out using the same retrieval engine, which is based on the Vector Space Model and implements the tfidf weighting scheme described previously. The results in terms of retrieval effectiveness are presented in Table 4, which reports the average precision (Av.Prec.), the percentage of queries that gave the relevant document within the first k positions (k with values in [1, 3, 5, 10]), and the percentage of queries that did not retrieve the relevant document at all (“not found”). It has to be noted that retrieval effectiveness is usually reported through precision/recall plots, using the ranked list of retrieved documents. The particular choice of measures adopted in this experiment depends on the fact that there was only one relevant document for each query in the test collection.

Table 4. Retrieval effectiveness of the different approaches

              FL      DD      PB      MO
Av.Prec.      0.98    0.96    0.80    0.83
=1            97.5%   92.5%   72.5%   77.5%
≤3            97.5%   100%    87.5%   87.5%
≤5            97.5%   100%    87.5%   92.5%
≤10           100%    100%    90.0%   95.0%
not found     0.0%    0.0%    10.0%   2.5%

FL and DD had comparable results: they have close average precision, and both approaches always retrieved the relevant document within the first ten positions (DD within the first three, but with a lower percentage of queries that retrieved the relevant document at top rank). PB and MO also had comparable results in terms of average precision, with slightly better performance of MO, in particular because PB did not retrieve the relevant document at all for 10% of the queries. This is a negative aspect of PB, due to the fact that its units do not overlap and, for short queries, it may happen that none of the note sequences matches the segmented units. A similar consideration also applies to MO, but this effect seems to be limited by the fact that MO units are shorter.

The performance of the different approaches depending on the presence of errors in the query is shown on the left of Figure 6, which reports the average precision of the approaches. Apart from FL, the segmentation techniques had a clear drop in performance, even when a single error was introduced. In particular, PB and MO showed a similar negative trend, almost linear with the number of errors. It is interesting to note that DD, even if its performance is almost comparable to FL in the case of a correct query, had a faster degradation. The average precision depending on query length is shown on the right of Figure 6. Similar considerations can be made on the trends of the different approaches. PB and MO had a similar behavior, and also in this case FL was the one with the best performance. It can be noted that, when the queries are moderately shortened, the average precision of FL and DD is almost constant. The drop in performance appears earlier, and is more marked, for DD than for FL.

Figure 6. Retrieval effectiveness of the different approaches depending on the number of errors added to the query (left) and on the shortening of query length (right)
[Plots of average precision (0.3 to 1.0) for FL, DD, PB, MO and FUS; x-axes: correct query, 1, 2 and 3 errors; query length from 100% to 50% of the original]

From the analyses, it appears that simple approaches to segmentation, which carry redundant information through overlapping units, give better performance than approaches based on music perception or music theory. Moreover, fixed-length segmentation was more robust than data-driven segmentation both to errors in the queries and to short queries. From these results, it seems that for music indexing an approach that does not filter out any information improves recall without degrading precision. The good performance of FL may also be due to the fact that it has the shortest average length of index terms (actually, being N-grams, the average corresponds to the length of all the index terms), and hence local perturbations due to errors in the query do not affect a high number of index terms.

The fact that a simple approach to melodic segmentation such as FL outperforms all the others, which are based on content-specific characteristics, is somewhat counterintuitive. For this reason, a number of experiments have been carried out in order to find the best configuration of the parameters for each approach. The results reported in Table 4 and Figure 6 are the best ones achieved by each approach. It has to be noted that the overall performance is biased by the particular implementation of the different segmentation algorithms, and this is particularly true for PB and MO. The aim of the study was not to state which is the best approach, but to compare the experimental results of different implementations using a common testbed.

In addition to the results of the segmentation algorithms, Figure 6 also reports the average precision of a fifth approach, named FUS, which with this particular setting outperforms all the others in terms of robustness to errors and short queries. The approach is discussed in the next section.

Parallel Indexes

Up to this point, the discussion has been carried out assuming that only one index is built on a document collection, possibly using a combination of features. In general, this is a reasonable approach, because the creation of an index file is computationally costly and may require a remarkable amount of storage. On the other hand, different indexes may capture different characteristics of a document collection, whose usefulness may depend on the user's information need, on the way the query is created, and on the way the user evaluates the retrieved documents. The presence of a number of alternative indexing schemes can be exploited by running a number of parallel retrieval sessions

on the different indexing schemes, obtaining a number of ranked lists of potentially relevant documents, and combining the results into a single ranked list using some strategy. The approach is named data fusion or collection fusion, where the latter term more precisely addresses the problem of combining the results from indexing schemes built on different (and potentially non-overlapping) collections of documents.

Collection fusion techniques are quite popular in Web metasearch engines, which are services for the automatic parallel querying of a number of ordinary Web search engines, whose overall results are presented in a single ranked list (Lee, 1997). The advantages of a metasearch engine are a higher coverage of Web pages, which is the union of the coverage of the single search engines, and an improvement of retrieval effectiveness in terms of recall, because more documents are retrieved, and in terms of precision, because multiple evidence of the relevance of some documents is available. The crucial point in the development of a collection (or data) fusion technique is the way different ranked lists are fused together. A number of constraints have to be considered for typical collection fusion applications: the indexing schemes of the different search engines are not known; each engine covers a different part of the overall set of documents; the individual RSVs, or similarity scores, may not be known by the metasearch engine; and, if known, the RSVs may be expressed on different scales and have different statistical distributions. For this reason, some techniques have been proposed that use the only information that is surely available: the rank of each retrieved document for each search engine (Fox & Shaw, 1994).

Most of these constraints do not hold when the parallel indexes are built within the same retrieval system, because there is complete control over each weighting scheme and over the range and distribution of each RSV, which can be obtained by running the same retrieval engine on the different indexes. Even if this aspect is more related to the retrieval than to the indexing of music documents, it is worth mentioning an experiment on data fusion of alternative indexing schemes.

Fusion of Different Melodic Descriptors

Even when a single dimension is used to extract content descriptors, a number of choices have to be made on the way lexical units are computed, and these choices affect the effectiveness of an indexing scheme. Let us consider the common situation in which melodic information is used as content descriptor, using as an example a complete evaluation of music indexing schemes.

The first choice in music indexing is how lexical units are computed, as described in the previous section. In the running example, the DD (Data-Driven) approach is used, where lexical units are computed with the pattern analysis approach presented in the preceding section, because it gives high retrieval performance while allowing for index terms of different lengths. The second step consists of choosing whether to use absolute or relative features. The third step regards the levels of quantization that have to be applied to each feature, which may range from one single level (meaning that the feature is not used in practice) to as many levels as the possible values (meaning that no quantization is applied). Table 5 represents the different combinations of time and pitch information of melodic lexical units; the three cells marked with an acronym in bold are the ones that have been used in the experiment on data fusion, and the two cells marked with “---” highlight combinations that do not make sense.

Table 5. Possible combinations of duration and pitch information, according to the absolute or relative representation and to the levels of quantization

                             Duration
                       Abs.               Rel.
                       1     N     ∞      1     N     ∞
Pitch   Abs.   1       ---   IOI
               N
               ∞
        Rel.   1       ---
               N       PIT   BTH
               ∞

As shown in Table 5, three indexing schemes have been used: PIT, which uses only relative pitch information, with N=9 levels of quantization of melodic intervals; IOI, which uses only absolute duration information, with N=11 levels of quantization of exact durations; and BTH, which uses both relative pitch and absolute duration. Having used the DD approach, lexical units may differ from one index to another, because IOI patterns may not correspond to PIT or BTH patterns, and vice versa.

It can be noted that any combination reported in the table, possibly varying the quantization, can be used to index music documents from melodic information. An extensive evaluation of the retrieval effectiveness of every possible choice, and of their merging with data fusion techniques, has not been carried out yet. The individual indexing schemes can be fused in any combination. In the presented evaluation, two data fusion approaches have been tested: Fuse2, which merges the results from PIT and IOI, and Fuse3, which merges all three indexing schemes.

Experimental Evaluation

The effect of data fusion has been tested on a small test collection of popular music, created using 107 songs by the Beatles in MIDI format downloaded from the Web. As for any test collection, documents may contain errors. In a preprocessing step, the channels containing the melody have been extracted automatically and the note durations have been normalized; in the case of polyphonic scores, the highest pitch has been chosen as part of the melody. After preprocessing, the collection contained 107 complete melodies with an average length of 244 notes, ranging from 89 notes for the shortest melody to 564 for the longest. Indexes were built on complete melodies, because repetitions are important for the DD approach to melodic segmentation. A set of 40 queries has been created by randomly selecting 20 themes in the dataset and using the first notes of the chorus and of the refrain. The initial note and the length of each query were chosen to obtain recognizable motifs that could be considered representative of real users' queries. The queries had an average length of 9.75 notes, ranging from 4 to 21 notes. Only the theme from which the query was taken was considered relevant.
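The PIT, IOI and BTH schemes of Table 5 differ only in which features are kept and how finely they are quantized. The chapter fixes N=9 levels for melodic intervals and N=11 levels for durations but does not give the bin edges, so the bins below (interval classes from unison to large leaps, and durations on a doubling scale centred on one beat) are hypothetical, as is the way BTH pairs each interval with the duration of the note it reaches. Each resulting symbol sequence would then be segmented with the DD approach to build a separate index; the function names are ours.

```python
import math
from typing import Sequence


def quantize_interval(semitones: int) -> int:
    """Hypothetical 9-level quantisation of a melodic interval: -4..+4."""
    size = abs(semitones)
    if size == 0:
        level = 0   # repeated note
    elif size <= 2:
        level = 1   # steps
    elif size <= 4:
        level = 2   # thirds
    elif size <= 7:
        level = 3   # fourths, tritone, fifths
    else:
        level = 4   # larger leaps
    return level if semitones >= 0 else -level


def quantize_duration(beats: float) -> int:
    """Hypothetical 11-level quantisation of an absolute duration: 0..10.

    Level 5 is one beat; each level up or down doubles or halves the duration.
    """
    if beats <= 0:
        raise ValueError("durations must be positive")
    return min(max(round(math.log2(beats)) + 5, 0), 10)


def feature_sequences(pitches: Sequence[int],
                      durations: Sequence[float]) -> dict[str, list]:
    """Symbol sequences for the PIT, IOI and BTH indexing schemes."""
    pit = [quantize_interval(b - a) for a, b in zip(pitches, pitches[1:])]
    ioi = [quantize_duration(d) for d in durations]
    # BTH pairs each interval with the quantised duration of the note it reaches.
    bth = list(zip(pit, ioi[1:]))
    return {"PIT": pit, "IOI": ioi, "BTH": bth}


print(feature_sequences([60, 62, 67, 67, 55], [1, 1, 2, 0.5, 4]))
# {'PIT': [1, 3, 0, -4], 'IOI': [5, 5, 6, 4, 7], 'BTH': [(1, 5), (3, 6), (0, 4), (-4, 7)]}
```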

Results are shown in Table 6, which reports the average precision (Av.Prec.), the percentage of queries that gave the relevant document within the first k positions (k with values in [1, 3, 5, 10]), and the percentage of queries that did not retrieve the relevant document at all (“not found”). As can be seen, IOI gave the poorest results, even if for 90% of the queries the relevant document was among the first three retrieved. The highest average precision using a single feature was obtained by BTH, with the drawback of an on-off behavior: either the relevant document is the first retrieved or it is not retrieved at all (2.5% of the queries). PIT gave good results, with all the queries finding the relevant document among the first three documents.

Table 6. Retrieval effectiveness using single indexing schemes and data fusion approaches

              IOI     PIT     BTH     Fuse2   Fuse3
Av.Prec.      0.74    0.93    0.98    0.96    0.98
=1            57.5%   87.5%   97.5%   92.5%   95.0%
≤3            90.0%   100%    97.5%   100%    100%
≤5            95.0%   100%    97.5%   100%    100%
≤10           97.5%   100%    97.5%   100%    100%
not found     0%      0%      2.5%    0%      0%

The first interesting result is that Fuse2 gave an improvement with respect to the separate features (IOI and PIT), with an average precision of 0.96, hence with values comparable to BTH and without the drawback of not retrieving the relevant document for 2.5% of the queries. It is worth noting that, even if the retrieval effectiveness of IOI is very low compared to PIT, the combination of the two in a fused ranked list nevertheless gave an improvement of the recognition rate (the relevant document retrieved at top rank) of 3%.

It could be expected that adding BTH to the data fusion would not give further improvements, since BTH is already a combination of the first two. The set of BTH patterns is a subset of the union of the sets of IOI and PIT patterns, while it can be shown that the BTH set includes the intersection of the IOI and PIT sets, because of the choice of not considering subpatterns that have the same multiplicity as longer ones. Given these considerations, it is clear that BTH does not introduce new patterns with respect to IOI and PIT. Yet, as can be seen from the column labeled Fuse3 in Table 6, the use of all three features allowed for reducing the drawbacks of the three single rankings. This result can be explained considering that BTH had different tfidf weights, which were somewhat more selective than those of IOI and PIT alone and which gave a very high score to the relevant document in case of a good match.

In these experiments, and with this particular setup, the best results for Fuse2 and Fuse3 have been obtained by assigning equal weights to the single RSVs, thus computing the final similarity as the average of the individual similarities. Yet data fusion can also be used to let the user refine the query, by manually indicating which dimensions and features are more relevant to the user's information need. For instance, if the peculiarity of a song lies in the rhythm of the melody rather than in the pitch contour, the user may choose a particular data fusion strategy that underlines this characteristic. Data fusion also increases robustness to errors in the query and to short queries, as shown in Figure 6 for the experiments on the comparison of different segmentation techniques. In this case, the values reported for FUS are obtained by fusing the individual results of the four techniques.

The drawback of data fusion techniques is that they require creating parallel indexing schemes, carrying out parallel retrievals, and finally fusing the results together. Nevertheless, the results are encouraging and worth testing extensively. A possible complete scheme for indexing, retrieval and data fusion is plotted in Figure 7.

Figure 7. Main components of a complete music retrieval engine where multiple indexing schemes are
combined with a data fusion technique
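The complete scheme of Figure 7 (one index per feature, a retrieval run on each index, and a fusion step) can be sketched as follows. This is a minimal sketch under stated assumptions: the tf·idf weighting (raw term frequency times ln(N/df)) with cosine similarity is a standard textbook choice standing in for the weighting scheme described earlier in the chapter; `fuse` implements the equal-weight average of RSVs that gave the best results for Fuse2 and Fuse3, and its optional weights mimic the user-driven refinement discussed above. The toy indexes and all names are invented for illustration.

```python
import math
from collections import Counter
from typing import Optional, Sequence

Index = dict[str, list]  # document id -> index terms (lexical units) of that document


def retrieve(index: Index, query_terms: Sequence) -> dict[str, float]:
    """One Vector Space Model run: cosine of tf*idf vectors (assumed weighting).

    Returns the RSV of every document with a non-zero similarity to the query.
    """
    n = len(index)
    df = Counter(t for terms in index.values() for t in set(terms))
    idf = {t: math.log(n / c) for t, c in df.items()}
    query = {t: f * idf[t] for t, f in Counter(query_terms).items() if t in idf}
    q_norm = math.sqrt(sum(w * w for w in query.values()))
    scores = {}
    for doc_id, terms in index.items():
        vec = {t: f * idf[t] for t, f in Counter(terms).items()}
        d_norm = math.sqrt(sum(w * w for w in vec.values()))
        dot = sum(w * vec.get(t, 0.0) for t, w in query.items())
        if dot > 0:  # implies both norms are positive
            scores[doc_id] = dot / (q_norm * d_norm)
    return scores


def fuse(runs: Sequence[dict[str, float]],
         weights: Optional[Sequence[float]] = None) -> list[str]:
    """Merge parallel runs by a weighted average of RSVs (equal weights by default).

    A document missing from a run contributes an RSV of 0 for that run.
    """
    weights = list(weights) if weights is not None else [1.0] * len(runs)
    total = sum(weights)
    docs = set().union(*runs)
    fused = {d: sum(w * run.get(d, 0.0) for w, run in zip(weights, runs)) / total
             for d in docs}
    return sorted(fused, key=lambda d: (-fused[d], d))


# Toy PIT-like and IOI-like indexes over the same three documents.
pit = {"d1": ["a", "b", "c"], "d2": ["a", "x", "y"], "d3": ["z", "y", "w"]}
ioi = {"d1": ["p", "q"], "d2": ["p", "r"], "d3": ["s", "t"]}
pit_run = retrieve(pit, ["a"])
ioi_run = retrieve(ioi, ["q"])
print(fuse([pit_run]))           # ['d2', 'd1']
print(fuse([pit_run, ioi_run]))  # ['d1', 'd2']
```

With these toy indexes, the PIT-like run alone ranks d2 above d1, while averaging it with the IOI-like run puts d1 first: a feature that is weak on its own can still change the final ranking when its evidence is concentrated on one document.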
Conclusion

Indexing is based on the concept that documents become more accessible if a number of guidance tools are provided. This fact can be exploited to improve retrieval effectiveness while reducing its computational cost, because the content of a collection of documents is accessed through a set of pointers: instead of browsing all the documents to find which are relevant to the user's information need, the system may access only the ones that are potentially relevant, depending on the set of indexes that point to them. Indexing is the key to scalability.

The application of indexing to music retrieval is motivated by the need for scalable systems to access music documents, because music collections are constantly growing, both in digital library systems on the server side and in storage devices on the user side. Given that the main ideas behind textual document indexing are quite general, a parallel can be drawn between the phases of textual document indexing (namely lexical analysis, stop-word removal, stemming and index weighting) and the phases that may be required for music document indexing.

Yet there are a number of factors that need to be taken into account when extending the indexing concept to the music domain. First of all, it is difficult to state what the semantics of a music document are, if any exist. Thus the choice of the most representative index terms has to be carried out with a different approach. To this end, the concept of lexical units of the music language has been introduced, taking into account that music has a multidimensional nature and that not all dimensions may be of interest for the final user. Furthermore, it is not clear to what extent users agree on the way they perceive lexical units.

To investigate this aspect, a perceptual study has been carried out on the way a number of musicians highlighted melodic lexical units in 20 excerpts of music scores. The analysis showed that, even if there are some common trends in users' behavior, the consistency among subjects depends on the availability of particular cues in the music documents. Even if subjects may not agree when they refer to lexical units, their use as index terms may be evaluated experimentally, using a test collection of documents, queries and relevance judgments. This is the goal of the experiment reported here, where four different approaches to melodic segmentation, aimed at highlighting lexical units, have been compared. Results showed that simple approaches outperform more complex ones that exploit a priori information either on music perception or on music theory. Simple approaches are based on the creation of a redundant index, where individual elements (i.e., a given note) belong to more than one index term. From the experimental comparison it may be inferred that redundancy is an important aspect of music indexing. To this end, a final experiment has been proposed, where a data fusion approach has been exploited to merge the results of alternative indexing schemes. Results showed that data fusion allows for an improvement of retrieval effectiveness.

From this discussion it can be concluded that music indexing can inherit most of the advantages of textual indexing, and remains a promising approach to music access, provided that the peculiarities of the music language are taken into account. Even if research in music retrieval should not be limited to the extension of well-known techniques to the music domain, indexing can be considered a core technique for building more complex systems. Furthermore, it has to be noted that current trends in the Music Information Retrieval (MIR) research community encompass a number of approaches that go beyond the pure retrieval task. The user of a MIR system may have needs other than searching for a particular song that she has in mind. The user may need an information filtering system that allows her to listen only to the music documents that she (potentially) likes, or a browsing system for managing a personal music collection. Also in these cases, indexing can be used to increase the efficiency of new approaches to music access.

References

Agosti, M., Bombi, F., Melucci, M. & Mian, G. A. (2000). Towards a digital library for the Venetian music of the eighteenth century. In J. Anderson, M. Deegan, S. Ross & S. Harold (Eds.), DRH 98: Selected papers from digital resources for the humanities (pp. 1-16). Office for Humanities Communication.

Baeza-Yates, R. & Ribeiro-Neto, B. (1999). Modern information retrieval. New York: ACM Press.

Bainbridge, D., Nevill-Manning, C. G., Witten, I. H., Smith, L. A. & McNab, R. J. (1999). Towards a digital library of popular music. In Proceedings of the ACM Conference on Digital Libraries (pp. 161-169).

Basaldella, D. & Orio, N. (2006). An application of weighted transducers to music information retrieval. In Proceedings of Electronic Imaging (pp. 607306/1-607306/10).

Berenzweig, A., Logan, B., Ellis, D. P. W. & Whitman, B. (2004). A large-scale evaluation of acoustic and subjective music-similarity measures. Computer Music Journal, 28(2), 63-76.

Birmingham, W. P., Dannenberg, R. B., Wakefield, G. H., Bartsch, M., Bykowski, D., Mazzoni, D., Meek, C., Mellody, M. & Rand, W. (2001). MUSART: Music retrieval via aural queries. In Proceedings of the International Conference on Music Information Retrieval (pp. 73-82).

Blackburn, S. & DeRoure, D. (1998). A tool for content-based navigation of music. In Proceedings of the ACM Multimedia Conference (pp. 361-368).

Borko, H. & Bernier, C. L. (1978). Indexing concepts and methods. New York: Academic Press.

Cambouropoulos, E. (1997). Musical rhythm: A formal model for determining local boundaries. In E. Leman (Ed.), Music, gestalt and computing (pp. 277-293). Berlin: Springer-Verlag.

Cano, P., Batlle, E., Kalker, T. & Haitsma, J. (2005). A review of audio fingerprinting. Journal of VLSI Signal Processing, 41, 271-284.

Cantate (2006). Computer access to notation and text in music libraries. Retrieved May 17, 2007, from http://projects.fnb.nl/cantate/

de Cheveigné, A. & Baskind, A. (2003). F0 estimation. In Proceedings of Eurospeech (pp. 833-836).

Doraisamy, S. & Rüger, S. (2004). A polyphonic music retrieval system using N-grams. In Proceedings of the International Conference on Music Information Retrieval (pp. 204-209).

Downie, S. & Nelson, M. (2000). Evaluation of a simple and effective music information retrieval method. In Proceedings of the ACM International Conference on Research and Development in Information Retrieval (pp. 73-80).

Downie, J. S. (2003). Music information retrieval. Annual Review of Information Science and Technology, 37, 295-340.

Downie, J. S., Futrelle, J. & Tcheng, D. (2004). The international music information retrieval systems evaluation laboratory: Governance, access and security. In Proceedings of the International Conference

Dunn, J. & Mayer, C. (1999). VARIATIONS: A Digital Music Library System at Indiana University. In Proceedings of ACM Conference on Digital Libraries (pp. 12-19).

Eerola, T. & Toiviainen, P. (2004). MIR in Matlab: The Midi Toolbox. In Proceedings of the International Conference on Music Information Retrieval (pp. 22-27).

Ferrari, E. & Haus, G. (1999). The musical archive information system at Teatro alla Scala. In Proceedings of the IEEE International Conference on Multimedia Computing and Systems (Vol. 2, pp. 817-821).

Fox, E. A. & Shaw, J. A. (1994). Combination of multiple searches. In The Second Text REtrieval Conference, TREC-2 (pp. 243-249).

Ghias, A., Logan, J., Chamberlin, D. & Smith, B. C. (1995). Query by humming: Musical information retrieval in an audio database. In Proceedings of the ACM Conference on Digital Libraries (pp. 231-236).

Gómez, E. & Herrera, P. (2004). Estimating the tonality of polyphonic audio files: Cognitive versus machine learning modelling strategies. In Proceedings of the International Conference on Music Information Retrieval (pp. 92-95).

Harmonica (2006). Accompanying action on music information in libraries. Retrieved May 17, 2007, from http://projects.fnb.nl/harmonica/

Harte, C., Sandler, M., Abdallah, S. & Gómez, E. (2005). Symbolic representation of musical chords: A proposed syntax for text annotations. In Proceedings of the International Conference on Music Information Retrieval (pp. 66-71).

Harvell, J. & Clark, C. (1995). Analysis of the quantitative data of system performance. Deliverable 7c, LIB-JUKEBOX/4-1049: Music across borders. Retrieved May 17, 2007, from http://www.statsbiblioteket.dk/Jukebox/edit-report-1.html

Hoashi, K., Matsumoto, K. & Inoue, N. (2003). Personalization of user profiles for content-based music retrieval based on relevance feedback. In Proceedings of the ACM International Conference on Multimedia (pp. 110-119).

Hsu, J.-L., Liu, C. C. & Chen, A. L. P. (1998). Efficient repeating pattern finding in music databases. In Proceedings of the International Conference on Information and Knowledge Management (pp. 281-288).
Hu, N. & Dannenberg, R. B. (2002). A comparison of melodic database retrieval techniques using sung queries. In Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (pp. 301-307).
Humdrum. The Humdrum toolkit: Software for music research. Retrieved May 17, 2007, from http://www.music-cog.ohio-state.edu/Humdrum/
Huron, D. (1995). The Humdrum toolkit: Reference manual. Menlo Park, CA: Center for Computer Assisted Research in the Humanities.
IMIRSEL (2006). The international music information retrieval system evaluation laboratory project. Retrieved May 17, 2007, from http://www.music-ir.org/evaluation/
Krumhansl, C. L. (1989). Why is musical timbre so hard to understand? In S. Nielsen and O. Olsson (Eds.), Structure and perception of electroacoustic sound and music (pp. 45-53). Amsterdam, NL: Elsevier.
Lavrenko, V. & Pickens, J. (2003). Polyphonic music modeling with random fields. In Proceedings of the ACM International Conference on Multimedia (pp. 120-129).
Lee, J. H. (1997). Analysis of multiple evidence combination. In Proceedings of the ACM International Conference on Research and Development in Information Retrieval (pp. 267-275).
Lee, J. H. & Downie, J. S. (2004). Survey of music information needs, uses, and seeking behaviours: Preliminary findings. In Proceedings of the International Conference on Music Information Retrieval (pp. 441-446).
Lerdahl, F. & Jackendoff, R. (1983). A generative theory of tonal music. Cambridge: The MIT Press.
Lesaffre, M., Leman, M., Tanghe, K., De Baets, B., De Meyer, H. & Martens, J.-P. (2003). User-dependent taxonomy of musical features as a conceptual framework for musical audio-mining technology. In Proceedings of the Stockholm Music Acoustics Conference (pp. 635-638).
McLane, A. (1996). Music as information. In M. E. Williams (Ed.), ARIST (Vol. 31, pp. 225-262). American Society for Information Science.
Meek, C. & Birmingham, W. (2003). Automatic thematic extractor. Journal of Intelligent Information Systems, 21(1), 9-33.
Melucci, M. & Orio, N. (1999). Musical information retrieval using melodic surface. In Proceedings of the ACM Conference on Digital Libraries (pp. 152-160).
Melucci, M. & Orio, N. (2004). Combining melody processing and information retrieval techniques: Methodology, evaluation, and system implementation. Journal of the American Society for Information Science and Technology, 55(12), 1058-1066.
Middleton, R. (2002). Studying popular music. Philadelphia: Open University Press.
Moen, W. E. (1998). Accessing distributed cultural heritage information. Communications of the ACM, 41(4), 45-48.
Musica. The international database of choral repertoire. Retrieved May 17, 2007, from http://www.musicanet.org/
Narmour, E. (1990). The analysis and cognition of basic melodic structures. Chicago, IL: University of Chicago Press.
Neve, G. & Orio, N. (2004). Indexing and retrieval of music documents through pattern analysis and data fusion techniques. In Proceedings of the International Conference on Music Information Retrieval (pp. 216-223).
Pienimäki, A. (2002). Indexing music database using automatic extraction of frequent phrases. In Proceedings of the International Conference
Rothstein, J. (1991). MIDI: A comprehensive introduction. Madison, WI: A-R Editions.
Selfridge-Field, E. (1997). Beyond MIDI: The handbook of musical codes. Cambridge: The MIT Press.
Shifrin, J., Pardo, B., Meek, C. & Birmingham, W. (2002). HMM-based musical query retrieval. In Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (pp. 295-300).
Sparck Jones, K. & Willett, P. (1997). Readings in information retrieval. San Francisco: Morgan Kaufmann.
Stenzel, R. & Kamps, T. (2005). Improving content-based similarity measures by training a collaborative model. In Proceedings of the International Conference on Music Information Retrieval (pp. 264-271).
Tenney, J. & Polansky, L. (1980). Temporal gestalt perception in music. Journal of Music Theory, 24(2), 205-241.
TREC. Text REtrieval conference home page. Retrieved May 17, 2007, from http://trec.nist.gov/
Typke, R., den Hoed, M., de Nooijer, J., Wiering, F. & Veltkamp, R. C. (2005). A ground truth for half a million musical incipits. Journal of Digital Information Management, 3(1), 34-39.
Uitdenbogerd, A. & Zobel, J. (1998). Manipulation of music for melody matching. In Proceedings of the ACM Conference on Multimedia (pp. 235-240).
van Rijsbergen, C. J. (1979). Information retrieval (2nd ed.). London: Butterworths.