Conference paper, January 2008. DOI: 10.1145/1459359.1459550. Source: DBLP.
Mr. Emo: Music Retrieval in the Emotion Plane
Yi-Hsuan Yang, Yu-Ching Lin, Heng-Tze Cheng, and Homer Chen
National Taiwan University
1, Sec. 4, Roosevelt Road, Taipei 10617, Taiwan
{affige, vagante, mikejdionline},

ABSTRACT
This technical demo presents a novel emotion-based music retrieval platform, called Mr. Emo, for organizing and browsing music collections. Unlike conventional approaches, which quantize emotions into classes, Mr. Emo describes emotion by two continuous variables, arousal and valence, and employs regression algorithms to predict them. Associated with its arousal and valence values (AV values), each music sample becomes a point in the arousal-valence emotion plane, so a user can easily retrieve music samples of certain emotions by specifying a point or a trajectory in the plane. Being content-centric and functionally powerful, such emotion-based retrieval complements traditional keyword- or artist-based retrieval. The demo shows the effectiveness and novelty of music retrieval in the emotion plane.

Categories and Subject Descriptors
H.5.5 [Sound and Music Computing]: Systems

General Terms
Algorithms, Performance, Design, Human Factors

Keywords
Music information retrieval, emotion recognition, emotion plane

Fig. 1. With Mr. Emo, a user can easily retrieve songs of certain emotions by specifying a point or drawing a trajectory in the displayed emotion plane.

1. INTRODUCTION
Due to the fast growth of digital music collections, effective retrieval and management of music is needed in the digital era. Music classification and retrieval by emotion is a plausible approach, for it is content-centric and functionally powerful.

Various research results have been reported in the field of music emotion recognition (MER), which aims to recognize the affective content (or evoked emotion) of music signals [1]. A typical approach is to categorize emotions into a number of classes (e.g., happy, angry, sad, and relaxing) and apply machine learning techniques to train a classifier. This approach, though widely adopted, faces a granularity issue in practical usage: classifying emotions into only a handful of classes cannot meet the user demand for effective information access. Using a finer granularity for emotion description does not necessarily address the issue, since language is ambiguous and the description of the same emotion varies from person to person.

Instead, we view emotions from a continuous perspective and define emotions in a 2-D plane in terms of arousal (how exciting or calming) and valence (how positive or negative). MER then becomes the prediction of the arousal and valence values (AV values) corresponding to a point in the emotion plane. A user can retrieve music samples of certain emotions by specifying a point or drawing a trajectory in the emotion plane, as shown in Fig. 1. In this way, the granularity and ambiguity issues associated with emotion classes or adjectives are resolved, since no categorical classes are needed, and numerous novel emotion-based music organization, browsing, and retrieval methods can be easily realized.

This demo presents an emotion-based music retrieval platform, called Mr. Emo. The critical task of predicting the AV values is accomplished by regression, which has a sound theoretical basis and yields satisfactory prediction accuracy. We apply the trained regression models to a moderately large music database and design numerous emotion-plane-based retrieval methods.
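The point-and-trajectory retrieval idea above can be sketched in a few lines of code. This is a minimal toy illustration, not the authors' system: the song catalog and its AV values are invented, and retrieval is a plain nearest-neighbor search by Euclidean distance in the valence-arousal plane.

```python
import math

# Hypothetical toy catalog: song title -> (valence, arousal), both in [-1, 1].
# The AV values here are invented for illustration only.
CATALOG = {
    "Song A": (0.8, 0.7),    # positive, exciting
    "Song B": (0.6, -0.4),   # positive, calming
    "Song C": (-0.7, 0.6),   # negative, exciting
    "Song D": (-0.5, -0.6),  # negative, calming
}

def query_by_point(valence, arousal, k=1):
    """Return the k songs whose AV values are closest (Euclidean) to the query point."""
    ranked = sorted(
        CATALOG,
        key=lambda s: math.dist(CATALOG[s], (valence, arousal)),
    )
    return ranked[:k]

def query_by_trajectory(points):
    """Build a playlist by retrieving the nearest song for each point on a trajectory."""
    playlist = []
    for v, a in points:
        song = query_by_point(v, a, k=1)[0]
        if not playlist or playlist[-1] != song:  # skip immediate repeats
            playlist.append(song)
    return playlist

print(query_by_point(0.7, 0.6))  # -> ['Song A']
print(query_by_trajectory([(0.8, 0.7), (0.6, -0.5), (-0.6, -0.5)]))
```

A trajectory query is thus just a sequence of point queries: as the drawn curve crosses quadrants, the nearest songs (and hence the playlist's emotions) change accordingly.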
The system consists of two main parts, as shown in Fig. 2: 1) the prediction of AV values using regression models, and 2) the emotion-based visualization and retrieval of music samples.

Fig. 2. System architecture of Mr. Emo.

Fig. 3. Distributions of the music samples of three famous artists in the emotion plane.

Copyright is held by the author/owner(s). MM'08, October 23-27, 2008, Vancouver, Canada. ACM 1-59593-447-2/06/0010.

2.1 Emotion Prediction

Viewing arousal and valence as real values in [-1, 1], we formulate the prediction of AV values as a regression problem. Given N inputs (x_i, y_i), 1 ≤ i ≤ N, where x_i is a feature vector for the i-th input sample and y_i is the real value to be predicted, a regression model (regressor) R(·) is trained to minimize the mismatch (mean squared difference) between the predicted and ground-truth values. Two regression models are trained, one for arousal and one for valence.

In our implementation, support vector regression [3] is adopted for training since it yields the best prediction accuracy. The training set is composed of 60 English pop songs whose AV values were annotated by 40 participants using the AnnoEmo [2] software in a subjective test. For feature extraction, we apply the Marsyas [4] toolkit to generate 52 timbral texture features (spectral centroid, spectral rolloff, spectral flux, and MFCC) and 192 MPEG-7 features (spectral flatness measure and spectral crest factor). The prediction accuracy, evaluated in terms of the R² statistic [1] using ten-fold cross-validation, reaches 0.793 for arousal and 0.334 for valence.¹ This performance is considered satisfactory in light of the difficulty of valence modeling pointed out in previous MER work and the fact that even human subjects can perceive opposite valence for the same song.

2.2 Emotion-based Visualization and Retrieval
Given the regression models, we automatically predict the AV values of a music sample without manual labeling. Associated with its AV values, each music sample is visualized as a point in the emotion plane, and the similarity between music samples is measured by Euclidean distance. Many novel retrieval methods can thus be realized in the emotion plane, making music information access much easier and more effective. With Mr. Emo, one can easily retrieve music samples of a certain emotion without knowing their titles, or browse a personal collection in the emotion plane on mobile devices. One can also couple emotion-based retrieval with traditional keyword- or artist-based retrieval, to retrieve songs similar (in the sense of evoked emotion) to a favorite piece, or to select the songs of an artist according to emotion. In addition, it is possible to play back music that matches a user's current emotional state, which can be estimated from facial or prosodic cues.

3. SYSTEM DEMONSTRATION
Our music collection consists of 1000 pop songs by 52 artists. Feature extraction and AV value prediction are efficient, taking less than five seconds per song. We demonstrate three novel retrieval methods that can be easily realized with Mr. Emo.

Query-by-emotion-point (QBEP). The user can retrieve music of a certain emotion by specifying a point in the emotion plane. The system then returns the music samples whose AV values are closest to the point. This retrieval method is functionally powerful, since people's criterion for music selection is often related to their emotional state at the moment of selection. In addition, a user can easily discover previously unfamiliar songs, which are now organized and browsed according to emotion.

Query-by-emotion-trajectory (QBET). We can also generate a playlist by drawing a free trajectory representing a sequence of emotions in the emotion plane. As the trajectory goes from one quadrant to another, the emotions of the songs in the playlist vary accordingly.

Query-by-artist-and-emotion (QBAE). Associated with artist metadata, we can combine emotion-based retrieval with conventional artist-based retrieval. As shown in Fig. 3, we can easily visualize the distribution of the music samples of an artist and browse them.² With QBAE, we can learn that the Sex Pistols usually sing songs of the second quadrant, or retrieve sad songs sung by the Beatles. In addition, QBEP and QBAE can be used cooperatively: we can select a song and browse the other songs sung by the same artist with QBAE, or select a song and browse other songs that sound similar to it with QBEP. We can also recommend similar artists by modeling the distributions of music emotions as GMMs and measuring their similarity by KL divergence.

REFERENCES
[1] Y.-H. Yang et al., "A regression approach to music emotion recognition," IEEE Trans. Audio, Speech and Language Processing, vol. 16, no. 2, pp. 448-457, 2008.
[2] Y.-H. Yang et al., "Music emotion recognition: The role of individuality," Proc. ACM HCM, pp. 13-21, 2007.
[3] LIBSVM.
[4] G. Tzanetakis et al., "Musical genre classification of audio signals," IEEE Trans. Speech and Audio Processing, vol. 10, no. 5, pp. 293-302, 2002.
¹ R² is a standard measurement for regression models. An R² of 1.0 means the model perfectly fits the data, while a negative R² means the model is worse than simply taking the sample mean.
² Fig. 3 also shows the accuracy of Mr. Emo: the distributions match our common understanding of the styles of these artists.
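The regression formulation of Sec. 2.1 can be sketched as follows. This is an illustrative stand-in, not the authors' pipeline: the "features" and "annotations" are synthetic random data, and scikit-learn's SVR (which wraps LIBSVM [3]) stands in for the original setup; real use would substitute the Marsyas/MPEG-7 features and human AV annotations.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Synthetic stand-ins for the paper's data: 60 "songs" with 244 features each
# (52 timbral + 192 MPEG-7 in the paper), and AV labels squashed into [-1, 1].
X = rng.normal(size=(60, 244))
w = rng.normal(size=244)
arousal = np.tanh(X @ w / 15 + 0.1 * rng.normal(size=60))       # fake ground truth
valence = np.tanh(X @ w[::-1] / 15 + 0.3 * rng.normal(size=60))  # fake ground truth

# One regressor per dimension, as in the paper, evaluated by ten-fold CV R^2.
for name, y in [("arousal", arousal), ("valence", valence)]:
    model = SVR(kernel="rbf", C=1.0)
    scores = cross_val_score(model, X, y, cv=10, scoring="r2")
    print(f"{name}: mean 10-fold R^2 = {scores.mean():.3f}")
```

With real audio features, the trained pair of regressors maps any new song to an (arousal, valence) point, which is all the retrieval front end needs.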

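The artist-similarity idea mentioned under QBAE (emotion distributions modeled as GMMs and compared by KL divergence) can be sketched as below. The AV points for the two artists are invented, and because the KL divergence between two GMMs has no closed form, it is estimated here by Monte Carlo sampling, a common approximation.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)

# Hypothetical AV points (valence, arousal) for two artists' songs; invented data.
artist_a = rng.normal(loc=(-0.5, 0.6), scale=0.15, size=(40, 2))  # second quadrant
artist_b = rng.normal(loc=(0.5, -0.4), scale=0.15, size=(40, 2))  # fourth quadrant

def fit_gmm(points, n_components=2):
    """Model an artist's emotion distribution as a Gaussian mixture."""
    return GaussianMixture(n_components=n_components, random_state=0).fit(points)

def kl_monte_carlo(p, q, n=5000):
    """Estimate KL(p || q) between two GMMs by sampling from p."""
    x, _ = p.sample(n)
    return float(np.mean(p.score_samples(x) - q.score_samples(x)))

gmm_a, gmm_b = fit_gmm(artist_a), fit_gmm(artist_b)
# Symmetrized KL as an artist distance (smaller = more similar emotion profile).
d_ab = kl_monte_carlo(gmm_a, gmm_b) + kl_monte_carlo(gmm_b, gmm_a)
d_aa = kl_monte_carlo(gmm_a, fit_gmm(artist_a)) + kl_monte_carlo(fit_gmm(artist_a), gmm_a)
print(f"distance(A, B) = {d_ab:.2f}, distance(A, A') = {d_aa:.2f}")
```

Ranking artists by this distance yields the "similar artists" recommendation described in Sec. 3.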