
Modelling affective-based music compositional intelligence

with the aid of ANS analyses


Toshihito Sugimoto b, Roberto Legaspi a,*, Akihiro Ota b, Koichi Moriyama a, Satoshi Kurihara a, Masayuki Numao a

a The Institute of Scientific and Industrial Research, Osaka University, 8-1 Mihogaoka, Ibaraki, Osaka 567-0047, Japan
b Department of Information Science and Technology, Osaka University, 1-5 Yamadaoka, Suita, Osaka 565-0871, Japan

* Corresponding author. Tel.: +81 6 6879 8426; fax: +81 6 6879 8428. E-mail address: roberto@ai.sanken.osaka-u.ac.jp (R. Legaspi).
Available online 23 November 2007
Abstract
This research investigates the use of emotion data derived from analyzing change in activity in the autonomic nervous system (ANS),
as revealed by brainwave production, to support the creative music compositional intelligence of an adaptive interface. A relational model
of the influence of musical events on the listener's affect is first induced using inductive logic programming paradigms, with the emotion
data and musical score features as inputs of the induction task. The components of composition, such as interval and scale, instrumentation,
chord progression and melody, are automatically combined using a genetic algorithm and melodic transformation heuristics that
depend on the predictive knowledge and character of the induced model. Out of the four targeted basic emotional states, namely stress,
joy, sadness, and relaxation, the empirical results reported here show that the system is able to successfully compose tunes that convey one
of these affective states.
© 2007 Elsevier B.V. All rights reserved.
Keywords: Adaptive user interface; EEG-based emotion spectrum analysis; User modelling; Automated reasoning; Machine learning
1. Introduction
It is no surprise that only a handful of research works
have factored in human affect in creating an intelligent
music system or interface (e.g., [1,6,13,17,23]). One major
reason is that the general issues alone when investigating
music and emotion are enough to immediately confront
and intimidate the researcher. More specifically, how can
music composition, which is a highly structured cognitive
process, be modelled, and how can emotion, which consists
of very complex elements and is dependent on individuals
and stimuli, be measured? [7]. The other is that the fact that
music is a reliable elicitor of affective response immediately
raises the question as to what exactly in music can influence
an individual's mood. For example, is it the case that musical
structures contain related musical events (e.g., chord
progression, melody change, etc.) that allow emotionally
stimulating mental images to surface? Although attempts
have been made to pinpoint which features of the musical
structure elicit which affect (e.g., [2,20]), the problem
remains compelling because the solutions are either partial
or uncertain.
Our research addresses the problem of determining the
extent to which emotion-inducing music can be modelled
and generated using creative music compositional AI.
Our approach involves inducing an affects-music relations
model that describes musical events related to the listener's
affective reactions and then using the predictive knowledge
and character of the model to automatically control the
music generation task. We have embodied our solution in
a constructive adaptive user interface (CAUI) that re-arranges
or composes [13] a musical piece based on one's
affect. We have reported the results of combining inductive
logic programming (in [8,13]) or multiple-part learning (in
[7]) to induce the model and a genetic algorithm whose fitness
function is influenced by the model. In these previous
versions of the CAUI, an evaluation instrument based on
the semantic differential method (SDM) was used to measure
affective responses. The listener rated musical pieces
on a scale of 1–5 for a set of bipolar affective descriptor
pairs (e.g., happy–sad). Each subjective rating indicates
the degree of the positive or negative affect.
We argue that for the CAUI to accurately capture the
listener's affective responses, it must satisfy necessary conditions
that the SDM-based self-reporting instrument does
not address. First, emotion detection must capture the dynamic
nature of both music and emotion. With the rating instrument,
the listener can only evaluate after the music is
played. This means that only one evaluation is mapped
to the entire musical piece rather than having possibly varied
evaluations as the musical events unfold. Secondly, the
detection task should not impose a heavy cognitive load
upon the listener. It must ensure that listening to music
remains enjoyable and avoid, if not minimize, disturbing
the listener. In our prior experiments, the listener was asked
to evaluate 75 musical pieces, getting interrupted the same
number of times. If indeed the listener experienced stress or
anxiety in the process, it was difficult to factor this into the
calculations. Lastly, the emotion detection task should be
language independent, which can later on permit cross-cultural
analyses. This flexibility evades the need to change the
affective labels (e.g., Japanese to English).
We believe that the conditions stated above can be satisfied
by using a device that can analyze emotional states
by observing the change in activity in the autonomic nervous
system (ANS). Any intense feeling has consequent
physiological effects on the ANS [19]. These effects include
a faster and stronger heartbeat, increased blood pressure or
breathing rate, muscle tension and sweating, and accelerated
mental activity, among others. This is the reason ANS
effects can be observed using devices that can measure
blood pressure, skin or heart responses, or brainwave production.
Researchers in the field of affective computing are
active in developing such devices (e.g., [14]). We have modified
the learning architecture of the CAUI to incorporate
an emotion spectrum analyzing system (ESA)¹ that detects
emotional states by observing the brainwave activities that
accompany the emotion [11].
The learning architecture is shown in Fig. 1. The relational
model is induced by employing the inductive logic
programming paradigms of FOIL and R, taking as inputs
the musical score features and the ESA-provided emotion
data. The musical score features are represented as definitions
of first-order logic predicates and serve as background
knowledge for the induction task. The next task
employs a genetic algorithm (GA) that produces variants
of the original score features. The fitness function of the
GA fits each generated variant to the knowledge provided
by the model and music theory. Finally, the CAUI creates,
using its melody-generating module, an initial tune consisting
of the GA-obtained chord tones and then alters certain
chord tones to become non-chord tones in order to embellish
the tune.
Using the ESA has several advantages. The dynamic
changes in both emotions and musical events can now be
monitored and mapped continuously over time. Secondly,
it allows mapping of emotion down to the musical bar
level. This means that many training examples can be
obtained from a single piece. Using the self-reporting
instrument, the listener needed to hear and evaluate many
musical pieces just to obtain a sufficient number of examples.
Thirdly, more accurate measurements can now be acquired
objectively. Lastly, it is unobtrusive, thereby relieving the
listener of any cognitive load and allowing him/her to just
sit back and listen to the music.
In this paper, we first discuss the domain knowledge representations,
learning parameters and learning tasks used
for the CAUI in Sections 2 and 3. Section 4 details our experimentation
methodology and analysis of the empirical
results we gathered. Section 5 briefly locates the contribution
of the CAUI in the field. Discussions of what we
intend to carry out as possible future work form part of
our analysis and conclusion.
2. Knowledge acquisition and representation
In order to obtain a personalized model of the coupling
of emotional expressions and the underlying music parameters,
it is vital to: (1) identify which musical features (e.g.,
tempo, rhythm, harmony, etc.) should be represented as
background knowledge, (2) provide an instrument to
map the features to identified emotion descriptors, (3) logically
represent the music parameters, and (4) automatically
induce the model. Although the influence of various
features has been well studied (e.g., refer to a comprehensive
summary on the influence of compositional parameters
[2] and an overview of recent investigations on the influence
of performance parameters [4,5]), the task of the CAUI is
to automatically find musical structure and sequence features
that are influential to specific emotions.
2.1. Music theory
The aspect of music theory relevant to our research is
the interaction of music elements into patterns that can
¹ Developed by the Brain Functions Laboratory, Inc. (http://www.bfl.co.jp/main.html).
Fig. 1. The learning architecture of the CAUI.
help the composition techniques. We have a narrow music
theory that consists of a limited set of music elements (see
Fig. 2). The reason is that we need the predictive model to
be tractable in order to perform controlled experimentations
and obtain interpretable results. The definitions of
the concepts listed in Fig. 2 can be found in texts on music
theory. The methods by which music theory is utilized by
the genetic algorithm and melodic transformation heuristics
are explained in Section 3.
Fourteen musical piece segments were prepared, consisting
of four pieces from classical music, three from Japanese
Pop, and seven from harmony textbooks. A segment's playing
time ranges from 7.4 to 48 s (an average of 24.14 s).
These pieces were selected, albeit not randomly,
from the original 75 segments that were used in our previous
experiments. Based on prior results, these selected
pieces demonstrate a high degree of variance in emotional
content when evaluated by previous users of the system. In
other words, these pieces seem to elicit affective flavours
that are more distinguishable.
2.2. Emotion acquisition features of the ESA
Through proper signal processing, scalp potentials that
are measured by an electroencephalograph (EEG) can
provide global information about mental activities and
emotional states [11]. With the ESA, EEG features associated
with emotional states are extracted into a set of 45
cross-correlation coefficients. These coefficients are calculated
for each of the θ (5–8 Hz), α (8–13 Hz) and β (13–20 Hz)
frequency components, forming a 135-dimensional EEG
state vector. Operating a transformation matrix on this
state vector linearly transforms it to a 4-dimensional vector
E = (e1, e2, e3, e4), with the four components representing
levels of stress, joy, sadness and relaxation, respectively.
The maximum time resolution of the emotion analysis performed
in real time is 0.64 s. More detailed discussions on
the ideas behind ESA can be found in [11]. The emotion
charts in Fig. 3 graphically show series of readings that
were taken over time. The higher the value, the more
evident the emotion being displayed. The two
wave charts at the bottom indicate levels of alertness
and concentration, respectively. These readings help gauge
the reliability of the emotion readings. For example, the
level of alertness should be high when the music is being
played, indicating that the listener is attending keenly to the
tune. Low alert points are valid so long as these correspond
to the silent pauses inserted between tunes, since
there is no need for the user to listen to the pauses. However,
acceptably high values for concentration should be
expected at any point in time. The collected emotion data
are then used by the model induction task.
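As a rough illustration of the linear mapping described above (and not the ESA's actual implementation, whose calibrated transformation matrix is not published), the following Python sketch assembles the three band-wise coefficient sets into a 135-dimensional state vector and projects it onto the four emotion levels; the random matrix and coefficients are placeholders.

import numpy as np

# Illustrative sketch: 45 cross-correlation coefficients per frequency band are
# stacked into a 135-dimensional EEG state vector, which a 4 x 135 matrix maps
# linearly onto the four emotion levels (stress, joy, sadness, relaxation).
rng = np.random.default_rng(0)

theta_coeffs = rng.random(45)   # placeholder coefficients for the 5-8 Hz band
alpha_coeffs = rng.random(45)   # 8-13 Hz band
beta_coeffs = rng.random(45)    # 13-20 Hz band

state_vector = np.concatenate([theta_coeffs, alpha_coeffs, beta_coeffs])  # 135-dim

# The real matrix is calibrated by the ESA; a random one stands in here.
transform = rng.random((4, 135))

stress, joy, sadness, relaxation = transform @ state_vector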
Fig. 2. Basic aspects of music theory that are being used for this version of the CAUI.
Fig. 3. EEG signals used for emotion analyses are obtained using scalp electrodes.
Brainwave analysis is a delicate task that can easily be
distorted by external factors, including an eye blink. Hence,
careful attention needs to be given when acquiring the
readings. The listener needs to be in a closed room with
as little noise and as few other external distractions as possible.
The listener is also required to close his/her eyes at all
times. This set-up is necessary to obtain stable readings.
Any series of measurements should be taken without disturbing
the listener.
2.3. First-order logic representation of the score features
The background knowledge of the CAUI consists of definitions
in first-order logic that describe musical score features.
The language of first-order logic, or predicate logic, is
known to be well-suited both for data representation
and for describing the desired outputs. The representational
power of predicate logic permits describing existing feature
relations among the data, even complex relations, and provides
comprehensibility of the learned results [12]. Score
features were encoded into a predicate variable, or relation,
named music(), which contains one song_frame()
and a list of sequenced chord() relations describing the
frame and chord features, respectively. Fig. 4 shows the
music() representation ('-' means NIL) of the musical
score segment of the prelude of Jacques Offenbach's
Orphée aux Enfers.
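For illustration only, the following Python sketch mirrors how the contents of a music() relation might be held in code. The field names are assumptions of this sketch; the actual predicate arguments are those shown in Fig. 4.

from dataclasses import dataclass
from typing import List, Optional

# Illustrative mirror of music(): one song_frame() plus an ordered list of
# chord() relations. Field names are assumptions, not the paper's exact schema.

@dataclass
class SongFrame:
    key: str            # e.g., "A-major"
    tempo: int          # beats per minute
    rhythm: str         # e.g., "4/4"

@dataclass
class Chord:
    root: str                   # e.g., "V"
    form: str                   # e.g., "7th"
    inversion: Optional[int]    # None stands for NIL ('-') in the paper's notation
    tonal_function: str         # "T", "S" or "D"

@dataclass
class Music:
    frame: SongFrame
    chords: List[Chord]

segment = Music(
    frame=SongFrame(key="A-major", tempo=120, rhythm="4/4"),
    chords=[Chord("I", "triad", None, "T"), Chord("V", "7th", None, "D")],
)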
The CAUI needs to learn three kinds of target relations
or rules, namely frame(), pair() and triplet(), wherein the
last two represent patterns of two and three successive
chords, respectively. These rules comprise the affects-music
relational model. Fig. 5 (left), for example, shows structural
information contained in the given sample relations and
the actual musical notation they represent. Fig. 5 (right)
shows a segment of an actual model learned by the CAUI
that can be used to construct a musical piece that is supposed
to induce in one user a sad feeling.
2.4. Model induction using FOIL and R
The CAUI employs the combination of FOIL and R
(Refinement by Example) to model the musical structures
that correlate with the listener's emotions, with the musical
structures comprising the set of training examples.
Fig. 5. A segment of a set of rules that are supposed to stimulate a sad feeling.
Fig. 4. A musical score represented in music() predicate.
FOIL [16] is a first-order inductive learning system that
induces a theory represented as function-free Horn clauses.
Each clause is a conjunction of literals, where each literal
consists of a relation and an ordering of the variable arguments
of the relation. The training examples are represented
extensionally as sets of ground tuples, i.e., the
constant values of the relations present in the examples.
Tuples belonging or not belonging to the relation are
labelled as ⊕ and ⊖ tuples, respectively. FOIL assumes
that all ⊕ tuples exhibit a relationship R and the ⊖ tuples
do not. FOIL iteratively learns a clause of the theory and
removes from the training set the ⊕ tuples of the relation
R covered by that clause until all ⊕ tuples are covered by
one or more clauses.
Induction of a single clause starts with an empty body,
and body literals are iteratively added at the end of the
clause until no ⊖ tuple is covered by the clause. FOIL
selects one literal to be added from a set of candidate
literals based on an information gain heuristic that estimates
the utility of a literal in discriminating ⊕ from ⊖
tuples. The information gained by adding a literal is computed
as

$\mathrm{Gain}(L_i) = T_i^{\oplus\oplus}\,\big(I(T_i) - I(T_{i+1})\big)$    (1)

$I(T_i) = -\log_2\frac{T_i^{\oplus}}{T_i^{\oplus}+T_i^{\ominus}}, \qquad I(T_{i+1}) = -\log_2\frac{T_{i+1}^{\oplus}}{T_{i+1}^{\oplus}+T_{i+1}^{\ominus}}$    (2)

T_i^⊕ and T_i^⊖ denote the number of ⊕ and ⊖ tuples in the
training set T_i. Adding the literal L_m to the partially developing
clause R(v_1, v_2, ..., v_k) :- L_1, L_2, ..., L_{m-1} results in
the new set T_{i+1}, which contains the tuples that remained
from T_i. T_i^⊕⊕ denotes the number of tuples in T_i^⊕ that led
to another tuple after adding L_m. The candidate literal
L_i that yields the largest gain becomes L_m.
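A minimal Python sketch of the gain computation in Eqs. (1) and (2), assuming the tuple counts are already known, may help make the heuristic concrete; the example counts at the end are invented for illustration.

import math

def foil_information(pos: int, neg: int) -> float:
    """I(T) = -log2(pos / (pos + neg)) for a training set with the given tuple counts."""
    return -math.log2(pos / (pos + neg))

def foil_gain(pos_before: int, neg_before: int,
              pos_after: int, neg_after: int,
              pos_covered: int) -> float:
    """Gain(L) = T_i^(++) * (I(T_i) - I(T_i+1)); pos_covered is the number of
    positive tuples of T_i that still lead to a tuple after adding literal L."""
    return pos_covered * (foil_information(pos_before, neg_before)
                          - foil_information(pos_after, neg_after))

# Example: a literal that keeps 8 of 10 positive tuples while cutting the
# negative tuples from 20 to 5 yields a clearly positive gain.
print(foil_gain(10, 20, 8, 5, 8))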
R [21] is a system that automatically refines a theory
in function-free first-order logic. It assumes that the
induced theory can only be approximately correct and hence
needs to be refined to improve its accuracy using the training
examples. R implements a four-step theory revision
process, i.e., (1) operationalization, (2) specialization, (3)
rule creation, and (4) unoperationalization. Operationalization
expands the theory into a set of operational clauses,
detecting and removing useless literals. A literal is useful if
its normalized gain, i.e., computing only the I(T_i) − I(T_{i+1})
term of Eq. (1), is greater than a specified threshold, and if it
produces new variables for the other literals in the clause,
i.e., it is generative [21]. R considers the useless literals
as faults in the theory. Specialization uses FOIL to add literals
to the overly general clauses covering ⊖ tuples to
make them more specific. Rule creation uses FOIL to
introduce more operational clauses in case some ⊕ tuples
cannot be covered by existing ones. Finally, unoperationalization
re-organizes the clauses to reflect the hierarchical
structure of the original theory.
The training examples suitable for inducing the model
are generated as follows. Each musical piece is divided into
musical bars or measures. A piece may contain eight to 16
bars (an average of 11.6 bars per piece). Every three successive
bars in a piece, together with the music frame, are treated
as one training example, i.e., example_i = (frame, bar_{i-2},
bar_{i-1}, bar_i). Each bar consists of a maximum of four
chords. The idea here is that sound flowing from at least
three bars is needed to elicit an affective response. The first
two examples in every piece, however, will inherently contain
only one and two bars, respectively. The components
of each bar are extracted from music() and represented
as ground tuples. A total of 162 examples were obtained
from the 14 pieces, with each bar having an average playtime
of 2.1 s.
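The windowing just described can be sketched as follows; the frame and bar objects are placeholders, while the actual system extracts their components from music().

# Sketch of the bar-level windowing: each training example pairs the song frame
# with the current bar and its up-to-two predecessors (the first two examples of
# a piece inherently have fewer bars).

def make_examples(frame, bars):
    examples = []
    for i in range(len(bars)):
        window = bars[max(0, i - 2): i + 1]     # up to three successive bars
        examples.append((frame, *window))       # (frame, bar_{i-2}, bar_{i-1}, bar_i)
    return examples

piece_bars = ["bar1", "bar2", "bar3", "bar4"]
for ex in make_examples("frame", piece_bars):
    print(ex)
# ('frame', 'bar1'), ('frame', 'bar1', 'bar2'), ('frame', 'bar1', 'bar2', 'bar3'), ...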
Recall that emotion readings are taken while the music
is being played. Using the available synchronization tools
of the ESA and music segmenting tools, the emotion measurements
are assigned to the corresponding musical segments.
Subsequently, each emotion measure is discretized
to a value between 1 and 5 based on a pre-determined
threshold. Using the same range of values as that of the
SDM-based instrument permits us to retain the learning
techniques in [8] while evaluating the new emotion detection
scheme. It is also plausible for us to define a set of
bipolar affective descriptor pairs ed1–ed2 (e.g., joyful–not
joyful). It is important to note that antonymic semantics
(e.g., stressed vs. relaxed and joyful vs. sad) do not hold
for the ESA since the four emotions are defined along
orthogonal dimensions. Hence, four separate readings are
taken instead of just treating one as inversely proportional
to the other. This is consistent with the circumplex model
of affect [15], where each of the four emotions can be seen
in a different quadrant of this model. One relational model
is learned for each affect in the four bipolar emotion pairs
ed1–ed2 (a total of 4 × 2 = 8 models).
To generate the training instances specific to FOIL, for
any emotion descriptor ed1 in the pair ed1–ed2, the examples
labelled as 5 are represented as ⊕ tuples, while those
labelled as ≤4 as ⊖ tuples. Conversely for ed2, ⊕ and ⊖
tuples are formed from bars which were evaluated as 1
and ≥2, respectively. In other words, there are corresponding
sets of ⊕ and ⊖ tuples for each affect, and a ⊖ tuple for
ed1 does not mean that it is a ⊕ tuple for ed2. Examples are
derived almost in the same way for FOIL+R. For example,
the ⊕ tuples of ed1 and ed2 are formed from bars
labelled as ≥4 and ≤2, respectively.
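A small sketch of this labelling rule for the FOIL case, with the thresholds taken from the text above (the FOIL+R variant shifts the positive thresholds to ≥4 for ed1 and ≤2 for ed2):

# Sketch of the FOIL labelling rule: for descriptor ed1, bars rated 5 become
# positive tuples and bars rated <=4 negative; conversely for ed2, a rating of 1
# is positive and >=2 negative.

def label_for_ed1(rating: int) -> str:
    return "+" if rating == 5 else "-"          # 5 -> positive, <=4 -> negative

def label_for_ed2(rating: int) -> str:
    return "+" if rating == 1 else "-"          # 1 -> positive, >=2 -> negative

ratings = [5, 4, 2, 1]
print([label_for_ed1(r) for r in ratings])      # ['+', '-', '-', '-']
print([label_for_ed2(r) for r in ratings])      # ['-', '-', '-', '+']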
3. Composing using GA and melody heuristics
Evolutionary computational models have been dominating
the realm of automatic music composition (as reviewed
in [24]). One major problem in user-oriented GA-based
music creation (e.g., [3,22]), however, is that the user is
required to listen to and then rate the composed musical
sequences in each generation. This is obviously burdensome,
tiring and time-consuming. Although the CAUI is
user-oriented, it need not solicit user intervention since it
uses the relational model as a critic to control the quality
of the composed tunes.
We adapted the conventional bit-string chromosome
representation in GA into a columns-of-bits representation
expressed in music() form (see Fig. 6, where F is the song_
frame() and C_i is a chord()). Each bit in a column represents
a component of the frame (e.g., tempo) or chord
(e.g., root). The performance of our GA depends on two
basic operators, namely single-point crossover and mutation.
With the first operator, the columns of bit strings
from the beginning of the chromosome to a selected
crossover point are copied from one parent and the rest are
copied from the other. Mutation inverts selected bits,
thereby altering the individual frame and chord information.
The more fundamental components (e.g., tempo,
rhythm and root) are mutated less frequently to avoid a
drastic change in musical events, while the other features
are varied more frequently to acquire more variants.
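The two operators can be sketched as follows; the column names, bit widths and mutation rates are assumptions of this illustration, not values reported in the paper.

import random

# Sketch of the columns-of-bits chromosome and its two operators. Each column is
# one frame or chord component; crossover copies whole columns up to a cut point,
# and mutation flips bits, with the fundamental components mutated less often.

FUNDAMENTAL = {"tempo", "rhythm", "root"}

def single_point_crossover(parent_a, parent_b):
    """parent_a/parent_b: lists of (name, bits) columns of equal length."""
    cut = random.randrange(1, len(parent_a))
    return parent_a[:cut] + parent_b[cut:]

def mutate(chromosome, p_fundamental=0.01, p_other=0.1):
    child = []
    for name, bits in chromosome:
        p = p_fundamental if name in FUNDAMENTAL else p_other
        new_bits = [(1 - b) if random.random() < p else b for b in bits]
        child.append((name, new_bits))
    return child

parent_a = [("tempo", [1, 0, 1]), ("root", [0, 1, 1]), ("form", [1, 1, 0])]
parent_b = [("tempo", [0, 1, 0]), ("root", [1, 0, 0]), ("form", [0, 0, 1])]
offspring = mutate(single_point_crossover(parent_a, parent_b))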
The fundamental idea of GA is to produce increasingly
better solutions in each new generation of the evolutionary
process. During the genetic evolution process, candidate
chromosomes are being produced that may be better or
worse than what has already been obtained. Hence, the fitness
function is necessary to evaluate the utility of each
candidate. The CAUI's fitness function takes into account
the user-specific relational model and music theory:

$\mathit{fitness}(M) = \mathit{fitnessUser}(M) + \mathit{fitnessTheory}(M)$    (3)

where M is a candidate chromosome. This function makes
it possible to generate frames and chord progressions that
fit the music theory and stimulate the target feeling. fitnessUser(M)
is computed as follows:

$\mathit{fitnessUser}(M) = \mathit{fitnessFrame}(M) + \mathit{fitnessPair}(M) + \mathit{fitnessTriplet}(M)$    (4)
Each function on the right-hand side of Eq. (4) is generally
computed as follows:

$\mathit{fitnessX}(M) = \sum_{i=1}^{L} \mathrm{Average}\big(\delta_F(P_i),\ \delta'_F(P_i),\ \delta_{FR}(P_i),\ \delta'_{FR}(P_i)\big)$    (5)
The meanings of the objects in Eq. (5) are shown in Table
1. The only variable parameter is P_i, which denotes the
component(s) extracted from M that will serve as input
to the four subfunctions of fitnessX. If there are n chord()
predicates in M, there will be L P_i's formed, depending on
the fitnessX. For example, given chromosome M := music
(song_frame(), chord_1(), ..., chord_8()), where the added subscripts
denote chord positions, computing fitnessPair(M) will have 7 P_i's
(L = 8 − 1): P_1 = (chord_1(), chord_2()), ..., P_7 = (chord_7(), chord_8()).
With fitnessFrame(M), it will only be P_1 = song_frame().
The values of the subfunctions in Eq. (5) will differ
depending on whether an ed1 (e.g., sad) or ed2 (e.g., not
sad) music is being composed. Let us denote the target
affect of the current composition as emo_P and the opposite
of this affect as emo_N (e.g., if ed1 is emo_P then emo_N refers
to ed2, and vice versa). δ_F and δ_FR (where F and FR refer to
the models obtained using FOIL alone or FOIL+R,
respectively) return +2 and +1, respectively, if P_i appears
in any of the corresponding target relations (see Table 1)
in the model learned for emo_P. On the other hand, δ'_F
and δ'_FR return −2 and −1, respectively, if P_i appears in
any of the corresponding relations in the emo_N model. In
effect, the structure P_i is rewarded if it is part of the desired
relations and is penalized if it also appears in the model for
the opposite affect, since it does not possess a distinct affective
flavour. The returned values (±2 and ±1) were determined
empirically.
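A sketch of the per-structure scoring behind Eq. (5), using set membership as a stand-in for checking whether P_i appears in a learned relation; the data structures are assumptions of this illustration.

# Sketch of Eq. (5): a structure P_i earns +2 / +1 if it appears in the FOIL /
# FOIL+R model learned for the target affect, and -2 / -1 if it appears in the
# model for the opposite affect. The four contributions are averaged per P_i
# and summed over all structures.

def fitness_x(structures, foil_pos, foilr_pos, foil_neg, foilr_neg):
    total = 0.0
    for p in structures:
        scores = [
            2 if p in foil_pos else 0,      # delta_F
            -2 if p in foil_neg else 0,     # delta'_F
            1 if p in foilr_pos else 0,     # delta_FR
            -1 if p in foilr_neg else 0,    # delta'_FR
        ]
        total += sum(scores) / len(scores)
    return total

pairs = [("I", "V"), ("V", "I")]
print(fitness_x(pairs,
                foil_pos={("V", "I")}, foilr_pos={("V", "I")},
                foil_neg={("I", "V")}, foilr_neg=set()))   # 0.25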
fitnessTheory(M) seeks to reward chromosomes that are
consistent with our music theory and penalize those that
violate it. This is computed in the same way as Eq. (4), except
that each of the three functions on the right is now
computed as

$\mathit{fitnessX}(M) = \sum_{i=1}^{L} \mathrm{Average}\big(g(P_i)\big)$    (6)
The definitions of the objects in Eq. (6) follow the ones in
Table 1, except that P_i is no longer checked against the relational
models but against the music theory. The subfunction
g returns the score of fitting P_i with the music theory, which
is either a reward or a penalty. Structures that earn a high
reward include frames that have a complete or half cadence,
chord triplets that contain the transition T→S→D of the
tonal functions tonic (T), subdominant (S) and dominant
(D), and pairs that transition from dominant to secondary
dominant (e.g., V/II→II). On the other hand, a penalty is
given to pairs or triplets that have the same root, form
Fig. 6. GA chromosome structure and operators.
Table 1
Meanings of the objects in Eq. (5)

fitnessX         P_i (component/s of M)                        L        Target relation
fitnessFrame     song_frame()                                  1        frame()
fitnessPair      (chord_i(), chord_{i+1}())                    n − 1    pair()
fitnessTriplet   (chord_i(), chord_{i+1}(), chord_{i+2}())     n − 2    triplet()
and inversion values, have the same tonal function and
form, or have the transition D→S. All these heuristics
are grounded in basic music theory. For example, the cadence
types are scored based on the strength of their effects,
such that the complete cadence is given the highest score
since it is the strongest. Another is that the transition
T→S→D is rewarded since it is often used and many
songs have been written using it. D→S is penalized
since a dominant chord will not resolve to a
subdominant.
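A toy version of the subfunction g might score chord pairs and triplets as follows; the numeric rewards and penalties are illustrative assumptions, since the paper does not publish the exact scores.

# Illustrative scoring of g(P_i) for chord triplets and pairs, using tonal
# functions T (tonic), S (subdominant) and D (dominant). Values are assumptions.

def g_triplet(functions):
    if functions == ("T", "S", "D"):
        return 2.0                      # frequently used progression: reward
    return 0.0

def g_pair(pair):
    (root1, func1, form1), (root2, func2, form2) = pair
    if func1 == "D" and func2 == "S":
        return -2.0                     # dominant does not resolve to subdominant
    if root1 == root2 and form1 == form2:
        return -1.0                     # identical root and form: monotonous
    return 0.0

print(g_triplet(("T", "S", "D")))                           # 2.0
print(g_pair((("V", "D", "triad"), ("IV", "S", "triad"))))  # -2.0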
Overall, the scheme we just described is defensible given
that music theory can be represented using heuristics for
evaluating the fitness of each GA-generated music variant.
The character of each generated variant is immediately fit
not just to the music theory but, more importantly, to
the desired affective perception. It is also clear in the computations
that the presence of the models permits the
absence of human intervention during composition, thereby
relieving the user of unnecessary cognitive load and achieving
full automation. Fig. 7 shows one of the best-fit GA-generated
chromosomes for stimulating a sad feeling.
The outputs of the GA contain only chord progressions.
Musical lines with only chord tones may sound monotonous
or homophonic. A non-chord tone may serve to
embellish the melodic motion surrounding the chord tones.
The CAUI's melody-generating module first generates
chord tones using the GA-obtained music() information
and then utilizes a set of heuristics to generate the non-chord
tones in order to create a non-monotonic piece of
music.
To create the chord tones, certain aspects of music theory
are adopted, including the harmonic relations V7→I
(or D→T, which is known to be very strong), T→D,
T→S, S→T, and S→D, and keeping the intervals in
octaves. Once the chord tones are created, the non-chord
tones, which are supposed to be non-members of the accompanying
chords, are generated by selecting and "disturbing"
the chord tones. All chord tones have an equal
chance of being selected. Once selected, a chord tone is
modified into a non-chordal broderie, appoggiatura or passing
tone. How these non-chord tones are adopted for the
CAUI is detailed in [7].
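A highly simplified sketch of this embellishment step is given below; the MIDI pitch representation and semitone offsets are assumptions of the sketch, while the actual broderie, appoggiatura and passing-tone heuristics are those detailed in [7].

import random

# Simplified sketch: a chord tone (MIDI pitch number) is selected with uniform
# probability and disturbed into a neighbouring non-chord tone. The +/-1 or +/-2
# semitone offsets are an assumption of this illustration.

def embellish(chord_tones, n_changes=2):
    melody = list(chord_tones)
    for _ in range(n_changes):
        idx = random.randrange(len(melody))     # every tone has an equal chance
        offset = random.choice([-2, -1, 1, 2])  # step to a neighbouring pitch
        melody[idx] = melody[idx] + offset      # now (likely) a non-chord tone
    return melody

chord_tones = [60, 64, 67, 72]      # C major chord tones as MIDI numbers
print(embellish(chord_tones))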
4. Experimentation and analysis of results
We performed a set of individualized experiments to
determine whether the CAUI-composed pieces can actually
stimulate the target emotion. Sixteen subjects were asked to
hear the 14 musical pieces while wearing the ESA's helmet.
The subjects were all Japanese males with
ages ranging from 18 to 27 years. Although it is ideal to
increase the heterogeneity of the subjects' profiles, it seems
more appropriate at this stage to limit their diversity in
terms of their background and focus more on the possibly
existing differences in their emotional reactions. For the
subject to hear the music playing continuously, all the
pieces were sequenced using a music editing tool and silent
pauses of 15 s each were inserted before and after each
piece, with the exception of the first, which is preceded by
a 30-s silence so as to condition the subject. Personalized
models were learned for each subject based on their emotion
readings and new pieces were composed independently
for each. The same subjects were then asked to go through
the same process using the set of newly composed pieces.
Twenty-four tunes were composed for each subject, i.e.,
three for each of the bipolar affective descriptors. Fig. 8
shows that the CAUI was able to compose a sad piece, even
without prior handcrafted knowledge of any affect-inducing
piece.
We computed the difference of the averaged emotion
readings for each ed1–ed2 pair. The motivation here is that
the higher the difference, the more distinct/distinguishable
is the affective flavour of the composed pieces. We also performed
a paired t-test on the differences to determine if
these are significant. Table 2 shows that the composed
sad pieces are the only ones that correlate with the subjects'
emotions. A positive difference was seen in many instances,
albeit not necessarily statistically significant. This indicates
that the system is not able to differentiate the structures
that can arouse such impressions.
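For reference, the t statistic for the sad column of Table 2 can be reproduced from its summary rows alone; the critical values quoted in the comments assume a one-tailed test, which appears consistent with the significance flags reported in Table 2.

# Sketch of the significance check on the per-subject differences (ed1 minus ed2
# averaged emotion readings), using only the reported summary statistics of the
# "Sad" column of Table 2.
mean_diff = 0.63        # average of the 16 per-subject differences
standard_error = 0.24   # reported standard error of that average
t_value = mean_diff / standard_error
print(round(t_value, 2))    # ~2.63, as reported

# With 15 degrees of freedom, one-tailed critical values are roughly 1.75 (5%)
# and 2.60 (1%), which appears consistent with Table 2 flagging the sad pieces
# as significant at both levels.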
The version of the CAUI reported in [8] is similar to the
current one except for two things: (1) it used self-reporting and
(2) it evaluated on a whole-music, instead of bar, level. Its
compositions are significant in only two out of six emotion
dimensions at level α = 0.01 using Student's t-test. The current
version used only 14 pieces but was able to produce
significant outputs for one emotion. This shows that we
cannot easily dismiss the potential of the current version.
The results obtained can be viewed as acceptable if the
current form of the research is taken as a proof of concept.
The acceptably sufficient result for one of the emotion
dimensions shows promise in the direction we are heading
and motivates us to further enhance the system's capability
in terms of its learning techniques. The unsatisfactory
Fig. 7. An actual GA-generated musical piece.
Fig. 8. A CAUI-composed sad musical piece.
results obtained for the other emotion descriptors can also
be attributed to shortcomings in creating adequately structured
tunes due to our narrow music theory. For instance,
the composed tunes at this stage consist of only eight bars
and are rhythmically monotonic. Admittedly, we need to
take more of music theory into consideration. Secondly,
since the number of training examples has been downsized,
the number of distinct frames, i.e., in terms of attribute values,
became fewer. There is no doubt that integrating
more complex musical knowledge and scaling to a larger
dataset are feasible, provided that the CAUI sufficiently
defines and represents the degrees of musical complexity
(e.g., structure in the melody) and acquires the storage
needed for the training data (this has become our immediate
obstacle). It is also an option to investigate the effect of
just a single music element that is very influential in creating
music and stimulating emotions (e.g., the role of beat in
African music). This would permit a more focused study while
lessening the complexity in scope.
5. Related works
Comprehending the significant link that unites music and
emotion has been a subject of considerable interest involving
various fields (refer to [5]). For about five decades, artificial
intelligence has played a crucial role in computerized
music (reviewed in [10]), yet there seems to be a scarcity of
research that tackles the compelling issues of user affect-specific
automated composition. As far as our limited
knowledge of the literature is concerned, it has been difficult
to find a study that aims to measure the emotional influence
of music and then heads towards a fully automated composition
task. This is in contrast to certain works that did not
deal with music composition even if they achieved
detecting the emotional influence of music (e.g., [1,9]), or
to systems that solicit users' ratings during composition
(e.g., [22,23]). Other works attempt to compose music with
EEG or other biological signals as a direct generative source
(e.g., refer to the concepts outlined in [18]) but may not necessarily
distinguish the affective characteristics of the composed
pieces. We single out the work of Kim and Andre
[6], which deals with more affective dimensions whose measures
are based on users' self-reports and the results of physiological
sensing. It differs from the CAUI in the sense that it does
not induce a relational model and it dealt primarily with
generating rhythms.
6. Conclusion
This paper proposes a technique for composing music
based on the user's emotions as analyzed from changes in
brainwave activities. The results reported here show that
learning is feasible even with the currently small training
set. The current architecture also permitted evading a tiring
and burdensome self-reporting task for emotion detection
while achieving partial success in composing an emotion-inducing
tune. We cannot deny that the system falls a long
way short of human composers; nevertheless, we believe
that the potential of its compositional intelligence should
not be easily dismissed.
The CAUI's learning architecture will remain viable even
if other ANS measuring devices are used. The problem with
the ESA is that it is too expensive to be bought
by ordinary people and it restricts the user's
mobility (e.g., eye blinks can easily introduce noise). We are
currently developing a multi-modal emotion recognition
scheme that will allow us to investigate other means to measure
expressed emotions (e.g., through ANS response and
human locomotive features) using devices that permit
mobility and are cheaper than the ESA.
References
[1] R. Bresin, A. Friberg, Emotional coloring of computer-controlled music performance, Computer Music Journal 24 (4) (2000) 44–62.
[2] A. Gabrielsson, E. Lindstrom, The influence of musical structure on emotional expression, in: P.N. Juslin, J.A. Sloboda (Eds.), Music and Emotion: Theory and Research, Oxford University Press, New York, 2001, pp. 223–248.
[3] B.E. Johanson, R. Poli, GP-Music: An interactive genetic programming system for music generation with automated fitness raters, Technical Report CSRP-98-13, School of Computer Science, The University of Birmingham, 1998.
[4] P.N. Juslin, Studies of music performance: A theoretical analysis of empirical findings, in: Proc. Stockholm Music Acoustics Conference, 2003, pp. 513–516.
[5] P.N. Juslin, J.A. Sloboda, Music and Emotion: Theory and Research, Oxford University Press, New York, 2001.
[6] S. Kim, E. Andre, Composing affective music with a generate and sense approach, in: V. Barr, Z. Markov (Eds.), Proc. 17th International FLAIRS Conference, Special Track on AI and Music, AAAI Press, 2004.
Table 2
Results of empirical validation

Average difference of ed1 (+) and ed2 (−) emotion analyses values
Subject            Stressed   Joyful   Sad    Relaxed
A                  1.67       2.33     0.67   3.00
B                  0.67       0.33     1.33   1.33
C                  1.00       1.00     0.67   1.33
D                  1.00       0.67     0.67   2.33
E                  2.67       1.00     1.33   1.00
F                  0.67       0.33     0.00   0.67
G                  0.67       0.33     1.67   1.33
H                  1.00       0.00     1.33   0.67
I                  0.67       0.33     1.67   0.67
J                  0.67       0.33     0.33   2.00
K                  0.33       0.33     0.67   0.00
L                  0.67       0.33     2.33   0.00
M                  0.67       0.33     0.33   1.33
N                  0.33       2.33     1.00   2.00
O                  0.33       0.33     0.67   1.00
P                  1.67       1.67     0.00   1.00
Average            0.13       0.04     0.63   0.02
Sample variance    1.18       1.07     0.85   2.12
Standard error     0.28       0.27     0.24   0.38
t Value            0.45       0.16     2.63   0.06
Significant (5%)   False      False    True   False
Significant (1%)   False      False    True   False
[7] R. Legaspi, Y. Hashimoto, K. Moriyama, S. Kurihara, M. Numao, Music compositional intelligence with an affective flavour, in: Proc. 12th International Conference on Intelligent User Interfaces, ACM Press, 2007, pp. 216–224.
[8] R. Legaspi, Y. Hashimoto, M. Numao, An emotion-driven musical piece generator for a constructive adaptive user interface, in: Proc. 9th Pacific Rim International Conference on Artificial Intelligence, Lecture Notes in Artificial Intelligence, vol. 4009, Springer, 2006, pp. 890–894.
[9] T. Li, M. Ogihara, Detecting emotion in music, in: Proc. 4th International Conference on Music Information Retrieval, 2003, pp. 239–240.
[10] R. Lopez de Mantaras, J.L. Arcos, AI and Music: From Composition to Expressive Performances, AI Magazine 23 (3) (2002) 43–57.
[11] T. Musha, Y. Terasaki, H.A. Haque, G.A. Ivanitsky, Feature extraction from EEGs associated with emotions, Artificial Life and Robotics 1 (1997) 15–19.
[12] C. Nattee, S. Sinthupinyo, M. Numao, T. Okada, Learning first-order rules from data with multiple parts: Applications on mining chemical compound data, in: Proc. 21st International Conference on Machine Learning, 2004, pp. 77–85.
[13] M. Numao, S. Takagi, K. Nakamura, Constructive adaptive user interfaces – Composing music based on human feelings, in: Proc. 18th National Conference on AI, AAAI Press, 2002, pp. 193–198.
[14] R.W. Picard, J. Healey, Affective wearables, Personal and Ubiquitous Computing 1 (4) (1997) 231–240.
[15] J. Posner, J.A. Russell, B.S. Peterson, The circumplex model of affect: an integrative approach to affective neuroscience, cognitive development, and psychopathology, Development and Psychopathology 17 (2005) 715–734.
[16] J.R. Quinlan, Learning logical definitions from relations, Machine Learning 5 (1990) 239–266.
[17] D. Riecken, Wolfgang: Emotions plus goals enable learning, in: Proc. IEEE International Conference on Systems, Man and Cybernetics, 1998, pp. 1119–1120.
[18] D. Rosenboom, Extended Musical Interface with the Human Nervous System: Assessment and Prospectus, Leonardo Monograph Series, Monograph No. 1 (1990/1997).
[19] C. Roz, The autonomic nervous system: Barometer of emotional intensity and internal conflict, A lecture given for Confer, 27 March 2001, a copy can be found at: http://www.thinkbody.co.uk/papers/autonomic-nervous-system.htm.
[20] J.A. Sloboda, Music structure and emotional response: some empirical findings, Psychology of Music 19 (2) (1991) 110–120.
[21] S. Tangkitvanich, M. Shimura, Refining a relational theory with multiple faults in the concept and subconcept, in: Machine Learning: Proc. of the Ninth International Workshop, 1992, pp. 436–444.
[22] M. Unehara, T. Onisawa, Interactive music composition system – Composition of 16-bars musical work with a melody part and backing parts, in: Proc. IEEE International Conference on Systems, Man and Cybernetics, 2004, pp. 5736–5741.
[23] M. Unehara, T. Onisawa, Music composition system based on subjective evaluation, in: Proc. IEEE International Conference on Systems, Man and Cybernetics, 2003, pp. 980–986.
[24] G.A. Wiggins, G. Papadopoulos, S. Phon-Amnuaisuk, A. Tuson, Evolutionary methods for musical composition, International Journal of Computing Anticipatory Systems 1 (1) (1999).
