\[
\mathrm{Gain}(L_i) = T_i^{++} \times \bigl( I(T_i) - I(T_{i+1}) \bigr) \tag{1}
\]

\[
I(T_i) = -\log_2 \frac{T_i^{\oplus}}{T_i^{\oplus} + T_i^{\ominus}}, \qquad
I(T_{i+1}) = -\log_2 \frac{T_{i+1}^{\oplus}}{T_{i+1}^{\oplus} + T_{i+1}^{\ominus}} \tag{2}
\]

where T_i^⊕ and T_i^⊖ denote the number of ⊕ and ⊖ tuples in the training set T_i.
Adding the literal L_m to the partially developing clause R(v_1, v_2, ..., v_k) :- L_1, L_2, ..., L_{m-1} results in the new set T_{i+1}, which contains the tuples that remained from T_i. T_i^{++} denotes the number of ⊕ tuples in T_i that led to another ⊕ tuple after adding L_m. The candidate literal L_i that yields the largest gain becomes L_m.
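To make the gain computation concrete, here is a minimal Python sketch of Eqs. (1) and (2), assuming the tuple sets are summarized by simple counts (the function and variable names are ours, not FOIL's):

```python
import math

def info(pos, neg):
    # I(T) = -log2(pos / (pos + neg)): the bits needed to signal that
    # a tuple drawn from T is a "+" tuple (Eq. (2)).
    return -math.log2(pos / (pos + neg))

def foil_gain(pos_i, neg_i, pos_next, neg_next, pos_kept):
    # Eq. (1): Gain(L) = T++ * (I(T_i) - I(T_{i+1})), where pos_kept
    # is T++, the number of "+" tuples in T_i that led to another
    # "+" tuple after the literal was added.
    return pos_kept * (info(pos_i, neg_i) - info(pos_next, neg_next))

# Hypothetical counts: the literal keeps 40 of 50 "+" tuples while
# cutting the "-" tuples from 100 down to 20.
print(foil_gain(50, 100, 40, 20, 40))  # 40.0 bits of gain
```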
R [21] is a system that automatically refines a theory in function-free first-order logic. It assumes that the induced theory can only be approximately correct and hence needs to be refined to improve its accuracy using the training examples. R implements a four-step theory revision process, i.e., (1) operationalization, (2) specialization, (3) rule creation, and (4) unoperationalization. Operationalization expands the theory into a set of operational clauses, detecting and removing useless literals. A literal is useful if its normalized gain, i.e., computed using only the I(T_i) - I(T_{i+1}) term of Eq. (1), is greater than a specified threshold h, and if it produces new variables for the other literals in the clause, i.e., it is generative [21]. R considers the useless literals as faults in the theory. Specialization uses FOIL to add literals to the overly general clauses covering ⊖ tuples, making them more specific. Rule creation uses FOIL to introduce more operational clauses in case some ⊕ tuples cannot be covered by existing ones. Finally, unoperationalization re-organizes the clauses to reflect the hierarchical structure of the original theory.
The training examples suitable for inducing the model are generated as follows. Each musical piece is divided into musical bars or measures. A piece may contain eight to 16 bars (an average of 11.6 bars per piece). Every three successive bars in a piece, together with the music frame, are treated as one training example, i.e., example_i = (frame, bar_{i-2}, bar_{i-1}, bar_i). Each bar consists of a maximum of four chords. The idea here is that sound flowing from at least three bars is needed to elicit an affective response. The first two examples in every piece, however, will inherently contain only one and two bars, respectively. The components of each bar are extracted from music() and represented as ground tuples. A total of 162 examples were obtained from the 14 pieces, with each bar having an average playtime of 2.1 s.
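A small sketch of this windowing, under the assumption that a piece is simply a list of bar representations (padding the first two examples with None is our own choice; the paper only says they contain fewer bars):

```python
def make_examples(frame, bars):
    # example_i = (frame, bar_{i-2}, bar_{i-1}, bar_i); the first two
    # windows of a piece inherently hold only one and two bars.
    examples = []
    for i in range(len(bars)):
        window = bars[max(0, i - 2): i + 1]
        examples.append((frame, *([None] * (3 - len(window)) + window)))
    return examples

# A hypothetical 4-bar piece yields 4 overlapping examples.
for ex in make_examples("frame", ["bar1", "bar2", "bar3", "bar4"]):
    print(ex)
```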
Recall that emotion readings are taken while the music is being played. Using the available synchronization tools of the ESA and music segmenting tools, the emotion measurements are assigned to the corresponding musical segments. Subsequently, each emotion measure is discretized to a value between 1 and 5 based on a pre-determined threshold. Using the same range of values as that of the SDM-based instrument permits us to retain the learning techniques in [8] while evaluating the new emotion detection scheme. It also allows us to define a set of bipolar affective descriptor pairs ed_1–ed_2 (e.g., joyful–not joyful). It is important to note that antonymic semantics (e.g., stressed vs. relaxed and joyful vs. sad) do not hold for the ESA, since the four emotions are defined along orthogonal dimensions. Hence, four separate readings are taken instead of treating one as inversely proportional to the other. This is consistent with the circumplex model of affect [15], where each of the four emotions can be seen in different quadrants of the model. One relational model is learned for each affect in the four bipolar emotion pairs ed_1–ed_2 (a total of 4 × 2 = 8 models).
To generate the training instances specific to FOIL, for any emotion descriptor ed_1 in the pair ed_1–ed_2, the examples labelled as 5 are represented as ⊕ tuples, while those labelled as ≤4 become ⊖ tuples. Conversely, for ed_2, ⊕ and ⊖ tuples are formed from bars which were evaluated as 1 and ≥2, respectively. In other words, there are corresponding sets of ⊕ and ⊖ tuples for each affect, and a ⊕ tuple for ed_1 does not mean that it is a ⊖ tuple for ed_2. Examples are derived in almost the same way for FOIL+R. For example, the ⊕ tuples of ed_1 and ed_2 are formed from bars labelled as ≥4 and ≤2, respectively.
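The FOIL labelling scheme can be sketched as follows, with examples and their discretized ratings held in parallel lists (the data layout is ours):

```python
def split_tuples(examples, ratings):
    # ed1: a rating of 5 gives a + tuple, <=4 a - tuple.
    # ed2: a rating of 1 gives a + tuple, >=2 a - tuple.
    # (Under FOIL+R the + thresholds relax to >=4 and <=2.)
    ed1 = {"pos": [], "neg": []}
    ed2 = {"pos": [], "neg": []}
    for ex, r in zip(examples, ratings):
        (ed1["pos"] if r == 5 else ed1["neg"]).append(ex)
        (ed2["pos"] if r == 1 else ed2["neg"]).append(ex)
    return ed1, ed2

ed1, ed2 = split_tuples(["e1", "e2", "e3", "e4"], [5, 3, 1, 4])
print(len(ed1["pos"]), len(ed1["neg"]), len(ed2["pos"]))  # 1 3 1
```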
3. Composing using GA and melody heuristics
Evolutionary computational models have been dominating the realm of automatic music composition (as reviewed by [24]). One major problem in user-oriented GA-based music creation (e.g., [3,22]), however, is that the user is required to listen to and then rate the composed musical sequences in each generation. This is obviously burdensome, tiring and time-consuming. Although the CAUI is user-oriented, it need not solicit user intervention, since it uses the relational model as a critic to control the quality of the composed tunes.
We adapted the conventional bit-string chromosome representation in GA as a columns-of-bits representation expressed in music() form (see Fig. 6, where F is the song_frame() and C_i is a chord()). Each bit in a column represents a component of the frame (e.g., tempo) or chord (e.g., root). The performance of our GA depends on two basic operators, namely, single-point crossover and mutation. With the first operator, the columns of bit strings from the beginning of the chromosome to a selected crossover point are copied from one parent and the rest are copied from the other. Mutation inverts selected bits, thereby altering the individual frame and chord information. The more fundamental components (e.g., tempo, rhythm and root) are mutated less frequently to avoid a drastic change in musical events, while the other features are varied more frequently to acquire more variants.
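The two operators can be sketched as below, treating a chromosome as a list of bit columns (column 0 the frame, the rest chords). The bit positions assumed fundamental and the mutation rates are illustrative; the paper does not give concrete values:

```python
import random

def crossover(parent_a, parent_b):
    # Single-point crossover at a column boundary: columns up to the
    # cut point come from one parent, the remainder from the other.
    cut = random.randrange(1, len(parent_a))
    return [col[:] for col in parent_a[:cut] + parent_b[cut:]]

def mutate(chromosome, base_rate=0.05, fundamental_bits=(0, 1)):
    # Bit inversion; bits assumed to encode fundamental components
    # (e.g., tempo, rhythm, root) are flipped ten times less often.
    for column in chromosome:
        for j in range(len(column)):
            rate = base_rate / 10 if j in fundamental_bits else base_rate
            if random.random() < rate:
                column[j] ^= 1
    return chromosome

a = [[0, 1, 0, 1] for _ in range(9)]  # frame + 8 chord columns
b = [[1, 0, 1, 0] for _ in range(9)]
child = mutate(crossover(a, b))
```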
The fundamental idea of GA is to produce increasingly better solutions in each new generation of the evolutionary process. During the genetic evolution process, candidate chromosomes are produced that may be better or worse than what has already been obtained. Hence, a fitness function is necessary to evaluate the utility of each candidate. The CAUI's fitness function takes into account the user-specific relational model and music theory:

\[
\mathrm{fitnessChromosome}(M) = \mathrm{fitnessUser}(M) + \mathrm{fitnessTheory}(M) \tag{3}
\]

where M is a candidate chromosome. This function makes it possible to generate frames and chord progressions that fit the music theory and stimulate the target feeling. fitnessUser(M) is computed as follows:

\[
\mathrm{fitnessUser}(M) = \mathrm{fitnessFrame}(M) + \mathrm{fitnessPair}(M) + \mathrm{fitnessTriplet}(M) \tag{4}
\]
Each function at the right-hand side of Eq. (4) is generally computed as follows:

\[
\mathrm{fitnessX}(M) = \sum_{i=1}^{L} \mathrm{Average}\bigl( \delta_F(P_i),\, \delta'_F(P_i),\, \delta_{FR}(P_i),\, \delta'_{FR}(P_i) \bigr) \tag{5}
\]
The meanings of the objects in Eq. (5) are shown in Table 1. The only variable parameter is P_i, which denotes the component/s extracted from M that will serve as input to the four subfunctions of fitnessX. If there are n chord() predicates in M, there will be L P_i's formed, depending on the fitnessX. For example, given chromosome M := music(song_frame(), chord_1(), ..., chord_8()), where the added subscripts denote chord positions, computing fitnessPair(M) will involve 7 P_i's (L = 8 - 1): P_1 = (chord_1(), chord_2()), ..., P_7 = (chord_7(), chord_8()). With fitnessFrame(M), there is only P_1 = song_frame().
The values of the subfunctions in Eq. (5) will differ depending on whether an ed_1 (e.g., sad) or ed_2 (e.g., not sad) piece of music is being composed. Let us denote the target affect of the current composition as emo_P and the opposite of this affect as emo_N (e.g., if ed_1 is emo_P then emo_N refers to ed_2, and vice versa). δ_F and δ_FR (where F and FR refer to the models obtained using FOIL alone or FOIL+R, respectively) return +2 and +1, respectively, if P_i appears in any of the corresponding target relations (see Table 1) in the model learned for emo_P. On the other hand, δ'_F and δ'_FR return -2 and -1, respectively, if P_i appears in any of the corresponding relations in the emo_N model. In effect, the structure P_i is rewarded if it is part of the desired relations and is penalized if it also appears in the model for the opposite affect, since it does not possess a distinct affective flavour. The returned values (±2 and ±1) were determined empirically.
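A compact sketch of Eq. (5), under the simplifying assumption that each learned model is reduced to a set of the components it covers (the real system matches P_i against first-order relations, not set membership):

```python
def fitness_x(components, foil_pos, foil_neg, foilr_pos, foilr_neg):
    # Eq. (5): sum over the L components P_i of the average of the
    # four subfunction values; each delta contributes 0 when P_i is
    # absent from its model (our assumption).
    total = 0.0
    for p in components:
        d_f   =  2 if p in foil_pos  else 0  # delta_F: emoP FOIL model
        d_fp  = -2 if p in foil_neg  else 0  # delta'_F: emoN FOIL model
        d_fr  =  1 if p in foilr_pos else 0  # delta_FR: emoP FOIL+R model
        d_frp = -1 if p in foilr_neg else 0  # delta'_FR: emoN FOIL+R model
        total += (d_f + d_fp + d_fr + d_frp) / 4.0
    return total

def chord_pairs(chromosome):
    # The P_i's for fitnessPair: the n - 1 adjacent chord pairs of M
    # (element 0 of the chromosome is taken to be the song_frame()).
    chords = chromosome[1:]
    return list(zip(chords, chords[1:]))

pos = {("C", "F")}
print(fitness_x([("C", "F"), ("F", "G")], pos, set(), set(), set()))  # 0.5
```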
fitnessTheory(M) seeks to reward chromosomes that are consistent with our music theory and penalize those that violate it. This is computed in the same way as Eq. (4), except that each of the three functions at the right is now computed as

\[
\mathrm{fitnessX}(M) = \sum_{i=1}^{L} \mathrm{Average}\bigl( g(P_i) \bigr) \tag{6}
\]
Fig. 6. GA chromosome structure and operators.

Table 1
Meanings of the objects in Eq. (5)

fitnessX         P_i (component/s of M)                       L      Target relation
fitnessFrame     song_frame()                                 1      frame()
fitnessPair      (chord_i(), chord_{i+1}())                   n - 1  pair()
fitnessTriplet   (chord_i(), chord_{i+1}(), chord_{i+2}())    n - 2  triplet()

The definitions of the objects in Eq. (6) follow the ones in Table 1, except that P_i is no longer checked against the relational models but against the music theory. The subfunction g returns the score of fitting P_i to the music theory, which is either a reward or a penalty. Structures that earn a high reward include frames that have a complete or half cadence, chord triplets that contain the transition T → S → D of the tonal functions tonic (T), subdominant (S) and dominant (D), and pairs that transition from dominant to secondary dominant (e.g., V/II → II). On the other hand, a penalty is given to pairs or triplets that have the same root, form and inversion values, have the same tonal function and form, or contain the transition D → S. All these heuristics are grounded in basic music theory. For example, the cadence types are scored based on the strength of their effects, such that the complete cadence is given the highest score since it is the strongest. Another is that the transition T → S → D is rewarded since it is often used and many songs have been written using it. D → S is penalized since a dominant chord will not resolve to a subdominant.
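The scoring subfunction g of Eq. (6) can be approximated by a rule table. The sketch below encodes only the heuristics quoted above, with score values of our own choosing, since the paper does not list its actual scores:

```python
def g(p):
    # p is a pair or triplet of chords, each a dict with at least a
    # tonal 'function' ('T', 'S' or 'D') and a 'root'.
    funcs = tuple(c["function"] for c in p)
    score = 0.0
    if funcs == ("T", "S", "D"):             # rewarded T -> S -> D triplet
        score += 2.0
    if ("D", "S") in zip(funcs, funcs[1:]):  # D -> S fails to resolve
        score -= 2.0
    if len({c["root"] for c in p}) == 1:     # identical roots: monotonous
        score -= 1.0
    return score

print(g(({"function": "T", "root": "C"},
         {"function": "S", "root": "F"},
         {"function": "D", "root": "G"})))  # 2.0
```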
Overall, the scheme we just described is defensible given that music theory can be represented using heuristics for evaluating the fitness of each GA-generated music variant. The character of each generated variant is immediately fit not just to the music theory but, more importantly, to the desired affective perception. It is also clear from the computations that the presence of the models permits the absence of human intervention during composition, thereby relieving the user of unnecessary cognitive load and achieving full automation. Fig. 7 shows one of the best-fit GA-generated chromosomes for stimulating a sad feeling.
The outputs of the GA contain only chord progressions. Musical lines with only chord tones may sound monotonous or homophonic. A non-chord tone may serve to embellish the melodic motion surrounding the chord tones. The CAUI's melody-generating module first generates chord tones using the GA-obtained music() information and then utilizes a set of heuristics to generate the non-chord tones in order to create a non-monotonic piece of music.
To create the chord tones, certain aspects of music theory are adopted, including the harmonic relations V7 → I (or D → T, which is known to be very strong), T → D, T → S, S → T, and S → D, and keeping the intervals in octaves. Once the chord tones are created, the non-chord tones, which are supposed not to be members of the accompanying chords, are generated by selecting and "disturbing" the chord tones. All chord tones have an equal chance of being selected. Once selected, a chord tone is modified into a non-chordal broderie, appoggiatura or passing tone. How these non-chord tones are adopted for the CAUI is detailed in [7].
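As an illustration, a chord tone can be perturbed into a neighbouring scale tone roughly as follows; the one-in-three selection rate and the single-step substitution are our simplifications of the broderie, appoggiatura and passing-tone rules detailed in [7]:

```python
import random

def embellish(melody, scale):
    # melody and scale are MIDI note numbers; interior chord tones are
    # selected with equal probability and moved one scale degree.
    out = list(melody)
    for i in range(1, len(out) - 1):
        if out[i] in scale and random.random() < 1 / 3:
            idx = scale.index(out[i])
            out[i] = scale[(idx + random.choice([-1, 1])) % len(scale)]
    return out

c_major = [60, 62, 64, 65, 67, 69, 71]
print(embellish([60, 64, 67, 64, 60], c_major))
```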
4. Experimentation and analysis of results
We performed a set of individualized experiments to determine whether the CAUI-composed pieces can actually stimulate the target emotion. Sixteen subjects were asked to listen to the 14 musical pieces while wearing the ESA's helmet. The subjects were all Japanese males with ages ranging from 18 to 27 years. Although it is ideal to increase the heterogeneity of the subjects' profiles, it seems more appropriate at this stage to limit their diversity in terms of background and focus on the possibly existing differences in their emotional reactions. So that the subject hears the music playing continuously, all the pieces were sequenced using a music editing tool, and silent pauses of 15 s each were inserted before and after each piece, with the exception of the first, which is preceded by a 30-s silence so as to condition the subject. Personalized models were learned for each subject based on their emotion readings, and new pieces were composed independently for each. The same subjects were then asked to go through the same process using the set of newly composed pieces. Twenty-four tunes were composed for each subject, i.e., three for each of the bipolar affective descriptors. Fig. 8 shows that the CAUI was able to compose a sad piece even without prior handcrafted knowledge of any affect-inducing piece.
We computed the difference of the averaged emotion readings for each ed_1–ed_2 pair. The motivation here is that the higher the difference, the more distinct/distinguishable the affective flavour of the composed pieces. We also performed a paired t-test on the differences to determine if these are significant. Table 2 shows that the composed sad pieces are the only ones that correlate with the subjects' emotions. A positive difference was seen in many instances, albeit not necessarily statistically significant. This indicates that the system is not able to differentiate the structures that can arouse such impressions.
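The test itself is a paired t-test, i.e., a one-sample t-test on the per-subject ed_1 - ed_2 differences. A minimal sketch, with a check against the summary statistics that Table 2 reports for the Sad pair:

```python
import math

def paired_t(diffs):
    # t = mean(d) / sqrt(var(d) / n), with n - 1 degrees of freedom.
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    return mean / math.sqrt(var / n)

# Table 2 reports mean 0.63, sample variance 0.85 and n = 16 for Sad;
# those rounded figures give t ~ 2.7, in line with the reported 2.63.
print(0.63 / math.sqrt(0.85 / 16))
```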
The version of the CAUI reported in [8] is similar to the current one except for two things: (1) it used self-reporting, and (2) it evaluated at the whole-music, instead of bar, level. Its compositions are significant in only two out of six emotion dimensions at level α = 0.01 using Student's t-test. The current version used only 14 pieces but was able to produce significant outputs for one emotion. This shows that we cannot easily dismiss the potential of the current version.

The results obtained can be viewed as acceptable if the current form of the research is taken as a proof of concept. The acceptably sufficient result for one of the emotion dimensions shows promise in the direction we are heading and motivates us to further enhance the system's capability in terms of its learning techniques. The unsatisfactory results obtained for the other emotion descriptors can also be attributed to shortcomings in creating adequately structured tunes due to our narrow music theory. For instance, the composed tunes at this stage consist only of eight bars and are rhythmically monotonic. Admittedly, we need to take more of music theory into consideration. Secondly, since the number of training examples has been downsized, the number of distinct frames, i.e., in terms of attribute values, became fewer. There is no doubt that integrating more complex musical knowledge and scaling to a larger dataset are feasible, provided that the CAUI sufficiently defines and represents the degrees of musical complexity (e.g., structure in the melody) and acquires the needed storage for the training data (this has become our immediate obstacle). It is also an option to investigate the effect of just a single music element that is very influential in creating music and stimulating emotions (e.g., the role of beat in African music). This would permit a more focused study while lessening the complexity in scope.

Fig. 7. An actual GA-generated musical piece.

Fig. 8. A CAUI-composed sad musical piece.
5. Related work
Comprehending the significant link that unites music and emotion has been a subject of considerable interest involving various fields (refer to [5]). For about five decades, artificial intelligence has played a crucial role in computerized music (reviewed in [10]), yet there seems to be a scarcity of research that tackles the compelling issues of user affect-specific automated composition. As far as our limited knowledge of the literature is concerned, it has been difficult to find a study that aims to measure the emotional influence of music and then heads towards a fully automated composition task. This is in contrast to works that detect the emotional influence of music but do not deal with music composition (e.g., [1,9]), or to systems that solicit users' ratings during composition (e.g., [22,23]). Other works attempt to compose music with EEG or other biological signals as a direct generative source (e.g., refer to the concepts outlined in [18]) but may not necessarily distinguish the affective characteristics of the composed pieces. We single out the work of Kim and André [6], which deals with more affective dimensions whose measures are based on users' self-reports and the results of physiological sensing. It differs from the CAUI in the sense that it does not induce a relational model and it dealt primarily with generating rhythms.
6. Conclusion
This paper proposes a technique for composing music based on the user's emotions as analyzed from changes in brainwave activity. The results reported here show that learning is feasible even with the currently small training set. The current architecture also avoids tiring and burdensome self-reporting as the emotion detection task while achieving partial success in composing an emotion-inducing tune. We cannot deny that the system falls a long way short of human composers; nevertheless, we believe that the potential of its compositional intelligence should not be easily dismissed.
The CAUI's learning architecture will remain viable even if other ANS measuring devices are used. The problem with the ESA is that it is too expensive for ordinary consumers and it restricts the user's mobility (e.g., eye blinks can easily introduce noise). We are currently developing a multi-modal emotion recognition scheme that will allow us to investigate other means of measuring expressed emotions (e.g., through ANS responses and human locomotive features) using devices that permit mobility and are cheaper than the ESA.
References

[1] R. Bresin, A. Friberg, Emotional coloring of computer-controlled music performance, Computer Music Journal 24 (4) (2000) 44–62.
[2] A. Gabrielsson, E. Lindström, The influence of musical structure on emotional expression, in: P.N. Juslin, J.A. Sloboda (Eds.), Music and Emotion: Theory and Research, Oxford University Press, New York, 2001, pp. 223–248.
[3] B.E. Johanson, R. Poli, GP-Music: An interactive genetic programming system for music generation with automated fitness raters, Technical Report CSRP-98-13, School of Computer Science, The University of Birmingham, 1998.
[4] P.N. Juslin, Studies of music performance: A theoretical analysis of empirical findings, in: Proc. Stockholm Music Acoustics Conference, 2003, pp. 513–516.
[5] P.N. Juslin, J.A. Sloboda, Music and Emotion: Theory and Research, Oxford University Press, New York, 2001.
[6] S. Kim, E. André, Composing affective music with a generate and sense approach, in: V. Barr, Z. Markov (Eds.), Proc. 17th International FLAIRS Conference, Special Track on AI and Music, AAAI Press, 2004.
Table 2
Results of empirical validation

Average difference of ed_1 (+) and ed_2 (−) emotion analyses values

Subject             Stressed   Joyful   Sad    Relaxed
A                   1.67       2.33     0.67   3.00
B                   0.67       0.33     1.33   1.33
C                   1.00       1.00     0.67   1.33
D                   1.00       0.67     0.67   2.33
E                   2.67       1.00     1.33   1.00
F                   0.67       0.33     0.00   0.67
G                   0.67       0.33     1.67   1.33
H                   1.00       0.00     1.33   0.67
I                   0.67       0.33     1.67   0.67
J                   0.67       0.33     0.33   2.00
K                   0.33       0.33     0.67   0.00
L                   0.67       0.33     2.33   0.00
M                   0.67       0.33     0.33   1.33
N                   0.33       2.33     1.00   2.00
O                   0.33       0.33     0.67   1.00
P                   1.67       1.67     0.00   1.00
Average             0.13       0.04     0.63   0.02
Sample variance     1.18       1.07     0.85   2.12
Standard error      0.28       0.27     0.24   0.38
t value             0.45       0.16     2.63   0.06
Significant (5%)    False      False    True   False
Significant (1%)    False      False    True   False
[7] R. Legaspi, Y. Hashimoto, K. Moriyama, S. Kurihara, M. Numao, Music compositional intelligence with an affective flavour, in: Proc. 12th International Conference on Intelligent User Interfaces, ACM Press, 2007, pp. 216–224.
[8] R. Legaspi, Y. Hashimoto, M. Numao, An emotion-driven musical piece generator for a constructive adaptive user interface, in: Proc. 9th Pacific Rim International Conference on Artificial Intelligence, Lecture Notes in Artificial Intelligence, vol. 4009, Springer, 2006, pp. 890–894.
[9] T. Li, M. Ogihara, Detecting emotion in music, in: Proc. 4th International Conference on Music Information Retrieval, 2003, pp. 239–240.
[10] R. López de Mántaras, J.L. Arcos, AI and music: From composition to expressive performances, AI Magazine 23 (3) (2002) 43–57.
[11] T. Musha, Y. Terasaki, H.A. Haque, G.A. Ivanitsky, Feature extraction from EEGs associated with emotions, Artificial Life and Robotics 1 (1997) 15–19.
[12] C. Nattee, S. Sinthupinyo, M. Numao, T. Okada, Learning first-order rules from data with multiple parts: Applications on mining chemical compound data, in: Proc. 21st International Conference on Machine Learning, 2004, pp. 77–85.
[13] M. Numao, S. Takagi, K. Nakamura, Constructive adaptive user interfaces – Composing music based on human feelings, in: Proc. 18th National Conference on AI, AAAI Press, 2002, pp. 193–198.
[14] R.W. Picard, J. Healey, Affective wearables, Personal and Ubiquitous Computing 1 (4) (1997) 231–240.
[15] J. Posner, J.A. Russell, B.S. Peterson, The circumplex model of affect: an integrative approach to affective neuroscience, cognitive development, and psychopathology, Development and Psychopathology 17 (2005) 715–734.
[16] J.R. Quinlan, Learning logical definitions from relations, Machine Learning 5 (1990) 239–266.
[17] D. Riecken, Wolfgang: Emotions plus goals enable learning, in: Proc. IEEE International Conference on Systems, Man and Cybernetics, 1998, pp. 1119–1120.
[18] D. Rosenboom, Extended Musical Interface with the Human Nervous System: Assessment and Prospectus, Leonardo Monograph Series, Monograph No. 1 (1990/1997).
[19] C. Roz, The autonomic nervous system: Barometer of emotional intensity and internal conflict, a lecture given for Confer, 27 March 2001; a copy can be found at: http://www.thinkbody.co.uk/papers/autonomic-nervous-system.htm.
[20] J.A. Sloboda, Music structure and emotional response: some empirical findings, Psychology of Music 19 (2) (1991) 110–120.
[21] S. Tangkitvanich, M. Shimura, Refining a relational theory with multiple faults in the concept and subconcept, in: Machine Learning: Proc. of the Ninth International Workshop, 1992, pp. 436–444.
[22] M. Unehara, T. Onisawa, Interactive music composition system – Composition of 16-bars musical work with a melody part and backing parts, in: Proc. IEEE International Conference on Systems, Man and Cybernetics, 2004, pp. 5736–5741.
[23] M. Unehara, T. Onisawa, Music composition system based on subjective evaluation, in: Proc. IEEE International Conference on Systems, Man and Cybernetics, 2003, pp. 980–986.
[24] G.A. Wiggins, G. Papadopoulos, S. Phon-Amnuaisuk, A. Tuson, Evolutionary methods for musical composition, International Journal of Computing Anticipatory Systems 1 (1) (1999).