
Creative Capabilities of Machine Learning

Evaluating music created by algorithms

Thomas Mejtoft
Umeå University, Digital Media Lab, Sweden
thomas.mejtoft@umu.se

Linus Lagerhjelm
Umeå University, Digital Media Lab, Sweden
lila0158@student.umu.se

Ulrik Söderström
Umeå University, Digital Media Lab, Sweden
ulrik.soderstrom@umu.se

Ole Norberg
Umeå University, Digital Media Lab, Sweden
ole.norberg@umu.se

ABSTRACT
The concept of creativity is an important part of human society, and the continuous evolution of artificial minds has raised questions about creativity among machines. The aim of this study is to explore machine learning algorithms' ability to be creative. The study reported in this paper uses short samples of music generated by IBM Watson Beat, which are evaluated through expert assessment by 51 music teachers, together with samples composed by humans as control samples. The results show that one of the machine learning generated samples showed the same level of creativity as the human generated samples. Hence, there are indications that today's machine learning algorithms can create music that is hard to distinguish from human created music and can be considered creative.

CCS CONCEPTS
• Theory of computation → Design and analysis of algorithms; Machine learning theory; • Human-centered computing → Interaction techniques.

KEYWORDS
creativity, machine learning, music

ACM Reference Format:
Thomas Mejtoft, Linus Lagerhjelm, Ulrik Söderström, and Ole Norberg. 2021. Creative Capabilities of Machine Learning: Evaluating music created by algorithms. In European Conference on Cognitive Ergonomics 2021 (ECCE 2021), April 26–29, 2021, Siena, Italy. ACM, New York, NY, USA, 4 pages. https://doi.org/10.1145/3452853.3452863

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
ECCE 2021, April 26–29, 2021, Siena, Italy
© 2021 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ACM ISBN 978-1-4503-8757-6/21/04...$15.00

1 INTRODUCTION
Creativity research has been in existence for hundreds of years [15] and, throughout history, the view on the concept of creativity has both differed and changed a lot. Most notably, the modern view of creativity as the creation of novel works was seen as impossible by both ancient Buddhists and Plato, who thought that nothing new could ever be created. Over time, this view has gradually evolved into today's view, where there is a consensus that creativity can exist on an individual level or, in other words, that someone can be creative. It would seem reasonable to hypothesize that if it were possible to construct an artificial mind with the same complexity as the human mind, it would be able to display similar creative abilities. The externalization of this artificial creativity may or may not look analogous to that found in humans, but it would still be there.

Artificial creativity is highly controversial, and some argue that such a thing does not exist. For example, Boden [4] claims that artificial creativity cannot exist, as works made by computers lack “a subtle appreciation of relevance” and a “capacity to remind us of musical or cultural associations that wouldn’t have occurred to us otherwise”. Others, such as Câmara [6], argue that such argumentation is “too human-centric” and, furthermore, that creativity is a matter of combining multiple factors, some of which are better suited to computers and others to humans. Despite this controversy, research on the topic is limited. The question of artificial creativity is evaluated in this study.

1.1 Defining creativity
Creativity is perhaps the most crucial concept to define for this paper. Unfortunately, the literature fails to agree on a formal definition. In an article in the Annual Review of Psychology from 2010, Hennessey & Amabile [10] covered more than 110 different articles on the topic, and the lack of consensus is striking. Hennessey & Amabile [10] state, however, that most experts agree that “creativity involves the development of a novel product, idea, or problem solution that is of value to the individual and/or the larger social group”. Aside from that, various frameworks for defining creativity have been proposed only to be discarded by others.

In spite of all this, there seems to exist a somewhat established way of determining whether a piece of work can be deemed creative or not. This method is based on the consensual assessment of experts [10], who judge a piece by their own subjective definition of creativity. What constitutes an expert is not clearly defined in the review, but Amabile [3] discusses a similar concept referred to as “appropriate observers”. These observers are defined as “those familiar with the domain in which the product was created or the response articulated” [3].

This study uses this method to determine whether a piece of work is creative. Consequently, we assert that the creator of a creative piece must also be considered creative. Such a creator will be referred to as a creative agent.

[Figure 1: A simple example of the structure of a neural network. Input layer (x1 to x4), hidden layer, and output layer producing a single output.]
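As a minimal sketch of how independent expert judgments of the kind described in section 1.1 might be summarized per piece, consider the Python example below. The function name `summarize_ratings`, the example ratings, and the choice of mean and median as summary statistics are assumptions made for illustration only; they are not data or procedures taken from this study.

```python
from statistics import mean, median


def summarize_ratings(ratings_by_piece):
    """Summarize independent expert ratings (e.g. on a 1-6 scale) per piece."""
    summary = {}
    for piece, ratings in ratings_by_piece.items():
        summary[piece] = {
            "n_judges": len(ratings),   # how many experts rated the piece
            "mean": mean(ratings),      # average creativity rating
            "median": median(ratings),  # middle rating, robust to outliers
        }
    return summary


# Hypothetical ratings from five judges; NOT data from the study.
example = {
    "piece_a": [2, 3, 3, 4, 5],
    "piece_b": [4, 4, 5, 3, 4],
}
result = summarize_ratings(example)
```

Each judge rates every piece on their own, and only the aggregated view across judges is used to characterize a piece, which mirrors the idea that creativity is attributed by agreement among appropriate observers rather than by a fixed formal definition.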

1.2 Objective
The objective of this short research paper is to investigate whether machine learning systems can be considered creative based on their ability to compose novel works of music. This study is intended as a springboard to further interdisciplinary research touching upon machine learning and psychology.

2 THEORY
This section includes the theoretical framework needed to understand the paper, starting with an overview of machine learning and neural networks. Later in the section, some specific kinds of networks that are central to this paper are covered.

2.1 Machine Learning
There are plenty of problems in modern software development that are characterized by the fact that the algorithm for solving them is unknown [2]. We know the desired output from our program given some input data, but we lack the ability to formally specify the steps required to derive this answer from the data. In machine learning, this problem is addressed by constructing programs that can extract certain characterizing hidden patterns from the data and exploit them to produce the desired output. This process of extracting patterns from input data to produce correct or good enough (application dependent) output is usually referred to as learning [2]. There are plenty of approaches to machine learning and a wide range of different models. This paper mainly focuses on a subset of machine learning known as neural networks.

2.2 Neural Networks
The idea of neural networks is very old. The first research paper in which the ideas occurred was published as early as 1943 [12]. The idea was to mimic how neurons work in the human brain. A neural network in a computer is a composition of multiple simple computational threshold units that produce real-valued outputs in response to sensory inputs [16]. These units are referred to as neurons. Neurons are typically organized hierarchically in a structure where the neurons are grouped together to form layers. All the neurons in a layer are connected to all the neurons in the previous layer via a set of connection weights. An example of this structure is depicted in Figure 1, where the input layer denotes an input vector to the network. The hidden layer is a group of neurons whose output is only visible to other neurons, and the output layer is a single neuron that produces the output from the network. Usually, neural networks consist of many hidden layers of various sizes. The output from a neuron corresponds to the dot product between the input vector and a weight vector. The weight $w_{ij}$ between input i and neuron j determines how much a specific input affects the output of a neuron. Learning, in terms of a neural network, is the process of finding these weights in such a way that the network produces the desired output [16] for a given input. The output from a network typically consists of either a binary value or a real number that represents a class or a percentage.

2.3 Boltzmann Machines
The Boltzmann Machine is a special kind of neural network that, just like the traditional neural network described in the previous section, consists of primitive computing units [1]. The difference between the previous model and the Boltzmann Machine is that the connections between units are bidirectional and that each unit can only assume exactly one of two states: on or off, represented as 1 and 0. These states are chosen as a function of the states of the connected units, the weights between them, and a small degree of chance. Each unit forms part of an interpretation of the input data, and its on or off state determines whether this interpretation is accepted or rejected [1]. An example of an interpretation of input could be: given the pixels of an image, do we interpret the image as representing a digit?

The different parts of the network, that is, the weights and the on or off states, can be used to describe the current state (usually referred to as the energy) of the network, which is denoted E and computed as:

$$E = -\sum_{i<j} w_{i,j} s_i s_j + \sum_i \theta_i s_i$$

where $w_{i,j}$ is the weight between units i and j, $s_i$ and $s_j$ are 1 or 0 depending on whether unit i or j is on or off, and $\theta_i$ is a threshold for unit i.

This energy is interpreted as how many of the interpretations imposed by the different units are incorrect. Thus, by choosing the weights $w_{i,j}$ that minimize E, we can make correct interpretations of an input.

2.4 Restricted Boltzmann Machines
The Restricted Boltzmann Machine has exactly the same properties as the generalized Boltzmann Machine [11] but is restricted to using only two layers: one visible and one hidden layer. In the first step, the visible layer consists of the actual input vector of binary values. This input is used to update the units in the hidden

layer. In the next step, the hidden layer becomes the input to the visible layer, which is in turn updated in the same way as the hidden layer. This process is repeated back and forth until the network is trained. Once trained, the hidden units can be used as the input data vector to another Restricted Boltzmann Machine. By stacking Restricted Boltzmann Machines like this, it is possible to achieve a very efficient network that works very well in practice [11].

2.5 Watson Beat
Watson Beat [5] is a branch of the IBM Watson [14] project carried out by IBM Research. It is a machine learning algorithm that, when trained on music samples, can produce unique music compositions in response to a suggested emotion. Under the hood, Watson Beat consists of a stack of Restricted Boltzmann Machines that reads and outputs MIDI files [13].

3 METHOD
This study utilizes a method that is established within creativity research [3, 10] and briefly explained in section 1.1. The method is built upon the insight that a piece of work is “creative to the extent that appropriate observers independently agree it is creative”. Using this statement as a basis for the design, Amabile [3] describes a study that uses multiple experts who assess the creativity of the piece of work. As expert judges, a group of studio artists with at least 5 years of experience was used. This study aims to closely mimic the study by Amabile [3]. The judges used in this study were currently employed as music teachers and should therefore be considered domain experts within music and the creation of music.

3.1 Participants
For this study, 51 domain experts were used. As previously mentioned, the experts were all, or had previously been, employed as music teachers of some form. A majority of the participants (71.7%) had been at their current position for at least 10 years. When asked about total experience working with music, 58.5% stated that they had more than 40 years of experience regarding music. All of the 51 participants had at least 10 years of experience working with music in some way.

3.2 Implementation
In order to increase the number of participants in the study and to be able to perform quantitative analysis, the study was performed using an online questionnaire. The questionnaire contained four different pieces of instrumental music, each 58 seconds in length. Two of the samples were created by humans and two were created by Watson Beat. The subjects were asked to rate the creativity of each of the four clips individually on a scale of 1 to 6 (1 = not creative at all, 6 = a very high level of creativity). An even-numbered scale was intentionally chosen in order to prevent neutral answers [7]. In addition to grading the pieces of music, the subjects were also given the ability to explain their grading in free-text format. To prevent bias in the judgments, no background on the creation of the samples was provided. Statistical analysis of the gathered data was performed using MATLAB. The entire study was carried out during spring 2018.

4 RESULTS
The participants were asked to grade each sample individually on a six-point scale. In this results section, the samples are referred to as ML1 and ML2 for the two samples that were generated by Watson Beat, and H1 and H2 for the two samples that were composed by humans.

4.1 Distribution
The distribution of the creativity ratings has been plotted as a box plot (Figure 2). The x-axis of the box plot represents the four different samples, labeled as described in the previous section. The y-axis represents the scores given to the samples. The box represents the interval in which the central 50% of the answers lie.

The plots look very similar for the different samples except for sample H1, which differs on a few points. First, the spread of H1 is much smaller than for the other samples: the grading ranges between 2 and 5, with a single outlier grading it a 6. Furthermore, the 50% box only ranges between 3 and 4. ML1, ML2 and H2 all have gradings ranging from 1 to 6, while the 50% box of H2 differs from those of ML1 and ML2: H2 ranges between 2.25 and 4, and the two ML samples range between 2 and 4.

The median of H1 is 4, compared to the other samples, which have their median at 3. It is worth noting that the half numbers on the y-axis were not possible choices in the questionnaire. They are there only for visualization purposes.

[Figure 2: Distribution of creativity ratings among the different samples. Box plot titled "Distribution of answers"; x-axis: Sample (ML1, H1, ML2, H2); y-axis: Grading (1 to 6).]

4.2 Hypothesis testing
To determine whether there is a significant difference between the ratings of the different samples, the samples were compared in MATLAB using a paired t-test. This test is used to determine whether there exists a significant difference between two, relatively small, samples. Table 1 displays the result of the significance tests. The number is the resulting p value when performing t-test between the samples

annotated by each corresponding label. For this experiment, the significance level was chosen as α = 0.05, corresponding to a 95% confidence level. This means that if the t-test results in a value of p < α, the null hypothesis (the samples are equal) is rejected and we state that the results are different.

Table 1: Significance levels (p values) between samples using t-test.

      ML1      ML2
H1    0.0005   0.0685
H2    0.0773   0.5969

On the basis of these p values, there is a significant difference between the means of samples ML1 and H1, while for the comparisons of the other samples there is no significant difference. For completeness' sake, it should be noted that the confidence interval for the comparison between the two samples ML1 and H1 was [-0.9362, -0.2795]. Since the interval contains only negative values, the second sample was rated significantly higher than the first one. In this case the second sample was H1.

5 DISCUSSION
Before digging into the interpretation and implications of these results, we will spend some time discussing the methodology and its effect on the result. Due to the relatively large number of participants, it is reasonable to believe that the statistical analysis provides an accurate description of reality.

One factor that may have influenced the result is our choice of an even-numbered scale. As already mentioned, an even-numbered scale prevents the user from providing neutral answers. The benefit of this is, of course, that we avoid the case where a majority of the users do not want to express any opinion. Due to the nature of this study, it is likely that the subjects would have been tempted to provide more neutral answers in order not to upset an assumed composer of the sample. As the subjects were not told that some pieces were created using machine learning algorithms, this could have shown a noticeable effect. Our choice to leave out this information is motivated by the fact that the subjects' preconceptions about the creative abilities of machine learning algorithms could otherwise have influenced the result.

The generalizability of this study is somewhat limited due to the number of available samples. It is not possible to make a reliable statement about whether all machine learning algorithms are more, less, or equally creative as humans. The only thing that can be stated is that the specific algorithm used can produce musical pieces that, to human experts, show the same level of creativity as pieces created by creative humans.

5.1 Conclusions and future research
As discussed previously, it is difficult to draw any generalized conclusion from these results. However, it is possible to say that one sample in our experiment was rated at the same level of creativity as its human counterpart. Even so, this finding is quite remarkable, as the only reasonable interpretation of this result is that machine learning has the ability to produce creative pieces of work. Remember that the definition of a creative agent proposed in section 1.1 of this paper was that an agent is considered creative if it has the ability to produce creative pieces of work. When using this definition, we arrive at the conclusion that machine learning algorithms can be creative.

The extent of this creativity is, however, something that has to be mapped out in future research. It is possible that this study found the only piece of music generated by machine learning algorithms that has the property of being creative. It should also be investigated which machine learning algorithms perform better on creative tasks than others and which creative tasks are well suited for machine learning algorithms. This latter part provides a good starting point for interdisciplinary work between the machine learning community and the psychology community.

This is a fast-moving area, and there is no doubt that machine learning algorithms will be able to be even more “creative” in the future, making it increasingly hard to distinguish between human-made and machine-made works. During the last couple of years, Generative Adversarial Networks (GANs) [9] have become an important part of creativity within machine learning. The goal of GANs is to “give machines something akin to an imagination” [8]. Results from, e.g., image creation have been impressive in terms of creative effort by machines. This kind of image generation has raised concerns since it can be used for sinister purposes, e.g., hoaxes and frauds using so-called deepfakes [17].

REFERENCES
[1] David H. Ackley, Geoffrey E. Hinton, and Terrence J. Sejnowski. 1985. A Learning Algorithm for Boltzmann Machines. Cognitive Science 9, 1 (1985), 147–169.
[2] Ethem Alpaydin. 2014. Introduction to Machine Learning, Third Edition. The MIT Press, Cambridge, MA.
[3] Teresa M. Amabile. 1982. Social Psychology of Creativity: A Consensual Assessment Technique. Journal of Personality and Social Psychology 43, 5 (1982), 997–1013.
[4] Margaret A. Boden. 2015. Artificial Creativity. MIT Technology Review 118, 6 (2015), 12–13.
[5] Camille Charluet. 2017. IBM’s Watson Beat: who owns music made by a machine? Retrieved August 14, 2018 from https://thenextweb.com/apps/2017/10/20/ibms-watson-beat-who-owns-music-made-by-machine
[6] Francisco Câmara Pereira. 2014. Creativity and Artificial Intelligence: A Conceptual Blending Approach. Mouton de Gruyter, Berlin, Germany.
[7] Ron Garland. 1991. The Mid-Point on a Rating Scale: Is it Desirable? Marketing Bulletin 2 (1991), 66–70.
[8] Martin Giles. 2018. The GANfather: The man who’s given machines the gift of imagination. MIT Technology Review (21 Feb. 2018). Retrieved November 28, 2020 from https://www.technologyreview.com/2018/02/21/145289/the-ganfather-the-man-whos-given-machines-the-gift-of-imagination/
[9] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014. Generative Adversarial Networks. In Proceedings of the International Conference on Neural Information Processing Systems (NIPS 2014). 2672–2680.
[10] Beth A. Hennessey and Teresa M. Amabile. 2010. Creativity. Annual Review of Psychology 61 (2010), 569–598.
[11] Geoffrey E. Hinton. 2007. Boltzmann Machines. Scholarpedia 2, 5 (2007), 1668.
[12] Warren S. McCulloch and Walter Pitts. 1943. A Logical Calculus of the Ideas Immanent in Nervous Activity. The Bulletin of Mathematical Biophysics 5, 4 (1943), 115–133.
[13] IBM Research. 2015. The Music of Machines. Retrieved August 14, 2018 from https://researcher.watson.ibm.com/researcher/view_group.php?id=6376
[14] IBM Research. n.d. IBM Watson. Retrieved August 18, 2018 from https://www.ibm.com/watson/
[15] Mark A. Runco and Robert S. Albert. 2010. Creativity research: A historical view. In The Cambridge Handbook of Creativity, James C. Kaufman and Robert J. Sternberg (Eds.). Cambridge University Press, Cambridge, UK, 3–19.
[16] Jürgen Schmidhuber. 2015. Deep learning in neural networks: An overview. Neural Networks 61 (2015), 85–117.
[17] Oscar Schwartz. 2018. You thought fake news was bad? Deep fakes are where truth goes to die. The Guardian (12 Nov. 2018). Retrieved December 2, 2020 from https://www.theguardian.com/technology/2018/nov/12/deep-fakes-fake-news-truth
