
Representing Musical Knowledge in a Jazz Improvisation System

Damon Horowitz
MIT Media Lab
20 Ames St.
Cambridge, MA 02139 U.S.A.
damon@media.mit.edu

Abstract
A computer system for analyzing and generating improvisational jazz solos is currently under development. I maintain that an enumeration and quantification of musical common sense concepts is necessary for the construction of musically intelligent systems. My design of an architecture to organize and manipulate musical knowledge is influenced by the agent-based approach of [Minsky 86] and the spreading activation networks of [Maes 89]. A model of improvisation based upon this approach is described, focusing on the development of a knowledge representation and control structure for this domain.

1 Introduction
A Generative Theory of Tonal Music [Lerdahl and Jackendoff 83] discusses the intuitions of an experienced listener in a musical genre. These musical intuitions are described largely by preference rules with which structures are extracted from a musical surface. This work is significant in part because it attempts to detail aspects of musical comprehension which are recognized as matters of common sense, but which had not previously been explicitly enumerated.

Unfortunately, the motivation for GTTM is largely neglected by computer music systems for performance and composition. Some systems employ a subset of the GTTM rules for surface analysis, but the more fundamental impetus behind that work, the need to make explicit the common intuitions of humans experienced in a genre, has been ignored. I suggest that this is the underlying cause of the limitations of current systems.

I share Rowe’s frustration with previous systems’ “algorithmic inability to locate salient structural chunks or describe their function” [Rowe 93]. Generative computer music systems are frequently noted for several categories of shortcomings in their behavior: their inability to perceive what is salient and to respect this in production; their inflexibility in dealing with novel input; and their lack of sophistication in producing increasingly complex output. Essentially, these systems are not equipped with human-level musical intelligence. Interpreting instances of music as artifacts of cognitive processes, my approach incorporates ideas from traditional and behavior-based artificial intelligence to equip the computer program with explicit knowledge about musical structures by developing mechanisms which parse, organize, manipulate, and produce these structures.

In order for music programs which emulate human musical abilities to behave in an increasingly intelligent fashion, they must share humans’ common sense understanding of music. The long-term goal is for the system’s knowledge representation to correlate as closely as possible with our ideas about music, insofar as we can formulate descriptions of our understanding of music. Concepts such as “phrase” and “variation” must be represented in terms of the interconnected mechanisms that give rise to their production and recognition. These mechanisms must demonstrate the properties that musical perception and cognition research have revealed, while also suggesting methods by which new musical materials may be created. I consider a representation of musical common sense concepts, combined with the mechanisms that manipulate them, to be the basis of musical intelligence.

As an example of this approach, a model for analyzing and generating jazz improvisations is presented.¹ Section 2 provides background relevant to this work. Section 3 describes my theory of how improvisational behavior can be produced. The development of this model has two main components. First, the model provides an example of the types of information about music that I suggest should be explicitly represented in an intelligent system. Second, the model describes an architectural framework, inspired largely by the work in artificial intelligence of Minsky and Maes, within which this information can be organized and used in the service of intelligent behavior.

¹ The question of whether or not a system is musically intelligent in the sense that humans are is best pursued in a familiar genre, where its output can be judged and compared to human output; all too frequently, the opposite approach is taken, and the curious behavior of a system which is called intelligent is cloaked behind the mystique of an unfamiliar new genre.

2 Background
The two major disciplines from which this work inherits are the study of music and the study of intelligent processes. The fields of music perception and music cognition seek to identify and experimentally verify properties of human musical processing. Particularly useful here are the results found regarding mental concepts of tonality, rhythm, and
melody [Bharucha 94; Desain 92; Dowling 94; Krumhansl 90]. These results suggest models for representations that reflect musical features with proven psychological salience. Within music theory, grammatical parallels between music and language have been explored in [Lerdahl and Jackendoff 83], which proposes cognitively motivated and linguistically conventional methods for extracting structural information from a musical surface; several of these segmentation and grouping rules are directly implemented in this project. [Lidov 75; Narmour 90] provide other contemporary concepts in general music theory which have influenced the high-level ideas for this project and are partially reflected in the implementation. Less general works, specific to the jazz genre, are used for their enumeration of syntactic conventions which a system must explicitly understand and respect [Amadie 90; Mehegan 59], as well as for qualitative descriptions of jazz improvisations [Schuler 68].

Paradigms from traditional AI which are considered here are scripts (for solo-structuring behavior), plans (modeling pitch trajectories over chord changes), and frames (for organizing K-lines) [Schank 77; Agre 88; Minsky 86]. The behavior-based approach’s use of spreading activation networks for action selection provides robustness and flexibility unavailable from more traditional rule-based approaches [Maes 89, 94], yet can produce some of the same plan-following behavior. However, much of the precision and expressive power of traditional methods is lost with agent-based approaches, and scaling them to larger and more cognitively challenging domains is difficult. Therefore, the present system uses a hybrid of traditional and agent-based methods [Horowitz 94]. This is appropriate because music is a domain which presents a dynamic and unpredictable environment combined with the need for rational, high-level, structured behavior. Finally, the work in artificial intelligence which has the largest influence on this approach is the theory of mind described by Minsky [Minsky 86, 89], which is elaborated upon later in this document.

There have been several previous efforts to describe jazz improvisation in computational terms, as well as several computer jazz programs which attempt to consider some of the theoretical issues posed by the problem. [Johnson-Laird 91; Pressing 85; Sudnow 78] present fundamental ideas about the nature of the cognitive processes producing improvisations, suggesting that much of improvisation results from the triggering of learned behaviors (similar to the learned patterns described in [Pennycook 93], or the PACTS of [Ramalho 94]) and from instantaneous choices regarding whether to alter or continue a triggered path. [Levitt 81] presents an early model for a jazz improvisation system which is more inspired by an intention to model the cognitive and psychological factors of improvisation, and [Widmer 92] stands out for its explicit acknowledgment that musical systems must be knowledge-intensive to function in a sophisticated manner reflecting the music’s context. Agent-based approaches have also been suggested, usually in order to separate processing for different tasks; [Riecken 92; Rowe 93] offer more direct attempts to implement portions of a Society of Mind architecture.

3 The Model
The model presented in this paper focuses on the style of music performed by Louis Armstrong in his famous recordings with the Hot Fives in 1926 (although many of the concepts here generalize to other musical domains). The main task for the program based upon the model is to play improvisations in 12-bar blues and 32-bar song formats which are similar to a target set of actual Armstrong solos from this period.² This task also includes the ability to listen to solos, in order to refer to the musical material of other players when improvising and to exchange improvisatory phrases with another player. With this task in mind, the goal of the model is to describe the musical style functionally, in terms of the processes that create and parse it.

² Trademark aspects of Armstrong’s playing style, such as his innovative use of the trumpet and timbre, are not considered here; I consider only MIDI information such as pitches and rhythms. The inventiveness of his improvised tunes provides sufficient material for study.

The model proposed here provides a knowledge representation and control structure which can perform these analysis and generation tasks. Thus, this model serves as a theory of how improvisational behavior can be produced.

The resulting program will allow users to investigate the workings of the model by adjusting state variables in the system (variables with meaningful semantic labels, intended to indicate the mental state of the computer performer) to produce changes in the output behavior that remain within the style but reflect the change in the program’s disposition (including qualitative concepts such as emotion, as well as more technical concepts such as submission to conventions or use of favorite figures). This is the simplest mode of interaction with the program, and it allows clear evaluation of the theory behind it; however, any program which explicitly represents common sense concepts can use these concepts as hooks for interface elements in a wide variety of applications.

The following sections provide a summary discussion of the model for this program. The implementation of the system is currently underway in C++, using mix-in classes for the shared functionality of different agents. The program is being written on top of a scheduler that I developed using Apple’s MIDI Manager. An initial demonstration version of the system will be completed by summer 1995. A more detailed description of the model will be available at that time as part of the documentation of the working system.

Global Description
The general task of “soloing like Armstrong” has been restricted in order to make an implementation feasible. The program is provided with the structure and main melody of the songs it will play, much as a musician is given a lead sheet, or is expected to know the chord changes of standard songs. The program is not responsible for determining where the beat is (beat-tracking is a difficult problem by itself, addressed elsewhere [Desain and Honing 92]); a daemon runs concurrently with the program and updates the program’s conception of its position in the piece’s form (so
that the program always knows the metric and harmonic context, where the “breaks” are, etc.). The program listens only to the single-line improvisations of the other soloists, and produces single-line improvisations in its turn; both of these lines are represented as MIDI streams, and all operations happen in real time. Finally, the current system has no provision for retaining learned material beyond a single session; it begins each session knowing only those musical lines, figures, and concepts which are part of its database. In principle, the current architecture is capable of supporting an extension which retains new musical and conceptual material over time.

The central portion of the model is the representation of melodic concepts: the types of musical ideas that the program knows about and can therefore create and recognize. This representation is used by the procedures that analyze and generate melodic lines. These procedures make use of the following additional modules in the program (Figure 1): a set of mental state variables, which represents the program’s current set of goals and emotions, as well as the levels of activation of concepts in the concept network; a motor-action module, which takes as input short fragments of textural descriptions of a line from the low-level pitch and rhythm agents and produces the actual output notes; routines for updating the current harmonic and metric context (e.g., spreading activation to indicate an approaching chord change); a database of melodic lines and figures that correspond to the wide set of tunes, sections of improvisations, etc., that commonly occur in the solos of this period (each musician is familiar with a set of phrases, and also has his own “bag of tricks” that are repeatedly used); and the set of melodic lines and figures that have been played during the current song and are therefore active ideas in the environment. These elements constitute the song-environment, or context, that the program knows about when analyzing and generating.

Figure 1: components of the program. [Figure labels: Goals and Emotions; Melodic Concepts; Database of Melodic Lines; Context Lines Active in Song; reductions; contours; structures; groupings; meter; harmony; Low-level Features; pitch; rhythm; Motor-Action Module; Input Solo Line; Output Solo Line.]

Representation of Music
I propose that multiple representations of musical material are necessary for flexible and sophisticated analysis and generation. My perspective in developing mechanisms which can handle the large amount of information required for musical intelligence is largely influenced by the theory of mind described in Society of Mind [Minsky 86]. Minsky suggests that intelligent behavior can arise from the interaction of many smaller, specialized agents; each of our high-level actions and thoughts is described as requiring the activation of many different kinds of representations and processes with localized information. In this scheme, concepts are represented in memory by K-lines, which are recorded state values of different agents. For example, an apple is represented by a set of K-lines, each of which activates agents of color, taste, shape, weight, etc. to be in a particular state.

In my model, musical concepts are represented by specialized agents. Each type of concept has an agent that describes it, listing: its components and their relations and qualities; the preconditions for its occurrence and the postcondition effects of its occurrence; the parameters which determine the range of different instantiations of the concept; and pointers to higher-level agents for which this concept plays a role. By separating concepts into agents, each agent can be an expert in a single specialized and simple task; it need only be aware of the information which relates to its concept, which insulates it from the complexity of dealing with unnecessary information. Each agent is able to detect the occurrence of the concept that it corresponds to through sensors, and also to effect an occurrence of the concept through actuators.

For example, an agent for ornamentation of a pitch lists the different types of ornaments which are possible (described in terms of the parameters that distinguish them), has sensors that parse streams of pitches to locate these ornaments around an important pitch, and can produce rhythm and pitch textures that realize ornaments of a given pitch. Each agent is sensitive to context, in that the behavior of the mechanisms that work with each type of information is influenced by the settings of other agents. In this example, the ornamentation agent is told which pitch to elaborate, and then in turn tells the rhythm and pitch line agents how to create a specific ornament.

The agents are arranged in overlapping hierarchies according to the nature of the concept that they represent. The main axes of these hierarchies correspond to the following features: the effect of the concept (what it conceptually opens or closes, its emotional quality, etc.); and the level of abstraction of the concept (from the musical surface to abstract concepts, and also from small chunks of time to larger forms), e.g., concepts that handle lists of notes, that contain contours of simple features, or that depict abstract relations between chunks of music. Thus, the set of musical concepts that the program knows about is
arranged in a large interconnected network, linked both by their positions in the hierarchies and by their relations to each other as components and as pre- and post-conditions.

Each concept also has an activation level, a number which reflects its presence in the musical material when listening and its influence on the musical material being generated. Activation is spread through the network to guide analysis and generation. When a concept is recognized during the analysis of input, the agent associated with that concept becomes more active, corresponding to the intuitive notion of semantic priming used in cognitive science and linguistics. The more active agents are then the first to be considered when examining new material. When a concept is determined to be appropriate for a generated improvisation, it is also given higher activation; the degree of activation then determines the influence that each agent has over the output. In addition, each of the musical lines in the known database and each of the active musical figures in the song-environment also has a level of activation, corresponding to the degree to which it is recognized in incoming music or the degree to which it is considered in generating music. Activation levels are propagated between the figures and the concepts which constitute them; this completes the intuitive model of specific instances of figures priming the general concepts of the figures, as well as the reverse process in which a general concept primes the specific figures that are instances of it. The following section describes the types of concepts that are thus represented in the system.

Musical Concepts
The musical concepts represented by the system are those which describe a musical phenomenon in terms of its composition from smaller musical structures. While there are a variety of conflicting interpretations of how humans represent music, several types of structures seem to be generally and uncontroversially acknowledged as contributing to our cognition of music. On a basic level, features of notes (their relative emphases, articulations, color [harmonically], register, etc.) must be recognized and recorded as contours of changing parameter values over time [Dowling 94]. These features also imply expected continuations, as suggested by [Narmour 90]. The next level of structure consists of groupings of events (notes and sets of notes) based upon similarity of features. These simple levels of comprehension are necessary for musical tasks, and can be produced by mechanisms which exist for general-purpose human cognition.

Analytic reductions of music are higher-level musical structures; these are interpretations through which some notes can be heard as serving subsidiary roles to others. In particular, some events are said to be elaborations or ornaments of major events, or are said to lead up to or prolong major events, recursively through larger sections of music [Lerdahl and Jackendoff 83]. This is relevant here insofar as it relates to human comprehension and memory of a line of music; instead of simply memorizing a list of timed pitches, a musician can remember the major events and the ways in which they are elaborated. Forming groups and reductions requires perspectives on what in the musical surface is perceived as standing out or being “marked for consciousness”, in terms of the coloration of each event and its perceived accent given the roles that it is serving. For example, we might say that the first note of a phrase is the accessor through which we start replaying the phrase in our heads, but it is really the first foundational tone that we remember as being what the phrase is about in a reduction, while we also focus on the particular elaborating tone that gave the phrase its distinctive quality. Maintaining these different perspectives is useful for both generating and analyzing new material.

These concepts are filters through which we can examine a phrase in order to determine its qualitative effects and its similarity to other phrases from different perspectives. Identifying these levels in the musical surface allows us to have a notion of a phrase that is richly interwoven with our general knowledge about types of phrases. It is through these devices that the concept of one phrase referring to another (either within the solo, or from a separate tune) is realized.

Melodic Line
Different agents capture different features of a melodic line. Actual pieces of a line are stored as a collection of links to the states of agents in the network. In the terminology of SOM, a fragment of music is a set of K-lines which record the state of different agents as they reflect the music, and which can later reactivate the agents to assume the same state. Each agent can be seen as a feature space onto which a section of music is projected, thus focusing upon some subset of its features. The state of the agent when considering this piece of music is the set of parameter values and pointers to other agents that it uses to understand (parse) the music. In sum, a melodic line is represented as the set of perspectives on the line contained by the different agents in the system.

For example, the multiple representations of a simple motive (a list of timed notes) are the set of K-lines reflecting its rhythmic, melodic, and harmonic components (Figure 2). The rhythm agent determines the level of activity (roughly similar to pulse level), the amount and type of syncopation, and the prominence of other cycles (e.g., a crossrhythm or polyrhythm) to find salient rhythmic figures. The melody agent looks at the pitch contour and the abstracted, simplified lines (prolongation and time-span reductions) for key anchor points. The harmony agent determines the hierarchical relations of tension and resolution pitches given the harmonic framework of the song. Each of these agents then has its own representation of possible groupings and points of emphasis. Finally, there is a structural agent, which examines the patterns present in each of the specific agents’ representations (by querying them) and indicates places of repetition or simple variation of structure. One strength of this model is that it preserves each agent’s integrity in managing its own information in its own way; for example, each agent has different criteria for what constitutes a similar repetition or a slight variation.

Figure 2: a melodic line from three perspectives represented in the model. [Panels: reduced melody (F g E D a C); rhythmic activity; repeated structures.]

The key point is that the system has multiple representations of a melodic line; this is appropriate because different functions require different types of information about a line, since different tasks
operate from different perspectives.

The benefit of this type of representation is that a single musical line can be manipulated in different ways depending upon which aspects of it (i.e., which agents’ features) are considered. A simple example of this is the separation of rhythmic and pitch information about a line; a more complex example is the separation of a prolongation reduction perspective from the set of local elaboration figures that are used. Further, since the agents themselves exist in hierarchies of types of concepts, a given line can be seen as a specific instantiation of general, abstract types of concepts. This abstraction allows musical figures to be compared according to their types along abstract feature axes, and provides a rich set of metrics for determining their similarity or relation (information which is necessary for understanding music, and also used in generating variations).

Given these types of representations of the musical line itself, specifically of the structures of phrases, parts of phrases, and groups of phrases related to context over time, several phenomena related to our perception, consideration, and creation of music can also be modeled. The general idea of chaining musical ideas, or of having musical associations or priming of categories, is represented in this model by spreading activation of ideas through the hierarchies of types of structures defined above. This leads directly into the model’s description of generating music, which can be summarized as follows: goals and intentions spread activation to concepts which realize them (downward through a network), while at the same time the currently active figures and structures in the environment spread activation to their related concepts (upward through a network), and the concepts with the highest activation are the ones which are realized in the generated music. The following sections describe the listening and generating functions of the program. These functions are the inspiration for the general representation scheme outlined above, which is designed to accommodate the types of manipulations of information that analyzing and playing require.

Listening
The program’s task when listening is to identify, in an incoming stream of notes, the musical concepts which the system knows about. In other words, the activity of listening consists of building instances of the representations discussed above. This is accomplished through the use of the sensors on each type of representation; the sensors detect a feature in the input (or in the agents conceptually beneath it) that is a component of the concept it represents. In this way, the different agents piggyback on each other’s discoveries of features in the input. For example, the low-level rhythm agent parses the input into notational rhythm values, which are then grouped by a rhythm-grouping agent and examined by a cycle-agent for the presence of crossrhythms.

High-level agents represent musical form, a concept referring to any level of structural repetition. This concept identifies the repetition of similar phrases and the repetition of a figure within a phrase. These agents recognize repeating patterns in the music by looking for patterns in the other agents’ parses of musical lines into features. Rather than examining each agent’s state over time, the form-agents poll the other agents, requesting to be notified when a repetition has occurred. Each individual agent locates simple patterns in its type of feature by itself, similar to the phenomenon described in SOM as time-blinking. The form-agents look for repetition in conventional places (e.g., corresponding to metrical divisions), but can also be activated if an agent detects an unconventional repetition, which in turn will cause the form-agents to suggest to the cycle-agent that a crossrhythm may exist. Again, the abstract hierarchies present in the representation are useful here; patterns which are not literal repetitions in the musical surface but are direct conceptual repetitions can be easily detected with this scheme.

The other major activity of listening is the spreading of activation between concepts. This models the experience of priming and expectation: predicting future material based upon the past. The representation of concepts containing sequential events lends itself to a direct implementation of this idea: when a given agent’s sensors have noticed the presence of a component which begins a sequence in a concept, the agent primes the appropriate feature detectors (either sensors or lower agents) to expect the next components in the sequence. For example, an agent for tonal resolution expects a tonic after detecting a leading tone, and thus the pitch agent is set to look for the appropriate note. This corresponds roughly to the description of language understanding in SOM. The input of a word (here, a musical “word” is an event for an agent at any level) activates a set of frames which are interested in this word, either semantically (e.g., an apple activates ideas about eating) or in terms of conceptual dependencies (e.g., the origin role of a trans-frame). These correspond, respectively, to spreading activation from the parsed concept to more generalized concepts, and to following a sequential
figure which expects later roles to be filled in. A failure of the sensors to find the expected event requires extra processing to recommence the parse of the input. The occurrence of this frustration, at different levels of severity, is a key component of Meyer’s theory of emotion in music [Meyer 56]. The system implements this theory literally by changing the emotional state variables of the program when these frustrations occur. The listener thus indicates both the comprehensibility of the input (in terms of its susceptibility to each agent’s attempts to analyze it) and its qualitative interest (each agent can subjectively label the local phenomena it encounters).

The final result of the listening process is that the input musical line is memorized as part of the active environment during the song. It remains in this buffer, accessible for further listening and playing processes, until the conclusion of the song. The active musical ideas already in the environment are compared against the incoming line, and a match can be used to identify the higher-level structural forms being played. That is, the system can recognize if material (conceptual or literal notes) from its set of ideas is being quoted or referred to, and can then label the current setting appropriately (e.g., as an elaboration of the song’s tune, or as a series of variations on the previous solo’s closing phrase). At no point is it necessary to decide upon a single grouping or reduction interpretation of the input, since all of the active agents’ states are stored as a set of K-lines that represents the figure. However, the relationship between consecutive phrases (or between any phrase and its original referent) indicates which aspects of the representation, and thus which interpretation, are focused upon in a given instance. These sequences of relations (such as the maintenance of a reduced melodic line or of rhythmic figures) are then stored as the high-level structural form describing a script that the input is assumed to have followed. Using the original solo’s relations between its phrases is a technique for creating a similar set of phrases in a new solo, producing the effect of having played the same “sort of thing”, or in the same “style”.

Generation
Generation of an improvisational line is viewed here as an action selection problem. I assume that an improvisation does not follow a simple set of rules, but rather is influenced by a variety of sources concurrently. This type of decision making is well modeled by a spreading activation network
determine what the next motor action will be. In other words, the decision of what to do is made by simply checking which actions look best at the time that an action is needed; the agents with the highest values among the competing agents when polled by the motor routine are the ones which affect the output. This corresponds to the intuitive notion of a player selecting a sequence of small paths that he can realize on his instrument; this approach assumes that all actions are motivated, if by nothing else than by the goal of simply doing something, or by a kinesthetic sense [Sudnow 78]. Note that to be silent is to have the explicit action concept of rest be more active than the others. Other examples of action/concept descriptions which could be actively influencing the output include: launch a new phrase, choose a motive from the melody, vary the previous phrase, repeat a partial structure, hit a high “C”, use filler material for two beats, emphasize a structural element, assert a syncopated rhythmic figure, conclude this solo, etc.

The use of goals here is based upon their standard use in spreading activation networks for action selection [Maes 94]. The goals spread activation to concepts that produce the qualitative label associated with the goal: pitch/time trajectories modeled after [Clynes 78] relate directly to emotional qualities, as do higher-level relations between phrases (e.g., call and response, amplification or elaboration, contrast, etc.). [Riecken 92] describes a system that is similarly interested in the relationship between the effect of musical features (his focus is on pitch intervals) and emotions in determining generated music. In my model, the framework allows any type of concept (e.g., structures, groupings, reductions, roles, colorations) to be labeled with an affective quality, which can vary as a function of context.

Each concept can be seen as a short script for how to perform a certain behavior. The actuators of an agent define the sequence of steps and types of conditions required to realize the concept. This occurs both on a small scale, as in a script for ornamenting a pitch, and on a larger scale, such as a conventional script for developing material over the course of a solo. A script for a solo is represented in terms of form-agent structural relations between the phrases in the solo, with the corresponding emotional effects for each relation; this allows for a description of rising intensity, of trajectories through “moods”, or of an entire dramatic form played over the course of a chorus. For example, a script for a solo could say: begin with a
which responds to both goals and the environment [Maes statement of material active in the environment (perhaps
89]. My model uses a hybrid system in which actions are from the song’s main tune); focus on a perspective of this
chosen by competition through spreading activation material that corresponds to an affective label in keeping
combined with traditional AI structures of rule-based with the program’s mental state; maintain this aspect while
constraints and script-following fragments. The actions in creating a variation of the original statement; evolve from
the network are the hierarchically arranged musical concepts, this newly played material through successive variations
each of which is a type of thing that can happen in music. while rising in intensity (modeled as a degree of
When generating a line, the program’s goals spread exaggeration in the parameters of a perspective) towards a
activation through the network of concepts, as do the active climax; then close the solo, resolving any active musical
ideas in the song environment. As this spreading activation concepts.
is occurring, the motor module launches sequences of a few Spreading of activation happens in several directions.
notes (corresponding to a single learned physical gesture, a In a top-down fashion, goals spread to those mechanisms
motor “riff”) according to the lowest level agents’ that realize them, such as the scripts for how to behave over
descriptions (a pitch and rhythm texture). As each motor the course of a solo and the treatment of conventions. In
routine is concluded, these agents are polled again to addition, the agent for each concept has actuators which
deliver activation to the agents that realize its preconditions and to those that represent its sub-components. For example, the appoggiatura concept activates the rhythm agent and the pitch agent in synchrony to place the expected goal tone on a less stressed pulse, following a coloration tone in the stressed position.

Bottom-up spreading occurs from those figures that are selected to be played to their associated successors. This is a specific case of the general spreading of activation from lines active in the song environment. A figure active in the environment spreads activation to its associated concepts’ actuators at those moments when it can begin (with respect to the position in the solo, the chord changes, the meter, etc.). For example, a likely figure for a turnaround and the concept of starting a solo with two similar 4-bar phrases are each activated at particular moments. If a figure has been started (that is, if its initial concepts correspond to those chosen to be played), it remains active and activates the rest of its sequence of concepts until they are no longer chosen by the system. Thus, there is spreading activation from the context of the song (serving as the environment): it both suggests figures and concepts that can occur at a particular location, drawn from the set of figures active in the song environment, and works on a low level by urging rhythmic actions (e.g., spreading activation to the agent that seeks to play rhythms in phase with the meter) and pitch choices corresponding to the metrical and harmonic context.

This use of spreading activation is essentially a way to combine the interests of the different musical concepts that the system wants to realize at a given point. It also allows for simple emergent planning behavior [Maes 89]; for example, the concept of doing a leap requires that the leap start from a chord tone, and thus spreads activation to the pitch agent to effect this. Most important, however, is the capacity of the spreading activation network to model the phenomenon of chaining musical associations and ideas in the mind of a musician. This allows exploration of high-level ideas about how a solo evolves over time, maintains a sense of direction, or follows a dramatic curve of intensity. By continually referring to what it has done, a system can be opportunistic in its composition, cascading local processes to create larger forms.

4 Conclusion

This paper proposes an architecture for representing common sense concepts about musical improvisation: the structures we notice when listening and intend to create when playing. The representation models how these concepts may be recognized and produced. Each concept is embedded in a network of other musical concepts, so that the meaning and consequences of a concept are defined by its effect upon other agents, the mental state, etc. I suggest that it is through such explicit enumeration and quantification of musical concepts, which can represent different and conflicting perspectives, that progress can be made towards understanding musical intelligence.

I share Widmer’s acknowledgment of the knowledge-intensive nature of questions about musical understanding concerning performance expression, and believe this also applies to beat-tracking, the nature of swing or of a “groove”, the type of interaction between two soloists, etc. These questions can be better pursued on top of a framework of common sense knowledge about music. Programming this knowledge is a difficult task. Towards this end, I have created an architecture based on ideas from Minsky and Maes, and a specific model of a task in a restricted genre.

The proposed computational model of improvisation reflects my approach to building intelligent music systems. The use of multiple representations, consisting of K-lines activating agents, combined with a spreading activation network, allows for an interesting model of the chaining of musical associations in the mind of a performer. As different aspects of what is being played (the focus of different agents) are given more attention, the active agents can emphasize and expand this type of information in a melodic line. The spreading activation network can model both a temporary obsession with a particular melodic idea and its eventual fatigue. The desired result is that an improvisation can develop a sense of direction over time, based upon the stringing together of fragments of the representations of active musical materials. The limitations of the current model include its inability to learn and its restriction to a specific genre. Further evaluation of the model and the general approach awaits completion of the implementation.

References

Agre, P. and Chapman, D. (1988). “What are Plans for?” A.I. Memo #1050. MIT.
Amadie, J. (1990). Jazz Improv. Thornton Publications, Pennsylvania.
Bharucha, J. (1994). “Tonality and Expectation”, in Musical Perceptions. Oxford University Press, New York.
Clynes, M. (1978). Sentics: The Touch of Emotions. Anchor Press, Garden City.
Desain, P. and Honing, H. (1992). Music, Mind, and Machine: Studies in Computer Music, Music Cognition, and Artificial Intelligence. Thesis Publishers, Amsterdam.
Dowling, W. (1994). “Melodic Contour in Hearing and Remembering Melodies”, in Musical Perceptions. Oxford University Press, New York.
Horowitz, D. (1994). “A Hybrid Approach to Scaling Action Selection”. Unpublished paper, MIT Media Lab.
Johnson-Laird, P. (1991). “Jazz Improvisation: A Theory at the Computational Level”, in Representing Musical Structure. Academic Press, London.
Krumhansl, C. (1990). Cognitive Foundations of Musical Pitch. Oxford University Press, New York.
Lerdahl, F. and Jackendoff, R. (1983). A Generative Theory of Tonal Music. MIT Press, Cambridge.
Levitt, D. (1981). A Melody Description System for Jazz Improvisation. Master's Thesis, MIT, Cambridge.
Lidov, D. (1975). On Musical Phrase. Groupe de Recherches en Sémiologie Musicale, Université de Montréal.
Maes, P. (1989). “How to do the Right Thing”, in Connection Science Journal Vol. 1, No. 3.
Maes, P. (1994). “Modeling Adaptive Autonomous Agents”, in Artificial Life Journal Vol. 1, No. 1 and 2. MIT Press, Cambridge.
Mehegan, J. (1959-1965). Jazz Improvisation, in four volumes. Watson-Guptill Publications, New York.
Meyer, L. (1956). Emotion and Meaning in Music. University of Chicago Press, Chicago.
Minsky, M. (1986). The Society of Mind. Simon and Schuster, New York.
Minsky, M. (1989). “Music, Mind, and Meaning”, in The Music Machine. MIT Press, Cambridge.
Narmour, E. (1990). The Analysis and Cognition of Basic Melodic Structures: The Implication-Realization Model. University of Chicago Press, Chicago.
Pennycook et al. (1993). “Toward a Computer Model of a Jazz Improvisor”, in International Computer Music Conference 1993 Proceedings, ICMA, California.
Pressing, J. (1985). “Experimental Research into Musical Generative Ability”, in Generative Processes in Music. Oxford University Press, New York.
Ramalho, G. and Ganascia, J. (1994). “Simulating Creativity in Jazz Performance”, in Proceedings of the Twelfth National Conference on Artificial Intelligence, Seattle, WA.
Riecken, R. (1992). “Wolfgang: A System Using Emotion Potentials to Manage Musical Design”, in Understanding Music with AI. MIT Press, Cambridge.
Rowe, R. (1993). Interactive Music Systems. MIT Press, Cambridge.
Schank, R. and Abelson, R. (1977). Scripts, Plans, Goals, and Understanding. Erlbaum, Hillsdale, New Jersey.
Schuller, G. (1968). Early Jazz: Its Roots and Musical Development. Oxford University Press, New York.
Sudnow, D. (1978). Ways of the Hand: The Organization of Improvised Conduct. Harper and Row, New York.
Widmer, G. (1992). “A Knowledge Intensive Approach to Machine Learning in Tonal Music”, in Understanding Music with AI. MIT Press, Cambridge.
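As a rough illustration of the action-selection scheme described in the Generation section, the following Python sketch implements a heavily simplified spreading activation loop, loosely in the spirit of [Maes 89]. It is not the model's actual implementation, which the paper describes as still in progress. The `Concept` fields, the gains, the decay, and the fatigue rule are all hypothetical choices made for this example, and Maes's full algorithm is reduced to three sources of activation: goals (top-down), the song environment, and successor links from the chosen concept (bottom-up). A fatigue penalty is added so that the loop can show the temporary obsession with an idea, and its eventual fatigue, mentioned in the Conclusion.

```python
from dataclasses import dataclass


@dataclass
class Concept:
    """A musical concept treated as one competing action (hypothetical structure)."""
    name: str
    goals: frozenset = frozenset()      # goals this concept helps realize
    triggers: frozenset = frozenset()   # song-context features that favor it
    requires: frozenset = frozenset()   # preconditions that must hold to play it
    successors: tuple = ()              # concepts it primes once it is chosen
    activation: float = 0.0
    fatigue: float = 0.0


def select_action(concepts, goals, environment, goal_gain=1.0, env_gain=0.5,
                  successor_gain=0.8, decay=0.5, fatigue_rate=0.6, recovery=0.5):
    """Run one round of simplified spreading-activation action selection.

    Returns the name of the chosen concept, or None if no concept is executable.
    """
    by_name = {c.name: c for c in concepts}
    for c in concepts:
        c.activation = (decay * c.activation                         # old activation fades
                        + goal_gain * len(c.goals & goals)           # top-down from goals
                        + env_gain * len(c.triggers & environment))  # from the song context
    executable = [c for c in concepts if c.requires <= environment]
    if not executable:
        return None
    # Fatigue is subtracted only at selection time, so an idea that keeps
    # winning eventually loses to a rival (obsession followed by fatigue).
    chosen = max(executable, key=lambda c: c.activation - c.fatigue)
    for name in chosen.successors:                                   # bottom-up to successors
        by_name[name].activation += successor_gain
    for c in concepts:
        c.fatigue *= recovery                                        # fatigue wears off...
    chosen.fatigue += fatigue_rate                                   # ...but builds with use
    return chosen.name


# Hypothetical network and schedule of (goals, environment) pairs.
network = [
    Concept("rest"),
    Concept("state_motive", goals=frozenset({"statement"}),
            triggers=frozenset({"solo_start"}), successors=("vary",)),
    Concept("vary", goals=frozenset({"development"}),
            requires=frozenset({"previous_phrase"}), successors=("vary",)),
    Concept("climax", goals=frozenset({"intensity"}),
            triggers=frozenset({"last_chorus"}),
            requires=frozenset({"previous_phrase"})),
]
schedule = [
    ({"statement", "development"}, {"solo_start"}),
    ({"development"}, {"previous_phrase"}),
    ({"development"}, {"previous_phrase"}),
    ({"development", "intensity"}, {"previous_phrase", "last_chorus"}),
    ({"development", "intensity"}, {"previous_phrase", "last_chorus"}),
]
played = [select_action(network, g, e) for g, e in schedule]
print(played)  # ['state_motive', 'vary', 'vary', 'vary', 'climax']
```

With these made-up inputs the network states a motive, fixates on varying it for three rounds (the `vary` concept primes itself as its own successor), and then switches to the climax concept once the intensity goal and the last-chorus context support it. With `fatigue_rate=0` the variation concept would also win the fifth round, so in this toy run it is the fatigue term that ends the obsession.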
