

Jazz Improvisation:

A Theory at the Computational Level

P. N. JOHNSON-LAIRD, Department of Psychology, Princeton University, Princeton, NJ 08544, USA


There are two principal reasons for cognitive psychologists to study the improvisations of jazz musicians. First, the process of improvisation is an unusual form of expertise, and all forms of expertise are a proper concern for anyone interested in how the mind works. A better understanding of the skill might even be pedagogically useful one day. Second, jazz improvisation depends on imagination and its study may help psychologists to understand the nature of creative mental processes.

My aim in this chapter is to outline a psychological theory of improvisation in modern jazz - the idiom that was developed in the 1940s by Charlie Parker, Dizzy Gillespie and their colleagues, and that has continued to be the dominant style to this day. The theory concerns what the mind has to compute in order to produce an acceptable improvisation. A theory of what is computed is not, of course, a theory of how the computation is carried out. Indeed, the distinction between the two, which was emphasized


Copyright © 1991 Academic Press Limited All rights of reproduction in any form reserved.




by the late David Marr (1982) in his studies of vision, is crucial to advancing knowledge - to understand how the mind functions, we need first a good account of what it is doing. However, I will also discuss two general approaches to how the mind may generate improvisations: one is based on the manipulation of explicitly structured symbols, which is the traditional method of modelling mental processes in computer programs; and the other is based on the manipulation of distributed representations in which explicit structure plays no part in processing, and which forms the basis of 'connectionist' theories (Rumelhart and McClelland, 1986; and for an introduction, Ch. 10 of Johnson-Laird, 1988a). Finally, I will spell out how jazz improvisation relates to the general nature of creative mental processes.

The essential psychological feature of musical improvisation, whether it be modern jazz, classical music, Indian, African, or music of any other sort, is that the musicians themselves do not have conscious access to the processes underlying their production of music. A lay person may find this claim surprising, perhaps even incredible; a cognitive psychologist will find it prosaic. The fact is that human beings have conscious access to only a small part of the contents of their minds, and hardly any access whatsoever to mental processes. This point was known to Helmholtz (1897) and it is corroborated by the very existence of cognitive psychology. A direct way to convince those who may doubt it is to ask them to devise a computer program that produces a musical improvisation, or, for those who are musically naive, to devise a computer program that tells stories. If one had conscious access to the complete processes underlying such skills, the demand would be trivial. Existing programs for improvisation and telling stories, however, have only the most rudimentary abilities because programmers, even if they are competent musicians or authors, cannot discern the basis of their abilities merely by introspection. The ethnomethodologist, David Sudnow (1978), has written an engaging account of the phenomenology of learning to play jazz piano. The very title of his book, Ways of the Hand, indicates that as one develops the ability to improvise the skill seems to come out of the end of one's fingertips. Its main components are profoundly unconscious.

A common misconception about improvisation is that it depends on acquiring a repertoire of motifs - 'licks' as they used to be called by musicians - which are then strung together one after the other to form an improvisation, suitably modified to meet the exigencies of the harmonic sequence. There are even books containing sets of 'licks' to be committed to memory to aid the process. Surprisingly, the error has also been perpetrated by theorists. In characterizing jazz improvisations, Ulrich (1977) writes: 'Sequences of motifs are woven together to form a melody. Rather than constantly inventing new motifs, the musician modifies old ones to

9. Jazz Improvisation: A Theory at the Computational Level


fit new harmonic situations.' A similar idea has been implemented in a program devised by Levitt (1981) for improvising jazz melodies. The program takes as input a chord sequence and an existing melody. It divides the melody up into units of two bars, and then re-uses these elements in a different order and in variant forms. The program is entirely deterministic, i.e. given the same input melody and chord sequence, it produces the same improvisation. This characteristic, as I shall argue, is also inappropriate for human improvisations.

Why can one be confident that the 'motif' theory is wrong? There are three reasons. First, someone has to invent the motifs. If a musician is the first to play a particular motif, then he or she cannot merely be regurgitating it from memory. Second, although most musicians have certain phrases - often a rhythmic pattern rather than a melodic motif - to which they are addicted, an analysis of corpora of the musician's improvisations yields many phrases that occur only once. A sceptic might say that an analysis of every single improvisation made by a musician would falsify this claim. Yet, there would still be many possible phrases characteristic of the musician's style even if they never occurred in the corpus. Third, the labour of committing to memory a sufficient number of motifs to guarantee the improvisation of complete solos is altogether too large to be practicable. Of course, there are jazz musicians who are not good at improvisation, and there were some in traditional jazz who never improvised a solo or else merely replayed one that they had worked out in the past. But, any competent jazz musician will tell you that it is far easier to make up new phrases than to try to learn a vast repertoire of them for use in solos. An apt analogy is speech: discourse would be intolerably difficult if it consisted solely in stringing together remarks that one had committed to memory. It is this sort of stilted jumble of phrases that one is forced to produce in a foreign language where one's only guide is indeed a book of 'licks', i.e. a phrase book.


In modern jazz, an improvisation consists of an extemporized melody that fits a tonal chord sequence of, say, 32 bars in length, which is repeated as many times as necessary. The rhythm section, which typically consists of piano, double bass and drums, provides the accompaniment. The drums state the basic metrical pulse, usually four beats to the bar, emphasizing the weak second and fourth beats, and playing rhythmic figures to accompany and to stimulate the improvising soloist. The bass player improvises a bass line to the chord sequence, and also helps to maintain the metrical pulse at a fixed tempo. The pianist improvises a statement of the chord sequence,



varying the choice of chords and their voicings, and again providing rhythmic figures to help to create a feeling of relaxed 'swing'. Although there are big bands that play modern jazz, the genre was originally created by small groups containing two or three brass instruments - trumpet and saxophone - and the rhythm section. All the musicians in the group may take turns to improvise solos of several choruses, i.e. to the 32-bar chord sequence. A performance of a particular piece usually begins and ends with an ensemble statement of the melodic theme, which the front line instruments may play in unison or else in a simple arrangement. The improvised choruses occur between these two statements. The chord sequences favoured by modern jazz musicians derive from popular songs, typically by such composers as George Gershwin and Cole Porter, from compositions of their own, or from the ubiquitous 'twelve-bar blues'.

A fragment from a typical improvisation is shown in Figure 1. It is a transcription that I made of a melody improvised by the late Bill Evans, an outstanding modern jazz pianist, on his record, Explorations (Riverside RLP 351). He was improvising to a chord sequence based on the harmonies of the popular song, 'How Deep is the Ocean', and the figure displays these chords in a conventional notation, which I will explain below. What the transcription fails to make explicit is the particular rhythmical quality of modern jazz. This failure is, in part, because there is no precise account of this style: musicians acquire it by listening to virtuosos and seeking to emulate them, and, though they develop a discriminating ear for what 'swings' and what does not, they are unable to explain the underlying rhythmic principles. Even if such an account existed, it is an open question

[Figure 1: musical notation with chord symbols, not reproduced.]

Figure 1. An improvised melody by the late Bill Evans on a chord sequence based on 'How Deep is the Ocean'.



[Figure 2: the phrase as written in conventional notation, and as played with note timings in centiseconds; notation and values not reproduced.]

Figure 2. The rhythm of the repeated phrase in Charlie Parker's theme, 'Now's the Time'.

whether its description in conventional European musical notation would be particularly informative.

The problem of characterizing the rhythmic style of modern jazz can be illustrated by some empirical observations that I have made concerning the performance of the rhythm shown in Figure 2. Jazz enthusiasts will recognize it as that of Charlie Parker's twelve-bar blues theme, 'Now's the Time', which repeats the phrase shown in the figure several times. If a classically trained musician plays the phrase as it is notated, it will lack its essential jazz flavour. One approach to pinning down the nature of the rhythmic component is to measure the actual onsets and offsets of the notes from a recording of them by a jazz musician. Here, however, one runs into the problem of determining the precise points to measure. The start of a musical note turns out to be a somewhat indeterminate notion: should one measure from the point on the oscillograph at which the first vibration occurs? Presumably not, because this point will certainly not correspond to the perceived onset of the note. Rather than struggle with this problem, I used two complementary procedures. With a simple computer system for generating music, I manipulated the durations of notes until the output began to resemble the sound of an authentic performance. Figure 2 displays one set of durations that produced a satisfactory performance.

Another set of observations was collected through the good offices of Carol Krumhansl at Cornell University: I played a simple blues theme by Milt Jackson, 'Bag's Groove', and her computational set-up recorded the onsets and offsets of notes from the keyboard. An example of the resulting durations is shown along with the conventional notation of the theme in Figure 3. What is striking in both of these cases is that there appear to be no simple durational principles that are responsible for producing the timings of jazz performance. Readers who have never consciously listened to modern jazz are advised to listen to, say, the original Bill Evans recording, if they wish to get a feel for these tacit conventions of modern jazz. Although there is no explicit account of them, they undoubtedly exist, and they take some years of assiduous practice to acquire.



[Figure 3: conventional notation of the phrase with the recorded note durations; notation and values not reproduced.]

Figure 3. The timing of a rendition of the opening phrase of 'Bag's Groove'.

Lacking more comprehensive information, I will turn from the temporal conventions governing performance to the conception of the rhythms used in improvising melodic phrases, and I will assume that if and when the temporal conventions are satisfactorily described, they can be treated as a kind of 'filter' into which are fed rhythmic patterns, as conceived by the musician, to emerge with actual durations specified in real time. A similar conception of performance can be adopted for classical music: the score captures the conception of the rhythmic structure of the piece. Its realization in an actual performance depends on further tacit interpretative conventions that have been acquired by performers (Longuet-Higgins and Lee, 1984).

A melodic jazz improvisation is made up from phrases, which can vary in length from half-bar interpolations (see the third phrase in Figure 1) to lengthy phrases that spread over several bars. A phrase resembles the utterance of a sentence in a natural language except that, in music, a phrase does not refer to a state of affairs. It has no meaning other than its intrinsic musical meaning. At its highest level of organization, an improvised solo normally consists of a sequence of phrases. The sequence may itself have an intrinsic organization. Certainly, musicians aim for a variety of phrases, but what probably holds a lengthy improvisation together is not a precisely articulated musical structure, such as sonata form in classical music, but the repeating harmonic sequence on which the improvisation is based. No sophisticated musical plan appears to govern the structure of an improvisation above the level of individual phrases.

In order to devise a computer program that improvises musical phrases, it is sensible to divide the task into several relatively independent modules corresponding to different elements of performance. A note in a musical phrase has five main components:

1. a pitch, which in jazz may be bent, i.e. changed slightly during its performance;

2. an onset time with respect to the metrical structure of the bar;

3. a duration;

4. an intensity, i.e. a volume, which again changes during its performance;

5. a manner of articulation: it may be staccato, legato, slurred, ghosted, and so on, depending on the particular musical instrument.



A phrase may also contain rests, i.e. silences that play a particular role in the musical shape of the phrase (see the second phrase in Figure 1). Phrases themselves are generally separated by rests, which in jazz are typically longer than the mean duration of the notes in the phrase. A rest has two components:

1. an onset time with respect to the metrical structure of the bar;

2. a duration defined by the onset of the next note in the phrase.

The specification of a phrase is complete when every note and rest in the phrase has been defined for all of these components. Undoubtedly, however, most of the work has been done when for each element in the phrase one has specified its onset time and, if it is a note, its pitch. A computer program, like a musician, is therefore principally concerned with two tasks: the generation of a rhythmic pattern for the phrase, i.e. a sequence of onsets of notes and rests, and the generation of a correlated sequence of pitches for the notes in the phrase. These two tasks are not completely independent of one another, but they can be separately analysed to some extent. A reasonable strategy for a computer program - and one that I have adopted in the programs to be presently described - is to generate the next onset in a phrase, and then to select its pitch (if it is a note).
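The components listed above, and the strategy of fixing an onset before choosing a pitch, can be sketched as a small data structure. This is only an illustration under stated assumptions: the names `Note`, `Rest` and `build_phrase`, the use of MIDI numbers for pitch, the 0-127 intensity scale, and the simplification that every element lasts until the next onset are not taken from the chapter.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Callable, List, Sequence, Union


@dataclass
class Note:
    pitch: int                    # assumed encoding: MIDI note number
    onset: Fraction               # beats from the start of the phrase
    duration: Fraction            # beats
    intensity: int = 64           # assumed 0-127 loudness scale
    articulation: str = "legato"  # e.g. staccato, legato, slurred, ghosted


@dataclass
class Rest:
    onset: Fraction
    duration: Fraction            # defined by the onset of the next element


Element = Union[Note, Rest]


def build_phrase(onsets: Sequence[Fraction],
                 kinds: Sequence[str],
                 pick_pitch: Callable[[List[Element]], int],
                 phrase_end: Fraction) -> List[Element]:
    """Take each onset in turn, then select a pitch if the element is a note."""
    boundaries = list(onsets) + [phrase_end]
    phrase: List[Element] = []
    for i, (onset, kind) in enumerate(zip(onsets, kinds)):
        # Simplification: an element lasts until the next onset.
        duration = boundaries[i + 1] - onset
        if kind == "note":
            phrase.append(Note(pick_pitch(phrase), onset, duration))
        else:
            phrase.append(Rest(onset, duration))
    return phrase
```

A caller supplies the rhythm (onsets and kinds) and a pitch-choosing function separately, which mirrors the partial independence of the two tasks.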

In the rhythmic pattern of a jazz phrase, the duration of the notes is not as important as the sequence of their onset times. Thus, the rhythm of the second phrase in Figure 1 can be represented thus:

[Rhythmic notation not reproduced.]

If you clap this rhythm, you convey its essentials. Clearly, the mind of a jazz musician must contain a tacit procedure that can generate a large variety of different rhythms. One way in which to specify what this procedure computes is to define a grammar that will generate all the possible rhythmic phrases within the musician's competence. Before I can sketch the sorts of rules likely to be needed in such a grammar, I need to say a little about different sorts of formal grammar.


A grammar is a set of rules, which, in themselves, can do nothing. Procedures can be devised, however, which can use a grammar to produce an actual output. Grammars differ in their power, and in particular in their



'weak' generative power, i.e. in what sentences they can be used to generate. Hence, grammars restricted to rules of a certain form are unable to generate certain sorts of sentences. There is a well-known hierarchy, known as the Chomsky hierarchy, from the most powerful grammars to the weakest. The most powerful are unrestricted transformational grammars, which have rules that in effect allow one sequence of symbols to be transformed into another. As various constraints are placed on the form of grammatical rules, then so the generative power of the grammar is reduced. For example, if, instead of transforming a sequence of symbols, rules can rewrite only one symbol at a time, then the power of the grammar is severely curtailed.

The crucial point about the power of a grammar is that it has direct consequences for the demands on working memory made by the use of the grammar to generate or to parse symbols. This fact is highly pertinent to the psychology of improvisation. But, to demonstrate its relevance, I will first use some examples from mental arithmetic.

Suppose I write two numbers on a blackboard:

123

948



and I ask you to make a mental addition of them. You can say aloud each of the relevant digits as you proceed from right to left. Hence, you could perform as follows:

Add 3 and 8 together, which equals 11.

Say aloud the far right digit: '1'. Make a mental note of the carry of 1.

Add 2 and 4 and the current carry together, which equals 7.

Say aloud the digit: 7.

Make a mental note of the carry of 0.

Add 1 and 9 and the current carry together, which equals 10.

Say aloud the digit: 0.

Make a mental note of the carry of 1.

There are no more columns to be added, but you have a carry.

Say aloud the current value of the carry: 1.

All you have to hold in working memory as you do the calculation is the current value of the carry, i.e. whether it is 1 or 0.
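The addition procedure can be written out as a short sketch in which the only state carried from one column to the next is the carry. The function name `add_right_to_left` is hypothetical, and the operands 123 and 948 are inferred from the steps listed above.

```python
from typing import List


def add_right_to_left(a: str, b: str) -> List[int]:
    """Return the digits of a + b in the order they are spoken (right to left)."""
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)
    carry = 0                      # the only item held in working memory
    spoken: List[int] = []
    for digit_a, digit_b in zip(reversed(a), reversed(b)):
        total = int(digit_a) + int(digit_b) + carry
        spoken.append(total % 10)  # say the digit aloud
        carry = total // 10        # note the new carry, 0 or 1
    if carry:
        spoken.append(carry)       # no more columns, but a carry remains
    return spoken


digits = add_right_to_left("123", "948")
print(digits)                                    # spoken order, right to left
print("".join(str(d) for d in reversed(digits)))  # the sum read left to right
```

Because each digit can be spoken as soon as it is computed, the memory needed does not grow with the length of the numbers.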

Now, suppose I ask you to multiply the two numbers together. The task is significantly more demanding because of the load it places on working memory:

Multiply 100 times 948: 94800.

Make a mental note of 94800.

Multiply 20 times 948:

Multiply 2 times 8, and so on.



Although there exists an alternative algorithm for multiplication that allows you to speak out aloud each of the resulting digits from right to left, it nevertheless calls for an arbitrarily large amount of information to be stored in working memory depending on the particular numbers to be multiplied.

The use of the weakest possible grammar, a so-called 'regular' grammar, resembles mental addition. It places a minimal load on working memory for the results of intermediate calculations. The use of the strongest possible grammar, an 'unrestricted transformational' grammar, resembles mental multiplication. There is no limit to the amount of working memory that may have to be used during the course of intermediate calculations. Between these two extremes in the Chomsky hierarchy lie several other sorts of grammar. As we shall see, a set of simple, but plausible, assumptions will enable us to make a motivated choice of different sorts of grammar for characterizing what is computed by the different sorts of processes underlying musical improvisation. I emphasize again that this enterprise is concerned with what is being computed by the mind rather than with how the process is computed. Hence, the use of grammars does not necessarily imply that musicians themselves rely on grammars in order to improvise. They may do, or they may use an entirely different sort of algorithm. I will take up this issue later in the chapter.

Grammars contain two main sorts of symbols: the terminal symbols that occur in the actual output that the grammar is used to generate (e.g. the symbol 'the' is a terminal symbol for a grammar of everyday English); and the non-terminal symbols that occur only in the course of generating sentences and that are not part of the actual language that the grammar can be used to generate (e.g. the symbol 'NP', which denotes a noun phrase, is not part of everyday English). Each rule in a regular grammar - the least powerful sort of grammar - has one of only two possible forms. The first form is exemplified by:

1. NP --> John

which states that a non-terminal symbol, NP, can be rewritten as a terminal symbol, John. The second form is exemplified by:

2. NP --> the N

which states that a non-terminal symbol NP can be rewritten as a terminal, the, followed by a non-terminal, N (for noun). Only these two forms of rule are allowed, and as a consequence the structures that can be generated by a regular grammar are very simple. They consist solely of binary branchings of the following sort:

[Tree diagram not reproduced: a right-branching structure over the words 'the man is ...'.]

Regular grammars can also be based on the convention that the terminal symbols follow the non-terminals on the right-hand side of rules.
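A minimal sketch of a procedure that uses a regular grammar shows why the memory load is so small: at any moment the generator holds a single non-terminal. The two NP rules come from the text; the rules for N, and the names `RULES` and `generate`, are assumptions added so that the example terminates.

```python
import random
from typing import Dict, List, Optional, Tuple

# Each rule rewrites a non-terminal as one terminal, optionally followed
# by ONE further non-terminal (None marks the end of the derivation).
Rule = Tuple[str, Optional[str]]

RULES: Dict[str, List[Rule]] = {
    "NP": [("John", None), ("the", "N")],      # rules 1 and 2 in the text
    "N": [("man", None), ("woman", None)],     # hypothetical rules for N
}


def generate(start: str, rules: Dict[str, List[Rule]],
             rng: random.Random) -> List[str]:
    """Emit terminals left to right, remembering only the current non-terminal."""
    words: List[str] = []
    state: Optional[str] = start
    while state is not None:
        terminal, state = rng.choice(rules[state])
        words.append(terminal)
    return words


print(generate("NP", RULES, random.Random()))
```

No stack or list of pending symbols is needed, which is the sense in which a regular grammar places a minimal demand on working memory.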

A so-called 'context-free' grammar is more powerful than a regular grammar, but less powerful than a transformational grammar. It can have rules of the following sort:

S --> NP VP

NP --> ART N

VP --> V NP

V --> loves

ART --> the

N --> woman

Hence, unlike a regular grammar, a rule may have more than one non-terminal on its right-hand side. As a consequence, the grammar can generate richer branching structures than those of a regular grammar. Indeed, some linguists have argued that the grammar of natural language may not be much more powerful than a context-free grammar. The grammar above, for example, can be used to generate the following structure:




[Tree diagram not reproduced.]










This 'tree' diagram is merely a graphical way of illustrating a labelled bracketing of a string of symbols:

(S (NP (ART the)(N woman)) (VP (V loves)(NP (ART the)(N child))))

that specifies how the symbols should be grouped together. The grouping may play a crucial role in the semantic interpretation of sentences.
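The relation between a context-free grammar and a labelled bracketing can be sketched with a recursive expansion. The rules are those given above, except that `N --> child` is an assumption added because the bracketed example contains that word; the names `GRAMMAR`, `expand` and `words` are hypothetical.

```python
import random
from typing import Dict, List

GRAMMAR: Dict[str, List[List[str]]] = {
    "S": [["NP", "VP"]],
    "NP": [["ART", "N"]],
    "VP": [["V", "NP"]],
    "V": [["loves"]],
    "ART": [["the"]],
    "N": [["woman"], ["child"]],   # 'child' is an assumed extra rule
}


def expand(symbol: str, grammar: Dict[str, List[List[str]]],
           rng: random.Random) -> str:
    """Rewrite a symbol recursively and return its labelled bracketing."""
    if symbol not in grammar:      # a terminal symbol: emit it unchanged
        return symbol
    right_hand_side = rng.choice(grammar[symbol])
    parts = [expand(s, grammar, rng) for s in right_hand_side]
    return "(" + symbol + " " + " ".join(parts) + ")"


def words(bracketing: str) -> List[str]:
    """Strip brackets and labels, leaving only the terminal string."""
    tokens = bracketing.replace("(", " ").replace(")", " ").split()
    return [t for t in tokens if t not in GRAMMAR]


structure = expand("S", GRAMMAR, random.Random())
print(structure)
print(" ".join(words(structure)))
```

The recursion keeps track of the symbols still to be rewritten, which is the bookkeeping that a regular grammar never requires.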



A context-free grammar can contain recursive rules, which have the same symbol on both sides of the rule. For example, a rule for possessive noun phrases might have the form:


where the non-terminal, NP, occurs on both sides of the rule. It can be used to generate such structures as:












which in theory could be indefinitely long.

The use of a grammar to generate sentences calls for a modicum of working memory in order to retain the non-terminal symbols that have yet to be re-written as terminals in a sentence. A working memory that suffices for context-free languages takes the form of a stack - rather like a pile of plates - in which each input item goes on the top of the stack, and each time an item is recalled it is taken from the top of the stack. With access only to the item at the top of the stack, the system lacks the power of random access memory in which any item anywhere in memory can be freely accessed.

In a musical improvisation, a musician has to generate notes in real time, and has no opportunity to go back to revise them. Hence, an optimal system will be one that operates highly efficiently and without the need for complex intermediate computations (Johnson-Laird, 1988b). It will place a minimal demand on the processing capacity of working memory. Such a system, of course, corresponds - in terms of characterizing its output - to a regular grammar. But, what evidence could in principle determine whether a regular grammar suffices to characterize the members of some corpus? There are, in fact, two principal considerations. The first concerns the weak generative capacity of the grammar. Imagine an abstract language that contains two symbols, say, a left bracket and a right bracket, along



with a number of other symbols. Imagine further that the only well-formed expressions in this language are like those of arithmetic, and so the brackets must match one another. For example, the expression:

((a + b) x c)

is well-formed, but the following string:

((a + b x c)

is not well-formed. There is no way in which all and only the well-formed expressions of this language can be captured by a regular grammar: such a grammar lacks the generative power to ensure that the brackets match. The rules that are needed, such as:

S ---> var

S ---> (S operator S)

operator ---> +

operator ---> x

var ---> a, b, c, ...

must be recursive with more than one non-terminal on the right-hand side. In short, the grammar has to be context-free.
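The bracket language can be made concrete with a sketch that generates expressions from the rules above and checks that brackets balance. The depth limit, the stopping probability and the names `gen_expr` and `brackets_match` are assumptions for illustration; the checker tests only bracket balance, not full well-formedness.

```python
import random


def gen_expr(rng: random.Random, depth: int = 0, max_depth: int = 3) -> str:
    """S ---> var | (S operator S), with an assumed cap on nesting depth."""
    if depth >= max_depth or rng.random() < 0.4:
        return rng.choice("abc")                      # S ---> var
    left = gen_expr(rng, depth + 1, max_depth)
    right = gen_expr(rng, depth + 1, max_depth)
    operator = rng.choice("+x")
    return "(" + left + " " + operator + " " + right + ")"


def brackets_match(text: str) -> bool:
    """Check balance with a counter that may grow without limit."""
    open_count = 0
    for char in text:
        if char == "(":
            open_count += 1
        elif char == ")":
            open_count -= 1
            if open_count < 0:     # a right bracket with no partner
                return False
    return open_count == 0


print(brackets_match("((a + b) x c)"))
print(brackets_match("((a + b x c)"))
print(gen_expr(random.Random()))
```

The counter `open_count` has no upper bound for arbitrarily nested input, whereas a device equivalent to a regular grammar has only a fixed number of states.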

The second consideration is the structure of expressions in the language, particularly if it is to guide the process of semantic interpretation. A string of symbols can be grouped together in various ways, e.g.:

(the (man (laughed)))


((the man) (laughed))

Plainly, the second grouping is likely to be necessary for the proper interpretation of English sentences. I will take into account both of these considerations in developing a grammar for characterizing jazz improvisations.


The development of a grammar for the rhythms of musical phrases calls for the analysis of a large body of data. The nature of the exercise can be illustrated by considering the grammar for the phrases of Christmas carols - a genre that rhythmically speaking is much simpler than modern jazz.

I analysed a corpus of Christmas carols, which were all in common time, with a view to writing a grammar that would generate their rhythms. Table 1 presents a grammar that captures all the rhythms of the phrases in the



Table 1. A regular grammar for the rhythms of the first two bars of a corpus of Christmas carols (in common time). (Note values are given by name, as implied by the beat positions.)

Bar 1:
Beat-1   --> crotchet Beat-2
         --> dotted crotchet Beat-2.5
Beat-2   --> quaver Beat-2.5
         --> crotchet Beat-3
Beat-2.5 --> quaver Beat-3'
Beat-3   --> quaver Beat-3.5
         --> crotchet Beat-4
         --> dotted crotchet Beat-4.5
Beat-3'  --> quaver Beat-3.5
         --> crotchet Beat-4
Beat-3.5 --> quaver Beat-4
Beat-4   --> crotchet Bar-2 Beat-1
Beat-4.5 --> quaver Bar-2 Beat-1

Bar 2:
Beat-1   --> crotchet Beat-2
         --> dotted crotchet Beat-2.5
         --> minim Beat-3'
Beat-2   --> crotchet Beat-3
         --> crotchet Beat-3'
Beat-2.5 --> quaver Beat-3'
Beat-3   --> crotchet Beat-4
Beat-3'  --> minim Bar 3 ...
Beat-4   --> crotchet Bar 3 ...

An example of a rhythm generated by the grammar:


corpus. The reader will observe that many possible combinations did not occur in the corpus. At this point, however, we run into the main problems of developing grammars (and computer programs) to characterize corpora. A theory may make errors of two sorts. On the one hand, it may fail to generate sequences that do in fact occur: it may undergenerate. On the other hand, it may generate sequences that would never in fact occur: it may overgenerate. A regular grammar can generate a potentially infinite number of sequences: it merely has to incorporate at least one recursive rule of the form:


which can be used to generate strings of any arbitrary length. Musical phrases, however, are always finite in length, and characteristically are seldom more than a few bars long. There are therefore only a finite number of possible rhythms for them, but that number is vast. The corpus of carols contains certain phrases, but it does not include all possible rhythms for carols. Hence, a grammar based only on a corpus will undergenerate. The theorist's task is to go beyond the data, and to base a grammar on a plausible extrapolation from them. The concomitant risks are to overgenerate if the grammar is too bold, or to undergenerate if it is too close to the data.

The principles in an individual jazz musician's head at a particular point in time are determinate, and so there is a correct grammar for characterizing the rhythmic patterns improvised by the musician. Unfortunately, at



Figure 4. Part of a transition diagram, equivalent to a regular grammar, for generating the rhythms of improvised phrases.

present, no method for checking a grammatical account of these mental principles exists. What we can assess, however, is whether there is any evidence for rules that are more powerful than those of a regular grammar.

Figure 4 presents part of a transition diagram that generates the rhythms improvised by Charlie Parker. Such a diagram is equivalent to a regular grammar: the different transitions from a node correspond to different rules for rewriting the same symbol. Thus, the two alternative transitions from the initial node correspond to the rules:

S0 ---> [symbol not legible] S1

S0 ---> [symbol not legible] S2

If a probability is assigned to each link, then the device, which is a finite-state automaton, generates a Markovian sequence of symbols. In the program, each link is assumed for simplicity to be equi-probable. A striking feature is the great variety of rhythmic patterns to be found even in a relatively small corpus of Parker's work. A more appropriate description might take a more abstract form. For example, one can distinguish between phrases that start just after the first beat of a bar, e.g.

[Rhythmic notation not reproduced.]

and phrases that start just after the fourth beat of the bar, e.g.

[Rhythmic notation not reproduced.]



A simplification of the overall transitions could be made if it were the case that these two sorts of phrases were treated in the same way: they have the same rhythm, but start at different points in the bar.

Another feature of the grammar, which it has in common with the grammar for carols, is that it generates many possible combinations that did not occur in the corpus. No doubt had I examined a larger corpus some of these possibilities would have occurred. A theorist is bound to exercise intuition - a tacit knowledge of the style of the musician - and to use such judgements to flesh out the grammar in a more complete form. Whether the grammar under- or overgenerates is not as critical as the general claim that the rhythms of improvised phrases can be characterized by regular grammars. An examination of the corpus provides no evidence either of strong constraints of one part of a phrase on another, analogous to the matching of parentheses, or of a need for a complex internal structure. Hence, the conjecture that modern jazz rhythms are generated by processes that place a minimal load on working memory appears to be borne out, and it should be possible to characterize the complete set of such phrases using a regular grammar.
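A transition diagram with equi-probable links can be sketched as a random walk over states. The fragment below is entirely hypothetical: its states, labels and the `max_events` cut-off are placeholders for illustration and do not reproduce Figure 4 or the chapter's program.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical fragment: each state lists (symbol emitted, next state).
TRANSITIONS: Dict[str, List[Tuple[str, str]]] = {
    "S0": [("quaver-rest", "S1"), ("quaver", "S2")],
    "S1": [("quaver", "S2")],
    "S2": [("quaver", "S2"), ("quaver", "S1"), ("crotchet", "END")],
}


def improvise_rhythm(transitions: Dict[str, List[Tuple[str, str]]],
                     rng: random.Random,
                     start: str = "S0",
                     end: str = "END",
                     max_events: int = 32) -> List[str]:
    """Follow equi-probable links until the end state or the length cap."""
    state = start
    events: List[str] = []
    while state != end and len(events) < max_events:
        # rng.choice gives every outgoing link the same probability, so the
        # next symbol depends only on the current state (a Markov sequence).
        symbol, state = rng.choice(transitions[state])
        events.append(symbol)
    return events


print(improvise_rhythm(TRANSITIONS, random.Random()))
```

The generator stores nothing but the current state, so its memory demand stays constant however long the phrase becomes.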

Only a drummer is likely to be satisfied with an improvised rhythmic pattern devoid of melody, and even drum solos introduce certain differences in pitch and timbre. The extemporization of a melody in modern jazz, however, is strongly constrained by the chord sequence, and so I will now turn to an analysis of jazz harmonies.


Jazz is a tonal music, and the constraints on melody include as a major component the particular chords in the harmonic sequence. A knowledge of harmony can be divided into those principles that are accessible to consciousness and that can be described verbally, and those further principles, particularly important to the creation of chord sequences, that lie outside conscious awareness. If composers had introspective access to all the principles that guide the sequence of chords in their compositions, then the nature of harmony would not be controversial.

Most jazz musicians have a conscious knowledge of the particular notes comprising the main types of chords, and of the sequences of chords comprising the themes in their repertoire. Any type of chord can be realized in many different voicings: there are many ways in which to play the chord of F dominant 7th, but most of them will include at least one occurrence of the notes F, A, C, Eb. In modern jazz, the pianist may decorate this chord with many additional notes, and might even play it as an inversion in which Eb is the root, omitting F altogether. Taking such variations into account, there are six principal chords used in modern jazz, which are described here with roots of C, though they may occur in any key. The parenthesized symbols are the conventional abbreviations for the chords:

C major 7th (mj7):               C, E, G, B
C dominant 7th (7):              C, E, G, Bb
C minor 7th (m7):                C, Eb, G, Bb
C minor 7th with 5b (m7.5b):     C, Eb, Gb, Bb
C minor perfect 7th (m.per7):    C, Eb, G, B
C diminished 7th (dm7):          C, Eb, Gb, A
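The six chord types can also be written as patterns of semitone intervals above the root, which makes it straightforward to spell them in any key. The following Python sketch is illustrative only: the names `NOTE_NAMES`, `CHORD_INTERVALS` and `spell_chord` are assumptions introduced here, and the sketch always spells notes with flats, which is a simplification of real enharmonic practice.

```python
# Pitch-class names, always spelled with flats (a simplifying assumption).
NOTE_NAMES = ["C", "Db", "D", "Eb", "E", "F", "Gb", "G", "Ab", "A", "Bb", "B"]

# Semitone intervals above the root for the six principal chord types.
CHORD_INTERVALS = {
    "mj7": (0, 4, 7, 11),
    "7": (0, 4, 7, 10),
    "m7": (0, 3, 7, 10),
    "m7.5b": (0, 3, 6, 10),
    "m.per7": (0, 3, 7, 11),
    "dm7": (0, 3, 6, 9),
}


def spell_chord(root, chord_type):
    """Return the note names of a chord in root position."""
    start = NOTE_NAMES.index(root)
    return [NOTE_NAMES[(start + step) % 12] for step in CHORD_INTERVALS[chord_type]]


print(spell_chord("F", "7"))
```

Calling `spell_chord("F", "7")` gives the four notes of the F dominant 7th mentioned earlier; voicings, inversions and added 6ths or 9ths are outside the scope of the sketch.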

These chords may be played with added 6ths and 9ths of various sorts, and in certain inversions. Figure 5 shows a typical realization of a modern jazz harmonic accompaniment. The chord sequence for an improvisation will be familiar to all musicians in its conceptual form, which can be symbolized by designating the root and type of chord that occurs in each bar. Here, for example, is the chord sequence of a variant of the twelve-bar blues popularized by Charlie Parker. The Roman numerals designate the roots of the chord where I is the keynote, V its dominant, and so on:

I Imj7

I IVmj7 I IIm7

I VIm7 II7

I IIIm7 VI7 I Imj7 IIIb7


I VIbm7 IIb7

I Vm7 Vb7 I I IIIbm7 VIb7 I I VIbmj7 IIb7 I

Of course, the rhythm section may depart from this sequence in various ways, and the actual choice of voicings for the chord is extemporized by the instrumentalists providing the accompaniment.

The most important point about an underlying chord sequence is that it is not improvised. It is composed by whoever was responsible for the original theme, though jazz chord sequences are often modified by other musicians as they evolve during the development of the music. The original twelve-bar blues, for example, goes back to the early history of jazz and in comparison to the variant above is a rudimentary affair:

| I   | I   | I | I7 |
| IV7 | IV7 | I | I  |
| V7  | V7  | I | I  |





Figure 5. The harmonic voicings of a typical modern jazz accompaniment.



The fact that chord sequences are composed rather than improvised has a crucial corollary: there is no need to minimize the complexity of intermediate computations in producing them. The composer can try one idea and then another, and other musicians can modify the sequence. Notation makes the process possible because a good notation is a substitute for working memory. Hence, the computational power that goes into the making of chord sequences is likely to be far greater than the computational power that goes into the improvisations based on them (Johnson-Laird, 1988b). To test this conjecture, I have examined a corpus of modern jazz chord sequences in order to devise a grammar that could be used to generate such sequences. This project was inspired in part by an intriguing paper by Mark Steedman (1982). He developed a set of rules that took as their input a simple chord sequence, such as the traditional blues sequence above, and generated modern elaborations of it, such as the variant played by Parker. It was an open question whether a grammar that could generate the initial sequences for itself would still need the sort of rules postulated by Steedman for interpolating new chords, and for substituting one sort of chord for another. More recently, Conrad Cork's (1988) pedagogical notion of teaching beginners a repertoire of harmonic 'building blocks' has also contributed to the present theory.

The first assumption of the theory is that underlying the superficial variants of a modern jazz chord sequence there is a tonal chord sequence. These sequences, in fact, are often remarkably similar to those of European classical music. Here, for example, are two tonal sequences that are very similar:



v V






I 17







The first example is the opening of Mozart's Clarinet Quintet in A major, (K 581); the second example is a perennial favourite of jazz musicians, George Gershwin's 'I got rhythm'.

Many theories of tonal harmony are stated informally in terms that imply that acceptable chord sequences could be generated by a regular grammar (Forte, 1979). Such grammars, as we have seen, are not powerful enough to capture any internal structure other than binary divisions. They would assign the opening bars in the examples above the following structure:


  Open     Middle     False-end

   I        II V       I ...

This structure does not accord with musical intuition, which suggests that the first two bars are a cadence from tonic to dominant. The structure is more accurately represented as:

            Opening-sequence

   First cadence         Cadence

  Tonic    Dominant       Tonic

    I         V             I


Such a structure, of course, calls for the power of a context-free grammar.

In at least one respect, modern jazz chord sequences differ from the sequences of classical music: modern jazz employs a much greater use of modulation. Modulations occur in two principal forms, either from one major section of a chord sequence to another or else within such sections. A typical example of modulation between sections occurs in the theme 'Joyspring' by Clifford Brown. The first seven bars are as follows:

| Imj7 | IIm7 V7 | Imj IIIm7 | IVm7 bVII7 |
| IIIm7 bIII7 | IIm7 bII7 | Imj7 |

but at this point the theme modulates to a key one semitone up, e.g. from F to Gb, and the modulation is effected by a typical manoeuvre in the eighth bar:

| bIIIm7 bVI7 |

which prepares the way for a chord of bIImj7, which is the new tonic. The next eight bars begin with exactly the same chords but based on the new tonic. This sort of modulation is also very common for the so-called 'bridge' in pieces based on the standard 32-bar sequence of the form AABA, i.e. the first eight bars, A, are repeated, and then lead into the bridge of eight bars, B, prior to the final reprise of A. There appear to be few, if any, constraints on the nature of the modulation: it can be to any new key, though modulations to a key a flattened fifth away from the original are rare (Cork, 1988).

Modulations also occur within sections. The 'locus classicus' for such effects, combined with a modulation between sections, is Jerome Kern's song, 'All the things you are', which contains chords based on all 12 possible roots. The opening sequence of the theme begins in a modern jazz variant as follows:

| VIm7 | IIm7 | bVIm7 bII7 | Imj7 |

and immediately proceeds to modulate to a tonic a major third above using as an ambiguous pivot a chord that is IVmj7 of the original key and a substitute dominant (bIImj7) of the new key:

[up III]  | bIImj7 | IIm7 V7 | Imj7 |

The next eight bars repeat the sequence but modulated upwards by a minor third. Hence, the first eight bars modulate from Ab to C, and the second eight bars modulate to Eb and thence to G.

Modulations in modern jazz are effected by two main devices, either an immediate transition to a major seventh on the tonic of the new key or else, as in the examples above, by interpolating chords backwards from the new tonic according to the cycle of fifths: ... IIm7 V7 Imj7. Because these interpolations can be handled by the sorts of rules invoked by Steedman (see below), the only rule that is needed for modulation is one that signifies a change in tonic, e.g. Imj --> IIImj, which, as I have mentioned, can be to any new key.

The basic building blocks of tonal chord sequences fall into three main categories, which can be exemplified as follows:

1. Cadences from tonic to dominant

2. Cadences from dominant to tonic

3. Cadences from tonic to dominant and back to tonic

In the case of modern jazz, however, the use of the word 'dominant' here is slightly misleading because the relevant chord need not be based on V or on substitutions for it. For example, a chord based on IV, or even a chord based on III, can serve the function of a dominant as in the following case:

| Imj7 | VIIm7 III7 | ...

where the VIIm7 is derived by interpolation according to the cycle of fifths. Indeed, in the case of the third cadence above, many alternative roots, including bVI7, can function as the temporary resting point of the dominant.

Table 2 presents a context-free grammar for generating simple tonal chord sequences, and Table 3 presents some typical examples of its output. A grammar for modern jazz would allow alternatives to V to serve as a dominant. Given an output from such a grammar, then the sorts of sequences that actually occur in modern jazz can be derived, as Steedman argued, by rules that act as transducers. They take a chord sequence as input and produce an enriched sequence as their output. A major manipulation in jazz is the interpolation of chords according to the so-called 'cycle of fifths'. For example, given the opening of 'I got rhythm':




Table 2. A context-free grammar for generating eight-bar tonal chord sequences. The rules all concern variants on opening cadences (from tonic to dominant). Other rules in the complete grammar generate closing cadences (dominant to tonic) and complete cadences (tonic to dominant to tonic)

Eight-bars        -> First-four Second-four
First-four        -> Opening-cadence Opening-cadence
                  -> Opening-cadence1 Opening-cadence
Second-four       -> Middle-cadence Opening-cadence
Opening-cadence   -> | I | I |
                  -> | I | V |
Opening-cadence1  -> | I | III |
                  -> | I | IV |
Middle-cadence    -> | I | IV |
                  -> | I | V |
                  -> | IV | I |

Table 3. Some sample outputs generated by a program using the grammar in Table 2
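A grammar of this kind can be run directly as a generator. The Python sketch below is a minimal illustration, not the original program: it assumes that the second opening-cadence category is a distinct nonterminal (written `Opening-cadence1` here), that each cadence spans two bars, and that alternatives are chosen with equal probability. The names `GRAMMAR`, `expand` and `generate` are introduced for the example.

```python
import random

# Each nonterminal maps to its alternative right-hand sides.
# Symbols that are not keys of GRAMMAR are terminals (one chord root per bar).
GRAMMAR = {
    "Eight-bars": [["First-four", "Second-four"]],
    "First-four": [
        ["Opening-cadence", "Opening-cadence"],
        ["Opening-cadence1", "Opening-cadence"],
    ],
    "Second-four": [["Middle-cadence", "Opening-cadence"]],
    "Opening-cadence": [["I", "I"], ["I", "V"]],
    "Opening-cadence1": [["I", "III"], ["I", "IV"]],
    "Middle-cadence": [["I", "IV"], ["I", "V"], ["IV", "I"]],
}


def expand(symbol, rng):
    """Recursively rewrite a symbol until only terminal chord roots remain."""
    if symbol not in GRAMMAR:
        return [symbol]
    bars = []
    for part in rng.choice(GRAMMAR[symbol]):
        bars.extend(expand(part, rng))
    return bars


def generate(seed=None):
    """Return one eight-bar sequence of chord roots."""
    return expand("Eight-bars", random.Random(seed))


print("| " + " | ".join(generate()) + " |")
```

The recursion in `expand` is what gives the generator context-free power: the choice made for one constituent is remembered on the call stack while its sub-constituents are expanded.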










an interpolation of this sort would lead to the VI chord prior to the II chord:




The step from VI down to II is indeed a fifth. Likewise, the development of Parker's blues sequence from the simple underlying blues:




I I7

calls for a series of such interpolations, working backwards from the dominant seventh at the end of the fourth bar:

| Vm7 I7
II7 | Vm7 I7
VIm7 II7 | Vm7 I7

and so on:

III7 | VIm7 II7 | Vm7 I7
| I | VIIm7 III7 | VIm7 II7 | Vm7 I7

It only remains to substitute a more complex chord for the opening tonic, and to substitute for the final chord one that is a flattened fifth away:

| Imj7 | VIIm7 III7 | VIm7 II7 | Vm7 Vb7
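The backward interpolation can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the program described in the text: the table `FIFTH_ABOVE` covers only the roots needed for this derivation, the alternation of minor and dominant sevenths is read off the example, and the function names are hypothetical.

```python
# Root a fifth above each root, limited to the roots used in this derivation.
FIFTH_ABOVE = {"I": "V", "V": "II", "II": "VI", "VI": "III", "III": "VII"}


def interpolate_back(target_root, steps):
    """Work backwards from a dominant seventh, adding chords a fifth above.

    Chord qualities alternate m7, 7, m7, ... as in the derivation above.
    """
    chords = [target_root + "7"]
    root = target_root
    for n in range(steps):
        root = FIFTH_ABOVE[root]
        quality = "m7" if n % 2 == 0 else "7"
        chords.insert(0, root + quality)
    return chords


def as_bars(chords, per_bar=2):
    """Group a flat list of chords into bars of `per_bar` chords each."""
    return [chords[i:i + per_bar] for i in range(0, len(chords), per_bar)]


print(as_bars(interpolate_back("I", 5)))
```

Five steps back from I7 yield the three bars that follow the opening tonic in the derivation; the later substitutions for the first and last chords are separate operations and are not modelled here.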

The grammatical rules for making the interpolations and substitutions cannot be exercised with complete freedom. The substitution of one dominant seventh by another a flattened fifth away, for instance, is permissible if the next bar in the sequence begins with a chord that is a fifth away (as in the substitution above). Likewise, the interpolation of chords according to the 'cycle of fifths' should not continue to the point where the opening tonic is eliminated. The rules are therefore sensitive to the context of the symbols to be modified. Instead of context-free rules like those used to generate the underlying sequence, it is necessary to use 'context-sensitive' rules, such as:

I --> IIm7 / _ V

which specifies that I can be re-written as IIm7 provided that it occurs in the context specified after the slash, i.e. prior to V. In fact, one would need a large number of such specific rules to do justice to all the possible interpolations that can be made according to the cycle of fifths. A simpler solution, which I adopted in the program for testing the grammar, is to use meta-rules that capture a whole set of such rules (see Gazdar et al., 1985, for an account of meta-rules). In the program, the context is specified as follows:

Previous    Current    Next
not I       I          xdom

where I is the symbol to be rewritten, the previous chord must not be I, and the next chord can be any chord that will be ultimately realized as a dominant seventh. Where a chord satisfies this context, then the value of x, which is the root of the next chord, is bound to an expression that generates the chord that can be interpolated according to the cycle of fifths. In fact, there are several alternative rewritings, including:

Imj (Fifth x)m

In this case, given an input sequence:

Previous    Current    Next
Vdom        I          Vdom ...

the rule yields the output:

Vdom        Imj IIm    Vdom ...
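The way a single meta-rule stands in for a family of context-sensitive rules can be illustrated with a short Python sketch. It is not the original program: the representation of a sequence as a list of one-chord bars, the table `FIFTH_ABOVE`, and the decision to leave the first and last bars untouched (so that the rule always has a previous and a next chord to inspect) are assumptions made for the example.

```python
# Root a fifth above each root; only the roots needed for the example.
FIFTH_ABOVE = {"I": "V", "V": "II", "II": "VI", "VI": "III", "III": "VII"}


def apply_meta_rule(bars):
    """Rewrite I as 'Imj (Fifth x)m' when previous != I and next is x + 'dom'.

    `bars` is a list of chord symbols, one per bar, such as 'I' or 'Vdom'.
    Returns a list of bars, each a list of one or two chord symbols.
    """
    result = []
    for i, chord in enumerate(bars):
        has_neighbours = 0 < i < len(bars) - 1
        if (has_neighbours and chord == "I"
                and bars[i - 1] != "I"
                and bars[i + 1].endswith("dom")):
            x = bars[i + 1][:-len("dom")]      # bind x to the root of the next chord
            result.append(["Imj", FIFTH_ABOVE[x] + "m"])
        else:
            result.append([chord])
    return result


print(apply_meta_rule(["Vdom", "I", "Vdom"]))
```

Because the variable x is bound at the moment the context matches, the same rule covers every dominant root in the table, which is the economy that meta-rules are meant to provide.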

The chord symbols in this output do not yet specify sevenths. The reason is that there remains a third stage in the generation of a final chord sequence.

One reason for the third stage is that there is a form of interpolation that can occur after interpolations according to the cycle of fifths. A sequence produced by the second stage can have the following form:

... | IIIdom | VIdom | IIdom | Vdom

into which can then be interpolated the following minor sevenths:

... | III7 | IIIm7 VI7 | VIm7 II7 | IIm7 V7

The third stage also depends on context-sensitive rules.
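The interpolation of minor sevenths in the third stage follows a simple pattern in the example above: each dominant seventh after the first is preceded, within its bar, by a minor seventh on the root of the previous bar. A minimal Python sketch of that pattern follows; the function name and the list-of-roots input are assumptions, and the context-sensitive conditions under which the rule may apply are not modelled.

```python
def interpolate_minor_sevenths(roots):
    """Turn a chain of dominant roots, one per bar, into bars with m7 chords.

    The first bar keeps its dominant seventh alone; every later bar receives
    a minor seventh on the previous bar's root before its own dominant seventh.
    """
    if not roots:
        return []
    bars = [[roots[0] + "7"]]
    for previous, current in zip(roots, roots[1:]):
        bars.append([previous + "m7", current + "7"])
    return bars


print(interpolate_minor_sevenths(["III", "VI", "II", "V"]))
```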

In characterizing what has to be computed in an improvisation, I have argued that the production of suitable chord sequences calls for considerable computational power. The three-stage program requires a memory for the results of a large number of intermediate computations, because the first stage requires at least context-free power, and the interpolations according to the cycle of fifths must be made one at a time in an interdependent way. If rules sensitive to the context of a symbol are used to generate strings, then they call for greater computational power. The load on working memory cannot be handled merely by a stack. No psychological price is exacted by such demands, however, because notation can be used to represent intermediate stages in the derivation. The use of this greater degree of computational power has an aesthetic advantage: the resulting chord sequences can have a more complex and interesting structure. They are a crutch on which the improviser can lean, and they enhance the quality of an improvisation without complicating the process of improvisation itself. One corollary, of course, is that musicians must have a secure grasp of the harmonic implications of the sequence if they are to exploit them to the full.


The characteristic 'walking bass' of modern jazz lays down a steady rhythmic pulse with one note for each beat in the bar, but the bass player extemporizes the particular melodic line so that it fits the harmonic sequence of the theme. Figure 6 shows a typical bass line (from Roidinger, 1980) for a twelve-bar blues. The rhythmic element of such improvisations is simple, though the exact durations vary in complicated ways that have yet to be fathomed, and the performance of a virtuoso contributes enormously to the sense of 'swing' created by the rhythm section. The steady pulse, however, allows us to approach the improvisation of melody without having to consider rhythmic phrases.

How does a bass player choose which note to play next? No-one knows the answer, and I shall not offer any strong hypotheses here. But, as in the previous sections, I shall aim to spell out what has to be computed to produce an acceptable bass line.

Figure 6. A walking bass line for a twelve-bar blues in F (from Roidinger, 1980).

There are several possible theories of bass lines. The first resembles a null-hypothesis: each note in a bass line is chosen from the set of notes comprising the current chord. This hypothesis both overgenerates and undergenerates. With no constraints on the relative pitches of notes, a resulting bass line can leap wildly around from a high note to a low note in a way quite unlike the bass line in Figure 6. But, as the figure shows, not every note in a bass line comes directly from the notes in the chords. Some are 'passing' notes that might be part of an extension of the chord - sixths and ninths, such as the D that occurs in the first bar of Figure 6. Others are more chromatic passing notes that could not normally be sounded as part of even an extended chord. For example, the B that occurs with Gm7 in the 9th bar is clearly not an extension of the chord, but a note that passes chromatically from the Bb to the C. Likewise, the Bb played against the F major chord in the 7th bar is not part of the chord: this choice of the note that is a fourth above the root of the chord is a melodic flourish characteristic of the playing of Charlie Parker. Its choice in the example supports the claim that improvised bass lines, apart from their simpler rhythms, are of a piece with the melodic improvisations of soloists.

A second possibility is the 'motif' theory that I discussed earlier. Certainly, there are motifs that bass players use, but, as I have explained, this theory cannot be the whole story. Good players improvise novel bass lines.

A third possibility is that bass players choose their notes in order to satisfy a number of sets of criteria. One set concerns the chord sequence, and the notes that are concordant with the different chords or that can be used as 'passing' notes. These constraints are normally accessible to introspection, at least in part, and they can be taught and learned explicitly, though they must eventually become 'second nature' to the musician. Another set of criteria is less tangible. It concerns the 'contour' of acceptable melodies. The role of contour in the perception of tonal music has been established experimentally (Dowling, 1978). A principle that seems to underlie the production of melodies is that after a series of small intervallic steps in a melody, a step of a larger interval produces an aesthetically pleasant melody, and vice versa. Thus, in bars 9 and 10 of Figure 6, there is a sequence of small intervals, and so the first interval in bar 11 makes a contrast. Similarly, the intervals in bars 1 and 2 are predominantly quite large, and so the small steps in bar 3 are a pleasing contrast.

I will examine the nature of these two sets of constraints - the harmonic and the melodic - in more detail, because there are alternative possibilities for how they might themselves be best characterized. My starting point is a computer program that improvises jazz bass lines.

The program takes a chord sequence represented in symbols as its input - the chord sequences are those that are generated by the program that I described earlier - and it produces as its output an appropriate bass line. It also generates a rudimentary harmonic accompaniment consisting of the 3rd and 7th of each chord, which are the two notes that most evoke the harmonic quality of a chord given an accompanying bass line. The output of the program is the numerical specification of the pitch of the note, its onset and offset times, and its intensity. These numerals could be used to generate actual notes on a microcomputer, or else the very much superior synthesized bass notes developed by my colleagues Roy Patterson and Rob Milroy on a custom-built computer.

The program works by first selecting whether the next interval is large or small, i.e. greater than a second or not. This choice is made on the basis of a regular grammar of contours that I derived from a corpus of bass lines. The grammar is based on an assumption analogous in some ways to the principles of Parsons' (1975) Directory of Tunes and Musical Themes. This Directory represents any tune merely by its contour: * denotes its first note, R denotes a repeat of the previous note, U denotes an upward interval, and D denotes a downward interval. Thus, for example, the opening of Beethoven's Fifth Symphony is represented as:

* R R D U R R D ...

This system has the striking property that once the first 15 notes of a theme in the classical repertory have been encoded in the notation, the identity of the theme has been uniquely established. The eight symbols above are also common to at least five other themes, including 'Oh joy oh rapture' from Act II of Gilbert's HMS Pinafore. But, the encoding of 15 notes of Beethoven's opening is unique to that masterpiece.

The contour grammar for improvisation is for the production of music, not its recognition, and I assumed that there were no asymmetries between rising intervals and falling intervals. This assumption was also motivated by the Parsons' Directory: a small experiment established that when one inverts the contour of the first few notes of a given entry, the result always corresponds to some other theme in the directory. The inversion of Beethoven's opening is, for example:

* R R U D R R U ...

and this is the contour of several themes, including 'Away, away' from Act II of Sullivan's Pirates of Penzance.
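Both the contour encoding and its inversion are easy to state as code. The Python sketch below is illustrative: the function names are introduced here, and the pitches of Beethoven's opening are given as MIDI note numbers (G4 = 67, Eb4 = 63, F4 = 65, D4 = 62), which is an assumption about representation rather than anything taken from the Directory.

```python
def parsons_code(pitches):
    """Encode a melody as its contour: * first note, R repeat, U up, D down."""
    code = ["*"]
    for previous, current in zip(pitches, pitches[1:]):
        if current == previous:
            code.append("R")
        elif current > previous:
            code.append("U")
        else:
            code.append("D")
    return " ".join(code)


def invert_contour(code):
    """Swap rising and falling intervals, leaving * and R unchanged."""
    swap = {"U": "D", "D": "U"}
    return " ".join(swap.get(symbol, symbol) for symbol in code.split())


# Opening of Beethoven's Fifth Symphony: G G G Eb, F F F D.
beethoven_fifth = [67, 67, 67, 63, 65, 65, 65, 62]
contour = parsons_code(beethoven_fifth)
print(contour)
print(invert_contour(contour))
```

The two printed lines correspond to the contour of the opening and to its inversion as given in the text.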



What seemed relevant for the production of bass lines were contours reflecting the following items:

f: the first note

d: repeat of the previous note

s: a small interval

i: a large interval

Figure 7 presents part of a transition diagram that uses these items to generate contours. Such a system, like the one used to generate rhythms, is equivalent to a regular grammar.

Once the program has selected the next interval in the contour, it chooses its precise pitch. If the contour calls for a repeat of the previous note, its pitch is already specified, otherwise the contour merely constrains the choice to a set of notes. The other major constraints are harmonic. The program is equipped with a specification of each of the major types of chord, and a knowledge of the acceptable passing notes that may occur with them. For example, the chord Cmj7 comprises the notes C, E, G and B, and the passing notes that a bass player may use are D, F# and A. Additional harmonic constraints concern the start of a new chord and the beat of the bar, e.g. passing notes are avoided at the start of a new chord, or on the first beat of the bar. Figure 8 shows a typical output from the program together with the input chord sequence, which is a blues.

Figure 7. Part of a transition diagram for generating the contours of bass lines.

Figure 8. A typical output of the program for generating bass lines (given the chord sequence of Charlie Parker's variant of the twelve-bar blues).
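The two-step procedure, contour first and pitch second, can be sketched as follows. This Python fragment is a hedged illustration and not the program described in the text. The transition table `CONTOUR` is hypothetical, since the diagram in Figure 7 is not reproduced here; the bass range, the octave limit on large intervals, the fallback when no note fits, the passing notes for Fmj7 (obtained by transposing those given for Cmj7) and the restriction to one chord per bar are all assumptions.

```python
import random

# Contour symbols from the text: f = first note, d = repeat, s = small, i = large.
# HYPOTHETICAL transitions; the actual diagram is not reproduced here.
CONTOUR = {"f": ["s", "i"], "d": ["s", "i"],
           "s": ["s", "s", "i", "d"], "i": ["s", "s", "i"]}

# Pitch classes (C = 0) as (chord notes, passing notes).
# Cmj7 follows the text; Fmj7 is an assumed transposition.
CHORDS = {
    "Cmj7": ({0, 4, 7, 11}, {2, 6, 9}),
    "Fmj7": ({5, 9, 0, 4}, {7, 11, 2}),
}
LOW, HIGH = 28, 55  # assumed bass range as MIDI note numbers


def choose_pitch(prev, step, chord, strong_position, rng):
    """Pick a pitch for one contour step; `prev` is the previous MIDI note."""
    if step == "d":
        return prev  # a repeat is already fully specified
    tones, passing = CHORDS[chord]
    # Passing notes are avoided at the start of a chord or on the first beat.
    allowed = tones if strong_position else tones | passing
    pool = [p for p in range(LOW, HIGH + 1) if p % 12 in allowed and p != prev]
    if step == "s":
        fits = [p for p in pool if abs(p - prev) <= 2]
    else:
        fits = [p for p in pool if 2 < abs(p - prev) <= 12]  # octave cap assumed
    if not fits:  # nothing meets the contour: take the nearest allowed note
        return min(pool, key=lambda p: abs(p - prev))
    return rng.choice(fits)  # arbitrary choice among the remaining candidates


def bass_line(chord_sequence, beats_per_bar=4, seed=None):
    """Generate one note per beat; `chord_sequence` holds one chord per bar."""
    rng = random.Random(seed)
    step, prev, line = "f", None, []
    for chord in chord_sequence:
        for beat in range(beats_per_bar):
            if step == "f":
                tones = CHORDS[chord][0]
                prev = rng.choice([p for p in range(LOW, HIGH + 1) if p % 12 in tones])
            else:
                prev = choose_pitch(prev, step, chord, beat == 0, rng)
            line.append(prev)
            step = rng.choice(CONTOUR[step])
    return line


print(bass_line(["Cmj7", "Fmj7", "Cmj7"]))
```

The only state carried from one note to the next is the current contour symbol, the place in the chord sequence and the previous note, so the sketch needs no memory for intermediate results beyond a fixed handful of variables.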

The program performs at the level of a moderately competent beginner. It makes errors of both undergeneration and overgeneration. It undergenerates because it makes no use of chromatic runs consisting of several passing notes, and it makes no use of motifs other than those that occur by chance. These abilities were deliberately excluded from its competence. It overgenerates because it is based on an erroneous theory of passing notes - an inadequacy that was only revealed by an examination of its improvisations. The use of flattened fifths as passing notes failed to take into account that they are a species of chromatic passing note, that is, they cannot be used ad libitum, but only in certain contexts. Care must be taken to consider both the harmonic implications of the previous note, and the note that follows. At this point, it is worth considering an alternative theory of the harmonic constraints on jazz improvisations.

A number of authors, notably Cork (1988), have suggested that improvisers have a tacit knowledge of the particular scale from which notes should be selected at each harmonic juncture in the chord sequence. Thus, in the opening Cmj7 of the Parker blues, the scale is likely to be: C, D, E, F#, G, A, B. This set is, of course, identical to the one adopted by my program. Yet there is a small but important difference between the two approaches. The same chord in different harmonic contexts may have different scales associated with it. This principle is, I believe, the right one. For example, the dominant 7th chord occurs frequently in more traditional blues sequences, such as:

| C7 | F7 | C7 | Gm7 C7 | F7 ...

The scale associated with the initial C7 is, depending on the particular style of the musician, likely to include the following notes: C, D, Eb, E, F, Gb, A, Bb. The reader will note the presence of the so-called 'blue' note, Eb, which is such a strong feature of the melodic and harmonic language of jazz. The F7 chord in the second bar, however, is much less likely to contain its corresponding 'blue' note of Ab, which is foreign to the C-major key of the first four bars. The F7 is more likely to have the associated scale: F, Gb, G, A, Bb, B, C, D, Eb. Musicians are therefore probably extemporizing their melodies with some feel for these harmonic nuances. A better theory should combine both the distinction between chordal and passing notes (as in the program) and a sensitivity to a shift in the passing notes according to the harmonic context (as in Cork's theory).

The principal theoretical point at issue is the computational power needed to make jazz improvisations. The rectifications of the various shortcomings in the program would have no major implications for the hypothesis that improvisations place a minimal load on the processing capacity of working memory. The program requires a minimal memory for the results of intermediate computations: it records the item currently generated by its contour grammar, its place in the chord sequence, and the previous note that it has played. The modifications needed to handle chromatic passing notes would require a slightly larger, but still fixed capacity, working memory; and a fixed capacity is consistent with the use of regular grammars to characterize the output. The representation of the scalic properties of different chords in different positions of chord sequences requires a richer long-term memory of harmonic possibilities, and has no effect on working memory.

I am now in a position to characterize what has to be computed in the improvisation of a jazz melody, such as the one illustrated in Figure 1. Jazz musicians have in their long-term memory a knowledge of the principles governing the rhythmic patterns of phrases (and their actual execution in real time), the principles governing acceptable melodic contours, and the harmonic and metrical constraints on choice of notes. They also know the particular harmonic sequence of the piece, and its particular nuances. The generation of an improvised phrase is a process of generating a sequence of notes and rests. What has to be computed in selecting the next note is its onset and offset, which can be characterized by a regular grammar for rhythmic phrases, its step in contour, which can be characterized by a regular grammar for contours, and its particular pitch, which is determined by a set of harmonic constraints. The program for improvising melodies is thus a direct extension of the bass program, to which is added a regular grammar for rhythms.

From the evidence of the program, on some occasions there is no choice about which note to play given all of these constraints. There is just one note that fits exactly the exigencies of the situation. On other occasions, there may be more than one feasible note, and so the program makes an arbitrary choice from amongst them. One consequence is that even though the program is easy to understand, it is impossible to predict its actual output on any particular occasion. The use of arbitrary choices after all the constraints have been met is entirely consistent with a general theory of creativity that I will describe below. Even when all the constraints in the musician's mind have been taken into account, a rich theory of creativity will be consistent with more than one possible next note. If not, then the theory will be deterministic, and the fecundity of musical virtuosi would be profoundly mysterious.


I have so far described what has to be computed in order to make an improvisation and I have tried not to draw any conclusions about how the process is carried out. One theoretical possibility is, of course, that musicians are equipped with regular grammars for contours and rhythms, which they use to produce improvisations in roughly the same way as the programs work. However, there are many other possibilities, if only because there are infinitely many ways to carry out any computation. Unfortunately, it is extremely difficult to obtain evidence that is directly pertinent to theories of processing. How, for example, could one show that the 'grammatical' account of performance is wrong? On the assumption that it gives an accurate account of the output of the process - which, of course, is its principal aim - no examination of corpora of improvisations can refute it. To pose the problem still more sharply, I will sketch an entirely different mechanism that might underlie performance.

One feature of the psychology of jazz, which I emphasized earlier, is the musicians' lack of introspective access to their own underlying mental procedures. They are not aware of having acquired explicitly structured symbolic rules. This phenomenon might well be explained if, instead of grammatical rules, they rely on distributed representations of the sort currently under much investigation by 'connectionists' (Rumelhart and McClelland, 1986). One major problem confronting this theoretical approach is whether a learning procedure such as back-propagation of error has the power to acquire grammars. Existing algorithms have acquired sequential behaviours of only the computationally weakest varieties (Jordan, 1986), and, although some explorations of grammatical learning have been made (Hanson and Kegl, 1987; Allen, 1988), it is not yet known whether a network equivalent to a context-free grammar could be learned by back-propagation. However, if musicians need to acquire only regular grammars in order to improvise, then a network learning procedure, such as back-propagation of error, might suffice for the acquisition of the skill.

One feature that network learning and learning to improvise have in common is that they both take a long time. The acquisition of relatively simple mappings between vectors can take thousands of trials using back-propagation of error. Learning to improvise is, of course, a still more difficult task. Musicians must first familiarize themselves with the genre. They listen to a large amount of music, and they often learn by heart certain solos by virtuosos. One function of this essentially 'passive' knowledge is to provide feedback about their own improvisational efforts. Beginners attempting to improvise are bound to play notes that are harmonically or melodically injudicious, and, if they are to improve, they must be aware of such solecisms: their awareness derives from their previously established passive knowledge of the genre. Like most tyros, their (self-)critical abilities are much in advance of their creative powers. This feature of creativity seems paradoxical (Perkins, 1981): if one can tell the difference between good and bad, why can't one use this knowledge in the act of creation, engendering the good and eschewing the bad? A possible resolution of the paradox is that passive critical knowledge cannot be used in the generative process of creation (Johnson-Laird, 1988b). Hence, a connectionist account of the acquisition of creative processes will need to employ two distinct systems, one modelling passive knowledge, and the other modelling generative knowledge.

One pedagogical consequence of the independence of passive and generative knowledge is that beginners attempting to improvise are truly incompetent. They are rhythmically stilted, particularly if they have been trained in European classical music, and they play many notes that clash with the chord sequence. But, because they can learn to improvise only by improvising, they need to be strongly encouraged to continue and to ignore their repugnance for their early efforts. They will improve if they immerse themselves in the new genre. They can become competent at improvising on a particular chord sequence merely by dint of practice, and despite the fact that they have no explicit knowledge of harmony whatsoever - not even of the constitution of the various sorts of chord. This phenomenon, which nowadays is not the normal way in which jazz musicians develop, is hardly consistent with the gradual acquisition of explicitly structured grammatical rules. It seems, intuitively, closer to the acquisition of associative connections of varying strengths of the sort that networks acquire. Intuition, alas, can be a most misleading source of knowledge. The mere lack of explicit knowledge does not in itself justify the claim that a skill is represented without explicit symbolic structure, i.e. in the distributed mode of connectionism.


If the mental processes of creation are assumed to be computational, then, as I have argued elsewhere, there are three broad classes of possible algorithms that they may follow (Johnson-Laird, 1988b). In the first class of algorithms, a new idea is formed by combining, or modifying, existing elements at random. The results of such random combinations are hardly ever viable, and so it is essential to submit the products of this initial generative stage to critical scrutiny. Hence, there is a selective stage in which those novelties that, for whatever reason, are not acceptable are abandoned. During this stage, the creator applies the criteria of the relevant artistic genre, or scientific paradigm. I refer to this class of algorithms as 'neo-Darwinian' because of their obvious resemblance to the theory of evolution that combines the random shuffling of genes with the principles of natural selection. Although several theorists have espoused such procedures as the sole means of creation (Skinner, 1953; Campbell, 1960; Bateson, 1979), their gross inefficiency renders them highly implausible as an account of any sort of mental process.
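The generate-then-select scheme just described can be illustrated with a minimal sketch. It is not a program from this chapter: the toy domain (pitch classes 0-11), the two criteria in `acceptable` (membership of the C major scale and a limit on the size of leaps) and all the function names are assumptions chosen only for illustration.

```python
import random

# Assumed toy "genre": the pitch classes of C major.
C_MAJOR = {0, 2, 4, 5, 7, 9, 11}


def random_phrase(rng, length=8):
    """Generative stage: combine existing elements (pitch classes) at random."""
    return [rng.randrange(12) for _ in range(length)]


def acceptable(phrase):
    """Selective stage: toy criteria standing in for those of a genre."""
    in_scale = all(p in C_MAJOR for p in phrase)
    small_steps = all(abs(a - b) <= 4 for a, b in zip(phrase, phrase[1:]))
    return in_scale and small_steps


def neo_darwinian(rng, max_trials=1_000_000):
    """Generate blindly and abandon every product that fails the criteria."""
    for trial in range(1, max_trials + 1):
        phrase = random_phrase(rng)
        if acceptable(phrase):
            return phrase, trial
    return None, max_trials


phrase, trials = neo_darwinian(random.Random(0))
print(phrase, "found after", trials, "trials")
```

Even in so small a domain most random phrases are rejected, so the number of trials reported is typically large; this is the inefficiency at issue.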

A second and more plausible procedure is to use the criteria of the genre or paradigm in the very process of generating ideas. Instead of forming wholly arbitrary combinations of existing elements, the combinations are constrained. It may happen that more than one option is consistent with these constraints. At this point, and only at this point, an arbitrary selection is made: it has to be arbitrary because, by assumption, all the available criteria have already been satisfied by the competing alternatives. I refer to this sort of algorithm as 'neo-Lamarckian' because, as in Lamarck's theory of evolution, pre-existing knowledge governs the process of generation.
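A comparable sketch of the second class, under the same illustrative assumptions (a C major scale and a hypothetical limit on leaps, neither taken from this chapter), applies the criteria during generation, so that only the final choice among equally acceptable options is arbitrary.

```python
import random

# Assumed toy "genre": the pitch classes of C major.
C_MAJOR = [0, 2, 4, 5, 7, 9, 11]
MAX_LEAP = 4  # hypothetical constraint, in semitones


def candidates(previous):
    """Every option consistent with all the constraints, given the last note."""
    if previous is None:
        return list(C_MAJOR)
    return [p for p in C_MAJOR if abs(p - previous) <= MAX_LEAP]


def neo_lamarckian(rng, length=8):
    """Constraints govern generation; nothing is ever generated and rejected."""
    phrase = []
    previous = None
    for _ in range(length):
        options = candidates(previous)
        # All criteria are already satisfied by each option,
        # so the selection among them has to be arbitrary.
        previous = rng.choice(options)
        phrase.append(previous)
    return phrase


phrase = neo_lamarckian(random.Random(0))
print(phrase)
```

Each phrase is produced in a single pass with no intermediate results to store, which is why a procedure of this kind suits creation in real time.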

Almost everyone is a better critic than creator, and the resolution of this paradox, as we saw, is that the knowledge used to evaluate acts of creation is not automatically available to the generative process. This factor yields the third, and most common, class of creative algorithms: some criteria are used in the initial generative stage, and other criteria are used in the evaluation of the results. Such 'multi-stage' algorithms may employ many stages in which different criteria are used, and unsatisfactory products are fed back from an evaluative stage to a generative stage for modification. Even when all the criteria, both generative and evaluative, have been exhausted there may still be more than one option available to the creator. In this case, once again, an arbitrary choice must be made from among the alternatives.
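A multi-stage procedure of this kind might be sketched as follows. The division of labour is an assumption made for illustration, not a claim about any actual musician or program: the generative stage knows only the scale, the evaluative stage knows only a hypothetical limit on leaps, and flagged notes are fed back to the generator to be drawn again.

```python
import random

SCALE = [0, 2, 4, 5, 7, 9, 11]  # assumed generative criterion (C major)
MAX_LEAP = 4                     # assumed evaluative criterion, in semitones


def generate(rng, length=8):
    """Generative stage: uses only the scale."""
    return [rng.choice(SCALE) for _ in range(length)]


def evaluate(phrase):
    """Evaluative stage: indices of notes reached by too wide a leap."""
    return [i for i in range(1, len(phrase))
            if abs(phrase[i] - phrase[i - 1]) > MAX_LEAP]


def revise(phrase, flagged, rng):
    """Feedback: flagged notes are re-drawn, still using only the scale."""
    revised = list(phrase)
    for i in flagged:
        revised[i] = rng.choice(SCALE)
    return revised


def multi_stage(rng, n_candidates=5, max_rounds=200):
    """Alternate generation and evaluation, then choose among the survivors."""
    survivors = []
    for _ in range(n_candidates):
        phrase = generate(rng)
        for _ in range(max_rounds):
            flagged = evaluate(phrase)
            if not flagged:
                survivors.append(phrase)
                break
            phrase = revise(phrase, flagged, rng)
    # Once every criterion is exhausted, the remaining choice is arbitrary.
    return rng.choice(survivors) if survivors else None


result = multi_stage(random.Random(0))
print(result)
```

Because the generator has no access to the critic's criterion, several rounds of revision are usually needed, and intermediate products must be held while they are revised.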

The three classes of algorithm exhaust the computational possibilities. One of them, as I have argued, is unlikely to be much used by human creators. The evolution of species works because nature can afford to carry out millions of experiments in every second of millions of years. This luxury is hardly available to an individual's mental processes. Hence, neo-Lamarckian and multi-stage algorithms are much more plausible psychologically.

Some acts of creation occur in real time, and do not allow the individual to go back and to revise earlier thoughts. Jazz improvisations are a prime example of this mode of creativity, but there are many others, including the extemporization of poems, stories, dances and drawings. Such creations depend on the artist internalizing the tacit principles of an existing genre, along with idiosyncratic variations. These constraints can be exploited by a neo-Lamarckian algorithm that generates the improvisations. The constraints must therefore be adequate to produce acceptable improvisations, and they must be in a form that can be used rapidly and without the need for much computational power. Their acquisition is a most laborious business. The artist is acquiring a skill that depends on tacit procedures in which conscious propositional knowledge has little part to play.

Musical improvisation as a test case of spontaneous art depends on the use of a neo-Lamarckian algorithm and on the productions of a multi-stage algorithm. The improvising musician has in long-term memory a chord sequence, and its original construction calls for more computational power than can be realized in real time: human working memory does not permit more than a rudimentary storage of the results of intermediate computations. Chord sequences are accordingly developed, sometimes collaboratively, using a multi-stage procedure in which initial ideas generated according to certain criteria may be refined or modified at a later stage according to other criteria. The result is a structure that is interesting, and complex: it can support, and stimulate, improvisations that are aesthetically pleasing. The production of these improvisations itself depends directly on a neo-Lamarckian algorithm in which the musician uses a set of constraints in the very process of generating musical phrases.

Other creative acts are likewise within a particular genre, but do not occur in real time. The writing of a poem or the composition of a symphony, for instance, is guided by a tacit productive knowledge of the genre, and the products of 'normal science' depend on a similar guidance by existing constraints (Kuhn, 1970). Initial ideas, however, can be revised by the individual, or even by a collaborative effort among a number of people. These creative processes depend on a multi-stage algorithm.

The highest form of creativity is the development of a new genre or scientific paradigm. This form of creation is more mysterious. There are relatively few such revolutionary transitions in any particular domain. In jazz, for example, the main genres have been ragtime, traditional blues, New Orleans style, Kansas City blues, swing, bop, West Coast style, hard bop, modal improvisation, free form, third stream and jazz-rock fusion. This list includes most of the major revolutions in style. There are few, if any, criteria that govern the transitions from one style to another. Moreover, the ultimate success of a putative revolution depends on events of which everyone, including the creators themselves, is ignorant. For artistic revolutions, these events depend on economic and social factors and other contemporary developments in the world of the arts. For scientific revolutions, they depend on the availability of resources to explore the innovation and, of course, its success in making sense of subsequent empirical results - a factor that cannot be foreseen at the time of the innovation. It follows that there cannot be any general principles that underlie all and only the successful revolutions in a domain. Transitions therefore cannot be modelled by a neo-Lamarckian algorithm. They cannot be modelled by a neo-Darwinian algorithm, because of its gross inefficiency. Once again, the process of innovation must depend on multi-stage algorithms.

An obvious aspect of revolutionary developments is their frequent dependence on groups of collaborators. Collaborations raise issues of a social and motivational nature that I have so far ignored (Simonton, 1984): even jazz improvisation is normally a collective matter, and what one musician plays has effects on the others - from the maintenance of the metrical pulse to the actual production of musical ideas. The ways in which such interactions are effected, and the creative consequences of them, are particularly important to the development of new genres. Modern jazz, for example, has its origins in the interactions among certain key players - Charlie Parker, Dizzy Gillespie, Thelonious Monk and others. The task of sorting out their respective contributions is almost certainly impossible. Certain ideas, one strongly suspects, could have arisen only from their mutual interactions both in practice on the bandstand and in discussion, demonstration and composition. Yet, as in scientific innovation, certain individuals played an essential part in the development of the new paradigm. The single biggest puzzle about creativity is: What mental processes underlie their innovative thoughts? My only contribution to this puzzle is a negative one. We can be certain that high creativity is not just a matter of 'breaking the rules'. There are many ways to break the rules of any genre: almost all of them are uninteresting and aesthetically unappealing. Geniuses need to know more, and to have this knowledge in a form that can control the generation of new ideas.


Allen, R. B. Generation of Verbal Descriptions and Action Sequences with Connectionist Networks. Mimeo, Bell Communications Research, Morristown, NJ, 1988.

Bateson, G., On Mind and Nature. London: Wildwood House, 1979.

Campbell, D., Blind variation and selective retention in creative thought as in other knowledge processes. Psychological Review, 1960, 67, 380-400.

Cork, C., Harmony by LEGO Bricks: A New Approach to the Use of Harmony in Jazz Improvization, Leicester: Tadley Ewing, 1988.

Dowling, W. J., Scale and contour: two components of a theory of memory for melodies. Psychological Review, 1978, 85, 341-354.

Forte, A., Tonal Harmony in Concept and Practice. 3rd Edn. New York: Holt, Rinehart, and Winston, 1979.

Gazdar, G., Klein, E., Pullum, G. and Sag, I., Generalized Phrase Structure Grammar. Oxford: Basil Blackwell, 1985.

Hanson, S. J. and Kegl, J., Grammar learning. In Proceedings of the Annual Conference of the Cognitive Science Society. Hillsdale, NJ: Lawrence Erlbaum Associates, 1987.

Helmholtz, H. von, Treatise on Physiological Optics. 3rd Edn. Optical Society of America, 1897.

Johnson-Laird, P. N., The Computer and the Mind: An Introduction to Cognitive Science. London: Fontana. Cambridge, MA: Harvard University Press, 1988a.

Johnson-Laird, P. N., Freedom and constraint in creativity. In Sternberg, R. J. (Ed.) The Nature of Creativity: Contemporary Psychological Perspectives. Cambridge: Cambridge University Press, 1988b.

Jordan, M. I., Serial order: A parallel distributed processing approach. Institute for Cognitive Science, University of California, San Diego, Report No. 8604, 1986.

Kuhn, T. S., The Structure of Scientific Revolutions. 2nd Edn. Chicago: University of Chicago Press, 1970.

Levitt, D. A., A melody description system for jazz improvization. M.Sc. Thesis, Department of Electrical Engineering and Computer Science, MIT, 1981.

Longuet-Higgins, H. C. and Lee, C. S., The rhythmic interpretation of monophonic music. Music Perception, 1984, 1, 424-441.

Marr, D., Vision. San Francisco: W. H. Freeman, 1982.

Parsons, D., The Dictionary of Tunes and Musical Themes. Cambridge: Spencer Brown, 1975.

Perkins, D. N., The Mind's Best Work. Cambridge, MA: Harvard University Press, 1981.

Roidinger, A., Der Kontrabass im Jazz. Reihe Jazz II. Vienna: Universal Edition, 1980.

Rumelhart, D. E. and McClelland, J. L., Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. I: Foundations. Cambridge, MA: Bradford Books, MIT Press, 1986.

Simonton, D. K., Genius, Creativity and Leadership: Historiometric Inquiries. Cambridge, MA: Harvard University Press, 1984.

Skinner, B. F., Science and Human Behavior. New York: Macmillan, 1953.

Steedman, M. J., A generative grammar for jazz chord sequences. Music Perception, 1982, 2, 52-77.

Sudnow, D., Ways of the Hand. London: Routledge & Kegan Paul, 1978.

Ulrich, J. W., The analysis and synthesis of jazz by computer. Fifth International Joint Conference on Artificial Intelligence, 865-872, 1977.

Representing Musical Structure


edited by


University College London London, UK


Royal Holloway and Bedford New College Egham, Surrey, UK


University of Cambridge Cambridge, UK.

ACADEMIC PRESS Harcourt Brace Jovanovich, Publishers

London San Diego New York Boston

Sydney Tokyo Toronto

ACADEMIC PRESS LTD 24-28 Oval Road, London, NW1 7DX, UK.

United States edition published by ACADEMIC PRESS INC

San Diego, CA 92101.

Copyright © 1991 by ACADEMIC PRESS LIMITED

All rights reserved. No part of this book may be reproduced in any form by photostat, microfilm, or any other means, without written permission from the publishers.

British Library Cataloguing in Publication Data Howell, P.

Representing Musical Structure 1. Music. Analysis

I. Title II. West, R. III. Cross, I. 781

ISBN 0-12-357171-5

Typeset by Mathematical Composition Setters Ltd, Salisbury, Wiltshire

Printed in Great Britain by T J Press Ltd, Padstow, Cornwall