
CREATIVITY, CHAOS AND ARTIFICIAL INTELLIGENCE

ANTHONY DEKKER
National University of Singapore
and
PAUL FARROW
University of Queensland

1. Introduction
What is creativity, and how is it produced? In this paper we provide a limited answer
to these questions, and provide suggestions for the design of a creative AI system.
We define creativity, somewhat similarly to George (1979), as the ability to
solve problems by generating novel ideas which do not follow from the problem by
formal deductive steps.

2. Creativity and chaos


We view ideas as coming from a 'universe' which is an N-dimensional space. In
other words, ideas can be described by a finite group of N numbers. Kohonen (1989)
describes how neural networks represent information in this way. The International
Classification of Diseases (ICD-9) classifies medical conditions in a similar fashion, as a
group of numbers describing etiology, symptoms, etc. In principle, all ideas could be
represented in this way, for N sufficiently large, although producing a sufficiently
general classification scheme would be impossible in practice. Problem-solving
requires searching this space for a particular point satisfying certain conditions.
A discrete formal system is restricted in its ability to do this because its rules
effectively confine it to a set of branching paths which cover only part of the space.
Indeed, this may explain the limitations of classical AI systems noted in Mero
(1990). Mero argues that the performance of AI systems in all areas plateaus at
about the same level, which he calls 'candidate master'. True expertise, which does
not simply consist of rules, seems to require something beyond a formal system.
Indeed, human experts can respond accurately to very complex problems by pattern
recognition without necessarily being able to describe their knowledge in the form
of rules.
A random search through an idea space will eventually reach sufficiently close
to the desired point, but may take a very long time, as a space containing all possible
ideas, even in a restricted domain, will be truly enormous. A mathematically chaotic
function (Gleick, 1987; Devaney, 1989) can supply a source of random numbers.
In order to rapidly reach the desired point we must mix deterministic rules with

T. Dartnall (ed.), Artificial Intelligence and Creativity, 217-231.
© 1994 Kluwer Academic Publishers.

chaos, giving a 'strange attractor'. Indeed, on reflection our brains must do this:
pure chaos and rigid rules both exclude creative thought. The relationship between
creativity and strange attractors is supported by the experiments on perception and
brain function reported in Freeman (1991) and May (1989). The theory of evolution
explains the creative origin of species by a similar mixture of randomness and rules.

3. Chaos and information


The mathematical study of chaos describes three possible outcomes for a dynamical
system: periodic attractors, chaos, and strange attractors. A periodic attractor can
be visualised as a fly in a room containing a honey-covered pendulum. The fly's
trajectory in space will converge deterministically to that of the pendulum. As a
result, the information in the fly's brain describing position, intended direction of
flight, etc. is lost. The data it is replaced with is known a priori, since it refers to
the pendulum, and hence contains no information, in the mathematical sense. A
collection of periodic attractors can act as a memory system, with convergence to
an attractor from a starting position performing the function of memory retrieval.
In contrast, a chaotic fly is one whose long-term behaviour is unpredictable,
even with detailed knowledge of its brain. As a result, information in the fly's brain
is replaced by new information, i.e. chaos is a way of generating new (random)
information.
A strange attractor combines the two cases above: convergence to a particular
region of space and the ability to act as a memory system is combined with
unpredictability and the generation of information. Weather patterns are often strange
attractors, combining an element of regularity with an unpredictability which makes
accurate long-term weather prediction impossible, even with powerful
computers. Nicolis (1991) argues that aspects of the brain's information-processing
behaviour can be modelled as strange attractors. In the remainder of the paper we
will show that mixing determinism with chaos allows us to generate new informa-
tion which is relevant in some way to the problem at hand.

4. A creativity algorithm
We can formalise our creative search strategy in simplified form as follows. Later
we show how to refine the details using neural networks.

repeat
repeat
randomly select a suitable point in the idea space
until that point is useful for deduction;
perform further deductive steps
until a solution is obtained
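This loop can be made concrete in Python. The sketch below instantiates it for a toy one-dimensional idea space; the predicates `useful`, `deduce` and `solved` and the numeric values are illustrative placeholders, not components of the systems described later in the paper.

```python
import random

def creative_search(useful, deduce, solved, sample, max_tries=10000):
    """Generate-and-test loop: randomly sample the idea space until a
    point useful for deduction is found, then take deductive steps;
    repeat until a solution is obtained."""
    for _ in range(max_tries):
        point = sample()            # inner repeat: random selection
        if not useful(point):
            continue                # not useful for deduction, resample
        state = deduce(point)       # perform further deductive steps
        if solved(state):
            return state            # outer repeat ends: solution obtained
    return None

# Toy instantiation: search [0, 1] for a point that "deduces" to 0.5.
random.seed(1)
result = creative_search(
    useful=lambda x: abs(x - 0.5) < 0.1,   # close enough to start deduction
    deduce=lambda x: round(x, 1),          # a trivial deductive refinement
    solved=lambda x: x == 0.5,
    sample=lambda: random.random(),
)
```

The seed makes the run reproducible; in the paper's proposal the random samples would instead come from a chaotic network, as described in the following sections.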

This closely resembles the 'PO' lateral thinking technique of de Bono (1971). As
an example, consider a monkey in a room containing various items, and a bunch of

Fig. 1. Chaotic neural network showing weight values.

bananas hanging from the ceiling out of reach. One route to achieving the solution
(i.e. the bananas) is to randomly stand on a box, and then deduce that the bananas
could be reached if the box were under them.

5. Chaos and neural networks

Biological neurons manipulate signals of continuously varying frequency in the


range of approximately 0-100 Hz, and can be modelled by a linear (weighting)
function of inputs composed with a sigmoid function. Within part of the input
range the neuron's response will be approximately linear, and Kohonen (1989)
shows how networks of neurons operating in their linear range can perform pattern
recognition and filtering of novel input. Pask and Curran (1982: 143-144) show how
neural novelty filters can recognise creativity in music performance. By combining
neurons acting in their linear range with neurons acting in their non-linear range
we can obtain chaotic behaviour. The Appendix shows the bifurcation diagram
characteristic of chaos which results as the weight μ is varied in the neural network
shown in Figure 1. Frequencies are represented in arbitrary units from -2 to 3, and
the sigmoid function used is f(x) = (-2x³ + 3x² + 36x - 6)/25. This function
is approximately linear in the range of frequencies 0 to 1, hence the choice of units.
For μ = 2.3 the output of the network converges to a single value, for μ = 2.6
it oscillates between two values, for μ = 3.12 it cycles through three values, and
for μ = 3.24 it is chaotic. These chaotic properties of the network will also hold
for slightly different choices of sigmoid function. Figure 2 shows the so-called
'butterfly effect', typical of chaos, for this network. The vertical axis shows two
series of outputs of the network, for initial input values of 0.5 and 0.49999. As
time progresses to the right, the initially indistinguishable outputs diverge more
and more, until they become completely unrelated. This extreme sensitivity to
microscopic differences is what makes chaotic systems unpredictable. In terms of
weather prediction, it means a butterfly flapping its wings could eventually lead to

Fig. 2. Butterfly effect for chaotic neural network.

the difference between a hurricane and a calm day, hence the name of the effect.
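The divergence shown in Figure 2 is easy to reproduce. Since the exact wiring of the network in Figure 1 is not reconstructed here, the sketch below uses the standard logistic map x -> r*x*(1-x), chaotic at r = 3.9, as a stand-in iteration; the initial values 0.5 and 0.49999 match those used for Figure 2.

```python
# Sensitive dependence on initial conditions (the "butterfly effect").
def iterate(x, r=3.9, steps=60):
    """Iterate the logistic map, returning the trajectory."""
    trajectory = []
    for _ in range(steps):
        x = r * x * (1 - x)
        trajectory.append(x)
    return trajectory

a = iterate(0.5)
b = iterate(0.49999)   # initial difference of only 1e-5
diffs = [abs(p - q) for p, q in zip(a, b)]
```

Early in the run the two trajectories are indistinguishable; after a few dozen steps they are completely unrelated, just as in Figure 2.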

6. Creativity and neural networks

Kohonen (1989), Linsker (1990) and Ritter and Schulten (1987) show how a one-
or two-dimensional neural network can self-organise by learning to topologically
map an N-dimensional space, effectively reducing the space to a collection of
individual concepts. Each neuron in the map responds to a greater or lesser extent
to N-tuples of numbers provided as input. Figure 3 shows such a mapping for a
three-dimensional space. Neurons are shown as the intersections of lines, positioned
at the point in the space they respond to most strongly. Adjacent neurons in the
network are connected to give a 'sheet', which the Kohonen learning process has
curved to fill the entire space. The surface defined by the mapping is a space-filling
curve with a fractal dimension between two and three. As a result, a region in
the space can be described by indicating the neuron closest to that region, i.e. the
neuron which responds most strongly to inputs from that region.
Such a neural map may also have associative linkages between its components,
which are not shown in Figure 3. These linkages are connections between neurons
with particular weight values, which in general are subject to change by learning.
They encode relationships between the concepts or regions of idea space which the
neurons represent.
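A minimal version of the Kohonen learning process can be sketched as follows. The neuron count, learning rate and neighbourhood radius are illustrative choices; a practical implementation would decay the learning rate and radius over time, as in Hecht-Nielsen (1990).

```python
import random

def train_som(data, n_neurons=10, epochs=200, lr=0.3, radius=2):
    """Train a 1-D Kohonen map over 2-D inputs: find the best-matching
    unit for each input and pull it and its neighbours towards the input."""
    random.seed(0)
    weights = [[random.random(), random.random()] for _ in range(n_neurons)]
    for _ in range(epochs):
        for x in data:
            # best-matching unit: neuron whose weight vector is closest
            bmu = min(range(n_neurons),
                      key=lambda i: sum((w - v) ** 2
                                        for w, v in zip(weights[i], x)))
            # move the winner and its topological neighbours towards x
            for i in range(max(0, bmu - radius),
                           min(n_neurons, bmu + radius + 1)):
                h = lr * (1 - abs(i - bmu) / (radius + 1))
                weights[i] = [w + h * (v - w) for w, v in zip(weights[i], x)]
    return weights

random.seed(42)
points = [[random.random(), random.random()] for _ in range(100)]
som = train_som(points)
```

After training, adjacent neurons respond to nearby regions of the input space, giving the topological 'sheet' described above.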
We can use such a map to perform the creative selection step in our algorithm
as follows:
1. Partially activate those regions of the network relevant to the problem using
the associative linkages between neurons.
2. Use a collection of N chaotic networks to produce an N-tuple of random
numbers.
3. Provide the N-tuple as input to the map, thus causing the activation of the neuron
in the network which responds most strongly. Neurons in the partially activated

Fig. 3. Artist's impression of creativity process (showing the activated region and the chosen point).

regions will be more likely to be activated, thus resulting in the selection of


a neuron in the network close to the chosen N-tuple and within an activated
region if possible. The result of this process is shown in Figure 3.
Such a use of neural networks allows us to make a random choice modified by
experience and with relevance to a particular problem. We believe that this is the
underlying mechanism of creativity. The usefulness of the choice can subsequently
be recognised either by pattern recognition, or by attempting to use it as a basis for
deduction.

7. A two-level system
Having outlined a possible neural implementation of creativity in the brain, we now
propose a two-level system that can apply this to practical AI. Unlike other two-
level systems, our main level is that of rule-based behaviour (a theorem-proving
system), and our meta-level is a neural network providing creativity. We assume
a hashing function (Aho et al., 1983) mapping terms to N-tuples of numbers, thus
providing a connection between the two levels. We also assume a neural network
which has self-organised by learning many terms, as well as developing associative

Given    a > 0, a < b, c > 0, a > 0 & b > 0 & c > 0 ⇒ f(a, b, c)
To show  f(a, b, c)
Axioms   1. s(x) > 0          2. s(x) > x
Rules    3. x < y ⇒ y > x     6. x > y ⇒ s(x) > s(y)
         4. x > y ⇒ y < x     7. x < y & y < z ⇒ x < z
         5. x > y ⇒ s(x) > y
Fig. 4. An example formal deduction system.

links between related terms. The network will thus cover the space of terms it has
been taught, with the associative links recording relationships between the terms.
The theorem-prover can operate mainly by formal deduction, but when it is not
able to choose a rule, it invokes the creative network. For example, consider the
problem shown in Figure 4.
Working backwards we can deduce that to prove f(a, b, c) it is sufficient to
prove b > 0, and using rule 3, this can be derived from 0 < b. We can now choose
either rule 4 or rule 7 to prove 0 < b, and in the face of uncertainty (since we don't
know that proving 0 < b is of benefit) we ask the network for a choice of rule.
Activating rules involving b, 0 and <, it is plausible that it will choose rule 7 (with
0 < a & a < b ⇒ 0 < b), thus creating 0 < a and a < b as subgoals. Since a < b
is given, all that remains is to prove 0 < a from a > 0 using rule 4.
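The hashing function assumed above, mapping terms to N-tuples of numbers, might be sketched as follows. The use of a cryptographic hash and the choice N = 3 are illustrative, not the scheme of Aho et al. (1983).

```python
import hashlib

def term_to_tuple(term, n=3):
    """Hash a logical term (as a string) to an N-tuple of numbers in
    [0, 1), giving the theorem-prover a coordinate in the neural map's
    idea space. Equal terms always map to equal tuples."""
    digest = hashlib.sha256(term.encode()).digest()
    # take n two-byte chunks of the digest and scale each to [0, 1)
    return tuple(int.from_bytes(digest[2 * i:2 * i + 2], 'big') / 65536
                 for i in range(n))
```

In the proposed system such tuples are the inputs presented to the self-organised map, connecting the symbolic level to the neural level.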

8. Experimental results-Experiment 1
Our simplest experimental example involves a universe of two-word sentences such
as 'eat chair', where each word is encoded as a number between 0 and 1. Sentences
are thus points in a 2-dimensional space, as shown in Figure 5. The first word in
each sentence provides the horizontal coordinate, and the second word provides the
vertical coordinate. A I-dimensional neural network of 42 neurons was trained to
map the following subset of 19 sentences:
KT climb table ST see table
KC climb chair SC see chair
KX climb box SX see box
GC get chair SB see banana
GX get box SO see orange
GB get banana EB eat banana
GO get orange EO eat orange
TB table banana XB box banana
TO table orange XO box orange
TX table box
The words 'table' and 'box' are being used here as both nouns and verbs. Figure
6 shows how the trained network spans the given set of sentences. Neurons are
shown as dots positioned at the point in the space they respond to most strongly, and

TX KX GX SX

KC GC SC

KT ST

TO XO EO GO SO

TB XB EB GB SB

Fig. 5. Training subset of sentence universe for Experiment 1.

adjacent neurons in the network are connected, producing a 2-dimensional version


of Figure 3. This network was obtained using the modified Kohonen learning
algorithm presented in Hecht-Nielsen (1990), starting from the initial network
shown in Figure 7.
Essentially the network provides an enumeration of the sentences, but a mean-
ingful one, resembling a library classification system in that similar sentences are
close together in the enumeration. This meaningful enumeration is especially useful
when coupling multiple networks together in stages, since the network essentially
simplifies a pair of numbers (presented as input) to a single number, representing
the position in the network of the neuron responding most strongly to the input.
In addition to the Kohonen learning process, associative links were established
between sentences which shared a word. These associative links are not shown in
Figure 6. In a more complex system, associative links would be expected to involve
semantic relationships, rather than the syntactic one used here.
We can express a version of the Monkey and Banana problem in terms of our
subset of sentences as:
ST see table
SX see box
SB see banana
EB eat banana

Fig. 6. Result of Kohonen learning process for Experiment 1.

Fig. 7. Initial network for Kohonen learning process.



Fig. 8. Random inputs to neural network for Experiment 1.

With the desired solution being:

GX get box
TX table box
KT climb table
KX climb box
GB get banana

The associative links between neurons were used to propagate activity to


problem-related neurons, by activating the neurons corresponding to the problem
sentences ST, SX, SB and EB, and propagating the activity along the associative
links. These four neurons were then inhibited to discourage their being chosen.
Random inputs were then provided to the network. The random numbers used were
obtained by selecting every 23rd value produced by the chaotic neural network
shown in Figure 1 (with μ = 3.2398). As a result of the propagated activities, a
random input halfway between the KT and GO neurons would result in activation
of the KT neuron, which is problem-related.
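The biased selection step can be sketched as follows. The neuron positions, activation values and bias strength are illustrative, chosen so that a random input halfway between KT and GO selects the problem-related KT, as described above.

```python
# Choosing a neuron: each neuron sits at a point in idea space; neurons in
# problem-related regions carry extra activation, which biases the choice.
# The bias weight 0.2 is an illustrative value, not taken from the paper.
def select_neuron(neurons, activation, random_input, bias=0.2):
    def score(name):
        x, y = neurons[name]
        dist = ((x - random_input[0]) ** 2
                + (y - random_input[1]) ** 2) ** 0.5
        # activated neurons win ties and nearby contests
        return dist - bias * activation.get(name, 0.0)
    return min(neurons, key=score)

neurons = {"KT": (0.2, 0.3), "GO": (0.8, 0.3)}   # positions are illustrative
activation = {"KT": 1.0}                         # KT is problem-related
winner = select_neuron(neurons, activation, (0.5, 0.3))  # halfway between
```

With no activation the choice would be a pure distance match; the propagated activity skews it towards problem-related regions.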

The neurons activated by the 50 random inputs to the network shown in Figure
8 corresponded to the following sentences:

Useful output:     TX  table box       16 times
                   GX  get box          9
                   KX  climb box        4
                   GB  get banana       3

Useless output:    TB  table banana     7
                   TO  table orange     3
                   SO  see orange       2
                   XB  box banana       1
                   EO  eat orange       1
                   SC  see chair        1

Random output:     GO  get orange       1
                   GC  get chair        1
                   KO  climb orange     1
Thus 64% of outputs were useful and problem-related, 30% were problem-
related but useless for this particular problem, and 6% were completely unrelated to
the problem. Note that one randomly-selected neuron corresponded to the untaught
sentence 'climb orange'. Although simple, this example shows how our algorithm
can make random choices which are usually of relevance to the problem at hand.

9. Experimental results-Experiment 2
Our second experiment involves a network of 120 neurons trained with all 50
sentences of the form x < y or x > y, where x and y are one of 0, 1, a, b or
c. These sentences were encoded as triples of numbers. Strong associative links
were provided between x < y and y > x, medium-strength links were provided
between x < x or x > x and 1 < 0 (which was used to represent 'False'), and
weak links were provided between x < y and y < z, and between x > y and
y > z. These associative links were intended to reflect knowledge about proofs
using inequalities. In a more sophisticated system, the associative links would be
created as a result of training with successful proofs. The particular problem at
hand is shown in Figure 9:

Given    a > 1, b > 1, b < c
To show  c > 0
Axiom    ZERO. 0 < 1
Rules    COM1. x < y ⇒ y > x      FLSE. x < x ⇒ 1 < 0
         COM2. x > y ⇒ y < x      TRAN. x < y & y < z ⇒ x < z
Fig. 9. Formal deduction system for Experiment 2.

Two possible proofs of c > 0 in this system are:



1. b > 1    given          1. b > 1    given
2. 1 < b    by COM2        2. 1 < b    by COM2
3. 0 < 1    axiom          3. b < c    given
4. 0 < b    by TRAN        4. 1 < c    by TRAN
5. b < c    given          5. 0 < 1    axiom
6. 0 < c    by TRAN        6. 0 < c    by TRAN
7. c > 0    by COM1        7. c > 0    by COM1

Useful non-trivial sub-goals in these proofs are 0 < b and 1 < c. Also useful are
b > 0 and c > 1, although proofs using these as subgoals will have unnecessary
uses of the rules COM1 and COM2.
The associative links between neurons were used to propagate a wave of activity
to problem-related neurons from the neurons corresponding to a > 1, b > 1,
b < c, c > 0 and 0 < 1, followed by a wave of inhibition of smaller duration.
Inhibition was also propagated from the neuron corresponding to 1 < 0. As a result,
activation was only provided to non-trivially problem-related neurons, which tended
to correspond to true or unprovable sentences. The neurons activated by 50 random
inputs to the network corresponded to the following sentences:
Useful output:          c > 1    10 times
                        1 < c     2

Unprovable output:      a < c    16
                        b < a     8
                        b > a     1

True, useless output:   0 < a     2
                        a > 0     1

Trivially true output:  0 < 1     1

False output:           b > c     5
                        0 > c     2
                        b < 0     1
                        c < b     1
Thus 24% of outputs formed useful subgoals, 50% were unprovable sentences,
18% were false, 6% were non-trivially true but not useful for the problem at hand,
and 2% were trivially true. Essentially this network produces moderately intelligent
guesses which tend not to be trivially true or false (only 20% of outputs fell into these
categories). Naturally the network cannot distinguish between true and unprovable
sentences. However, a practical theorem-proving system can use the responses as
tentative sub-goals, abandoning the proof and trying another guess if progress is
not made. There is also the possibility that attempting to prove a guess such as
a < c will result in incidental progress in the main proof.
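The wave of activity propagated along associative links can be sketched as spreading activation over a weighted graph. The link weights and decay factor below are illustrative guesses, not values from the experiment; only the link strengths' relative ordering (strong, medium, weak) follows the description above.

```python
# Spreading activation along associative links (synchronous update:
# each step propagates activity from nodes active at the previous step).
def propagate(links, sources, steps=2, decay=0.5):
    activity = {s: 1.0 for s in sources}
    for _ in range(steps):
        new = dict(activity)
        for (a, b), w in links.items():
            for src, dst in ((a, b), (b, a)):   # links are bidirectional
                if src in activity:
                    new[dst] = max(new.get(dst, 0.0),
                                   activity[src] * w * decay)
        activity = new
    return activity

links = {("a<b", "b>a"): 1.0,    # strong: x < y with y > x
         ("a<b", "b<c"): 0.3,    # weak: x < y with y < z
         ("b<c", "c>b"): 1.0}
act = propagate(links, ["a<b"])
```

Activity thus decays with distance from the problem sentences, so that strongly related neurons receive the most activation.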

10. Experimental results-Experiment 3

In our third experiment, we extended the universe of Experiment 2 to contain


the rules COM1, COM2 and TRAN, in addition to sentences of the form x < y
and x > y, and provided associative links between sentences and rules. Strong
associative links were provided between x < y and the COM1 and TRAN rules,
and between x > y and the COM2 rule. These additional associative links were
intended to reflect knowledge about sentences to which a rule could be applied. The
network of 120 neurons was trained with the 50 sentences and 3 rules, and used
as the basis of a simple theorem-prover, with the network providing the choice of
rule to use, as suggested in the proposed two-level system above. Initially the three
givens and the axiom were taken as already proved.
The associative links between neurons were used to propagate a wave of activity
to problem-related neurons from the neurons corresponding to sentences already
proved, and the goal sentence (c > 0). Inhibition was also propagated from the
neuron corresponding to 1 < 0, to discourage false sentences. In order to choose
a rule, neurons not corresponding to rules were then inhibited. If the rule chosen
was COM1, the list of sentences already proved was scanned, and y > x was
added for every sentence x < y already proved. Similarly if the rule chosen was
COM2, y < x was added for every sentence x > y already proved. In either case,
the rule chosen was inhibited on the next step, to prevent it being chosen twice in
succession.
If the rule chosen was TRAN, a random sentence x < y was also chosen, by
propagating activity from the neurons corresponding to sentences already proved, as
well as the goal sentence and the TRAN rule itself. Inhibition was again propagated
from the neuron corresponding to 1 < 0, to discourage false sentences. This time,
to avoid choosing a rule, neurons corresponding to rules were inhibited. If the randomly
chosen sentence was already proved (which the use of propagated activities was
designed to encourage), the list of sentences already proved was then scanned, and
z < y or x < z was added for every sentence z < x or y < z already proved. The
entire proof process was continued until the goal (c > 0) was proved.
The proof of c > 0 produced by the network was as follows:
Initially proved (givens and axiom):  a > 1, b > 1, b < c, 0 < 1
Rule chosen: COM1    proving: c > b, 1 > 0
Rule chosen: COM2    proving: 1 < a, 1 < b
Rule chosen: COM1    not applicable
Rule chosen: TRAN
    with sentence: 1 < b    proving: 0 < b, 1 < c
Rule chosen: COM1    proving: b > 0, c > 1
Rule chosen: COM2    not applicable
Rule chosen: TRAN
    with sentence: 0 < 1    proving: 0 < a, 0 < c
Rule chosen: COM1    proving: a > 0, c > 0
Performance was clearly quite good, at least for this simple example. For more
complex proofs, this simple theorem-prover could be combined with one which
worked backwards from the goal, and a network, such as that described in Ex-
periment 2, which would guess sensible sub-goals, thus breaking the problem into
manageable parts.
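The deductive closure that the prover explores can be checked by a simple forward-chaining sketch of the COM1, COM2 and TRAN rules from Figure 9. This computes the full closure rather than the network-guided rule ordering of the experiment, so it confirms that c > 0 is derivable without modelling the neural choices.

```python
# Forward-chaining closure under COM1, COM2 and TRAN from Figure 9,
# starting from the givens and axiom of Experiment 3.
# Facts are tuples (relation, left, right), e.g. ('<', 'b', 'c').
def closure(facts):
    facts = set(facts)
    while True:
        new = set()
        for (rel, x, y) in facts:
            if rel == '<':
                new.add(('>', y, x))          # COM1: x < y => y > x
            else:
                new.add(('<', y, x))          # COM2: x > y => y < x
        for (r1, x, y) in facts:
            for (r2, y2, z) in facts:
                if r1 == r2 == '<' and y == y2:
                    new.add(('<', x, z))      # TRAN: x < y & y < z => x < z
        if new <= facts:                      # nothing new: closure reached
            return facts
        facts |= new

proved = closure({('>', 'a', '1'), ('>', 'b', '1'),
                  ('<', 'b', 'c'), ('<', '0', '1')})
```

Since the symbol set is finite, the closure terminates; it contains the goal c > 0 as well as the intermediate sentences appearing in the trace above.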

11. Experimental results-Experiment 4


Our final experiment involves a network of 12 neurons trained with words encoded
as 6-tuples of numbers giving the length, first 4 letters, and last letter. The network
was trained with the sentence:
what is creativity and how is creativity produced

Since only 5 letters of a word were encoded, unknown letters were replaced
with 'z' to give:
what is creazzzzzy and how is creazzzzzy prodzzzd
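The padding scheme can be sketched directly. The sketch produces the padded string form shown above; the experiment further mapped each word to a 6-tuple of numbers (length, first four letters, last letter), which is omitted here.

```python
# Word encoding used in Experiment 4: keep the length, the first four
# letters and the last letter; unknown interior letters become 'z'.
def encode_word(word):
    if len(word) <= 5:
        return word                 # short words are fully encoded
    return word[:4] + 'z' * (len(word) - 5) + word[-1]
```

For example, 'creativity' encodes to 'creazzzzzy' and 'produced' to 'prodzzzd', matching the training sentence above.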

The network learned to map these words, and associative links were provided
between words which followed each other in the training sentence. Each random
input to the network was biased by the associative link with the previous output,
producing the following result:
how is creazzzzzy and how is creazzzzzy and
how is creazzzzzy prodzzzd
what is creazzzzzy prodzzzd
nrrkzi
and how is creazzzzzy prodzzzd
nrrkzi nrrkzi prodzzzd nrrkzi prodzzzd
what is creazzzzzy prodzzzd

Effectively, the network has learned to babble randomly, based on the text
provided to it. Output begins with a random word, which is followed by other
words which plausibly follow it. Since there is no information on words following
'produced', that word is followed by another random word (which in some cases is
the untaught word 'nrrkzi'), and the process is repeated.
Better results would be obtained by encoding words in a three-stage process:
first phonemes encoded as described in Kohonen (1989), then syllables encoded
as groups of phonemes, and finally words encoded as groups of syllables. This
would make similarity between words more meaningful than in this experiment.
Associative links should also reflect semantic links to shared concepts and e.g.
visual input, in addition to grammatical information from sample speech. The
grammatical information itself should provide more context than the immediately
preceding word. With such extensions, the network should creatively produce
plausible speech. This might provide a way of automating some of the functions of
politicians.

12. Conclusion
We have suggested a technique for using experience to guide random choice in
a neural network. We believe that this technique provides a step in the direction
of machine intelligence and creativity. We have demonstrated by experiment how
it can be used to provide problem-related guesses which are useful in theorem-
proving and other areas. The final word on what aspects of human behaviour are
amenable to automation will, however, probably come only after many more years
of experimentation with artificial neural networks.

Acknowledgements
Many thanks are due to Marilyn Ford, Takashi Kato and Terry Dartnall for useful
discussions on neural networks; to Andrew Rock, whose GetThePicture desk ac-
cessory was used to produce Figures 2, 5, 6, 7 and 8 from computer output; and to
Pushkar Piggott, who provided valuable programming assistance with Experiment
4 and helpful comments on the text of the paper.

References
Aho, A. V., Hopcroft, J. E. and Ullman, J. D.: 1983, Data Structures and Algorithms, Addison-Wesley.
Aihara, K. and Matsumoto, G.: 1986, Chaotic oscillations and bifurcations in squid giant axons, in
Holden, A. V. (ed.), Chaos, Manchester University Press, pp. 257-269.
de Bono, E.: 1971, Lateral Thinking for Management, Penguin.
Devaney, R. L.: 1989, An Introduction to Chaotic Dynamical Systems, Addison-Wesley.
Freeman, W. J.: 1991, The physiology of perception, Scientific American, 264(2): 34-41.
George, F. H.: 1979, Philosophical Foundations of Cybernetics, Abacus Press, Tunbridge Wells,
Kent.
Gleick, J.: 1987, Chaos: Making a New Science, Cardinal/Sphere Books.
Hecht-Nielsen, R.: 1990, Neurocomputing, Addison-Wesley.
Kohonen, T.: 1989, Self-Organization and Associative Memory, Springer-Verlag, Berlin.
Linsker, R.: 1990, Self-organization in a perceptual system: how network models and information
theory may shed light on neural organization, in Hanson, S. J. and Olson, C. R. (eds), Connectionist
Modeling and Brain Function, MIT Press, pp. 351-392.
May, R.: 1989, The chaotic rhythms of life, New Scientist, 18 November, pp. 21-25.
Mero, L.: 1990, Ways of Thinking: The Limits of Rational Thought and Artificial Intelligence, World
Scientific.
Nicolis, J. S.: 1991, Chaos and Information Processing: A Heuristic Outline, World Scientific.
Pask, G. and Curran, S.: 1982, Micro Man, Century Publishing, London.
Ritter, H. and Schulten, K.: 1987, Extending Kohonen's self-organizing mapping algorithm to learn
ballistic movements, Proceedings of the NATO Advanced Research Workshop on Neural Comput-
ers, Springer-Verlag, Berlin, pp. 393-406.

13. Appendix-Bifurcation diagram for neural network

[Bifurcation diagram: network output (horizontal axis, 0.1 to 1.0) plotted against the weight μ (vertical axis, 2.240 to 3.240 in steps of 0.020), showing the period-doubling route to chaos.]
