0092-8240/85 $3.00 + 0.00 Pergamon Press Ltd. © 1985 Society for Mathematical Biology
CHAOTIC DYNAMICS OF INFORMATION PROCESSING: THE "MAGIC NUMBER SEVEN PLUS-MINUS TWO" REVISITED
JOHN S. NICOLIS, Department of Electrical Engineering, University of Patras, Greece
ICHIRO TSUDA, 'Bioholonics', Nissho Bldg. 5F, Koishikawa 4-14-24, Bunkyo-Ku, Tokyo 112, Japan

In a well-known collection of his essays in cognitive psychology, Miller (The Psychology of Communication. Penguin, 1974) describes in detail a number of experiments aiming at a determination of the limits (if any) of the human brain in processing information. He concludes that the 'channel capacity' of human subjects does not exceed a few bits, or that the number of categories of (one-dimensional) stimuli from which unambiguous judgment can be made is of the order of 'seven plus or minus two'. This 'magic number' holds also, Miller found, for the number of random digits a person can correctly recall in a row and also for the number of sentences that can be inserted inside a sentence in a natural language and still be read through without confusion. In this paper we propose a dynamical model of information processing by a self-organizing system which is based on the possible use of strange attractors as cognitive devices. It comes as an amusing surprise to find that such a model can, among other things, reproduce the 'magic number seven plus-minus two' and also its variance in a number of cases and provide a theoretical justification for them. This justification is based on the optimum length of a code which maximizes the dynamic storing capacity for the strings of digits constituting the set of external stimuli. This provides a mechanism for the fact that the 'human channel', which is so narrow and so noisy (of the order of just a few bits per second or a few bits per category), possesses the ability of squeezing or 'compressing' a practically unlimited number of bits per symbol, thereby giving rise to a phenomenal memory.
1. Introduction.
Central amongst the aims of cognitive psychology is the evaluation of human performance during the execution of mental tasks. In order to measure this performance in terms of bits of information it is necessary to regard the subject as an information channel. To this end a well-defined, finite set of alternative stimuli (input) is provided by the experimenter. Such stimuli may be strings of digits, letters, words, geometrical symbols, tones, pictures, etc. To each of these stimuli a definite a priori probability of occurrence is assigned (although in practice, as far as the subject is concerned, all stimuli are presented with equal probability). Then, a well-defined, finite set of alternative responses (output) is selected
(motor, vocal, reading, calculating, writing, etc.). Finally, the stimuli are presented to the subject in a randomized order at a steady rate in time and the subject's response to each stimulus is recorded. From a large, statistically significant sample of recorded data the measure of the information transmitted by the human channel can be evaluated as the input-output (stimulus-response) cross-correlation. The 'channel capacity' of the subject is the upper limit of the extent to which the subject can match his responses to the given set of stimuli. Instead of increasing the rate of stimulus presentation the experimenter, giving the subject all the time he wants, may choose to increase the number of alternative stimuli among which the subject has to discriminate and determine the critical set $N_c$ beyond which errors begin to occur. The $\log_2 N_c$ bits define under these circumstances the channel capacity.

A tremendous number of experiments using the above methodology have been performed in the past 40 years, and a very lucid summing up of some of the most representative cases has been offered by Miller (1974) in a classic collection of seven essays. In essence these essays refer to three distinct attributes of a human subject seen as an information processor: the determination of the span of immediate attention or short-term memory; the span of absolute judgement; and the limit of self-embedding in natural language. Let us review quickly Miller's results:

1.1. The span of immediate attention. This attribute refers to the maximum number of digits (the longest string) a normal adult can repeat without error. This number is about 7-8 digits. This 'memory span' is limited by the number of symbols and not by the amount of information that these symbols may represent.

1.2. The span of absolute judgement. This has to do with the maximum number of alternative stimuli beyond which confusion and errors are reported.
The alternative stimuli may differ mutually in one or more parameters (dimensions). No matter how many alternative stimuli (differing along one dimension only, e.g. pitch) the subject is asked to evaluate, the best we expect him to do is to assign them to about 6-7 different classes without error. Or, if we know that there are $N$ alternative stimuli, then the subject's judgement enables us to narrow down (or 'compress') the particular stimulus to one out of $\sim N/7$. The addition of dimensions to the stimulus (pitch and loudness) increases the channel capacity but at a decreasing rate. (This happens even when the dimensions along which the stimuli differ are not independent.) In other words we increase the total capacity at the expense of accuracy for any particular dimension (parameter). Therefore we can make relatively crude judgements of several things simultaneously. It is tempting to assume that in the course of evolution the success of
the organism has to do with the ability to respond to the widest possible range of stimuli. In order to survive in a fluctuating environment it is better to have little information about a lot of things than to have a lot of information about few things (e.g. a restricted part of the environment). At any rate it seems that there is a clear and definite limit in the accuracy with which we can identify absolutely the magnitude of a unidimensional stimulus variable: we call this limit, after Miller, the 'span of absolute judgement'. This is usually somewhere in the neighbourhood of seven alternative stimuli (or roughly 2.5 to 2.8 bits, given that $\log_2 7 \approx 2.8$). So there is a span of absolute judgement that can distinguish about seven categories and there is a span of attention that can encompass about seven objects or symbols at a glance. Miller is quick to appreciate, however, that in spite of the appearance of the 'magical number seven plus-minus two' in both human attributes, the span of absolute judgement and the span of immediate memory are quite different kinds of limitations imposed on our ability to process information: absolute judgement is limited by the amount of information while immediate memory is limited by the number of items. Using hierarchical coding we can in principle squeeze more and more bits onto a given item. So, the span of immediate memory seems to be almost independent of the number of bits per symbol; with proper coding our ability to process information in terms of channel capacity may very well increase from 2.5 to 10 or 25 bits (Miller, 1974). In the words of Miller the 'magical number seven' may very well be only "a pernicious Pythagorean coincidence". In spite of the above cautionary remarks we intend (referring to a given hierarchical level of cognition) to pursue the above 'coincidence' as a symptom of a deeper and more fundamental underlying dynamics.
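The channel-capacity measurements summarized above rest on estimating how much stimulus information survives in the responses. A minimal sketch of such an estimate follows, assuming the stimulus-response data are tallied in a confusion matrix and that 'transmitted information' is computed as the mutual information between stimulus and response; the function name and the toy matrices are illustrative and not taken from Miller's essays.

```python
import math

def transmitted_information_bits(counts):
    """Mutual information (bits) between stimulus and response.

    counts[i][j] is the number of trials on which stimulus i drew response j
    (a hypothetical tally; any stimulus-response confusion matrix will do).
    """
    total = sum(sum(row) for row in counts)
    n_resp = len(counts[0])
    p_stim = [sum(row) / total for row in counts]
    p_resp = [sum(row[j] for row in counts) / total for j in range(n_resp)]
    info = 0.0
    for i, row in enumerate(counts):
        for j, c in enumerate(row):
            if c > 0:
                p_joint = c / total
                info += p_joint * math.log2(p_joint / (p_stim[i] * p_resp[j]))
    return info

# Eight stimuli identified without error: log2(8) = 3 bits are transmitted.
perfect = [[10 if i == j else 0 for j in range(8)] for i in range(8)]
# Responses unrelated to the stimuli: nothing is transmitted.
guessing = [[5 for _ in range(8)] for _ in range(8)]

perfect_bits = transmitted_information_bits(perfect)
guessing_bits = transmitted_information_bits(guessing)
```

In this picture the channel capacity is the value at which the transmitted information levels off as the number of alternative stimuli is increased.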
The fact is that our pitifully small capacity for processing information has made it necessary for us to develop ways of 'recoding' or abstracting the essential features of the perceived stimuli in a hierarchical fashion, thereby expressing our experiences in the form of simple 'laws' or with brief words: it is this demonstrable ability to dynamically store and subsequently compress variety, or to map information from a higher to a lower dimensionality state space, which gives us the impetus to propose (next section) a dynamical model which in principle takes care, in one stroke, of several seemingly unrelated attributes of human information processing.

1.3. The recurrent (self-referential) character of natural (and artificial) languages, and the maximum number of discriminable self-embeddings. As we know, learning a language does not involve memorization of particular strings of symbols but rather knowledge of rules for generating admissible strings of symbols (Miller, 1974).
These rules are given in terms of highly nonlinear recursive algorithms. In artificial languages we also possess recursive subroutines whose use aims at increasing resolution via successive iterations. Occasionally the main routine is interrupted to perform a subroutine and then we return to the main routine when the subroutine is finished. It is possible also for one subroutine to refer to another subroutine. Let us see, however, what happens when a subroutine is referring to itself. The 'computer' interrupts the main routine to go into subroutine S. In the middle of executing S, however, before S is completed, the 'computer' is instructed to stop what it is doing and execute subroutine S. No difficulty arises until subroutine S is completed and the 'computer' must decide where to resume its work. Unless special care is taken in writing the programme, the 'computer' will not be able to remember that the first time it finishes subroutine S it must re-enter subroutine S and the second time it finishes subroutine S it must return to its main routine. The situation gets more difficult with subroutine S calling on itself repeatedly; for, each time it does so, a new re-entry address has to be remembered. It is also a feature of natural language that sentences can be inserted inside sentences. Miller gives the following example: The King who said "My Kingdom for a horse" is dead (one embedding). The person who cited "The King who said "My Kingdom for a horse" is dead" as an example is a psychologist (two embeddings). The audience who just heard "The person who cited "The King who said "My Kingdom for a horse" is dead" as an example is a psychologist" is very patient (three embeddings). As the recursive process goes on beyond, say, the fourth or the fifth self-embedding, 'chaos' sets in literally and actually; human subjects are thus also very poor at dealing intelligibly with more than a few recursive interruptions.
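The bookkeeping problem described above for a subroutine that interrupts itself can be made concrete with a small sketch. It assumes the usual remedy of an explicit stack of re-entry addresses; the function name and the labels 'main' and 'S#k' are hypothetical and serve only to illustrate the order in which the pending returns must be remembered.

```python
def run_self_calling_subroutine(self_calls):
    """Simulate subroutine S interrupting itself `self_calls` times.

    Each interruption pushes the place to resume onto a stack; each
    completion of S pops the most recent one. Returns the largest number
    of pending re-entry addresses and the order in which they are used.
    """
    return_stack = ["main"]                  # the main routine calls S
    for level in range(1, self_calls + 1):
        return_stack.append(f"S#{level}")    # activation `level` of S calls S again
    max_pending = len(return_stack)
    resume_order = []
    while return_stack:
        resume_order.append(return_stack.pop())  # last interrupted, first resumed
    return max_pending, resume_order

# One self-call, as in the example above: the first completion of S resumes
# the interrupted S, the second completion returns to the main routine.
pending, order = run_self_calling_subroutine(1)
```

The number of addresses that must be held at once grows with every further self-call, which is the load that the self-embedded sentences of Miller's example place on the reader.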
We may ask now: Is there any common feature in the attributes 1.1-1.3? In this paper we propose a dynamical model of an information processing scheme which ensures compressibility of (environmental) strings of digits and also fixes an optimum resolution (that is, an optimum code word-length) which maximizes storage capacity. We show in specific examples that the optimum (average) length of such a code is of the order of 4-8 bits. Furthermore, the proposed scheme ensures that increasing resolution beyond a certain limit amplifies microscopic noise, thereby destroying compressibility.
2. Possible Role of Dissipative Chaotic Dynamics in Information Processing: Generalities. Reliable information processing rests upon the existence
of a 'good' code or language: namely a set of recursive rules which generates and stores variety (strings of symbols) at a given hierarchical level and subsequently compresses it, thereby revealing information at a higher level. To accomplish this task a language, like good music, should strike at every moment an optimum ratio of variety (stochasticity) vs the ability to detect and correct errors (memory). Are there any dynamics available today which might model this dual objective in state space? The answer is, in principle, yes (see for example Nicolis, 1982a, b, 1984, 1985; and Nicolis et al., 1983). Consider the class of dynamical systems described by at least three coupled first-order ordinary nonlinear differential equations whose repertoire includes (for different sets of values of the control parameters) multiple steady states, stable periodic orbits (limit cycles), tori and strange chaotic attractors. We may consider that variety is generated when the volume in the state space of the system expands along certain directions through the dynamical evolution (thereby decreasing resolution) and gets compressed when the volume in state space occupied by the flow contracts (increasing resolution) along other directions towards a 'compact' ergodic set, the attractor. A strange attractor of a three-dimensional system, for example, creates variety along the direction of its positive Lyapounov exponent $\lambda_+$ and constrains variety (thereby revealing information) along the direction of its negative Lyapounov exponent $\lambda_-$. Since $|\lambda_-| > |\lambda_+|$ the attractor is acting literally as a 'compressor', on the average. (The Lyapounov exponents determine the average amount of information production ($\lambda > 0$) or dissipation ($\lambda < 0$) per unit time.) The instability of motion on a chaotic attractor is conveniently illustrated in the one-dimensional map on the interval which is constructed as a Poincaré return map of the attractor.
This type of 'analog-to-digital conversion' is accomplished by parametrizing the attractor along a one-dimensional cut and plotting the position at which a trajectory crosses the cut vs the position at which it crosses the next time around the attractor. This may give rise to a Markov chain whose number of states depends on the partition of the interval (for a simple example see Appendix A). Following Shaw (1981), we point out that the change in observable information is generally given by the $\log_2$ of the ratio of the numbers of states $N(t)$ indistinguishable before and after some time interval,

$$\Delta I = \log_2 \frac{N_f}{N_i}$$

or

$$\Delta I \simeq \log_2 \frac{V_f}{V_i},$$
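where $V_f$ and $V_i$ are the final and initial volumes in state space. The rate of information creation or dissipation is given (up to the constant factor $1/\ln 2$ that converts natural units to bits) as

$$\frac{dI}{dt} = \frac{1}{V}\frac{dV}{dt} = \frac{1}{\Sigma}\frac{d\Sigma}{dt}.$$

In nonchaotic systems the sensitivity of the flow on the initial conditions grows with time at most polynomially, say as $\Sigma(t) \sim t^n$; then

$$\frac{dI}{dt} \sim \frac{n}{t}.$$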
The variety creation rate of such a system converges to zero as time passes, so the system's behaviour is predictable. Concomitantly, the set of generated strings is formed by rational numbers. In chaotic systems the sensitivity of the flow on the initial conditions grows with time exponentially, say as $\Sigma(t) \sim e^{nt}$, and $dI/dt \sim n$. Such a system is a continuous information source. This information is not implicit in the initial conditions but is generated by the flow itself in state space. A chaotic dynamics creates strings of digits which constitute aperiodic oscillations or segments of irrational, noncomputable numbers. If such a chaotic system is observed by another cognitive system at a sampling rate less than $n$, no prediction of the system is possible. We are interested here in cognitive systems characterized by nonconservative compact flows where the phenomenon of attraction, that is of dimensionality compression (see next section), is possible. Let us argue in terms of an appropriately constructed Poincaré map for the flow. For a typical one-dimensional map on the interval, $y = F(x)$, the probability density $P(x)$ for finding the orbit at $x$ can be estimated via successive iterations from the recursive relation
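$$P_n(y) = \sum_{i=1}^{2^{n-1}} \frac{P_1(x_i)}{\left|\dfrac{dF^{\,n-1}}{dx}\right|_{x=x_i}}, \qquad x_i = \left(F^{\,n-1}\right)^{-1}(y), \quad (n = 2, 3, \ldots), \qquad x(n+1) = F[x(n)],$$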
where the first p.d.f. $P_1(x)$ can be taken arbitrarily and the resulting $P(y)$, corresponding to the initial $P_1(x)$ transformed by the action of the map, is the first approximation. Successive iterations of the map will produce closer and closer approximations to the correct asymptotic function $P(x)$. If the map possesses a stable steady state or a number of stable periodic orbits, $P(x)$ will converge to a sharp $\delta$-function spike or a series of $\delta$-like functions on the periodic points. In the chaotic regime of the control parameters the map will possess a continuous $P(x)$ with interspersed spikes. The information change $\Delta I$ per iteration of the particular initial point $x$ on the interval will be determined as $\Delta I = \log_2 |dF/dx|$ and will amount to variety creation for slopes $> 1$ and information dissipation for slopes $< 1$. The average information change over the whole interval $0 \le x \le 1$ will be given by

$$\overline{\Delta I} = \int_0^1 P(x)\,\log_2\left|\frac{dF(x)}{dx}\right| dx \quad \text{bits}.$$
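Both steps just described, the relaxation of an arbitrary starting density under the map and the average information change per iteration, can be tried out numerically. The sketch below rests on assumptions that are not in the original: the density iteration is written for the map $F(x) = 2x \bmod 1$, whose two preimages are known in closed form, and the average of $\log_2|dF/dx|$ is estimated for the logistic map $F(x) = 4x(1-x)$ as a time average along one orbit, which presumes ergodicity. All function names are illustrative.

```python
import math

def bernoulli_density_step(p):
    """One density-transfer step for F(x) = 2x mod 1.

    Every y has the two preimages y/2 and (y+1)/2, and |dF/dx| = 2 at both.
    """
    return lambda y: 0.5 * (p(0.5 * y) + p(0.5 * (y + 1.0)))

def iterate_density(p1, steps):
    """Apply the density-transfer step `steps` times to the starting p.d.f. p1."""
    p = p1
    for _ in range(steps):
        p = bernoulli_density_step(p)
    return p

def mean_information_change(f, slope, x0, n_iter, n_skip=100):
    """Orbit average of log2|dF/dx|, used here in place of the integral over P(x)."""
    x = x0
    for _ in range(n_skip):          # discard a short transient
        x = f(x)
    total = 0.0
    for _ in range(n_iter):
        total += math.log2(abs(slope(x)))
        x = f(x)
    return total / n_iter

# An arbitrary starting density P1(x) = 2x flattens towards the uniform density.
p10 = iterate_density(lambda x: 2.0 * x, 10)
flattened_value = p10(0.9)

# Logistic map at full chaos: about one bit of variety created per iteration.
logistic_bits = mean_information_change(
    lambda x: 4.0 * x * (1.0 - x),
    lambda x: 4.0 * (1.0 - 2.0 * x),
    0.3,
    50000,
)
```

The orbit average is only a stand-in for the integral over the asymptotic density; for maps with several coexisting attractors the two need not agree.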
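If $t(x)$ stands for the time between successive crossings through the return map, the average information production (or dissipation) rate can be defined as

$$\overline{\frac{dI}{dt}} = \int_0^1 \frac{P(x)}{t(x)}\,\log_2\left|\frac{dF(x)}{dx}\right| dx \quad \text{bits/sec}.$$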
Finally, if the initial condition from which the process starts is not known exactly (which is always the case),* i.e. it is determined by an a priori probability density distribution $P_0(x)$, one gets for the informational value of this initial condition the expression

$$S = \int_0^1 P_0(x)\,\log_2\left|\frac{P_0(x)}{P(x)}\right| dx \quad \text{bits}.$$

Then the 'memory' of the processor, measured as the time elapsing until the flow gets completely disconnected from the initial conditions, can be defined as

$$T = \frac{\displaystyle\int_0^1 P_0(x)\,\log_2\left|\frac{P_0(x)}{P(x)}\right| dx}{\displaystyle\int_0^1 \frac{P(x)}{t(x)}\,\log_2\left|\frac{dF(x)}{dx}\right| dx} \quad \text{sec}.$$

* This initial condition is a point $x_0$ on the unit interval. It can be expressed as a digital number .0110011100111001...; with overwhelming probability this number is incompressible.
Beyond this limit, the iterative process becomes so fine that the processor essentially amplifies intrinsic microscopic noise, owing to the great resolution created by the cascade of iterations. This noise cannot be 'smeared out' any longer and declares itself on the cognitive level, thereby rendering any further measurement or observation unreliable. Computer simulation (Shaw, 1981) shows that in specific cases the number of iterations after which $T$ is reached may be rather small ($\approx$ 20-30).
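A back-of-the-envelope version of the memory time may help fix orders of magnitude. The sketch assumes, purely for illustration, an initial density that is uniform on a window of given width, a uniform asymptotic density on the unit interval, and one return-map crossing per unit time, so that $T$ comes out in iterations; the window width of $2^{-24}$ and the function names are hypothetical choices.

```python
import math

def initial_information_bits(window_width):
    """Information in an initial condition known to within `window_width`.

    Assumes P0 uniform on the window and P uniform on [0, 1], in which case
    the integral of P0 * log2(P0 / P) reduces to log2(1 / window_width).
    """
    return math.log2(1.0 / window_width)

def memory_time(initial_bits, bits_per_iteration):
    """Iterations until the initial information has been used up."""
    return initial_bits / bits_per_iteration

# A 24-bit initial condition processed by a map that creates 1 bit per
# iteration is forgotten after 24 iterations.
s_bits = initial_information_bits(2.0 ** -24)
t_iterations = memory_time(s_bits, 1.0)
```

With initial conditions specified to a few tens of bits and maps producing about one bit per iteration, the estimate lands in the range of a few tens of iterations.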
$$\eta = \frac{N - D_I}{N},$$

where $N$ is the dimensionality of the state space in which the attractor is embedded as a compact subset. Formally, in order to calculate $D_I$ one proceeds as follows: one partitions the $N$-dimensional state space into $n(\varepsilon)$ hypercubes of size (degree of resolution) $\varepsilon$, through each of which the flow (or 'the trace of the scanning path' of the processor) passes at least once. Let $P_i(\varepsilon)$ be a number proportional to the relative frequency with which the scanner visits the specific hypercube $i$. Then the 'entropy' of the digitized pattern can be calculated as

$$S(\varepsilon) = -\sum_{i=1}^{n(\varepsilon)} P_i(\varepsilon)\,\log_2 P_i(\varepsilon) \quad \text{bits},$$
and as the resolution becomes finer and finer one may be interested in calculating how the distribution $P_i(\varepsilon)$ scales with resolution. One could then obtain the 'information dimension', or the bits required for the determination of any point on the 'scanning curve' of the processor, as

$$D_I = \lim_{\varepsilon \to 0} \frac{\sum_{i=1}^{n(\varepsilon)} P_i(\varepsilon)\,\log_2 P_i(\varepsilon)}{\log_2 \varepsilon},$$
i.e. the asymptotic value of the slope of entropy vs resolution. Actually, it turns out that the calculation of $D_I$ is more feasible if, instead of starting from the above expression, one uses the relationship between the information dimensionality and the spectrum of the Lyapounov exponents of the flow (or the discrete map). In addition, it is important to calculate not only $D_I$ but also the standard deviation $\Delta D_I$ as the set of second moments of a distribution whose first moments are the Lyapounov exponents (see Appendix B). Such a quantity would indicate how much fuzziness enters in the average degree of compressibility of information $\eta = (N - D_I)/N$ realized by the attractor in $N$-dimensional space. Since compressibility of a string of digits is the necessary prerequisite for the subsequent simulation of the time series involved, the importance of strange attractors as information processing models cannot be overlooked. Indeed, for an attractor simulating some aspects of a cognitive system one has two basic requirements: big storage capacity and good compressibility. Let us turn first to the simplest attractors (in any $N$-dimensional state space), namely stable steady states and stable limit cycles, having information dimensionality of zero and one respectively. Consequently they are very poor as information storage units, but since they possess only nonpositive Lyapounov exponents they are ideal as information compressing devices. Strange attractors on the other hand, via a harmonious combination of positive and negative Lyapounov exponents $\lambda_+$, $\lambda_-$, can in principle comply somewhat with both requirements. They may possess a considerable information dimension, and this property makes them suitable for information storage. On the other hand, being 'attractors', i.e. possessing also negative Lyapounov exponents $\lambda_-$ such that $|\lambda_-| > \lambda_+$, they can serve at the same time as information compressors.
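The box-counting recipe for the information dimension can be sketched directly, without the Lyapounov-exponent shortcut mentioned above. The code assumes a finite sample of points stands in for the attractor and estimates the slope of entropy against $\log_2(1/\varepsilon)$ by least squares over a handful of finite resolutions; the function names and the two test sets (points on a line segment and on a filled square) are illustrative and were chosen because their dimensions are known.

```python
import math
from collections import Counter

def box_entropy_bits(points, eps):
    """Entropy (bits) of the box-visiting frequencies P_i(eps)."""
    boxes = Counter(tuple(math.floor(c / eps) for c in p) for p in points)
    total = len(points)
    return -sum((k / total) * math.log2(k / total) for k in boxes.values())

def information_dimension(points, eps_values):
    """Least-squares slope of entropy vs log2(1/eps) over the given resolutions."""
    xs = [math.log2(1.0 / e) for e in eps_values]
    ys = [box_entropy_bits(points, e) for e in eps_values]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

# Points spread evenly along a diagonal segment of the plane (dimension 1)
# and over the whole unit square (dimension 2).
line = [((i + 0.5) / 4096, (i + 0.5) / 4096) for i in range(4096)]
square = [((i + 0.5) / 64, (j + 0.5) / 64) for i in range(64) for j in range(64)]
eps_values = [1 / 4, 1 / 8, 1 / 16, 1 / 32]

d_line = information_dimension(line, eps_values)
d_square = information_dimension(square, eps_values)
compressibility_line = (2 - d_line) / 2      # eta = (N - D_I) / N with N = 2
```

For a real attractor the slope depends on the range of resolutions used, which is the finite-resolution effect the following sections exploit.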
The type of strange attractor we intend to employ as a model here must be rather 'loose' in order to comply with the preceding experimental results. It must, in other words, possess a large information dimensionality in order to allow for maximum dynamical storage of variety. We are going to investigate the optimum resolution $\varepsilon$ under which this information dimensionality becomes maximum. The ensuing compressibility factor $C_I = N - D_I$ will also be investigated. Summarizing, where our primary aim is compressibility the best attractors around are steady states and limit cycles. If, however, our data ask for a model which can
ensure first of all large storage ability without jeopardizing compressibility, a strange attractor is the preferred model. In cognition one presumably needs both types of attractors working in tandem. But we are going to see in the next section that we can do better. Namely, we can have just one attractor which under the action of external noise switches from the one model to the other, thereby complying with both requirements for maximum compressibility and maximum storage capacity.
4. Specific Examples and Relevance of the Model to Cognitive Psychology. Experimental Results. What is the relevance (if any) of the dynamical theory given in Sections 2 and 3 vis-à-vis the experimental results reported in the Introduction, which essentially refer to empirical limitations of human attributes in information processing? To answer this question let us try to see if there exists in various specific cases of dynamical models of processors an optimum resolution $\varepsilon$ (different for each model), or an optimum codeword length, for which the cardinal parameter in our theory, the information dimension of the attractor, becomes maximum without jeopardizing the average 'compressibility factor' $(N - D_I)/N$.
Let us consider the expression $C_I = N - D_I$, representing the degree of average compression realized by the particular attractor, and try to calculate a critical resolution $\varepsilon^*$ for which $D_I$ becomes maximum, or $\partial C_I/\partial\varepsilon = 0$. To take into account the best case let us perform the above calculation for the maximum value of

$$D_I = \frac{\sum_{i=1}^{n(\varepsilon)} P_i(\varepsilon)\,\log_2 P_i(\varepsilon)}{\log_2 \varepsilon} \qquad (\varepsilon\ \text{finite}).$$

The maximum value of the information dimension under given finite resolution $\varepsilon$ is the 'fractal dimension' of the attractor and is realized for $P_i(\varepsilon) = \text{const} = 1/n(\varepsilon)$. So,

$$D_I = -\frac{\ln[n(\varepsilon)]}{\ln\varepsilon}$$

and

$$C_I = N + \frac{\ln[n(\varepsilon)]}{\ln\varepsilon} = \frac{N\ln\varepsilon + \ln n(\varepsilon)}{\ln\varepsilon} = \frac{\ln[\varepsilon^N n(\varepsilon)]}{\ln\varepsilon}.$$
We intend now to express $n(\varepsilon)$ in terms of the Lyapounov exponents of the attractor and the resolution length $\varepsilon$. Let the number of points on a (one-dimensional) Poincaré return map of the attractor concerned be $M$; this means that we determine the orbit with strings $M$ digits long. The number of cells representing the attractor, with a degree of coarseness $\varepsilon$, is just $n(\varepsilon)$. Now, for the degree of resolution $\varepsilon$ to be meaningful, it is necessary that each of the possible outcomes of an orbit $M$ digits long could be determined with precision. Since we deal here with systems whose dynamics can be adequately represented in each step by two possible states, say 0 and 1, the number of outcomes is clearly $2^M$. Hence $\varepsilon$ should be chosen such that

$$\varepsilon \sim \frac{1}{2^M}.$$
On the other hand, let $t_c$ be the sampling time or, alternatively, with sampling time period $\Delta t = 1$, the number of samplings (iterations) probing our attractor. For an attractor represented by $M$ digits, it is therefore meaningful to choose

$$n(\varepsilon) \sim M\,t_c.$$
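Owing to the chaotic character of the dynamics, after $t_c$ iterations in the map the system is completely disengaged from the initial condition. This means that $t_c$ can be estimated from the relation

$$\varepsilon\, e^{\lambda_+ t_c} \sim 1,$$

where $\lambda_+$ is the positive Lyapounov exponent. So

$$t_c \sim \frac{\ln(1/\varepsilon)}{\lambda_+}, \qquad n(\varepsilon) \sim \frac{[\ln(1/\varepsilon)]^2}{\lambda_+ \ln 2},$$

and finally

$$C_I = N + \frac{\gamma + 2\ln[\ln(1/\varepsilon)]}{\ln\varepsilon},$$

where

$$\gamma = \ln\frac{1}{\lambda_+ \ln 2}.$$

The requirement for maximum fractal dimension, $\partial C_I/\partial\varepsilon = 0$, gives for the optimum resolution

$$\varepsilon^* = \exp\left[-e^{(2-\gamma)/2}\right]$$

or the optimum code length

$$M^* = \frac{\ln(1/\varepsilon^*)}{\ln 2}, \qquad (1)$$

from which

$$C_I(\varepsilon^*) = N - 2e^{-(2-\gamma)/2} \quad \text{bits}. \qquad (2)$$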
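The step from the extremum condition to the optimum resolution is stated without intermediate algebra. One way to fill it in, assuming the substitution $u = \ln(1/\varepsilon)$ (a helper variable introduced here for convenience, not part of the original notation) and $\gamma = \ln[1/(\lambda_+ \ln 2)]$, is the following:

```latex
\begin{aligned}
u &\equiv \ln(1/\varepsilon) > 0, \qquad
D_I(u) = N - C_I = \frac{\gamma + 2\ln u}{u}, \\[4pt]
\frac{dD_I}{du} &= \frac{2 - \gamma - 2\ln u}{u^{2}} = 0
\;\;\Longrightarrow\;\; \ln u^{*} = \frac{2-\gamma}{2}, \\[4pt]
\varepsilon^{*} &= e^{-u^{*}} = \exp\!\left[-e^{(2-\gamma)/2}\right], \qquad
M^{*} = \frac{u^{*}}{\ln 2}, \\[4pt]
D_I(u^{*}) &= \frac{\gamma + (2-\gamma)}{u^{*}} = \frac{2}{u^{*}} = 2e^{-(2-\gamma)/2},
\qquad C_I(\varepsilon^{*}) = N - 2e^{-(2-\gamma)/2}.
\end{aligned}
```

Because $u$ is a monotonic function of $\varepsilon$, an extremum in $u$ is also an extremum in $\varepsilon$, so working in $u$ does not change the location of the optimum.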
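These are our final formulas for the minimum average compressibility, i.e. the maximum fractal dimensionality of the attractor concerned, and the corresponding optimum code length $M^*$. It remains to make sure that the extremum of the function

$$f(\varepsilon) = -D_I(\varepsilon) = \frac{\gamma + 2\ln[\ln(1/\varepsilon)]}{\ln\varepsilon}, \qquad \varepsilon \neq 0,$$

is a minimum. Evaluating $\partial^2 f(\varepsilon)/\partial\varepsilon^2$ for $\varepsilon = \varepsilon^*$ we get

$$\left(\frac{\partial^2 f(\varepsilon)}{\partial\varepsilon^2}\right)_{\varepsilon=\varepsilon^*} = \frac{2}{\varepsilon^{*2}\,[\ln(1/\varepsilon^*)]^3},$$

which is always positive. So $f(\varepsilon)$ possesses a minimum at $\varepsilon^*$ and consequently $D_I(\varepsilon) = -f(\varepsilon)$ becomes maximum. $D_I(\varepsilon)$ becomes zero for $\varepsilon_0 = \exp(-e^{-\gamma/2})$. Negative values of $D_I$ do not, of course, possess any physical meaning. Notice that the value of $D_I(\varepsilon^*)$ is much less than the fractal dimensionality $D_I$ calculated in the limit $\varepsilon \to 0$.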
As a matter of fact $D_I(\varepsilon^*)$ no longer refers to a geometrical object. Rather, it provides us with a convenient measure of the efficiency of an information-producing device. Let us now apply formulas (1) and (2) to a number of simple cases:
4.1. The Lorenz attractor. $\dot{x} = \sigma(y - x)$, $\dot{y} = -y + rx - xz$, $\dot{z} = xy - bz$ (for $b = 4$, $\sigma = 16$ and $r \approx 45.92$). We calculate with the computer $\lambda_+ \approx 1.5$, $\gamma \approx -0.03895$, $C_I(\varepsilon^*) \approx 3 - 0.72157 \approx 2.28$, $\varepsilon^* \approx 0.062$ and $M^* \approx 4$. That is, the optimum code which maximizes storing capacity by the Lorenz attractor is $\approx 4$ bits long. Nevertheless the resulting compressibility is impressive too, much larger than what one calculates from the information dimensionality with $\varepsilon \to 0$ [$C_I(0) = 3 - 2.06 = 0.94$].

4.2. The Bernoulli map. $x' = 2x \bmod 1$. In this case of a one-dimensional map, assumed here to be uniform, the number of covering segments of resolution $\varepsilon$ is

$$n(\varepsilon) \sim \frac{\ln(1/\varepsilon)}{\lambda_+}$$

and

$$C_I \simeq N + \frac{\ln[\ln(1/\varepsilon)] - \ln\lambda_+}{\ln\varepsilon}.$$

The requirement $\partial C_I/\partial\varepsilon = 0$ gives for the optimum resolution $\varepsilon^* \simeq \exp(-e^{1+\ln\lambda_+})$, from which $C_I(\varepsilon^*) = N - e^{-(1+\ln\lambda_+)}$. The numerical calculations for $\lambda_+ = 2$ in this case give $C_I(\varepsilon^*) = 0.81606$, $\varepsilon^* = 0.004354$ and $M^* \approx 7$ to 8 bits.

In summary, the strange attractor models show that after relatively few iterations on a map the trajectories become acausal, i.e. they get completely disconnected from the initial conditions. This allows internally generated or environmental random noise to amplify and declare itself, so to speak, on macroscopic levels. So, qualitatively we justify the observed fact that beyond a limited number of self-embeddings the strings or 'words' in a certain natural or artificial language appear really random.
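The figures quoted in subsections 4.1 and 4.2 can be checked with a few lines of arithmetic. The sketch below simply evaluates the closed-form expressions for the optimum resolution, code length and compressibility, taking the quoted Lyapounov exponents as given inputs rather than computing them from the dynamics; the function names are hypothetical.

```python
import math

def optimum_two_state_code(lam_plus, n_dim):
    """Optimum for n(eps) ~ [ln(1/eps)]^2 / (lam_plus * ln 2), i.e. formulas (1) and (2).

    Returns (gamma, eps_star, m_star, c_star).
    """
    gamma = math.log(1.0 / (lam_plus * math.log(2.0)))
    u_star = math.exp((2.0 - gamma) / 2.0)      # u = ln(1/eps) at the optimum
    eps_star = math.exp(-u_star)
    m_star = u_star / math.log(2.0)
    c_star = n_dim - 2.0 / u_star
    return gamma, eps_star, m_star, c_star

def optimum_uniform_map_code(lam_plus, n_dim):
    """Optimum for n(eps) ~ ln(1/eps) / lam_plus, the uniform one-dimensional case.

    Returns (eps_star, m_star, c_star).
    """
    u_star = math.exp(1.0 + math.log(lam_plus))
    eps_star = math.exp(-u_star)
    m_star = u_star / math.log(2.0)
    c_star = n_dim - 1.0 / u_star
    return eps_star, m_star, c_star

# Lorenz attractor: N = 3 and the quoted exponent lam_plus = 1.5.
gamma_l, eps_l, m_l, c_l = optimum_two_state_code(1.5, 3)

# Bernoulli map: N = 1 and the quoted value lam_plus = 2.
eps_b, m_b, c_b = optimum_uniform_map_code(2.0, 1)
```

Under these inputs the Lorenz case gives a code length very close to 4 bits and the Bernoulli case a length between 7 and 8 bits, in line with the values reported in the text.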
4.3. The non-uniform chaotic attractors. Belousov-Zhabotinsky (BZ) attractor model. In a series of recent papers (Matsumoto and Tsuda, 1983; Tsuda and Matsumoto, 1983; and Matsumoto, 1983) a model was developed where a transition from chaotic behaviour to ordered behaviour induced by external noise can be observed in a certain class of one-dimensional mappings.
This transition is clearly shown in terms of: (1) a decrease of the entropy of the orbits (as much as 60-70% of the original value at some critical noise level); (2) a decrease of the Lyapounov number, and a passage of it from positive to negative values; (3) a sharpening of the power spectrum; and (4) an altering of the position of the peaks of the invariant measure (the asymptotic probability density function of the map), as if the resulting dynamics after the imposition of noise were roughly periodic. This unexpected result of changing a chaotic dynamics to a limit cycle (ordered behaviour) by adding uniform external noise is attributed to a mechanism which deletes certain states in the symbolic dynamics (or the Markov partition) of the system. More explicitly, this switching is attributed to the fact that for the category of maps concerned here, the width of the symbolic (Markov) dynamical states is very nonuniform, unlike the case of other maps such as the logistic map. It has been pointed out (see Nicolis et al., 1983) that in such highly nonuniform attractors the Lyapounov numbers, which are perforce average quantities, no longer describe adequately the dynamics and, in particular, the information processing by the attractor. To go beyond this average picture a non-uniformity factor (hereafter referred to as NUF) has been introduced. This quantity measures the dispersion of the local rate of divergence of nearby trajectories from its average value given by the Lyapounov numbers (see Appendix B). In the presence of uniform noise on the map the narrow gaps between states are literally smeared out and this results in a sudden entropy decrease. Above a certain critical noise level, the dynamics consist of a few 'surviving' states. In some cases it happens that these surviving states form a nearly closed cycle, i.e. transition probabilities between states are particularly large only for a sequence of transitions of the type $a_1 \to a_2, \ldots, a_n \to a_1$. Thus under the effect of noise the chaotic dynamics appears as periodic. After these introductory remarks let us explore the possible usefulness of such a map in our problem. The map under discussion is expressed analytically as follows
$$f(x) = \left[-(0.125 - x)^{1/3} + 0.50607357\right] e^{-x} + b,$$

$$f(x) = \left[(x - 0.125)^{1/3} + 0.50607357\right] e^{-x} + b,$$

$$f(x) = [\,\ldots\,]^{19} + b,$$
and simulates rather well the one-dimensional Poincaré map of the known Belousov-Zhabotinsky strange attractor (see for example Roux et al., 1983; Roux, 1983). In this case for $b = 0.0232885$ one calculates $\lambda_+ \approx 0.28$, $\varepsilon^* \approx 0.465$, $M^* \approx 1$ to 2, $D_{I\max} = 1.3083$. So, $C_I(\varepsilon^*) < 0$. This negative compression
simply means that the fuzziness around N  D1 exceeds the average compressibility. At this point, let us observe that, in the case of the BZ attractor the N U F is extremely large (see Fig. 1). In fact, for the logistic and the Henon strange attractors, the ratio NUF/X+  2, and for the Lorenz attractor NUF/X+  1 (Nicolis et al., 1983), whereas for the BZ strange attractor NUF/X+ ~ 10 to 20. (As one example, NUF/X+ = 3.05/0.28 " 11 for b = 0.0232885279.) This extremely large N U F can produce the nonuniformity of the Markov partition. Therefore, not only the expression Pi(e) = 1/n(e) but also the expression e ~ 1/2 M breaks down. This fact means that in the case of the BZ attractor alternative methods to compress information dynamically must be proposed. Here comes then a deus ex machina. 4.4. The role o f external noise. Every biological processor in general, and the human brain in particular, lives in a pool o f gentle environmental noise (wind, rain and thunder, family discussions, the radio and the pipelines of the neighbour, traffic noise, our own internal disturbances). Suppose we simulate our processor with a BZ like model. Then the problem is whether there exists an o p t i m u m 'dose' of external or internal 'distraction' which far from lowering our ability to compress information may act catalytically and increase it. Figures 13 answer this question in the case of the BZ nonuniform attractor. (The same issue in a somewhat slightly different form has been treated by Nicolis et al., 1975; and Nicolis and Benrubi, 1976.) Figure 1 displays the first and second m o m e n t of the local divergence rate of chaotic trajectories of the BZ map (that is, the L y a p o u n o v exponent and the NUF) as a function of the control parameter b, without external noise. There are windows of the parameter b, for which chaotic behaviour persists and windows within which periodic behaviour takes place. The addition of uniform noise (Fig. 
2) has a smoothing effect: the Lyapounov exponent is always negative, while the NUF, far from being small, drops somewhat. In the case of a large NUF one understands that it is not meaningful to try to encode information with an optimum resolution, since the variance around such an 'optimum' value will be extremely large (of the order of $\varepsilon^*$ or $M^*$ itself, or more). If on the other hand the NUF were small but $\sigma$, that is the external noise level, strong, so that $\varepsilon < \sigma^*$, one could consider as an optimum code length the magnitude (see Appendix A):
$$M^* \approx \frac{\ln(1/\sigma^*)}{\ln 2}.$$
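As a rough numerical illustration of the relation above, the following Python sketch evaluates $M^* \approx \ln(1/\sigma^*)/\ln 2$ for a few noise levels. The function name `optimum_code_length` and the sample noise levels are assumptions made for illustration only; they are not values reported in the text.

```python
import math


def optimum_code_length(sigma_star: float) -> float:
    """Code length M* (in bits) at which the resolution 2**(-M*) equals sigma_star."""
    if not 0.0 < sigma_star < 1.0:
        raise ValueError("sigma_star is assumed to lie strictly between 0 and 1")
    return math.log(1.0 / sigma_star) / math.log(2.0)


# Hypothetical noise levels, chosen only to show how M* grows as noise shrinks.
for sigma in (0.5, 0.1, 0.01):
    print(f"sigma* = {sigma:<5} -> M* = {optimum_code_length(sigma):.2f} bits")
```

The sketch only restates the formula: halving the noise level adds one bit to the code length that the noise still allows one to resolve.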
Figure 1. The NUF and the Lyapounov number $\lambda_+$ vs bifurcation parameter b in the BZ model. The numerical results of Figs. 1-3 were obtained by averaging over $10^5$ iterations with double precision on a DEC VAX-11 computer.
Finally, for a small NUF and $\varepsilon > \sigma^*$ one could proceed roughly as before in the examples given in subsections 4.1 and 4.2. Nevertheless, the dramatic drop of the entropy of the orbit (Matsumoto and Tsuda, 1983) in the BZ model in the presence of external noise tempts us to consider this scheme as a good decoder, that is as a cognitive gadget which reduces the computational complexity required to calculate an irregular orbit. The noise dependence of the NUF is shown in Fig. 3 for a specific value of the control parameter. For a very small value $\sigma_1$ of the noise level the NUF passes through a sharp minimum. This level is much smaller than the level $\sigma_2$ at which, after Matsumoto and Tsuda, chaos switches to order.
Figure 2. The NUF and the $\lambda_+$ vs bifurcation parameter in the case of noise level $\sigma = 0.01$ in the BZ model.
However, even in this comparatively small NUF case we cannot obtain an optimum code length following the method given in subsections 4.1 and 4.2, since the ratio NUF/$\lambda_+$ is still large. Nevertheless, the abrupt decrease of the NUF (about 33%) caused by such a small external noise $\sigma_1$ is something we should pay attention to. A little 'shuttle' of the nonuniform strange attractor can produce (under the $\sigma_1$ noise level) a uniform attractor and can make itself convenient for information processing, although not exactly as described in 4.1 and 4.2 above. Let us mention one more point in Fig. 3. There exists another critical noise level ($\sigma_2 \approx 0.01$) beyond which the NUF levels off and saturates. This second critical point roughly corresponds to the critical point for noise-induced order. Beyond this noise level the chaotic attractor turns itself into a limit cycle and can be used only as a decoder, that is a perfect compressor.
Figure 3. The NUF vs noise level $\sigma$ in the BZ model for b = 0.0232885279.
Let us end with some remarks concerning the basic issue of the number of categories from which a given cognitive system can make an unambiguous classification of 'codewords'. In English, for example, the average word has about five letters. In view of the redundancy of the English language (about 75%) we know that the average information per letter is about 2 bits. So the information content of the average word in the English language is about 10 bits. Now, what is this supposed to mean? It means that if one tries to guess an English word fixed by the experimenter one should be able to do it in 10 guesses involving yes-no answers. In turn, this means that guessing has to do with narrowing down logical categories. Now here is the rub: one has to select or sort out from a well-defined set. Suppose that instead of the above example you are asked to sort out a particular arrangement of a deck of cards (Rapoport, 1956). Since each arrangement is equiprobable, its probability being $1/52!$, it follows that $\log_2(52!)$ yes-no decisions are required to reconstruct a particular order from a shuffled deck. Now $\log_2(52!) \approx \log_2 10^{68} \approx 226$ bits, so one would think that the specific order or string can be deduced in roughly 226 yes-no guesses. The real issue, however, is that in this case we do not possess the linguistic tool to classify the card arrangements so as to be able to make judicious dichotomies for yes-no answers. Our knowledge of card arrangements is confined to a very small subset of compressible sequences, that is of arrangements which are reproducible with fewer instructions than their own length. To reconstruct any other arrangement amongst the incompressible ones one has to name the position of each card or symbol separately. It turns out then that the difficulty of classifying springs from the difficulty in compressing, or devising algorithms which can reproduce a given string of symbols with fewer bits. A cognitive gadget ensuring maximum compressibility without jeopardizing storage capacity then provides the optimum classification scheme, that is the optimum size of the set.
The fact that the number of compressible sequences in a string of 0s and 1s of length N is extremely small was first clearly expressed by Chaitin (1975). Considering the $2^N$ equiprobable sequences of 0s and 1s of length N one can get by flipping a fair coin N times in a row, the number of compressible sequences up to say K bits is
$$s = 2^1 + 2^2 + \cdots + 2^{N-K-1} = 2^{N-K} - 2,$$
so that the fraction
$$\frac{s}{2^N} \approx 2^{-K}$$
is indeed very small. (For K = 10, the percentage of compressible sequences up to 10 bits is just below 0.1%.) In general, the length of the algorithm one needs for the reproduction of any random sequence is, for $N \gg 1$, $N' \approx \ln N + S(0, 1)N$, where $S(0, 1) = -P(0)\log_2 P(0) - P(1)\log_2 P(1)$ is the entropy of the sequence, P(0) and P(1) are the probabilities of appearance of 0s and 1s in the sequence, and P(0) + P(1) = 1.
In biology only certain arrangements maintain life. What are these arrangements? Perhaps evolution has favoured, on purely parsimonious grounds, cognitive structures which can display with very simple 'hardware' an impressively broad behavioural repertoire. The genetic code appears to us virtually incompressible, indicating perhaps a large storage capacity. Even if we know $10^6$ sequences we do not possess a theory which allows prediction of the next digit. Can we consider then the random and incompressible series of A-T, G-C as a 'chaotic fossil' which springs into sequential activity and delivers information in the proper dynamical (enzymatic) context? Could it be that these enzymes provide the templates, the strange-attractor processors which cognize, that is compress, a seemingly random string, thereby revealing genetic information out of it? And finally, could it be that the small length of genetic code words (the triplets) corresponds exactly to the optimum resolution limit for which storing capacity becomes maximum while maintaining good compressibility? On what other evolutionary grounds should one justify the universality of the genetic code and the length of codons?
But this is not all. We know that the genetic apparatus is not only phenomenally resistant to environmental noise but literally thrives on environmental noise, which also activates it. Could it be that the external noise is instrumental in pushing chaotic enzymatic behaviour 'back' to limit cycle behaviour, thereby decreasing the amount of computational complexity that the enzymes have to carry out? The example of the BZ chaotic model treated in this context in the last section is suggestive.
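The counting arguments used earlier in this section (the shuffled deck, the fraction of compressible binary sequences, and the entropy S(0, 1)) can be checked with a few lines of Python. This is only a sketch; the function names `deck_order_bits`, `compressible_fraction` and `binary_entropy`, and the sample values N = 40 and K = 10, are illustrative assumptions.

```python
import math


def deck_order_bits(cards: int = 52) -> float:
    """Yes-no decisions needed to single out one of cards! equiprobable orderings."""
    return math.log2(math.factorial(cards))


def compressible_fraction(n: int, k: int) -> float:
    """Fraction s / 2**n, with s = 2**1 + ... + 2**(n-k-1) = 2**(n-k) - 2."""
    if not 0 <= k < n - 1:
        raise ValueError("need 0 <= k < n - 1 so that the sum has at least one term")
    s = sum(2**j for j in range(1, n - k))
    assert s == 2 ** (n - k) - 2  # closed form of the geometric sum
    return s / 2**n


def binary_entropy(p0: float) -> float:
    """S(0,1) = -P(0) log2 P(0) - P(1) log2 P(1), in bits per symbol."""
    return -sum(p * math.log2(p) for p in (p0, 1.0 - p0) if p > 0.0)


print(f"log2(52!) = {deck_order_bits():.2f} bits")
print(f"compressible fraction (N=40, K=10) = {compressible_fraction(40, 10):.6%}")
print(f"S(0,1) for a fair coin = {binary_entropy(0.5):.1f} bit per symbol")
```

For a fair coin the entropy is one bit per symbol, so the estimated algorithm length grows essentially like N itself, which is another way of saying that a typical random string cannot be compressed.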
APPENDIX A
One-dimensional Maps and Markov Chains.
Let us examine the correspondence between one-dimensional maps and Markov chains. Consider the asymmetric two-to-one piecewise bilinear map of Fig. 4. Since matching an environmental Markov string involves the creation of a great number of 'templates' by the (chaotic) processor, it is instrumental to calculate in this simple example the elements of the Markov chain which represents the iterative process of the map as an information source. We divide the unit interval AE in two parts of length a and 1 - a, respectively; every time the trace of the iterating trajectory falls on AC we get the symbol 1, while every time the trace falls on CE we get the symbol 0. So we obtain (for a fixed value of the control parameter) strings such as 00101101101110101000, which do not constitute just fair coin tosses but rather one-sided Bernoulli shifts constrained by the (nonlinear) shape of the map F(x). What are the transition probability elements $P_{ij}$ of the two-state Markov chain, and what are the probabilities $u_1 = P(0)$ and $u_2 = P(1)$? From Fig. 4 it is clear that $l(AB) = P_{11}$, since all points within AB, which belong to a, are projected by the map on a portion equal to a. From the geometry we see that $P_{11} = a^2$. Likewise we observe that $l(CD) = P_{22}$, since all points within this subinterval, which belong to 1 - a, are projected by the map on the portion 1 - a. $P_{12} = l(BC)$, since all the points within it, belonging to a, are projected to 1 - a, and $P_{21} = l(DE)$, since all the points within it, belonging to 1 - a, are projected to a. From the geometry we get:
$$P_{12} = a - a^2 = a(1 - a), \qquad P_{21} = a(1 - a), \qquad P_{22} = (1 - a) - a(1 - a) = (1 - a)^2.$$
(The values above are not normalized: one should have $P_{11} + P_{12} = 1$, $P_{21} + P_{22} = 1$.) The probabilities $u_1$, $u_2$ are calculated from the relations $u_1 = u_1 P_{11} + u_2 P_{21}$ and $u_1 + u_2 = 1$. We get $u_1 = a$ and $u_2 = 1 - a$.
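The normalization and the stationary probabilities quoted above can be verified with a short Python sketch. The function name `two_state_chain` and the sample value a = 0.3 are assumptions introduced here for illustration.

```python
def two_state_chain(a: float):
    """Return the normalized transition matrix and (u1, u2) for a partition at a."""
    if not 0.0 < a < 1.0:
        raise ValueError("a is assumed to lie strictly between 0 and 1")
    # Unnormalized interval lengths read off the geometry: P11, P12, P21, P22.
    raw = [[a * a, a * (1.0 - a)],
           [a * (1.0 - a), (1.0 - a) ** 2]]
    # Normalize each row so that P11 + P12 = 1 and P21 + P22 = 1.
    P = [[entry / sum(row) for entry in row] for row in raw]
    # Solve u1 = u1*P11 + u2*P21 together with u1 + u2 = 1 for u1.
    u1 = P[1][0] / (1.0 - P[0][0] + P[1][0])
    return P, (u1, 1.0 - u1)


P, (u1, u2) = two_state_chain(0.3)
print("normalized transition matrix:", P)
print("stationary probabilities:", u1, u2)
```

After normalization both rows of the matrix become (a, 1 - a), so the next symbol does not depend on the current one and the stationary probabilities reduce to u1 = a and u2 = 1 - a, as stated in the text.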
Figure 4. Tent map.
After M iterations we will generate $2^M$ Markov strings of length M bits each, and the resolution on the interval will be $\varepsilon \approx 2^{-M}$.
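The construction of symbol strings from a map can be sketched in Python as follows. The explicit form of the map used here (slope 1/a on [0, a) and slope -1/(1 - a) on [a, 1]) is an assumption, a common way of writing an asymmetric tent map, and is not taken from Fig. 4; the names `tent_map`, `symbol_string` and `resolution`, the initial point and the value a = 0.3 are likewise illustrative.

```python
def tent_map(x: float, a: float) -> float:
    """Assumed asymmetric tent map on [0, 1] with its peak at x = a."""
    return x / a if x < a else (1.0 - x) / (1.0 - a)


def symbol_string(x0: float, a: float, m: int) -> str:
    """Emit '1' when the iterate lies in [0, a) and '0' otherwise, for m iterates."""
    symbols = []
    x = x0
    for _ in range(m):
        symbols.append("1" if x < a else "0")
        x = tent_map(x, a)
    return "".join(symbols)


def resolution(m: int) -> float:
    """Resolution reached on the unit interval after m binary symbols."""
    return 2.0 ** (-m)


a = 0.3
word = symbol_string(0.1234, a, 20)
print("20-symbol string:", word, " resolution:", resolution(len(word)))

# Over a long run the fraction of iterates on the subinterval of length a
# should come out close to a (floating-point arithmetic, so only approximately).
long_run = symbol_string(0.1234, a, 100_000)
print("fraction of '1' symbols:", long_run.count("1") / len(long_run))
```

The long-run symbol frequency is an empirical counterpart of the stationary probabilities of the two-state chain; it is an estimate from one finite floating-point orbit, not an exact result.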
APPENDIX B
Average Value and Variance of the Local Divergence Rate of Chaotic Trajectories.
The spectrum of the Lyapounov exponents constitutes a way to classify dynamical systems in general as well as particular solutions of a dynamical system. We interpret the maximal Lyapounov exponent as the time average of a statistical variable which we call the local divergence rate. Besides the average, further information about the dynamical system is contained in the higher moments of the statistical variable, like the variance. The variance of the local divergence rate is a measure of the nonuniformity of the dynamical behaviour of a strange attractor with respect to the separation of nearby trajectories. Besides the Lyapounov exponents, then, the variance (nonuniformity factor or NUF) will give additional information about a dynamical system and may turn out to be helpful for further classification of the 'strangeness' of chaotic systems. In this appendix, following Nicolis et al. (1983), we give the results of a detailed derivation of the expression of the statistical variable whose first moment is the Lyapounov exponent and whose second moment is the NUF.
Let $d_k$ stand for the separation of two neighbouring trajectories after k iterations. The local divergence rate is defined as
$$Y_k = \frac{1}{\tau} \ln d_k,$$
where $\tau$ is the time interval during which the system x(t) moves along the trajectory between $x[(k - 1)\tau]$ and $x(k\tau)$. The Lyapounov exponent is defined as
$$\lambda = \lim_{n \to \infty} \frac{1}{n} \sum_{k=1}^{n} \frac{1}{\tau} \ln d_k$$
or
$$\lambda = \langle Y_k \rangle.$$
With the interpretation of the Lyapounov exponent as the time average of the local divergence rate we define next the nonuniformity factor (NUF) as
$$\Delta Y = \langle (Y_k - \langle Y_k \rangle)^2 \rangle^{1/2} = \left[ \langle Y_k^2 \rangle - \langle Y_k \rangle^2 \right]^{1/2}$$
or
$$\Delta Y = \lim_{n \to \infty} \left[ \frac{1}{n} \sum_{k=1}^{n} \left( \frac{1}{\tau} \ln d_k \right)^2 - \lambda^2 \right]^{1/2}.$$
The NUF is a measure of the deviation of the local divergence rate from its mean value, the Lyapounov exponent. It characterizes the nonuniformity of a solution of a dynamical system with respect to the sensitivity to initial conditions or, in other words, how much the local divergence rate changes along the flow. With uniform chaotic behaviour all over the attractor [that is, a flat probability density function P(x)] the NUF is zero. Intermittent chaos, however, where regular laminar and chaotic phases alternate irregularly with each other, gives a considerable NUF, since the local divergence rate varies strongly along the flow. In Nicolis et al. (1983) the NUF of quite a few strange attractors is calculated and the speculation made above has been confirmed.
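The two definitions above can be illustrated numerically for one-dimensional maps. The Python sketch below rests on several assumptions that are not part of the text: the time step is taken as $\tau = 1$, the separation after one iteration of an infinitesimally close pair is approximated by $|f'(x_k)|$, and the two example maps (the logistic map $x \to 4x(1 - x)$ and the constant-slope map $x \to 3x \bmod 1$) are chosen only for convenience. The function names are hypothetical.

```python
import math


def divergence_statistics(f, f_prime, x0: float, n: int, burn_in: int = 1000):
    """Return (lambda, NUF): mean and standard deviation of Y_k = ln|f'(x_k)|."""
    x = x0
    for _ in range(burn_in):  # discard the transient
        x = f(x)
    rates = []
    for _ in range(n):
        # Local divergence rate with tau = 1; the floor guards against log(0).
        rates.append(math.log(max(abs(f_prime(x)), 1e-300)))
        x = f(x)
    lam = math.fsum(rates) / n
    nuf = math.sqrt(math.fsum((y - lam) ** 2 for y in rates) / n)
    return lam, nuf


def logistic(x: float) -> float:
    return 4.0 * x * (1.0 - x)


def logistic_prime(x: float) -> float:
    return 4.0 - 8.0 * x


def triple_mod_one(x: float) -> float:
    return (3.0 * x) % 1.0


def triple_prime(x: float) -> float:
    return 3.0  # constant slope: every point stretches at the same rate


lam_log, nuf_log = divergence_statistics(logistic, logistic_prime, 0.1234, 200_000)
lam_uni, nuf_uni = divergence_statistics(triple_mod_one, triple_prime, 0.1234, 200_000)
print(f"logistic map:       lambda = {lam_log:.3f}, NUF = {nuf_log:.3f}")
print(f"constant-slope map: lambda = {lam_uni:.3f}, NUF = {nuf_uni:.3e}")
```

For the constant-slope map every local divergence rate equals ln 3, so the NUF vanishes up to rounding, in line with the remark that uniform chaotic behaviour gives a zero NUF; the logistic map stretches unevenly and yields a nonzero NUF.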
LITERATURE
Chaitin, G. 1975. "Randomness and Mathematical Proof." Scient. Am. May, 47-59.
Farmer, J. Doyne. 1982. "Information Dimension and the Probabilistic Structure of Chaos." Z. Naturforsch. 37, 1304.
Farmer, J. Doyne, E. Ott and J. A. Yorke. 1983. "The Dimension of Chaotic Attractors." Physica 7, 153.
Matsumoto, K. 1983. "Noise-induced Order II." J. stat. Phys. (to appear).
Matsumoto, K. and I. Tsuda. 1983. "Noise-induced Order." J. stat. Phys. 31, 87.
Miller, G. A. 1974. The Psychology of Communication. Harmondsworth, U.K.: Penguin.
Nicolis, J. S. 1982a. "Sketch for a Dynamical Theory of Language." Kybernetes 11, 123-132.
Nicolis, J. S. 1982b. "Should a Reliable Information Processor be Chaotic." Kybernetes 11, 269-274.
Nicolis, J. S. 1984. "The Role of Chaos in Reliable Information Processing." J. Franklin Inst. 317, 289-307. Also in Synergetics, Ed. H. Haken. Proc. Elmau Conference 18 May. Berlin: Springer.
Nicolis, J. S. 1985. Dynamics of Hierarchical Systems. An Evolutionary Approach, Ed. H. Haken. Berlin: Springer.
Nicolis, J. S. and M. Benrubi. 1976. "A Model on the Role of Noise at the Neuronal and Cognitive Levels." J. theor. Biol. 59, 77.
Nicolis, J. S., G. Mayer-Kress and G. Haubs. 1983. "Nonuniform Chaotic Dynamics with Implications to Information Processing." Z. Naturforsch. 36, 1157-1169.
Nicolis, J. S., E. Protonotarios and H. Llanos. 1975. "Some Views on the Role of Noise in Self-organizing Systems." Biol. Cybernet. 17, 183.
Rapoport, A. 1956. "The Promise and Pitfalls of Information Theory." Behavl Sci. 1, 303-315.
Roux, J. C. 1983. "Experimental Studies of Bifurcations Leading to Chaos in the Belousov-Zhabotinsky Reaction." Physica 7, 57.
Roux, J. C., R. H. Simoyi and H. L. Swinney. 1983. "Observation of a Strange Attractor." Physica 8, 257.
Shaw, R. 1981. "Strange Attractors, Chaotic Behavior and Information Flow." Z. Naturforsch. 36, 80-112.
Tsuda, I. and K. Matsumoto. 1983. "Noise-induced Order - Complexity Theoretical Digression." In Synergetics Series - Chaos and Statistical Mechanics, Ed. Y. Kuramoto. Berlin: Springer.

RECEIVED 7-10-84