
Modelling Biomedical Signals
Bari, Italy 19-21 September 2001

Editors
Giuseppe Nardulli
Sebastiano Stramaglia
Center of Innovative Technologies for
Signal Detection and Processing
University of Bari, Italy

World Scientific
New Jersey • London • Singapore • Hong Kong
Published by
World Scientific Publishing Co. Pte. Ltd.
P O Box 128, Farrer Road, Singapore 912805
USA office: Suite 1B, 1060 Main Street, River Edge, NJ 07661
UK office: 57 Shelton Street, Covent Garden, London WC2H 9HE

British Library Cataloguing-in-Publication Data


A catalogue record for this book is available from the British Library.

MODELLING BIOMEDICAL SIGNALS


Copyright © 2002 by World Scientific Publishing Co. Pte. Ltd.
All rights reserved. This book, or parts thereof, may not be reproduced in any form or by any means,
electronic or mechanical, including photocopying, recording or any information storage and retrieval
system now known or to be invented, without written permission from the Publisher.

For photocopying of material in this volume, please pay a copying fee through the Copyright
Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA. In this case permission to
photocopy is not required from the publisher.

ISBN 981-02-4843-1

Printed in Singapore by Mainland Press



Preface

In the last few years, concepts and methodologies initially developed in
theoretical physics have found wide applicability in a number of very different areas.
This book, a result of cross-disciplinary interaction among physicists, biologists and
physicians, covers several topics where methods and approaches rooted in physics
are successfully applied to analyze and to model biomedical data. The volume
contains the papers presented at the International Workshop Modelling Biomedical
Signals held at the Physics Department of the University of Bari, Italy, on
September 19-21, 2001. The workshop was held under the auspices of the
Center of Innovative Technologies for Signal Detection and Processing of the
University of Bari (TIRES Centre); the Organizing Committee of the Workshop
comprised L. Angelini, R. Bellotti, A. Federici, R. Giuliani, G. Gonnella, G. Nardulli
and S. Stramaglia. The workshop opened on September 19th, 2001 with two
colloquia given by Profs. N. Accornero (University of Rome, La Sapienza) on
Neural Networks and Neurosciences, and E. Marinari (University of Rome, La
Sapienza) on Physics and Biology. Around 70 scientists attended the workshop,
coming from different fields and disciplines. The large spectrum of competences
gathered in the workshop favored an intense and fruitful exchange of scientific
information and ideas. The topics discussed in the workshop include: decision
support systems in medical science; several analyses of physiological rhythms and
synchronization phenomena; biological neural networks; theoretical aspects of
artificial neural networks and their role in neural sciences and in the analysis of EEG
and Magnetic Resonance Imaging; gene expression patterns; the immune system;
protein folding and protein crystallography.
For the organization of the workshop and the publication of the present volume
we acknowledge financial support from the Italian Ministry of University and
Scientific Research (MURST) under the project (PRIN) "Theoretical Physics of
Fundamental Interactions", from the TIRES Centre, the Physics Department of the
University of Bari and from the Section of Bari of the Istituto Nazionale di Fisica
Nucleare (INFN). We also thank the Secretary of the Workshop, Mrs. Fausta
Cannillo, and Mrs. Rosa Bitetti for their help in organizing the event.

Giuseppe Nardulli
Sebastiano Stramaglia
University of Bari

CONTENTS

Preface v

ANALYSIS AND MODELS OF BIOMEDICAL DATA BY
THEORETICAL PHYSICS METHODS

The Cluster Variation Method for Approximate Reasoning in
Medical Diagnosis 3
H. J. Kappen*

Analysis of EEG in Epilepsy 17
K. Lehnertz, R. G. Andrzejak, T. Kreuz, F. Mormann, C. Rieke,
P. David and C. E. Elger

Stochastic Approaches to Modeling of Physiological Rhythms 28
Plamen Ch. Ivanov and Chung-Chuan Lo

Chaotic Parameters in Time Series of ECG, Respiratory
Movements and Arterial Pressure 51
E. Conte and A. Federici

Computer Analysis of Acoustic Respiratory Signals 60
A. Vena, G. M. Insolera, R. Giuliani, T. Fiore and G. Perchiazzi

The Immune System: B Cell Binding to Multivalent Antigen 67
Gyan Bhanot

Stochastic Models of Immune System Aging 80
L. Mariani, G. Turchetti and F. Luciani

NEURAL NETWORKS AND NEUROSCIENCES

Artificial Neural Networks in Neuroscience 93
N. Accornero and M. Capozza

Biological Neural Networks: Modeling and Measurements 107
R. Stoop and S. Lecchini

Selectivity Property of a Class of Energy Based Learning Rules in
Presence of Noisy Signals 123
A. Bazzani, D. Remondini, N. Intrator and G. Castellani

Pathophysiology of Schizophrenia: fMRI and Working Memory 132
G. Blasi and A. Bertolino

ANN for Electrophysiological Analysis of Neurological Disease 144
R. Bellotti, F. de Carlo, M. de Tommaso, O. Difruscolo,
R. Massafra, V. Sciruicchio and S. Stramaglia

Detection of Multiple Sclerosis Lesions in MRIs with Neural Networks 157
P. Blonda, G. Satalino, A. D'Addabbo, G. Pasquariello, A. Baraldi
and R. de Blasi

Monitoring Respiratory Mechanics Using Artificial Neural Networks 165
G. Perchiazzi, G. Hedenstierna, A. Vena, L. Ruggiero, R. Giuliani
and T. Fiore

GENOMICS AND MOLECULAR BIOLOGY

Cluster Analysis of DNA-Chip Data 175
E. Domany

Clustering mtDNA Sequences for Human Evolution Studies 196
C. Marangi, L. Angelini, M. Mannarelli, M. Pellicoro,
S. Stramaglia, M. Attimonelli, M. de Robertis, L. Nitti,
G. Pesole, C. Saccone and M. Tommaseo

Finding Regulatory Sites from Statistical Analysis of Nucleotide
Frequencies in the Upstream Region of Eukaryotic Genes 209
M. Caselle, P. Provero, F. di Cunto and M. Pellegrino

Regulation of Early Growth Response-1 Gene Expression and Signaling
Mechanisms in Neuronal Cells: Physiological Stimulation and Stress 221
G. Cibelli

Geometrical Aspects of Protein Folding 234
C. Micheletti

The Physics of Motor Proteins 251
G. Lattanzi and A. Maritan

Phasing Proteins: Experimental Loss of Information and its
Recovery 264
C. Giacovazzo, F. Capitelli, C. Giannini, C. Cuocci and M. Ianigro

List of Participants 279

Author Index 281

* Italicized name indicates the author who presented the paper.


ANALYSIS AND MODELS OF
BIOMEDICAL DATA BY THEORETICAL
PHYSICS METHODS

THE CLUSTER VARIATION METHOD FOR APPROXIMATE
REASONING IN MEDICAL DIAGNOSIS

H.J. KAPPEN
Laboratory of Biophysics, University of Nijmegen
E-mail: bert@mbfys.kun.nl

In this paper, we discuss the rule based and probabilistic approaches to computer
aided medical diagnosis. We conclude that the probabilistic approach is superior to
the rule based approach, but due to its intractability, it requires approximations for
large scale applications. Subsequently, we review the Cluster Variation Method and
derive a message passing scheme that is efficient for large directed and undirected
graphical models. When the method converges, it gives close to optimal results.

1 Introduction

Medical diagnosis is the process by which a doctor searches for the cause
(disease) that best explains the symptoms of a patient. The search process is
sequential, in the sense that patient symptoms suggest some initial tests to
be performed. Based on the outcome of these tests, a tentative hypothesis is
formulated about the possible cause(s). Based on this hypothesis, subsequent
tests are ordered to confirm or reject this hypothesis. The process may pro-
ceed in several iterations until the patient is finally diagnosed with sufficient
certainty and the cause of the symptoms is established.
A significant part of the diagnostic process is standardized in the form
of protocols. These are sets of rules that prescribe which tests to perform
and in which order, based on the patient symptoms and previous test results.
These rules form a decision tree, whose nodes are intermediate stages in the
diagnostic process and whose branches point to additional testing, depending
on the current test results. The protocols are defined in each country by a
committee of medical experts.
The use of computer programs to aid in the diagnostic process has been
a long term goal of research in artificial intelligence. Arguably, it is the most
typical application of artificial intelligence.
The different systems that have been developed so far use a variety of
modeling approaches which can be roughly divided into two categories: rule-
based approaches with or without uncertainty and probabilistic methods. The
rule-based systems can be viewed as computer implementations of the pro-
tocols, as described above. They consist of a large data base of rules of the
form: A → B, meaning that "if condition A is true, then perform action B"

or "if condition A is true, then condition B is also true". The rules may be
deterministic, in which case they are always true, or 'fuzzy' in which case they
are true to a (numerically specified) degree. Examples of such programs are
Meditel 1, Quick Medical Reference (QMR) 2, DXplain 3, and Iliad 4.
In Berner et al. 5 a detailed study was reported that assesses the perfor-
mance of these systems. A panel of medical experts collected 110 patient
cases, and consensus was reached on the correct diagnosis for each of these
patients. For each disease, there typically exists a highly specific test that
will unambiguously identify the disease. Therefore, based on such complete
data, diagnosis is easy. A more challenging task was defined by removing this
defining test from each of the patient cases. The patient cases were presented
to the above 4 systems. Each system generated its own ordered list of most
likely diseases. In only 10-20 % of the cases, the correct diagnosis appeared
on the top of these lists and in approximately 50 % of the cases the correct
diagnosis appeared in the top 20 list. Many diagnoses that appeared in the
top 20 list were considered irrelevant by the experts. It was concluded that
these systems are not suitable for use in clinical practice.
There are two reasons for the poor performance of the rule based systems.
One is that the rules that need to be implemented are very complex in the
sense that the precondition A above is a conjunction of many factors. If each
of these factors can be true or false, there is a combinatoric explosion of con-
ditions that need to be described. It is difficult, if not impossible, to correctly
describe all these conditions. The second reason is that evidence is often not
deterministic (true or false) but rather probabilistic (likely or unlikely). The
above systems provide no principled approach for the combination of such
uncertain sources of information.
A very different approach is to use probability theory. In this case, one
does not model the decision tree directly, but instead models the relations
between diseases and symptoms in one large probability model. As a (too)
simplified example, consider a medical domain with a number of diseases
d = ( d i , . . . ,d„) and a number of symptoms or findings / = (/i, • • • , / r o ) -
One estimates the probability of each of the diseases p(di) as well as the
probability of each of the findings given a disease, p(fj\di). If diseases are
independent, and if findings are conditionally independent given the disease,
the joint probability model is given by:

P(d,f)=P(d)p(f\d)=Upwiipifjidi) (i)
i j

It is now possible to compute the probability of a disease dj, given some



findings by using Bayes' rule:

\[ p(d_i|f_t) = \frac{p(f_t|d_i)\, p(d_i)}{p(f_t)} \tag{2} \]

where f_t is the list of findings that has been measured up to diagnostic itera-
tion t. Computing this for different d_i gives the list of most probable diseases
given the current findings f_t and provides the tentative diagnosis of the pa-
tient. Furthermore, one can compute which additional test is expected to be
most informative about any one of the diagnoses, say d_i, by computing the
expected posterior entropy

\[ I_{ij} = -\sum_{f_j} p(f_j|f_t) \sum_{d_i} p(d_i|f_j, f_t) \log p(d_i|f_j, f_t) \]

for each test j that has not been measured so far. The test j that minimizes
I_{ij} is the most informative test, since, averaged over its possible outcomes, it
gives the distribution over d_i with the lowest entropy.
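To make this concrete, the following minimal Python sketch implements Eq. 2 and the test-selection criterion just described on a hypothetical toy domain. For brevity it treats the diseases as mutually exclusive values of a single variable (a simplification of Eq. 1), and all probability tables are invented for illustration.

```python
import numpy as np

# Hypothetical toy domain: one of 3 mutually exclusive diseases and
# 4 binary findings, conditionally independent given the disease.
p_d = np.array([0.5, 0.3, 0.2])       # prior p(d_i)
p_f1 = np.array([[0.9, 0.2, 0.4],     # p(f_j = 1 | d_i); row j, column i
                 [0.1, 0.8, 0.3],
                 [0.5, 0.5, 0.9],
                 [0.2, 0.1, 0.7]])

def posterior(findings):
    """p(d | f_t) via Bayes' rule (Eq. 2); findings is a dict {j: 0 or 1}."""
    post = p_d.copy()
    for j, v in findings.items():
        post *= p_f1[j] if v == 1 else 1.0 - p_f1[j]
    return post / post.sum()

def expected_entropy(findings, j):
    """I_ij: posterior entropy of d after test j, averaged over its outcomes."""
    post, I = posterior(findings), 0.0
    for v in (0, 1):
        lik = p_f1[j] if v == 1 else 1.0 - p_f1[j]
        p_v = np.dot(lik, post)                  # p(f_j = v | f_t)
        q = posterior({**findings, j: v})        # p(d | f_j = v, f_t)
        I -= p_v * np.sum(q * np.log(q + 1e-12))
    return I

f_t = {0: 1}                                     # finding f_0 observed positive
print("tentative diagnosis p(d|f_t):", posterior(f_t))
remaining = [j for j in range(4) if j not in f_t]
print("most informative next test:",
      min(remaining, key=lambda j: expected_entropy(f_t, j)))
```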
Thus, one sees that whereas the rule based systems model the diagnos-
tic process directly, the probabilistic approach models the relations between
diseases and findings. The diagnostic decision (which test to measure next)
is then computed from this model. The advantage of this latter approach
is that the model is much more transparent about the medical knowledge,
which facilitates maintenance (changing probability tables, adding diseases or
findings), as well as evaluation by external experts.
One of the main drawbacks of the probabilistic approach is that it is
intractable for large systems. The computation of marginal probabilities re-
quires summation over all other variables. For instance, in Eq. 2

\[ p(f_t) = \sum_{d,\, f \setminus f_t} p(d, f) \]

and the sum over d, f contains exponentially many terms. Therefore, prob-
abilistic models for medical diagnosis have been restricted to very small
domains 6,7 or, when covering a large domain, at the expense of the level of
detail at which the disease areas are modeled 8.
In order to make the probabilistic approach feasible for large applications
one therefore needs to make approximations. One can use Monte Carlo sam-
pling but one finds that accurate results require very many iterations. An
alternative is to use analytical approximations such as, for instance, mean field
theory 9,10. This approach works well for probability distributions that resem-
ble spin systems (so-called Boltzmann Machines) but, as we will see, it
performs poorly for directed probability distributions of the form Eq. 1.

2 The Cluster Variation Method

A very recent development is the application of the Cluster Variation method


(CVM) to probabilistic inference. CVM is a method that has been developed
in the physics community to approximately compute the properties of the Ising
model 11. The CVM approximates the probability distribution by a number
of (overlapping) marginal distributions (clusters). The quality of the approx-
imation is determined by the size and number of clusters. When the clusters
consist of only two variables, the method is known as the Bethe approxima-
tion. Recently, the method has been introduced by Yedidia et al. 12 into the
machine learning community, showing that in the Bethe approximation, the
CVM solution coincides with the fixed points of the belief propagation algo-
rithm. Belief propagation is a message passing scheme, which is known to
yield exact inference in tree structured graphical models 13. However, BP
can also give impressive results for graphs that are not trees 14.
Let x = (x_1, ..., x_n) be a set of variables, where each x_i can take a finite
number of values. Consider a probability distribution on x of the form

\[ p_H(x) = \frac{e^{-H(x)}}{Z}, \qquad Z = \sum_x e^{-H(x)}. \]

It is well known that p_H can be obtained as the minimum of the free energy,
which is a functional over probability distributions of the following form:

\[ F_H(p) = \langle H \rangle + \langle \log p \rangle, \tag{3} \]

where the expectation value is taken with respect to the distribution p, i.e.
\langle H \rangle = \sum_x p(x) H(x). When one minimizes F_H(p) with respect to p under the
constraint of normalization \sum_x p(x) = 1, one obtains p_H.^a
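This variational property is easy to check numerically by brute force on a tiny model; the sketch below uses an arbitrary random energy on 16 joint states as a stand-in for H and verifies that the Boltzmann distribution attains the minimum, with F(p_H) = -log Z.

```python
import numpy as np

rng = np.random.default_rng(1)
H = rng.normal(size=16)               # arbitrary energy H(x) on 16 joint states

def free_energy(p):
    """F_H(p) = <H> + <log p>, both expectations under p (Eq. 3)."""
    return np.dot(p, H) + np.dot(p, np.log(p))

p_H = np.exp(-H) / np.exp(-H).sum()   # the Boltzmann distribution

# any other normalized distribution has a larger free energy,
# and the minimum value equals -log Z
for _ in range(1000):
    q = rng.random(16)
    q /= q.sum()
    assert free_energy(q) >= free_energy(p_H)
print(free_energy(p_H), -np.log(np.exp(-H).sum()))
```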
Computing marginals of p_H such as p_H(x_i) or p_H(x_i, x_j) involves sums
over all states, which is intractable for large n. Therefore, one needs tractable
approximations to p_H. The cluster variation method replaces the probability
distribution p_H(x) by a large number of (possibly overlapping) probability
distributions, each describing the interaction between a small number of vari-
ables. Due to the one-to-one correspondence between a probability distribu-
tion and the minima of a free energy we can define approximate probability
distributions by constructing approximate free energies and computing their
minimum (or minima!). This is achieved by approximating Eq. 3 in terms of
the cluster probabilities. The solution is obtained by minimizing this approx-
imate free energy subject to normalization and consistency constraints.
"Minimizing the free energy can also be viewed as maximizing the entropy with an addi-
tional constraint on (H).

Define clusters as subsets of distinct variables: x_α = (x_{i_1}, ..., x_{i_k}), with
1 ≤ i_j ≤ n. Define a set of clusters P that contains the interactions in H and
write H as a sum of these interactions:

\[ H(x) = \sum_{\alpha \in P} H^I_\alpha(x_\alpha) \]

For instance, for Boltzmann-Gibbs distributions, H(x) = \sum_{i>j} w_{ij} x_i x_j + \sum_i \theta_i x_i,
and P consists of all pairs and all singletons: P = {α | α = (ij), i > j, or α = (i)}.
For directed graphical models with evidence, such as Eq. 2, P is the set of
clusters formed by each node i and its parent set π_i:
P = {α | α = (i, π_i), i = 1, ..., n}. x is the set of non-evidence variables (d in
this case) and Z = p(f_t).
We now define a set of clusters B, that will determine our approximation
in the cluster variation method. B should at least contain the interactions in
p(x) in the following way:

\[ \forall \alpha \in P \Rightarrow \exists \alpha' \in B, \alpha \subseteq \alpha'. \]

In addition, we demand that no two clusters in B contain each other:
α, α' ∈ B ⇒ α ⊄ α', α' ⊄ α. Clearly, the minimal choice for B is to choose
clusters from P itself. The maximal choice for B is the set of cliques obtained
when constructing the junction tree 15. In this case, the clusters in B form a tree
structure and the CVM method is exact. In general, one can choose any set
of clusters B that satisfies the above definition. Since the proposed method
scales exponentially in the size of the clusters in B, the smaller the clusters
in B, the faster the approximation. For a simple directed graphical model an
intermediate choice of clusters is illustrated in Fig. 1.
Define a set of clusters M that consists of all intersections of clusters of
B: M = {β | β = ∩_k α_k, α_k ∈ B}, and define U = B ∪ M. Once U
is given, we define numbers a_β recursively by the Moebius formula

\[ 1 = \sum_{\alpha \in U, \alpha \supseteq \beta} a_\alpha, \qquad \forall \beta \in U. \]

In particular, this shows that a_α = 1 for α ∈ B.
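A minimal sketch of this construction (a hypothetical helper, with clusters represented as Python frozensets): it closes B under intersections to obtain M and solves the Moebius formula recursively from the largest clusters down.

```python
from itertools import combinations

def moebius_numbers(B):
    """Close B under intersections to get M, set U = B ∪ M, and solve
    1 = sum over {alpha in U, alpha ⊇ beta} of a_alpha for every beta in U."""
    B = [frozenset(b) for b in B]
    M = set()
    for k in range(2, len(B) + 1):
        for combo in combinations(B, k):
            inter = frozenset.intersection(*combo)
            if inter and inter not in B:
                M.add(inter)
    U = B + list(M)
    a = {}
    # go from large to small clusters: all proper supersets are solved first,
    # and the maximal clusters (those in B) get a_alpha = 1
    for beta in sorted(U, key=len, reverse=True):
        a[beta] = 1 - sum(a[g] for g in U if g > beta)
    return a

# Bethe-like choice on a triangle: B = all pairs, M = the three singletons
for cluster, num in moebius_numbers([{1, 2}, {2, 3}, {1, 3}]).items():
    print(sorted(cluster), num)   # pairs: a = 1, singletons: a = -1
```

For the fully connected pairwise model with B chosen as all pairs, the same recursion yields a_(ij) = 1 and a_(i) = 2 - n, since each singleton is contained in n - 1 pairs.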


The Moebius formula allows us to rewrite interactions on potentials in P
in terms of interactions on clusters in U:

\[ H(x) = \sum_{\beta \in P} H^I_\beta(x_\beta) = \sum_{\beta \in P} \sum_{\alpha \in U, \alpha \supseteq \beta} a_\alpha H^I_\beta(x_\beta) = \sum_{\alpha \in U} a_\alpha H_\alpha(x_\alpha), \]

Figure 1. Directed graphical model consisting of 5 variables. Interactions are de-
fined on clusters in P = {(1), (1,2), (2,3), (1,4), (3,4,5)}. The clusters in B are de-
picted by the dashed lines (B = {(1,2,3), (2,3,5), (1,4,5), (3,4,5)}). The set M =
{(1), (2,3), (3), (5), (3,5)}.

where we have defined H_α as the sum of all interactions in β ∈ P that are
contained in cluster α ∈ U:

\[ H_\alpha(x_\alpha) = \sum_{\beta \in P, \beta \subseteq \alpha} H^I_\beta(x_\beta) \]

Since interactions may appear in multiple clusters, the constants a_α ensure
that double counting is compensated for.^b Thus, we can express \langle H \rangle in Eq. 3
explicitly in terms of the cluster probabilities p_α as

\[ \langle H \rangle = \sum_{\alpha \in U} a_\alpha \langle H_\alpha \rangle = \sum_{\alpha \in U} a_\alpha \sum_{x_\alpha} H_\alpha(x_\alpha)\, p_\alpha(x_\alpha) \tag{4} \]

^b In the case of the Boltzmann distribution

\[ H^I_{(i)} = H_{(i)} = \theta_i x_i, \qquad H_{(ij)} = w_{ij} x_i x_j + \theta_i x_i + \theta_j x_j, \]

and a_{(ij)} = 1 and a_{(i)} = 2 - n.



Whereas \langle H \rangle can be written exactly in terms of p_α, this is not the case
for the entropy term in Eq. 3. The approach is to decompose the entropy of
a cluster α in terms of 'connected entropies' in the following way:^c

\[ S_\alpha = -\sum_{x_\alpha} p_\alpha(x_\alpha) \log p_\alpha(x_\alpha) = \sum_{\beta \subseteq \alpha} \tilde{S}_\beta \tag{5} \]

Such a decomposition can be made for any cluster. In particular it can be
made for the 'cluster' consisting of all variables, so that we obtain

\[ S = -\sum_x p(x) \log p(x) = \sum_\beta \tilde{S}_\beta \tag{6} \]

where β runs over all subsets of variables.^d The cluster variation method
approximates the total entropy by restricting this sum to only clusters in
U and re-expressing \tilde{S}_\beta in terms of S_α, using the Moebius formula and the
definition Eq. 5:

\[ S \approx \sum_{\beta \in U} \tilde{S}_\beta = \sum_{\beta \in U} \sum_{\alpha \in U, \alpha \supseteq \beta} a_\alpha \tilde{S}_\beta = \sum_{\alpha \in U} a_\alpha S_\alpha \tag{7} \]

Since S_α is a function of p_α (Eq. 5) we have expressed the entropy in terms
of cluster probabilities p_α.
The quality of this approximation is illustrated in Fig. 2. Note that
both the Bethe and Kikuchi approximations strongly deteriorate around
J = 1, which is where the spin-glass phase starts. For J < 1, the Kikuchi
approximation is superior to the Bethe approximation. Note, however, that
this figure only illustrates the quality of the truncations in Eq. 7, assuming that
the exact marginals are known. It does not say anything about the accuracy
of the approximate marginals using the approximate free energy.
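The experiment behind Fig. 2 can be reproduced in its simplest (Bethe) form by brute-force enumeration; below is a sketch with hypothetical parameter values, using the Moebius numbers a_(ij) = 1 and a_(i) = 2 - n from footnote b.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
n, J = 10, 0.5
w = np.triu(rng.normal(0.0, J / np.sqrt(n), (n, n)), 1)  # couplings w_ij, i < j
theta = rng.normal(0.0, 0.1, n)                          # external fields theta_i

# enumerate all 2^n states x_i in {-1, +1} and the exact distribution p_H
x = np.array(list(itertools.product([-1, 1], repeat=n)), dtype=float)
E = np.einsum('si,ij,sj->s', x, w, x) + x @ theta        # H(x) for every state
shift = (-E).max()
logZ = shift + np.log(np.exp(-E - shift).sum())
p = np.exp(-E - logZ)

def entropy(dist):
    dist = dist[dist > 0]
    return -np.sum(dist * np.log(dist))

def marginal(idx):
    """Exact marginal over the variables in idx, summing p over all states."""
    acc = {}
    for key, ps in zip(map(tuple, x[:, idx]), p):
        acc[key] = acc.get(key, 0.0) + ps
    return np.array(list(acc.values()))

S_exact = entropy(p)                                     # Eq. 6
S1 = sum(entropy(marginal([i])) for i in range(n))
S2 = sum(entropy(marginal([i, j])) for j in range(n) for i in range(j))
S_bethe = S2 + (2 - n) * S1                              # Eq. 7, B = all pairs
print(S_exact, S_bethe)
```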
Substituting Eqs. 4 and 7 into the free energy Eq. 3 we obtain the ap-
proximate free energy of the Cluster Variation method. This free energy must
be minimized subject to normalization constraints \sum_{x_\alpha} p_\alpha(x_\alpha) = 1 and con-
sistency constraints

\[ p_\alpha(x_\beta) = p_\beta(x_\beta), \qquad \beta \in M, \alpha \in B, \beta \subset \alpha. \tag{8} \]

Note that we have excluded constraints between clusters in M. This is suf-
ficient because when β, β' ∈ M, β ⊂ β' and β' ⊂ α ∈ B: p_α(x_{β'}) = p_{β'}(x_{β'})
^c This decomposition is similar to writing a correlation in terms of means and covariance.
For instance, when α = (i), \tilde{S}_{(i)} = S_{(i)} is the usual mean field entropy, and
S_{(ij)} = \tilde{S}_{(i)} + \tilde{S}_{(j)} + \tilde{S}_{(ij)} defines the two-node correction.
^d On n variables this sum contains 2^n terms.

Figure 2. Exact and approximate entropies for the fully connected Boltzmann-Gibbs dis-
tribution on n = 10 variables with random couplings (SK model) as a function of mean
coupling strength. Couplings w_{ij} are chosen from a Gaussian distribution with mean
zero and standard deviation J/\sqrt{n}. External fields \theta_i are chosen from a Gaussian
distribution with mean zero and standard deviation 0.1. The exact entropy is computed
from Eq. 6. The Bethe and Kikuchi approximations are computed using the approximate
entropy expression Eq. 7 with exact marginals and by choosing B as the set of all pairs and
all triplets, respectively.

and p_α(x_β) = p_β(x_β) implies p_{β'}(x_β) = p_β(x_β). In the following, α and β will
be from B and M respectively, unless otherwise stated.^e
Adding Lagrange multipliers for the constraints we obtain the Cluster
Variation free energy:

\[ F_{cvm}(\{p_\alpha(x_\alpha)\}, \{\lambda_\alpha\}, \{\lambda_{\alpha\beta}(x_\beta)\}) = \sum_{\alpha \in U} a_\alpha \sum_{x_\alpha} p_\alpha(x_\alpha) \left( H_\alpha(x_\alpha) + \log p_\alpha(x_\alpha) \right) \]
\[ - \sum_{\alpha \in U} \lambda_\alpha \left( \sum_{x_\alpha} p_\alpha(x_\alpha) - 1 \right) - \sum_{\alpha \in B} \sum_{\beta \in M, \beta \subset \alpha} \sum_{x_\beta} \lambda_{\alpha\beta}(x_\beta) \left( p_\alpha(x_\beta) - p_\beta(x_\beta) \right) \tag{9} \]

3 Iterating Lagrange multipliers

Since the Moebius numbers can have arbitrary sign, Eq. 9 consists of a sum of
convex and concave terms, and is therefore a non-convex optimization prob-
lem. One can separate F_{cvm} into a convex and a concave term and derive an
^e In fact, additional constraints can be removed when clusters in M contain subclusters in
M. See Kappen and Wiegerinck 16.

iteration procedure in p_α and the Lagrange multipliers that is guaranteed to
converge 17. The resulting algorithm is a 'double loop' iteration procedure.
Alternatively, by setting \partial F_{cvm}/\partial p_\gamma, \gamma \in U, equal to zero, one can express
the cluster probabilities in terms of the Lagrange multipliers:

\[ p_\alpha(x_\alpha) = \frac{1}{Z_\alpha} \exp\left( -H_\alpha(x_\alpha) + \sum_{\beta \subset \alpha} \lambda_{\alpha\beta}(x_\beta) \right) \tag{10} \]

\[ p_\beta(x_\beta) = \frac{1}{Z_\beta} \exp\left( -H_\beta(x_\beta) - \frac{1}{a_\beta} \sum_{\alpha \supset \beta} \lambda_{\alpha\beta}(x_\beta) \right) \tag{11} \]
The remaining task is to solve for the Lagrange multipliers such that all
constraints (Eq. 8) are satisfied. There are two ways to do this. One is to
define an auxiliary cost function that is zero when all constraints are satisfied
and positive otherwise, and to minimize this cost function with respect to the
Lagrange multipliers. This method is discussed in Kappen and Wiegerinck 16.
Alternatively, one can substitute Eqs. 10-11 into the constraint Eqs. 8
and obtain a system of coupled non-linear equations. In Yedidia et al. 12 a
message passing algorithm was proposed to find a solution to this problem.
Here, we will present an alternative method that solves directly in terms of
the Lagrange multipliers.
Consider the constraints Eq. 8 for some fixed cluster β and all clusters
α ⊃ β and define B_β = {α ∈ B | α ⊃ β}. We wish to solve all constraints for
α ∈ B_β by adjusting λ_{αβ}, α ∈ B_β. This is a sub-problem
with |B_β||x_β| equations and an equal number of unknowns, where |B_β| is
the number of elements of B_β and |x_β| is the number of values that x_β can
take. The probability distribution p_β (Eq. 11) depends only on these Lagrange
multipliers, up to normalization. p_α (Eq. 10) depends also on other Lagrange
multipliers. However, we consider only its dependence on λ_{αβ}, α ∈ B_β, and
consider all other Lagrange multipliers as fixed. Thus,

\[ p_\alpha(x_\alpha) = \exp(\lambda_{\alpha\beta}(x_\beta))\, \tilde{p}_\alpha(x_\alpha), \qquad \alpha \in B_\beta \tag{12} \]

with \tilde{p}_\alpha independent of \lambda_{\alpha\beta}, \alpha \in B_\beta.


Substituting Eqs. 11 and 12 into Eq. 8, we obtain a set of linear equations
for \lambda_{\alpha\beta}(x_\beta) which we can solve in closed form:

\[ \lambda_{\alpha\beta}(x_\beta) = -\frac{a_\beta}{a_\beta + |B_\beta|} H_\beta(x_\beta) - \sum_{\alpha' \in B_\beta} A_{\alpha\alpha'} \log \tilde{p}_{\alpha'}(x_\beta) \]

with

\[ A_{\alpha\alpha'} = \delta_{\alpha\alpha'} - \frac{1}{a_\beta + |B_\beta|}. \]
We update the probabilities with the new values of the Lagrange multipliers
using Eqs. 11 and 12. We repeat the above procedure for all β ∈ M until
convergence.
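In the log domain the constraints for fixed β reduce, per value of x_β, to the coupled equations λ_{αβ} + (1/a_β) Σ_{α'} λ_{α'β} = b_α with b_α = -H_β - log \tilde{p}_α (constants absorbed by normalization), and the matrix A above is their closed-form inverse. A tiny numerical check of this step, with invented values:

```python
import numpy as np

a_beta, K = -1.0, 3                  # hypothetical a_beta and |B_beta|
b = np.array([0.7, -1.2, 0.4])       # hypothetical b_alpha for one value of x_beta

# closed-form solution: lambda = A b with A = I - 1/(a_beta + |B_beta|)
A = np.eye(K) - 1.0 / (a_beta + K)
lam = A @ b

# verify the coupled constraint equations:
# lambda_alpha + (1/a_beta) * sum over alpha' of lambda_alpha' = b_alpha
assert np.allclose(lam + lam.sum() / a_beta, b)
print(lam)
```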

4 Numerical results

We show the performance of the Lagrange multiplier iteration method (LMI)


on several 'real world' directed graphical models. For undirected models,
see Kappen and Wiegerinck 16. First, we consider the well-known chest clinic
problem, introduced by Lauritzen and Spiegelhalter 15. The graphical model is
given in figure 3a. The model describes the relations between three diagnoses
(Tuberculosis(T), Lung Cancer(L) and Bronchitis(B), middle layer), clinical
observations and symptoms (Positive X-ray(X) and Dyspnoea(D)(=shortness
of breath), lower layer) and prior conditions (recent visit to Asia(A) and
whether the patient smokes(S)). In figure 3b, we plot the exact single node
marginals against the approximate marginals for this problem. For LMI,
the clusters in B are defined according to the conditional probability tables,
i.e. when a node has k parents, a cluster of size k + 1 on this node and its
parents is included in the set B. Convergence was reached in 6 iterations.
Maximal error on the marginals is 0.0033. For comparison, we computed the
mean field and TAP approximations, as previously introduced by Kappen and
Wiegerinck 10. Although TAP is significantly better than MF, it is far worse
than the CVM method. This is not surprising, since both the MF and TAP
approximations are based on a single node approximation, whereas the CVM
method uses potentials up to size 3.
Secondly, we consider a graphical model that was developed in a project
together with the department of internal medicine of the Utrecht Academic
Hospital. In this project, called Promedas, we aim to model a large part of in-
ternal medicine 18. The network that we consider was one of the first modules
that we built and models in detail some specific anemias and consists of 91
variables. The network was developed using our graphical tool BayesBuilder 19
which is shown with part of the network in figure 4. The clusters in B are de-
fined according to the conditional probability tables. Convergence was reached
in 5 iterations. Maximal absolute error on the marginals is 0.0008. The mean
field and TAP methods perform very poorly on this problem.
Finally, we tested the cluster variation method on randomly generated


Figure 3. a) The Chest Clinic model describes the relations between diagnoses, findings and
prior conditions for a small medical domain. An arrow a → b indicates that the probability
of b depends on the values of a. b) Inference of single node marginals using the MF, TAP and
LMI methods, comparing the results with the exact marginals.

directed graphical models. Each node is randomly connected to k parents.
The entries of the probability tables are randomly generated between zero
and one. Due to the large number of loops in the graph, the exact method
requires exponential time in the so-called tree width, which can be seen from
Table 1 to scale approximately linearly with the network size. Therefore exact
computation is only feasible for small graphs (up to size n = 40 in this case).
For the CVM, clusters in B are defined according to the conditional prob-
ability tables. Therefore, the maximal cluster size is k + 1. On these more chal-
lenging cases, LMI does not converge. The results shown are obtained with
the auxiliary cost function that was briefly mentioned in section 3 and
fully described in Kappen and Wiegerinck 16. Minimization was done using
conjugate gradient descent. The results are shown in Table 1.

5 Conclusion

In this paper, we have described two approaches to computer aided medical


diagnosis. The rule based approach directly models the diagnostic decision
tree. We have shown that this approach fails to pass the test of clinical

Figure 4. BayesBuilder graphical software environment, showing part of the Anemia net-
work. The network consists of 91 variables and models some specific Anemias.

n     Iter   |C|   Potential error   Margin error   Constraint error
10    16      8    0.068             0.068          5.8e-3
20    30     12    0.068             0.216          6.2e-3
30    44     16    0.079             0.222          4.5e-3
40    48     21    0.073             0.218          4.2e-3
50    51     26    -                 -              3.2e-3

Table 1. Comparison of the CVM method for large directed graphical models. Each
node is connected to k = 5 parents. |C| is the tree width of the triangulated graph
required for the exact computation. Iter is the number of conjugate gradient descent
iterations of the CVM method. Potential error and margin error are the maximum
absolute error in any of the cluster probabilities and single variable marginals com-
puted with CVM, respectively. Constraint error is the maximum absolute error in
any of the constraints Eq. 8 after termination of CVM.

relevance, and we have given several reasons that could account for this failure.
The alternative approach uses a probabilistic model to describe the rela-
tions between diagnoses and findings. This approach has the great advantage
that it provides a principled approach for the combination of different sources

of uncertainty. The price that we have to pay for this luxury is that proba-
bilistic inference is intractable for large systems.
As a generic approximation method, we have introduced the Cluster Vari-
ation method and presented a novel iteration scheme, called Lagrange Multi-
plier Iteration. When it converges, it provides very good results and is very
fast. However, it is not guaranteed to converge in general. In those more
complex cases one must resort to more expensive methods, such as CCCP 17
or using an auxiliary cost function 16.

Acknowledgments

This research was supported in part by the Dutch Technology Foundation
(STW). I would like to thank Taylan Cemgil for providing his Matlab graphical
models toolkit, and Wim Wiegerinck and Sebino Stramaglia (Bari, Italy) for
useful discussions.

References

1. Meditel, Devon, Pa. Meditel: Computer assisted diagnosis, 1991.


2. CAMDAT, Pittsburgh. QMR (Quick Medical Reference), 1992.
3. Massachusetts General Hospital, Boston. DXPLAIN, 1992.
4. Applied Informatics, Salt Lake City. ILIAD, 1992.
5. E.S. Berner, G.D. Webster, A.A. Shugerman, J.R. Jackson, J. Algina, A.L.
Baker, E.V. Ball, C.G. Cobbs, V.W. Dennis, E.P. Frenkel, L.D. Hudson, E.L.
Mancall, C.E. Racley, and O.D. Taunton. Performance of four computer-based
diagnostic systems. N. Engl. J. Med., 330(25):1792-6, 1994.
6. D.E. Heckerman, E.J. Horvitz, and B.N. Nathwani. Towards normative expert
systems: part I, the Pathfinder project. Methods of Information in medicine,
31:90-105, 1992.
7. D.E. Heckerman and B.N. Nathwani. Towards normative expert systems: part
II, probability-based representations for efficient knowledge acquisition and in-
ference. Methods of Information in medicine, 31:106-116, 1992.
8. M.A. Shwe, B. Middleton, D.E. Heckerman, M. Henrion, E.J. Horvitz, H.P.
Lehman, and G.F. Cooper. Probabilistic Diagnosis Using a Reformulation of
the Internist-1/ QMR Knowledge Base. Methods of Information in Medicine,
30:241-55, 1991.
9. H.J. Kappen and F.B. Rodriguez. Efficient learning in Boltzmann Machines
using linear response theory. Neural Computation, 10:1137-1156, 1998.
10. H.J. Kappen and W.A.J.J. Wiegerinck. Second order approximations for prob-
ability models. In Todd Leen, Tom Dietterich, Rich Caruana, and Virginia de
Sa, editors, Advances in Neural Information Processing Systems 13, pages 238-
244. MIT Press, 2001.

11. R. Kikuchi. Physical Review, 81:988, 1951.


12. J.S. Yedidia, W.T. Freeman, and Y. Weiss. Generalized belief propagation. In
T.K. Leen, T.G. Dietterich, and V. Tresp, editors, Advances in Neural Infor-
mation Processing Systems 13 (Proceedings of the 2000 Conference), 2001. In
press.
13. J. Pearl. Probabilistic reasoning in intelligent systems: Networks of Plausible
Inference. Morgan Kaufmann, San Francisco, California, 1988.
14. Kevin P. Murphy, Yair Weiss, and Michael I. Jordan. Loopy belief propagation
for approximate inference: An empirical study. In Proceedings of Uncertainty
in AI, pages 467-475, 1999.
15. S.L. Lauritzen and D.J. Spiegelhalter. Local computations with probabilities
on graphical structures and their application to expert systems. J. Royal Sta-
tistical society B, 50:154-227, 1988.
16. H.J. Kappen and W. Wiegerinck. A novel iteration scheme for the cluster
variation method. In T.G. Dieterich, S. Becker, and Z. Ghahramani, editors,
Advances in Neural Information Processing Systems, volume 14, 2002. In press.
17. A.L. Yuille and A. Rangarajan. The concave-convex procedure (CCCP). In T.G. Di-
eterich, S. Becker, and Z. Ghahramani, editors, Advances in Neural Information
Processing Systems, volume 14, 2002. In press.
18. W. Wiegerinck, H.J. Kappen, E.W.M.T ter Braak, W.J.P.P ter Burg, M.J.
Nijman, Y.L. O, and J.P. Neijt. Approximate inference for medical diagnosis.
Pattern Recognition Letters, 20:1231-1239, 1999.
19. B. Kappen, W. Wiegerinck, and M. Nijman. Bayesbuilder. In W. Buntine,
B. Fischer, and J. Schumann, editors, Software Support for Bayesian Analysis.
RIACS, NASA Ames Research Center, 2000.

ANALYSIS OF EEG IN EPILEPSY

K. LEHNERTZ 1,*, R. G. ANDRZEJAK 1,2, T. KREUZ 1,2, F. MORMANN 1,3,
C. RIEKE 1,3, P. DAVID 3, C. E. ELGER 1

1 Department of Epileptology, University of Bonn
2 John von Neumann Institute for Computing, Forschungszentrum Jülich
3 Institute for Radiation and Nuclear Physics, University of Bonn
Germany

* E-mail: Klaus.Lehnertz@ukb.uni-bonn.de

We present potential applications of nonlinear time series analysis techniques to


electroencephalographic recordings (EEG) derived from epilepsy patients. Apart
from diagnostically oriented topics including localization of epileptic foci in differ-
ent anatomical locations during the seizure-free interval we discuss possibilities for
seizure anticipation which is one of the most challenging aspects in epileptology.

1 Introduction

The disease epilepsy is characterized by a recurrent and sudden malfunction


of the brain that is termed seizure. Epileptic seizures reflect the clinical signs
of an excessive and hypersynchronous activity of neurons in the cerebral cor-
tex. Depending on the extent of involvement of other brain areas during the
course of the seizure, epilepsies can be divided into two main classes. Gener-
alized seizures involve almost the entire brain while focal (or partial) seizures
originate from a circumscribed region of the brain (epileptic focus) and re-
main restricted to this region. Epileptic seizures may be accompanied by an
impairment or loss of consciousness, psychic, autonomic or sensory symptoms
or motor phenomena.
Knowledge about basic mechanisms leading to seizures is mainly derived
from animal experiments. Although there is a considerable bulk of literature
on the topic, the underlying electrophysiological and neurobiochemical mech-
anisms are not yet fully explored. Moreover, it remains to be proven whether
findings from animal experiments are fully transferable to human epilepsies.
Recordings of the membrane potential of neurons under epileptic conditions
indicate an enormous change, which by far exceeds physiological changes oc-
curring with neuronal excitation. This phenomenon is termed paroxysmal
depolarization shift (PDS 1,2,3) and represents a shift of the resting membrane
potential that is accompanied by an increase of intracellular calcium and a
massive burst of action potentials (500 - 800 per second). PDS originating
from a larger cortical region are associated with steep field potentials (known

as spikes) recorded in the scalp EEG. Focal seizures are assumed to be initi-
ated by abnormally discharging neurons (so-called bursters 4,5,6) that recruit
and entrain neighboring neurons into a "critical mass". This build-up might
be mediated by an increasing synchronization of neuronal activity that is ac-
companied by a loss of inhibition, or by facilitating processes that permit
seizure emergence by lowering a threshold. The fact that seizures appear to
be unpredictable is one of the most disabling aspects of epilepsy. If it were
possible to anticipate seizures, this would dramatically change therapeutic
possibilities 7.
Approximately 0.6 - 0.8% of the world population suffer from epilepsy. In
about half of these patients, focal seizures originate from functional and/or
morphological lesions of the brain. Antiepileptic drugs insufficiently control
or even fail to manage epilepsy in 30 - 50% of the cases. It can be assumed
that 10 - 15% of these cases would profit from epilepsy surgery. Successful
surgical treatment of focal epilepsies requires exact localization of the epileptic
focus and its delineation from functionally relevant areas. For this purpose,
different presurgical evaluation methodologies are currently in use 8. Neurolog-
ical and neuropsychological examinations are complemented by neuroimaging
techniques that try to identify potential morphological correlates. Currently,
the gold standard for an exact localization of the epileptic focus, however, is
to record the patient's spontaneous habitual seizure using electroencephalog-
raphy. Depending on the individual occurrence of seizures, this task requires
long-lasting and continuous recording of the EEG. In case of ambiguous scalp
EEG findings, invasive recordings of the electrocorticogram (ECoG) or the
stereo-EEG (SEEG) via implanted depth electrodes are indicated. This pro-
cedure, however, carries a certain risk for the patient and is time-consuming
and expensive. Thus, reliable EEG analysis techniques are required to localize
and to demarcate the epileptic focus even during the seizure-free interval 9.

2 EEG analysis

In recent years, technical advances such as digital video-EEG monitoring
systems as well as increased computational power have led to highly sophis-
ticated clinical epilepsy monitoring that allows huge amounts of data to be
processed in real time. In addition, chronically implanted intracranial electrodes allow
ticated clinical epilepsy monitoring allowing to process huge amounts of data
in real-time. In addition, chronically implanted intracranial electrodes allow
continuous recording of brain electrical activity from the surface of the brain
and/or within specific brain structures at a high signal-to-noise ratio and at
a high spatial resolution. Due to its high temporal resolution and its close
relationship to physiological and pathological functions of the brain, electroen-
cephalography is regarded as indispensable for clinical practice despite the rapid

development of imaging techniques like magnetic resonance tomography or


positron emission tomography.
Usually EEG analysis methods are applied to long-lasting multi-channel
recordings in a moving-window fashion. The time length of a window is chosen
in such a way that it represents a reasonable tradeoff between approximate
stationarity and a sufficient number of data points. Depending on the complex-
ity of the analysis technique applied, computation times vary from a few
milliseconds up to some tenths of seconds. Thus, most applications can be
performed in real-time using standard personal computers. However, analy-
ses cannot be applied in a strict mathematical sense because the necessary
theoretical conditions cannot be met in practice - a common problem that
applies to any analysis of short (and noisy) data segments or nonstationary
data.
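Schematically, such a moving-window analysis can be organized as below (a generic sketch; the window length, overlap and measure are placeholders to be chosen according to the tradeoff just described):

```python
import numpy as np

def moving_window(signal, measure, win_len=4096, step=2048):
    """Apply `measure` to consecutive (here half-overlapping) windows and
    return one value per window position."""
    starts = range(0, len(signal) - win_len + 1, step)
    return np.array([measure(signal[s:s + win_len]) for s in starts])

# usage: a variance profile of a toy nonstationary signal
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(size=20000), 3.0 * rng.normal(size=20000)])
profile = moving_window(x, np.var)
```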
Linear EEG analysis methods 10 can be divided into two main concept-
based categories. Nonparametric methods comprise analysis techniques such
as evaluation of amplitude, interval or period distributions, estimation of auto-
and cross-correlation functions as well as analyses in the frequency domain like
power spectral estimation and cross-spectral functions. Parametric methods
include, among others, AR (autoregressive) and ARMA (autoregressive mov-
ing average) models 11, inverse AR-filtering and segmentation analysis. These
main branches are accompanied by pattern recognition methods involving ei-
ther a mixture of techniques mentioned before or, more recently, the wavelet
transform 12,13,14. Despite the limitations mentioned above, classical EEG
analysis has significantly contributed to and still advances understanding of
physiological and pathophysiological mechanisms of the brain.
Nonlinear time series analysis techniques 15,16,17 have been developed to
analyze and characterize apparently irregular behavior - a distinctive feature
of the EEG. Techniques mainly involve estimates of an effective correlation
dimension, entropy related measures, Lyapunov exponents, measures for de-
terminism, similarity, interdependencies, recurrence quantification as well as
tests for nonlinearity. During the last decade a variety of these analysis tech-
niques have been repeatedly applied to EEG recordings during physiological
and pathological conditions and were shown to offer new information about
complex brain dynamics 18,19,20,21. Today it is commonly accepted that the
existence of a deterministic or even chaotic structure underlying neuronal
dynamics is difficult if not impossible to prove. Nevertheless, nonlinear ap-
proaches to the analysis of the system brain have generated new clinical mea-
sures as well as new ways of interpreting brain electrical function, particularly
with regard to epileptic brain states. Indeed, recent results provide converging
evidence that nonlinear EEG analysis allows a reliable characterization of different

states of brain function and dysfunction, provided that limitations of the re-
spective analysis techniques are taken into consideration and, thus, results are
interpreted with care (e.g., only relative measures with respect to recording
time and recording site are assumed reliable).
In the following, we will concentrate on nonlinear EEG analysis tech-
niques and illustrate potential applications of these techniques in the field of
epileptology.

3 Nonlinear EEG analysis in epilepsy

In early publications 22,23 evidence for low-dimensional chaos in EEG record-
ings of epileptic seizures was claimed. However, accumulating knowledge
about influencing factors as well as improvement of analysis techniques ren-
dered these findings questionable 24,25,26,27. There is, however, now converging
evidence that relative estimates of nonlinear measures improve understanding
of the complex spatio-temporal dynamics of the epileptogenic process in differ-
ent brain regions and promise to be of high relevance for diagnostics 28,29,30,31.

3.1 Outline of analysis techniques


In the course of our work attempting to characterize the epileptogenic process
we investigated the applicability of already established measures and devel-
oped new ones. Results presented below were obtained from extracting these
measures from long-lasting ECoG and SEEG recordings from subgroups of
a collective of about 300 patients with epileptogenic foci located in different
anatomical regions of the brain.
Apart from linear measures such as statistical moments, power spectral
estimates, or auto- and cross-correlation functions several univariate and bi-
variate nonlinear measures are currently in use. Since details of different
analysis techniques have been published elsewhere, we here only provide a
short description of the measures along with the respective references.
Univariate measures: Based on the well known fact that neurons involved
in the epileptic process exhibit high frequency discharges that are scarcely
modulated by physiological brain activity 5, we hypothesized that this neu-
ronal behavior should be accompanied by an intermittent loss of complexity
or an increase of nonlinear deterministic structure in the corresponding elec-
trographic signal even during the seizure-free interval 33. To characterize com-
plexity, we use an estimate of an effective correlation dimension 32, D_2^eff, and the
derived measure neuronal complexity loss L* 33,34. These measures are accom-
panied by estimates of the largest Lyapunov exponent λ_1 35,36,37, by entropy

measures 38, by the nonlinear prediction error 39,40, and by different complexity
measures 41 derived from the theory of symbolic dynamics 42. Detection and
characterization of nonlinear deterministic structures in the EEG is achieved
by combining tests for determinism 43 and for nonlinearity 44, resulting in a
measure we have termed the fraction of nonlinear determinism 45,46,47.
Bivariate measures: As already mentioned, pathological neuronal syn-
chronization is considered to play a crucial role in epileptogenesis. There-
fore, univariate measures are supplemented by bivariate measures that aim
to detect and characterize synchronization in time series of brain electrical
activity. The nonlinear interdependence S 48,49 characterizes statistical rela-
tionships between two time series. In contrast to commonly used measures
like cross-correlation, coherence and mutual information, S is non-symmetric
and provides information about the direction of interdependence. It is closely
related to other attempts to detect generalized synchronization 50. Following
the approach of understanding phase synchronization 51 in a statistical sense 52,
we developed a straightforward measure for phase synchronization employing
the circular variance 53 of a phase distribution. We have termed this measure
mean phase coherence R 54,55.
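For illustration, a minimal sketch of the mean phase coherence, with instantaneous phases extracted via the Hilbert transform (the toy data and all parameters are arbitrary):

```python
import numpy as np
from scipy.signal import hilbert

def mean_phase_coherence(x, y):
    """R = |<exp(i(phi_x - phi_y))>| = 1 minus the circular variance of the
    phase differences; R = 1 means perfect phase locking, R ~ 0 none."""
    phi_x = np.angle(hilbert(x - np.mean(x)))
    phi_y = np.angle(hilbert(y - np.mean(y)))
    return np.abs(np.mean(np.exp(1j * (phi_x - phi_y))))

# usage on two noisy oscillators with a fixed phase lag
t = np.linspace(0.0, 10.0, 2000)
rng = np.random.default_rng(0)
x = np.sin(2 * np.pi * 3 * t) + 0.5 * rng.normal(size=t.size)
y = np.sin(2 * np.pi * 3 * t + 0.7) + 0.5 * rng.normal(size=t.size)
print(mean_phase_coherence(x, y))
```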

3.2 Localizing the epileptic focus


Several lines of evidence originating from studies of human epileptic brain tis-
sue as well as from animal models of chronic seizure disorders indicate that the
epileptic brain is different from the normal brain, even between seizures, during
the so-called interictal state. In order to evaluate the efficacy of different analysis
techniques to characterize the spatio-temporal dynamics of the epileptogenic
process and thus to localize the epileptic focus during the interictal state,
we applied them to long-lasting interictal ECoG/SEEG recordings covering
different states of normal behavior and vigilance as well as different extents
of epileptiform activity. We retrospectively analyzed data of patients with
mesial temporal lobe epilepsy (MTLE) and/or neocortical lesional epilepsy
(NLE) undergoing presurgical evaluation. We included data of patients for
whom surgery led to complete post-operative seizure control as well as of pa-
tients who did not benefit from surgery. Nonlinear EEG analysis techniques
allow epileptic foci in different cerebral regions to be reliably localized in more
than 80% of the cases. This holds true regardless of whether or not obvious
epileptiform activity is present in the recordings.
Results obtained from our univariate measures indicate that the dynamics
of the epileptic focus during the seizure-free interval can indeed be character-
ized by an intermittent loss of complexity or an increased nonlinear determin-

istic structure in an otherwise stochastic environment. Bivariate measures


indicate that the epileptic focus is characterized by a pathologically increased
level of interdependence or synchronization. Both univariate and bivariate
measures thus share the ability to detect dynamical changes related to the
epileptic process. It can be concluded that our EEG analysis techniques ap-
proach the problem of characterizing the epileptogenic process from different
points of view, and they indicate the potential relevance of nonlinear EEG
analysis to improve understanding of intermittent dysfunctioning of the dy-
namical system brain between seizures. Moreover, our results also stress the
relevance of nonlinear EEG analyses in clinical practice since they provide
potentially useful diagnostic information and thus may contribute to an im-
provement of the presurgical evaluation 9,56,57,29.

3.3 Anticipating seizures


In EEG analysis the search for the hidden information predictive of an im-
pending seizure has a long history. As early as 1975, researchers considered
analysis techniques such as pattern recognition analytic procedures of spec-
tral data 58 or autoregressive modeling of EEG data 59 for predicting epileptic
seizures. Findings indicated that EEG changes characteristic of pre-seizure
states may be detectable, at most, a few seconds before the actual seizure on-
set. None of these techniques have been implemented clinically. Apart from
applying signal analysis techniques, the relevance of steep, high amplitude
epileptiform potentials (spikes, the hallmark of the epileptic brain) was in-
vestigated in a number of clinical studies 60,61,62. While some authors reported
a decrease or even total cessation of spikes before seizures, reexamination did
not confirm this phenomenon in a larger sample.
Although there are numerous studies exploring basic neuronal mecha-
nisms that are likely to be associated with seizures, to date, no definite infor-
mation is available as to the generation of seizures in humans. In this context,
the term "critical mass" might be misleading in the sense that it could merely
imply an increasing number of neurons that are entrained into an abnormal
discharging process. This mass phenomenon would have been easily accessible
for conventional EEG analyses which, however, failed to detect it.
Recent research in seizure anticipation has shown that evident markers
in the EEG representing the transition from asynchronous to synchronous
states of the epileptic brain (pre-seizure state) can be detected on time scales
ranging from several minutes up to hours. These studies indicate that the
seizure-initiating process should be regarded as an unfolding of an increas-
ing number of critical, possibly nonlinear dynamical interferences between

neurons within the focal area as well as with neurons surrounding this area.
Indeed, there is converging evidence from different laboratories that nonlinear
analysis is capable of characterizing this collective behavior of neurons from the
gross electrical activity and hence allows a critical transition state to be defined,
at least in a high percentage of cases 63,64,65,67,34,68,69,70,55,71,28.

4 Future perspectives

Results obtained so far are promising and emphasize the high value of nonlin-
ear EEG analysis techniques both for clinical practice and basic science. Up
to now, however, findings have been mainly obtained from retrospective stud-
ies in well-elaborated cases and using invasive recording techniques. Thus,
on the one hand, evaluation of more complicated cases as well as prospective
studies on a larger population of patients are necessary.
The possibility of defining a critical transition state can be regarded as the
most prominent contribution of nonlinear EEG analysis to advance knowledge
about seizure generation in humans. This possibility has recently been ex-
panded by studies indicating accessibility of critical pre-seizure changes from
non-invasive EEG recordings 65,71. Nonetheless, in order to achieve an un-
equivocal definition of a pre-seizure state from either invasive or non-invasive
recordings, a variety of influencing factors have to be evaluated. Most studies
carried out so far have concentrated on EEG recordings just prior to seizures.
Other studies 33,48,66,55,47,28, however, have shown that there are phases of dy-
namical changes even during the seizure-free interval pointing to abnormalities
that are not followed by a seizure. Moreover, pathologically or physiologically
induced dynamical interactions within the brain are not yet fully understood.
Among others, these include different sleep stages, different cognitive states,
as well as daily activities that clearly vary from patient to patient. In order
to evaluate specificity of possible seizure anticipation techniques, analyses of
long-lasting multi-channel EEG recordings covering different pathological and
physiological states are therefore mandatory 67,34,28.
Along with these studies, EEG analysis techniques have to be further
improved. New techniques are needed that allow a better characterization
of non-stationarity and high-dimensionality in brain dynamics, techniques
disentangling even subtle dynamical interactions between pathological dis-
turbances and surrounding brain tissue as well as refined artifact detection
and elimination. Since the methods currently available allow a differentiated
characterization of the epileptogenic process, the combined use of these tech-
niques along with appropriate classification schemes 72,73,74 can be regarded as a
promising venture.

Once given an improved sensitivity and specificity of EEG analysis tech-


niques for both focus localization and seizure anticipation, broader clinical
applications on a larger population of patients, either at home or in a clin-
ical setting, can be envisaged. As a future perspective, one might also take
into consideration implantable seizure anticipation and prevention devices
similar to those already in use with Parkinsonian patients 75,76. Although
optimization of algorithms underlying the computation of specific nonlinear
measures 77,69 already allows the temporal behavior of nonlinear measures to be
tracked continuously in real time, these applications still require the use of
powerful computer systems, depending on the number of recording channels
necessary to allow unequivocal characterization of the epileptogenic process.
Thus, further optimization and development of a miniaturized analyzing sys-
tem are definitely necessary. Taking into account the technologies currently
available, realization of such systems can be expected within the next years.

Acknowledgments

We gratefully acknowledge discussions with and contributions by Jochen Arn-


hold, Wieland Burr, Guillén Fernández, Peter Grassberger, Thomas Grun-
wald, Peter Hänggi, Christoph Helmstaedter, Martin Kurthen, Hans-Rudi
Moser, Thomas Schreiber, Bruno Weber, Jochen Wegner, Guido Widman and
Heinz-Gregor Wieser. This work was supported by the Deutsche Forschungs-
gemeinschaft.

References

1. E. S. Goldensohn and D. P. Purpura, Science 139, 840 (1963).


2. H. Matsumoto and C. Ajmone-Marsan, Exp. Neurol. 9, 286 (1964).
3. H. Matsumoto and C. Ajmone-Marsan, Exp. Neurol. 9, 305 (1964).
4. R. D. Traub and R. K. Wong, Science 216, 745 (1982).
5. A. R. Wyler and A. A. Ward, in Epilepsy, a window to brain mechanisms,
eds. J. S. Lockard and A. A. Ward (Raven Press, New York, 1992).
6. E. R. G. Sanabria, H. Su and Y. Yaari, J. Physiol. 532, 205 (2001).
7. C. E. Elger, Curr. Opin. Neurol. 14, 185 (2001).
8. J. Engel Jr. and T. A. Pedley, Epilepsy: a comprehensive text-book
(Philadelphia, Lippincott-Raven, 1997).
9. C. E. Elger, K. Lehnertz and G. Widman in Epilepsy: Problem solving
in clinical practice, eds. D. Schmidt and S. C. Schacter (Martin Dunitz
Publishers, London, 1999).

10. F. H. Lopes da Silva in Electroencephalography, eds. E. Niedermeyer and


F. H. Lopes da Silva (Williams & Wilkins, Baltimore, 1993).
11. P. J. Franaszczuk and G. K. Bergey, Biol. Cybern. 81, 3 (1999).
12. S. J. Schiff et al., Electroencephalogr. clin. Neurophysiol. 91, 442 (1994).
13. R. R. Coifman and M. V. Wickerhauser, Electroencephalogr. clin. Neu-
rophysiol. (Suppl.) 45, 57 (1996).
14. A. Effern et al, Physica D 140, 257 (2000).
15. H. G. Schuster, Deterministic chaos: an introduction (VCH Verlag, Basel,
Cambridge, New York, 1989).
16. E. Ott, Chaos in dynamical systems (Cambridge University Press, Cam-
bridge, UK, 1993).
17. H. Kantz and T. Schreiber, Nonlinear time series analysis (Cambridge
University Press, Cambridge, UK, 1997).
18. E. Ba§ar, Chaos in Brain Function (Springer, Berlin, 1990).
19. D. Duke and W. Pritchard, Measuring chaos in the human brain (World
Scientific, Singapore, 1991).
20. B. H. Jansen and M. E. Brandt, Nonlinear dynamical analysis of the EEG
(World Scientific, Singapore, 1993).
21. K. Lehnertz, J. Arnhold, P. Grassberger and C. E. Elger, Chaos in brain?
(World Scientific, Singapore, 2000).
22. A. Babloyantz and A. Destexhe, Proc. Natl. Acad. Sci. USA 83, 3513
(1986).
23. G. W. Frank et al., Physica D 46, 427 (1990).
24. J. Theiler, Phys. Lett. A 196, 335 (1995).
25. J. Theiler and P. E. Rapp, Electroencephalogr. clin. Neurophysiol. 98,
213 (1996).
26. T. Schreiber, in Chaos in brain?, eds. K. Lehnertz, J. Arnhold, P. Grassberger and C. E. Elger (World Scientific, Singapore, 2000).
27. F. H. Lopes da Silva et al., in Chaos in brain?, eds. K. Lehnertz, J. Arnhold, P. Grassberger and C. E. Elger (World Scientific, Singapore, 2000).
28. B. Litt et al., Neuron 30, 51 (2001).
29. K. Lehnertz et al., J. Clin. Neurophysiol. 18, 209 (2001).
30. M. Le Van Quyen et al., J. Clin. Neurophysiol. 18, 191 (2001).
31. R. Savit et al., J. Clin. Neurophysiol. 18, 246 (2001).
32. P. Grassberger, T. Schreiber and C. Schaffrath, Int. J. Bifurcation Chaos
1, 521 (1991).
33. K. Lehnertz and C. E. Elger, Electroencephalogr. clin. Neurophysiol.
95, 108 (1995).
34. K. Lehnertz and C. E. Elger, Phys. Rev. Lett. 80, 5019 (1998).
35. M. T. Rosenstein, J. J. Collins and C. J. de Luca, Physica D 65, 117 (1994).
36. H. Kantz, Phys. Lett. A 185, 77 (1994).
37. J. Wegner, Diploma thesis, University of Bonn (1998).
38. R. Quian Quiroga et al., Phys. Rev. E 62, 8380 (2000).
39. A. S. Weigend and N. A. Gershenfeld, Time Series Prediction: Forecast-
ing the Future and Understanding the Past (Addison-Wesley, Reading,
1993).
40. R. G. Andrzejak et al., Phys. Rev. E 64, 061907 (2001).
41. T. Kreuz, Diploma thesis, University of Bonn (2000).
42. B. L. Hao, Elementary Symbolic Dynamics and Chaos in Dissipative Sys-
tems (World Scientific, Singapore, 1989).
43. D. T. Kaplan and L. Glass, Phys. Rev. Lett. 68, 427 (1992).
44. T. Schreiber and A. Schmitz, Phys. Rev. Lett. 77, 635 (1996).
45. R. G. Andrzejak, Diploma thesis, University of Bonn (1997).
46. R. G. Andrzejak et al., in Chaos in brain?, eds. K. Lehnertz, J. Arnhold, P. Grassberger and C. E. Elger (World Scientific, Singapore, 2000).
47. R. G. Andrzejak et al., Epilepsy Res. 44, 129 (2001).
48. J. Arnhold et al., Physica D 134, 419 (1999).
49. J. Arnhold, Publication Series of the John von Neumann Institute for Computing, Forschungszentrum Jülich, Vol. 4 (2000).
50. N.F. Rulkov et al., Phys. Rev. E 51, 980 (1995).
51. M. G. Rosenblum et al., Phys. Rev. Lett. 76, 1804 (1996)
52. P. Tass et al., Phys. Rev. Lett. 81, 3291 (1998).
53. K. V. Mardia, Probability and Mathematical Statistics: Statistics of Directional Data (Academic Press, London, 1972).
54. F. Mormann, Diploma thesis, University of Bonn (1998).
55. F. Mormann et al., Physica D 144, 358 (2000).
56. C. E. Elger et al., in Neocortical epilepsies, eds. P. D. Williamson, A. M. Siegel, D. W. Roberts, V. M. Thadani and M. S. Gazzaniga (Lippincott, Williams & Wilkins, Philadelphia, 2000).
57. C. E. Elger et al., Epilepsia 41 (Suppl. 3), S34 (2000).
58. S. S. Viglione and G. O. Walsh, Electroencephalogr. clin. Neurophysiol.
39, 435 (1975).
59. Z. Rogowski, I. Gath and E. Bental, Biol. Cybern. 42, 9 (1981).
60. J. Gotman et al., Epilepsia 23, 432 (1982).
61. H. H. Lange et al., Electroencephalogr. clin. Neurophysiol. 56, 543
(1983).
62. A. Katz et al., Electroencephalogr. clin. Neurophysiol. 79, 153 (1991).
63. L. D. Iasemidis et al., Brain Topogr. 2, 187 (1990).
64. C. E. Elger and K. Lehnertz, in Epileptic Seizures and Syndromes, ed. P. Wolf (J. Libbey & Co, London, 1994).
65. L. D. Iasemidis et al., in Spatiotemporal Models in Biological and Artificial Systems, eds. F. H. Lopes da Silva, J. C. Principe and L. B. Almeida (IOS Press, Amsterdam, 1997).
66. M. Le Van Quyen et al., Physica D 127, 250 (1999).
67. C. E. Elger and K. Lehnertz, Eur. J. Neurosci. 10, 786 (1998).
68. J. Martinerie et al., Nat. Med. 4, 1173 (1998).
69. M. Le Van Quyen et al., Neuroreport 10, 2149 (1999).
70. H. R. Moser et al., Physica D 130, 291 (1999).
71. M. Le Van Quyen et al., Lancet 357, 183 (2001).
72. Y. Salant, I. Gath and O. Henriksen, Med. Biol. Eng. Comput. 36, 549 (1998).
73. R. Tetzlaff et al., IEEE Proc. Eur. Conf. Circuit Theory Design, 573 (1999).
74. A. Petrosian et al., Neurocomputing 30, 201 (2000).
75. A. L. Benabid et al., Lancet 337, 403 (1991).
76. P. Tass, Biol. Cybern. 85, 343 (2001).
77. G. Widman et al., Physica D 121, 65 (1998).
STOCHASTIC APPROACHES TO MODELING OF PHYSIOLOGICAL RHYTHMS

PLAMEN CH. IVANOV
Center for Polymer Studies and Department of Physics,
Boston University, Boston, MA 02215
Cardiovascular Division, Beth Israel Deaconess Medical Center,
Harvard Medical School, Boston, MA 02215, USA
E-mail: plamen@argento.bu.edu

CHUNG-CHUAN LO
Center for Polymer Studies and Department of Physics,
Boston University, Boston, MA 02215, USA
E-mail: cclo@argento.bu.edu

The scientific question we address is how physiological rhythms spontaneously self-regulate. It is fairly widely believed nowadays that deterministic mechanisms, including perhaps chaos, offer a promising avenue to pursue in answering this question. Complementary to these deterministic foundations, we propose an approach which treats physiological rhythms as fundamentally governed by several random processes, each of which biases the rhythm in different ways. We call this approach stochastic feedback, since it leads naturally to feedback mechanisms that are based on randomness. To illustrate our approach, we treat in some detail the regulation of heart rhythms and sleep-wake transitions during sleep — two classic "unsolved" problems in physiology. We present coherent, physiologically based models and show that a generic process based on the concepts of biased random walk and stochastic feedback can account for a combination of independent scaling characteristics observed in data.

1 Modeling scaling features in heartbeat dynamics

1.1 Introduction

The fundamental principle of homeostasis asserts that physiological systems seek to maintain a constant output after perturbation 1,2,3,4. Recent evidence, however, indicates that healthy systems even at rest display highly irregular dynamics 5,6,7,8,9,10. Here, we address the question of how to reconcile homeostatic control and complex variability. We propose a general approach based on the concept of "stochastic feedback" and illustrate this approach by considering the neuroautonomic regulation of the heart rate. Our results suggest that in healthy systems the control mechanisms operate to drive the system away from extreme values while not allowing it to settle down to a constant (homeostatic) output. The model generates complex dynamics and
successfully accounts for key characteristics of the cardiac variability not fully explained by traditional models: (i) the 1/f power spectrum, (ii) the stable scaling form for the distribution of the variations in the beat-to-beat intervals and (iii) Fourier phase correlations 11,12,13,14,15,16,17. Furthermore, the reported scaling properties arise over a broad zone of parameter values rather than at a sharply-defined "critical" point.

1.2 Random walks and feedback mechanisms


The concept of dynamic equilibrium or "homeostasis" 1,2,3 led to the proposal that physiological variables, such as the cardiac interbeat interval τ(n), where n is the beat number, maintain an approximately constant value in spite of continual perturbations. Thus one can write in general

    τ(n) = τ₀ + η,   (1)

where τ₀ is the "preferred level" for the interbeat interval and η is white noise with strength σ, defined as the standard deviation of η.
We first re-state this problem in the language of random walks. The time evolution of an uncorrelated and unbiased random walk is expressed by the equation τ(n+1) − τ(n) = η. At every step the walker has equal probability to move "up" or "down." The deviation from the initial level increases as n^(1/2) 18, so an uncorrelated and unbiased random walk does not preserve homeostasis (Fig. 1a). To maintain a constant level, there must be a bias in the random walk 19,

    τ(n+1) − τ(n) = I(n),   (2)

with

    I(n) =  +w (1+η)   if τ(n) < τ₀,
            −w (1+η)   if τ(n) > τ₀.   (3)

The weight w is the strength of the feedback input biasing the walker to return to its preferred level τ₀. When away from the attraction level τ₀, the walker has a higher probability of moving towards the attraction level. This behavior represents Cannon's idea of homeostasis (dynamical equilibrium), where a system maintains constancy even when perturbed by external stimuli. Note that Eqs. (2) and (3) generate dynamics similar to Eq. (1) but through a nonlinear feedback mechanism. The dynamics generated by these rules correspond to a system with time-independent feedback.
As expected in this case, for short time scales (high frequencies), the power spectrum scales as 1/f² (Brownian noise) with a crossover to white noise at longer time scales due to the attraction to level τ₀ (Fig. 1b). Note the shift
Figure 1. Schematic representation of the dynamics of the model. (a) Evolution of a random walk starting from initial position τ₀. The deviation of the walk from level τ₀ increases as n^(1/2), where n is the number of steps. The power spectrum of the random walk scales as 1/f² (Brownian noise). The distribution P(A) of the amplitudes A of the variations in the interbeat intervals follows a Rayleigh distribution. Here the amplitudes are obtained by: (i) wavelet transform of the random walk, which filters out trends and extracts the variations at a time scale a; (ii) calculation of the amplitudes of the variations via Hilbert transform. (b) Random walk with a bias toward τ₀. (c) Random walk with two stochastic feedback controls. In contrast to (b), the levels of attraction τ₀ and τ₁ change values in time. Each level persists for a time interval T_i drawn from a distribution with an average value T_lock. Each time the level changes, its new value is drawn from a uniform distribution. Perturbed by changing external stimuli, the system nevertheless remains within the bounds defined by Δτ even after many steps. We find that such a dynamical mechanism based on a single characteristic time scale T_lock generates a 1/f power spectrum over several decades. Moreover, P(A) decays exponentially, which we attribute to nonlinear Fourier phase interactions in the walk.

of the crossover to longer time scales (lower frequencies) when stronger noise
is present. For weak noise the walker never leaves the close vicinity of the
attraction level, while for stronger noise, larger drifts can occur leading to
longer trends and longer time scales. However, in both cases, P(A) follows
the Rayleigh distribution because the wavelet transform filters out the drifts
and trends in the random walk (Fig. 1b). For intermediate values of the noise
there is a deviation from the Rayleigh distribution and the appearance of an
exponential tail.
We find that Eqs. (2) and (3) do not reproduce the statistical properties of the empirical data (Fig. 1b). We therefore generalize them to include several inputs I_k (k = 0, 1, ..., m), with different preferred levels τ_k, which compete in biasing the walker:

    τ(n+1) − τ(n) = Σ_{k=0}^{m} I_k(n),   (4)

where

    I_k(n) =  +w_k (1+η)   if τ(n) < τ_k,
              −w_k (1+η)   if τ(n) > τ_k.   (5)
From a biological or physiological point of view, it is clear that the preferred levels τ_k of the inputs I_k cannot remain constant in time, for otherwise the system would not be able to respond to varying external stimuli. We assume that each preferred interval τ_k is a random function of time, with values correlated over a time scale T_lock. We next coarse grain the system and choose τ_k(n) to be a random step-like function constrained to have values within a certain interval and with the length of the steps drawn from a distribution with an average value T_lock (Fig. 1c). This model yields several interesting features, including a 1/f power spectrum, scaling of the distribution of variations, and correlations in the Fourier phases.

1.3 Neuroautonomic regulation of heartbeat dynamics


To illustrate the approach for the specific example of neuroautonomic control of cardiac dynamics, we first note that the healthy heart rate is determined by three major inputs: (i) the sinoatrial (SA) node; (ii) the parasympathetic (PS) and (iii) the sympathetic (SS) branches of the autonomic nervous system.
(i) The SA node or pacemaker is responsible for the initiation of each heart beat 20; in the absence of other external stimuli, it is able to maintain a constant interbeat interval 2. Experiments in which PS and SS inputs are blocked reveal that the interbeat intervals are very regular and average only 0.6 s 20. The input from the SA node, I_SA, thus biases the interbeat interval τ toward its intrinsic level τ_SA (see Fig. 1b).

Figure 2. Stochastic feedback regulation of the cardiac rhythm. We compare the predictions of the model with the healthy heart rate. Sequences of interbeat intervals τ from (a) a healthy individual and (b) from simulation exhibit an apparent visual similarity. (c) Power spectra of the interbeat intervals τ(n) from the data and the model. To first approximation, these power spectra can be described by the relation S(f) ~ 1/f^β. The presence of patches in both heart and model signals leads to observable crossovers embedded on this 1/f behavior at different time scales. We calculated the local exponent β from the power spectrum of 24 h records (≈ 10⁵ beats) for 20 healthy subjects and found that the local value of β shows a persistent drift, so no true scaling exists. (This is not surprising, due to the non-stationarity of the signals.) (d) Power spectra of the increments in τ(n). The model and the data both scale as power laws with exponents close to one. Since the non-stationarity is reduced, crossovers are no longer present. We also calculated the local exponent for the power spectrum of the increments for the same group of 20 healthy subjects as in the top curve, and found that this exponent fluctuates around an average value close to one, so true scaling does exist.

(ii) The PS fibers conduct impulses that slow the heart rate. Suppression of SS stimuli, while under PS regulation, can result in the increase of the interbeat interval to as much as 1.5 s 20,21. The activity of the PS system changes with external stimuli. We model these features of the PS input, I_PS, by the following conditions: (1) a preferred interval, τ_PS(n), randomly chosen from a uniform distribution with an average value larger than τ_SA, and (2) a correlation time, T_PS, during which τ_PS does not change, where T_PS is drawn
from a distribution with an average value T_lock.


(iii) The SS fibers conduct impulses that speed up the heart beat. Abolition of parasympathetic influences when the sympathetic system remains active can decrease the interbeat intervals to less than 0.3 s 20. There are several centers of sympathetic activity highly sensitive to environmental influences 21. We represent each of the N sympathetic inputs by I_SS^j (j = 1, ..., N). We attribute to I_SS^j the following characteristics: (1) a preferred interbeat interval τ_SS^j(n) randomly chosen from a uniform distribution with an average value smaller than τ_SA, and (2) a correlation time T_j in which τ_SS^j(n) does not change; T_j is drawn from a distribution with an average value T_lock which is the same for all N inputs (and the same as for the PS system), so T_lock is the characteristic time scale of both the PS and SS inputs.
The characteristics of the PS and SS inputs correspond to a random walk with stochastic feedback control (Fig. 1c). Thus, for the present example of cardiac neuroautonomic control, we have N + 2 inputs and Eq. (4) becomes:

    τ(n+1) − τ(n) = I_SA(n) + I_PS(n, τ_PS(n)) + Σ_{j=1}^{N} I_SS^j(n, τ_SS^j(n)),   (6)

where the structure of each input is identical to the one in Eq. (5). Equation (6) cannot fully reflect the complexity of the human cardiac system. However, it provides a general framework that can easily be extended to include other physiological systems (such as breathing, baroreflex control, different locking times for the inputs of the SS and PS systems 5,22, etc.). We find that Eq. (6) captures the essential ingredients responsible for a number of important statistical and scaling properties of the healthy heart rate.
Next we generate a realization of the model with parameters N = 7 and w_SA = w_SS = w_PS/3 = 0.01 s (Fig. 2b). We choose T_j randomly from an exponential distribution with average T_lock = 1000 beats. (We find that a different form of the distribution for T_j does not change the results.) The noise η is drawn from a symmetrical exponential distribution with zero average and standard deviation σ = 0.5. We define the preferred values of the interbeat intervals for the different inputs according to the following rules: (1) τ_SA = 0.6 s, (2) τ_PS is randomly selected from a uniform distribution in the interval [0.9, 1.5] s, and (3) the τ_SS^j's are randomly selected from a uniform distribution in the interval [0.2, 1.0] s. The actual values of the preferred interbeat intervals of the different inputs and the ratio between their weights are physiologically justified and are of no significance for the dynamics — they just set the range for the fluctuations of τ, chosen to correspond to the empirical data.
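To make the construction concrete, here is a hedged numerical sketch (ours, not the authors' code) of Eq. (6) with the parameter values just listed; in particular, the choice that all inputs share one noise realization per beat is our simplifying assumption, since the text does not specify this.

```python
import numpy as np

def simulate_heartbeat(n_beats=100_000, N=7, T_lock=1000, sigma=0.5, seed=0):
    """Sketch of Eq. (6): the SA input plus the PS input and N SS inputs,
    each biasing tau toward its own preferred level as in Eq. (5); the
    PS/SS preferred levels are step functions renewed on an exponential
    time scale T_lock, with ranges and weights as quoted in the text."""
    rng = np.random.default_rng(seed)
    w_SA = w_SS = 0.01                  # seconds
    w_PS = 3 * w_SA
    tau_SA = 0.6                        # fixed preferred level of the SA node

    def new_clock():                    # beats until a preferred level is renewed
        return int(rng.exponential(T_lock)) + 1

    # index 0 is the PS input, indices 1..N are the SS inputs
    ranges = [(0.9, 1.5)] + [(0.2, 1.0)] * N
    weights = [w_PS] + [w_SS] * N
    levels = [rng.uniform(lo, hi) for lo, hi in ranges]
    clocks = [new_clock() for _ in range(N + 1)]

    tau = np.empty(n_beats)
    tau[0] = tau_SA
    for n in range(n_beats - 1):
        # symmetric exponential (Laplace) noise with standard deviation sigma,
        # shared by all inputs at this beat (our assumption)
        eta = rng.laplace(0.0, sigma / np.sqrt(2))
        step = w_SA * (1.0 + eta) * np.sign(tau_SA - tau[n])
        for k in range(N + 1):
            clocks[k] -= 1
            if clocks[k] == 0:          # renew this preferred level
                levels[k] = rng.uniform(*ranges[k])
                clocks[k] = new_clock()
            step += weights[k] * (1.0 + eta) * np.sign(levels[k] - tau[n])
        tau[n + 1] = tau[n] + step
    return tau
```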
Figure 3. (top) Effect of the correlation time T_lock on the scaling of the power spectrum of τ(n) for a signal comprising 10⁶ beats. (bottom) Schematic diagram illustrating the origin of the different scaling regimes in the power spectrum of τ(n).

1.4 Experimental findings and results of simulations


To qualitatively test the model, we first compare the time series generated by the stochastic feedback model and the healthy heart 23 and find that both signals display complex variability and patchiness (Fig. 2a,b). To quantitatively
(a) We first test for long-range power-law correlations in the interbeat
intervals, which exist for healthy heart dynamics 24 . These correlations can
be uncovered by calculating power spectra, and we see (Fig. 2) that the model
simulations correctly reproduce the power-law correlations observed in data
over several decades. In particular, we note that the non-stationarity of both
the data and model signals leads to the existence of several distinct scaling
regimes in the power spectrum of τ(n) (Figs. 2c and 3). We find that with increasing T_lock, the power spectrum does not follow a single power law but actually crosses over from a behavior of the type 1/f² at very small time scales (or high frequencies), to a behavior of the type 1/f⁰ for intermediate time scales, followed by a new regime with 1/f² for larger time scales (Fig. 3). At very large time scales, another regime appears with a flat power spectrum. In the language of random walkers, τ is determined by the competition of different neuroautonomic inputs. For very short time scales, the noise will dominate, leading to a simple random walk behavior and 1/f² scaling (regime A in Fig. 3, bottom). For time scales longer than T_A, the deterministic attraction towards the "average preferred level" of all inputs will dominate, leading to a flat power spectrum (regime B in Fig. 3, bottom; see also Fig. 1b). However, after a time T_B (of the order of T_lock/N), the preferred level of one of the inputs will have changed, leading to the random drift of the average preferred level and the consequent drift of the walker towards it. So, at these time scales, the system can again be described as a simple random walker and we expect a power spectrum of the type 1/f² (regime C in Fig. 3, bottom). Finally, for time scales larger than T_C, the walker will start to feel the presence of the bounds on the fluctuations of the preferred levels of the inputs. Thus, the power spectrum will again become flat (regime D). Since the crossovers are not sharp in the data or in the numerical simulations, they can easily be misinterpreted as a single power law scaling with an exponent β ≈ 1. By reducing the strength of the noise, we decrease the size of regime A and extend regime B into higher frequencies. In the limit σ → 0, the power spectrum of τ(n), which would coincide with the power spectrum of the "average preferred level", would have only regimes B, C and D. The stochastic feedback mechanism thus enables us to explain the formation of regions (patches) in the time series with different characteristics.
(b) By studying the power spectrum of the increments we are able to
circumvent the effects of the non-stationarity. Our results show that true
scaling behavior is indeed observed for the power spectrum of the increments,
both for the data and for the model (Fig. 2).
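For instance, the two spectra can be estimated with Welch's periodogram average; this is our illustrative sketch, reusing the hypothetical `simulate_heartbeat` defined above, and the fitting band is an arbitrary choice.

```python
import numpy as np
from scipy.signal import welch

tau = simulate_heartbeat()                      # model series from the sketch above
f_tau, S_tau = welch(tau - tau.mean(), nperseg=4096)
f_inc, S_inc = welch(np.diff(tau), nperseg=4096)

# slope of log S versus log f over an intermediate band estimates the exponent
band = (f_inc > 1e-3) & (f_inc < 1e-1)
beta_inc = -np.polyfit(np.log(f_inc[band]), np.log(S_inc[band]), 1)[0]
print(f"increment spectrum exponent ~ {beta_inc:.2f}")
```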
(c) We calculate the probability density P(A) of the amplitudes A of the
variations of interbeat intervals through the wavelet transform. It has been
shown that the analysis of sequences of interbeat intervals with the wavelet
transform 25 can reveal important scaling properties 26 for the distributions of
the variations in complex nonstationary signals. In agreement with the results
of Ref. 27, we find that the distribution P(A) of the amplitudes A of inter-
beat interval variations for the model decays exponentially—as is observed
for healthy heart dynamics (Fig. 4). We hypothesize that this decay arises
Figure 4. Analysis of the amplitudes A of variations in τ(n). We apply to the signal generated by the model the wavelet transform with fixed scale a, then use the Hilbert transform to calculate the amplitude A. The top left panel shows the normalized histogram P(A) for the data (6 h daytime) and for the model (with the same parameter values as in Fig. 2), and for wavelet scale a = 8 beats, i.e., ≈ 40 s. (Derivatives of the Gaussian are used as a wavelet function.) We test the generated signal for nonlinearity and Fourier phase correlations, creating a surrogate signal by randomizing the Fourier phases of the generated signal but preserving the power spectrum (thus leaving the results of Fig. 2 unchanged). The histogram of the amplitudes of variations for the surrogate signal follows the Rayleigh distribution, as expected theoretically (see inset). Thus the observed distribution, which is universal for healthy cardiac dynamics and reproduced by the model, reflects the Fourier phase interactions. The top right panel shows a similar plot for data collected during sleep and for the model with N < w_PS/w_SS. We note that the distribution is broader for the amplitudes of heartbeat interval variations during sleep compared to wake activity, indicating, counterintuitively, a higher probability for large variations, with large values deviating from the exponential tail 28. Our model reproduces this behavior when the number of sympathetic inputs is reduced, in accordance with the physiological observations of decreased sympathetic tone during sleep 20. The bottom panel tests the stability of the analysis for the model at different time scales a. The distribution is stable over a wide range of time scales, identical to the range observed for heart data 27. The stability of the distributions indicates statistical self-similarity in the variations at different time scales.
from nonlinear Fourier phase interactions and is related to the underlying nonlinear dynamics. To test this hypothesis, we perform a parallel analysis on a surrogate time series obtained by preserving the power spectrum but randomizing the Fourier phases of a signal generated by the model (Fig. 4); P(A) now follows the Rayleigh distribution P(A) ~ A e^(−A²), since there are no Fourier phase correlations 29.
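The three steps of this analysis can be sketched as follows (our illustration: the kernel support and scale are arbitrary, and we use the first derivative of a Gaussian, whereas the caption of Fig. 4 says only that derivatives of the Gaussian are used).

```python
import numpy as np
from scipy.signal import hilbert

def amplitude_series(signal, scale=8):
    """Wavelet-filter the series at one fixed scale (in beats) with a
    derivative-of-Gaussian kernel, then take Hilbert amplitudes A."""
    t = np.arange(-4 * scale, 4 * scale + 1, dtype=float)
    psi = -t / scale**2 * np.exp(-t**2 / (2.0 * scale**2))  # d/dt of a Gaussian
    psi -= psi.mean()                    # enforce zero mean (admissibility)
    detail = np.convolve(signal, psi, mode="same")
    return np.abs(hilbert(detail))       # instantaneous amplitudes A

def phase_randomized(signal, seed=0):
    """Surrogate with the same power spectrum but randomized Fourier phases."""
    rng = np.random.default_rng(seed)
    spec = np.fft.rfft(signal)
    phases = rng.uniform(0.0, 2.0 * np.pi, len(spec))
    phases[0] = 0.0                      # keep the mean (DC term) real
    return np.fft.irfft(np.abs(spec) * np.exp(1j * phases), n=len(signal))
```

Comparing a histogram of `amplitude_series(x)` with that of `amplitude_series(phase_randomized(x))` reproduces the exponential-versus-Rayleigh contrast described above.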
(d) For the distribution displayed in Fig. 4, we test the stability of the scaling form at different time scales; we find that P(A) for the model displays a scaling form stable over a range of time scales identical to the range for the data (Fig. 4) 27. Such time scale invariance indicates statistical self-similarity 30.
A notable feature of the present model is that in addition to the power spectra, it accounts for the form and scaling properties of P(A), which are independent of the power spectra 31. No similar tests for nonlinear dynamics have been reported for other models 12,13,14. Further work is needed to account for the recently reported long-range correlations in the magnitude of interbeat interval increments 32, the multifractal spectrum of heart rate fluctuations 33 and the power-law distribution of segments in heart rate recordings with different local mean values 34.
The model has a number of parameters, whose values may vary from one individual to another, so we next study the sensitivity of our results to variations in these parameters. We find that the model is robust to parameter changes. The value of T_lock and the strength of the noise σ are crucial to generate dynamics with scaling properties similar to those found for empirical data. We find that the model reproduces key features of the healthy heart dynamics for a wide range of time scales (500 < T_lock < 2000) and noise strengths (0.4 < σ < 0.6). The model is consistent with the existence of an extended "zone" in parameter space where scaling behavior holds, and our picture is supported by the variability in the parameters for healthy individuals for which similar scaling properties are observed.

1.5 Conclusions
Scaling behavior for physical systems is generally obtained for fixed values of
the parameters, corresponding to a critical point or phase transition 35. Such
fixed values seem unlikely in biological systems exhibiting power law scaling.
Moreover, such critical point behavior would imply perfect identity among
individuals; our results are more consistent with the robust nature of healthy
systems which appear to be able to maintain their complex dynamics over
a wide range of parameter values, accounting for the adaptability of healthy
systems.
The model we review here, and the data which it fits, support a revised
view of homeostasis that takes into account the fact that healthy systems
under basal conditions, while being continuously driven away from extreme
values, do not settle down to a constant output. Rather, a more realistic
picture may involve nonlinear stochastic feedback mechanisms driving the
system.

2 Modeling dynamics of sleep-wake transitions

2.1 Introduction

In this Section we investigate the dynamics of the awakening during the night
for healthy subjects and find that the wake and the sleep periods exhibit
completely different behavior: the durations of wake periods are characterized
by a scale-free power-law distribution, while the durations of sleep periods
have an exponential distribution with a characteristic time scale. We find
that the characteristic time scale of sleep periods changes throughout the
night. In contrast, there is no measurable variation in the power-law behavior
for the durations of wake periods. We develop a stochastic model, based
on biased random walk approach, which agrees with the data and suggests
that the difference in the dynamics of sleep and wake states arises from the
constraints on the number of microstates in the sleep-wake system.
In clinical sleep centers, the "total sleep time" and the "total wake time"
during the night are used to evaluate sleep efficacy and to diagnose sleep dis-
orders. However, the total wake time during a longer period of nocturnal sleep
is actually comprised of many short wake intervals (Fig. 5). This fact suggests
that the "total wake time" during sleep is not sufficient to characterize the
complex sleep-wake transitions and that it is important to ask how periods
of the wake state distribute during the course of the night. Although recent
studies have focused on sleep control at the neuronal level 36,37,38,39, very little is known about the dynamical mechanisms responsible for the time structure or even the statistics of the abrupt sleep-wake transitions during the night. Furthermore, different scaling behavior between sleep and wake activity and between different sleep stages has been observed 40,41. Hence, investigating
the statistical properties of the wake and sleep states throughout the night
may provide not only a more informative measure but also insight into the
mechanisms of the sleep-wake transition.
Figure 5. The textbook picture 43 of sleep-stage transitions describes a quasi-cyclical process, with a period of ≈ 90 min, where the wake stage is followed by light sleep and then by deep sleep, with transition back to light sleep, and then to rapid-eye-movement (REM) sleep—or perhaps to the wake stage. Sleep-wake transitions during nocturnal sleep: (a) Representative example of sleep-stage transitions from a healthy subject. Data were recorded in a sleep laboratory according to the Rechtschaffen and Kales criteria 52: two channels of electroencephalography (EEG), two channels of electrooculography (EOG) and one channel of submental electromyography (EMG) were recorded. Signals were digitized at 100 Hz and 12 bit resolution, and visually scored by sleep experts in segments of 30 seconds for sleep stages: wakefulness, rapid-eye-movement (REM) sleep and non-REM sleep stages 1, 2, 3 and 4. (b) Magnification of the shaded region in (a). (c) In order to study sleep-wake transitions, we reduce the five stages to a single sleep state by grouping rapid-eye-movement (REM) sleep and sleep stages 1 to 4 into a single sleep state.

2.2 Empirical analysis

We analyze 39 full-night sleep records collected from 20 healthy subjects (11 females and 9 males, ages 23-57, with average sleep duration 7.0 hours). We first study the distribution of durations of the sleep and of the wake states during the night (Fig. 5). We calculate the cumulative distribution of
durations, defined as

    P(t) = ∫_t^∞ p(t′) dt′,   (7)

where p(t′) is the probability density function of durations between t′ and t′ + dt′. We analyze P(t) of the wake state, and we find that the data follow a power-law distribution,

    P(t) ~ t^(−α).   (8)

We calculate the exponent α for each of the 20 subjects, and find an average exponent α = 1.3 with a standard deviation σ = 0.4.
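A minimal sketch (our addition) of how the cumulative distribution and the exponent can be obtained from a list of wake durations; the lower cutoff t_min is an illustrative choice.

```python
import numpy as np

def cumulative_distribution(durations):
    """Empirical P(t) = Prob(duration >= t), cf. Eq. (7)."""
    t = np.sort(np.asarray(durations, dtype=float))
    P = 1.0 - np.arange(len(t)) / len(t)
    return t, P

def powerlaw_exponent(t, P, t_min=1.0):
    """Least-squares slope of log P versus log t: alpha in P(t) ~ t^(-alpha)."""
    m = t >= t_min
    return -np.polyfit(np.log(t[m]), np.log(P[m]), 1)[0]
```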
It is important to verify that the data from individual records correspond
to the same probability distribution. To this end, we apply the Kolmogorov-
Smirnov test to the data from individual records. We find that we cannot
reject the null hypothesis that p(t) of the wake state of each subject is drawn
from the same distribution, suggesting that one can pool all data together to
improve statistics without changing the distribution (Fig. 6a). Pooling the
data from all 39 records, we find that P(t) of the wake state is consistent with
a power-law distribution with an exponent α = 1.3 ± 0.1 (Fig. 7a).
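A simplified reading of this pooling test, sketched with SciPy's two-sample Kolmogorov-Smirnov test (our illustration; the exact acceptance criterion used by the authors is not spelled out here):

```python
from scipy.stats import ks_2samp

def can_pool(individual_records, pooled, level=0.05):
    """Two-sample K-S test of each subject's durations against the pooled
    sample; pooling is deemed acceptable if only a small fraction of the
    records rejects the null hypothesis at the given significance level."""
    pvals = [ks_2samp(rec, pooled).pvalue for rec in individual_records]
    frac_rejected = sum(p < level for p in pvals) / len(pvals)
    return frac_rejected < 0.10, pvals
```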
In order to verify that the distribution of durations of the wake state is better described by a power law rather than by an exponential or a stretched exponential functional form, we fit these curves to the distributions from pooled data. Using the Levenberg-Marquardt method, we find that both the exponential and the stretched exponential form lead to a worse fit. The χ² errors of the power-law fit, exponential fit and stretched exponential fit are 3 × 10⁻⁵, 1.6 × 10⁻³ and 3.5 × 10⁻³, respectively. We also check the results by plotting (i) log P(t) versus t and (ii) log(|log P(t)|) versus log t (see note a) and find in both cases that the data are clearly more curved than when we plot log P(t) versus log t, indicating that a power law provides the best description of the data (see note b).
We perform a similar analysis for the sleep state and find, in contrast to the result for the wake state, that the data in the large-time region (t > 5 min) exhibit exponential behavior

    P(t) ~ e^(−t/τ).   (9)

"For the stretched exponential y = aexp(—bxc), where a, b and c are constants, the
log(| log2/|) versus logx plot is not a straight line unless a = 1. Since we don't know what
the corresponding value of a is in our data, we can not rescale y so that a — I. The solution
is to shift x for a certain value to make y = 1 when x — 0, in which case a = 1. In our data,
P(t) = 1 when t = 0.5, so we shift t by —0.5 before plotting log(| logP(t)|) versus logt.
b
According Eq. 7, if P(t) is a power-law function, so is p(t). We also separately check
the functional form of p(t) for the data with same procedure and find that the power law
provides the best description of the data.
Figure 6. Cumulative probability distribution P(t) of sleep and wake durations of individual and pooled data. Double-logarithmic plot of P(t) of wake durations (a) and semi-logarithmic plot of P(t) of sleep durations (b) for pooled data and for data from one typical subject. P(t) for three typical subjects is shown in the insets. Note that due to the limited number of sleep-wake periods for each subject, it is difficult to determine the functional form for individual subjects. We perform the K-S test and compare the probability density p(t) for all individual data sets and pooled data for both wake and sleep periods. For both sleep and wake, less than 10% of the individual data sets fall below the 0.05 significance level, so the null hypothesis that p(t) for each individual subject is drawn from the same distribution cannot be rejected. The K-S statistics improve significantly if we use recordings only from the second night. Therefore, pooling all data improves the statistics while preserving the form of p(t).

We calculate the time constants τ for the 20 subjects, and find an average τ = 20 min with σ = 5 min. Using the Kolmogorov-Smirnov test, we find that we cannot reject the null hypothesis that p(t) of the sleep state of each subject of our 39 data sets is drawn from the same distribution (Fig. 6b). We further find that P(t) of the sleep state for the pooled data is consistent with an exponential distribution with a characteristic time τ = 22 ± 1 min (Fig. 7b).
In order to verify that P(t) of the sleep state is better described by an exponential functional form rather than by a stretched exponential functional form, we fit these curves to the P(t) from pooled data. Using the Levenberg-Marquardt method, we find that the stretched exponential form leads to a worse fit. The χ² errors of the exponential fit and the stretched exponential fit are 8 × 10⁻⁵ and 2.7 × 10⁻², respectively. We also check the results by plotting log(|log P(t)|) versus log t (see note a) and find that the data are clearly more curved than when we plot log P(t) versus t, indicating that an exponential form provides the best description of the data.
Sleep is not a "homogeneous process" throughout the course of the night 42,43,
Figure 7. Cumulative distribution of durations P(t) of sleep and wake states from data. (a) Double-logarithmic plot of P(t) from the pooled data. For the wake state, the distribution closely follows a straight line with a slope α = 1.3 ± 0.1, indicating power-law behavior of the form of Eq. (8). (b) Semi-logarithmic plot of P(t). For the sleep state, the distribution follows a straight line with a slope 1/τ, where τ = 22 ± 1 min, indicating exponential behavior of the form of Eq. (9). It has been reported that the individual sleep stages have exponential distributions of durations 53,54,55. Hence we expect an exponential distribution of durations for the sleep state.

so we ask if there is any change of α and τ during the night. We study
sleep and wake durations for the first two hours, middle two hours, and the
last two hours of nocturnal sleep using the pooled data from all 39 records
(Fig. 8). Our results suggest that α does not change for these three portions of the night, while τ decreases from 27 ± 1 min in the first two hours to 22 ± 1 min in the middle two hours, and then to 18 ± 1 min in the last two hours. The decrease in τ implies that the number of wake periods increases as the
night proceeds, and we indeed find that the average number of wake periods
for the last two hours is 1.4 times larger than for the first two hours.

2.3 Model

We next investigate mechanisms that may be able to generate the different behavior observed for sleep and wake. Although several quantitative models, such as the two-process model 44 and the thermoregulatory model 45, have been developed to describe human sleep regulation, detailed modeling of the frequent short awakenings during nocturnal sleep has not been addressed 46. To model the sleep-wake transitions, we make three assumptions (Fig. 9) 47:
Figure 8. P(t) of sleep and wake states in the first two hours, middle two hours and last two hours of sleep. (a) P(t) of wake states; the power-law exponent α does not change in a measurable way. (b) P(t) of sleep states; the characteristic time τ decreases in the course of the night.

Assumption 1 defines the key variable x(t) for sleep-wake dynamics.


Although we consider a two-state system, the brain as a neural system is unlikely to have only two discrete states. Hence, we assume that both the wake and sleep "macro" states comprise a large number of "microstates" which we map onto a continuous variable x(t), defined in such a way that positive values correspond to the wake state while negative values correspond to the sleep state. We further assume that there is a finite region −Δ < x < 0 for the sleep state.
Assumption 2 concerns the dynamics of the variable x(t). Recent studies 37,39 suggest that a small population of sleep-active neurons in a localized
region of the brain distributes inhibitory inputs to wake-promoting neuronal
populations, which in turn interact through a feedback on the sleep-active
neurons. Because of these complex interactions, the global state of the sys-
tem may present a "noisy" behavior. Accordingly, we assume that x(t) evolves
by a random-walk type of dynamics due to the competition between the sleep-
active and wake-promoting neurons.
Assumption 3 concerns a bias towards sleep. We assume that if x(t)
moves into the wake state, then there will be a "restoring force" pulling it
towards the sleep state. This assumption corresponds to the common expe-
rience that in wake periods during nocturnal sleep, one usually has a strong
tendency to quickly fall asleep again. Moreover, the longer one stays awake,
the more difficult it may be to fall back asleep, so we assume that the restoring
force becomes weaker as one moves away from the transition point x = 0. We
model these observations by assuming that the random walker moves in a logarithmic potential V(x) = b ln x, yielding a force f(x) = −dV(x)/dx = −b/x, where the bias b quantifies the strength of the force.
Assumptions 1-3 can be written compactly as:

    δx(t) = x(t+1) − x(t) =  ε(t)             if −Δ < x(t) < 0  (sleep),
                             −b/x(t) + ε(t)   if x(t) > 0       (wake),   (10)

where ε(t) is an uncorrelated Gaussian-distributed random variable with zero mean and unit standard deviation. In our model, the bias b and the threshold Δ may change during the course of the night due to physiological variations such as the circadian cycle 44,46.
In our model, the distribution of durations of the wake state is identical to the distribution of return times of a random walk in a logarithmic potential. For large times, this distribution is of a power-law form 48,49,50,51. Hence, for large times, the cumulative distribution of return times is also a power law, Eq. (8), and the exponent is predicted to be

    α = 1/2 + b.   (11)

From Eq. (11) it follows that the cumulative distribution of return times for a random walk without bias (b = 0) decreases as a power law with an exponent α = 1/2. Note that introducing a restoring force of the form f(x) = −b/x^γ with γ ≠ 1 yields stretched exponential distributions 51, so γ = 1 is the only case yielding a power-law distribution.
Similarly, the distribution of durations of the sleep state is identical to the distribution of return times of a random walk in a space with a reflecting boundary. Hence P(t) has an exponential distribution, Eq. (9), in the large-time region, with the characteristic time τ predicted to be

    τ ~ Δ².   (12)

Equations (11) and (12) indicate that the values of α and τ in the data can be reproduced in our model by "tuning" the threshold Δ and the bias b (Fig. 10). The decrease of the characteristic duration of the sleep state as the night proceeds is consistent with the possibility that Δ decreases (Fig. 9). Our calculations suggest that Δ decreases from 7.9 ± 0.2 in the first hours of sleep, to 6.6 ± 0.2 in the middle hours, and then to 5.5 ± 0.2 for the final hours of sleep. Accordingly, the number of wake periods of the model increases by a factor of 1.3 from the first two hours to the last two hours, consistent with the data. However, the apparent consistency of the power-law exponent for the wake state suggests that the bias b may remain approximately constant during the night. Our best estimate is b = 0.8 ± 0.1.
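Putting Eq. (10) together with these estimates, a minimal simulation sketch (ours, not the authors' code) looks as follows; the regularizing constant and the initial condition are illustrative, and durations are converted with the 30 s step of the scored data.

```python
import numpy as np

def sleep_wake(n_steps=100_000, Delta=7.0, b=0.8, delta=0.1, seed=0):
    """Sketch of Eq. (10): unit-variance Gaussian steps, free inside the
    sleep region -Delta < x < 0 (reflecting wall at -Delta) and pulled
    back toward sleep by f(x) = -b/(x + delta) in the wake region x > 0
    (delta regularizes the singularity at x = 0, cf. Fig. 10)."""
    rng = np.random.default_rng(seed)
    x = np.empty(n_steps)
    x[0] = -Delta / 2.0
    for t in range(n_steps - 1):
        drift = -b / (x[t] + delta) if x[t] > 0 else 0.0
        x_new = x[t] + drift + rng.normal()
        if x_new < -Delta:              # reflecting boundary of the sleep state
            x_new = -2.0 * Delta - x_new
        x[t + 1] = x_new
    return x

# run lengths of wake (x > 0) and sleep (x <= 0) states, in minutes
x = sleep_wake()
wake = x > 0
flips = np.flatnonzero(np.diff(wake.astype(int)) != 0) + 1
bounds = np.concatenate(([0], flips, [len(x)]))
durations = np.diff(bounds) * 0.5       # 30 s per step
is_wake = wake[bounds[:-1]]             # state of each run
```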
Figure 9. Schematic representation of the dynamics of the model. The model can be viewed as a random walk in a potential well illustrated in (a), where the bottom flat region −Δ < x < 0 corresponds to the area without field, and the region x > 0 corresponds to the area with logarithmic potential. (b) The state x(t) of the sleep-wake system evolves as a random walk with the convention that x > 0 corresponds to the wake state and −Δ < x < 0 corresponds to the sleep state, where Δ gradually changes with time to account for the decrease of the characteristic duration of the sleep state with progression of the night. In the wake state there is a "restoring force," f(x) = −b/x, "pulling" the system towards the sleep state. The lower panel in (b) illustrates sleep-wake transitions from the model. (c) Comparison of typical data and of a typical output of the model. The visual similarity between the two records is confirmed by quantitative analysis (Fig. 10).

To further test the validity of our assumptions, we examine the correlation between the durations of consecutive states. Consider the sequence of sleep and wake durations {S₁ W₁ S₂ W₂ ... Sₙ Wₙ}, where Sₙ indicates the duration of the n-th sleep period and Wₙ indicates the duration of the n-th wake period (Fig. 9b). Our model predicts that there are no autocorrelations in the series Sₙ and Wₙ, as well as no cross-correlations between the series Sₙ and Wₙ, the reason being that the uncorrelated random walk carries no information about previous steps. The experimental data confirm these predictions,
Figure 10. Comparison of P(t) for data and model (two runs with the same parameters). (a) P(t) of the wake state. (b) P(t) of the sleep state. Note that the choice of Δ depends on the choice of the time unit of the step in the model. We choose the time unit to be 30 seconds, which corresponds to the time resolution of the data. To avoid big jumps in x(t) due to the singularity of the force when x(t) approaches x = 0, we introduce a small constant δ in the definition of the restoring force f(x) = −b/(x + δ). We find that the value of δ does not change α or τ.

within statistical uncertainties.

2.4 Conclusions
Our findings of a power-law distribution for wake periods and an exponential distribution for sleep periods are intriguing because the same sleep-control mechanisms give rise to two completely different types of dynamics—one without a characteristic scale and the other with one. Our model suggests that the difference in the dynamics of the sleep and wake states (e.g. power law versus exponential) arises from the distinct number of microstates that can be explored by the sleep-wake system in these two states. During the sleep state, the system is confined to the region −Δ < x < 0. The parameter Δ imposes a scale which causes an exponential distribution of durations. In contrast, for the wake state the system can explore the entire half-plane x > 0. The lack of constraints leads to a scale-free power-law distribution of durations. In addition, the 1/x restoring force in the wake state does not change the functional form of the distribution, but its magnitude determines the power-law exponent of the distribution (see Eq. (11)).
Although in our model the sleep-wake system can explore the entire half-plane x > 0 during wake periods, the "real" biological system is unlikely to generate very large values (i.e., extremely long wake durations). There must be a constraint or boundary in the wake state at a certain value of x. If such a constraint or boundary exists, we will find a cut-off with an exponential tail in the distribution of durations of the wake state. More data are needed to test this hypothesis.
Our additional finding of a stable power-law behavior for wake periods for
all portions of the night implies that the mechanism generating the restoring
force in the wake state is not affected in a measurable way by the mechanism
controlling the changes in the durations of the sleep state. We hypothesize
that even though the power-law behavior does not change in the course of the
night for healthy individuals, it may change under pharmacological influences
or under different conditions, such as stress or depression. Thus, our results
may also be useful for testing these effects on the statistical properties of the
wake state and the sleep state.

3 Summary

We show that a stochastic approach based on general phenomenological considerations can successfully account for a variety of scaling and statistical features in complex physiological processes where interaction between many elements is typical. We propose a "common framework" to describe diverse physiological mechanisms such as heart rate control and sleep-wake regulation.
In particular, in the context of cardiac dynamics we find that the generic pro-
cess of a random walk biased by attracting fields, which are often functions of
time, can generate the long-range power-law correlations, and the form and
stability of the probability distribution observed in heartbeat data. A process
based on the same concept, in the context of sleep-wake dynamics, generates
complex behavior which accounts both for the scale-free power-law distribu-
tion of the wake periods, and for the scale-dependent exponential distribution
of the sleep periods. Further studies are needed to establish the extent to
which such approaches can be used to elucidate mechanisms of physiologic
control.

Acknowledgments

We are grateful to many individuals, including L.A.N. Amaral, A.L. Goldberger, S. Havlin, T. Penzel, J.-H. Peter and H.E. Stanley, for major contributions to the results reviewed here, which represent a collaborative research effort. We also thank A. Arneodo, Y. Ashkenazy, A. Bunde, I. Grosse, H. Herzel, J.W. Kantelhardt, J. Kurths, C.-K. Peng, M.G. Rosenblum, and B.J. West for valuable discussions. This work was supported by the NIH/National Center
for Research Resources (P41 RR13622), NSF, NASA, and The G. Harold and
Leila Y. Mathers Charitable Foundation.

References

1. C. Bernard, Les Phénomènes de la Vie (Paris, 1878).
2. B. van der Pol and J. van der Mark, Phil. Mag. 6, 763 (1928).
3. W. B. Cannon, Physiol. Rev. 9, 399 (1929).
4. B. W. Hyndman, Kybernetik 15, 227 (1974).
5. S. Akselrod et al., Science 213, 220 (1981).
6. M. Kobayashi and T. Musha, IEEE Trans. BME 29, 456 (1982).
7. M. F. Shlesinger, Ann. NY Acad. Sci. 504, 214 (1987); M. F. Shlesinger and B. J. West, in Random Fluctuations and Pattern Growth: Experiments and Models (Kluwer Academic Publishers, Boston, 1988).
8. M. Malik and A. J. Camm, Eds., Heart Rate Variability (Futura, Armonk NY, 1995).
9. J. Kurths et al., Chaos 5, 88 (1995).
10. G. Sugihara et al., Proc. Natl. Acad. Sci. USA 93, 2608 (1996).
11. R. deBoer et al., Am. J. Physiol. 253, H680 (1987).
12. M. Mackey and L. Glass, Science 197, 287 (1977); L. Glass and
M. Mackey, From Clocks to Chaos: The Rhythms of Life (Princeton Univ.
Press, Princeton, 1981); L. Glass et al., Math. Biosci. 90, 111 (1988);
L. Glass and C. P. Malta, J. Theor. Biol. 145, 217 (1990); L. Glass,
P. Hunter, A. McCulloch, Eds. Theory of Heart (Springer Verlag, New
York, 1991).
13. M. G. Rosenblum and J. Kurths, Physica A 215, 439 (1995).
14. H. Seidel and H. Herzel, in Modelling the Dynamics of Biological Systems
E. Mosekilde and O. G. Mouritsen, Eds. (Springer-Verlag, Berlin, 1995).
15. J. P. Zbilut et al., Biological Cybernetics 75, 277 (1996).
16. J. K. Kanters et al., J. Cardiovasc. Electrophysiology 5, 591 (1994).
17. G. LePape et al., J. Theor. Biol. 184, 123 (1997).
18. E. W. Montroll and M. F. Shlesinger, in Nonequilibrium Phenomena II:
From Stochastics to Hydrodynamics, L. J. Lebowitz and E. W. Montroll
Eds. (North-Holland, Amsterdam, 1984), pp. 1-121.
19. N. Wax, Ed. Selected Papers on Noise and Stochastic Processes (Dover
Publications Inc., New-York, 1954); G. H. Weiss, Aspects and Applica-
tions of the Random Walk (Elsevier Science B.V., North-Holland, New-
York, 1994)
20. R. M. Berne and M. N. Levy, Cardiovascular Physiology 6th ed. (C.V.
Mosby Company, St. Louis, 1996).
21. M. N. Levy, Circ. Res. 29, 437 (1971).


22. G. Jokkel et al., J. Auton. Nerv. Syst. 51, 85 (1995).
23. MIT-BIH Polysomnographic Database CD-ROM, second edition (MIT-
BIH Database Distribution, Cambridge, 1992)
24. C.-K. Peng et al., Phys. Rev. Lett. 70, 1343 (1993); J. M. Hausdorff and C.-K. Peng, Phys. Rev. E 54, 2154 (1996).
25. A. Grossmann and J. Morlet, Mathematics and Physics: Lectures on
Recent Results (World Scientific, Singapore, 1985); I. Daubechies, Comm.
Pure and Appl. Math. 41, 909 (1988).
26. J. F. Muzy et al., Int. J. Bifurc. Chaos 4, 245 (1994); A. Arneodo et al.,
Physica D 96, 291 (1996).
27. P. Ch. Ivanov et al., Nature 383, 323 (1996).
28. P. Ch. Ivanov et al., Physica A 249, 587 (1998).
29. R. L. Stratonovich, Topics in the Theory of Random Noise (Gordon and
Breach, New York, 1981).
30. J. B. Bassingthwaighte, L. S. Liebovitch, B. J. West, Fractal Physiology
(Oxford Univ. Press, New York, 1994).
31. P. Ch. Ivanov et al., Europhys. Lett. 43, 363 (1998).
32. Y. Ashkenazy et al., Phys. Rev. Lett. 86, 1900 (2001).
33. P. Ch. Ivanov et al., Nature 399, 461 (1999).
34. P. Bernaola-Galván et al., Phys. Rev. Lett. 87, 168105 (2001).
35. H. E. Stanley, Introduction to Phase Transitions and Critical Phenomena
(Oxford University Press, London, 1971).
36. M. Chicurel, Nature 407, 554 (2000).
37. D. McGinty and R. Szymusiak, Nature Med. 6, 510 (2000).
38. J. H. Benington, Sleep 23, 959 (2000).
39. T. Gallopin et al., Nature 404, 992 (2000).
40. P. Ch. Ivanov et al., Europhys. Lett. 48, 594 (1999).
41. A. Bunde et al., Phys. Rev. Lett. 85, 3736 (2000).
42. J. Born et al., Nature 397, 29 (1999).
43. M. A. Carskadon and W. C. Dement, Principles and Practice of Sleep
Medicine (WB Saunders Co, Philadelphia) 2000, pp. 15-25.
44. A. A. Borbély and P. Achermann, J. Biol. Rhythm. 14, 557 (1999).
45. M. Nakao et al., J. Biol. Rhythm. 14, 547 (1999).
46. D.-J. Dijk and R. E. Kronauer, J. Biol. Rhythm. 14, 569 (1999).
47. C.-C. Lo et al., preprint cond-mat/0112280; Europhys. Lett. (2002), in press.
48. S. Zapperi et al., Phys. Rev. B 58, 6353 (1998).
49. S. Havlin et al., J. Phys. A 18, 1043 (1985).
50. D. Ben-Avraham and S. Havlin, Diffusion and Reactions in Fractals and
Disordered Systems (Cambridge Univ. Press, Cambridge) 2000.


51. A. J. Bray, Phys. Rev. E 62, 103 (2000).
52. A. Rechtschaffen and A. Kales, A Manual of Standardized Terminol-
ogy, Techniques, and Scoring System for Sleep Stages of Human Subjects
(Calif: BIS/BRI, Univ. of California, Los Angeles) 1968.
53. R. Williams et al., Electroen. Clin. Neuro. 17, 376 (1964).
54. V. Brezinova, Electroen. Clin. Neuro. 39, 273 (1975).
55. B. Kemp and H. A. C. Kamphuisen, J. Biol. Rhythm. 9, 405 (1986).
CHAOTIC PARAMETERS IN TIME SERIES OF ECG, RESPIRATORY MOVEMENTS AND ARTERIAL PRESSURE

E. CONTE, A. FEDERICI

Department of Pharmacology and Human Physiology, University of Bari, P.zza G. Cesare, 70100 Bari, Italy; Center of Innovative Technologies for Signal Detection and Processing, Bari, Italy.
E-mail: fisio2@fisiol.uniba.it

Correlation dimension, Lyapunov exponents and Kolmogorov entropy were calculated from ECG, respiratory movements and arterial pressure of normal subjects under spontaneous and forced conditions of respiration. We modelled the cardiovascular system as five oscillators having variable coupling strengths, and we found that this system, as well as its components, exhibits chaotic activity. In particular, we obtained that respiration acts as a non linear input into heart dynamics, thus explaining why it is a source of chaotic non linearity in heart rate variability.

1. Introduction

A recent, relevant paradigm is that, due to the complexity of biological matter, chaos theory should represent a reasonable formulation of living systems. Chaotic behaviour should be dominant, and non chaotic states should correspond more to pathological than to normal states. Fundamental results and theoretical reasons sustain the relevant role of chaos theory in explaining the mechanisms of living matter. This is so because many physiological systems may be represented by the action of coupled biological oscillators. It has been evidenced 4 that, under suitable conditions, such stimulated and coupled oscillators generate chaotic activity. We maintain that, in different physiological conditions, a stronger or weaker coupling among such oscillators takes place, determining a modification in the control parameters of the system, with enhancement or reduction of the chaotic behaviour of one oscillator with respect to the others mutually coupled. Such dynamical modification will be resolved and observed through a corresponding modification of the values of the chaotic parameters (i.e. Lyapunov exponents) usually employed in the analysis of experimental time series.
Recent studies 1 of the cardiovascular system emphasize the oscillatory nature of the processes happening within this system. The circulatory system is represented by the heart and the systemic and pulmonary vessels. To regulate vessel resistance, myogenic activity operates to contract the vessels in response to a variation of intravascular pressure. This generates a rhythmic activity related to periodicity in signals of blood pressure 5 and of blood flow 7,9. The nervous system also contributes through the activity of the autonomic nervous system, which is superimposed on the rhythmic activity of pacemaker cells. Rhythmic regulation of vessel resistance is also realized by the activity of metabolic substances in the blood. In conclusion, the dynamics of blood flow in its passage through the cardiovascular system is governed by five oscillators: the heart, the lungs, and the myogenic, neural and metabolic activities. We may consider this system to be a spatially distributed physical system constituted by five oscillators. Each oscillator exhibits autonomous oscillations, but positive and negative feedback loops take place so that the continuous regulation of blood circulation is realized through the coherent activity of such mutually coupled oscillators.
This is the model that we employ in the present study. We have all the elements to expect such a system to be non linear and complex. So we arrive at the central aim of the present work. We intend to ascertain the following points: using the methods of non linear analysis, we intend to establish whether the cardiovascular system, as well as its components, exhibits chaotic activity; we also aim to ascertain whether the model of five oscillators is supported by our analysis and, in particular, whether we may arrive at the final conclusion that respiration resolves itself as a non linear input into the heart dynamics of the cardiovascular oscillator.
The importance of giving a definitive and rigorous answer to this last problem is well known. Let us specify in more detail the nature of our objective. Analyzing data regarding ECG time series, several authors 2,6 obtained results indicating that the normal sinus rhythm in ECG must be ascribed to actual low-dimensional chaos. By the same kind of analysis, evidence was also obtained for inherent non linear dynamics and chaotic determinism in time series of consecutive R-R intervals. The physiological origins of such chaotic non linearity are unknown. The purpose of our study was to establish whether a non linear input from spontaneous respiration to the heart exists and whether it may be considered one of the sources of the chaotic non linearity in heart rate variability.

2. Methods
We measured ECG, respiratory movement and arterial pressure signals in six normal non-smoking subjects, under normal (NR) and forced (FR) conditions of respiration respectively. The FR condition was obtained by asking the subjects to perform inspiratory acts with a 5 s periodicity, at a given signal; the signal for expiration was given 2 s after every inspiration. The measured ECG signals were sampled at 500 Hz for 300 s. Signal vs time tracings of respiration, ECG, Doppler, and R-R intervals are given in Fig. 1 for subject #13-07. Peak-to-peak values were considered for the time series. Noise-reduction programs were used, so that only noise-reduced time series data entered the analysis. In order to follow the variability in time of the collected data, the obtained time series were re-sampled in five intervals (sub-series), each containing 30,000 points. All the data were analyzed by the methods of non linear prediction and of surrogate data.
The correlation dimension, the Lyapunov spectrum and the Kolmogorov entropy were estimated after determination of the time delay τ by auto-correlation and mutual information. The embedding dimension in phase space was established by the method of False Nearest Neighbors (FNN) (for chaotic analysis see, for example, refs. 3, 8).
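As an illustration of this pipeline, a minimal sketch in Python follows (our own reconstruction, not the authors' software: the input file name is hypothetical, the delay criterion uses the 1/e autocorrelation drop rather than mutual information, and the scaling-region fit is deliberately crude):

```python
import numpy as np
from scipy.spatial.distance import pdist

def delay_by_autocorrelation(x):
    """First lag at which the autocorrelation drops below 1/e."""
    x = x - x.mean()
    acf = np.correlate(x, x, mode='full')[len(x) - 1:]
    return int(np.argmax(acf / acf[0] < 1.0 / np.e))

def embed(x, dim, tau):
    """Time-delay embedding: each row is one point in the reconstructed phase space."""
    n = len(x) - (dim - 1) * tau
    return np.column_stack([x[i * tau:i * tau + n] for i in range(dim)])

def correlation_sum(points, r):
    """C(r): fraction of point pairs closer than r (Grassberger-Procaccia)."""
    return np.mean(pdist(points) < r)

x = np.loadtxt('rr_intervals.txt')       # hypothetical noise-reduced sub-series
tau = delay_by_autocorrelation(x)
for d in range(2, 8):                    # raise d until the D2 estimate saturates
    pts = embed(x, d, tau)[::20]         # subsample so pdist stays affordable
    rs = np.logspace(-1.2, -0.2, 8) * pts.std()
    cs = np.array([correlation_sum(pts, r) for r in rs])
    d2 = np.polyfit(np.log(rs), np.log(cs), 1)[0]   # slope of log C(r) vs log r
    print(d, round(float(d2), 3))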

3. The Results
The main results of the chaotic analysis are reported in Table 1. For cardiac oscillations, the time delays ranged from 14 to 60 msec for subjects in both NR and FR. The embedding dimension in phase space was found to be d = 4, thus establishing that four degrees of freedom are needed to correctly describe heart dynamics. The correlation dimension, D2, established by saturation in a D2-d plot, proved very stable across the selected intervals of experimentation. It assumed values ranging from 3.609 ± 0.257 to 3.714 ± 0.246 over the five intervals for subjects in NR, and from 3.735 ± 0.228 to 3.761 ± 0.232 for subjects in FR. On the basis of these results, we concluded that normal cardiac oscillations, as well as cardiac oscillations of subjects under FR, follow deterministic dynamics of chaotic nature. We then estimated the Lyapunov exponents: λ1 and λ2 proved positive, λ3 and λ4 negative; the sum of all the calculated exponents was negative, as required for dissipative systems. We concluded that the cardiac oscillations of normal subjects, under NR and FR, represent a hyper-chaotic dynamics. The positive exponents λ1 and λ2 in Table 1 represent the rates of divergence of the attractor along the directions of maximum expansion; these are the directions in which the cardiac oscillating system realizes its chaoticity. The negative values λ3 and λ4 in Table 1 represent the rates of convergence of the attractor along the contracting directions. The emerging picture is that cardiac oscillations, as measured by ECG in normal subjects, reflect a great ability of the heart to cope continuously with rapid changes, corresponding to the high values of its chaoticity. Looking in Table 1 at the values of λ1 and λ2 calculated along the five time intervals analysed, we deduce that these values remained substantially stable from interval to interval. Thus we may conclude that, due to the constant action of the oscillators defined in our model, heart chaotic dynamics remains substantially stable in time in both the NR and FR conditions. The same tendency was confirmed by the results obtained for the Kolmogorov entropy, K (see Table 1), which characterizes the overall chaoticity of the system. We may therefore establish the first conclusion of the present paper: heart dynamics exhibits chaoticity, and this remains substantially stable in time for normal subjects in NR and FR. However, we must also answer the question whether respiration acts as a non linear input from the respiratory system into the heart dynamics of the cardiac oscillator. In this regard we must remember that, as explained in the introduction, the estimation of the Lyapunov exponents, and in particular of the positive Lyapunov exponents, must be considered, in the chaotic analysis of physiological systems, a sign confirming the presence of a physiological control mechanism acting through a modification of the control parameters of the considered system via a stronger or weaker coupling between the oscillators assumed to act in it. According to this thesis, an existing non linear input from respiration into the heart dynamics of the cardiovascular system should be realised through a modification of the control parameters of the system via a modification of the coupling strength between the two oscillators considered, and it would be evidenced by clear modifications of the positive and negative Lyapunov exponents between the NR and FR cases. In fact, for the λ1 values we obtained, in five normal subjects, an increase in FR with respect to NR varying from 6% to about 36%. For λ2 the increase was even larger, varying from about 12% to about 61%. The corresponding negative values, λ3 and λ4, also increased in magnitude in FR with respect to NR. Only in one subject was a decrease in the values of the Lyapunov exponents obtained in passing from NR to FR; also in this case, appreciable percentage differences were observed.
In conclusion, the substantial difference in Lyapunov exponent values between NR and FR is a result of the present work. Increasing values of the positive Lyapunov exponents reveal an increasing degree of chaoticity; decreasing values reveal, instead, decreasing chaoticity. The increased (and, in only one case, decreased) values of the positive Lyapunov exponents that we found in FR with respect to NR indicate that in the former condition we had increased (in only one case decreased) chaoticity, and this establishes that respiration acts as a non linear input on the cardiovascular oscillator. According to our model, such a non linear input from respiration to the cardiovascular oscillator resolves itself in a greater (or lower) coupling strength between the two oscillators considered. Obviously, in order to confirm this conclusion, we need to show that respiration too is characterised by chaotic dynamics. We therefore performed chaos analysis of the respiratory movement time series obtained from the same normal subjects, following the same methodological
criteria. The time delays varied from 4 to 76 msec, and the embedding dimension in phase space was d = 3. As said above, this dimension reflects the number of degrees of freedom necessary to describe the respiratory system. From d = 3 we deduced that the action of three possible oscillators must be considered in determining the behaviour of this system. The mean value of the correlation dimension, D2, was 2.740 ± 0.390 in the case of NR in the first interval of investigation. A rather stable mean value, ranging from D2 = 2.579 ± 0.340 to D2 = 2.665 ± 0.346, was also obtained in the four remaining intervals. We concluded that the respiratory system of the examined normal subjects exhibits chaotic determinism. As expected, during FR we obtained a reduction of the mean values of the correlation dimension with respect to NR. The mean value was D2 = 2.414 ± 0.417 in the first interval, and varied between D2 = 2.339 ± 0.314 and D2 = 2.389 ± 0.383 in the remaining four intervals, a decrease of about 10-12% with respect to NR. Thus we had a reduction of the chaotic dynamics of respiration during FR with respect to the NR physiological condition. A clear discrimination of the two conditions was also obtained by calculation of the dominant Lyapunov exponent, λD. In the first interval of experimentation we obtained a mean value λD = 0.028 ± 0.023 in the case of NR and λD = 0.009 ± 0.004 in the case of FR, a decrease of about 68% in FR. Clear discrimination was also obtained in the other four intervals: in the second interval λD = 0.029 ± 0.020 for NR and λD = 0.012 ± 0.004 for FR (a decrease of about 59%); in the third interval λD = 0.030 ± 0.022 for NR against λD = 0.008 ± 0.003 for FR (about 73%); in the fourth interval λD = 0.026 ± 0.022 for NR against λD = 0.009 ± 0.004 for FR (about 65%); and in the fifth interval λD = 0.022 ± 0.020 for NR and λD = 0.011 ± 0.008 for FR (about 50%).
In conclusion, the dominant Lyapunov exponents calculated along the intervals of experimentation were very stable, in both the NR and FR cases, while a clear percentage decrease of the values was found in FR with respect to NR. These results indicate that the respiratory system exhibits chaotic dynamics, and that this chaoticity is strongly reduced during FR with respect to NR. This result clearly supports our thesis, based on the model of five oscillators: during forced respiration we have a reduction of the chaoticity of the respiratory system with respect to spontaneous respiration, and to this reduction there corresponds an increase of the chaoticity of cardiac oscillations, as a consequence of a greater non linear input from the respiratory system into heart dynamics. In other terms, a stronger coupling between the two oscillators is realized, and it resolves itself in an enhancement of cardiac chaoticity correlated with a simultaneous reduction of the chaoticity of the respiratory oscillatory system.
Our final aim was to test for possible chaotic dynamics of blood pressure. We analyzed the arterial pressure time series following the same methodology. The time delay resulted to be about 2 msec, and the embedding dimension in phase space d = 5. We regard this as the result that best confirms the correctness of the model of the cardiovascular system based on five oscillators: the blood pressure signal reflects the action of the five oscillators that we considered and, in fact, the calculated embedding dimension turned out to be exactly d = 5. The calculation of the correlation dimension, D2, again gave rather stable values along the five intervals of experimentation, varying between 3.661 and 3.924 in the case of NR, and between 3.433 and 3.910 in the case of FR. Thus we may conclude that the blood pressure signal behaves as a strongly chaotic deterministic system, as confirmed also by the Kolmogorov entropy values. The calculated Lyapunov exponents are given in Table 2, and they confirm that blood pressure is a deterministic hyper-chaotic dissipative system. The values of the exponents λ1, λ2, λ4 and λ5 were very stable along the five intervals of experimentation, with an evident similarity also between the two conditions of experimentation. Considering the model of five oscillators, we may say that constant non linear inputs act from the oscillators, determining this constant level of chaoticity. λ3, instead, showed a very great variability along the five intervals considered, as well as between NR and FR. The variability of λ3 occurred with three characteristic times, respectively about 3-4 s, about 10 s and about 20-30 s, corresponding to 0.3-0.4 Hz, 0.1-0.2 Hz and 0.04-0.06 Hz. We concluded that the first frequency should be due to the action of the respiratory oscillator, while the two remaining frequencies should correspond to the action of the myogenic and neural (baroreceptor) oscillators.

Table 1. Chaos analysis of R-R oscillations. Mean value (m.v.) and standard deviation (s.d.) of the Lyapunov spectrum (λ1-λ4) and of the Kolmogorov entropy K over the five intervals of data, in NR and FR.

Interval         λ1 NR   λ1 FR   λ2 NR   λ2 FR   λ3 NR   λ3 FR   λ4 NR   λ4 FR    K NR    K FR
1      m.v.      0.271   0.343   0.089   0.119  -0.135  -0.133  -0.548  -0.639   0.360   0.462
       s.d.      0.063   0.075   0.043   0.036   0.022   0.020   0.098   0.118   0.105   0.106
2      m.v.      0.278   0.349   0.092   0.138  -0.124  -0.119  -0.537  -0.606   0.370   0.488
       s.d.      0.086   0.127   0.059   0.071   0.026   0.020   0.099   0.121   0.144   0.196
3      m.v.      0.262   0.346   0.091   0.110  -0.121  -0.140  -0.534  -0.578   0.353   0.456
       s.d.      0.092   0.148   0.060   0.054   0.029   0.032   0.090   0.085   0.152   0.198
4      m.v.      0.276   0.307   0.089   0.094  -0.114  -0.113  -0.522  -0.565   0.365   0.401
       s.d.      0.096   0.018   0.064   0.029   0.021   0.011   0.118   0.078   0.160   0.046
5      m.v.      0.279   0.289   0.090   0.095  -0.121  -0.125  -0.526  -0.572   0.369   0.384
       s.d.      0.093   0.074   0.062   0.065   0.023   0.025   0.130   0.135   0.154   0.139

Table 2. Chaos analysis of the blood pressure signal. Mean value (m.v.) and standard deviation (s.d.) of the Lyapunov spectrum (λ1-λ5) and of the Kolmogorov entropy K over the five intervals of data (NR).

Interval         λ1      λ2      λ3      λ4      λ5      K
1      m.v.      0.557   0.250   0.032  -0.213  -0.705   0.838
       s.d.      0.036   0.019   0.023   0.043   0.033   0.033
2      m.v.      0.561   0.251   0.012  -0.215  -0.687   0.824
       s.d.      0.025   0.004   0.006   0.030   0.025   0.015
3      m.v.      0.577   0.252   0.040  -0.232  -0.695   0.868
       s.d.      0.027   0.011   0.013   0.015   0.008   0.051
4      m.v.      0.553   0.259   0.018  -0.220  -0.704   0.829
       s.d.      0.016   0.006   0.009   0.009   0.006   0.018
5      m.v.      0.570   0.246   0.012  -0.249  -0.706   0.827
       s.d.      0.011   0.002   0.006   0.005   0.018   0.018

Fig. 1. Signal vs time tracings of respiration, ECG, Doppler, and R-R intervals for subject #13-07 (normal respiration).

Acknowledgements

The authors wish to thank Ms Anna Maria Papagni for her technical assistance.

References

1. Akselrod S, Gordon D, Ubel FA, Shannon DC, Berger AC, Cohen RJ. Science 1981; 213: 220-225
2. Babloyantz A, Destexhe A. Is the normal heart a periodic oscillator? Biological Cybernetics 1988; 58: 203-211
3. Badii R, Politi A. Dimensions and Entropies in Chaotic Systems. Springer, Berlin, 1986
4. Guevara MR, Glass L, Shrier A. Phase-locking, period-doubling bifurcations, and irregular dynamics in periodically stimulated cardiac cells. Science 1981; 214: 1350-1353
5. Kitney RI, Fulton T, McDonald AH, Likens DA. J. Biomed. Eng. 1985; 7: 217-225
6. Kitney RI, Rompelman O. The Study of Heart Rate Variability. Oxford University Press, 1980
7. Madwed JB, Albrecht P, Mark RG, Cohen RJ. Low-frequency oscillations in arterial pressure and heart rate: a simple computer model. Am. J. Physiol. 1989; 256: H1573-H1579
8. Schreiber T. Interdisciplinary application of non linear time series methods. Physics Reports 1999; 308: 1-64
9. Stern MD. Nature 1975; 254: 56-58

COMPUTER ANALYSIS OF ACOUSTIC RESPIRATORY SIGNALS

A. VENA, G.M. INSOLERA, R. GIULIANI(*), T. FIORE


Department of Emergency and Transplantation, Bari University,
Policlinico Hospital, Piazza Giulio Cesare, 11, 70124 Bari, Italy. (*)Center of Innovative
Technologies for Signal Detection and Processing, Bari, Italy.
e-mail: antonvena@yahoo.com

G. PERCHIAZZI
Department of Clinical Physiology,
Uppsala University Hospital, S-75185 Uppsala, Sweden

Evaluation of breath sounds is a basic step of the patient's physical examination. Auscultation of the respiratory system gives direct information about the structure and function of lung tissue that cannot be achieved with any other simple and non-invasive method. Recently, the application of computer technology and new mathematical techniques has supplied alternative methodologies for respiratory sound analysis. We present a new computerized approach to analyzing respiratory sounds.

1 Introduction

Acoustic respiratory signals have been the subject of considerable research over recent years; however, their origin is still not completely understood. It is now generally accepted that, during respiration, the turbulent motion of a compressible fluid in the larger airways with rough walls (trachea and bronchi) generates acoustic energy [5]. This energy is transmitted through the airways and lung parenchyma to the chest wall, which represents a non-stationary system [1,4]. Pulmonary diseases induce anatomical and functional alterations in the respiratory system; changes in the quality of lung sounds (loudness, length and frequency) are often directly correlated to pathological changes in the lung.
The traditional method of auscultation is based on the stethoscope and the human auditory system; however, due to the poor response of the human auditory system to lung sounds (low frequency and low signal-to-noise ratio) and the subjective character of the technique, it is common to find differing clinical descriptions of the same respiratory sounds.
Lung-sound nomenclature has long been unclear: until recent decades, the names in use derived from those originally given by Laennec [10] and translated into English by Forbes [2]. In 1985, the International Lung Sounds Association (I.L.S.A.) composed an international standard classification of lung sounds, which includes fine and coarse crackles, wheezes and rhonchi: each of these terms can be described acoustically [13].

The application of computer technology and recent advancements in signal processing have provided new insights into acoustic mechanisms and supplied new measurements of clinical importance from respiratory sounds.
The aim of this study is to develop a system for the acquisition and elaboration of respiratory acoustic signals: this would provide an effective, non-invasive and objective support for the diagnosis and monitoring of respiratory disorders.

2 Respiratory Sounds

Lung sounds in general are classified into three major categories: "normal"
(vesicular, bronchial and bronchovesicular breath sounds), "abnormal" and
"adventitious" lung sounds.
Vesicular breath sounds consist of a quiet and soft inspiratory phase followed
by a short, almost silent expiratory phase. They are low pitched and normally heard
over most lung fields of a healthy subject. These sounds are not generated by gas
flow moving through the alveoli (vesicles) but are the result of attenuation of breath
sound produced in the larger bronchi. Bronchial breath sounds are normally heard
over the trachea and reflect turbulent airflow in the main-stem bronchi. They are
loud, high-pitched, and the expiratory phase is generally longer than the inspiratory
phase, with a typical pause between the phases. Bronchial sounds heard over the
thorax suggest lung consolidation and pulmonary disease. Bronchovesicular breath
sounds are normally heard on both sides of the sternum in the first and second
intercostal spaces. They should be quieter than the bronchial breath sounds and
increased intensity of these sounds is often associated with increased ventilation.
Abnormal lung sounds include the decrease/absence of normal lung sounds or
their presence in areas where they are normally not heard (bronchial breath sounds
in peripheral areas where only vesicular sounds should be heard). This is characteristic of parenchyma consolidation (pneumonia), which transmits sound from the lung bronchi much more efficiently than the air-filled alveoli of the normal lung.
The term "adventitious" (adventitious lung sounds) refers to extra or additional
sounds that are heard over normal lung sounds and their presence always indicates a
pulmonary disease. These sounds are classified into discontinuous (crackles) or
continuous (wheezes) adventitious sounds. Crackles are discontinuous, intermittent
and nonmusical noises that may be classified as "fine" (high pitched, low amplitude,
and short in duration) and "coarse" (low pitched, higher in amplitude, and long in
duration). Crackles are generated by fluid in the small airways or by sudden
opening of closed airways. Their presence is often associated with inflammation or
infection of the small bronchi, bronchioles, and alveoli, with pulmonary fibrosis,
with heart failure and many other cardiorespiratory disorders. Wheezes are
continuous (since their duration is much longer than that of crackles), lower-pitched
and musical breath sounds, which are superimposed on the normal lung sounds.
They originate from air moving through small airways narrowed by constriction or swelling of the airway, or by partial airway obstruction. They are often heard (during
expiration, or during both inspiration and expiration) in patients with asthma or
other obstructive diseases. Other respiratory sounds are: rhonchi (continuous sounds
that indicate partial obstruction by thick mucous in the bronchial lumen, oedema,
spasm or a local lesion of the bronchial wall); stridor (high-pitched harsh sound
heard during inspiration and caused by obstruction of the upper airway); snoring
(acoustical signals produced by a constriction in the upper airway, usually during
sleep) and pleural rubs (low-pitched sounds that occur when inflamed pleural
surfaces rub together during respiration).

3 Review of literature

Many studies focused on the acoustic properties of normal lung sounds in


healthy subjects [6,7] and their changes with airflow [8].
In 1996 Pasterkamp et al., using the Fast Fourier Transform (FFT), analysed and described the lung sound spectra in normal infants, children, and adults [14]. At the end of the 1980s, normal and pathological lung sounds were displayed and studied in the time and frequency domains [15].
Various works investigated the characteristics of crackles due to asthma,
chronic obstructive pulmonary diseases (COPD), heart failure, pulmonary fibrosis,
and pneumonia [16,12,9].
In 1992 Pasterkamp and Sanchez indicated the significance of tracheal sounds
analysis in upper airway obstructions [17]. Malmberg et al. analysed changes in the frequency spectra of breath sounds during histamine challenge tests in adult asthmatic subjects [11].
In recent years, the application of the Wavelet Transform has demonstrated the possibility of properly elaborating non-stationary signals (such as crackles); by comparing the ability of Fourier- and Wavelet-based techniques to resolve both discrete and continuous sounds, many studies concluded that the wavelet-based methods have the potential to effectively process and display both continuous and discrete lung sounds [3].

4 Signal acquisition and processing methods

Lung sounds transmitted through the respiratory system can be acquired by equipment able to convert the acoustic energy into an electrical signal. The subsequent elaboration phase, using specific mathematical transformations, returns a sequence of data that allows the features of each signal to be studied.
In this study, respiratory sounds were picked up over the chest wall of normal
and abnormal subjects by an electronic stethoscope (Electromag Stethoscope ES-
120, Japan).

The sensor was placed over the bronchial regions of the anterior chest (second
intercostal space on the mid clavicular line), the vesicular regions of the posterior
chest (apex and base of lung fields, bilaterally) and the trachea at the lower part of
the neck, 1-2 cm to the right of the midline.
Sounds were amplified, low-pass filtered and recorded in digital format (Sony Minidisc MZ-37, Japan) using a sampling rate of 44.1 kHz and 16-bit quantization. The signal was transferred to a computer (Intel Pentium 500 MHz, Intel Corp., Santa Clara, CA, USA) and then analyzed by specific Fourier Transform based spectral analysis software (CoolEdit Pro 1.0, Syntrillium Software Corp., Phoenix, USA).
Because of the clinical necessity of correlating the acoustic phenomenon to the phases of human respiration, a method of analysis dedicated to the time/frequency plane was applied: the STFT (Short Time Fourier Transform). It provided "spectrograms" related to the different respiratory acoustic patterns which, according to the intensity and frequency changes in the time domain, were analyzed offline. The spectrogram shows, in a three-dimensional coordinate system, the acoustic energy of a signal versus time and frequency.
We studied normal breath sounds (vesicular and tracheal) from healthy subjects
without pulmonary diseases and adventitious lung sounds (crackles and wheezes)
from patients with pneumonia and COPD (chronic obstructive pulmonary disease),
spontaneously breathing. The signals were examined for artifacts (generally emanating from defective contact between the sensor and the chest wall, or from background noise), and contaminated segments were excluded from further analysis.
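As an illustration of this processing chain, a minimal sketch in Python follows (our own reconstruction, not the authors' software: the file name is hypothetical, and the ~23 ms window and the ~100 Hz high-pass cut-off suggested by the Results section are illustrative choices):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.io import wavfile
from scipy.signal import butter, filtfilt, spectrogram

# Load a mono recording sampled at 44.1 kHz with 16-bit quantization.
fs, x = wavfile.read('breath_sound.wav')      # hypothetical file name
x = x.astype(float)

# High-pass at ~100 Hz to attenuate heart- and muscle-sound artefacts.
b, a = butter(4, 100.0 / (fs / 2), btype='highpass')
x = filtfilt(b, a, x)

# Short Time Fourier Transform: ~23 ms windows with 50% overlap.
f, t, Sxx = spectrogram(x, fs=fs, nperseg=1024, noverlap=512)

# Display acoustic energy versus time and frequency (the "spectrogram").
plt.pcolormesh(t, f, 10 * np.log10(Sxx + 1e-12))
plt.ylim(0, 2000)    # breath-sound content lies below ~1600 Hz
plt.xlabel('Time [s]')
plt.ylabel('Frequency [Hz]')
plt.show()
```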

5 Results

Normal breath sounds (vesicular and tracheal) showed typical spectra, with a frequency content extending up to 700 Hz (vesicular sounds) and 1600 Hz (tracheal sounds). Generally, at frequencies below 75-100 Hz there are artefacts from heart and muscle sounds. Inspiratory amplitude was higher than expiratory amplitude for vesicular sounds, and lower than expiratory amplitude for tracheal sounds (fig. 1, fig. 2).

Fig. 1 Vesicular sound Fig. 2 Tracheal sound



Discontinuous adventitious sounds (crackles) appeared as non-stationary, explosive end-inspiratory noise with a frequency content extending beyond 1000 Hz; their duration was less than 200 msec (fig. 3). Continuous adventitious sounds (wheezes) appeared as expiratory spectral densities, harmonically related, at 300 Hz, 600 Hz and 1200 Hz; their duration was longer than 200 msec (fig. 4).

Fig. 3 Crackles Fig. 4 Wheezes

6 Conclusions

In this study, significant changes in the averaged frequency spectra of breath sounds were demonstrated in passing from healthy to sick lungs. Moreover, this processing method was able to classify abnormal patterns into different pathology-related subgroups.
Implementation of this technology on a breath-to-breath basis will provide a useful tool for continuous bed-side monitoring by a computerized auscultation device which can record, process and display the respiratory sound signals with sophisticated visualization techniques.
Future perspectives for respiratory sound research include the building of miniaturized systems for non-invasive and real-time monitoring; the application of multi-microphone analysis to evaluate the regional distribution of ventilation; respiratory sound databases; remote diagnosis systems; and automatic recognition systems for acoustic respiratory patterns based on artificial neural networks.

7 References

1. Cohen A. Signal Processing Methods for Upper Airways and Pulmonary


Dysfunction Diagnosis. IEEE Engineering In Medicine And Biology
Magazine, (1990).
2. Forbes J. A Treatise of the Diseases of the Chest, 1st ed. Underwood.
London, (1821).
3. Forren J.F., Gibian G. Analysis of Lung Sounds Using Wavelet
Decomposition, (1999).
4. Fredberg J.J. Acoustic determination of respiratory system properties. Ann.
Biomed. Eng. 9 (1981) pp. 463-473.
5. Gavriely N. Breath Sounds Methodology. Boca Raton, FL: CRC Press, Inc.,
(1995).
6. Gavriely N., Nissan M., Rubin A.E., Cugell D.W. Spectral characteristics of
chest wall breath sounds in normal subjects. Thorax 50 (1995) pp. 1292-
1300.
7. Gavriely N., Herzberg M. Parametric representation of normal breath sounds.
J. Appl. Physiol. 73(5) (1992) pp. 1776-1784.
8. Gavriely N., Cugell D.W. Airflow effects on amplitude and spectral content
of normal breath sounds. J. Appl. Physiol. 80(1) (1996) pp. 5-13.
9. Kaisla T., Sovijarvi A., Piirla P., Rajala H.M., Haltsonen S., Rosqvist T.
Validated Methods for Automatic Detection of Lung Sound Crackles.
Medical and Biological Engineering and Computing 29 (1991) pp. 517-521.
10. Laennec R.T.H. De l'auscultation mediate ou traite du diagnostic des maladies des poumons et du coeur, fonde principalement sur ce nouveau moyen d'exploration. Brosson et Chaude. Paris, (1819).
11. Malmberg L.P., Sovijarvi A.R.A., Paajanen E., Piirila P., Haahtela T., Katila T. Changes in Frequency Spectra of Breath Sounds During Histamine Challenge Test in Adult Asthmatics and Healthy Control Subjects. Chest 105 (1994) pp. 122-132.
12. Munakata M., Ukita H., Doi I., Ohtsuka Y., Masaki Y., Homma Y.,
Kawakami Y. Spectral and Waveform Characteristics of Fine and Coarse
Crackles. Thorax 46 (1991) pp. 651-657.
13. Pasterkamp H., Kraman S.S., Wodicka G.R. Respiratory Sounds. Advances
Beyond the Stethoscope. Am J Respir Crit Care Med 156 (1997) pp. 974-
987.
14. Pasterkamp H., Powell R.E., Sanchez I. Lung Sound Spectra at Standardized
Air Flow in Normal Infants, Children, and Adults. Am J Respir Crit Care
Med 154 (1996) pp. 424-430.
15. Pasterkamp H., Carson C., Daien D., Oh Y. Digital Respirasonography. New Images of Lung Sounds. Chest (1989) pp. 1505-1512.

16. Piirila P., Sovijarvi A., Kaisla T., Rajala H.M., Katila T. Crackles in Patients
with Fibrosing Alveolitis, Bronchiectasis, COPD, and Heart Failure. Chest
99(5) (1991) pp. 1076-1083.
17. Pasterkamp H., Sanchez I. Tracheal Sounds in Upper Airway Obstruction.
Chest 102 (1992) pp. 963-965.

THE IMMUNE SYSTEM: B CELL BINDING TO MULTIVALENT ANTIGEN

Gyan Bhanot

IBM Research, Yorktown Hts., NY 10598, USA


E-mail: gyan@watson.ibm.com

This is a description of work done in collaboration with Yoram Louzoun and Martin Weigert at Princeton University. Experiments in the late 80's by Dintzis et al. revealed puzzling aspects of the activation of B-Cells as a function of the valence (number of binding sites) and concentration of presented antigen. Through computer modeling, we are able to explain these puzzles if we make an additional (novel) hypothesis about the rate of endocytosis of B-Cell receptors. The first puzzling result we can explain is why there is no activation for low valence (less than 10-20). The second is why activation is limited to a narrow range of antigen concentration. We performed a computer experiment to model the B-Cell surface with embedded receptors diffusing in the surface lipid layer. We presented these surface receptors with antigen of varying concentration and valence. Using experimentally reasonable values for the binding and unbinding probabilities of the binding sites on the antigens, we simulated the dynamics of the binding process. Using the single hypothesis that the rate of endocytosis of bound receptors is significantly higher than that of unbound receptors, and that this rate varies inversely as the square of the mass of the bound, connected receptor complex, we are able to reproduce all the qualitative features of the Dintzis experiment and resolve both of the puzzles mentioned above. We were also able to generate some testable predictions on how chimeric B-Cells might be non-immunogenic.

1 Introduction

This paper is a description of work done in collaboration with Yoram Louzoun and Martin Weigert at Princeton University 1. I begin with a brief introduction to the human immune system and the role of B and T Cells in it 2. Next, I describe the B-Cell receptor/antibody, and how errors in the coding for the light chains on these receptors can result in chimeric B-Cells with different light chains on the same receptor, or different types of receptors on the same cell. After this, I describe the Dintzis experiments 3,4,5 and the efforts to explain these experimental results using the concept of an Immunon 6,7. There is also analytic work by Perelson 8 using rate equations to model the binding and activation process. This is followed by a description of our computer modeling experiment, its results and conclusions 1.

2 Brief Description of Human Immune System

The human immune system 2, on encountering a pathogen, has two distinct but related responses. There is an immediate response, called the Innate Response, and there is also a slower, dynamic response, called the Adaptive Response. The Innate Response, created over aeons by the slow evolutionary process, is the first line of defense against bacterial infections, chemicals and parasites. It comes into effect immediately and acts mostly by phagocytosis (engulfment). The Adaptive Response evolves even within an individual; it is slower in its action (with a latency of 4-7 days) but is much more versatile. This Adaptive Response is created by a complex process involving cells called lymphocytes. A single microliter of fluid in the body contains about 2500 lymphocytes.
All cellular components of the Immune System arise in the bone marrow
from hematopoietic stem-cells, which differentiate to produce the other more
specialized cells of the immune system. Lymphocytes derive from a lymphoid
progenitor cell and differentiate into two cell types called the B-Cell and the T-
Cell. These are distinguished by their site of differentiation, the B-Cells in the
bone marrow and the T-Cell in the thymus. B and T Cells both have receptors
on their surface that can bind to antigen (pieces of chemical, peptides, etc.) An
important difference between B and T Cell receptors is that B-Cell receptors
are bivalent (have two binding areas) while T-Cell receptors are monovalent
(with a single binding area). In the bone marrow, B-Cells are presented with
self antigen, eg. pieces of the body's own molecules. Those B-Cells that react
to such self antigen are killed. Those that do not are released into the blood and
lymphatic systems.T-Cells on the other hand are presented with self antigen
in the thymus and are likewise killed if they react to it.
Cells of the body present on their surface pieces of protein from inside the
cell in special structures called the MHC (Major Histocompatibility Complex)
molecules. MHC molecules are distinct between individuals and each individ-
ual carries several different alleles of MHC molecules. T-Cells are selected in
the thymus to bind to some MHC of self but not to any self peptides that are
presented on these MHC molecules. Thus, only T-Cells that might bind to
foreign peptides presented on self MHC molecules are released from the thy-
mus. There are two types of T-Cells, distinguished by their surface proteins.
They are called CD8 T-Cells (also called killer T-Cells) and CD4 T-Cells (also
called helper T-Cells).
When a virus infects a cell, it uses the cell's DNA/RNA machinery to repli-
cate itself. However, while this is going on, the cell will present on its surface
pieces of viral protein on MHC molecules. CD8 T-Cells in the surrounding
medium are programmed to bind strongly to such MHC molecules presenting

non-self peptides. After they bind to the MHC molecule, they send a signal to
the cell to commit suicide (apoptose) and then unbind from the infected cell.
Also, once activated in this way, the CD8 T-Cell will replicate aggressively and
seek out other infected cells to send them the suicide signal. The CD4 T-Cells
on the other hand, recognize viral peptides on B-cells and macrophages (spe-
cialized cells which phagocytose or engulf pathogens, digest them and present
their peptide pieces on MHC molecules). The role of the CD4 T-Cell, when
it binds in this way, is to signal the B-Cell and macrophages to activate and
proliferate.
B-Cells that are non-reactive to self antigens in the bone marrow are released into the blood and secondary lymphoid tissue. They have a lifetime
of about three days unless they successfully enter lymphoid follicles, germinal
centers or the spleen and get activated by binding to antigen presented to them
there. Those that have the correct antibody receptors to bind strongly to viral
peptide (antigen), will become activated and will start to divide, thereby pro-
ducing multiple copies of themselves with their specific high affinity receptors.
This process is called 'clonal selection' as the clone which is fittest (binds most
strongly to presented antigen) is selected to multiply. The B-Cells that bind
to antigen will also endocytose their own receptors with bound antigen and
present it on their surface on MHC-II molecules for an activation signal from
CD4 T-Cells. Once a clone is selected, the B-Cells also mutate and proliferate
to produce variations of receptors to achieve an even better binding specificity
to the presented antigen. B-Cells whose mutation results in improved bind-
ing will receive a stronger activation signal from the CD4 T-Cells and will
out-compete the rest. This process is called 'affinity maturation'. Once the B-cells with optimal binding specificity are produced, they are released from the germinal centers. Some of these differentiate into plasma cells which release
large numbers of antibodies (receptors) with high binding affinity for the anti-
gen. These antibodies mark the virus for elimination by macrophages. Some
B-Cells go into a latent phase (become memory B-Cells) from which they may
be activated if the infection recurs.
It is clear from the above discussion that there are two competing pressures
in play when antigen binds to B-Cells. One pressure is to maximize the number
of surface bound receptors, until a critical threshold is reached when the B-
Cell is activated and will proliferate. The other pressure is to endocytose the receptor-antigen complex, followed by presentation of the antigen peptide on
MHC-II molecules, binding to CD4 T-Cells and an activation signal from that
binding. To function optimally, the immune system must carefully balance
these two processes of binding and endocytosis.

Unbound receptors on the surface of B-Cells are endocytosed at the rate


of about one receptor every half hour. However, the binding and activation of
B-Cells happens in a time scale of a few seconds to a minute (for references
to many of the details of the numerical values used in this paper, refer to the references in 1). If endocytosis is to compete with activation, as it must for
the process described above to work, then bound receptors must be endocy-
tosed much more frequently than once every half hour. Since there is no data
available on the exact rate of endocytosis for bound receptors, we made the
assumption in our simulation that the probability of endocytosis of a single
B-Cell receptor bound to antigen is of the same order of magnitude as the
probability of binding of antigen to the receptor. There is a strong probability
that multiple receptors are linked by bound antigen before they are endocy-
tosed. We make the reasonable assumption that the probability of endocytosis
of the receptor-antigen cluster is inversely proportional to the square of the
mass of the cluster.
Let us now discuss, in a very simplified way, the structure of the B-Cell
receptor/antibody. The B-Cell receptor is a Y shaped molecule consisting of
three equal sized segments, connected by disulfide bonds. The antigen binding
sites are at the tip of the arms of the Y. These binding sites are made up
of two strands (heavy and light) each composed of two regions, one which is
constant and another which is highly variable, called the constant and variable
regions respectively. The process that forms the antibody first creates a single
combination of the heavy and light chains (H,L) sections and then combines
two such (H,L) sections by disulfide bonds to create the Y shaped antibody.
In diploid species, such as humans, whose DNA strands come from different
individuals, there are four ways to make the (H,L) combinations using genes
from either of the parent DNA strands. Thus if the parent types make HI,
LI, and H2, L2 respectively, in principle, it would be possible to make four
combinations: (H1,L1), (H2,L2), (H1,L2) and (H2,L1). The classical dogma
in immunology is allelic exclusion, which asserts that, in a given B-Cell, when
two strands of (H,L) fuse to form a receptor, only the same (H,L) combination
is always selected. This will ensure that for a given B-Cell, all the receptors
are identical. However, sometimes this process does not work and B-Cells are
found with both types of light chains in receptors on the same cell 9 .
It turns out that there are two distinct types of light chains, called κ and λ. Normally in humans the ratio of B-Cells with κ or λ chains is 2:1, with each cell presenting either a κκ or a λλ light chain combination. However, as mentioned above, sometimes allelic exclusion does not work perfectly, and B-Cells present κλ receptors, or the same cell presents receptors of mixed type, a combination of some which are κκ, some which are λλ and some which are κλ.
A given antigen will bind either to the λ or to the κ chain, or to neither, but not to both. Thus a κλ B-Cell receptor is effectively monovalent. Furthermore, a B-Cell with mixed κκ and λλ receptors would effectively have fewer receptors available for a given antigen.
It is possible to experimentally enhance the probability of such genetic
errors and study the immunogenicity of the resulting B-Cells. This has been
done in mice. The surprising result of such experiments is that chimeric B-Cells are non-immunogenic 9. We shall attempt to explain how this may
come about as a result of our assumption about endocytosis.

3 The Dintzis Experimental Results and the Immunon Theory

Dintzis et al. 3,4,5 did an in-vivo (mouse) experiment using five different fluoresceinated polymers as antigen (Ag). The results of the experiment were startling. It was found that, to be immunogenic, the Ag mass had to be in a range of 10^5-10^6 Daltons (1 Dalton = 1 Atomic Mass Unit) and the Ag had to have a valence (number of effective binding sites) greater than 10-20. Antigen with mass or valence outside this range elicited no immune response at any concentration. Within this range of mass and valence, the response was limited to a finite range of antigen concentration.
A model based on the concept of an Immunon was proposed to explain the results 6,7. The hypothesis was that the B-Cell response is quantized, i.e. to trigger an immune response it is necessary that a minimum number of receptors be connected in a cluster, cross-linked by binding to antigen. This linked cluster of receptors was called an Immunon, and the model came to be called the 'Immunon Model'. However, a problem immediately presents itself: why are low valence antigens non-immunogenic? Why can one not form large clusters of receptors using small valence antigen? The Immunon model had no answer to this question.
Subsequently, Perelson et al. 8 developed mathematical models (rate equations) to study the antigen-receptor binding process. Assuming that the B-Cell response is quantized, they were able to show that at low concentration, because of antigen depletion (too many receptors, too little antigen), an Immunon would not form. However, the rate equations made the flaws in the Immunon model apparent: they were not able to explain why large valence antigens were necessary for an immune response, nor why even such antigens were tolerogenic (non-immunogenic) at high concentration.

4 Modeling the B-Cell Receptor Binding to Antigen: Our Computer Experiment

The activation of a B-cell is the result of local surface processes leading to a cascade of events that result in the release of antibody and/or presentation of antigen. The local surface processes are binding, endocytosis and receptor diffusion. Each of these is governed by its own time and length scales, some of which are experimentally known.
To model B-cell surface dynamics properly, the size of the modeled surface must be significantly larger than the largest dynamic length scale we wish to model, and the time steps used must be smaller than the smallest dynamic time scale. Further, the size of the smallest length scale on the modeled surface must be smaller than the smallest length scale in the dynamics. The size of a B-cell receptor is 3 nm, and this is the smallest surface feature we will model. The size of a typical antigen in our simulation is 5-40 nm. The diffusion rate of receptors is of the order of D = 10^-10 cm²/s, and the time scale for activation of a cell is of the order of a few seconds to a few tens of seconds (τ ~ 100 s). Hence the linear size of the surface necessary in our modeling is L > √(Dτ) ~ 1 μm. This is the maximum distance that a receptor will diffuse in a time of about 100 s.
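As a consistency check on these scales (a worked step we add here, simply evaluating the estimate above with the quoted values of D and τ):

\[ L \gtrsim \sqrt{D\,\tau} = \sqrt{10^{-10}\,\mathrm{cm^2/s} \times 100\,\mathrm{s}} = \sqrt{10^{-8}\,\mathrm{cm^2}} = 10^{-4}\,\mathrm{cm} = 1\,\mu\mathrm{m}. \]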
We choose a single lattice spacing to represent a receptor. The linear size of our surface was chosen to be 1000 lattice units, which represents a physical length of approximately 3-4 μm. The affinity of receptor-hapten binding is 10^5 M^-1 for a monovalent receptor. The affinity of a bivalent receptor depends on the valence of the antigen and on the distribution of haptens on the antigen. The weight of a single hapten is a few hundred Daltons. Hence, the ratio of the on-rate to the off-rate of a single receptor-hapten pair is ~ 100-1000. We choose an on-rate of 0.2 and an off-rate of 0.001 in dimensionless units. Our unit of time was set to 0.05 milliseconds. This was done by choosing D in dimensionless units to be 0.1, which means that the effective diffusion rate is 0.1 × (3 nm)²/(0.05 ms) ≈ 2.0 × 10^-10 cm²/s. The affinity of chimeric B-Cell receptors was set lower, because they bind to DNA with a lower affinity. For them, we used an on-rate of 0.1 and an off-rate of 0.01.
The cell surface was chosen to have periodic boundary conditions, as this simplifies the geometry of the modeling considerably. The size of our cell surface is equivalent to 20% of a real non-activated B-cell. A B-cell typically has 50,000 receptors on its surface; hence, we modeled 10,000 receptors, initially placed on random sites of the lattice. In each time step, every receptor was updated by moving it to a neighboring site (if that site was empty) with probability D = 0.1. Receptors that are bound to antigen were not allowed to move. At every time step, receptors which have free binding sites can bind to other haptens on the antigen, or to any other antigen already present on the surface. They can also unbind from haptens to which they are bound. Once an antigen unbinds from all receptors, it is released within 5 time steps on average. Once every 20 time steps, the receptors were presented with new antigen at a constant rate, which was a measure of the total antigen concentration. We varied this concentration rate in our modeling.
The normal rate of endocytosis of unbound receptors is once every half hour. If this were also the rate of endocytosis for bound receptors, it would be too small to play a role in antigen presentation. Thus we must assume that a bound receptor has a higher probability of being endocytosed than an unbound receptor. A receptor can bind to two haptens, and every antigen can bind to multiple receptors. This cross-linking leads to the creation of large complexes. We assume that the probability to endocytose a receptor-antigen complex is inversely proportional to the square of its mass. The mass of the B-cell receptor is much higher than the mass of the antigens, so, when computing the mass of the complex, we can ignore the mass of the antigen. We thus set the endocytosis rate as a function only of the number of bound receptors. The rate of endocytosis for the entire complex was chosen to be inversely proportional to the square of the number of receptors in the complex. More specifically, we set the probability to endocytose an aggregate of receptors to be 0.0005 divided by the square of the number of receptors in the aggregate. For chimeric B-cells we reduced the numerator in this probability by a factor of 100.
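To make the stated rules concrete, the update loop can be summarized in the following minimal sketch (our own illustrative reconstruction, not the authors' code: cross-linking and aggregate bookkeeping are collapsed to single receptors, antigen presentation is folded into a constant per-step on-rate, and site exclusion during diffusion is omitted):

```python
import numpy as np

L, N_RECEPTORS, STEPS = 1000, 10000, 200000   # 1 step = 0.05 ms, so 10 s in all
P_DIFF, P_ON, P_OFF = 0.1, 0.2, 0.001          # rates quoted in the text
K_ENDO = 0.0005                                # endocytosis prefactor (divided by n^2)
rng = np.random.default_rng(0)

pos = rng.integers(0, L, size=(N_RECEPTORS, 2))   # positions on the periodic lattice
bound = np.zeros(N_RECEPTORS, dtype=bool)         # receptor currently holds a hapten
alive = np.ones(N_RECEPTORS, dtype=bool)          # not yet endocytosed

for step in range(STEPS):
    # Free receptors diffuse to a neighbouring site; bound receptors stay put.
    move = alive & ~bound & (rng.random(N_RECEPTORS) < P_DIFF)
    pos[move] = (pos[move] + rng.integers(-1, 2, size=(move.sum(), 2))) % L

    # Receptor-hapten binding and unbinding.
    bind = alive & ~bound & (rng.random(N_RECEPTORS) < P_ON)
    unbind = alive & bound & (rng.random(N_RECEPTORS) < P_OFF)
    bound[bind], bound[unbind] = True, False

    # Endocytosis of bound receptors with probability K_ENDO / n^2; here n = 1,
    # since this sketch does not track cross-linked aggregates.
    endo = alive & bound & (rng.random(N_RECEPTORS) < K_ENDO)
    alive[endo] = False

print('bound:', int((alive & bound).sum()), 'endocytosed:', int((~alive).sum()))
```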

5 Results

The results of our computer study are shown in figure (1), where the solid
line shows the number of bound receptors after 10 seconds of simulation as a
function of antigen valence. The data are average values over several simula-
tions with different initial positions for receptors and random number seeds.
The dashed line shows the number of endocytosed receptors. One observes a
clear threshold below which the number of bound surface receptors stays close
to zero followed by a region where the number of bound receptors increases
and flattens out. This establishes that we can explain the threshold in antigen
valence in the Dintzis experiment.
The reason for the threshold is easy to understand qualitatively. Once
an antigen binds to a receptor, the probability that its other haptens bind to
the other arm of the same receptor or to one of the other receptors present in
the vicinity is an exponentially increasing function of the number of haptens.
Also, once an antigen is multiply bound in a complex, the probability of all the haptens unbinding is an exponentially decreasing function of the number of bound haptens. Given that receptors, once bound, may be endocytosed, low valence antigen bound once will most likely be endocytosed, or will unbind before it can bind more than once (i.e. before it has a chance to form an aggregate and lower its probability of endocytosis). As the valence increases, the unbinding probability decreases and the multiple binding probability increases until
it overcomes the endocytosis rate. Finally, for high valence, one will reach a
steady state between the number of receptors being bound and the number
endocytosed in a given unit of time.
In figures (2) and (3), we show the number of bound receptors (solid line)
and endocytosed receptors (dashed line) as a function of the antigen concen-
tration for two different values of valence. Figure (2) has data for high valence
(20) and figure (3) for low valence (5). It is clear that for high valence, there
is a threshold in concentration below which there are no bound receptors (no
immune response) followed by a range of concentration where the number of
bound receptors increases followed by a region where it decreases again. The
threshold at low concentration is easy to understand. It is caused by antigen
depletion (all antigen that binds is quickly endocytosed). The depletion at high
concentration comes about because of too much endocytosis, which depletes
the pool of available receptors. For low valence (figure (3)), there is no range
of concentrations where any surface receptors are present. The reason is that
the valence is too low to form aggregates and lower the rate of endocytosis and
is also too low to prevent unbinding events from happening fast enough. Thus
all bound receptors get quickly endocytosed. The high rate of endocytosis for
high concentration probably leads to tolerance, as the cell will not survive such
a high number of holes on its surface.
Figures (1), (2) and (3) are the major results of our modeling. They clearly
show that the single, simple assumption of an increased rate of endocytosis for
bound receptors and reasonable assumptions about the way this rate depends
upon the mass of the aggregated receptors is able to explain both the low
concentration threshold for immune response as well as the high concentration
threshold for tolerogenic behavior in the Dintzis experiment. It can also explain
the dependence of activation on valence, with a valence dependent threshold
(or alternately, a mass dependent threshold) for activation.
Now consider the case of chimeric B-Cells. It turns out that these cells
bind to low affinity DNA but are not activated 9 . DNA has a high valence and
is in high concentration when chimeric B-Cells are exposed to it. To model
the interaction of these B-Cells, we therefore used a valence of 20 and lowered
the binding rate. We considered two cases:

Case 1: The B-cell has κκ and λλ receptors in equal proportion. This effectively halves the number of receptors, since antigen will bind either to the κκ receptor or to the λλ receptor, but not to both. Figure (4) shows the results of our modeling for this case. Note that the total number of bound receptors is very low; this is due to the low affinity. However, the endocytosis rate is high, since receptors, once bound, will be endocytosed before they can bind again and lower their probability of endocytosis. Thus in this case we would expect tolerogenic behavior because of the low number of bound receptors.
Case 2: The κ and λ are on the same receptor. This means that the receptor is effectively monovalent, since antigen that binds to one of the light chains will not, in general, bind to the other. In a normal bivalent receptor, the existence of two binding sites creates an entropy effect whereby it becomes likely that, if one of the sites binds, the other binds as well. The single binding site on the κλ receptors means that antigen binds and unbinds, much as in the case of the T-Cell. Thus, although the number of bound receptors at any given time reaches a steady state, the endocytosis rate is low, since receptors do not stay bound long enough to be endocytosed. Figure (5) shows the results of the modeling, which are in agreement with this qualitative picture. The non-immunogenicity of κλ cells would come about because of the low rate of endocytosis and the consequent lack of T-Cell help.
Our modeling thus shows that for chimeric receptors, non-immunogenicity
would arise from subtle dynamical effects which alter the rates of binding and
endocytosis so that either activation or T-Cell help would be compromised.
These predictions could be tested experimentally.


Figure 1: Dependence of the number of bound receptors and number of endocytosed receptors
on Antigen Valence for medium levels of concentration after 10 seconds.


Figure 2: Dependence of the number of bound receptors and number of endocytosed receptors
on antigen concentration for high valence antigens after 10 seconds of simulation.


Figure 3: Dependence of the number of bound receptors and number of endocytosed receptors
on antigen concentration for low valence antigens after 10 seconds of simulation.

[Plot: number of bound and endocytosed receptors vs time (0-10000 ms) for low affinity cells with bivalent receptors, 5000 κκ and 5000 λλ]

Figure 4: The number of bound and endocytosed receptors for a cell with 50% κκ and 50% λλ receptors. These cells would be non-immunogenic because of the low levels of activation resulting from the low binding.

[Plot: number of bound and endocytosed monovalent receptors vs time (0-10000 ms)]

Figure 5: The number of bound and endocytosed receptors for a cell with only κλ receptors. These cells would be non-immunogenic because of the low levels of endocytosis and the consequent lack of T-Cell help.

References
1. Y. Louzoun, M. Weigert and G. Bhanot, "A New Paradigm for B
Cell Activation and Tolerance", Princeton University, Molecular Biology
Preprint, June 2001.
2. C. A. Janeway, P. Travers, M. Walport and J. D. Capra, "Immunobiology
- The Immune System in Health and Disease", Elsevier Science London
and Garland Publishing New York, 1999.
3. R. Z. Dintzis, M. Okajima, M. H. Middleton, G. Greene, H. M. Dintzis,
"The Immunogenicity of Soluble Haptenated Polymers is determined by
Molecular Mass and Hapten Valence", J. Immunol. 143:4, Aug. 15, 1989.
4. J. W. Reim, D. E. Symer, D. C. Watson, R. Z. Dintzis, H. M. Dintzis,
"Low Molecular Weight Antigen Arrays Delete High Affinity Memory B
cells Without Affecting Specific T-cell Help", Mol. Immunol., 33:17-18,
Dec. 1996.
5. R. Z. Dintzis, M. H. Middleton and H. M. Dintzis, "Studies on the
Immunogenicity and Tolerogenicity of T-independent Antigens", J. Im-
munol., 131, 1983.
6. B. Vogelstein, R. Z. Dintzis, H. M. Dintzis, "Specific Cellular Stimulation
in the Primary Immune Response: a Quantized Model", PNAS 79:2, Jan. 1982.
7. H. M. Dintzis, R. Z. Dintzis and B. Vogelstein, "Molecular Determinants
of Immunogenicity, the Immunon Model of Immune Response", PNAS
73, 1976.
8. B. Sulzer, A. S. Perelson, "Equilibrium Binding of Multivalent Ligands
to Cells: Effects of Cell and Receptor Density", Math. Biosci. 135:2,
July, 1996; ibid. "Immunons Revisited: Binding of Multivalent Antigens
to B Cells", Mol. Immunol. 34:1, Jan. 1997.
9. Y. Li, H. Li and M. Weigert, "Autoreactive B Cells in the Marginal Zone
that Express Dual Receptors", Princeton University Molecular Biology
Preprint, June 2001.

STOCHASTIC MODELS OF IMMUNE SYSTEM AGING

L. MARIANI, G. TURCHETTI
Department of Physics, Via Irnerio 46, 40126 Bologna, Italy
Centro Interdipartimentale L. Galvani, Universita di Bologna, Bologna, Italy
E-mail: turchetti@bo.infn.it frida@economia.unibo.it

F. LUCIANI
Max Planck Institute for Complex Systems, Noetnitzer 38, Dresden, Germany
E-mail: luciani@mpipks-dresden.mpg.de

The Immune System (IS) is devoted to the recognition and neutralization of antigens, and is subject to a continuous remodeling with age (immunosenescence). The model we propose refers to a specific component of the IS, the cytotoxic T lymphocytes, and takes into account the conversion from virgin (ANE) to memory and effector (AE) phenotypes, the injection of virgin cells by the thymus and the shrinkage of the overall compartment. The average antigenic load as well as the average genetic properties fix the parameters of the model. The stochastic variations of the antigenic load induce random fluctuations in both compartments, in agreement with the experimental data. The results on the concentrations are compatible with a previous simplified model and the survival curves are in good agreement with independent demographic data. The rate of mortality, unlike the Gompertz law, is zero initially and asymptotically, with an intermediate maximum, and makes it possible to explain the occurrence of very long living persons (centenarians).

1 Biological Complexity
The Immune System (IS) preserves the integrity of the organism, continuously challenged by internal and external agents (antigens). The large variety of antigens, ranging from mutated cells and parasites to viruses, bacteria and fungi, requires a rapid and efficient antagonistic response of the organism. At the top of the phylogenetic tree, evolution has developed a specific (clonotypic) immunity which cooperates with the ancestral innate immunity to control the antigenic insults [1]. The innate system has an arsenal of dendritic cells and macrophages with a limited number of receptors capable of recognizing and neutralizing classes of antigens. With the appearance of vertebrates the increase of complexity stimulated the development of a system based on two new types of cells, B and T lymphocytes, with three distinct tasks: to recognize the antigens, to destroy them and to keep track of their structure through a learning process. This kind of immunological memory is the key for a more efficient response to any subsequent antigenic insult caused by an antigen that the organism has already experienced (this is the basis of vaccination). The specific response is based on a variety of memory cells which are activated by specific
molecules of the antigen, presented by the APC (antigen presenting cells) to their receptors. There are two main T cell compartments: the virgin cells,
which are produced (with the B lymphocytes) in the bone marrow but mature all their surface receptors in the thymus, and the memory cells, which are activated by the antigenic experience and preserve the information. The memory cells specific to a given antigen form a clone which expands with time, subject to the constraint that the total number of cells remains almost constant, with a small decrease with age (shrinkage of the IS). The virgin cells instead, after reaching a maximum in the early stage of life, decrease continuously since the IS is not able to compensate their continuous depletion due to various biological mechanisms (conversion into memory cells, progressive inhibition of thymic production, peripheral clonal competition) [2,3].
Systems with self-organizing hardware, cognitive and memory properties and self-replicating capabilities are by definition complex. The immune and nervous systems exhibit these features at the highest degree of organization and can be taken as prototypes of complex systems. Indeed the specific (clonotypic) immune system has a hierarchically organized hardware, capable of receiving, processing and storing signals (from its own cytokine and immunoglobulin network and from the environment), and of creating a memory, self-replicating via the DNA encoding, which allows a long term evolutionary memory. For this reason mathematical modeling has been particularly intensive for the IS. Since this system exhibits a large number of space-time scales, modeling is focused either on specific microscopic phenomena with short time scales or on large scale aspects with long time scales, ranging from a few weeks (acute antigenic response) to the entire lifespan.

2 T lymphocytes

We will focus our attention on the dynamics of the T cell populations on a long time scale, disregarding the detailed microscopic behavior which is certainly very relevant on short time scales. The virgin T lymphocytes developed by the thymus have a large number of receptors (TCR), built by recombining genic sequences. This large set of variants (up to 10^16 in humans), known as the T cell repertoire, allows the recognition by steric contact of the antigen fragments presented by the APC (Antigen Presenting Cells), which degrade the proteins coming from the englobed antigens via proteolytic activity and show the peptides resulting from the cleavage on the surface molecules MHC [4,5]. Other stimuli, such as the cytokines [1], determine the differentiation into effector and memory cells and their proliferation (clone expansion). The memory cells, unlike the effector ones, are long lived and show a sophisticated

[Diagram: T helper and T cytotoxic pools, each split into ANE (virgin) and AE (memory + effector) subsets.]

Figure 1: Markers of Virgin and Memory plus Effector T lymphocytes. Schematic organiza-
tion of main T cell pool: CD4+ (Helper) and CD8+ (cytotoxic) lymphocytes.

cognitive property allowing a more efficient reaction against a new insult by an experienced antigen. The T lymphocytes are split into two groups: the cytotoxic and helper T cells. The former attack and destroy cells infected by intra-cellular antigens, such as viruses and some kinds of bacteria; the latter contribute to the extracellular antigenic response, not described here. They are labeled CD8+ (cytotoxic) and CD4+ (helper) according to the surface markers used to identify them. Each group is further split into virgin, effector and memory cells, whose role has been outlined, and which are identified by some other surface markers, see figure 1. We are interested in the dynamics of two populations, the Antigen Not Experienced (ANE) virgin T cells and the Antigen Experienced (AE) effector and memory T cells, which are identified by the CD95− and CD95+ surface markers respectively.

3 Modeling immunosenescence
In this note we propose a mathematical model to describe the time variation of the ANE and AE T cell compartments due to the antigenic load and to the remodeling of the system itself. The antigenic load has sparse peaks of high intensity (acute insults) and a permanent low intensity profile with rapid random variations (chronic antigenic stress). In a previous work [6] a simple model for the time evolution of the AE and ANE T cell concentrations was proposed on the basis of Franceschi's theory of immunosenescence, which sees the entire IS undergoing a very deep reshaping during the life span. The exchanges between the compartments were considered due to antigen stimulated conversion and to reconversion due to secondary stimulation. The average antigenic load contributed to define these conversion rates jointly with a genetic average. The deterministic part of the model described the decrease of the ANE CD8+ T

cells concentration in agreement with experimental data, while the stochastic forcing, describing the chronic stress [7,8], allowed us to obtain individual histories. The spread about the mean trajectory was also compatible with the data on T cell concentrations and allowed us to obtain survival curves in good agreement with independent demographic data, starting from the hypothesis that the depletion of the ANE T cell compartment is a mortality marker. The present model is intended to introduce some improvements by taking into account the remodeling of the immune system with age and is formulated for the ANE and AE populations rather than for the concentrations. Moreover the antigenic load is introduced on both the ANE and AE variation rates with an adjustable mixing angle. The complete model, which will be described in detail in the next section, has several parameters, but the relevant point is that if we neglect the remodeling, and compute the concentration, the results are very similar to the original ones; moreover the data on the AE T cells are fairly well reproduced [3]. The introduction of the remodeling effects shows that a further improvement occurs, especially for the early stage where the simplified model was not adequate. The last part is dedicated to the survival curves obtained from the model. A very important difference is found with respect to the classical Gompertz [9] survival law: the rate of mortality vanishes initially and asymptotically, whereas it increases exponentially in the Gompertz law. This result, which explains the presence of very long lived individuals (centenarians), supports the biological hypothesis (depletion of the ANE T cell compartment) of Franceschi's theory.

4 Mathematical model and results

The mathematical model is defined by

    dV/dt = −αV − βM + μ e^(−λt) + ε cos²(θ) ξ(t)
    dM/dt = (αV + βM)/(1 + γM₊) + ε sin²(θ) ξ(t)        (1)

where V denotes the number of ANE (virgin) CD8+ cells and M the number of AE (effector + memory) CD8+ cells, with M₊ = M if M > 0 and M₊ = 0 if M < 0. The parameter α gives the conversion rate of virgin cells due to the primary antigenic insult, whereas β is the reactivation rate of memory cells due to the secondary antigenic stimulus, which has an inhibitory effect on the virgin cells. In the primary production of AE cells we have taken the conversion and reconversion terms proportional to (1 + γM₊)^(−1), in order to take into

[Plots: virgin and memory populations vs. time (0–120 years), model curves with data points.]

Figure 2: Comparison of the model with experimental data for the virgin (ANE) and memory plus effector (AE) CD8+ T cells for the following parameters: α = 0.025, β = 0.01, ε = 15, θ = 35°, γ = 0.004, λ = 0.05, μ = 15 and V(0) = 50. The curves are ⟨V(t)⟩ + kσ_V(t) and ⟨M(t)⟩ + kσ_M(t) with k = −2, 0, 2.

account the shrinkage of the T cell compartment. The term μ e^(−λt) describes the production by the thymus, which is assumed to decay exponentially. Finally εξ(t), where

    ⟨ξ(t)⟩ = 0,    ⟨ξ(t) ξ(t′)⟩ = δ(t − t′)        (2)

is the contribution of the stochastic fluctuations to the conversion rates. The mixing angle θ gives the weight of this term on the ANE and AE compartments. The results are compared with experimental data in figure 2.
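A direct way to explore (1) is to integrate it numerically. The sketch below is ours, not the authors' code: it applies a simple Euler-Maruyama scheme with the parameter values quoted in the caption of figure 2, and its right-hand sides follow the reconstruction of (1) given above.

    import numpy as np

    # Euler-Maruyama integration of model (1); parameter values are those
    # quoted in the caption of figure 2. Time is measured in years.
    alpha, beta, eps, theta = 0.025, 0.01, 15.0, np.radians(35)
    gamma, lam, mu = 0.004, 0.05, 15.0

    def simulate(V0=50.0, M0=0.0, T=120.0, dt=0.01, seed=0):
        rng = np.random.default_rng(seed)
        n = int(T / dt)
        V, M = np.empty(n + 1), np.empty(n + 1)
        V[0], M[0] = V0, M0
        for i in range(n):
            t = i * dt
            Mp = max(M[i], 0.0)                      # M_+ = M if M > 0, else 0
            dW = rng.normal(scale=np.sqrt(dt))       # Wiener increment
            V[i + 1] = (V[i] + dt * (-alpha * V[i] - beta * M[i]
                                     + mu * np.exp(-lam * t))
                        + eps * np.cos(theta) ** 2 * dW)
            M[i + 1] = (M[i] + dt * (alpha * V[i] + beta * M[i]) / (1 + gamma * Mp)
                        + eps * np.sin(theta) ** 2 * dW)
        return V, M

    # single runs give individual immunological histories; averaging many of
    # them recovers the mean curves and the spread shown in figure 2
    histories = [simulate(seed=s) for s in range(100)]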

4.1 Simplified model

This model was previously studied in its deterministic version and is obtained from (1) by setting γ = μ = ε = 0. Since M + V is constant, the same equations are satisfied by the concentrations v = V/(V + M) and m = M/(V + M), and the stochastic equation for v was [10,6]

    dv/dt = −(α − β) v − β + ε̃ ξ(t)        (3)

where ε̃ ≃ ε/(V(0) + M(0)), and was solved with the initial condition v(0) = 1. The deterministic model obtained from (1) by setting ε = 0 can be solved analytically and we consider two simple cases: β = γ = 0, describing the effect of the thymus, and β = μ = 0, describing the effect of shrinkage.

[Plots: virgin and memory populations vs. time (0–120 years) for mixing angles θ = 0 (left) and θ = 45° (right).]

Figure 3: Comparison with CD95 data [2] of virgin (ANE) and memory plus effector (AE) populations for the model without shrinkage and thymus for two different mixing angles. The parameters are α = 0.02, β = 0.005, ε = 10, V(0) = 400 and θ = 0 (left figures) and θ = 45° (right figures). The curves are ⟨V(t)⟩ + kσ_V(t) and ⟨M(t)⟩ + kσ_M(t) with k = −2, 0, 2.

4.2 Analytic solutions without noise

The deterministic solution of the model without thymus and shrinkage, γ = μ = 0, for initial conditions V(0) = V₀ and M(0) = 0, reads

    ⟨V(t)⟩ = V₀ (β − α e^((β−α)t))/(β − α),    ⟨M(t)⟩ = V₀ α (e^((β−α)t) − 1)/(β − α)        (4)

and the T cell population is conserved, M(t) + V(t) = V(0). The deterministic solution with no shrinkage, γ = 0, can be obtained analytically [11]. Since 0 < β < α, choosing for simplicity β = 0 one has

    ⟨V(t)⟩ = V(0) e^(−αt) + μ (e^(−αt) − e^(−λt))/(λ − α),    ⟨M(t)⟩ = V(0) − ⟨V(t)⟩ + (μ/λ)(1 − e^(−λt))        (5)

The solution with shrinkage and no thymic term, μ = 0, choosing for simplicity β = 0, reads

    ⟨V(t)⟩ = V(0) e^(−αt),    ⟨M(t)⟩ = γ^(−1) ( [1 + 2γV(0)(1 − e^(−αt))]^(1/2) − 1 )        (6)

The graph of ⟨V(t)⟩ in (5) exhibits a peak at t = (log λ − log α)/(λ − α) if V(0) = 0. The peak disappears when the thymus contribution vanishes. Conversely ⟨M(t)⟩ is monotonically increasing, but the thymus enhances its value. The shrinkage reduces considerably the increase of ⟨M(t)⟩ whereas it does not affect ⟨V(t)⟩. The stochastic term generates a family of immunological histories. Their spread is measured by the variance. In figure 3 we show the effect of the mixing angle for the same set of parameters chosen for the simplified model.
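As a quick numerical check (ours, not part of the original analysis) of the thymus-only solution (5): with the λ and α of figure 2 the peak of ⟨V(t)⟩ for V(0) = 0 falls at t = (log λ − log α)/(λ − α), roughly 27.7 years.

    import numpy as np

    # Verify that (5) with V(0) = 0 peaks at t* = (log(lam) - log(alpha))/(lam - alpha)
    alpha, lam, mu = 0.025, 0.05, 15.0
    t = np.linspace(0.0, 120.0, 200001)
    V = mu * (np.exp(-alpha * t) - np.exp(-lam * t)) / (lam - alpha)   # eq. (5)
    t_star = (np.log(lam) - np.log(alpha)) / (lam - alpha)             # ~27.7 years
    assert abs(t[np.argmax(V)] - t_star) < 1e-2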

[Plots: virgin and memory populations vs. time (0–120 years); left pair, thymus contribution; right pair, shrinkage contribution.]

Figure 4: Comparison with CD95 data [2] of virgin (ANE) and memory plus effector (AE) populations for the model with α = 0.025, β = 0.01, ε = 15, θ = 35°. On the left side we consider the contribution of the thymus with λ = 0.05, μ = 15, γ = 0 and V(0) = 50. On the right side the contribution of shrinkage is shown for γ = 0.004, μ = 0 and V(0) = 400. The curves are ⟨V(t)⟩ + kσ_V(t) and ⟨M(t)⟩ + kσ_M(t) with k = −2, 0, 2.

When θ grows, the rms spread σ_V of V decreases, whereas the rms spread σ_M of M increases. The separate effects of thymus and shrinkage are shown in figure 4, for the same parameters as figure 2.

5 Survival curves
The simplified model with noise, given by equation (3), corresponds to the Ornstein-Uhlenbeck process. The probability density satisfies the Fokker-Planck equation and has an explicit solution

    p(v,t) = (2πσ²(t))^(−1/2) exp( −(v − ⟨v⟩(t))²/(2σ²(t)) ),    ⟨v⟩(t) = v∞ + (1 − v∞) e^(−t/τ)        (7)

where τ = (α − β)^(−1), v∞ = −βτ and σ²(t) = ½ ε̃² τ (1 − e^(−2t/τ)). In figure 5 we compare the results of the model with demographic data. Assuming that the depletion v = 0 of the virgin T cell compartment marks the end of life, it is possible to compute from (7) the survival probability up to age t

    S(t) = (2π)^(−1/2) ∫_{x(t)}^{+∞} e^(−u²/2) du,    x(t) = (v∗ − ⟨v⟩(t))/σ(t)        (8)
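Since the integrand in (8) is a standard Gaussian tail, S(t) is directly expressible through the complementary error function. A minimal sketch (ours), using the parameter values from the caption of figure 5:

    import numpy as np
    from scipy.special import erfc

    # Survival probability from (7)-(8): S(t) = P(v(t) > v*) for the
    # Ornstein-Uhlenbeck density; parameters as in the caption of figure 5.
    v_inf, tau, eps_t, v_star = -0.5, 67.0, 0.016, 0.0

    def S(t):
        mean = v_inf + (1.0 - v_inf) * np.exp(-t / tau)              # <v>(t)
        sigma = eps_t * np.sqrt(0.5 * tau * (1.0 - np.exp(-2.0 * t / tau)))
        x = (v_star - mean) / sigma                                  # eq. (8)
        return 0.5 * erfc(x / np.sqrt(2.0))                          # Gaussian tail

    print(S(np.array([20.0, 50.0, 80.0, 110.0])))                    # survival by age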
Neglecting the thymus and shrinkage effects, the concentrations obtained from equation (1) are close to the values of the simplified model if θ = 0. Indeed when γ = μ = 0 we have M(t) + V(t) = V(0) + M(0) + εw(t), where w(t) denotes the Wiener process. Setting v = V/(V + M), V = V₀ + εV₁ and M = M₀ + εM₁, where V₁, M₁ are the fluctuating parts with zero mean, we have

    ⟨v⟩ = v̄ + ε̃² ( v₀⟨w²⟩ − ⟨V₁ w⟩ ),    ⟨(v − ⟨v⟩)²⟩ = ε̃² ⟨(V₁ − v₀ w)²⟩        (9)

[Plots: virgin cell population (left), survival probability (center) and mortality rate (right), vs. time.]

Figure 5: Virgin cell population ⟨V(t)⟩ ± 2σ_V(t) with v∞ = −0.5, τ = 67 and ε̃ = 0.016 (left). Corresponding survival curve (center) and mortality rate (right). These values correspond to α = 0.022, β = 0.0075, ε = 6.5, γ = μ = 0 and V(0) = 400.

up to order ε̃², where ε̃ = ε/(V(0) + M(0)) and v₀ = V₀/(V₀ + M₀). When β = 0 the asymptotic variance is the same as for the simplified model. The survival probability S(t) obtained from the simplified model has been compared with demographic data. We notice that S(t) is a three-parameter function modeling the death process as a threshold for a biomarker which evolves following a deterministic law in a randomly varying environment [12,13]. In our case the biomarker is the virgin CD8+ T cell concentration v and the threshold is v∗ = 0. The lower integration end point can be written as

    x(t) = C (1 − e^((t∗−t)/τ)) / (1 − e^(−2t/τ))^(1/2),    C = √2 (v∗ − v∞)/(ε̃ √τ)        (10)

where t∗ is the death age, ⟨v(t∗)⟩ = v∗, namely e^(−t∗/τ) = (v∗ − v∞)/(1 − v∞). In figure 5 we fit the demographic data of human males using (8), and the values of the parameters are close to the ones obtained from the fit of the CD8+ T cell concentrations.

6 Comparison with Gompertz law

Making the simplifying assumption t∗ = τ, we obtain a survival probability depending on the two parameters C and T ≡ τ,

    S(t) = (2π)^(−1/2) ∫_{x(t)}^{+∞} e^(−u²/2) du,    x(t) = C (1 − e^(1−t/T)) / (1 − e^(−2t/T))^(1/2)        (11)

just as the Gompertz law, which is defined by

    dS_G/dt = −R S_G,    R = R₀ e^(t/T₀)    ⟹    S_G(t) = exp( −C₀ (e^(t/T₀) − 1) ),    C₀ = R₀T₀        (12)

Our mortality rate is not monotonically increasing, as for the Gompertz law, but decreases after reaching a maximum, see figure 5. It is better suited to describe the survival of human populations, in agreement with demographic data. This property, due to the randomly varying antigenic load on the organism, explains the occurrence of very long lived persons (centenarians). We notice that x(t) ∝ −t^(−1/2) as t → 0 and x(∞) = C, so that

    lim_{t→0} S(t) = 1,    lim_{t→∞} S(t) ≃ e^(−C²/2) / (C √(2π))        (13)

We notice that S(+∞) > 0 means a nonzero probability of indefinite survival.


However S(+∞) ≈ 10^(−3) for C = 3 and it is below 10^(−6) for C = 5, so our law requires fixing a reasonable lower bound on C. We further notice that

    S(t)|_{t=T} = 1/2,    dS/dt|_{t=T} = −C / ( T √(2π(1 − e^(−2))) )        (14)

The meaning of the parameters is obvious: T is the age at which the survival
probability is exactly 50% and the slope of the curve there is proportional to
C. We can say that C measures the flatness of the graph of S(t). For the
mortality rate R = —S/S we have the following asymptotic behavior

7 Conclusions
We have considered the long time behavior of the CD8+ virgin T cell and CD8+ antigen experienced T cell compartments and the remodeling of the IS. The stochastic variations of the antigenic load determine a spread in the time evolution of the cell numbers, in agreement with experiments. The results are compatible with a previous simplified model for the virgin T cell concentrations and provide survival curves compatible with demographic data. The effects of thymus and remodeling improve the description of the early stage for the virgin T cells and of the late stage for the antigen experienced T cells.

8 Acknowledgments
We would like to thank Prof. Franceschi for useful discussions on the immune
system and ageing.

9 References
1. A. Lanzavecchia, F. Sallusto, Dynamics of T Lymphocyte Responses: Intermediates, Effectors, and Memory, Science 290, 92 (2000)
2. F. Fagnoni, R. Vescovini, G. Passeri, G. Bologna, M. Pedrazzoni, G. Lavagetto, A. Casti, C. Franceschi, M. Passeri and P. Sansoni, Shortage of circulating naive CD8 T cells provides new insights on immunodeficiency in aging, Blood 95, 2860 (2000)
3. F. Luciani, S. Valensin, R. Vescovini, P. Sansoni, F. Fagnoni, C. Franceschi, M. Bonafe, G. Turchetti, A Stochastic Model for CD8+ T cell Dynamics in Human Immunosenescence: implications for survival and longevity, J. Theor. Biol. 213 (2001)
4. A. Lanzavecchia, F. Sallusto, Antigen decoding by T lymphocytes: from synapse to fate determination, Nature Immunology 2, 487 (2001)
5. G. Pawelec et al., T Cells and Aging, Frontiers in Bioscience 3, 59 (1998)
6. F. Luciani, G. Turchetti, C. Franceschi, S. Valensin, A Mathematical Model for the Immunosenescence, Biology Forum 94, 305 (2001)
7. C. Franceschi, S. Valensin, M. Bonafe, G. Paolisso, A. I. Yashin, D. Monti, G. De Benedictis, The network and the remodeling theories of aging: historical background and new perspectives, Exp. Gerontol. 35, 879 (2000)
8. C. Franceschi, M. Bonafe, S. Valensin, Human immunosenescence: the prevailing of innate immunity, the failing of clonotypic immunity, and the filling of immunological space, Vaccine 18, 1717 (2000)
9. B. Gompertz, On the nature of the function expressive of the law of human mortality, and on a new mode of determining the values of life contingencies, Philos. Trans. R. Soc. London 115, 513 (1825)
10. F. Luciani, Modelli fisico-matematici per la memoria immunologica e l'immunosenescenza, Master thesis, Univ. Bologna (2000)
11. L. Mariani, Modelli stocastici dell'immunologia: Risposta adattativa, Memoria e Longevita', Master thesis, Univ. Bologna (2001)
12. L. A. Gavrilov, N. S. Gavrilova, The Biology of Life Span: a quantitative approach (Harwood Academic Publisher, London, 1991)
13. L. Piantanelli, G. Rossolini, A. Basso, A. Piantanelli, M. Malavolta, A. Zaia, Use of mathematical models of survivorship in the study of biomarkers of aging: the role of heterogeneity, Mechanisms of Ageing and Development 122, 1461 (2001)
NEURAL NETWORKS AND
NEUROSCIENCES

ARTIFICIAL NEURAL NETWORKS IN NEUROSCIENCE

N. ACCORNERO, M. CAPOZZA
Dipartimento di Scienze Neurologiche, Universita di Roma LA SAPIENZA

We present a review of the architectures and training algorithms of Artificial Neural Networks and their
role in Neurosciences.

1. Introduction: Artificial Neural Networks

The way an organism possessing a nervous system behaves depends on how the
network of neurons making up that system functions collectively. Singly, these
neurons spatially and temporally summate the electrochemical signals produced by
other cells. Together they generate highly complex and efficient behaviors for the
organism as a whole. These operational abilities are defined as "emergent" because
they result from interactions between computationally simple elements. In other
words, the whole is more complex than the sum of its parts.
Our understanding of these characteristics in biological systems comes largely from studies conducted with artificial neural networks early in the 1980s [1]. Yet the biological basis of synaptic modulation and plasticity was intuited 40 years earlier by Hebb, and the scheme for a simple artificial neuronal network, the perceptron, was originally proposed by Rosenblatt [2] and discussed by Minsky [3] in the 1960s.
An artificial neural network, an operative model simulated electronically (hardware)
or mathematically (software) on a digital processor, consists of simple processing
elements (artificial neurons, nodes, units) that perform algorithms (stepwise linear
and sigmoid functions) on the sum or product of a series of numeric values coming
from the various input channels (connections, synapses). The processing elements
distribute the results of the output connections multiplying them by the single
connection "weights" received from the other interconnected processors. The final
complex computational result therefore depends on how the processing units
function, on the connection weights, and on how the units are interconnected (the
network architecture).

To perform a given task (training or learning), an artificial net is equipped with an automated algorithm that progressively changes at least one of these individual computational elements (plasticity), almost exactly as happens in a biological neuronal network.

[Diagram: a neuron with input connections and plasticity, distributing weighted outputs F(I)·P2, F(I)·P3, F(I)·P4; hardware and software simulation.]

Figure 1: Comparison between a biological neuron and an artificial neuron, and a hardware-software simulation.

The functions of the processing elements (units) and the network architecture are
often pre-determined: the automated algorithms alter only the connection weights
during training. Other training methods entail altering the architecture, or less
frequently, the function of each processing unit.
The architecture of a neural network may keep to a pre-determined scheme (for example with the processing elements, the artificial neurons, grouped into layers, with a single input layer, several internal layers, and a single output layer). Otherwise it starts from completely random connections that are adjusted during the training process.

[Diagram: artificial neural network topologies: network, cluster, layered.]

Figure 2: Variable architecture of artificial neural networks.

Network training may simply involve increasing the differences between the various
network responses to the various input stimuli (unsupervised learning) so that the
network automatically identifies "categories" of input [4, 5, 6]. Another training
method guides the network towards a specific task (making a diagnosis or
classifying a set of patterns). Networks designed for pattern classification are trained by trial and error, which can be done in two ways. In the first, an external supervisor measures the output error and then changes the connection weights between the units in a way that minimizes the error of the network (supervised learning) [7, 8]. The second method involves selective mechanisms similar to those underlying the natural selection of biological species: a process that makes random changes in a population of similar individuals, then eliminates those individuals having the highest error, and reproduces and interbreeds those with the lowest error, as sketched below. Reiterating the training examples leads to genetic learning of the species [9].
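A minimal sketch of the genetic scheme just described (the task, network size, population size and rates below are our invented illustrations): weight vectors play the role of individuals, and selection, crossover and mutation replace the external supervisor.

    import numpy as np

    # Genetic training sketch: mutate a population of weight vectors for a
    # small 2-2-1 network, keep the lowest-error individuals, interbreed them.
    rng = np.random.default_rng(1)
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([0.0, 1.0, 1.0, 0.0])                  # XOR as a toy task

    def forward(w, x):
        W1, b1 = w[:4].reshape(2, 2), w[4:6]
        W2, b2 = w[6:8], w[8]
        h = 1.0 / (1.0 + np.exp(-(x @ W1 + b1)))        # hidden layer
        return 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))     # output unit

    def error(w):
        return np.mean((forward(w, X) - y) ** 2)

    pop = rng.normal(size=(50, 9))
    for generation in range(300):
        pop = pop[np.argsort([error(w) for w in pop])]  # selection: best first
        parents, children = pop[:10], []
        for _ in range(40):
            a, b = parents[rng.integers(10)], parents[rng.integers(10)]
            mask = rng.random(9) < 0.5                  # crossover of two parents
            children.append(np.where(mask, a, b)
                            + rng.normal(scale=0.3, size=9))   # mutation
        pop = np.vstack([parents, children])
    print(error(pop[0]))   # error of the fittest individual after evolution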

[Diagram: training modalities: supervised (delta rules, error back-propagation), unsupervised (competitive learning), genetic algorithms (crossover).]

Figure 3: Training modalities.

The choice of training method depends on the aim proposed. If the network is intended to detect recurrent signal patterns in a "noisy" environment, then excellent results can be obtained with a system trained through unsupervised learning. If one aims to train a diagnostic net on established knowledge, or to train a manipulator robot on precise trajectories, then one should choose a multilayered network trained through supervised learning. If the net is designed for use as a model, that is, to simulate biologic nervous system functions, then the ideal solution is probably genetic learning. Adding the genetic method to either of the other two methods will improve the overall results.
In summary, biological and artificial neural networks are pattern transformers. An
input stimulation-pattern produces an output pattern-response, especially suited to a
given aim. To give an example from biology: a pattern of sensory stimuli, such as
heat localized on the extremity of a limb, results in a sequence of limb movements
that serve to remove the limb from the source of heat. A typical example of an
artificial network is a system that transforms a pattern of pathologic symptoms into a
medical diagnosis.

Input and output variables can be encoded as a vectorial series in which the value of a single vector component represents the strength of a given variable. The power of vectorial coding becomes clear if we imagine how some biological sensory systems code the reality of nature. The four basic receptors located on the tongue (bitter-sweet-salt-acid) allow an amazing array of taste sensations. If each receptor had only ten discrimination levels (and they certainly have more) we could distinguish as many as 10,000 different flavors. On this basis, each flavor corresponds to a point in a four-dimensional space identified by the 4 coordinates of the basic tastes. Similar vectorial coding could make up the input of an artificial neural network designed to identify certain categories of fruit, as in the sketch below. One or more internal (hidden) layers would transform this coding first into numerous arbitrary hidden (internal) codes, and ultimately into output codes that classify or recognize the information presented.
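A minimal sketch of such a taste-to-fruit classifier (layer sizes and fruit labels are ours; the random weights merely stand in for values a trained network would have learned):

    import numpy as np

    # Forward pass for the taste example: a 4-dimensional input vector
    # (bitter, sweet, acid, salty) -> hidden layer -> positional output code.
    rng = np.random.default_rng(0)
    fruits = ["apple", "banana", "cherry", "grape"]
    W1, b1 = rng.normal(size=(4, 6)), np.zeros(6)       # input  -> hidden
    W2, b2 = rng.normal(size=(6, 4)), np.zeros(4)       # hidden -> output

    def classify(taste):                                # 4 values in [0, 1]
        h = np.tanh(taste @ W1 + b1)                    # internal (hidden) code
        scores = h @ W2 + b2
        return fruits[int(np.argmax(scores))]           # positional coding

    print(classify(np.array([0.1, 0.9, 0.2, 0.0])))     # a sweet, mildly acid sample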

[Diagram: four taste inputs (bitter, sweet, acid, salty) encoded vectorially, feeding a layered network whose positional output codes fruit categories (apple, banana, cherry, grape).]

Figure 4: A forward-layered neural network.

This network designed to recognize or diagnose can be generalized to a wide range


of practical applications, from geological surveying to medicine and economic
evaluation. This type of network architecture, termed "forward", because its
connections all converge towards the output of the system, is able to classify any
"atemporal" or "static" event. Yet reality is changeable, input data can change
98

rapidly, and the way in which these data follow one another can provide information
that is essential for recognizing the phenomenon sought or for predicting how the
system will behave in the future. Enabling a network to detect structure in a time
series, in other words to encode "time", means also inserting "recurrent"
connections (carrying output back to input). These connections relay back to the input units the values computed by units in the next layer, thus providing information on the pattern of preceding events. The changing of the connection weights during training is therefore also a function of the chain of events.
The nervous system is rich in recurrent connections. Indeed, it is precisely these
connections that are responsible for perceiving "time". If a "forward" network
allows the input pattern to be placed in a single point of the multidimensional vector
space, a recurrent network will evaluate this point's trajectory in time.
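A minimal sketch of a recurrent layer (sizes and weights are arbitrary illustrations): the hidden state is relayed back at each step, so the final state encodes the trajectory of the input sequence rather than a single static pattern.

    import numpy as np

    # Recurrent ("time-coding") layer: the values computed at one step are
    # relayed back as input to the next, so the state depends on the whole
    # chain of preceding events.
    rng = np.random.default_rng(0)
    n_in, n_hid = 3, 5
    W_in = rng.normal(size=(n_in, n_hid)) * 0.5
    W_rec = rng.normal(size=(n_hid, n_hid)) * 0.5       # recurrent connections
    h = np.zeros(n_hid)

    sequence = rng.random(size=(10, n_in))              # a short input time series
    for x in sequence:
        h = np.tanh(x @ W_in + h @ W_rec)               # state carries history
    print(h)                                            # trajectory-dependent code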

[Diagram: recurrent layered network for dynamic recognition, time coding, and prediction.]

Figure 5: Recurrent layered neural network.

Advances in neural networks, the unceasing progress in computer science, and the
intense research in this field have brought about more profound changes in
technology than are obvious at first glance. Paradoxically, because these systems
have been developed thanks to digital electronics and by applying strict mathematical rules, the way artificial neural networks function runs counter to
classic computational, cognitive-based theories. A distinguishing feature of neural
networks is that knowledge is distributed throughout the network itself rather than
being physically localized or explicitly written into the program. A network has no
central processing unit (CPU), no memory modules or pointing device. Instead of
being coded in symbol (algorithmic-mathematical) form, a neural network's
computational ability resides in its structure (architecture, connection weights and
operational units). To take an example from the field of mechanics, a network
resembles a series of gears that calculates the differential between the rotation of the
two input axles and relays the result to the output axle. The differential is executed
by the structure of the system with no symbolic coding.

The basic features of an artificial neural network can be summarized as follows:
1) Complex performance emerging from simple local functions.
2) Training by examples.
3) Distributed memory and fault tolerance.
4) Performance plasticity.

Connectionist systems still have limited spread in the various fields of human
applications, partly because their true potential remains to be discovered, and partly
because the first connectionist systems were proposed as alternatives to traditional
artificial-intelligence techniques (expert systems) for tasks involving diagnosis or
classification. This prospect met with mistrust or bewilderment, due paradoxically to
their inherent adaptiveness and plasticity, insofar as misuse of these qualities can
lead to catastrophic results. Disbelievers also objected that the internal logic of
artificial neural networks remains largely unknown: networks do not indicate how
they arrive at a conclusion. These weaknesses tend to disconcert researchers with a
determinist background (engineers, mathematicians and physicists) and drive away
researchers in medical and biological fields, whose knowledge of mathematics and
computer science is rarely sufficient to deal with the problems connectionist systems
pose.

2. Application of artificial neural networks to neuroscience

Neural networks now have innumerable uses in neuroscience. Rather than listing the
many studies here, we consider it more useful to give readers some reference points
for general guidance. For this purpose, though many categories overlap or remain
borderline, we have distinguished three: applications, research, and modelling.
Applications essentially use neural networks for reproducing, in an automated, faster
or more economic manner, or in all three ways, tasks typically undertaken by human
experts. The category of applications includes diagnostic neural networks (trained to analyze a series of symptoms and generate a differential diagnosis, even a refined one) and networks designed for pattern recognition (currently applied in clinical medicine for diagnostic electrocardiography, electromyography, electroencephalography, evoked potentials, and neuro-ophthalmology) [10].
Other networks are designed to segment images (for example, to identify anatomical
objects present in radiologic images, and highlight them in appropriate colors or
reconstruct them in three dimensions). In general, these networks have received
supervised training. Of especial interest to neurologists are diagnostic networks
trained to make a typically neurologic diagnosis such as the site of lesion. One of
these nets visualizes the areas of probable lesion on a three-dimensional model of
the brainstem, thus supplying a topographic, rather than a semantic diagnosis [11].

[Screenshot: OPTONET flowchart; automatic visual field analyser output.]
Figure 6: Automated diagnosis of the visual field.



[Diagram: EMG-NET, a neural network for the diagnosis of surface muscle activity; preamplifiers, time series, spectrograms, forward neural network.]

Figure 7: Automated electromyographic diagnosis.

[Diagram: BRAINSTEM-NET; clinical and neurophysiological data in a report form feed a neural network that outputs a 3D model (5268 voxels).]

Figure 8: 3D images computed by a forward-layered neural network.

Neural networks of the diagnostic type have many practical uses (for example, automated recognition of EEG epileptic abnormalities quickens the tedious examination of a 24-48 hour dynamic EEG recording). They have the advantage of being able to process even noisy data, and their response degrades smoothly under excessive signal deterioration, whereas traditional expert systems function well up to a given signal-to-noise ratio but then stop responding. Physicians often seem reluctant to make use of neural networks. The reasons for this reluctance are many and merit an in-depth analysis that is outside the scope of this chapter. To put it briefly, we suspect that physicians fear that a diagnosis provided automatically by a machine will diminish their authority.
An extremely interesting field is the use of a neural network as a research tool in
areas where more traditional research tools seem to have exhausted their potential.
These cases essentially call for unsupervised networks, able to discover correlations,
regularity, and hidden patterns that other methods fail to disclose. In this context,
neural networks function as powerful multivariate and non-linear statistical tools.
Research conducted by Roberts and Tarassenko [12] shows, for example, that the
traditional division of sleep into five stages according to visual inspection of the
EEG is an arbitrary classification. Using a neural network they identified seven EEG
attractors during sleep (and one during wakefulness). Each sleep stage arises from a
characteristic trajectory between some of these attractors, and all these dynamic events originate from the competitive interactions between three processes.
Last, the research field of greatest interest to the neuroscientist is also the area where the connectionist method of analysis finds widest agreement, namely the use of neural networks in nervous system modeling. In this field, neural networks are strictly speaking used not as tools but as models, though simplified ones, of biologic nervous system functioning. Because these models are simulated on computers, they allow the properties of such systems to be studied in a dynamic, quantitative manner, from single local phenomena up to the properties emerging from collective interactions: memory, imagination, language and, maybe in the future, consciousness.
In effect, artificial neural network simulation offers the first, and at present the only, chance of bringing the study of higher nervous system functions back from psychology and philosophy into the realms of the natural quantitative sciences. No longer will research be limited to observing and formulating hypotheses on nervous system function: it will also be able to test function experimentally, though for the time being in a limited way. Artificial sensory systems behave similarly to biological ones (the artificial retina). Artificial neuronal systems for motor control display an ability for unsupervised learning of the control of a physical artificial arm or of one simulated on a computer [13, 14].

[Diagram: "learning by doing" motor control loop.]

Figure 9: Model of motor control implemented with an unsupervised neural network.

They will also control a double-inverted pendulum model of standing posture that
appears able to learn the upright posture spontaneously and to compensate for
perturbations in balance due to environmental conditions.
Studies of this type have helped us to understand some of the essential mechanisms
underlying the development of central nervous system sensorimotor control. This
knowledge has been put to various practical uses including neurorehabilitation and
attempts to construct sensory and motorized prostheses.
A connectionist (neural network) approach allows one to investigate not only the
functioning, but also the birth and evolution of simple nervous systems [15, 16].
The unavoidable, spontaneous affinity between connectionism and the computational branch of evolutionary biology that studies the formation and transformation of elementary organisms simulated by genetic algorithms has led to extraordinary results in the simulation of "artificial organogenesis" (the eye) and of "artificial life".

The term artificial life refers to the computer simulation and study of dynamic ecosystems. These systems are described as artificial in the sense that they are originally designed by humans and are immaterial, but they evolve and reproduce in an autonomous and often unpredictable manner. The lack of a material constitution limits the simulation to describing the physicochemical properties of organic and inorganic materials in mathematical terms.
To date, most studies focus on macroscopic behaviors including reproduction, movement strategies, and energy exchange with the simulated environment (the search for food) or the appearance of cooperative behaviors (including swarms and schools of fish).
The search for criteria that will distinguish between living and non-living organisms
is as old as man. It also seems ever more arduous as investigational techniques
become increasingly refined. Currently the borderline between the two, if the
question is legitimate, lies between crystalline mineral structures and self-replicating
biological structures (DNA-RNA). At behavioral level, a useful definition is that
proposed by Monod, indicating three essential features of the living world:
teleonomy (the presence of a structural plan), autonomous morphogenesis and
reproductive invariance. But again, these three characteristics are wholly
interdependent. Most probably, the only law that distinguishes organic from
inorganic matter is their tendency to saturate the environment with copies of
themselves, thus modifying the environment itself, whenever possible to their own
advantage.
Because many species found in nature compete with one another and cooperate
towards the same aim, local situations of dynamic equilibrium are reached, termed
ecosystems. The scale of observation is obviously an important variable since the
living world seems to be organized into ecosystems within other ecosystems, rather
like Chinese boxes. From this viewpoint, expecting to simulate fully a biologic
ecosystem, however small, within the isolation and immaterial setting of a
mathematical process, may seem absurd and misleading. Yet if the aim is not to
simulate the ecosystem exactly but to understand only some of the rules governing
the biological world then the method is right and can be enlightening.
The opportunities for investigation in this field stem from at least three determinant
coincidences:
1) the development and spread of sufficiently powerful digital computers;
2) progress in studies on connectionism; and
3) an improved understanding of genetic biological mechanisms that allowed basic
laws to be simulated with mathematical formulas termed "genetic algorithms".

[Diagram: artificial life: computer simulation of the evolution and behaviour of simple organisms in an environment. Genetic algorithms and the rules of "natural selection" are embedded in the simulated system; starting conditions are random, and species evolve because the best-fitting organisms, those that can utilize the resources of the environment to reproduce efficiently, are selected. Parent genomes undergo crossover.]

Figure 10: Model of artificial life simulated using genetic algorithms on populations (hundreds to thousands) of unsupervised neural networks that compete in an artificial environment.

3. Conclusions

After a fatiguing course, with trials and tribulations lasting more than 50 years,
connectionism has at last achieved the recognition it deserves. It is now beginning to
show its enormous potential for managing and understanding complex phenomena
that the deterministic means hitherto available could not approach. The coming
years should therefore witness a major breakthrough in scientific investigation.
We recommend that readers who wish to approach this technology, as well as reading explanatory texts [1, 8], begin experimenting with one of the inexpensive commercial software programs suitable for a personal computer of modest performance. These programs enable even inexpert users to set up and train a neural network suitable for multiple applications.
Innumerable Internet sites are dedicated to neural networks: in response to keywords such as "neural networks" any search engine will supply hundreds, or often thousands, of links. Many of these sites offer neural nets as "freeware" (software that can be downloaded free and used without restrictions) and "shareware" (software that can be downloaded free of charge for personal evaluation and ultimately deleted or bought).

References

[1] D. Parisi: Intervista sulle reti neurali, Bologna, Il Mulino, 1989.


[2] F. Rosenblatt: Principles of neurodynamics. Perceptrons and the theory of brain
mechanisms, Washington, D.C., Spartan Books, 1962.
[3] M. Minsky, S. Papert: Perceptrons, Cambridge, Mass., MIT Press, 1969.
[4] G. A. Carpenter, S. Grossberg: Neural dynamics of category learning and
recognition: attention, memory consolidation, and amnesia, in J. Davis, R.
Newburgh, and E. Wegman (Eds.): Brain Structures, Learning, and Memory, AAAS
Symposium Series, 1986.
[5] S. Grossberg (Ed): Neural networks and natural intelligence, Cambridge, Mass.,
MIT Press, 1988.
[6] T. Kohonen: Self-organization and associative memory, 3rd ed., Berlin/Heidelberg/New York, Springer-Verlag, 1989.
[7] J.J. Hopfield, D.W. Tank: Neural computation of decision in optimization
problems, Biol. Cybern. 52, 141-152, 1985.
[8] D.E. Rumelhart, J.L. McClelland: Parallel distributed processing, Cambridge,
Mass., MIT Press, 1986.
[9] D.E. Goldberg: Genetic algorithms in search, optimization, and machine
learning, Reading, Mass., Addison-Wesley, 1989.
[10] N. Accornero, M. Capozza: OPTONET: neural network for visual field
diagnosis. Med. & Biol. Eng. & Comput, 1995, 33, 223-226.
[11] G. Cruccu, M. Capozza, S. Bastianello, M. Mostarda, A. Romaniello, N.
Accornero: 3D Brainstem Mapping of "anatomical" and "functional" lesions, 9th
European Congress of Clinical Neurophysiology, Bologna, Monduzzi Editore, 1998.
[12] S. Roberts, L. Tarassenko: New method of automated sleep quantification,
Med. & Biol. Eng. & Comput, 1992, 30, 509-517.
[13] B.W. Mel: Connectionist Robot Motion Planning, San Diego, CA, Academic
Press Inc., 1990.
[14] M. Capozza, N. Accornero: Rete neurale non supervisionata per il controllo di
un arto simulato, VI Congr. Nazionale Tecnologie Informatiche e Telematiche
Applicate alle Neuroscienze (ANINs), Milano, 1997.
[15] G.M. Edelman: Neural darwinism, New York, Basic Books, 1989.
[16] N. Accornero, M. Capozza: Vita artificiale: connessionismo e meccanismi
evolutivi genetici. Riv. Neurobiologia, 1994, 40 (5/6), 445-449.

BIOLOGICAL NEURAL NETWORKS: MODELING AND MEASUREMENTS

RUEDI STOOP AND STEFANO LECCHINI


Institut fur Neuroinformatik, ETHZ/UNIZH, Winterthurerstr. 190,
CH-8057 Zurich
ruedi@ini.phys.ethz.ch

When interaction among regularly firing neurons is simulated (using measured cor-
tical response profiles as experimental input), besides complex network dominated
behavior, embedded periodicity is observed. This is the starting point for our the-
oretical analysis of the potential of neocortical neuronal networks for synchronized
firing. We start from the model that complex behavior, as observed in natural
neural firing, is generated from such periodic behavior, lumped together in time.
We address the question of whether, during periods of quasistatic activity, different
local centers of such behavior could synchronize, as required, e.g., by binding the-
ory. It is shown that for achieving this, methods of self-organization are insufficient
- additional structure is needed. As a candidate for this task, thalamic input into
layer IV is proposed, which, due to the layer's recurrent architecture, may trigger
macroscopically synchronized bursting among intrinsically non-bursting neurons,
leading in this way to a robust synchronization paradigm. This collective behavior
in layer IV is hyperchaotic; its characteristic statistical descriptors agree well with
the characterizations obtained from in vivo time series measurements of cortical
response to visual stimuli. When we evaluate a novel, biologically relevant measure of complexity, we find indications that the natural system has a tendency to tune itself to the regions of highest complexity.

1 Introduction

One emergent question in biology and in the computational sciences is how the brain processes information. The answer to this question is important for at least two reasons. Firstly, if a spike event is taken to give the basic unit of the clock time, the brain is incredibly efficient, even at cycle times that are of the order of milliseconds and thus far slower than those of artificial information processing devices. Secondly, cortical data are full of noise. However, our
perception of noise has recently changed, from that of a kind of hindrance to
something potentially useful, as noise has been shown to be able to synchronize
ensembles, to lead to phase transitions, and to have many more unexpected
effects. Moreover, insight has been gained that noise is being used to a large
extent by biological systems. Striking examples are Brownian motors [1] or
stochastic resonance [2]. From the technological point of view, insight into
the nature and potential of noise is important, as noise is the primary obstacle to making electronic components ever smaller, which would imply not only the optimization of the occupied space, but also minimizing signal processing time and energy spent.


As a consequence, the expectation that there may be useful, still undiscov-
ered, computational principles within the cortex, is not entirely of speculative
nature. These principles, when combined with the speed of modern comput-
ers, could lead to a jump in the computational power of artificial computation,
from the hard- and software points of view. We have recently shown [3] that
to distinguish noisy firing from firing in terms of patterns, fractal dimension
log-log plots are useful, where pattern firing is documented by parallel step-
like behavior of the embedded correlation curves, whereas noisy firing yields
convex, non-converging curves. In Fig. 1, we show two cases, obtained from
a set of in vivo measurements of cat neocortical neurons (VI). When looking
for principles able to convert noise into patterns, the principle of locking offers
itself as a solution. Upon small-scale uncorrelated noisy input, a neuron is put
on a limit-cycle solution. Such neurons, when coupled more strongly, will en-
gage in locked firing. One possibility then would be to explain in vivo complex
firing behavior as the switching between different locked states. This mecha-
nism would not only provide an ideal A/D-converter device, it moreover would
lead to an encoding of stimuli optimal under different information-theoretic
aspects [4].
In the discussion of the computational properties of the human brain, an
important issue of current interest is the feature-binding problem, which relates
to the cortical task of associating one single object with its different features
[5-6]. As a solution to this problem, synchronization among neuron firing has
been proposed - in opposition to the concept of so-called grandmother cells.
The purpose of this contribution is to investigate, under which conditions a
neocortical network is able to evolve towards self-organized complexity, docu-
mented in the emergence of spatio-temporal structures, e.g., synchronization.

2 Results

Of primary importance, therefore, would be to have solid, generally accepted experimental proofs of synchronization. This, however, at the moment seems not to be available. We will find that our more theoretical approach also requires details of neuronal connectivity and efficacy that are far from being resolved, not only from the physiological, but also from the mathematical point
of view (e.g., the question from what degree of connectivity on a network can
be modeled as all-to-all connected). Our modeling assumption will be that
for layers I-III and V-VI, a hierarchy of interactions is relevant, from weak
to strong, from topologically far to near. This implies that a hierarchy of
couplings should be considered and suggests, that for the interaction, mod-

[Plots: log-log correlation curves and interspike-interval probability distributions for panels a) and b).]

Figure 1. In vivo measurements (cat, visual cortex V1). Log-log correlation plots and interspike probability distributions of a) a noisy, and b) a pattern-firing neuron.

eling as point processes should be allowed. It suggests furthermore, that for layers with low connectivity, a coupled map lattice model (CML), where the interactions should essentially be of next-neighbor order, is reasonable, although for finding coherent, synchronized behavior, all-to-all or mean-field coupling
would be more promising. Layer IV, finally, is modeled, due to its distinct connectivity, as an all-to-all coupled network. As the result of our investigations, it emerges that the origin of synchronized behavior (if existent in nature) is unlikely to lie in layers I-III and V-VI. We find:

1) The qualitative behavior in the CML-model of layers I-III and V-VI is independent of the local lattice map;
2) In the absence of noise, the CML-model is unable to synchronize in a self-
organized manner. Synchronization, however, could emerge in an input-driven
way;
3) Collective, roughly synchronized, behavior can be generated in a model of
layer IV, which also is able to generate responses that are very close to in vivo
measured neuronal firing;
4) The highest complexity of the CML-model is found where the global be-
havior of the network is close to the border between order and disorder. It is
possible that the objective of this property is to facilitate the inheritance of
rhythms of firing to other layers than IV.

3 Methods

Model of computational layers

For the computational layers I-III/V-VI, we use a CML approach, where, in a first step, we approximate the layer by weakly coupled ensembles of internally more strongly coupled, regularly firing pyramidal neurons. More precisely, for
the CML-site maps, we use the profiles of in-vitro measured pyramidal neu-
rons, when they are perturbed at otherwise quasistatic dynamical conditions,
by excitatory and inhibitory pulses. In quasistatic conditions, the individual
neurons are driven to regular firing by uncorrelated small-scale noise, but, due
to stronger coupling with nearest neighbors, engage in locked states [7], which
may be expressing computational results [4]. It has been shown that recurrent
connections on these computational circuits can act as controllers of the peri-
odicity of the locking. In other words, they modify the computational results
returned by the circuit [8]. For this case, we find that self-organized synchro-
nization, as needed to support the binding by synchronization hypothesis, is
virtually impossible. The natural next step is then to add second order pertur-
bations among the sites. However, even for this refined case, we end up with
absence of self-organized synchronization.
The CML-site maps are derived from binary interaction, although the
extension to n-ary interaction, or even to interaction among synchronized en-
sembles, is straightforward. The description of binary interaction is by means
of maps of the circle
/ : <f>i+i = <pi + tt - T(<j)i)/T0 (modulo 1). (1)
Here, ft is the ratio of the self-oscillation frequency over the perturbation fre-
quency, and T(<f>i)/T0 measures the lengthening / shortening of the unper-
turbed interspike interval due to the perturbation, as a function of the phase
at which the perturbation arrives. Note that both quantities can be measured
in experiments; this is how we base our derivation upon experimental data.
The formula describes what happens to a noise-driven neuron (where the noise
follows a central-limit theorem and therefore drives the neuron into regular fir-
ing), when it is perturbed by a similar, strongly coupled "neighboring" neuron
(note, however, that in the mathematical literature, this type of interaction is often referred to as weak interaction [9]). From experiments with increased
perturbation strengths, we found that its effect can be parameterized as

    g(φ, K) := T(φ, K)/T₀ = (T(φ, K₀)/T₀ − 1) K + 1,        (2)


where K₀ is a normalization, chosen such that at K = 1, 75 percent of the
maximal experimentally applicable perturbation strength is obtained. The
perturbation response experiments are performed for excitatory, as well as
for inhibitory, perturbations. The first experimental finding is that chaotic
response may be attained from pair interaction, but only if the interaction is of
inhibitory nature [10]. This is essentially a consequence of the greater efficacy
of inhibitory synapses, a fact that is well-known in physiology. Note also that
our biology-motivated normalization differs from the usual mathematical one,
which attributes the value K = 1 to the critical value of the map, i.e., when
the map / ceases to be invertible. The second finding is that, as is predicted by
the theory of interacting limit cycles, locking into periodic states is abundant,
and that the measure of quasiperiodic firing relations between the neurons
quickly vanishes as a function of the perturbation strength K. A last finding
is that, when going from the static to the quasistatic case, locking into a hierarchy of periodicities is observed, exactly of the type that is predicted by the associated Farey tree. In fact, our results can be interpreted as the first experimental proof of the limit cycle nature of regularly firing cortical neurons.
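A numerical sketch of the interaction map (1)-(2) makes the locking visible. Since the measured response profile T(φ, K₀)/T₀ is not reproduced in the text, a smooth sinusoidal profile stands in for it below (our assumption); rational rotation numbers signal locked, periodic firing on Arnold tongues.

    import numpy as np

    # Circle map (1) with the parameterized response profile (2). The profile
    # below is a placeholder for the measured T(phi, K0)/T0.
    def g(phi, K):
        base = 1.0 + 0.15 * np.sin(2.0 * np.pi * phi)   # stand-in profile
        return (base - 1.0) * K + 1.0                   # eq. (2)

    def rotation_number(phi0, Omega, K, n=5000):
        phi, total = phi0, 0.0
        for _ in range(n):
            step = Omega - g(phi, K)                    # eq. (1) increment
            total += step
            phi = (phi + step) % 1.0                    # modulo 1
        return total / n

    # scanning Omega at fixed K traces out the locking structure
    for Omega in np.linspace(1.0, 2.0, 11):
        print(round(Omega, 2), round(rotation_number(0.2, Omega, 0.8), 3))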
Consequently, as a first approximation, the activity within the centers of
stronger interacting neurons is described by locking on Arnold tongues (see,
e.g. [7]). Beyond the bi- or n-ary strong interaction, there also is a weaker
exchange of activity, which can be modeled as diffusive interaction, among the
more strongly coupled centers. In this way, we arrive at a coupled map lattice
model, that we base on measured binary interaction profiles at physiological
conditions (including all kinds of variability, e.g., interaction types, coupling
strengths)

    φ_{i,j}(t_{n+1}) = (1 − k₂ k_{ij}) f_{K_{ij}}(φ_{i,j}(t_n)) + (k₂ k_{ij}/nn) Σ_{nn} φ_{k,l}(t_n),        (3)

where φ is the phase of the phase-return map at the indexed site, and nn again denotes the cardinality of the set of all nearest neighbors of site (i, j). k₂ describes the overall coupling among the site maps. This global coupling strength is locally modified by realizations k_{ij}, taken from some distribution, which may or may not have a first moment (in the first case, k₂ can be normalized to be the global average over the local coupling strengths). In Eq. (3), the first term reflects the degree of self-determination of the phase at site {i, j}; the second term reflects the influence of the nearest-neighboring (i.e., the ones producing the strongest interactions) centers.
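The lattice update (3) can be sketched as follows (ours; the site map reuses the placeholder profile of the previous sketch, with periodic boundaries and illustrative k₂ and k_{ij}). A macroscopically synchronized state would show up as a collapse of the spread of the site phases, which, as reported above, is not what emerges for physiologically motivated parameters.

    import numpy as np

    # Coupled map lattice (3): local circle maps with diffusive coupling to
    # the four nearest neighbors, periodic boundaries. Parameters illustrative.
    rng = np.random.default_rng(0)
    N, k2, Omega = 50, 0.1, 1.5
    K = rng.uniform(0.5, 1.0, size=(N, N))       # local "excitability" K_ij
    kij = rng.uniform(0.8, 1.2, size=(N, N))     # local coupling variability
    phi = rng.random((N, N))

    def f(phi, K):                               # site map, eqs. (1)-(2)
        return (phi + Omega - (0.15 * np.sin(2 * np.pi * phi) * K + 1.0)) % 1.0

    for _ in range(1000):
        nn_mean = (np.roll(phi, 1, 0) + np.roll(phi, -1, 0)
                   + np.roll(phi, 1, 1) + np.roll(phi, -1, 1)) / 4.0
        phi = (1.0 - k2 * kij) * f(phi, K) + k2 * kij * nn_mean

    print(phi.std())   # spread of site phases; a collapse would mean synchrony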

Absence of self-organized synchronization

Synchronized behavior, as we understand it, should be observable as the emergence of non-local structures within the firing behavior of the neurons in the
network. In the case of initially independent behavior, we may expect that
due to the coupling, a simpler macroscopic behavior will be attained, which
could be taken as the expression of the corresponding perceptional state. Ex-
tended simulations, however, yield the result that, for biologically reasonable
parameters, the response of the network is essentially unsynchronized, despite
the coupling. Extrapolations from simpler models, for which exact results are
available [11], provide us with the explanation why. Generically, from weakly
coupled regular systems, regular behavior can be expected. If only two sys-
tems are coupled, generally a shorter period than the maximum of the involved
periodicities emerges. If, however, more partners are involved, a competition
sets in, and high periodicities most often are the result. Typically, synchro-
nized chaotic behavior results from the coupling of chaotic and regular systems,
where the chaotic contribution is strong enough. Otherwise, the response will
be regular. When chaotic systems are coupled, however, synchronized chaotic
behavior as well as macroscopically synchronized regular behavior, may be the
result (e.g., [11]). For the last option, we need to focus on the evolution of
cyclic eigenstates, as they show how this collective behavior might emerge.
We performed simulations using 2-d networks, with diffusive coupling among
20 × 20 up to 100 × 100 local maps of excitatory / inhibitory interaction.
In agreement with the above expectations, we found no signs of macroscopic,
self-organized synchronization, using physiologically motivated variability on
the parameters (inhibitory / excitatory site maps with individual "excitability"
K, locally varying diffusive coupling strength, etc.). To better understand the
results of our simulations, we compared them with an idealized model that should
be a better candidate for collective synchronization: a diffusively coupled
lattice with identical tent maps at all sites. It may be argued
against this comparison that, whereas the maps derived from the experiments
are non-hyperbolic, this model is hyperbolic, which is a non-generic situation.
Through simulations, however, it is found that the corresponding model with
non-hyperbolic (e.g., parabola) site maps shares the main properties of the
tent-map model, i.e., the phenomenology is much more strongly determined by the
coupling than by the chosen site maps. The advantage of the model of coupled
tent maps is that it can be solved analytically. In our case the largest network

Lyapunov exponent [12] is of relevance, which can be calculated by using the


thermodynamic formalism approach as follows. First, it must be realized that
the coupled map lattice can be mapped onto a matrix representation of the
form:

M = \begin{pmatrix} (1-k_2)a & \frac{k_2}{2}a & 0 & \cdots & \frac{k_2}{2}a \\ \frac{k_2}{2}a & (1-k_2)a & \frac{k_2}{2}a & \cdots & 0 \\ \vdots & \ddots & \ddots & \ddots & \vdots \\ \frac{k_2}{2}a & 0 & \cdots & \frac{k_2}{2}a & (1-k_2)a \end{pmatrix},  (4)
where a is the (absolute) slope of the local tent maps and k_2 is the diffusive
coupling strength. The thermodynamic formalism formally proceeds by raising
the (matrix) entries to the (inverse) temperature β, and then focusing, as
the dominating effect, on the largest eigenvalue as a function of the inverse
temperature. For large network sizes, the latter converges towards

μ(β, k_2) = ( |(1 − k_2)a| )^β + (a k_2)^β.  (5)
This expression explicitly shows that the network behavior is determined
by two sources: by the coupling (k_2) and by the site instability (a). Using this
expression of the largest eigenvalue, we obtain the free energy of our model as
F(β) = log( |a(1 − k_2)|^β + (a k_2)^β ). From the free energy, the largest network
Lyapunov exponent is derived as a function of the diffusive coupling strength
k_2 and the slope of the local maps a, according to the formula

λ = (∂/∂β) F(β, k_2) |_{β=1},  (6)

which yields the final result

λ(a, k_2) = [ a|1 − k_2| log(a|1 − k_2|) + a k_2 log(a k_2) ] / ( a|1 − k_2| + a k_2 ).  (7)
Fig. 2a shows a contour plot of λ_n(a, k_2), for identically coupled identical tent
maps, over a range of {a, k_2} values. In Fig. 2b, a cut through this contour plot
is shown, at parameters that correspond to those used in the numerical simu-
lations of the biologically motivated, variable coupled map lattice, displayed in
Fig. 2c. The qualitative equivalence of the two approaches is easily seen. Nu-
merical simulations of coupled parabolas show furthermore that the displayed
characteristic behavior is preserved even in the presence of non-hyperbolicities.
As a function of the slope a of the local tent map (which corresponds to the
local excitability K) and of the coupling strength k2, contour lines indicate the
instability of the network patterns. As can be seen, due to the coupling, even
for locally chaotic maps (a > 1), stable network patterns may evolve (often
in the form of statistical cycling, see [11]). Upon further increasing the local
instability, chaotic network behavior of turbulent characteristics emerges.

Figure 2. a) Network Lyapunov exponent λ_n, describing the stability of patterns of a network
of coupled tent maps, as a function of the absolute site map slope a and coupling k_2.
Contour lines of distance 0.25 are drawn, where the heavy line, indicating the location of
zero Lyapunov exponents (λ_n = 0), separates stable networks (to the left) from unstable
networks (to the right). b) Cut through the contour plot of a), at a = 1.25. c) Maximal
site-Lyapunov exponent λ_max of a network of locked inhibitory site maps, as a function of
the coupling k_2. For the network, the local excitability is K = 0.5 for all sites, and a is from
the interval [0.8, 0.85]. The behavior of this network closely follows the behavior predicted
by the tent-map model. For some simulations, the peak at k_2 = 1 is not so obvious. These
cases are better modeled by a slightly modified behavior [4].

The stable patterns do not correspond to emergent macroscopic behavior
comparable to synchronization. Therefore, in order to estimate the potential
for synchronization, we need to concentrate on the parameter region where
macroscopic patterns evolve, that is, on the statistical cycling regime. How-
ever, the parameter space that corresponds to this behavior is very small,
even for the tent map model. When we compare the model situation with our
simulations from biologically motivated variable networks, we again observe
that the overall picture provided by the tent map model of identical maps still
applies. To show this in a qualitative manner, we compare the contour plot
of the tent map model with the numerically calculated Lyapunov exponent of
the biological network, which shows the identical qualitative behavior. Based
on our insight into the tent-map model behavior, we conclude that in the bio-
logically motivated network, a notable degree of global synchronization would
require a large subset of all binary connections to be in the chaotic interaction
regime. This possibility, however, only exists for the inhibitory connections
(excitatory connections are unable to reach this state [10]). Moreover, the
part of the phase space on which the chaotic maps dwell, is rather small (al-
though of nonzero measure, see [11]). It is then reasonable to expect that for
the network including biological variability, statistical cycling is of vanishing
measure, and therefore cannot provide a means of synchronizing neuron fir-
ing on a macroscopic scale. To phrase it more formally: this implies that by
methods of self-organization, the network cannot achieve states of macroscopic
synchronization. In addition, we also investigated whether Hebbian [13] learn-
ing rules acting on the weak connections between centers of stronger coupling
could be a remedy for this lack of coherent behavior. Even with this additional
mechanism, the model does not show macroscopic synchronization.
The observation that the tent and the biological response site maps yield
qualitatively identical properties has some additional bearing. In simple systems
of nearest-neighbor coupled map lattices, it is found that the order parameter
corresponding to the average phase would display a phase transition at high
enough coupling strength, as the system is essentially equivalent to an Ising
model at a finite temperature. This is qualitatively similar to our model, where
a first order phase transition is observed at the coupling k_2 = 1, for all values
of the local instability.

Synchronization via thalamic input

Layer IV's task is believed to be centered more on amplification and coordi-
nation than on computation. Accurately modeling layer IV is more difficult,
since, because of the smaller size of its typical neurons, it is difficult to measure
in-vitro response profiles. Our model of the "amplifier" layer IV is based on
biophysically detailed, variable model neurons that are connected in an all-to-
all fashion. This ansatz is partially motivated by the facts that layer IV is more
densely connected than other layers and that natural, measured responses can
easily be reproduced when using this setting. If synchronization - understood
as an emergent, not a feed-forward property - is needed for computational and
cognitive tasks, the question remains how this property is generated. In our
simulations of biophysically detailed models of layer IV cortical architecture
[3,14], we discovered a strong tendency of this layer to respond to stimula-
tion with coarse-grained synchronization. This synchronization is based on


intrinsically non-bursting neurons that develop the bursting property, as a
consequence of the recurrent network architecture and the feed-forward thala-
mic input. Detailed numerical simulations yield the result that, in the major
part of the accessible parameter space, collective bursting emerges. That is,
all individual neurons are collectivized, in the sense that, in spite of their in-
dividual characteristics, they all give rise to dynamics with very similar
characteristics, synchronized on a coarse-grained scale (see Fig. 3). In fact, using
methods of noise cleaning (noise, in this sense, are small variations due to the
individual neuron characteristics), we find that the collective behavior can be
represented in a four-dimensional model, having a strong positive (λ_1 ≈ 0.5),
a small positive, a zero and a very strong negative Lyapunov exponent. More
explicitly, it is found that the basic behavior of the involved neuron types is
identical and hyperchaotic [15]. The validity of the latter characterization has
been checked by comparing the Lyapunov dimensions (d_KY ≈ 3.5) with the
correlation dimensions (d ≈ 3.5). Moreover, different statistical tests have been
performed to assess that noise-cleaning did not modify the statistical behavior
of the system in an inappropriate way. As a function of the feed-forward input
current, we observed an astonishing ability of the layer IV network to generate
well-defined firing patterns.
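The coarse-grained synchronization reported in Fig. 3 is diagnosed with spike-train cross-correlograms. A minimal sketch of this analysis, applied here to synthetic spike trains that burst jointly every 100 ms (all numbers illustrative, not the recorded data):

```python
import numpy as np

def cross_correlogram(t1, t2, max_lag=80.0, bin_width=2.0):
    # Histogram of all pairwise spike-time differences t2 - t1 within +/- max_lag (ms).
    diffs = (t2[None, :] - t1[:, None]).ravel()
    diffs = diffs[np.abs(diffs) <= max_lag]
    bins = np.arange(-max_lag, max_lag + bin_width, bin_width)
    return np.histogram(diffs, bins=bins)

rng = np.random.default_rng(1)
bursts = np.arange(0.0, 10000.0, 100.0)                        # shared burst times (ms)
t1 = np.sort((bursts[:, None] + rng.normal(0, 3, (100, 4))).ravel())
t2 = np.sort((bursts[:, None] + rng.normal(0, 3, (100, 4))).ravel())
counts, edges = cross_correlogram(t1, t2)
print("central bin count:", counts[len(counts) // 2])          # pronounced peak near zero lag
```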


Figure 3. Coarse-grained synchronized activity of layer IV dynamics, where 80 excitatory and
20 inhibitory individual neurons are coupled in an all-to-all fashion. The cross-correlogram
between two excitatory neurons (time lags from −80 to 80 ms) indicates strong synchronization.

When we compared the responses from the layer IV model with data from
in vivo anesthetized cat (17 time series from 4 neurons of unspecified type from
unspecified layers), we found corresponding behavior. Not only do the Lyapunov
exponents (generally hyperchaotic, λ_max ≈ 0.8, the second exponent slightly
above zero) correspond well to the simulated ones from the model (λ_max ≈ 0.5,
second exponent also slightly above zero); the measured dimensions were also
in the range predicted by the model of layer IV; specific characteristic patterns
found in vivo could be reproduced by our simulation model with ease. Of
particular interest are step-wise structures found in the log-log-plots used for
the evaluation of the dimensions [16] (see Fig. 1). However, as the majority of
the measured in vivo neurons could not be attributed to layer IV, the natural
hypothesis is that the other layers inherit these characteristics from the latter.

4 Complexity of network response and in vivo data

To investigate in more detail the relation between layer IV with the remaining
layers, we calculated for the coupled map lattice model a recently proposed,
biologically relevant complexity measure, C_S(1,0) [17]. To evaluate this quan-
tity, we first calculated from the largest eigenvalue the free energy of the net-
work, and from this quantity, C_S(1,0). We find that the highest complexities
are situated in the area beyond the line separating negative and positive Lya-
punov exponents (see Fig. 4, and compare with Fig. 2). Moreover, the area of
highest complexity roughly coincides with the area where the model Lyapunov
exponents agree with the in vivo measured exponents. This suggests that the
natural system has the tendency of being tuned to a state of high complexity.
These coincidences of modeling and experimental aspects lead us to be-
lieve that the ability of the network to fire in well-separated characteristic
time scales or in whole patterns is not accidental, but serves to evoke corre-
sponding responses by means of resonant cortical circuits. However, as has
been mentioned above, not every neuron shares this property. In our recent
studies of in vivo anesthetized cat data, we found, in evoked or spontaneous
firing experiments, essentially three different classes of behavior. The neurons
of the first class show no patterns in their firing at all. The neurons of the
second class are able to pick up stimulation patterns and convert them into
well-defined firing patterns. Neurons of the third class respond with smeared
patterns that seem not to be compatible with the stimulation paradigms (for
the first two classes see Fig. 1). With regard to the interspike distributions,
for the first class, a long-tail behavior of the interspike distribution is charac-
teristic. For the second class, a clean separation of the distribution into two
regimes, dominated by individual interspike intervals and by compound pat-
terns, respectively, is found. The characteristics of the last class indicate a
mixture of the properties of the two other classes. In all cases, the behavior at
long interspike interval times is governed, in the log-log plot, by a linear part,
i.e., is long-tailed.

Figure 4. Region in the parameter space (site map slope a and coupling k_2) where the
biologically relevant complexity measure C_S(1,0) is maximal. The contour lines increase in
steps of 0.1, starting from the line at the right upper corner with C_S(1,0) = 0.1. From the
comparison of Lyapunov exponents, it is suggestive that the measured biological neurons are
tuned to working points of maximal complexity.

5 Relevance of cortical chaos

Chaotic firing emerges from the proposed model, as well as from the in vivo
data that we compare with, with nearly identical Lyapunov exponents and
fractal dimensions. The agreement between Kaplan-Yorke and correlation di-
mensions [12] corroborates the consistency of the results obtained. The ques-
tion then arises: with what functional, possibly computational, relevance could
this phenomenon be associated? Cortical chaos essentially reflects the abil-
ity of the system to express its internal states (e.g., a result of computation) by
choosing among different interspike intervals (ISI) or, more generally, among
distinct patterns of firing. This mechanism can be viewed in a broader con-
text. Chaotic dynamics is generated through the interplay of distinct unstable
periodic orbits, where the system follows a particular orbit until, due to the
instability of the orbit, the orbit is lost and the system follows another orbit,
and so on. It is then natural to exploit this wealth of structures hidden within
chaos, especially for technical applications. The task that needs to be solved to
do so is the so-called targeting and chaos control problem: the chaotic dy-
namics first needs to be directed onto a desired orbit, on which it then needs to
be stabilized, until another choice of orbit is submitted. From an information-
theoretic point of view, information content can be associated with the different
periodic orbits. This view is related to beliefs that information is essentially
contained in the patterns of neuronal firing. If well-resolved interspike intervals
can be extracted from the spike trains of a neuron, the interspike lengths can
directly be mapped onto symbols. A suitable transition matrix then specifies
the allowed, and the forbidden, successions of interspike intervals. That is, this
transition matrix provides an approximation to the grammar of the natural
system. In the case of collective bursting, it may be more useful to associate
information content with firing patterns consisting of characteristic intermit-
tent successions of spikes. In a broader context, the two approaches can be
interpreted as realizations of a statistical mechanics description by means of
different types of ensembles [18-19].
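The symbol-and-grammar construction described above can be sketched in a few lines; the interval boundary separating "short" from "long" interspike intervals below is an arbitrary illustration, not a value from the measurements.

```python
import numpy as np

def isi_grammar(isis, boundaries):
    # Map each interspike interval onto a symbol (the bin it falls into), then
    # record which successions of symbols actually occur: an approximation to
    # the grammar of the spike train.
    symbols = np.digitize(isis, boundaries)
    n = len(boundaries) + 1
    T = np.zeros((n, n), dtype=bool)
    for s, t in zip(symbols[:-1], symbols[1:]):
        T[s, t] = True            # True = allowed succession, False = forbidden
    return T

isis = np.array([12.0, 13.1, 48.0, 12.5, 47.2, 49.9, 12.2])   # toy ISI sequence (ms)
print(isi_grammar(isis, boundaries=[30.0]))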
In the case of artificial systems or technical applications, strategies on how
to use chaos to transmit messages, and more generally, information, are well
developed. One basic principle used is that small perturbations applied to a
chaotic trajectory are sufficient to make the system follow a desired symbol
sequence containing the message [20]. This control strategy is based upon the
property of chaotic systems known as "sensitive dependence on initial condi-
tions" . Another approach, which is currently the focus of applications in areas
of telecommunication, is the addition of hard limiters to the system's evolution
[21-22]. This very robust control mechanism can, due to its simplicity, even
be applied to systems running at gigahertz frequencies (a minimal illustration
follows this paragraph). It has been shown
[23] that optimal hard limiter control leads to convergence onto periodic or-
bits in less than exponential time. In spite of these insights into the nature
of chaos control, which kind of control measures should be associated with
cortical chaos, however, is unclear. In the collective bursting case of layer IV,
one possible biophysical mechanism would be a small excitatory post-synaptic
current. When the membrane of an excitatory neuron is perturbed at the
end of a collective burst with an excitatory pulse, the cell may fire additional
spikes. Alternatively, at this stage inhibitory input may prevent the appear-
ance of spikes and terminate bursts abruptly. In a similar way, also the firing
of inhibitory neurons can be controlled. Another possibility is the use of local
recurrent loops to establish delay-feedback control [24]. In fact, such control
loops could be one explanation for the abundantly occurring recurrent con-
nections among neurons. The relevant parameters in this approach are the
time delay of the re-fed signal, and the synaptic efficacy, where especially the
latter seems biologically realistic. In addition to the encoding of information,
one also needs read-out mechanisms, able to decode the signal at the receiver's
side. Thinking in terms of the encoding strategies outlined above, this would
amount to the implementation of spike-pattern detection mechanisms. Besides
simple straightforward implementations based on decay times, more sophisti-
cated approaches, such as the recently discovered activity-dependent synapses
[25-27], seem natural candidates for this task. Also the interactions of synapses,
with varying degrees of short-term depression and facilitation, could provide
the selectivity for certain spike patterns. Small populations of neurons, which
(due to variable axonal, and synaptic potential propagation delays) achieve
supra-threshold summation only for particular input spike sequences, is yet
another possible mechanism.
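As an illustration of the hard-limiter strategy referenced above [21-23], consider a chaotic one-dimensional map whose state is simply clipped from above. The logistic map and limiter level used here are illustrative stand-ins, not the cortical model itself.

```python
import numpy as np

def limited_logistic(x, h, r=4.0, n=2000):
    # Iterate x -> min(r x (1 - x), h): the hard limiter h clips the chaotic map.
    traj = np.empty(n)
    for i in range(n):
        x = min(r * x * (1.0 - x), h)
        traj[i] = x
    return traj

traj = limited_logistic(0.3, h=0.81)
# Without the limiter the orbit is chaotic; with h = 0.81 it collapses onto the
# period-2 cycle {0.81, 0.6156} after a short transient.
print(np.unique(np.round(traj[-100:], 6)))
```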

6 Conclusion

We find that the origin of synchronized firing in the cortex - if its existence can
be proven experimentally - would most likely be in layer IV, and be heavily
based on recurrent connections and simultaneous thalamic feed-forward in-
put. We expect that firing in patterns, in this layer, is able to trigger specific
resonant circuits in other layers, where then the actual computation is done
(which we propose to be based on the symbol set of an infinity of locked states
[4]). It is a fact that the majority of ISI measurements from in vivo cat vi-
sual cortex neurons and simple statistical models of neuron interaction show
emergent long-tail behavior, but this also might be the result of the interaction
between different areas of the brain or a consequence of the input structure,
or of a mixture of all of them. Long-tail interspike interval distributions are
in full contrast to the current assumption of a Poissonian behavior that orig-
inates from the assumption of random spike coding. We propose that in the
measurements that are compatible with the Poissonian spike train assump-
tion, layer IV explicitly shuts down the long-range interactions via inhibitory
connections or by pumping energy into new temporal scales that no longer
sustain the ongoing activity. Long-tailed ISI distributions could, however, also
be of relevance from the point of view of the tendency of the system to tune
itself to a state of maximal complexity C_S(1,0). At such a state, due to the
structure of the measure, long range and slowly decaying correlations can be
expected to dominate the dynamical behavior.
Recently, it has been reported that noise can synchronize even coupled map
lattices with variable lattice maps [28]. It will be worthwhile investigating
whether this property also holds for our site maps (which we are willing to
believe), and whether this finally can be attributed to a kind of mean-field
activity generated by the noise. To what extent the assumptions made are the
relevant ones will have to be explored in future work, from the biological as
well as from the mathematical side.

References

1. Chaos and Noise in Biology and Medicine, eds. M. Barbi and S. Chillemi
(World Scientific, Singapore, 1998).
2. J.K. Douglass, L. Wilkens, E. Pantazelou, and F. Moss, Nature 365, 337-
340 (1993).
3. R. Stoop, D. Blank, A. Kern, J.-J. v.d. Vyver, M. Christen, S. Lecchini,
and C. Wagner, Cog. Brain Res., in press (2001).
4. R. Stoop, L.A. Bunimovich, and W.-H. Steeb, Biol. Cybern. 83, 481-489
(2000).
5. C. Von der Malsburg in Models of Neural Networks II, eds. E. Domany,
J. van Hemmen, and K. Schulten, 95-119 (Springer, Berlin, 1994).
6. W. Singer in Large-Scale Neuronal Theories of the Brain, eds. C. Koch
and J. Davis, 201-237 (Bradford Books, Cambridge MA, 1994).
7. R. Stoop, K. Schindler, and L.A. Bunimovich, Acta Biotheoretica 48,
149-171 (2000).
8. R. Stoop in Nonlinear Dynamics of Electronic Systems, eds. G. Setti, R.
Rovatti, G. Mazzini, 278-282 (World Scientific, Singapore, 2000).
9. F.C. Hoppensteadt and E.M. Izhikevich, Weakly Connected Neural Net-
works (Springer, New York, 1997).
10. R. Stoop, K. Schindler, and L.A. Bunimovich, Nonlinearity 13, 1515-1529
(2000).
11. J. Losson and M. Mackey, Phys. Rev. E 50, 843-856 (1994).
12. R. Stoop and P.F. Meier, J. Opt. Soc. Am. B 5, 1037-1045 (1988); J.
Peinke, J. Parisi, O.E. Roessler, and R. Stoop, Encounter with Chaos
(Springer, Berlin, 1992).
13. D. Hebb, The Organization of Behavior (Wiley and Sons, New York,
1949).
14. D. Blank, PHD thesis (Swiss Federal Institute of Technology ETHZ, 2001).
15. O.E. Roessler, Phys. Lett. A 71, 155-159 (1979).
16. A. Celletti and A. Villa, Biol. Cybern. 74, 387-393 (1996).
17. R. Stoop and N. Stoop, submitted (2001).
18. R. Stoop, J. Parisi, and H. Brauchli, Z. Naturforsch. a 46, 642-646 (1991).
19. C. Beck and F. Schlögl, Thermodynamics of Chaotic Systems: An In-
troduction (Cambridge University Press, Cambridge, 1993).
20. S. Hayes, C. Grebogi, E. Ott, and A. Mark, Phys. Rev. Lett. 73, 1781-
1784 (1994).
21. N. Corron, S. Pethel, and B. Hopper, Phys. Rev. Lett. 84, 3835-3838
(2000).
22. C. Wagner and R. Stoop, Phys. Rev. E 63, 017201 (2000).
23. C. Wagner and R. Stoop, J. Stat. Phys. 106, 97-107 (2002).
24. K. Pyragas, Phys. Lett. A 170, 421-428 (1992).
25. L. Abbott, J. Varela, K. Sen, and S.B. Nelson, Science 275, 220-224
(1997).
26. M.V. Tsodyks and H. Markram, Proc. Natl. Acad. Sci. USA 94, 719-723
(1997).
27. A. M. Thomson, J. Physiol. 502, 131-147 (1997).
28. C. Zhou, J. Kurths, and B. Hu, Phys. Rev. Lett. 87, 098101 (2001).

SELECTIVITY PROPERTY OF A CLASS OF ENERGY BASED
LEARNING RULES IN PRESENCE OF NOISY SIGNALS

A. BAZZANI, D. REMONDINI
Dep. of Physics and Centro Interdipartimentale Galvani, Univ. of Bologna,
v. Irnerio 46, 40126 Bologna, ITALY
and INFN sezione di Bologna. E-mail: bazzani@bo.infn.it

N. INTRATOR
Inst. for Brain and Neural Systems, Brown Univ., Providence 02912 RI, USA

G. CASTELLANI
DIMORFIPA and Centro Interdipartimentale Galvani, Univ. of Bologna, v. Tolara
di Sopra 50, 40064 Ozzano dell'Emilia, ITALY

We consider the selectivity property of a class of energy-based learning rules with
respect to the presence of clusters in the input distribution. These rules are a
generalization of the BCM learning rule and use the distribution moments of order
≥ 2. The analytical results show that selective solutions are possible for noisy input
data up to a certain signal-to-noise ratio and that the introduction of a bias in
the input signal could improve the selectivity. We illustrate this effect with some
numerical simulations in a simple case.

1 Introduction to the BCM neuron model

The BCM neuron [1] has been introduced to analyze the plasticity property of a
biological neuron. In particular the model takes into account the LTP (Long
Term Potentiation) [7] and LTD (Long Term Depression) [4] phenomena ob-
served in the visual neuron response under modification of the experience. In
its simplest formulation the BCM model assumes that the neuron response c
to an external stimulus d ∈ R^n is linear: c = m · d, where m are the synaptic
weights. The change of the weights m is described by the equation

ṁ = Φ(c, θ) d,  (1)

where θ is an internal threshold of the neuron and the typical shape of the
function Φ is given by figure 1. At each time the external signal d can be
considered the realization of a random variable of given distribution. If the
threshold θ is fixed, the equilibrium position c = θ is unstable and only
the LTD behavior is described by eq. (1). To overcome this difficulty one
introduces the hypothesis that the threshold θ depends on the external
environment^a, according to

θ = ⟨c²⟩^{q−1},  (2)

where ⟨ ⟩ denotes the average with respect to the input distribution.

Figure 1. Typical shape of the function Φ(c, θ) that defines the BCM neuron.

Using the definition (2), we get a stochastic differential equation whose
integration should be explained. Let Δt be the integration step; we have

m(t + Δt) = m(t) + Φ(m(t) · d, θ) d Δt;  (3)


if the realizations of d are independent, in the limit Δt → 0 one can prove
that m(t) satisfies the average equation [6]

ṁ = ⟨Φ(m(t) · d, θ) d⟩,  (4)


which replaces the initial equation (1). A simple class of possible functions
Φ is given by

Φ(c, θ) = c (c^{p−2} − θ),  (5)

and the average equation (4) reads ṁ = ∂E/∂m, where we have introduced
the energy function

E = ⟨c^p⟩/p − ⟨c²⟩^q/(2q).  (6)

" T h e external environment is assumed to be stationary.



The case p = 3 and q = 2 is commonly referred to as the BCM case. Due to
the presence of high order moments (p ≥ 3) the energy function (6) pro-
vides a complex non-supervised analysis of the data, performing exploratory
projection pursuit that seeks multi-modality in data distributions [5].
The existence of stable equilibria is related to the existence of local minima
for the energy (6). A simple calculation shows that the condition p < 2q is
necessary for the existence of stable non-trivial fixed points of equation
(4), which perform an LTP behavior. We are interested in stable solutions that
select the different clusters eventually present in the data distribution. An ideal
selective solution should give a high output c when the input d belongs to a
single fixed cluster and c = 0 if d belongs to the other clusters.
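A minimal numerical sketch of the stochastic rule (3) with the choice (5) in the BCM case p = 3, q = 2 (so Φ(c, θ) = c(c − θ) and θ = ⟨c²⟩) illustrates this selectivity; the learning rates and the two-cluster geometry are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
v = np.array([[1.0, 0.0],
              [0.0, 1.0]])               # two cluster centers (illustrative)
m = rng.normal(0.0, 0.1, 2)              # initial synaptic weights
eta, eta_theta, theta = 0.02, 0.05, 0.0

for t in range(20000):
    d = v[rng.integers(2)] + rng.normal(0.0, 0.05, 2)  # noisy sample from one cluster
    c = m @ d                            # linear neuron response c = m . d
    theta += eta_theta * (c**2 - theta)  # running estimate of <c^2>, cf. Eq. (2) with q = 2
    m += eta * c * (c - theta) * d       # Eq. (3) with Phi of Eq. (5), p = 3

print("outputs on the cluster centers:", v @ m)   # one output large, the other near zero
```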
A general approach to the study of the stability of the equilibrium
points gives the following Proposition [2]:
the stable equilibrium points m* of eq. (4) are related to the local maxima
y* of the homogeneous function

f(y) = (T y^p)/p,  y ∈ R^n,  (7)

constrained on the unit sphere y C y = 1, where T is the symmetric p-tensor
associated to the p-moments of the input distribution,

T_{i_1,…,i_p} = ⟨d_{i_1} ⋯ d_{i_p}⟩,  i_1,…,i_p = 1,…,n,  (8)

and C is the metric defined by the second moment matrix C_{ij} = ⟨d_i d_j⟩;
the correspondence is explicitly given by m* = p f(y*) y*.

2 Selective equilibria for the BCM neuron

According to a previous result [3], the BCM neuron has a selectivity property
with respect to n linearly independent vectors v_j ∈ R^n. Let the input signal
d be a discrete random variable that takes values among the vectors v_j with
equal probability 1/n and let o_j = y · v_j; a standard calculation shows that
the function (7) can be written in the form

f(o) = (1/(p n)) Σ_{j=1}^n o_j^p,  (9)

constrained on the unit sphere Σ_j o_j² = n. Let us suppose o_1 > 0^b; then by
differentiating eq. (9) we get the system

∂f/∂o_j + (∂f/∂o_1) ∂o_1/∂o_j = 0,  ∂o_1/∂o_j = − o_j/o_1,  j = 2, …, n.  (10)

Then the critical points are computed from the equations

o_j ( o_j^{p−2} − o_1^{p−2} ) = 0,  j = 2, …, n.  (11)
It is easy to check the existence of a local maximum o_j = 0, j = 2, …, n, and
o_1 = √n, so that the BCM neuron is able to select only the first vector v_1
among all the possible input vectors. According to the relation o_j = y · v_j,
the vectors y defined by the equilibrium solutions are directed as the dual
basis of the input vectors v_j. We call this property the selectivity property of
the BCM neuron [3]. We observe that the values of the selective solutions o are
independent of the lengths and the orientations of the input vectors v; this
is not true for the corresponding outputs c = m · v of the neuron, whose values
depend on the choice of the signals v. However, the numerical simulations
show that the basin of attraction of the stable selective solutions has a strong
dependence on the norms of the vectors v; this effect makes it very difficult
to distinguish signals of different magnitude, and we assume that a procedure
could be applied in order to normalize the input signals.
The situation becomes more complicated when we consider the effect of an
additive noise on the selectivity property of a BCM neuron. Let us define the
random variable

d = Σ_j v_j ξ_j + η,  (12)

where ξ is a random vector that selects with equal probability one of the vectors
v_j and η is a Gaussian random vector with ⟨η_j⟩ = 0 and ⟨η_i η_j⟩ = δ_{ij} σ²/n,
i, j = 1, …, n^c. For the sake of simplicity we consider the case p = 3, q = 2, but the
calculations can be generalized in a straightforward way. It is convenient to
introduce the matrix V whose rows are the vectors v and the positive definite
symmetric matrix S = V V^T, where V^T is the transposed matrix. The cubic
function (7) can be written in the form

f(o) = (1/(3n)) Σ_j o_j³ + (σ²/n²) (o · S^{−1} o) Σ_k o_k,  (13)

^b Due to the parity property of the function (7) we can restrict our analysis to the subspace
o_j > 0, j = 1, …, n.
^c This choice allows one to keep the signal-to-noise ratio constant while varying the dimensionality n.
and it is constrained on the ellipsoid

Σ_{j=1}^{n} o_j² + σ² (o · S^{−1} o) = n.  (14)

After some algebraic manipulations we get the equation

o_i + σ² (S^{−1}o)_i =
[ o_i² + (2σ²/n)(S^{−1}o)_i Σ_{j=1}^{n} o_j + (1/n)(n − Σ_{j=1}^{n} o_j²) ] ( o_1 + σ² (S^{−1}o)_1 )
/ [ o_1² + (2σ²/n)(S^{−1}o)_1 Σ_{j=1}^{n} o_j + (1/n)(n − Σ_{j=1}^{n} o_j²) ].  (15)

According to our ansatz, the r.h.s. of eq. (15) is of order O(σ²)/o_1, so that we
can estimate

o_i ∼ σ² ( O(1)/o_1 + a O(1) o_1 ),  (16)

where we have defined a = max_{l=2,…,n} |S^{−1}_{l1}| to take into account the leading
term of (S^{−1}o)_i, and O(σ²) to denote a term of order σ². Eq. (16) shows that
we can have a selective solution only if σ ≪ √n and a ∼ 1/n. If we substitute
eq. (16) into equation (15) and keep the leading terms, we get a
biquadratic equation for o_1:

o_1⁴ + σ⁴ (n − 1) [ O(1) + a O(1) o_1² ] + O(n) o_1² = 0,  (17)

where we have estimated σ² (o · S^{−1} o) ∼ O(n). If a ∼ 1/n and σ⁴ < O(n), the
equation (17) has a solution o_1 = O(√n) and eq. (16) provides o_i = O(σ²/√n),
according to our initial ansatz. Moreover, there is a critical value σ_c = O(n^{1/4})
for the existence of the selective solution: if σ > σ_c a bifurcation occurs in
the solution space and the selective solution becomes complex [2].
The basin of attraction of each stable solution cannot be analytically
estimated by the previous methods, but one has some indications from the
numerical results. A big difference among the measures of the various basins
could destroy the selectivity property, since some selective solutions would have
a very low probability of attracting the initial condition of the synaptic weights.
As we have observed, there is a strong dependence of the basin measure on the
norm of the input vectors; to avoid this effect we assume that ||d|| ≈ 1,
applying a normalization procedure when necessary. Therefore we
consider essentially the selectivity property of a BCM neuron with respect to
the different directions of the unperturbed signals v_j. The role of the noise
amplitude σ is to reduce the "attraction force" of the stable solutions and
eventually to change the stability property through a bifurcation phenomenon
when it exceeds a critical value.

3 Effect of a bias in the BCM neuron

The quantity a (see eq. (17)) plays a crucial role in the selectivity property
of the BCM neuron. It has an easy geometrical interpretation in the space of
the input signals d (cf. eq. (12)). According to the definition of the matrix
S, a = max_{l=2,…,n} |S^{−1}_{l1}| is directly related to the projection of the vector v_1
on the hyper-plane defined by the vectors v_2, …, v_n. Indeed a is equal to
0 when v_1 is orthogonal to the other vectors v_l, l = 2, …, n. The equations
(15) and (17) indicate that, for a fixed level σ of the noise, the selectivity
of the BCM neuron for the vector v_1 is maximum when v_1 is orthogonal
to the other unperturbed vectors v_2, …, v_n. This remark suggests that, for
a BCM neuron whose input distribution has the form (12), the selectivity
property for a given unperturbed vector v_1 is optimized if one introduces a
strategy to satisfy the condition v_1 · v_l = 0 for l = 2, …, n.
A common procedure to optimize the performance of a neural network is to
introduce a bias b in the input signals. This is equivalent to translating the
input distribution (cf. eq. (12)):

d = Σ_j v_j ξ_j − b + η = Σ_j (v_j − b) ξ_j + η,  (18)

where b ∈ R^n and we have used the property Σ_j ξ_j = 1. Then we choose b
in order to satisfy the conditions

(v_1 − b) · (v_l − b) = 0,  l = 2, …, n.  (19)

A simple solution to (19) is given by a linear combination of the vectors
v_2, …, v_n: b = Σ_{k=2}^n β_k v_k, where the coefficients β_k satisfy the equation

( v_1 − Σ_{k=2}^n β_k v_k ) · v_l = 0,  l = 2, …, n.  (20)

The solution exists since v_2, …, v_n are linearly independent; it is straightfor-
ward to verify that v_1 − b is also orthogonal to the vector b. If one introduces
the bias b, the matrix S is diagonalized in two blocks (the first block is the
element S_{11}) and a = 0; therefore we expect an increase of the selectivity for
the input signal v_1. We observe that different biases are necessary to increase

the selectivity with respect to the different input signals v_j, and an exhaustive
analysis of the input space would require a neural network of n neurons with
inhibitory connections.
The introduction of a bias b changes the norms of the input vectors, so
that it is necessary to apply a normalization procedure that could decrease the
signal-to-noise ratio; this may destroy the advantages of a bias. An efficient
procedure which automatically computes the bias is not yet available; this
problem is at present under consideration.
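Given the input vectors, the coefficients β_k of Eq. (20) solve a small linear (Gram) system; a numpy sketch with illustrative vectors:

```python
import numpy as np

def bias_for_first_vector(V):
    # Rows of V are v_1, ..., v_n. Solve Eq. (20) for beta and return
    # b = sum_k beta_k v_k, so that (v_1 - b) . v_l = 0 for l = 2, ..., n.
    v1, rest = V[0], V[1:]
    G = rest @ rest.T                     # Gram matrix (v_k . v_l), k, l >= 2
    beta = np.linalg.solve(G, rest @ v1)  # right-hand side (v_1 . v_l)
    return beta @ rest

V = np.array([[1.0, 0.2, 0.1],
              [0.3, 1.0, 0.0],
              [0.1, 0.4, 1.0]])           # illustrative, linearly independent inputs
b = bias_for_first_vector(V)
print((V[0] - b) @ V[1], (V[0] - b) @ V[2])   # both vanish up to round-off
```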

4 Numerical simulations

In order to show the selectivity property of a BCM neuron and the effect
of a bias, we have considered the planar case; the input distribution has
been defined by eq. (12), where the vectors v_1, v_2 lie on the unit circle at
different angles α. We have normalized each input vector d on the unit circle,
so that the effect of noise enters only in the phase; the noise level σ has been
varied in the interval [0, 1]. We study two cases for the energy function (6):
p = 3 and q = 2, which corresponds to the BCM neuron, and p = 4 and q = 3,
which simulates a kurtosis-like learning rule for the neuron. The initial synaptic
weights are chosen in a neighborhood of the origin near the vertical axis. To
quantify the separability we introduce the quantity

Δ = ( |m · v_2| − |m · v_1| ) / ( √2 σ ||m|| ),  (21)

which measures the distance between the projections of the signals v_1 and v_2
along the direction of the stable solution m*. When Δ > 1 we can detect
the presence of two clusters in the input distribution with high probability.
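The separability measure (21) is a one-line computation on the learned weights; the numbers below are illustrative:

```python
import numpy as np

def separability(m, v1, v2, sigma):
    # Eq. (21): difference of the projections of v2 and v1 along m, in units of
    # the noise-induced spread sqrt(2) * sigma * ||m||.
    return (abs(m @ v2) - abs(m @ v1)) / (np.sqrt(2) * sigma * np.linalg.norm(m))

m = np.array([0.1, 1.9])                            # illustrative converged weights
v1, v2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
print(separability(m, v1, v2, sigma=0.3))           # > 1: the two clusters are detectable
```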
In Fig. 2 we report Δ as a function of the noise level σ in the case of a
separation angle α = 90° and α = 10°; we have used a statistics of 10⁶ input
vectors. We observe that in the case of the BCM neuron and α = 90° the
selectivity decreases suddenly at a certain value of the noise level.
This effect is due to the presence of a critical noise level σ_c (see eq. (17)) at
which the selective solutions bifurcate in the complex plane. In the case of the
kurtosis-like neuron the presence of a critical value σ_c is not detected in
figure 2; this is a peculiar property of the kurtosis energy and of the relation
between the second and the fourth moments of a Gaussian distribution. This
is illustrated in fig. 3, where we have plotted the neuron outputs o = y · v_1
(black curve) and o = y · v_2 (red curve) in the case α = 90°: the presence
of a bifurcation for the BCM neuron (left plot) is clear. However, in the case
α = 10° the selectivity of the kurtosis-like neuron is lost very soon (fig. 2,
Figure 2. Separability property for a BCM (black circles) and kurtosis-like (red squares)
neuron; the left plot refers to a separation angle α = 90° between the input signals, whereas
the right plot refers to a separation angle α = 10°; we have used a statistics of 10⁶ input
vectors, and the threshold Δ = 1 is also plotted.


Figure 3. Normalized neuron outputs o = y · v_j, j = 1, 2, for the selective solution in the
BCM case (left plot) and in the kurtosis-like case (right plot); the separation angle between
the input vectors is α = 90°.

right); this is the effect of the appearance of a stable non-selective solution
that attracts the neuron. We have checked the effect of a bias b that selects
the second input v_2 in the case of a separation angle α = 10°. The results are
plotted in figure 4, which shows that the introduction of a bias increases
the selectivity both for the BCM and the kurtosis-like neuron; both neurons
lose their selectivity at a noise level σ ≈ 0.6, but the BCM neuron performs
a better separation of the input clusters.

Figure 4. Comparison of the separability property without (circles) and with (squares) a
bias, for a separation angle α = 10° between the input vectors: the left plot refers to the
BCM neuron, whereas the right plot refers to the kurtosis-like neuron.

5 Conclusions

The analytical study of the selectivity property of neurons whose learning
rules depend on the input distribution moments of order ≥ 3 suggests that
a better performance could be achieved by using a bias in the input data. A
numerical simulation on a simple example shows that this prediction is correct
also for noisy input data. However, further studies are necessary to understand
the effect of a low statistics in the input data, since the bias could decrease
the signal-to-noise ratio. Moreover, an algorithmic procedure to compute the
bias is not yet available.

References

1. E. L. Bienenstock, L. N. Cooper and P. W. Munro, Journal of Neuro-
science 2, 32 (1982).
2. A. Bazzani, D. Remondini, N. Intrator and G. Castellani, submitted to
Neural Computation (2001).
3. G. Castellani, N. Intrator, H. Shouval and L. N. Cooper, Network:
Comput. Neural Syst. 10, 111 (1999).
4. S. M. Dudek and M. F. Bear, Proc. Natl. Acad. Sci. 89, 4363 (1992).
5. J. H. Friedman, J. Am. Stat. Ass. 82, 249 (1987).
6. N. Intrator and L. N. Cooper, Neural Networks 5, 3 (1992).
7. A. Kirkwood, H.-K. Lee and M. F. Bear, Nature 375, 328 (1995).
Pathophysiology of Schizophrenia: fMRI and Working Memory

GIUSEPPE BLASI AND ALESSANDRO BERTOLINO

Università degli Studi di Bari, Dipartimento di Scienze Neurologiche e Psichiatriche

P.zza Giulio Cesare, 11 -70124 - Bari, Italy


E-mail: bertolia@psichiat.uniba.it

Functional Magnetic Resonance Imaging (fMRI) is an imaging technique with high spatial and temporal
resolution that allows in vivo investigation of the functionality of discrete neuronal
groups during their activity, utilizing the magnetic properties of oxy- and deoxy-hemoglobin. fMRI
permits the study of the normal and pathological brain during performance of various neuropsychological
functions. Several research groups have investigated prefrontal cognitive abilities (including working
memory) in schizophrenia using functional imaging. Even if with some contradictions, a large part of these
studies has reported a relative decrease of prefrontal cortex activity during working memory, termed
hypofrontality. However, hypofrontality is still one of the most debated aspects of the pathophysiology of
schizophrenia, because the results can be influenced by pharmacotherapy, performance and chronicity.
The first fMRI studies in patients with schizophrenia seemed to confirm hypofrontality. However, more
recent studies spanning a range of working memory loads showed that patients are hypofrontal at some
segments of this range, while they are hyperfrontal at others. These studies seem to suggest that the
alterations of prefrontal functionality are not only due to reduction of neuronal activity, but are probably
the result of complex interactions among various neuronal systems.

Functional Magnetic Resonance Imaging (fMRI)

Like its functional brain imaging forebears, single photon emission computed
tomography (SPECT) and positron emission tomography (PET), fMRI seeks to satisfy a long-
term desire in psychiatry and psychology to define the neurophysiological (or
functional) underpinnings of the so-called 'functional' illnesses.
For much of the last century, attempts to define the 'lesions' causing these
illnesses, such as schizophrenia, major depression and bipolar disorder, have been
elusive, leading to their heuristic differentiation from 'organic' illnesses, like stroke
and epilepsy, with more readily identifiable pathogenesis.
FMRI offers several advantages in comparison to functional nuclear medicine
techniques, including low invasiveness, no radioactivity, widespread availability
and virtually unlimited study repetitions [49]. These characteristics, plus the relative
ease of creating individual brain maps, offer the unique potential to address a
number of long-standing issues in psychiatry and psychology, including the
distinction between state and trait characteristics, confounding effects of medication
and reliability [80]. Finally, the implementation of 'realtime' fMRI will allow
investigators to tailor examinations individually while a subject is still in the
scanner, promising true interactive studies or 'physiological interviews' [26].

The physical basis of fMRI is the blood oxygenation level dependent (BOLD)
effect, which is due to the oxygenation-dependent magnetic susceptibility of hemoglobin.
Deoxyhemoglobin is paramagnetic, causing slightly attenuated signal intensity in
MRI image voxels containing deoxygenated blood. During neuronal firing, localized
increases in blood flow and oxygenation, and consequently reduced deoxyhemoglobin,
cause the MRI signal to increase. It is therefore assumed that these localized
increases in BOLD contrast reflect increases in neuronal activity.
The BOLD mechanism has been further clarified by more recent experiments.
By using imaging spectroscopy, which allows selective measurement of both
deoxyhemoglobin and oxyhemoglobin, Malonek and Grinvald [52] demonstrated
that hemoglobin-oxygenation changes in response to neuronal activation are
biphasic: an early (<3 s), localized increase in deoxyhemoglobin (often referred to
as the 'initial dip') is followed by a delayed decrease in deoxyhemoglobin and a
concomitant increase in oxyhemoglobin. Malonek et al. showed that the initial
increase in deoxyhemoglobin is caused by an increase in cerebral metabolic rate of
oxygenation without matching cerebral blood flow response. The later increase in
cerebral blood flow causes the subsequent decrease in deoxyhemoglobin and the
concomitant increase in oxyhemoglobin [51].

Working memory

Working memory is a construct that describes the ability to transiently store and
manipulate information on line to be used for cognition or for behavioral guidance
[2,40]. A key aspect of working memory is its capacity limitation, usually reflected
in cognitive testing as decreasing performance in response to increasing working
memory load [31,45,56,65]. Numerous functional neuroimaging studies have used
the spatial location and temporal characteristics of the 'activation' response during
working memory to localize this cognitive phenomenon to regionally distinct
components within a larger distributed network [8,16,17,18,19,44,54]. For example,
activation in dorsolateral prefrontal cortex (DLPFC) appears to be related to the
active maintenance of information over a delay [17,19] and/or the manipulation of
this information [68]. In contrast, activation in areas like the anterior cingulate is
more the result of increased effort or task complexity [3,12,58].
Parametric working memory tasks, most notably the popular 'n-back' task [32],
are ideally suited to examine issues of dynamic range, since working memory load
can be increased during the same experiment. The 'no-back' control task simply
requires the identification of the number currently seen. The working memory
conditions require the encoding of currently seen numbers and the concurrent recall
of numbers previously seen and retained over a delay: as memory load increases,
the task requires the recollection of one stimulus ('one-back'), two
stimuli ('two-back') or three stimuli ('three-back') previously seen, while encoding
additional incoming stimuli.
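For concreteness, the bookkeeping of the n-back paradigm can be sketched in a few lines; the stimulus set and sequence length below are illustrative, not those of any cited study.

```python
import random

def nback_targets(stimuli, n):
    # At position i the subject sees stimuli[i] and must recall stimuli[i - n],
    # i.e., the stimulus presented n steps earlier (None during the first n trials).
    return [(s, stimuli[i - n] if i >= n else None) for i, s in enumerate(stimuli)]

random.seed(0)
stimuli = [random.randint(1, 4) for _ in range(12)]   # digit stimuli (illustrative)
for seen, recall in nback_targets(stimuli, n=2):      # the 'two-back' condition
    print(f"see {seen}, recall {recall}")
```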
Callicott et al. have identified characteristics of working memory capacity
using this parametric 'n-back' working memory task involving increasing cognitive
load and ultimately decreasing task performance during fMRI in healthy subjects.
Loci within dorsolateral prefrontal cortex (DLPFC) evinced exclusively an
'inverted-U' shaped neurophysiological response from lowest to highest load,
consistent with a capacity-constrained response. Regions outside of DLPFC, in
contrast, were more heterogeneous in response and often showed early plateau or
continuously increasing responses, which did not reflect capacity constraints.
However, sporadic loci, including the premotor cortex, thalamus and superior
parietal lobule, also demonstrated putative capacity-constrained responses, perhaps
arising as an upstream effect of DLPFC limitations or as a part of a broader
network-wide capacity limitation. These results demonstrate that regionally specific
nodes within the working memory network are capacity-constrained in the
physiological domain [10].

Schizophrenia, fMRI and Working Memory

Based on multiple clinical, neuropathological and functional neuroimaging


studies, it is clear that schizophrenia is a brain disorder arising from subtle neuronal
deficits (for lack of more specific terminology) [73]. These deficits likely arise in a
few key regions, such as the dorsolateral prefrontal cortex and the hippocampal
formation, which result in widespread, multifaceted and devastating clinical consequences [77].
These neuronal deficits are clearly heritable, although in a complex fashion from
multiple genes interacting in an epistatic fashion with each other and the
environment [47,62]. It is reasonable to assume that these neuronal deficits, clearly
resulting in quantifiable behavioral abnormalities in schizophrenic patients, will
produce predictable, quantifiable aberrations in neurophysiology that can be
'mapped' using fMRI.
Attempts to map the physiological signature of putative prefrontal cortex (PFC)
neuronal pathology in schizophrenia have been numerous, but the results have been
inconsistent and controversial. Of the various functional neuroimaging findings in
schizophrenia, reduced function of PFC, so called 'hypofrontality', has been both
the most prominent and controversial [1,28,43,78]. According to its proponents,
hypofrontality is a marker of PFC dysfunction in schizophrenia that most reliably
arises during demanding cognitive tasks that tax PFC function [13,79]. A corollary
of this explanation is that cortical activation is relatively 'normal' during cognitive
tasks that are less taxing on PFC [4,74]. On the other hand, critics have raised a
number of objections regarding the relationship between hypofrontality and
schizophrenia, invoking issues of experimental design and related inconsistencies.
For example, an alternative interpretation of hypofrontality is that it arises as an
epiphenomenon of patient behavior, specifically task performance, which is typically
abnormal in patients with schizophrenia [28,42,61]. Thus, while many studies of
PFC function in schizophrenia have reported reduced PFC activation when patients
perform poorly [11,13,25,27,69,76,77], others have observed normal [21,28,55],
reduced [20,81] and even increased PFC activation [53,69] when patients'
performance is near normal. Regardless of these uncertainties, most authors agree
that the physiological responses of the schizophrenic brain are abnormal when
cognitive challenges are beyond these patients' behavioural capacity [75].
The interpretation of hypofrontality in the context of capacity limitations is
further complicated by recent studies in healthy subjects. For example, Goldberg et
al. found that healthy subjects performing a dual task paradigm became relatively
hypofrontal when pushed beyond their capacity to maintain accuracy [34]. In the
above cited study, Callicott et al. found evidence of an inverted-U shaped PFC
response to parametrically increasing working memory difficulty in healthy subjects
who became relatively hypofrontal as they were pushed beyond their working
memory capacity [10]. In addition, diminished PFC activity coincident with
diminished behavioural capacity has been found in single-unit recording studies in
non-human primates during working memory tasks [29,30] and in
electrophysiological studies in humans attempting complex motor tasks [33]. Thus,
under certain circumstances, hypofrontality can be a normal physiological response
to excessive load. Collectively, these data make it difficult to resolve whether
hypofrontality as a 'finding' in schizophrenic patients is a direct (i.e. disease
dependent) manifestation of PFC pathology or whether hypofrontality simply reflects
diminished behavioural capacity as might occur for any subject pushed beyond
capacity (i.e. disease independent).
To complicate matters further, there is also evidence that, in healthy subjects, the
relationship between reduced working memory capacity and PFC neuronal function
could manifest as over-activation of PFC (i.e. relative hyperfrontality). Rypma and
D'Esposito recently demonstrated that healthy controls who have longer reaction
times during a working memory task respond by increasing activation in dorsal but
not in ventral PFC [63]. They interpreted these results as a reflection of reduced
efficiency of working memory information manipulation within dorsal PFC.
Further, they interpreted the failure of reaction time to correlate with fMRI
activation in ventral PFC as a reflection of the putative link between ventral PFC
and working memory maintenance functions [22,57,66,67,72]. Thus, it is
conceivable that under certain circumstances schizophrenic patients might evidence
over-activation especially in dorsal PFC given their poor performance.
Working memory deficits are well documented in many studies of patients with
schizophrenia [14,23,24,35,36,37,46,59,70,80]. While working memory is thought
to be capacity limited in all subjects [45,56], schizophrenic patients appear to have
additional capacity limitations presumed to arise from dorsal PFC dysfunction
[38,39,41].
Callicott et al. [9], using fMRI, mapped the response to varying working
memory difficulty in patients with schizophrenia and healthy comparison subjects,
using the above-cited parametric version of the 'n-back' task. Consistent with earlier
neuropsychological studies, they found that patients with schizophrenia have limited
working memory capacity compared to healthy subjects. Although patients
activated the same distributed working memory network, the response of patients
with schizophrenia to increasing working memory difficulty was abnormal in dorsal
PFC. The salient characteristic of PFC dysfunction in schizophrenia in this
paradigm was not that the PFC was relatively 'up' or 'down' in terms of activation
when compared to healthy subjects; rather, the salient characteristic was an
inefficient dynamic modulation of dorsal PFC neuronal activity. While several
regions within a larger cortical network also showed abnormal dynamic responses
to varying working memory difficulty, the fMRI response in dorsal PFC (areas 9-
10, 46) met additional criteria for a disease-dependent signature of PFC neuronal
pathology. In fact, at higher memory difficulties (lback and 2back) wherein patients
showed diminished working memory capacity, dorsal PFC was consistently hyper-
responsive. Furthermore, there was a functional distinction between the response of
ventral and dorsal PFC, even though both were abnormal to some extent. In contrast
to dorsal PFC, ventral PFC (BA47) was hypo-responsive to varying memory
difficulty [9].
While hypofrontality as a finding generates continued debate, there is less
debate that PFC neuronal pathology exists in schizophrenia and that this pathology
may be more prominent in dorsal PFC (areas 9, 46). Similarities between some of
the clinical symptoms of schizophrenia - particularly between the negative or
deficit symptoms in schizophrenics and those of patients with frontal lobe lesions -
have long implicated PFC in schizophrenia [48,60]. Even though the heterogeneity
of clinical symptomatology implicates multiple brain regions, evidence that
schizophrenia fundamentally involves dorsal PFC neuronal pathology continues to
accumulate from many directions [50,64]. For example, proton magnetic resonance
spectroscopy studies have repeatedly found reduced concentrations of the
intraneuronal chemical N-acetylaspartate (NAA) in PFC [5,6,7,15,71]. Furthermore,
those studies that have examined sub-regions within PFC have found NAA
reductions in dorsal but not ventral PFC [5,6,7]. In addition, Callicott et al
demonstrated dorsal but not ventral PFC NAA reductions specifically predicted the
extent of negative symptoms in schizophrenic patients [9]. These and other data
provide a strong basis for the assumption that specific neurocognitive abnormalities
in schizophrenia (particularly working memory) result from physiological
dysfunctions of PFC neurons.
A note of caution is in order when attempting to attribute to one primary node


of dysfunction (here, dorsal PFC) behavioral or physiological abnormalities arising
during the use of a task that evokes a wide cortical network. It remains uncertain as
to whether these problems arise from inherent neuronal abnormality primarily in
dorsal PFC or as a result of abnormal feedforward or feedback input to PFC from
neuronal pathology in other brain areas. Because working memory relies on an
integrated network, it is likely that there are significant interactions between PFC
and other nodes within this network, including parietal cortex, anterior cingulate
and the hippocampal area. One could argue that these non-PFC regions may play
important modulatory roles in working memory, either directly or indirectly via their
reciprocal connectivity with PFC. Thus, it is conceivable that limited working
memory capacity in schizophrenic patients arises as a result of neural pathology in
these non-PFC regions to produce abnormal hyper-responsiveness to varying working
memory difficulty.

Figure 1: fMRI of patients with schizophrenia during the 1-back task.

In conclusion, patients with schizophrenia seem to have a combination of
reduced cortical physiological efficiency and behavioral capacity. Dorsal PFC
neuronal responses - putatively linked to more executive working memory
functions like information manipulations - may be relatively more impaired in
schizophrenia than ventral PFC regions associated with maintenance of working
memory content. A non-behavioral, biological measure of PFC neuronal pathology
also reveals that these patients have specific reductions in dorsal PFC NAA
measures that specifically predict functional abnormalities in dorsal PFC. Thus, we
infer that dorsal PFC neuronal pathology is a plausible cause of cortex-wide
abnormal physiological responses in working memory.

References

1. Andreasen N.C., Rezai K., Alliger R., Swayze V.W., Flaum M., Kirchner P.,
Cohen G., O'Leary D., Hypofrontality in neuroleptic-naive patients and in
patients with chronic schizophrenia: assessment with Xenon 133 single-photon
emission computed tomography and the Tower of London, Arch. Gen.
Psychiat. 49 (1992) pp. 943-958.
2. Baddeley A., Working memory (Clarendon Press, Oxford, 1986).
3. Barch D.M., Braver T.S., Nystrom L.E., Forman S.D., Noll D.C. and Cohen
J.D., Dissociating working memory from task difficulty in human prefrontal
cortex, Neuropsychologia 35 (1997) pp. 1373-1380.
4. Berman K.F., Illowsky B.P., Weinberger D.R., Physiological dysfunction of
dorsolateral prefrontal cortex in schizophrenia. IV. Further evidence for
regional and behavioral specificity, Arch. Gen. Psychiat. 45 (1988) pp. 616-
622.
5. Bertolino A., Callicott J.H., Elman I., Mattay V.S., Tedeschi G., Frank J.A.,
Breier A. and Weinberger D.R., Regionally specific neuronal pathology in
untreated patients with schizophrenia: a proton magnetic resonance
spectroscopic imaging study, Biol. Psychiat., 43 (1998) pp. 641-648.
6. Bertolino A., Callicott J.H., Nawroz S., Mattay V.S., Duyn J.H., Tedeschi G.,
Frank J.A. and Weinberger D.R., Reproducibility of proton magnetic resonance
spectroscopic imaging in patients with schizophrenia,
Neuropsychopharmacology, 18 (1998) pp. 1-9.
7. Bertolino A., Nawroz S., Mattay V.S., Barnett A.S., Duyn J.F., Moonen C.T.,
Frank J.A., Tedeschi G. and Weinberger D.R., Regionally specific pattern of
neurochemical pathology in schizophrenia as assessed by multislice proton
magnetic resonance spectroscopic imaging, Am. J. Psychiat., 153 (1996) pp.
1554-1563.
8. Braver T.S., Cohen J.D., Nystrom L.E., Jonides J., Smith E.E. and Noll D.C., A
parametric study of prefrontal cortex involvement in human working memory,
Neuroimage 5 (1997) pp. 49-62.
9. Callicott J.H., Bertolino A., Mattay V.S., Langheim F.J.P., Duyn J., Coppola
R., Goldberg T.E. and Weinberger D.R., Physiological dysfunction of the
dorsolateral prefrontal cortex in schizophrenia revisited, Cerebral Cortex, 10
(2000) pp. 1078-1092.
10. Callicott J.H., Mattay V.S., Bertolino A., Finn K., Coppola R., Frank J.A.
Goldberg T.E. and Weinberger D.R., Physiological characteristics of capacity
constraints in working memory as revealed by functional MRI, Cerebral Cortex
9 (1999) pp. 20-26.
11. Callicott J.H., Ramsey N.F., Tallent K., Bertolino A., Knable M.B., Coppola
R., Goldberg T., van Gelderen P., Mattay V.S., Frank J.A., Moonen C.T. and
Weinberger D.R., Functional magnetic resonance imaging brain mapping in
psychiatry: methodological issues illustrated in a study of working memory in
schizophrenia, Neuropsychopharmacology 18 (1998) pp. 186-196.
12. Carter C.S., Braver T.S., Barch D.M., Botvinick M.M., Noll D. and Cohen J.D.,
Anterior cingulate cortex, error detection, and the online monitoring of
performance, Science 280 (1998) pp. 747-749.
13. Carter C.S., Perlstein W., Ganguli R., Brar J., Mintun M., Cohen J.D.,
Functional hypofrontality and working memory dysfunction in schizophrenia,
Am. J. Psychiat. 155 (1998) pp. 1285-1287.
14. Carter C.S., Robertson L., Nordahl T., Chaderjian M., Kraft L. and O'Shora-
Celaya L., Spatial working memory deficits and their relationship to negative
symptoms in unmedicated schizophrenia patients, Biol. Psychiat., 40 (1996) pp.
1285-1287.
15. Cecil K.M., Lenkinski R.E., Gur R.E., Gur R.C., Proton magnetic resonance
spectroscopy in the frontal and temporal lobes of neuroleptic naive patients
with schizophrenia, Neuropsychopharmacology, 20 (1999) pp. 131-140.
16. Cohen J.D., Forman S.D., Braver T.S., Casey B.J., Servan-Schreiber D. and
Noll D.C., Activation of the prefrontal cortex in a nonspatial working memory
task with functional MRI, Human Brain Map 1 (1994) pp. 293-304.
17. Cohen J.D., Perlstein W.M., Braver T.S., Nystrom L.E., Noll D.C., Jonides J.
and Smith E.E., Temporal dynamics of brain activation during a working
memory task, Nature 386 (1997) pp. 604-608.
18. Courtney S.M., Petit L., Maisog J.H., Ungerleider L.G. and Haxby J.V., An
area specialized for spatial working memory in human frontal cortex, Science
279(1998) 1347-1351.
19. Courtney S.M., Ungerleider L.G., Keil K. and Haxby J.V., Transient and
sustained activity: a distributed neural system for human working memory,
Nature 386 (1997) 608-611.
20. Curtis V.A., Bullmore E.T., Brammer M.J., Wright I.C., Williams S.C.R.,
Morris R.G., Sharma T., Murray R.M. and McGuire P.K., Attenuated frontal
activation during a verbal fluency task in patients with schizophrenia, Am. J.
Psychiat, 155 (1998) pp. 1056-1063.
21. Curtis V.A., Bullmore E.T., Morris R.G., Brammer M.J., Williams S.C.R.,
Simmons A., Sharma T., Murray R.M. and McGuire P.K., Attenuated frontal
activation in schizophrenia may be task independent, Sch. Res. 37 (1999) pp.
35-44.
22. D'Esposito M., Aguirre G.K., Zarahn E., Ballard D., Shin R.K. and Lease J.,
Functional MRI studies of spatial and nonspatial working memory, Brain Res.
Cogn. Brain Res., 7 (1998) pp. 1-13.
23. Fleming K., Goldberg T.E., Binks S., Randolph C., Gold J.M. and Weinberger
D.R., Visuospatial working memory in patients with schizophrenia, Biol.
Psychiat, 41 (1997) pp. 43-49.
24. Fleming K., Goldberg T.E., Gold J.M. and Weinberger D.R., Verbal working
memory dysfunction in schizophrenia: use of a Brown-Peterson paradigm,
Psychiat. Res., 56 (1995) pp. 155-161.
25. Fletcher P.C., McKenna P.J., Frith C.D., Grasby P.M., Friston K.J. and Dolan
R.J., Brain activation in schizophrenia during a graded memory task studied
with functional neuroimaging, Arch. Gen. Psychiat. 55 (1998) pp. 1001-1008.
26. Frank J.A., Ostuni J.L., Yang Y., Shiferaw Y., Patel A., Qin J., Mattay, V.S.,
Lewis B.K., Levin R.L. and Duyn, J.H., Technical solution for an interactive
functional MR imaging examination: application to a physiologic interview and
the study of cerebral physiology. Radiology 210 (1999) pp. 260-268.
27. Franzen G. and Ingvar D., Absence of activation in frontal structures during
psychological testing of chronic schizophrenics, J. Neurol. Neurosurg.
Psychiat., 38 (1975) pp. 1027-1032.
28. Frith C.D., Friston K.J., Herold S., Silbersweig D., Fletcher P., Cahill C., Dolan
R.J., Frackowiak R.S., Liddle P.F., Regional brain activity in chronic
schizophrenic patients during the performance of a verbal fluency task, Br. J.
Psychiat. 167 (1995) pp. 343-349.
29. Funahashi S., Bruce C.J. and Goldman-Rakic P.S., Mnemonic coding of visual
space in the monkey's dorsolateral prefrontal cortex, J. Neurophysiol., 61 (1989) pp.
331-349.
30. Funahashi S., Bruce C.J. and Goldman-Rakic P.S., Neuronal activity related to
saccadic eye movements in the monkey's dorsolateral prefrontal cortex, J.
Neurophysiol., 65 (1991) pp. 1464-1483.
31. Fuster J.M., The prefrontal cortex (Raven Press, New York, 1980).
32. Gevins A.S., Bressler S., Cutillo B., Illes J., Miller J., Stern J. and Jex H.,
Effect of prolonged mental work on functional brain topography,
Electroenceph. Clin. Neurophysiol. 76 (1990) pp. 339-350.
33. Gevins A.S., Morgan N.H., Bressler S.I., Cutillo B.A., White R.M., Illes J.,
Greer D.S., Doyle J.C. and Zeitlin G.M., Human neuroelectric patterns predict
performance accuracy, Science, 235 (1987) pp. 580-585.
34. Goldberg T.E., Berman K.F., Fleming K., Ostrem J., VanHorn J.D., Esposito
G., Mattay V.S., Gold J.M. and Weinberger D.R., Uncoupling cognitive
workload and prefrontal cortical physiology: a PET rCBF study, Neuroimage, 7
(1998) pp. 296-303.
35. Goldberg T.E., Patterson K.J., Taqqu Y. and Wilder K., Capacity limitations in
short-term memory in schizophrenia: tests of competing hypotheses, Psychol.
Med., 28 (1998) pp. 665-673.
36. Goldberg T.E., Weinberger D.R., Berman K.F., Pliskin N.H. and Podd M.H.,
Further evidence for dementia of the prefrontal type in schizophrenia? A
controlled study of teaching the Wisconsin Card Sorting Test, Arch. Gen.
Psychiat., 44 (1987) pp. 1008-1014.
37. Goldberg T.E. and Weinberger D.R., Thought disorder, working memory and
attention: interrelationships and the effects of neuroleptic medications, Int.
Clin. Psychopharmacol., 10(Suppl 3) (1995) pp. 99-104.
38. Goldberg T.E. and Weinberger D.R., Probing prefrontal function in
schizophrenia with neuropsychological paradigms, Schizophr. Bull, 14 (1988)
pp. 179-183.
39. Goldman-Rakic P.S., Prefrontal cortical dysfunction in schizophrenia: the
relevance of working memory. In Psychopathology and the brain, ed. by
Carroll B.J. and Barnett J.E. (Raven Press, New York, 1991).
40. Goldman-Rakic P.S., Regional and cellular fractionation of working memory,
PNAS 93 (1996) pp. 13473-13480.
41. Goldman-Rakic P.S., Working memory dysfunction in schizophrenia, J.
Neuropsychiat. Clin. Neurosci., 6 (1994) pp. 348-357.
42. Gur R.C., Gur R.E., Hypofrontality in schizophrenia: RIP, Lancet 345 (1995)
pp. 1383-1384.
43. Ingvar D. and Franzen G., Distribution of cerebral activity in chronic
schizophrenia, Lancet 2 (1974) pp. 1484-1486.
44. Jonides J., Smith E.E., Koeppe R.A., Awh E., Minoshima S. and Mintun M.A.,
Spatial working memory in humans as revealed by PET, Nature 363 (1993) pp.
583-584.
45. Just M.A. and Carpenter P.A., A capacity theory of comprehension: individual
differences in working memory, Psychol. Rev. 99 (1992) pp. 122-149.
46. Keefe R.S., Roitman S.E., Harvey P.D., Blum C.S., DuPre R.L., Prieto D.M.,
Davidson M. and Davis K.L., A pen-and-paper human analogue of a monkey
prefrontal cortex activation task: spatial working memory in patients with
schizophrenia, Schizophr. Res, 17 (1995) pp. 25-33.
47. Kidd K.K., Can we find genes for schizophrenia?, Am J Med Genet 74 (1997)
pp. 104-111.
48. Kraepelin E., Dementia praecox and paraphrenia (E.&S. Livingstone,
Edinburgh, 1919).
49. Levin J.M., Ross M.H. and Renshaw P.F., Clinical applications of functional
MRI in neuropsychiatry, J Neuropsychiatry Clin Neurosci 7 (1995) pp. 511-
522.
50. Lewis, D.A., Development of the prefrontal cortex during adolescence: insights
into vulnerable neural circuits in schizophrenia, Neuropsychopharmacology, 16
(1997) pp. 385-398.
51. Malonek D., Dirnagl U., Lindauer U., Yamada K., Kanno I. and Grinvald A.,
Vascular imprints of neuronal activity: relationships between the dynamics of
cortical blood flow, oxygenation, and volume changes following sensory
stimulation, PNAS 94 (1997) pp. 14826-14831.
52. Malonek D. and Grinvald A., Interactions between electrical activity and
cortical microcirculation revealed by imaging spectroscopy: implications for
functional brain mapping, Science 272 (1996) pp. 551-554.
53. Manoach D.S., Press D.Z., Thangaraj V., Searl M.M., Goff D.C., Halpern E.,
Saper C.B. and Warach S., Schizophrenic subjects activate dorsolateral
prefrontal cortex during a working memory task as measured by MRI, Biol.
Psychiat, 45 (1999) pp. 1128-1137.
54. McCarthy G., Blamire A.M. Puce A., Nobre A.C., Bloch G., Hyder F.,
Goldman-Rakic P.S. and Shulman R.G., Functional magnetic resonance
imaging of human prefrontal cortex activation during a spatial working
memory task, PNAS 91 (1994) pp. 8690-8694.
55. Mellers J.D.C., Adachi N., Takei N., Cluckie A., Toone B.K. and Lishman
W.A., PET study of verbal fluency in schizophrenia and epilepsy, Br. J.
Psychiat. 173 (1998) pp. 69-74.
56. Miller G.A., The magical number seven, plus or minus two: some limits on our
capacity for processing information, Psychol. Rev. 63 (1956) pp. 81-97.
57. Owen A.M., Evans A.C. and Petrides M., Evidence for a two-stage model of
spatial working memory processing within the lateral frontal cortex: a positron
emission tomography study, Cereb. Cortex, 6 (1996) pp. 31-38.
58. Pardo J.V., Pardo P.J., Janer K.W. and Raichle M.E., The anterior cingulate
cortex mediates processing selection in the Stroop attentional conflict
paradigm, PNAS 87 (1990) pp. 256-259.
59. Park S. and Holzman P.S., Schizophrenics show spatial working memory
deficits, Arch. Gen. Psychiat, 49 (1992) pp. 975-982.
60. Piercy M., The effects of cerebral lesions on intellectual function: a review of
current research trends, Br. J. Psychiat., 110 (1964) pp. 310-352.
61. Price M., Friston K.J., Scanning patients with task they can perform,
Hum.Brain Map., 8 (1999) pp. 102-108.
62. Risch N. and Merikangas K., The future of genetic studies of complex human
diseases, Science 273 (1996) pp. 1516-1517.
63. Rypma B. and D'Esposito M., The role of prefrontal brain regions in
components of working memory: effects of memory load and individual
differences, PNAS, 96 (1999) pp. 6558-6563.
64. Selemon L.D. and Goldman-Rakic P.S., The reduced neuropil hypothesis: a
circuit based model of schizophrenia, Biol. Psychiat., 45 (1999) pp. 17-25.
65. Shallice T., From neuropsychology to mental structure (Cambridge University
Press, Cambridge, 1988).
66. Smith E.E. and Jonides J., Neuroimaging analyses of human working memory,
PNAS, 95 (1998) pp. 12061-12068.
67. Smith E.E. and Jonides J., Storage and executive processes in the frontal lobes,
Science, 283 (1999) pp. 1657-1661.
68. Smith E.E., Jonides J., Marshuetz C. and Koeppe R.A., Components of verbal
working memory: evidence from neuroimaging, PNAS 95 (1998) pp. 876-882.
69. Stevens A.A., Goldman Rakic P.S., Gore J.C., Fulbright R.K. and Wexler B.E.,
Cortical dysfunction in schizophrenia during auditory word and tone working
memory demonstrated by functional magnetic resonance imaging, Arch. Gen.
Psychiat. 55 (1998) pp. 1097-1103.
70. Stone M., Gabrieli J.D., Stebbins G.T. and Sullivan E.V., Working strategic
memory deficits in schizophrenia, Neuropsychology, 12 (1998) pp. 278-288.
71. Thomas M.A., Ke Y., Levitt J., Caplan R., Curran J., Asarnow R. and
McCracken J., Preliminary study of frontal lobe 1H MR spectroscopy in
childhood onset schizophrenia, J. Magn. Reson. Imag., 8 (1998) pp. 841-846.
72. Wagner A.D., Working memory contributions to human learning and
remembering, Neuron, 22 (1999) pp. 19-22.
73. Weinberger D.R., Implications of normal brain development for the
pathogenesis of schizophrenia, Arch. Gen. Psychiatry 44 (1987) pp. 660-669.
74. Weinberger D.R. and Berman K.F., Prefrontal function in schizophrenia:
confounds and controversies, Phil. Trans. R. Soc. Med. 351 (1996) pp. 1495-
1503.
75. Weinberger D.R. and Berman K.F., Speculation on the meaning of cerebral
metabolic hypofrontality in schizophrenia, Schizophr. Bull., 14 (1988) pp. 157-
168.
76. Weinberger D.R., Berman K.F. and Illowsky B.P., Physiological dysfunction of
dorsolateral prefrontal cortex in schizophrenia III. A new cohort and evidence
for a monoaminergic mechanism, Arch. Gen. Psychiat. 45 (1988) 609-615.
77. Weinberger D.R., Berman K.F., Suddath R. and Torrey E.F. Evidence of
dysfunction of a prefrontal-limbic network in schizophrenia: a magnetic
resonance imaging and regional cerebral blood flow study of discordant
monozygotic twins, Am J Psychiatry 149 (1992) pp. 890-897.
78. Weinberger D.R., Berman K.F. and Zec R.F., Physiologic dysfunction of
dorsolateral prefrontal cortex in schizophrenia. I. Regional cerebral blood flow
evidence, Arch. Gen. Psychiat. 43 (1986) pp. 114-124.
79. Weinberger D.R., Mattay V., Callicott J., Kotrla K., Santha A.,van Gelderen P.,
Duyn J., Moonen C. and Frank J., fMRI applications in schizophrenia research,
Neuroimage 4 (1996) pp. 118-126.
80. Wexler B.E., Stevens A.A., Bowers A.A., Sernyak M.J. and Goldman-Rakic
P.S., Word and tone working memory deficits in schizophrenia, Arch. Gen.
Psychiat., 55 (1998) pp. 1093-1106.
81. Yurgelun-Todd D.A., Waternaux C.M., Cohen B.M., Gruber S.A., English
C.D. and Renshaw P.F., Functional magnetic resonance imaging of
schizophrenic patients and comparison subjects during word production, Am. J.
Psychiat., 153 (1996) pp. 200-205.

ANN FOR ELECTROPHYSIOLOGICAL ANALYSIS OF NEUROLOGICAL DISEASE

R. BELLOTTI 1,2,4, A. DE CARLO 2,4, M. DE TOMMASO 1,3,
O. DIFRUSCOLO 3, R. MASSAFRA 2, V. SCIRUICCHIO 3, S. STRAMAGLIA 1,2,4
1 Center of Innovative Technologies for Signal Detection and Processing, Bari
2 Department of Physics, University of Bari
3 Department of Neurological and Psychiatric Sciences, University of Bari
4 I.N.F.N., Bari
E-mail: roberto.bellotti@ba.infn.it, m.detommaso@neurol.uniba.it

The aim of this study was to develop a discriminant analysis based both on classical linear methods, such as Fisher's Linear Discriminant (FLD) and the Likelihood Ratio Method (LRM), and on a non-linear Artificial Neural Network (ANN) classifier, in order to distinguish between patients affected by Huntington's disease (HD) and normal subjects. R.O.C. curve analysis revealed ANN to be the best classifier. Moreover, the network classified gene-carrier relatives as normal, thus suggesting the EEG to be a marker of the evolution of the HD.

1 Introduction

A study of the electroencephalogram (EEG) of patients affected by Huntington's disease 1 (HD), also known as chorea, and of their gene-carrier relatives was carried out in order to establish the best classifier to discriminate between healthy and non-healthy subjects.
To this aim three classification systems were considered and their performances were compared: Fisher's Linear Discriminant 2 (FLD), the Likelihood Ratio Method 3 (LRM) and an Artificial Neural Network 4 (ANN). R.O.C. curve analysis 5 showed ANN to have the best performance.
Moreover, gene-carrier relatives' data were submitted to the network in order to investigate the correlation between brain activity and the HD, thus revealing the EEG to be related to the phenotypic manifestation of the disease rather than to the genetic anomaly.
The paper is organized as follows: in the next section (2) the statistics and the process of data extraction are presented. The three classifiers are described in sections (3), (4) and (5), and their performances are compared in section (6), where R.O.C. curve analysis is introduced. In section (7) we focus our attention on the analysis of gene-carrier relatives' data, and the conclusions are drawn in section (8).

2 Data set
The data set here considered refers to 8 patients affected by the HD, 7 gene-
carrier first-degree relatives and 7 controls.
The EEG signal was sampled at 512 Hz in 2-second epochs on 19 electrodes positioned on the FP1, FP2, F7, F3, FZ, F4, F8, T3, C3, CZ, C4, T4, T5, P3, PZ, P4, T6, O1, O2 derivations, according to the 10-20 system.
Artifact-free random samples were selected to form a data set constituted of 160 epochs from patients' recordings, 160 epochs from controls and 71 from gene-carrier relatives. These were Fast-Fourier transformed and the power of the brain rhythms α (8-12.5) Hz, β (13-30) Hz, ϑ (4-7.5) Hz and δ (0.5-3.5) Hz was considered.
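As an illustration of this preprocessing step, the following sketch (a minimal Python example, assuming one 2-second epoch sampled at 512 Hz for a single electrode; array and function names are ours, not the original analysis code) computes the power of the four rhythms:

    import numpy as np

    FS = 512                                   # sampling frequency (Hz)
    BANDS = {"alpha": (8.0, 12.5), "beta": (13.0, 30.0),
             "theta": (4.0, 7.5), "delta": (0.5, 3.5)}

    def band_powers(epoch):
        """Spectral power of each rhythm for one 2-s EEG epoch."""
        spectrum = np.abs(np.fft.rfft(epoch)) ** 2        # power spectrum
        freqs = np.fft.rfftfreq(len(epoch), d=1.0 / FS)   # frequency axis (Hz)
        return {name: spectrum[(freqs >= lo) & (freqs <= hi)].sum()
                for name, (lo, hi) in BANDS.items()}

    epoch = np.random.randn(2 * FS)            # stand-in for one recorded epoch
    print(band_powers(epoch))

Repeating this over the 19 electrodes yields the 19-dimensional feature vectors used in the following.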
Due to the limited availability of the data, the cross-validation technique 6 was applied, which consists in considering all the possible 8 different partitions of the data into a training set of 140 elements and a test set of 20 elements: each partition is submitted to the classification systems and the corresponding outputs are summed for the signals-controls classification, while they are averaged for the gene-carrier relatives' analysis.

3 Fisher's Linear Discriminant


Fisher's linear discriminant 4 consists in maximizing the classes' separation through the projection of the data from the original 19-dimensional input space onto a 1-dimensional space defined by the versor

    w = S_W^{-1}(m_s - m_c) / \| S_W^{-1}(m_s - m_c) \|        (1)

where

    m_s = (1/N_s) \sum_{i \in C_s} x_i , \quad m_c = (1/N_c) \sum_{i \in C_c} x_i        (2)

are the mean vectors of the two classes (signals and controls) and

    S_W = \sum_{i \in C_c} (x_i - m_c)(x_i - m_c)^T + \sum_{i \in C_s} (x_i - m_s)(x_i - m_s)^T        (3)

is the within-class covariance matrix.
The new data variables satisfying the request of maximal separation are z = x \cdot w, defined in the range [-1, +1] (due to the normalization of the original x data), which are linearly re-scaled in the interval [0, 1] in order to be directly compared with the output variables of the other two classification systems.
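A minimal sketch of this construction (standard Fisher discriminant algebra following eqs. (1)-(3); the array names are hypothetical and this is not the authors' code):

    import numpy as np

    def fisher_versor(X_s, X_c):
        """Unit vector w of eq. (1) from signal/control training matrices."""
        m_s, m_c = X_s.mean(axis=0), X_c.mean(axis=0)          # eq. (2)
        S_W = ((X_c - m_c).T @ (X_c - m_c)
               + (X_s - m_s).T @ (X_s - m_s))                  # eq. (3)
        w = np.linalg.solve(S_W, m_s - m_c)                    # S_W^{-1}(m_s - m_c)
        return w / np.linalg.norm(w)                           # eq. (1)

    def fld_output(x, w):
        """Projection z = x . w, linearly re-scaled from [-1, +1] to [0, 1]."""
        return (x @ w + 1.0) / 2.0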

4 Likelihood Ratio Method


The likelihood ratio method is largely used in many fields dealing with classification tasks 7. Let us define the i-class conditional probability density function, P(x_k | i), as the probability density that the k-th feature is x_k given that it belongs to the i-th class (i = 1, 2). If we assume that the EEG recordings at the 19 different locations on the scalp of the patients are independent, the probability density that a patient generates the data vector x = (x_1, ..., x_19), given that he belongs to the i-th class, is

    P_i(x) = \prod_{k=1}^{19} P(x_k | i).        (4)

The likelihood ratio for the i-th class, normalized so that it lies in [0, 1], is then defined by

    L_i = P_i(x) / (P_1(x) + P_2(x)).        (5)
Due to the high dimensionality of the feature space, estimation of the probability density (4) by the histogram method would not work (see, e.g., the discussion on the curse of dimensionality in Bishop 4, p. 51). It is then estimated by a non-parametric approach, the kernel-based method 8, in which the functional form is not specified a priori but relies on the data itself. In particular, it is given by the sum over the training set of normal multivariate distributions, each one centered on a training data point:

    P_i(x) = (1/N_i) \sum_{j \in C_i} (2 \pi h^2)^{-d/2} \exp( - \| x - x_j \|^2 / (2 h^2) )        (6)
where d = 19 is the dimension of the data space and h is a free parameter which plays the role of a smoothing parameter of the whole distribution.
It is worth remarking that the performance of the LRM classifier may change dramatically as h is varied (see section 6).
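A sketch of the resulting classifier (a direct reading of eq. (6), combined with the normalized ratio (5) as reconstructed above; the names are ours):

    import numpy as np

    def kernel_density(x, X_train, h):
        """Parzen estimate of P_i(x), eq. (6), with Gaussian kernels of width h."""
        d = X_train.shape[1]                                  # d = 19
        sq_dist = np.sum((X_train - x) ** 2, axis=1)
        norm = (2.0 * np.pi * h ** 2) ** (d / 2.0)
        return np.mean(np.exp(-sq_dist / (2.0 * h ** 2)) / norm)

    def likelihood_signal(x, X_signal, X_control, h=0.1):
        """L_s in [0, 1] for the signal class, eq. (5)."""
        p_s = kernel_density(x, X_signal, h)
        p_c = kernel_density(x, X_control, h)
        return p_s / (p_s + p_c)

Varying h in such a sketch reproduces the strong dependence of the LRM output on the smoothing parameter noted above.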

5 Artificial Neural Network


Let us consider a two-layered feed-forward perceptron 9 whose general structure is represented in figure (1). The input layer has 19 neurons, according to the dimension of the feature space; the hidden layer has a number of neurons varying from 2 to 10, and the output layer has only one neuron which, in the training phase, is set to 1 when signals are submitted to the network and to 0 otherwise.

Figure 1: Two-layered feed-forward perceptron.

The output V_i of each neuron is a sigmoid transfer function of its input u_i = \sum_j w_{ij} V_j, where the sum is computed over the neurons of the previous layer:

    V_i = g(u_i) = 1 / (1 + e^{-\beta u_i}).        (7)

The weights are updated according to the gradient descent learning rule9:

    \Delta w_{ij}^{new} = -\eta \, \partial E / \partial w_{ij} + \alpha \, \Delta w_{ij}^{old}        (8)

where E is the error function


    E = (1/2) \sum_{\mu} [ \zeta^{\mu} - O^{\mu} ]^2 ,        (9)

which is a measure of the distance between the network outputs O^{\mu} and the target patterns \zeta^{\mu} = 1, 0 for signal and control data, respectively.
At each iteration the error function decreases until its minimum is attained.

The second term in (8), the so-called momentum term 10, represents a sort of inertia which is added in order to let the weights change in the average downhill direction, avoiding sudden oscillations of the w_{ij} around the minimum: this term allows the network to reach the solution more quickly.
The network parameters we used are: learning rate \eta = 0.01, momentum parameter \alpha = 0.1-0.3 and gain factor \beta = 1.
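One training iteration might be sketched as follows (a single hidden layer, on-line updates with momentum per eqs. (7)-(9); an illustrative reconstruction, not the original implementation):

    import numpy as np

    def sigmoid(u, beta=1.0):                   # eq. (7), gain factor beta
        return 1.0 / (1.0 + np.exp(-beta * u))

    def train_step(x, target, W1, W2, dW1, dW2, eta=0.01, alpha=0.2):
        """One gradient-descent update with momentum, eqs. (8)-(9)."""
        h = sigmoid(W1 @ x)                     # hidden activations, shape (H,)
        o = sigmoid(W2 @ h)                     # scalar network output
        delta_o = (o - target) * o * (1.0 - o)  # backpropagated error terms
        delta_h = W2 * delta_o * h * (1.0 - h)
        dW2 = -eta * delta_o * h + alpha * dW2  # momentum update, eq. (8)
        dW1 = -eta * np.outer(delta_h, x) + alpha * dW1
        return W1 + dW1, W2 + dW2, dW1, dW2

Here W1 has shape (H, 19) for a hidden layer of H neurons, W2 has shape (H,), and dW1, dW2 carry the previous weight changes needed by the momentum term.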

6 R.O.C. curves' analysis
Figures (2), (3) and (4) show typical output histograms in the α frequency region from the three classification systems. In particular, for the LRM analysis, the output histogram of the likelihood L_s, relative to the signal class, is considered both for controls (figure (3), up) and for signals (figure (3), down).
Even by visual comparison it is clear that the FLD output gives the worst discrimination between the classes, due to the strong overlap of the distributions, while both the LRM and ANN histograms are more separated and peaked, which means a better classification.
The subsequent step is to put an appropriate threshold on the output histograms, so that once a new data point (which is not known to be a signal or a control) is submitted to the classifier, a decision can be taken on its class depending on the side of the corresponding output with respect to the threshold itself.
In order to have a quantitative measure of the performance of the algorithms we use R.O.C. curve analysis, which is a good technique to estimate the quality of a classification in the particular case of a binary hypothesis to be tested. Given a threshold value on the output histogram, the sensitivity e and the specificity s are defined as

    e = n_{ss} / (n_{ss} + n_{sc}) , \quad s = n_{cc} / (n_{cc} + n_{cs})        (10)

where n_ss and n_cc are the numbers of correctly classified signal and control data, respectively, and n_cs and n_sc are the numbers of misclassifications.
Sweeping the threshold parameter through the [0, 1] interval, the graphical representation of the sensitivity e versus the specificity s gives the R.O.C. curve.
Figure 2: FLD α histogram: controls (up), signals (down).

Figure 3: L_s α histogram: controls (up), signals (down).


Figure 4: ANN α histogram: controls (up), signals (down).

In the case of a perfect classification the two terms n_sc and n_cs tend to zero and therefore both the sensitivity and the specificity tend to 1, as does the area under the R.O.C. curve: this area is, therefore, an index of the goodness of the performance of the classification system and will be used to compare our three classifiers.
R.O.C. curves are shown for FLD, LRM (for different values of the parameter h) and ANN in the different frequency regions: α (figure (5)), β (figure (6)), ϑ (figure (7)) and δ (figure (8)). In figure (9) the R.O.C. curves relative to ANN are drawn to compare the performances of the network in the different regions. By computing the areas a for the α frequencies, we find FLD to be the worst algorithm (a = 0.7954) and ANN the best one (a = 0.9877), while LRM has an intermediate performance, increasing as h decreases (a = 0.8163 for h = 0.5, a = 0.9314 for h = 0.1 and a = 0.945 for h = 0.05): this LRM behavior is verified also for the β, δ and ϑ frequencies. In the other three regions FLD overcomes LRM for h = 0.5, while the order of performance of ANN and LRM (h = 0.1, h = 0.05) is the same as in α. Concerning ANN, its performance increases from δ (a = 0.9396) to ϑ (a = 0.9661) to β (a = 0.9864) to α (a = 0.9877).
Therefore we are led to the conclusion that ANN is, for each frequency band, the best of the three classifiers, with a minimum performance for the δ rhythms.
Figure 5: α R.O.C. curves for FLD (stars), LRM with h = 0.5 (squares), h = 0.1 (triangles), h = 0.05 (rhombi) and ANN (circles).

Figure 6: β R.O.C. curves for FLD (stars), LRM with h = 0.5 (squares), h = 0.1 (triangles), h = 0.05 (rhombi) and ANN (circles).

Figure 7: ϑ R.O.C. curves for FLD (stars), LRM with h = 0.5 (squares), h = 0.1 (triangles), h = 0.05 (rhombi) and ANN (circles).

Figure 8: δ R.O.C. curves for FLD (stars), LRM with h = 0.5 (squares), h = 0.1 (triangles), h = 0.05 (rhombi) and ANN (circles).

Figure 9: ANN R.O.C. curves for the δ (stars), ϑ (triangles), β (rhombi) and α (circles) regions.

7 Gene-carrier relatives' analysis


Gene-carrier first-degree relatives have not yet shown the disease. From their analysis we expect that, if they were classified as non-healthy, the EEG would be closely related to the genetic anomaly, meaning that the EEG analysis would be predictive of the appearance of the HD. Otherwise, if they were classified as healthy, the EEG would be related to the manifestation itself of the disease, and so the EEG signal would be a marker of the evolution of the HD.
Gene-carrier relatives' data were submitted to the network and the corresponding outputs are shown in figures (10), (11), (12) and (13). As one can see, their distributions are peaked around zero in the α, β and ϑ regions, as the control ones are (see figure (4)). In the δ region, instead, the situation seems more confused, due to the spread of the distribution over the whole interval [0, 1]. However, the performance of the network in classifying signal and control data was shown to be the worst in this region with respect to the others (see figure (9)), thus implying the classification to be poor for the relatives too.
Therefore we are led to the following medical conclusion: gene-carrier relatives' data are classified as healthy by the ANN, thus meaning the EEG to be a marker of the evolution of the disease.
Figure 10: ANN gene-carrier relatives' α histogram.

Figure 11: ANN gene-carrier relatives' β histogram.


Figure 12: ANN gene-carrier relatives' ϑ histogram.

Figure 13: ANN gene-carrier relatives' δ histogram.



8 Conclusions
A comparison between the statistical methods of FLD and LRM and the ANN approach, used to evaluate the classification performance on EEG data taken from patients affected by the HD, was presented.
R.O.C. curve analysis clearly showed the supremacy of the non-linear ANN approach over the classical linear methods (FLD and LRM).
Moreover, the ANN classified gene-carrier relatives as controls, thus leading to the conclusion that the EEG is a marker of the phenotypic manifestation of the HD.

Acknowledgments
We thank Carmela Marangi (I.R.M.A.-C.N.R.) and Fabio Bovenga (Physics
Department, University of Bari) for helpful discussions.

References
1. For general aspects see, e.g., S.E. Folstein, R.J. Leigh, I.M. Parhad, M.F.
Folstein Neurology 36, 1986, 1279 - 1283.
2. R.A. Fisher Annals of Eugenics 7, 1936, 179 - 188. Reprinted in Con-
tributions to Mathematical Statistics, John Wiley, New York (1950).
3. See, e.g., R. Bellotti, M. Castellano, C. De Marzo, N. Giglietto, G.
Pasquariello, P. Spinelli Computer Physics Communications 78, 1993,
17 — 22 and references therein.
4. C.M. Bishop Neural Networks for Pattern Recognition Oxford University
Press, Oxford (1995).
5. J.A. Swets "Measuring the accuracy of diagnostic systems" Science, 240,
1285-1293, (1988).
6. M.Stone Journal of the Royal Statist. Society B 36 (1), 1974, 111-147.
M. Stone Math. Operat. Statist. Ser. Statistics 9 (1), 1978, 127 - 139.
G. Wahba, S. Wold Comm. in Statistics, Series A 4 (1), 1975, 1 — 17.
7. See, e.g., for application dealing with electron-hadron discrimination
K.K. Tang Astrophysics Journal 278, 1984, 881
A. Bungener et al. Nuclear Instruments Methods 214, 1983, 261.
8. M. Rosenblatt Annals of Mathematical Statistics 27, 1956, 832 - 837.
E. Parzen Annals of Mathematical Statistics 33, 1962, 1065 — 1076.
9. J. Hertz, A. Krogh, R.G. Palmer Introduction to the theory of neural
computation Addison-Wesley, 1991.
10. D.E. Rumelhart, J.L. McClelland Parallel Distributed Processing - Vol. 1
MIT Press, Cambridge, MA (1986), pp. 318.

DETECTION OF MULTIPLE SCLEROSIS LESIONS IN MRI'S WITH NEURAL NETWORKS

P. BLONDA, G. SATALINO, A. D'ADDABBO, G. PASQUARIELLO

I.E.S.I.- C.N.R., Via Amendola 166/5 - 70126 Bari, Italy, blonda@iesi.ba.cnr.it

A. BARALDI
I.S.A.O.-C.N.R., Via Gobetti 101; 3100-Bologna -Italy, baraldi@imga.bo.cnr.it.

R. DE BLASI
Cattedra e Servizio di Neuroradiologia, University of Bari, P.za G. Cesare, 70126 Bari, Italy

The objective of this paper is to assess the effectiveness of a two-stage learning classification system in the automatic detection of small lesions from Magnetic Resonance Images (MRIs) of a patient affected by multiple sclerosis. The first classification stage consists of an
unsupervised neural network module for data clustering. The second classification stage
consists of a supervised learning module employing a plurality vote mechanism to relate each
unsupervised cluster to the supervised output class having the largest number of
representatives inside the cluster. In this paper two different neural network algorithms, i.e.
the Enhanced Linde-Buzo-Gray (ELBG) algorithm and the well-known Self-Organizing Map
(SOM), have been employed as the clustering module in the first stage of the system,
respectively. The results obtained with the two different clustering algorithms have been
qualitatively and quantitatively compared in a set of classification experiments. In these
experiments, ELBG is equivalent to SOM in terms of classification accuracy and superior to
SOM with respect to the visual quality of the output map and robustness to changes in the
order and composition of the data presentation sequence. The results confirm the usefulness
of the neural classification system in the automatic detection of small lesions.

1 Introduction

The typical approach to automated recognition of tissue types includes multi-spectral analysis of Magnetic Resonance Images (MRIs) consisting of tissue-
dependent parameters such as the Proton Density (PD), T2 (the spin-spin relaxation
time) and Tl (the spin-lattice relaxation time). In recent years a novel three-
dimensional Tl-weighted gradient echo sequence, based on the turbo-flash
technique and called Magnetization-Prepared RApid Gradient Echo (MP-RAGE),
that can be generated from contiguous and very thin (1.3-3 mm) sections, allows
visual detection of small lesions typically affected by partial volume effects and
intersection gaps in Tl weighted Spin-Echo (SE) sequences [5], [6].
In this work, two per-pixel nearest multiple-prototype classifiers, based on a
hybrid two-stage learning framework [4], are compared, both qualitatively and
quantitatively, in the detection of small lesions from a data set consisting of PD-SE,
T2-SE, Tl-SE and MP-RAGE images of a volunteer affected by multiple sclerosis.
In this data set, supervised (labelled) image areas are manually selected by an expert
neuroradiologist to provide the learning algorithms with training and testing data
samples. In this classification framework, the first classification stage consists of a
pixel-based data clustering algorithm. In the second classification stage, a
supervised learning module employing a plurality vote mechanism relates each
unsupervised cluster to the supervised output class having the largest number of
representatives inside the cluster. Classification accuracy is assessed on a test set.
The Enhanced Linde-Buzo-Gray (ELBG) clustering algorithm and the Self-
Organizing Map (SOM) are employed, respectively, as the first classification stage
providing unsupervised learning. ELBG is a novel quantization algorithm capable
of providing a near-optimal solution to the Mean Square Error (MSE) minimisation
problem [10]. Owing to their complementary functional features, the Fully self-
Organizing Simplified Adaptive Resonance Theory (FOSART) clustering network
may be adopted to initialize ELBG [1], [2]. On the one hand, FOSART is on-line
learning, constructive (i.e., the number of processing elements is not fixed by the
user on an a priori basis before processing the data, rather it is set by the algorithm
depending on the complexity of the clustering task according to an optimization
framework) and cannot shift codewords through Voronoi regions. On the other
hand, ELBG is non-constructive, batch learning and capable of moving codewords
through Voronoi regions to reduce MSE.
For comparison with ELBG, SOM is selected from the literature as a well-
known and successful clustering network. SOM is on-line learning, soft-to-hard
(fuzzy-to-crisp) competitive, non-constructive and capable of employing topological
relationships between output nodes belonging to a 2-D output array [7]. The rest of
this paper is organized as follows. A brief overview of SOM, FOSART and ELBG
is provided in section 2. The data set, the classification method and the results are
illustrated in section 3. Conclusions follow in section 4.

2 Clustering networks

2.1 SOM
SOM and FOSART are both (fuzzy-to-crisp) competitive clustering networks, but,
unlike FOSART, SOM employs inter-node distances in a fixed output lattice rather
than inter-pattern distances in input space to compute learning rates. Noticeably,
unlike FOSART, SOM deals with topological relationships (e.g., adjacency) among
output nodes without explicitly dealing with inter-node (lateral) connections [2].
Despite its many successes in practical applications, SOM has some limitations [7]:
termination is not based on optimising any model of the process or its data [1], [2];
the size of the output lattice, the learning rate and the size of the resonance
neighbourhood must be varied empirically from one data set to another to achieve
useful results [3]; prototype parameter estimates may be severely affected by noise
points and outliers.
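For reference, a single SOM learning step on a fixed 2-D lattice can be sketched as follows (the standard Kohonen update; the grid layout and decay schedules are illustrative assumptions, not the exact configuration used in these experiments):

    import numpy as np

    def som_step(x, weights, grid, lr, sigma):
        """Move the best-matching unit and its lattice neighbours towards x.

        weights: (n_nodes, D) prototype vectors;
        grid:    (n_nodes, 2) fixed output-lattice coordinates.
        """
        bmu = np.argmin(np.sum((weights - x) ** 2, axis=1))   # best-matching unit
        lattice_d2 = np.sum((grid - grid[bmu]) ** 2, axis=1)  # inter-node distances
        h = np.exp(-lattice_d2 / (2.0 * sigma ** 2))          # neighbourhood function
        weights += lr * h[:, None] * (x - weights)
        return weights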

2.2 FOSART
FOSART is a soft-to-hard (fuzzy-to-crisp) competitive, minimum-distance-to-
means clustering algorithm capable of: i) generating processing units and lateral
(intra-layer) connections on an example-driven basis, and ii) removing processing
units and lateral connections on a mini-batch basis (i.e., based on statistics collected
over subsets of the input sequence to average information over the noise on the
data).Potential advantages of FOSART are listed in the following [2]: a)owing to its
soft-to-hard competitive learning strategy, FOSART is expected to be less prone to
being trapped in local minima and less likely to generate dead units than hard
competitive alternatives [3]; b)owing to its neuron removal strategy, it is robust
against noise; c) feed-back interaction between attentional and orienting subsystems
allows FOSART to self-adjust its network size depending on the complexity of the
clustering task; d) the expressive power of networks that incorporate competition
among lateral connections in a constructive framework, like FOSART and the
Growing Neural Gas (GNG) [2], is superior to that of traditional constructive or
non-constructive clustering systems (e.g., SOM) which employ no lateral
connection explicitly [2]. As a consequence, FOSART, features an application
domain extended to: vector quantization; entropy maximization; and structure
detection in input data to be mapped in a topologically correct way onto submaps of
an output lattice pursuing dimensionality reduction [1].

2.3 ELBG
ELBG is non-constructive, batch learning and capable of moving codewords
through Voronoi regions to reduce MSE. In ELBG, templates eligible for being
shifted and split are those whose "local" contribution to the MSE value is,
respectively, below and above the mean distortion. Templates eligible for being
shifted are selected sequentially and those eligible for being split are selected
stochastically (in a way similar to the roulette wheel selection in genetic
algorithms). Each selected pair of templates is adjusted locally based on the
traditional LBG (c-means) batch clustering algorithm [8]. In [10] ELBG is
initialized either randomly or with the splitting-by-two technique proposed in [8]. In
this work ELBG is initialised with the FOSART network.
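The stochastic selection of the templates to split can be sketched as a roulette wheel over the above-mean local distortions (a simplified reading of [10], omitting the full shift-and-split bookkeeping of ELBG):

    import numpy as np

    def select_split_candidate(local_mse, rng=None):
        """Pick one codeword to split, with probability proportional to its
        local contribution to the MSE, among those above the mean distortion."""
        rng = rng or np.random.default_rng()
        eligible = np.where(local_mse > local_mse.mean())[0]  # above-mean cells
        weights = local_mse[eligible]
        return rng.choice(eligible, p=weights / weights.sum())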

3 Experimental results

3.1 Data set


The multispectral data set consists of PD-SE, Tl-SE, T2-SE and Tl MP-RAGE sequences acquired with a Siemens Impact 1.0 Tesla MR scanner from a patient affected by multiple sclerosis with white matter lesions in the brain. The MP-RAGE slices from
foramen magnum to upper convexity are acquired at 3 mm of thickness to stress the
signal deriving from lesions actually present in the nervous system. A single slice
was randomly selected from the subset of the MRI volume that showed the lesions.
Figures 1.a and 1.b show, respectively, the Tl-SE and Tl MP-RAGE images. The
image size is 256 by 256 pixels. The input values, originally ranging between 0 and
4096, are scaled between 0 and 255 according to a linear transformation. Labelled
image areas, manually selected by an expert neuroradiologist in the raw images,
belong to classes: White Matter (WM), Grey Matter (GM), Cerebral Spinal Fluid
(CSF), Pathologic lesions (PT), Background (BC), and other (see Fig. 1.c). Approximately 66% of the 7544 supervised (labelled) pixels are extracted randomly for training the classification system (Table 1). The remaining pixels are used for
testing. Three different extractions are carried out to obtain three training and test
data set pairs characterised by different orders of the selected patterns to be fed to
the system. Thus, in each classification experiment, the classification accuracy is
averaged over three training/testing procedures.

Figure 1 Input images: (a) Tl MP-RAGE; (b) Tl-SE; and (c) labelled image areas.

Table 1

Class Labels      WM     GM     CSF    PT     BC     Other   Total
Training Points   1102   853    816    310    558    1314    4953
Test Points       573    441    435    169    290    683     2591

3.2 Classification training and testing


Two classification experiments are carried out to: i) compare the capabilities of ELBG and SOM in detecting small lesions due to multiple sclerosis in MR images, and ii) assess the utility of Tl MP-RAGE vs Tl-SE images. Let us identify a labelled (supervised) pixel as an input-output vector pair (X_i, Y_i), where X_i = (f_{i,1}, ..., f_{i,D}) ∈ R^D is an input data vector, D is the input space dimensionality, f_{i,k} ∈ R, i = 1, ..., M, k = 1, ..., D, is the feature component, M represents the number of input patterns, while Y_i = (y_{i,1}, ..., y_{i,L}), i = 1, ..., M, is the output labelling vector and L is the total number of classes. Classification results are averaged over three runs
where a different selection of training and testing data sets is adopted. During
learning, the unsupervised first stage of the TSH classifier employs a training set
where data labels are ignored, while the supervised second classification stage is
trained with labelled data, i.e., with (input, output) data pairs. Once the first
classification stage reaches convergence, the second classification stage is trained to
relate each cluster, detected by the first stage, to the supervised output class Yj
having the largest number of representatives inside the cluster.
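A sketch of this plurality-vote mapping (hypothetical array names; the cluster indices come from the converged first stage, the labels from the training set):

    import numpy as np

    def plurality_vote(cluster_ids, labels, n_clusters, n_classes):
        """Map each unsupervised cluster to the supervised class with the
        largest number of representatives inside that cluster."""
        counts = np.zeros((n_clusters, n_classes), dtype=int)
        for c, y in zip(cluster_ids, labels):   # integer cluster/class indices
            counts[c, y] += 1
        return counts.argmax(axis=1)            # cluster -> class lookup table

At test time a pixel is assigned to its nearest cluster prototype and then labelled through this lookup table.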
In the first set of classification experiments, the unsupervised first stage of the TSH
classifier is implemented as an ELBG module initialised by a FOSART clustering
network. In the first classification stage the number of input units is equal to the
number D of input spectral bands considered, whereas the number of output nodes
depends on the FOSART input parameter p. Increasing values of input vigilance
threshold p are employed until the testing Overall Classification Accuracy (OA) of
the TSH classification system remains constant or starts decreasing. Table 2 shows
the training and testing results of this first TSH classifier when the Tl MP-RAGE,
T2-SE and PD-SE image bands are used as input. The number of output nodes
detected by FOSART and the different p values employed by FOSART are
reported in the first and second column of Table 2, respectively. The number of
epochs required to train the TSH system is shown in the third column of Table 2 as
the sum of FOSART and ELBG training epochs. The OA percentage values of
FOSART and ELBG are reported in columns 4 and 5 for training data and column 6
and 7 for testing data, respectively. Figure 2(a) shows the classified image obtained
by the first TSH system that employs 174 output clusters and the Tl MP-RAGE,
T2-SE, PD-SE bands as input. A lower qualitative and quantitative performance is obtained when the traditional Tl-SE image replaces the Tl MP-RAGE band.
Table 2 ELBG average results. Input data: T1_MPR; PD_SE; T2_SE

                                     Training Data OA(%)   Test Data OA(%)
Out      Vigilance   Total
Neurons  Threshold   iterations      FOSART   ELBG         FOSART   ELBG    OA St.Dev
17       0.002       3+9             74.2     78.9         75.3     80.1    0.2
42       0.005       3+10            80.1     84.4         81.2     85.7    0.6
174      0.015       3+9             84.9     87.9         86.5     88.2    0.2
248      0.02        3+10            87.1     88.0         87.7     88.9    0.6

In the second set of experiments, the unsupervised first stage of the second TSH
classifier is implemented as a SOM where the number of input nodes is set equal to
the input space dimensionality D=3, while the number of output nodes is set equal
to the number of nodes detected in the first experiment by FOSART, to make any
classification comparison between the two experiments consistent. Table 3 shows
the training and testing results obtained by this second TSH classification system
when the Tl MP-RAGE, T2-SE and PD-SE image bands are used as input. These
results are almost equivalent to those shown in Table 2. The SOM learning rate α is
set to 0.02 in all simulations and the number of training epochs is set equal to the
total number of epochs required by the first TSH system to train. This number of
epochs is considered sufficient for SOM (in fact, the OA values of SOM do not
change significantly when the number of training epochs is increased). Figure 2 (b)
shows the image obtained by the second TSH classifier with the Tl MP-RAGE, T2-
SE, PD-SE input bands and 174 output clusters. In terms of performance stability
with respect to changes in the order and composition of the presentation sequence,
SOM features an OA standard deviation of 0.7 % during testing.

Table 3 SOM average results. Input data: T1_MPR; PD_SE; T2_SE; α = 0.02

Out      Iterations   Training Data   Test Data
Neurons               OA(%)           OA(%)      OA St.Dev
17       13           80.7            81.3       0.7
42       12           83.9            85.1       0.7
174      13           87.3            88.1       0.7
248      13           87.4            88.4       0.5

4 Results and conclusions

In multi-spectral MRI classification tasks, a TSH classification system performs
better when a Tl MP-RAGE image, featuring high anatomical definition, replaces
the traditional Tl-SE band. Exploitation of the whole set of MR image bands does
not significantly improve the TSH classification performance.
In our experiments, summarized in Tables 2 and 3 where, respectively, ELBG
and SOM are employed as the clustering stage of the TSH classification system,
similar performances are obtained in terms of OA. However, the ELBG module
performs better than SOM in terms of MSE minimization, especially when the
number of training epochs is small, as shown in Figure 4. The absolute difference in
MSE between ELBG and SOM decreases with the number of output clusters. In
terms of performance stability, ELBG is more robust than SOM to changes in the
order and composition of the presentation sequence.
Figure 4 MSE of SOM and ELBG with 42 and 174 clusters.

Besides the quantitative evaluations, Figures 2(a) and 2(b), generated by the two TSH classification systems employing 174 clusters, are qualitatively compared by an expert neuroradiologist, who considers Figure 2(a), generated by ELBG, more
significant than Figure 2 (b), produced by SOM. In this example, SOM detects more
false positives than ELBG, i.e., SOM tends to overestimate the lesion class to which
many interface areas located between white and grey matter are assigned. Both
ELBG and SOM are incapable of detecting a right frontal lesion, which is visible in
SE sequences but has a normal grey matter appearance in MP-RAGE.
Our experiments in multi-spectral MR image labelling seem to indicate that: a) the
ELBG and SOM clustering networks employed in the TSH classification scheme
are equivalent in terms of classification accuracy, b) ELBG is better than SOM in
minimizing MSE at small epoch numbers; c) ELBG is less sensitive to noise and/or false positives, which allows a more correct identification of multiple sclerosis lesions; d) ELBG is more stable than SOM with respect to small changes in the order and composition of the presentation sequence.
Future work will assess the utility of interslice and intersubject MR data in the detection of multiple sclerosis lesions by means of two-stage supervised learning classifiers where both classification stages employ labelled data pairs for training. In
this type of classifiers, the density of clusters (basis functions) is made independent
of input vector density but dependent on the complexity of the (input, output)
mapping at hand, to avoid generation of mixed clusters of input vectors that are
closely spaced in input space but belong to different classes.

References

1. A. Baraldi and P. Blonda, A survey on fuzzy neural networks for pattern
recognition: Part I, IEEE Trans. Systems, Man and Cybernetics - Part B:
Cybernetics 29 (1999) 778-785.
2. A. Baraldi and P. Blonda, A survey on fuzzy neural networks for pattern
recognition: Part II, IEEE Trans. Systems, Man and Cybernetics - Part B:
Cybernetics 29 (1999) 786-801.
3. J. C. Bezdek and N. R. Pal, Two soft relatives of learning vector quantization,
Neural Networks 8 (1995) 729-743.
4. P. Blonda, V. la Forgia, G. Pasquariello, and G. Satalino, Feature extraction
and pattern classification of remote sensing data by a modular neural system,
Optical Engineering 35 (1996) 536-542.
5. M. Brant-Zawadzki, G. D. Gillan, and W. R. Nitz, MP RAGE: a three-
dimensional, Tl-weighted, gradient-echo sequence. Initial experience in the
brain, Radiology 182 (1992) 769-775.
6. B. Johnston, M. S. Atkins, B. Mackiewich, and M. Anderson, Segmentation of
multiple sclerosis lesions in intensity corrected multispectral MRI, IEEE
Trans.on Medical Imaging 15 (1996) 154-169.
7. T. Kohonen, Self-Organizing Maps (Springer Verlag, Berlin, 1995).
8. Y. Linde, A. Buzo, and R. M. Gray, An algorithm for vector quantizer design,
IEEE Trans, on Communications 28 (1980) 84-94.
9. T. Martinetz and K. Schulten, Topology representing networks, Neural
Networks 7 (1994) 507-522.
10. M. Russo and G. Patane, The Enhanced-LBG algorithm, Neural Networks 14
(2001), 1219-1237.

MONITORING RESPIRATORY MECHANICS USING ARTIFICIAL NEURAL NETWORKS

G. PERCHIAZZI, G. HEDENSTIERNA
Department of Clinical Physiology,
Uppsala University Hospital, S-75185 Uppsala, Sweden
e-mail: zperchiazzi@yahoo.com

A. VENA, L. RUGGIERO, R. GIULIANI AND T. FIORE


Department of Emergency and Transplantation, Bari University,
Policlinico Hospital, Piazza Giulio Cesare, 11, 70124 Bari-Italy

Application of mechanical ventilation requires reliable tools for extracting respiratory mechanics. Artificial neural networks (ANNs) have been used by the authors to perform this specific task. In the reported experiments, ANNs have shown good performance on both simulated and real tracings of mechanical ventilation recordings. These results suggest that ANNs may play a role in the development of future bed-side monitoring tools.

1 Introduction

In animal species, the major task of the respiratory system is to exchange gases
between blood and atmosphere. In order to perform this particular task, the system
works like bellows: when the inspiratory muscles contract, the intra-thoracic
volume increases and a negative pressure in the airways is generated. The difference
in pressure between atmosphere and airways determines a gas flow towards the
internal, gas-exchanging part of the lung (alveoli).
Different pathologic conditions can affect this system. Situations that impair the
capacity of exchanging gas ("lung failure") or the efficiency of the gas flow
dynamics ("pump failure"), may require mechanical ventilation. It consists in
making the patient exchange gases by using an endotracheal tube connected to an
external cyclic pump ("mechanical ventilator").
The respiratory system is composed of a conduction system (devoted mainly to
convey gas to the respiratory part of the lung) and a respiratory part (where the gas
exchange effectively takes place). In relation to gas dynamics during artificial
ventilation, the mechanical properties of the respiratory system that have medical
importance are resistance (RRS) and compliance (CRS). The change of RRS and CRS from their normal values is an indicator of potential pathology.
Different techniques have been proposed to monitor RRS and CRS during
ongoing mechanical ventilation. Among them, the most widely used is the Interrupted Flow Technique (IFT); see figure 1. When the flow is constant, the interruption of its
delivery will cause a fall in pressure that is related to the resistive properties of the
respiratory system. Maintaining a constant gas volume in the lungs (preventing the patient from expiring) for some seconds, the recorded pressure in the airways after a transient is related mainly to the elastic components of the lung. Although the
described technique remains the gold standard for measuring respiratory mechanics
in ventilated patients, new approaches are necessary. The weak point of IFT is the
necessity of performing a maneuver on a ventilated patient, interrupting the
sequence of ventilation and requiring an operator who pushes an inspiratory-hold button (end-Inspiratory Hold Maneuver, e-IHM).

Figure 1: Interrupted Flow Technique. The tracings of airways pressure [cmH2O] and airways flow [l/sec] during an end-inspiratory hold show PPEAK, PDROP, PPLAT and PEEP; CRS = Tidal Volume / (PPLAT - PEEP) and RRS = (PPEAK - PDROP) / Inspiratory Flow.
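As an illustration, the two relations in figure 1 translate directly into the following sketch (pressures in cmH2O, flow in l/s; the function and variable names are ours):

    def respiratory_mechanics(p_peak, p_drop, p_plat, peep,
                              tidal_volume, insp_flow):
        """Compliance and resistance from an end-inspiratory hold (figure 1):
        CRS = Tidal Volume / (PPLAT - PEEP), RRS = (PPEAK - PDROP) / Flow."""
        c_rs = tidal_volume / (p_plat - peep)      # [l/cmH2O]
        r_rs = (p_peak - p_drop) / insp_flow       # [cmH2O/(l/s)]
        return c_rs, r_rs

    # Illustrative numbers only: 0.5 l tidal volume, plateau 15, PEEP 5,
    # peak 25, pressure after interruption 20, inspiratory flow 0.5 l/s.
    print(respiratory_mechanics(25.0, 20.0, 15.0, 5.0, 0.5, 0.5))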

There is a need for a tool capable of constantly monitoring respiratory mechanics during ongoing mechanical ventilation, with features of robustness and noise immunity. We tested an Artificial Neural Network (ANN) based technology for extracting respiratory system mechanics during ongoing mechanical ventilation.

2 Review of Literature

ANN-based technologies have been extensively used in different medical applications. A review of the literature was published in three issues of "The Lancet" [3,5,6], where it is possible to read that most of these applications regard computer-aided decision tools. Coming to signal analysis, it is possible to note that a large variety of tracings has been studied using ANNs: electrocardiograms [7], electromyograms [1], electroencephalograms [9], arterial pulse waveforms [2] and evoked potentials in anesthesia [8].
The application of ANNs to respiratory signal analysis shows that most of the efforts have been concentrated on pattern identification problems. Leon and Lorini [10] investigated the capability of ANNs to identify spontaneous and pressure support ventilation modes from respiratory signals. Wilks and English [21] used ANNs to classify the efficiency of respiratory patterns, in order to predict changes of the O2 saturation. Snowden et al. [20] fed an ANN with blood gas
parameters and the ventilator settings that determined them, in order to obtain new
ventilator settings. Bright et al. [4] have described the use of an ANN to identify
upper airway obstruction from flow-volume loops. Leon et al. [11] developed an
ANN-based system to detect esophageal intubation using airways flow and pressure
signals. Rasanen and Leon [19] , giving the respiratory tracings of healthy and oleic
acid injured lungs of dogs, trained an ANN to assess the presence of lung damage
and its extent.

3 Experiences of the authors

The aim of the experiments reported here was to test whether ANNs can assess respiratory system compliance and resistance starting from airway pressure and flow (PAW, FAW). To train an ANN it is necessary to provide examples of the tracings to be faced during its use.
In a preliminary phase, we used a software model of the respiratory system, developed on a computer and inspired by the studies of Otis [13,17] (see figure 2). The model provided curves that were obtained under different mechanical conditions. The ANN had to learn to associate the curves with the RRS and CRS that determined them. We implemented simulations of mechanical ventilation, varying the mechanical parameters and the ventilatory support. These first experiments showed the applicability of the method [15,16]. Then we decided to evaluate the performance of the method under noise-affected conditions.
In a joint project of the Department of Clinical Physiology of Uppsala
University (Sweden) and the Department of Emergency and Transplantation of Bari
University (Italy), we studied an animal model of acute lung injury (ALI). When
moving to biological models, the first problem to face is providing the large
number of examples needed for ANN training.

Figure 2: The Otis model of the lung

Our idea was to use the well known effect of oleic acid (OA) when injected
into a central vein of an animal: by acting on the lung structures, it modifies the
mechanical properties of the lung, creating a time-related damage that starts at its
administration (see Neumann et al. [12]). Ten pigs were ventilated in
Volume Controlled - Constant Flow Mechanical Ventilation (VC-CFMV) and ALI
was induced by multiple OA injections. We recorded PAW and FAW at different time
intervals, in order to have different snapshots of respiratory mechanics while the
damage was developing.
The ANN had to extract RRS and CRS from the recorded curves (which presented
an e-IHM). During the training phase, the curves plus the expected RRS and CRS
were given at the same time to the ANN. The expected RRS and CRS were obtained
by manually applying the IFT to each curve, performed by an expert. Then the
trained ANN was tested: only the tracings were given, and the yielded results were
compared to the expected ones. The ANN was successfully trained. At this point we
fed the ANN with tracings coming from a new group of four pigs, in order to
observe its performance in a prospective way. Performance on the assessment of
CRS remained very high, while adjustment of the ANN implementation was suggested
for the assessment of RRS. The results were published in the Journal of Applied
Physiology [14]. The described experiments demonstrated the applicability of the
method by comparing the gold standard (the IFT) and ANN-based technologies on
curves having an e-IHM.
A further step was to train an ANN to extract CRS from breaths not having an
e-IHM. Twenty-four pigs, ventilated in VC-CFMV, were studied. They underwent
ALI induction by multiple OA injections. At different time intervals, recordings of
more than ten breaths were obtained during steady state (see Figure 3). At the end of
each series, an e-IHM was performed. This last breath was used to calculate CRS
according to the IFT. The breath preceding the one having the e-IHM (and not having
any flow interruption) had to be given to the ANN (Dynamic Breath, DB). We gave
to the ANN the Pressure/Volume loop of each DB and the CRS calculated on the
successive breath (this last having an e-IHM). The ANN had to associate the DB to
the static CRS obtained by IFT on the successive breath. The results showed that
ANNs were able to extract static CRS without needing to stop inspiratory flow [18].
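As a hedged sketch of this regression step, the fragment below maps a sampled
pressure/volume loop of a dynamic breath to the static CRS measured by IFT on the
successive breath. The network size, the loop sampling and the stand-in data are
illustrative assumptions; the cited papers describe the architectures actually used.

# Sketch: regress static C_RS from sampled P/V loops of dynamic breaths.
import numpy as np
from sklearn.neural_network import MLPRegressor

n_breaths, n_points = 200, 50             # assumed dataset shape
rng = np.random.default_rng(0)
pv_loops = rng.normal(size=(n_breaths, 2 * n_points))  # stand-in for real P/V samples
c_rs_ift = rng.uniform(0.02, 0.06, size=n_breaths)     # stand-in IFT targets

ann = MLPRegressor(hidden_layer_sizes=(20,), max_iter=2000, random_state=0)
ann.fit(pv_loops[:150], c_rs_ift[:150])   # train on breaths with IFT labels
c_rs_pred = ann.predict(pv_loops[150:])   # predict C_RS for unseen breaths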

Figure 3: Experimental design for training ANNs without using e-IHM. The airway
pressure tracing over time is analyzed both by the Artificial Neural Network and by
the Interrupter Technique.

Conclusions

These experiments show that it is possible to extract lung mechanics variables
from respiratory tracings by applying ANN-based technologies. Future work has to
focus on taking advantage of the robustness and noise immunity of ANNs. In the field
of mechanical ventilation a new clinical necessity is arising: the aim of controlling a
ventilator in "closed-loop". This concerns the use of information coming on-line from
the connected patient (such as mechanics variables and blood gas partial pressures)
to titrate the ventilation strategy, breath by breath. The capability of interfacing
complex variables and the claimed robustness of their performance suggest that
ANNs will play a role in these future applications, both in information extraction
and in signal integration.
References

1. Abel, E.W., P.C. Zacharia, A. Forster, and T.L. Farrow. Neural network
analysis of the EMG interference pattern. Med Eng Phys 18 (1996) pp. 12-
17.
2. Allen, J. and A. Murray. Comparison of three arterial pulse waveform
classification techniques. J Med Eng Technol 20 (1996) pp. 109-114.
3. Baxt, W.G. Application of artificial neural networks to clinical medicine.
Lancet 346 (1995) pp. 1135-1138.
4. Bright, P., M.R. Miller, J.A. Franklyn, and M.C. Sheppard. The use of a
neural network to detect upper airway obstruction caused by goiter. Am J
Respir Crit Care Med 157 (1998) pp. 1885-1891.
5. Cross, S.S., R.F. Harrison, and R.L. Kennedy. Introduction to neural
networks. Lancet 346 (1995) pp. 1075-1079.
6. Dybowski, R. and V. Gant. Artificial neural networks in pathology and
medical laboratories. Lancet 346 (1995) pp. 1203-1207.
7. Heden, B., H. Olin, R. Rittner, and L. Edenbrandt. Acute myocardial
infarction detected in the 12-lead ECG by artificial neural networks.
Circulation 96 (1997) pp. 1798-1802.
8. Huang, J., Y. Lu, A. Nayak, and R. Roy. Depth of anesthesia estimation
and control. IEEE Transactions on Biomedical Engineering 46 (1999) pp. 76-
81.
9. Jando, G., R.M. Siegel, Z. Horvath, and G. Buzsaki. Pattern recognition of
the electroencephalogram by artificial neural networks. Electroencephalogr
Clin Neurophysiol 86 (1993) pp. 100-109.
10. Leon, M.A. and F.L. Lorini. Ventilation mode recognition using artificial
neural networks. Comp Biomed Res 30 (1997) pp. 373-378.
11. Leon, M.A., J. Rasanen, and D. Mangar. Neural network-based detection of
esophageal intubation. Anesth Analg 78 (1994) pp. 548-553.
12. Neumann, P., J.E. Berglund, E.F. Mondejar, A. Magnusson, and G.
Hedenstierna. Dynamics of lung collapse and recruitment during prolonged
breathing in porcine lung injury. J Appl Physiol 85 (1998) pp. 1533-1543.
13. Otis, A.B., C.B. Mckerrow, R.A. Bartlett, J. Mead, M.B. Mcilroy, N.J.
Selverstone, and E.P. Radford. Mechanical factors in distribution of
pulmonary ventilation. J.Appl.Physiol. (1956) pp. 427-443.
14. Perchiazzi, G., M. Hogman, C. Rylander, R. Giuliani, T. Fiore, and G.
Hedenstierna. Assessment of respiratory system mechanics by artificial
neural networks: an exploratory study. J Appl Physiol 90 (2001) pp. 1817-
1824.
15. Perchiazzi, G., L. Indelicato, N. D'Onghia, C. Coniglio, A.M. Fanelli, and R.
Giuliani. Assessing respiratory mechanics of inhomogeneous lungs using
artificial neural network: network design. Proceedings of APICE Congress
(1998) pp. 209-212.
16. Perchiazzi, G., L. Indelicato, N. D'Onghia, E. De Feo, A.M. Fanelli, and R.
Giuliani. Assessing respiratory mechanics of inhomogeneous lungs using
artificial neural network: preliminary results. Proceedings of APICE
Congress (1998) pp. 213-216.
17. Perchiazzi, G., S. Martino, G. Contino, M.E. Rosafio, F. Puntillo, V.M.
Ranieri, and R. Giuliani. Alveolar overdistension during constant flow
ventilation: study of a model. Acta Anaesthesiol Scand 40 (1996) pp. A210.
18. Perchiazzi, G., L. Ruggiero, M. Hogman, R. Giuliani, T. Fiore, and G.
Hedenstierna. Neural networks extract respiratory system compliance
without needing to stop respiratory flow. Intensive Care Med 26 (2000) pp.
S294.
19. Rasanen, J. and M. Leon. Detection of lung injury with conventional and
neural network-based analysis of continuous data. J Clin Monit 14 (1998)
pp. 433-439.
20. Snowden, S., K.G. Brownlee, S.W. Smye, and P.R.F. Dear. An advisory
system for artificial ventilation of the newborn utilizing a neural network.
Med Inform 18 (1993) pp. 367-376.
21. Wilks, P.A.D. and M.J. English. A system for rapid identification of
respiratory abnormalities using a neural network. Med Eng Phys 17 (1995)
pp. 551-555.
GENOMICS AND MOLECULAR
BIOLOGY

CLUSTER ANALYSIS OF DNA-CHIP DATA

EYTAN DOMANY
Department of Physics of Complex Systems, Weizmann Institute of Science,
Rehovot 76100, Israel
E-mail: eytan.domany@weizmann.ac.il

DNA chips are novel experimental tools that have revolutionized research in molec-
ular biology and generated considerable excitement. A single chip allows simul-
taneous measurement of the level at which thousands of genes are expressed. A
typical experiment uses a few tens of such chips, each focusing on one sample -
such as material extracted from a particular tumor. Hence the results of such an
experiment contain several hundred thousand numbers, which come in the form of
a table of several thousand rows (one for each gene) and 50-100 columns (one
for each sample). We developed a clustering methodology to mine such data. I
provide here a very basic introduction to the subject, with no prior knowledge of
any biology assumed. I will explain what genes are, what gene expression is and
how it is measured by DNA chips. I will also explain what is meant by "cluster-
ing" and how we analyze the massive amounts of data from such experiments. I
will present results obtained from analysis of data from brain tumors and
breast cancer.

1 Introduction

This talk, and the corresponding paper, have three parts, aimed at explaining the
meaning of the title. The first part is a crash course in biology, starting from genes
and transcription and ending with an explanation of what DNA chips are. The
second part is an equally concise introduction to cluster analysis, leading to
a recently introduced method, Coupled Two-Way Clustering (CTWC), that
was designed for the analysis and mining of data obtained by DNA chips. The
third section puts the two introductory parts together and demonstrates how
CTWC is used to obtain insights from the analysis of gene expression data in
several clinically relevant contexts, such as colon cancer and leukemia.

2 A Crash Course in Biology

2.1 Gene Expression


I present here a severely oversimplified description of many very complex
processes. My aim is to introduce only those concepts that are absolutely
essential for understanding the data that will be presented and analyzed. The
interested reader is referred to two excellent textbooks 1,2.
Figure 1. Caricature of a eucaryotic cell: its nucleus contains DNA, whereas the ribosomes
are in the cytoplasm.

Fig 1 depicts a schematic drawing of a eucaryotic cell, enclosed by its membrane.
Embedded in the cell's cytoplasm is its nucleus, surrounded and protected by its
own membrane. The nucleus contains DNA, a one-dimensional molecule, made of
two complementary strands, coiled around each other as a double helix. Each strand
consists of a backbone to which a linear sequence of bases is attached. There are
four kinds of bases, denoted by C, G, A, T. The two strands contain complementary
base sequences and are held together by hydrogen bonds that connect the two
matching pairs of bases: G-C and A-T.
A gene is a segment of DNA which contains the formula for the chemical
composition of one particular protein. Proteins are the working molecules of
life; nearly every biological function is carried out by a protein. Topologically,
a protein is also a chain, each link of which is one of 20 amino acids,
connected head to tail by covalent peptide bonds. A gene is nothing but an
alphabetic cookbook recipe, listing the order in which the amino acids are to
be strung when the corresponding protein is synthesized. Genetic information
is encoded in the DNA molecule in the linear sequence in which the bases on
the two strands are ordered; a triplet of three consecutive bases codes for one
particular amino acid.
The genome is the collection of all the chemical formulae for the proteins
that an organism needs and produces. The genome of a simple organism such
as yeast contains about 6400 genes; the human genome has between 40,000
and 60,000. An overwhelming majority (98%) of human DNA contains non-
coding regions (introns), i.e. strands that do not code for any particular
protein.
Here is an amazing fact: every cell of a multicellular organism contains its
entire genome! That is, every cell has the entire set of recipes the organism
may ever need; the nucleus of each of the reader's cells contains every piece
of information needed to make a copy (clone) of him/her! Clearly, cells of a
complex organism, taken from different organs, have entirely different functions,
and the proteins that perform these functions are very different.

Figure 2. Transcription involves synthesis of mRNA, a copy of the gene encoded on the
DNA (left). The mRNA molecules leave the nucleus and serve as the template for protein
synthesis by the ribosomes (right).

Cells in our retina need photosensitive molecules, whereas our livers do not make
much use of these. A gene is expressed in a cell when the protein it codes
for is actually synthesized.
There will be differences between the expression profiles of different cells,
and even in a single cell there are variations of expression, dictated by
external and internal signals that reflect the state of the organism and of the
cell itself.
Synthesis of proteins takes place at the ribosomes. These are enormous
machines (themselves made of proteins) that read the chemical formulae written
on the DNA and synthesize the protein according to the instructions. The
ribosomes are in the cytoplasm, whereas the DNA is in the protected environment
of the nucleus. This poses an immediate logistic problem: how does
the information get transferred from the nucleus to the ribosome?

2.2 Transcription
The obvious solution of information transfer would be to rip out the piece of
DNA that contains the gene that is to be expressed, and transport it to the
cytoplasm. The engineering analogue of this strategy is the following. Imagine
an architect, who has a single copy of a design for a building, stored on the
hard disk of his PC. Now he has to transfer the blueprint to the construction
site, in a different city. He probably will not opt for tearing out his hard
disk and mailing it to the site, risking it being irreversibly lost or corrupted.
Rather, he will prepare several diskettes, that contain copies of his design,
and mail these in separate envelopes.
This is precisely the strategy adopted by cells.
When a gene receives a command to be expressed, the corresponding
double helix of DNA opens, and a precise copy of the information, as written
on one of the strands, is prepared (see Fig 2). This "diskette" is a linear
molecule called messenger RNA (mRNA), and the process of its production,
subsequent reading by the ribosome and synthesis of the corresponding protein (a)
is called transcription. In fact, when many molecules of a certain protein are
needed, the cell produces many corresponding mRNAs, which are transferred
through the nucleus' membrane to the cytoplasm and are "read" by several
ribosomes. Thus the single master copy of the instructions, contained in the
DNA, generates many copies of the protein (see Fig 2). This transcription
strategy is prudent and safe, preserving the precious master copy; at the same
time it also serves as a remarkable amplifier of the genetic information.
A cell may need a large number of some proteins and a small number of
others; that is, every gene may be expressed at a different level. The manner
in which the instructions to start and stop transcription are given for a
certain gene is governed by regulatory networks, which constitute one of the
most intricate and fascinating subjects of current research. Transcription is
regulated by special proteins, called transcription factors, which bind to specific
locations on the DNA, upstream from the coding region. Their presence
at the right site initiates or suppresses transcription.
This leads us to the basic paradigm of gene expression analysis:

The "biological state" of a cell (or tissue) and the ongoing biological
processes are reflected by its expression profile: the expression levels
of all the genes of the genome. These, in turn, are reflected in the
concentrations of the corresponding mRNA molecules.

This paradigm is by no means trivial or perfectly true. One may argue
that the state of a cell at a given moment is defined by its chemical composition,
i.e. the concentrations of all the constituent proteins. There is no
assurance that these concentrations are directly proportional to the concentrations
of the related mRNA molecules. The rates of degradation of the
different mRNAs, the efficiency of their translation to proteins, the rate of
degradation of the proteins - all these may vary. Nevertheless, this is our
working assumption; specifically, we assume that for human cells the expression
levels of all 40,000 genes completely specify the state of the particular
tissue from which the cells were taken. The question we turn to next is: how does
one measure, for a given cell or tissue, the expression levels of thousands of genes?

"Actually the mRNA is "read" by one end of another molecule, transfer RNA; the amino
acid that corresponds to the triplet of bases that has just been read is attached to the other
end of the tRNA. This process, and the formation of the peptide bond between subsequent
amino acids, takes place on the ribosome, which moves along the mRNA as it is read.


2.3 DNA chips

A DNA chip is an instrument that measures simultaneously the concentration
of thousands of different mRNA molecules. It is also referred to as a
DNA microarray or macroarray, depending on the number of genes measured
(see ref. 3 for a recent review of the technology). DNA macroarrays, produced
by Affymetrix 4, can measure simultaneously the expression levels of up to
12,000 genes; the less expensive spotted arrays 5 do the same for several thousand.
Schematically, this is done by dividing a chip (a glass plate of about
1 cm across) into "pixels", each dedicated to one gene g. Billions of pieces
of single-strand DNA taken from g are attached to the dedicated pixel. The
mRNA molecules are extracted from cells taken from the tissue of interest
(such as tumor tissue obtained by surgery) and their concentration is greatly
enhanced. Fluorescent markers are attached to these mRNA molecules. The
solution of marked and enhanced mRNA molecules is placed on the chip, and
the mRNA molecules, originally extracted from the tissue, diffuse over
the dense forest of single-strand DNA placed on the chip. When
such an mRNA encounters a part of the gene of which it is a perfect copy, it
attaches to it - hybridizes - with a high affinity (considerably higher than with
a bit of DNA of which it is not a perfect copy). When the mRNA solution
is washed off, only those molecules that found their perfect match remain
stuck to the chip. Now the chip is illuminated with a laser, and these stuck
probes fluoresce; by measuring the light intensity emanating from each pixel,
one obtains a measure of the number of probes that stuck, which, in turn, is
proportional to the concentration of these mRNA in the investigated tissue.
In this manner one obtains, from a chip on which Ng genes were placed, Ng
numbers that represent the expression levels of these genes in that tissue. A
typical experiment provides the expression profiles of several tens of samples
(say Ns ≈ 100), over several thousand (Ng) genes. These results are summarized
in an Ng x Ns expression table; each row corresponds to one particular
gene and each column to a sample. Entry A_gs of such an expression table
stands for the expression level of gene g in sample s. For example, the
experiment on colon cancer, first reported by Alon et al 6, contains Ng = 2000
genes whose expression levels passed some threshold, over Ns = 62 samples,
40 of which were taken from tumor and 22 from normal colon tissue.
Such an expression table contains up to several hundred thousand numbers;
the main issue addressed in this paper concerns the manner in which

such vast amounts of data are "mined" to extract biologically relevant
meaning. Several obvious aims of the data analysis are the following:

1. Identify genes whose expression levels reflect biological processes of
interest (such as development of cancer).

2. Group the tumors into classes that can be differentiated on the basis of
their expression profiles, possibly in a way that can be interpreted in
terms of clinical classification. If one can partition tumors, on the basis
of their expression levels, into relevant classes (such as e.g. positive vs
negative responders to a particular treatment), the classification obtained
from expression analysis can be used as a diagnostic and therapeutic
tool (b).

3. Finally, the analysis can provide clues and guesses for the function of
genes (proteins) of yet unknown role (c).

This concludes the brief and very oversimplified review of the biology back-
ground that is essential to understand the aims of this research. In what
follows I present a method designed for mining such expression data.

3 Cluster Analysis

3.1 Supervised versus unsupervised analysis


Say we have two groups of samples that have been labeled on the basis of
some external information (i.e. information not contained in the expression table), such
as clinical identification of tumor and normal samples, and our aim is to
identify genes whose expression levels are significantly different for these two
groups. Supervised analysis is the most suitable method for this kind of task.
The simplest way is to treat the genes one at a time; for gene g we have
Ns expression levels A_gs, and we propose as a null hypothesis that these
numbers were picked at random, from the same distribution, for all samples
s. There are well established methods to test the validity of such a hypothesis
and to calculate for each gene a statistic whose value indicates whether the
null hypothesis should be accepted or rejected, as well as the probability
P_g for error (i.e. for rejecting the null hypothesis on the basis of the data,
"For example one hopes to use the expression profile of a tumor to select the most effective
therapy.
c
T h e statement "the human genome has been solved" means that the sequences of 40,000
genes are known, from which the chemical formulae of 40,000 proteins can be obtained.
Their biological function, however, remains largely unknown.

even though it is correct). An alternative supervised analysis uses a subset
of the tissues of known clinical label to train a neural network to separate
them into the known classes on the basis of their expression profiles. The
generalization ability of the network is then estimated by classifying a test set
of samples (whose correct labels are also known) that was not used in the
training process.
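As an illustration of the per-gene hypothesis test described above, the following
sketch computes a p-value P_g for every gene using Welch's t-test; the choice of
test, the threshold and the simulated data are assumptions made for the example.

# Per-gene two-sample test between labelled groups (simulated data).
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(1)
A = rng.normal(size=(2000, 62))          # expression table: genes x samples
labels = np.array([0] * 40 + [1] * 22)   # e.g. tumor vs normal

p_g = np.array([ttest_ind(row[labels == 0], row[labels == 1],
                          equal_var=False).pvalue for row in A])
candidates = np.where(p_g < 0.01)[0]     # genes for which the null is rejected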
The main disadvantage of supervised methods is that they are limited to
hypothesis testing. If one has some prior knowledge which can lead to a
hypothesis, supervised methods will help to accept or reject it. They will never
reveal the unexpected and never lead to new hypotheses, or to new partitions
of the data. For example, if the tumors break into two unanticipated classes
on the basis of their expression profiles, a supervised method will not be able
to discover this. Another shortcoming is the (quite common) possibility
of misclassification of some samples. A supervised method will not discover,
in general, samples that were mistakenly labeled and used in, say, the training
set.
The alternative is to use unsupervised methods of analysis. These aim at
exploratory analysis of the data, introducing as little external knowledge or
bias as possible, and "letting the data speak". That is, we explore the structure
of the data on the basis of correlations and similarities that are present in it.
In the context of gene expression, such analysis has two obvious goals:
1. Find groups of genes that have correlated expression profiles. The mem-
bers of such a group may take part in the same biological process.
2. Divide the tissues into groups with similar gene expression profiles. Tis-
sues that belong to one group are expected to be in the same biological
(e.g. clinical) state.
The method presented here to accomplish these aims is called clustering.

3.2 Clustering - statement of the problem.


The aims of cluster analysis 7,8 can be stated as follows: given N data points
x_i, i = 1, ..., N, embedded in D-dimensional space (i.e. each point is represented
by D components or coordinates), identify the underlying structure of
the data. That is, partition the N points into M clusters, such that points
that belong to the same cluster are "more similar" to each other than two
points that belong to different clusters. In other words, one aims to determine
whether the N points form a single "cloud", or two, or more; in respectable
unsupervised methods the number of clusters, M, is also determined by the
algorithm.

Figure 3. Left: Each zebra or giraffe is represented as a point on the neck length - coloration
shape plane. The points form two clouds marked by the black ellipses. At higher resolution
(controlled by the parameter T), we notice that the cloud of the giraffes is in fact composed
of two slightly separated sub-clouds. The corresponding dendrogram is presented on the
right hand side.

The clustering problem, as stated above, is clearly ill posed. No definition
was given for what is "more similar"; furthermore, as we will see, the manner
in which data points are assigned to clusters depends on the resolution at
which the data are viewed. The last concern is addressed by generating a
dendrogram, or tree of clusters, whose number and composition varies with
the resolution that is used. To clarify these points I present a simple example
of a process of "learning without a teacher", of which clustering constitutes
a particular case.
Imagine the following experiment: find a child who has never seen either
a giraffe or a zebra, and expose him to a large number of pictures of these
animals without saying a word of instruction. On each animal shown the child
performs a series of D measurements, two of which are most certainly L, the
length of the neck, and E, the eccentricity of the coloration (i.e. the ratio of
the small dimension to the large). Each animal is represented, in the child's
brain, as a point in a D-dimensional space. Fig. 3 depicts the projection of
these points on the two dimensional (L, E) subspace.
Even though initially the child will see "animals" - i.e. assign all points
to a single cloud - with time he will realize (as his resolution improves) that

in fact the data break into two clear clouds: one with small values of L and
E, corresponding to the zebras, and the second - the giraffes - with large L
and E ≈ 1. The child, not having been instructed, will not know the names
of the two kinds of animals he was exposed to, but I have no doubt that he
will realize that the pictures were taken of two different kinds of creatures. He
has performed a clustering operation on the visual data he has been presented
with.
Let us pause and consider the data and the statements that were made.
Are there indeed two clouds in Fig 3? As we already said, when the data are
seen at low resolution, they appear to belong to a single cloud of animals.
Improved resolution leads to two clouds - and closer inspection reveals that
in fact the cloud of giraffes breaks into two sub-clouds, of points that have
similar colorations but different neck lengths! Apparently there were mature,
fully developed giraffes with long necks, and a group of young giraffes with
shorter necks. Finally, when resolution is improved to the level of discerning
individual differences between animals, each one forms its own cluster. Thus
the proper way of representing the structure of the data is in the form of a
dendrogram, also shown in Fig 3. The vertical axis corresponds to a parameter
T that represents the resolution at which the data are viewed. The horizontal
axis is nominal - it presents a linear ordering of the individual data points
(as identified by the final partition, in which each cluster consists of one
individual point). The ordering is determined by the entire dendrogram - it
can be thought of as a highly nonlinear mapping of the data from D to one
dimension. In any clustering algorithm that we use, we should look for the
two features mentioned here: (a) yielding a dendrogram that starts with
a single cluster of N points and ends with N single-point clusters, and (b)
providing a one-dimensional ordering of the data.
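A minimal illustration of these two features, using generic average-linkage
hierarchical clustering (not the SPC algorithm introduced below) on synthetic
stand-ins for the (L, E) measurements:

# Dendrogram of synthetic "zebra" and "giraffe" points in the (L, E) plane.
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

rng = np.random.default_rng(2)
zebras   = rng.normal([1.0, 0.2], 0.1, size=(20, 2))  # short neck, elongated stripes
giraffes = rng.normal([3.0, 0.9], 0.2, size=(20, 2))  # long neck, E close to 1
X = np.vstack([zebras, giraffes])

Z = linkage(X, method="average")                  # the full tree of clusters
labels = fcluster(Z, t=2, criterion="maxclust")   # cut at a 2-cluster resolution
# dendrogram(Z) draws the tree; cutting it lower reveals the giraffe sub-clouds,
# and at the finest resolution every animal forms its own cluster.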

3.3 Clustering Algorithms


There are numerous clustering algorithms. Even though each aims at achieving
a truly unsupervised and objective method, every one has built in, implicitly
or explicitly, the bias of its inventor as to how a "cluster should look"
- e.g. a tight, spherical cloud, or a continuous region of high relative density
and arbitrary shape, etc.
Average linkage 7, an agglomerative hierarchical algorithm that joins pairs
of clusters on the basis of their proximity, is the most widely used for gene
expression analysis 9. K-means 7,8 and Self Organized Maps 10 are algorithms
that identify centroids or representatives for a preset number of groups; data
points are assigned to clusters on the basis of their distances from the centroids.
There are several physics related clustering algorithms, e.g. Deterministic
Annealing 11 and Coupled Maps 12. Deterministic Annealing uses
the same cost function as K-means, but rather than minimizing it for a fixed
number of clusters K, it performs a statistical mechanics type analysis, using a
maximum entropy principle as its starting point. The resulting free energy is
a complex function of the number of centroids and their locations, which are
calculated by a minimization process. This minimization is done by lowering
the temperature variable slowly and following minima that move and every
now and then split (corresponding to a second order phase transition). Since
it has been proved that in the generic case the free energy function exhibits
first order transitions, the deterministic annealing procedure is likely to follow
one of its local minima.
We use another physics-motivated algorithm, which maps the clustering
problem onto the statistical physics of granular ferromagnets 13.

3.4 Superparamagnetic Clustering (SPC)


The algorithm 14 assigns a Potts spin S_i to each data point i. We use q = 20
spin components; the results depend very weakly on q. The distance matrix

D_{ij} = \| \mathbf{x}_i - \mathbf{x}_j \|   (1)

is constructed. For each spin we identify a set of neighbors; a pair of neighbors
interacts via a ferromagnetic coupling J_{ij} = f(D_{ij}), with f a decreasing
function. We used a Gaussian decay but, since the interaction between non-
neighbors is set to J = 0, the precise form of the function has little influence
on the results.
The energy of a spin configuration {S} is given by

H[\{S\}] = \sum_{\langle i,j \rangle} J_{ij} \left[ 1 - \delta(S_i, S_j) \right]   (2)

where the summation runs over pairs of neighbors. We perform a Monte Carlo
simulation of this disordered Potts ferromagnet at a series of temperatures.
At each temperature T we measure the spin-spin correlation for every pair of
neighbors,

G_{ij} = \left\langle \left[ \delta(S_i, S_j) - 1/q \right] / \left[ 1 - 1/q \right] \right\rangle   (3)

where the brackets \langle \cdot \rangle denote an equilibrium average of the ferromagnet
(2), measured at T. If i and j belong to the same ordered "grain", we will
have G_{ij} ≈ 1, whereas if the two spins are uncorrelated, G_{ij} ≈ 0. Hence we
threshold the values of G_{ij}: if G_{ij} > 0.5, the data points i and j are connected

by an edge. The clusters obtained at temperature T are the connected components
of the resulting graph. In fact, the simple thresholding is supplemented
by a "directed growth" process, described elsewhere.
At T = 0 the system is in its ground state, all S_i have the same value, and
this procedure generates a single cluster of all N points. At T = ∞ we have N
independent spins, all pairs of points are uncorrelated and the procedure yields
N clusters, with a single point in each. Hence T clearly controls the resolution
at which the data are viewed; as it increases, we generate a dendrogram of
clusters of decreasing sizes.
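The following is a compressed, illustrative sketch of SPC at a single temperature
T, using a Swendsen-Wang style Monte Carlo for the Potts model of Eq. (2); the
neighbor construction, the local length scale and the sweep counts are simplifying
assumptions for the example, not the published implementation.

# Sketch of Superparamagnetic Clustering at one temperature T.
import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.csgraph import connected_components

def spc_clusters(X, T, q=20, k=5, n_sweeps=200, burn_in=50, seed=0):
    rng = np.random.default_rng(seed)
    n = len(X)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # undirected k-nearest-neighbor edges
    pairs = sorted({(min(i, j), max(i, j))
                    for i in range(n) for j in np.argsort(D[i])[1:k + 1]})
    pairs = np.array(pairs)
    d = D[pairs[:, 0], pairs[:, 1]]
    J = np.exp(-d ** 2 / (2 * d.mean() ** 2))    # Gaussian couplings J_ij
    S = rng.integers(q, size=n)                  # random initial Potts spins
    corr = np.zeros(len(pairs))
    for sweep in range(n_sweeps):
        # Swendsen-Wang: freeze aligned bonds with probability 1 - exp(-J/T)
        frozen = (S[pairs[:, 0]] == S[pairs[:, 1]]) \
                 & (rng.random(len(pairs)) < 1.0 - np.exp(-J / T))
        adj = coo_matrix((np.ones(frozen.sum()),
                          (pairs[frozen, 0], pairs[frozen, 1])), shape=(n, n))
        n_comp, comp = connected_components(adj, directed=False)
        S = rng.integers(q, size=n_comp)[comp]   # fresh spin per SW cluster
        if sweep >= burn_in:                     # accumulate spin coincidences
            corr += S[pairs[:, 0]] == S[pairs[:, 1]]
    G = (corr / (n_sweeps - burn_in) - 1 / q) / (1 - 1 / q)   # Eq. (3)
    keep = G > 0.5                               # threshold the correlations
    adj = coo_matrix((np.ones(keep.sum()),
                      (pairs[keep, 0], pairs[keep, 1])), shape=(n, n))
    return connected_components(adj, directed=False)[1]       # cluster labels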
This algorithm has several attractive features: (i) the number of
clusters is determined by the algorithm itself and not externally prescribed;
(ii) stability against noise in the data; (iii) the ability to identify a dense set of
points, forming a cloud of an irregular, non-spherical shape, as a cluster; (iv)
generating a hierarchy (dendrogram) and providing a mechanism to identify
in it robust, stable clusters.
The physical basis for the last feature is that if a cluster is made of a
dense set of points on a background of lower density, well separated from
other dense regions, it will form (become an independent magnetized grain)
at a low temperature T_1 and dissociate into subclusters at a higher temperature
T_2. The ratio of the temperatures at which a cluster "dies" and "is born",
R = T_2/T_1, is a measure of its stability.
SPC has been used in a variety of contexts, ranging from computer vision 15
to speech recognition 14. Its first direct application to gene expression data
was the analysis 16 of the temporal dependence of the expression levels
in a synchronized yeast culture 17,9, identifying gene clusters whose variation
reflects the cell cycle. (d) Subsequently, SPC was used 18 to identify primary
targets of p53, a tumor suppressor that acts as a transcription factor of central
importance in human cancer.
Our ability to identify stable (and statistically significant) clusters is of
central importance for our usage of SPC in our algorithm for gene expression
analysis.

(d) In this analysis we also discovered that the samples taken at even indexed time
intervals had been placed in a freezer!

4 Clustering Gene Expression Data

4.1 Two way clustering


The clustering methodology described above can be put to use for analysis
of gene expression data in a fairly straightforward way, bearing in mind the
questions and aims mentioned above.


We clearly have two main, seemingly distinct, aims: to identify groups of
co-regulated genes, which probably belong to the same machinery or network,
and to identify molecular characteristics of different clinical states and discriminators
between them. The obvious way to go about these two tasks is by
Two Way Clustering. First view the N samples as the objects to be clustered;
each is represented by a point in a G-dimensional "feature space", where G
is the number of genes for which expression levels were measured (in fact one
works only with a subset of the genes on a chip - those that pass some preset
filters). This analysis yields a dendrogram of samples, with each cluster containing
samples with sizeable pairwise similarities of their expression profiles,
measured over the entire set of genes.
The second way of looking at the same data is by considering the genes
as the objects to be clustered: G data points embedded in an N-dimensional
feature space. This analysis groups together genes on the basis of their correlations
over the full set of samples. In Fig. 4 we present the results of
two-way clustering of data obtained for 36 brain tumors (see the next section for
details). We show here the expression matrix, with the rows corresponding
to the genes and the columns to samples. The dendrograms that correspond to
the two clustering operations described above are shown next to the matrix,
whose rows and columns have been permuted according to the linear
order imposed by the two dendrograms.
This is the type of analysis that has been widely used in the gene expression
clustering literature. It represents a holistic approach to the problem,
using every piece of reliable information to look at the entire grand picture.
This approach does have, however, several obvious shortcomings; overcoming
these was the motivation to develop a method which can be viewed as taking a
more reductionist approach, while improving significantly the signal to noise
ratio of the processed data.
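In its simplest form, two way clustering amounts to clustering rows and columns
independently and permuting the expression matrix by both leaf orders, as in
Fig. 4. A sketch using generic average linkage (in place of SPC) on stand-in data:

# Two-way clustering: reorder an expression matrix by row and column trees.
import numpy as np
from scipy.cluster.hierarchy import linkage, leaves_list

rng = np.random.default_rng(3)
A = rng.normal(size=(358, 36))            # genes x samples expression matrix

gene_order   = leaves_list(linkage(A,   method="average"))
sample_order = leaves_list(linkage(A.T, method="average"))
A_reordered  = A[np.ix_(gene_order, sample_order)]   # matrix as displayed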

4.2 Coupled Two Way Clustering - Motivation

The main motivation for introducing CTWC 19 was to increase the signal to
noise ratio of the expression data. There are two different kinds of "noise"
the method is designed to overcome.
The first of these is a problem generated by the very advantage and most
exciting aspect of DNA-chips: the ability to view expression levels of a very
large number of genes simultaneously. Say one is left, after initial filtering,
with two thousand genes, and one wishes to study a particular aspect of the
samples (e.g. differentiating between several kinds of cancer). Chances are

Figure 4. Two-way clustering of brain tumor data; the two dendrograms, of genes and
samples, are shown next to the expression matrix.

that the genes which participate in the pathology of interest constitute only a
small subset of the total 2000 - say we have 40 genes whose expression indeed
distinguishes the samples on the basis of the process that is studied. Hence
the desired "signal" resides in 2% of the total genes that are analysed; the
remaining 98% behave in a way that is uncorrelated with these and introduce
nothing but noise. The contribution of the relevant genes to the distance
between a pair of samples will be overwhelmed by the random signal of the
much larger irrelevant set. My favorite example for this situation is that of
a football stadium, in which 99,000 spectators scream at random, while 1000
others are singing a coherent tune. These 1000 are, however, scattered all over
the stadium - the chances that a listener, standing at the center of the field,
will be able to identify the tune are very small. If only we could identify the
singers, concentrate them into one stand and point a directional microphone
at them - we could hear the signal!
In the language of gene expression analysis, we would like to identify the
relevant subset of 40 genes and use only their expression levels to characterize
the samples. In other words, to project the data points representing the
samples from the 2000-dimensional space in which they are embedded down
to a 40-dimensional subspace, and to assess the structure of the data (e.g.
do they form two or more distinct groups?) on the basis of this projected
representation. A similar effect may arise due to the subjects; a partition of

the genes which is much more relevant to our aims could have been obtained
had we used only a subset of the samples.
Both these examples have to do with reducing the size of the feature
space. Sometimes it is important to use the reduced set of features to cluster
only a subset of the objects. For example, when we have expression profiles
from two kinds of leukemia patients, ALL and AML, with the ALL patients
breaking further into two sub-families, T-ALL and B-ALL, the separation
of the latter two subclouds of points may be masked by the interpolating
presence of the AML group. In other words, a special set of genes will reveal
an internal structure of the ALL cloud only when the AML cloud is removed.
These two statements amount to a need to work with special submatrices
of the full expression matrix. The number of such submatrices is, however,
exponential in the size of the dataset, and the obvious question that arises is:
how can one select the "right" submatrices in an unsupervised and yet efficient
way? The CTWC algorithm provides a heuristic answer to this question.

4.3 Coupled Two Way Clustering - Implementation


CTWC is an iterative process, whose starting point is the standard two way
clustering mentioned above. Denote the set of all samples by S1 and that of
all genes used by G1. The notation S1(G1) stands for the clustering operation
of all samples, using all genes, and G1(S1) for clustering the genes using all
samples. From both clustering operations we identify stable clusters of genes
and samples, i.e. those for which the stability index R exceeds a critical value
and whose size is not too small. Stable gene clusters are denoted by GI, with
I = 2, 3, ..., and stable sample clusters by SJ, J = 2, 3, .... In the next iteration we
use every gene cluster GI (including I = 1) as the feature set, to characterize
and cluster every sample set SJ. These operations are denoted by SJ(GI)
(we clearly leave out S1(G1)). In effect, we use every stable gene cluster as a
possible "relevant gene set"; the submatrices defined by SJ and GI are the
ones we study. Similarly, all the clustering operations of the form GI(SJ) are
also carried out. In all clustering operations we check for the emergence of
partitions into stable clusters, of genes and samples. If we obtain a new stable
cluster, we add it to our list and record its members, as well as the clustering
operation that gave rise to it. If a certain clustering operation does not give
rise to new significant partitions, we move down the list of gene and sample
clusters to the next pair. A sketch of this bookkeeping is given below.
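The sketch captures only the bookkeeping of the iteration; stable_clusters() stands
for the SPC run plus the stability filter (R above threshold, size above a minimum)
described above and is assumed to be supplied by the caller.

# Schematic CTWC iteration over submatrices of the expression table A.
import numpy as np

def ctwc(A, stable_clusters, max_levels=2):
    """A: genes x samples matrix. stable_clusters(sub, axis) -> list of
    index arrays (relative to sub) of the stable clusters along that axis."""
    gene_sets = [np.arange(A.shape[0])]     # G1 = all genes
    sample_sets = [np.arange(A.shape[1])]   # S1 = all samples
    done = set()
    for _ in range(max_levels):
        found_new = False
        for gi, genes in enumerate(list(gene_sets)):
            for sj, samples in enumerate(list(sample_sets)):
                if (gi, sj) in done:
                    continue                # analyse each submatrix once
                done.add((gi, sj))
                sub = A[np.ix_(genes, samples)]
                for s in stable_clusters(sub, axis=1):   # operation SJ(GI)
                    sample_sets.append(samples[s]); found_new = True
                for g in stable_clusters(sub, axis=0):   # operation GI(SJ)
                    gene_sets.append(genes[g]); found_new = True
        if not found_new:                   # no new stable clusters: stop
            break
    return gene_sets, sample_sets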
This heuristic identification of relevant gene sets and submatrices is nothing
but an exhaustive search among the stable clusters that were generated.
The number of gene clusters emerging from G1(S1) is a few tens, whereas S1(G1)
usually generates a few stable sample clusters. Hence the next stage typically
involves less than a hundred clustering operations. These iterative steps stop
when no new stable clusters beyond a preset minimal size are generated, which
usually happens after the first or second level of the process.
In a typical analysis we generate between 10 and 100 interesting partitions,
which are searched for biologically or clinically interesting findings, on the
basis of the genes that gave rise to the partition and on the basis of available
clinical labels of the samples. It is important to note that these labels are used
a posteriori, after the clustering has taken place, to interpret and evaluate the
results.

5 Applications of CTWC for gene expression data analysis

So far CTWC has been applied primarily to the analysis of data from various
kinds of cancer. In some cases we used publicly available data, with no prior
contact with the groups that did the original acquisition and analysis. Our
initial work on colon cancer 6 and leukemia 20 falls in this category.
Subsequently we collaborated with a group at the University Hospital
at Lausanne (CHUV) on glioblastoma - in this work we were involved from
early in the data acquisition stage. Our current collaborations include work
on colon cancer and breast cancer. In the latter case we worked with publicly
available data, but its choice and the challenge to improve on existing analysis
came from our collaborators. We are also involved in work on leukemia and
on meiosis 21 in yeast; finally, the same method was applied successfully 22
to analyze data obtained from an "antigen chip", used to study the antibody
repertoire of subjects that suffer from autoimmune diseases, such as diabetes.
I will limit the discussion here to the presentation of a few select results obtained
for glioblastoma 23 and for breast cancer 25.

5.1 CTWC analysis of brain tumors (gliomas)


Brain tumors are classified into three main groups. Low grade astrocytomas
(A) are small tumors at an early stage of development. Cancerous
growth may recur after their removal, giving rise to secondary (SC) gliomas.
The third kind are primary (PR) glioblastomas (GBM); this classification is
assigned when, at the stage of initial diagnosis and discovery, the tumor is
already of a large size. A dataset S1 of 36 samples was obtained by a group
from the University Hospital at Lausanne 23. Of these, 17 were from PR GBM,
4 from SC, 12 from A and 3 from human glioma cell lines grown in
culture. Expression profiles were obtained using Clontech Atlas 1.2 arrays of

Figure 5. The operation S1(G5), clustering all tumors on the basis of their expression
profiles over the genes of cluster G5. A stable cluster, S11, emerges, containing all the
non-primary tumors and only two of the primaries.

1176 genes. For each gene g the measured expression value for tumor sample
s was divided by its value in a reference sample composed of a mixture of
normal brain tissue. We filtered the genes by keeping only those for which the
maximal value of this ratio (over the 36 samples) exceeded its minimal value
by at least a factor of two. 358 genes passed this filter and constituted our
full gene set G1, which was clustered using expression ratios over S1. The
G1(S1) clustering operation (see Fig 4) yielded 15 stable gene clusters. The
complementary operation S1(G1) did not yield any partition of the samples
that could be given a clear clinical interpretation.
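The factor-of-two filter described above is essentially a one-line operation; in this
sketch the expression ratios are simulated stand-ins for the real measurements.

# Keep genes whose max/min expression ratio across samples is at least 2.
import numpy as np

rng = np.random.default_rng(4)
ratios = rng.lognormal(size=(1176, 36))                 # stand-in (tumor/reference) table
keep = ratios.max(axis=1) >= 2.0 * ratios.min(axis=1)   # factor-of-two filter
G1 = ratios[keep]          # in the real data, 358 of the 1176 genes survive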
One of the stable gene clusters, G5, contained 9 genes. When the expression
levels of only these genes are used to characterize the tumors [in the
operation denoted S1(G5)], a large and stable cluster, S11, of 21 tumors
emerged (see Fig 5). This cluster contained all 12 astrocytomas and all 4
SC tumors. Three of the remaining 5 tumors of S11 were cell lines and two
were registered as PR GBMs. Pathological diagnosis was redone for these two
tumors; one was found to contain a significant oligoastrocytoma component,
and much of the piece of the other, used for RNA extraction, was
diagnosed as belonging to the normal brain infiltrative zone. Hence the expression
levels of G5 gave rise to a nearly perfect separation of PR from non-PR (A and SC)
tumors. The genes of G5 were significantly upregulated in PR and downregulated
in A and SC.
These findings made good biological sense, since three of the genes in
G5 (VEGF, VEGFR and PTN) are related to angiogenesis. Angiogenesis is
the process of development of blood vessels, which are essential for growth
of tumors beyond a certain critical size, bringing nutrition to and removing
waste from the growing tissue. Upregulation of genes that are known to be
involved in angiogenesis is therefore a logical consequence of the fact that PR GBM
are large tumors.
An important application of the method concerns investigation of the
genes that belong to G5; in particular, one of the genes of G5, IGFBP2,
was of considerable interest, with little existing knowledge of its function and role
in cancer development. Our finding that its expression is strongly correlated
with the angiogenesis related genes came as a surprise that was worth detailed
further study. The co-expression of genes from the IGFBP family with VEGF
and VEGFR has since been demonstrated in an independent experiment that tested
this directly for cell lines under different conditions.
This example demonstrates the power of CTWC: a subgroup of genes
with correlated expression levels was found to be able to separate PR from
non-PR GBM, whereas using all the genes introduced noise that wiped out
this separation. In addition, by looking at the genes of this correlated set,
we provided an indication of the role that a gene with previously unknown
function may play in the evolution of tumors.
For other findings of interest in this data set we refer the reader to the
paper by Godard et al 23.

5.2 Breast Cancer Data


In a different study, on breast cancer, we used publicly available expression
data of Perou et al 24. The choice of this particular data set was guided by D.
Botstein, who informed us that these data were of the highest quality and had been
submitted to the most extensive effort of analysis, and challenged us to demonstrate
that our method can extract findings that eluded previous treatments. The
results of this study are available 25; here I present only one particular new
finding.
The Stanford data contained expression profiles of 65 human samples (S1)
and 19 cell lines. 40 tumors were paired, with samples taken before and after
chemotherapy (with doxorubicin), to which 3 (out of 20) subjects responded
positively. 1753 genes (G1) passed initial filtering; the clustering operation
S1(G1), of all the samples using their expression profiles over all these genes,
did not yield any clear meaningful partitions. Perou et al realized the same
point that motivated us to construct CTWC, namely that one has to
prune the number of genes that are used in order to improve the signal to
noise ratio.

Figure 6. The operation S1(G46), clustering all tumors on the basis of the proliferation
related genes of G46. We found a cluster (b) which contained all three samples from
patients for whom chemotherapy was successful, taken before the treatment. Cluster (b)
contained 10 out of the 20 "before" samples.

They ranked the genes according to a figure of merit they introduced,
which measures the proximity of the expression profiles of the two samples
taken from the same patient before and after chemotherapy, versus the
(expectedly larger) dissimilarity of samples from different patients. The 496 top
scorers constituted their "intrinsic gene set", which was then used to cluster
the samples.
We did not use this intrinsic set but, rather, applied CTWC to the full
sets of samples and genes. In the G1(S1) operation we found several stable
gene clusters. One of these, G46, contained 33 genes whose expression levels
correlate well with the cells' proliferation rates. Only 2 of these made
it into the intrinsic set of Perou et al; hence they could not have found any
result that we obtained on the basis of these genes.
The operation S1(G46) identified three main clusters: (a) samples with
low proliferation rates - these are 'normal breast-like'; (b) samples with intermediate,
and (c) samples with high proliferation rates. Interestingly, the "before
treatment" samples taken from all three tumors for which chemotherapy did
succeed were in cluster (b), whereas the corresponding "after treatment" samples
were in (a), the 'normal breast-like' cluster. Therefore the genes of
G46 can perhaps be used a posteriori, to indicate success of treatment on
the basis of their expression measured after treatment and, more importantly,
they may have predictive power with respect to the probability of success of the
doxorubicin therapy that was used. Intermediate expression of the G46 genes
may serve as a marker for a relatively high success rate of the doxorubicin
treatment (3/10 versus 3/20 for the entire set of "before treatment" samples).
Clearly these statements are backed only by statistics based on small
samples, but they do indicate possible clinical applications of the method,
provided experiments on more samples strengthen the statistical reliability of
these preliminary findings.

6 Summary

DNA chips provide a new, previously unavailable glimpse into the manner
in which the expression levels of thousands of genes vary as a function of
time, tissue type and clinical state. Coupled Two Way Clustering provides
a powerful tool to mine large scale expression data by identifying groups of
correlated (and possibly co-regulated) genes which, in turn, are used to divide
the samples into biologically and clinically relevant groups. The basic "engine"
used by CTWC is a clustering algorithm rooted in the methodology of, and
insight gained from, Statistical Physics.
The extracted information may enlarge our body of general basic knowledge
and understanding, especially of gene regulatory networks and processes.
In addition, it may provide clues about the function of genes and their role
in various pathologies; one can also hope to develop powerful diagnostic and
prognostic tools based on gene microarrays.

Acknowledgments

I have benefited from the advice and assistance of my students G. Getz, I. Kela,
E. Levine and many others. I am particularly grateful to the community
of biologists who were extremely open minded, receptive and helpful at every
stage of our entry into their fields: D. Givol provided our first new data, as
well as invaluable advice and encouragement. The CHUV group, in particular
Monika Hegi and Sophie Godard, shared their data and knowledge generously.
D. Notterman and U. Alon were instrumental in getting us started on their
colon cancer experiment, D. Botstein guided us towards his best breast cancer
data, and I. Cohen was a powerful driving force motivating us to apply our methods
to the "antigen chips" which he invented. Our work has been supported by
grants from the Germany-Israel Science Foundation (GIF), the Israel Science
Foundation (ISF) and the Leir-Ridgefield Foundation.

References

1. B. Alberts, D. Bray, J. Lewis, M. Raff, K. Roberts and J. D. Watson,


Molecular Biology of the Cell, 3rd edition, (Garland Publishing, New
York, 1994).
2. J. L. Gould and W. T. Keeton, Biological Science, 6th edition (W.W.
Norton & Co., New York, London, 1996).
3. A. Schulze and J. Downward, Nature Cell. Biol. 3, 190 (2001)
4. See http://www.affymetrix.com for information.
5. See http://cmgm.stanford.edu/pbrown/mguide/index.html
6. U. Alon, N. Barkai, D.A. Notterman, K. Gish, S. Ybarra, D. Mack, and
A.J. Levine Proc. Natl Acad. Sci. USA 96, 6745 (1999).
7. A. K. Jain and R. C. Dubes, Algorithms for Clustering Data, (Prentice
Hall, Englewood Cliffs NJ, 1988).
8. O.R. Duda, P. E. Hart and D. G. Stork, Pattern Classification (John
Wiley & Sons Inc., New York 2001)
9. M. Eisen, P. Spellman, P. Brown, and D. Botstein, Proc. Natl. Acad.
Sci. USA 95, 14863 (1998).
10. T. Kohonen, Self-Organizing Maps (Springer, Berlin 2001)
11. K. Rose, E. Gurewitz and G. C. Fox, Phys. Rev. Lett 65, 945 (1990).
12. L. Angelini, F. De Carlo, C. Marangi, M. Pellicoro and S. Stramaglia,
Phys. Rev. Lett. 85, 554 (2000).
13. M. Blatt, S. Wiseman, and E. Domany, Phys. Rev. Lett. 76, 3251
(1996).
14. M. Blatt, S. Wiseman, and E. Domany, Neural Comp. 9, 1805 (1997).
15. E. Domany, M. Blatt, Y. Gdalyahu and D. Weinshall, Comp.Phys.Comm.
121, 5 (1999).
16. G. Getz, E. Levine, E. Domany, and M. Zhang Physica A 279, 457 (2000).
17. P. T. Spellman et al, Mol.Biol.Cell 9, 3273 (1998).
18. K. Kannan, N. Amariglio, G. Rechavi, J. Jakob-Hirsch, I. Kela, N.
Kaminski, G. Getz, E. Domany and D. Givol, Oncogene 20, 2225 (2001).
19. G. Getz, E. Levine and E. Domany, Proc. Natl. Acad. Sci. USA 97,
12079 (2000).

20. T.R. Golub, D.K. Slonim, P. Tamayo, C. Huard, M. Gaasenbeek,


J.P. Mesirov, H. Coller, M.L. Loh, J.R. Downing, M.A. Caligiuri,
C.D. Bloomfield, and E.S. Lander, Science 286, 531 (1999).
21. M. Primig et al, Nature Genetics 26, 415 (2000).
22. F. Quintana, G. Getz, G. Hed, E. Domany and I. R. Cohen, (submitted
2002).
23. S. Godard, G. Getz, H. Kobayashi, M. Nozaki, A.-C. Diserens, M.-F.
Hamou, R. Stupp, R. C. Janzer, P. Bucher, N. de Tribolet, E. Domany
and M. E. Hegi, "[...] human gliomas and contributes to their
classification" (submitted, 2002).
24. C.M. Perou et al, Nature 406, 747 (2000).
25. I. Kela, Unraveling Biological Information from Gene Expression Data,
Using Advanced Clustering Techniques, (M.Sc. Thesis, Weizmann Insti-
tute of Science, 2001).

CLUSTERING mtDNA SEQUENCES FOR HUMAN EVOLUTION STUDIES

C. MARANGI
Dipartimento Interateneo di Fisica, Università di Bari, 70126 Bari, Italy
Istituto per le Applicazioni del Calcolo "M. Picone", Sezione di Bari, CNR,
70126 Bari, Italy
E-mail: c.marangi@area.ba.cnr.it

L. ANGELINI, M. MANNARELLI, M. PELLICORO, S. STRAMAGLIA


Dipartimento Interateneo di Fisica, Università di Bari, 70126 Bari, Italy
Center of Innovative Technologies for Signal Detection and Processing, 70126 Bari, Italy

M. ATTIMONELLI
Dipartimento di Biochimica e di Biologia Molecolare, Università di Bari, 70126 Bari, Italy

M. DE ROBERTIS
Dipartimento di Genetica ed Anatomia Patologica, Università di Bari, 70126 Bari, Italy

L. NITTI
D.E.T.O., Università di Bari, 70126 Bari, Italy
Center of Innovative Technologies for Signal Detection and Processing, 70126 Bari, Italy

G. PESOLE
Dipartimento di Fisiologia e Biochimica Generali, Università di Milano, 20133 Milano, Italy

C. SACCONE
Dipartimento di Biochimica e di Biologia Molecolare, Università di Bari, 70126 Bari, Italy

M. TOMMASEO
Dipartimento di Zoologia, Università di Bari, 70126 Bari, Italy

A novel distance method for sequence classification and intraspecies phylogeny reconstruction
is proposed. The method incorporates biologically motivated definitions of DNA sequence
distance into the recently proposed Chaotic Map Clustering (CMC) algorithm, which performs a
hierarchical partition of the data by exploiting the cooperative behavior of an inhomogeneous
lattice of chaotic maps living in the space of the data. Simulation results show that our method
outperforms, on average, the simplest and most widely used approach to intraspecies phylogeny
reconstruction, based on the Neighbor Joining (NJ) algorithm. The method has also been tested on
real data, by applying it to two distinct datasets of human mtDNA HVRI haplotypes of
different geographical origins. A comparison with results from other well known methods,
such as the Stochastic Stationary Markov method and the Reduced Median Network, has also been
performed.

1 Introduction

The study of genetic diversity provides a powerful instrument to infer the historical
patterns of human evolution by assessing relationships among populations on the
basis of the nucleotide composition of specific DNA sequences [12]. Limiting ourselves
to intraspecies evolution, we assume that a molecular clock exists, so that
DNA mutations appear at a more or less constant rate (on a large time scale) for
all evolutionary lines. This results in a correlation between the number of accumulated
mutations and the length of the time interval: differences at the molecular level can then
play the role of estimators of the divergence time among groups belonging to the same
species. The final goal is the reconstruction of a phylogenetic tree, i.e. of the temporal
evolutionary lines through which human groups differentiated.
In the debate about the appropriate genetic analysis for evolution studies, a
prominent role has been achieved by the analysis of mitochondrial DNA (mtDNA).
Although the mtDNA contains only a small percentage of the total information of
the human genome (0.0006%), it is known to represent an efficient marker of
biological intraspecific diversity. This haploid genome is not recombinant and is
transmitted through maternal lines, i.e. it is inherited as a single block or haplotype.
Moreover, the mtDNA exists in a large number of copies in each cell and shows a
higher mutation rate than nuclear genes, which appears to be a relevant feature for
the estimation of genetic distances and for ancient DNA studies [1]. In particular, the
HVRI and HVRII hypervariable regions of the human mtDNA D-loop have been
extensively used to study human population history and to estimate the age of the
MRCA (Most Recent Common Ancestor), a still controversial problem [8].
In order to reconstruct a phylogeny within a human population of a given
geographical area, individuals belonging to different groups and sharing the same
pattern of variant sites (haplotype) are clustered into extended macro-classes
(haplogroups) according to the measured genetic distance among the different
haplotypes. If the haplogroup discrimination is performed in a hierarchical way,
results at different hierarchical levels can be identified as different branch levels of a
phylogenetic tree. It is clear that the choice of a clustering methodology is crucial to
obtain a classification hierarchy which is consistent with anthropological
observations. Moreover, since we are typically dealing with datasets in high
dimensional spaces (genetic sequences may be as long as several thousand
nucleotide bases), we look for clustering algorithms with low computational
complexity.
In this paper we propose a novel approach to phylogeny reconstruction based
on the recently proposed Chaotic Map Clustering algorithm (CMC) [2,3], which
relies on the cooperative behaviour of an inhomogeneous lattice of coupled chaotic
maps. In the original formulation, CMC is a clustering tool to process an input
dataset of arbitrary nature. To tailor CMC to the specific application we define new
distance measures, biologically motivated by the heterogeneous variation rates at
different sequence sites.
The paper is organized as follows. In section 2, we briefly describe the CMC
algorithm together with a method for parameter estimation. In section 3 we report
simulation results in order to compare CMC with the most widely applied algorithm
for phylogeny reconstruction, namely the Neighbor Joining algorithm. In section 4
two alternative definitions of sequence distance are proposed to take into account
the heterogeneous variability of sequence sites. In section 5 results of application to
haplogroup classification of datasets from the Pacific area are briefly described.
Conclusions are drawn in section 6.

2 CMC algorithm

A new clustering algorithm has been recently proposed [2], which is based on the
cooperative behaviour of an inhomogeneous lattice of coupled chaotic maps whose
dynamics leads to the formation of clusters of synchronized maps sharing the same
chaotic trajectory [11]. The cluster structure is biased by the architecture of the
couplings among the maps, and a full hierarchy of clusters can be obtained using the
mutual information of map pair states as a similarity index. Chaotic Map
Clustering (CMC) performs a non-parametric partition of the data without prior
assumptions about the number of classes or the geometric distribution of clusters.
In the following we briefly review the basics of the CMC algorithm.
Let us consider a set of N points (representing here DNA sequences) in a D-dimensional
space (with D equal to the number of variant sites in the sequence). We
assign a real dynamical variable $x_i \in [-1,1]$ to each point and define the
pair-interactions $J_{ij} = \exp(-d_{ij}^2/2a^2)$, where $a$ is the local length
scale and $d_{ij}$ is a suitable measure of the distance between points $i$ and $j$
in our D-dimensional space. The time evolution of the system is given by:

$$x_i(t+1) = \frac{1}{C_i} \sum_{j \neq i} J_{ij}\, f(x_j(t)),$$

where $C_i = \sum_{j \neq i} J_{ij}$ and $f(x) = 1 - 2x^2$. Due to the choice of the
function $f$, the equation represents the dynamical evolution of chaotic maps $x_i$
coupled through the pair interactions $J_{ij}$. The lattice architecture is fully
specified by fixing the value of $a$ as the average distance of the k-nearest-neighbor
pairs of points in the whole system (our results are quite insensitive to the
particular value of k). To save computational time, we consider only interactions
between maps whose distance is less than $3a$, setting all the other $J_{ij}$ to zero.
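A minimal numerical sketch of these dynamics may help (not the authors' code; it assumes a precomputed symmetric distance matrix d with zero diagonal, and the helper name cmc_dynamics is hypothetical):

```python
import numpy as np

def cmc_dynamics(d, k=5, n_steps=500, seed=0):
    """Iterate the coupled chaotic maps x_i(t+1) = (1/C_i) sum_j J_ij f(x_j(t)),
    with f(x) = 1 - 2x^2 and J_ij = exp(-d_ij^2 / 2a^2)."""
    rng = np.random.default_rng(seed)
    n = d.shape[0]
    # Local length scale a: average distance of the k nearest-neighbour pairs
    # (skip column 0 of the sorted rows, which is the zero self-distance).
    a = np.sort(d, axis=1)[:, 1:k + 1].mean()
    J = np.exp(-d**2 / (2 * a**2))
    J[d > 3 * a] = 0.0              # truncate couplings beyond 3a
    np.fill_diagonal(J, 0.0)        # no self-coupling
    C = J.sum(axis=1)
    C[C == 0] = 1.0                 # guard against isolated points
    x = rng.uniform(-1.0, 1.0, size=n)
    traj = np.empty((n_steps, n))
    for t in range(n_steps):
        x = J @ (1.0 - 2.0 * x**2) / C   # synchronous update of all maps
        traj[t] = x
    return traj
```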
Starting from a random initial configuration of the $x_i$, the dynamical equations are
iterated until the system attains its stationary regime, corresponding to a
macroscopic attractor which is independent of the initial conditions. To study the
correlation properties of the system, we consider the mutual information [19]
between maps, as follows.
If the state of element i is $x_i > 0$ then it is assigned the value 1, otherwise it is
assigned 0: this generates a sequence of bits, in a set time interval, which allows the
calculation of the Boltzmann entropy $H_i$ for the i-th map. In a similar way the joint
entropy $H_{ij}$ is calculated for each pair of maps, and finally the mutual information is
defined as $I_{ij} = H_i + H_j - H_{ij}$. The mutual information is a good measure of
correlation [18] and it is precision independent, due to the coarse graining of the
dynamics. If maps i and j evolve independently then $I_{ij} = 0$; if the two maps are
exactly synchronized then the mutual information achieves its maximum value, here
equal to ln 2 due to our choice of the function $f$. The algorithm identifies clusters
with the connected components of the graph obtained by drawing a link between all the
pairs of maps whose mutual information exceeds a threshold θ. Since long-range
correlation is present, all the scales in the dataset contribute to the mutual
information pattern, and the threshold θ controls the resolution at which the data are
clustered. Each hierarchical clustering level corresponds to a dataset partition with a
finite stability region in the parameter θ. The most stable solution identifies the
optimal partition of the given dataset. The computational cost of the CMC algorithm
scales as N log(N) with the dataset size N.
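As a concrete sketch of this coarse-graining and of the computation of $I_{ij}$ (hypothetical helper name; the trajectory array would come from a run in the stationary regime, e.g. via the cmc_dynamics sketch above):

```python
import numpy as np

def pairwise_mutual_information(traj):
    """Coarse-grain each map trajectory to bits (x_i > 0 -> 1) and return
    the matrix I_ij = H_i + H_j - H_ij (in nats, so I_max = ln 2)."""
    bits = (traj > 0).astype(int)          # (T, N) bit sequences
    T, n = bits.shape
    p1 = bits.mean(axis=0)                 # P(bit_i = 1) for each map

    def h(p):                              # binary entropy, safe at p = 0, 1
        p = np.clip(p, 1e-12, 1 - 1e-12)
        return -(p * np.log(p) + (1 - p) * np.log(1 - p))

    H = h(p1)
    I = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            p11 = np.mean(bits[:, i] & bits[:, j])   # joint frequencies
            p10, p01 = p1[i] - p11, p1[j] - p11
            p00 = 1.0 - p11 - p10 - p01
            pj = np.clip([p00, p01, p10, p11], 1e-12, 1.0)
            Hij = -np.sum(pj * np.log(pj))
            I[i, j] = I[j, i] = H[i] + H[j] - Hij
    return I

# Clusters at resolution theta would then be the connected components of
# the graph with a link wherever I[i, j] > theta.
```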
We note that, since clustering is performed in a hierarchical way, the algorithm can
provide an effective tool for phylogeny reconstruction. Results at different hierarchical
levels can be identified with different branch levels of a phylogenetic tree, whereas the
stable clustering solution represents the terminal branching of the tree. We limit
ourselves to considering a valid phylogenetic tree whenever we are dealing with
homologous sequences verifying stationarity conditions on the stochastic evolutionary
process, i.e. when one compares processes having the same type of dynamics on
different lineages, as is the case for intraspecies evolution.
Let us spend a few words on the selection of the algorithm parameters, namely
the number of interacting maps k and the resolution θ. We stress here that the
algorithm is deterministic, since the dependence on the initial random
configuration of the maps is wiped out by the peculiar dynamics of chaotic
systems. This implies a dependence of the final results on the particular choice of the
external parameters, even though, as already tested in several contexts of application
[3], the clustering solutions provided by the CMC algorithm are robust against
quite a rough tuning of k and θ.
To improve the reliability of clustering results, a validation technique has been
recently proposed [10] that provides a satisfying solution to the parameter setting
problem, as well as a good test of the robustness of a given clustering algorithm with
respect to noise. Hereafter we only describe the guidelines of the method, which can
be viewed as an alternative to the bootstrap for assessing the reliability of a cluster
analysis. Further details and applications to different algorithms can be found in
[10,3].
The method can be easily implemented as follows. A set of values V for the
cluster parameters is used to perform a clustering on the N points of a given dataset
and on a number of subsets of size rN (0 < r < 1) randomly generated from it.
Clustering results for each resample are compared with the ones obtained on the
initial dataset, and a suitably defined measure of the overlap of the solutions,
averaged over all resamplings, can be calculated as a function of the algorithm
parameters. The step is repeated for all the values in V, and the optimal parameters
are selected as those which maximize the average overlap of the clustering solutions.
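A sketch of this procedure follows; it assumes data is a NumPy array, cluster_fn(data, params) returns integer labels, and it uses pair-counting agreement as one possible (not necessarily the original) overlap measure:

```python
import numpy as np

def select_parameters(data, cluster_fn, param_grid, r=0.75, n_resamples=100,
                      seed=0):
    """For each parameter setting (hashable, e.g. a (k, theta) tuple),
    cluster the full dataset and n_resamples random subsets of size r*N,
    and score the average agreement between the two partitions."""
    rng = np.random.default_rng(seed)
    n = len(data)
    m = int(r * n)
    scores = {}
    for params in param_grid:
        full = np.asarray(cluster_fn(data, params))
        agree = []
        for _ in range(n_resamples):
            idx = rng.choice(n, size=m, replace=False)
            sub = np.asarray(cluster_fn(data[idx], params))
            # agreement: fraction of point pairs that the two partitions
            # treat in the same way (both together or both apart)
            same_full = full[idx][:, None] == full[idx][None, :]
            same_sub = sub[:, None] == sub[None, :]
            agree.append((same_full == same_sub).mean())
        scores[params] = np.mean(agree)
    return max(scores, key=scores.get)   # parameters maximizing the overlap
```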

3 Simulations

Simulations have been performed to compare the performance of CMC with a
well known and widely used algorithm belonging to the same class of distance
methods, namely Neighbor Joining [17].

Figure 1. Trees constructed by connecting 64 arbitrary taxonomic units for simulation purposes.
Two arbitrary trees (tree1 and tree2), each connecting 64 taxonomic units, have
been constructed and are displayed in Fig. 1 using the web application for drawing
phylogenetic trees, Phylodendron [6]. For each tree, 200 random datasets of
sequences of length 80 have been generated by Monte Carlo simulations
using the program Seq-Gen [14] with the simple Kimura two-parameter generation
model. The variability has been assumed to be uniform throughout the sequence and
low, with a transition versus transversion ratio of 2, and an equal starting
probability for the four nucleotide bases is imposed.
In order to determine the pairwise distances we used the simple Kimura two-parameter
model, since for the purpose of the simulation there was no need to obtain
accurate genetic distance estimates.
The sequence distance calculation, as well as the NJ tree reconstruction, have
been performed with the PHYLIP package's programs DNADIST and NEIGHBOR
respectively [5]. We applied a routine of the same package (TREEDIST) to compute
the Symmetric Distance (SD) of Robinson and Foulds [15] between each
reconstructed tree and the initial tree used for sequence generation.
The Symmetric Distance between two trees is defined as the number of
partitions (unrooted trees) or clades (rooted trees) that are on one tree and not on the
other. For fully resolved, i.e. bifurcating, trees the Symmetric Distance must be an
even number ranging from 0 to twice the number of internal branches, which for n
units is 2n-6. Odd numbers can be obtained if the input trees contain
multifurcations.

Figure 2. Left plot: the difference between CMC symmetric distance from tree1 and the corresponding
measure for NJ is reported for each simulation. Right plot: the histogram of CMC symmetric distances
from tree1 (white) is compared with the corresponding one by NJ (black). Overlap regions are displayed
in grey.
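In the actual analysis this distance is computed by TREEDIST; purely for illustration, the definition can be sketched as follows, with each unrooted tree encoded as the set of bipartitions (splits) induced by its internal edges:

```python
def symmetric_distance(splits1, splits2, taxa):
    """Robinson-Foulds symmetric distance: the number of bipartitions
    induced by the internal edges of one unrooted tree but not the other.
    Each tree is given as a set of splits; a split is the frozenset of the
    taxa lying on one side of an internal edge."""
    taxa = frozenset(taxa)
    def canon(split):
        # A split and its complement are the same bipartition: keep the
        # lexicographically smaller side so that set comparison works.
        return min(split, taxa - split, key=sorted)
    s1 = {canon(s) for s in splits1}
    s2 = {canon(s) for s in splits2}
    return len(s1 ^ s2)

# Example: the two possible resolutions of an unrooted 4-taxon tree differ
# by their single internal split, so the distance is 2 = 2n - 6 for n = 4.
taxa = {"A", "B", "C", "D"}
print(symmetric_distance({frozenset("AB")}, {frozenset("AC")}, taxa))  # -> 2
```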
The results obtained by NJ have then been compared with the ones produced by
CMC for the same pairwise distance matrix. Before analyzing the results we have to
stress that the absolute value of the symmetric distance is not related to any statistical
interpretation and that only tree topologies are used in the computation, neglecting
branch length information. In the left plots of Fig. 2 and Fig. 3 we report, for both
initial trees, the difference, computed for each simulation, between NJ and CMC
symmetric distance from the tree. In the right plots of Fig. 2 and Fig. 3 the
comparison between the SD histograms obtained by CMC and NJ is shown, for the
first and the second tree respectively. The k parameter has been fixed in the range
(3,10) for both cases. We note that, since NJ produces only bifurcating trees and the
bin size is set to one, there are no counts in bins corresponding to odd values of SD.
Notwithstanding the above mentioned restrictions on the quantitative interpretation of
the SD measure, we observe that, on average, CMC outperforms the NJ method.

Figure 3. Left plot: the difference between CMC symmetric distance from tree2 and the corresponding
measure for NJ is reported for each simulation. Right plot: the histogram of CMC symmetric distances
from tree2 (white) is compared with the corresponding one by NJ (black). Overlap regions are displayed
in grey.

4 Distance Measures

On account of several human population studies based on the HVRI and HVRII
regions, doubts arise about the reliability of classical evolutionary models such as
Jukes-Cantor, Kimura and Maximum Likelihood [7], mainly due to the strong
assumptions they make about a constant mutation rate at different sites.
Here we propose a distance measure that incorporates the biological evidence
of heterogeneous variation rates at different sites, which results from a recent
theoretical analysis of site variability, supported by simulation and experimental
data [13].
Let us recall that a DNA sequence can be represented by a one-dimensional string
of letters taken from the 4-symbol alphabet {A,C,G,T}, standing for the four
nucleotide bases DNA is composed of. A simple genetic distance (p-distance)
between two individuals can be defined as the number of nucleotide differences
at the same sites of a selected DNA segment, divided by the sequence length.
Alternatively, it can be estimated by modelling the variation probability for both
transitions and transversions, as in the Kimura two-parameter model. The major
drawback of such definitions is that they do not take into account the heterogeneous
variation rates at different sites.
An alternative distance measure can be introduced that incorporates a weight in
terms of the site variability, recently defined as a reliable measure of the different
evolution rates at sites [13]. We define the distance between two sequences i and j
of length S as follows:

$$d_{ij} = \sum_{s=1}^{S} \delta^{s}_{ij}\, v_s ,$$

where $\delta^{s}_{ij}$ is 1 if the ij pair exhibits a different nucleotide at site s, and 0
otherwise. The term $v_s$ represents the variability of site s and, following [13], is
defined as

$$v_s = \sum_{i<j} \frac{\delta^{s}_{ij}}{K_{ij}} ,$$

where $K_{ij}$ is an estimate of the overall genetic distance of the ij pair as determined
by a given model of the stochastic evolution process. In the following we will
adopt the Stationary Markov Model [9,16]. The site variability is then normalized to
the maximum value $v_{max}$ it takes on the whole dataset. Site variability is thus
incorporated in the CMC algorithm as a weight in the distance definition, providing a
suitable measure of the different information content related to each site. Note that
the introduction of the site variability, as a weight on the distance, implies a correlation
among sites.
A further distance definition can be introduced for applications in the context of
haplogroup discrimination and population divergence time estimates, where the
grouping of haplotypes is usually performed on the basis of shared patterns at
relatively rapidly changing sites. Discrimination occurs at highly variant sites, which
should then give the most relevant contribution to the cluster identification. On this
basis, a novel distance definition, with the same notation as above, can be weighted
by an 'entropic' term:

$$d_{ij} = \sum_{s=1}^{S} \delta^{s}_{ij}\, E_s ,$$

where $E_s$ is expressed as an entropy,

$$E_s = -\sum_{l} p^{l}_{s} \log (p^{l}_{s}) ,$$

the index l running over the different nucleotides and $p^{l}_{s}$ representing the
frequency of nucleotide l at site s, calculated with respect to the given dataset.
An appealing feature of the 'entropic' distance is that it is not biased by any biological
model of genetic distance, although, depending on the dataset, the information
provided without any complementary assumption on the sequence generating process
could be insufficient to resolve sequence classification ambiguities. Of course the
'entropic' distance is strictly related to the specific context of haplogroup
discrimination. Depending on the dataset under investigation, the two distance
measures can appear more or less correlated, although they cannot be considered
as equivalent. The main difference is in the correlation among sites introduced by
the site variability definition. Since it is questionable whether site variations along a
sequence should be considered as really independent, this could be regarded as an
intriguing feature of a sequence classification based on the site variability concept.
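Both definitions are easy to prototype. The sketch below assumes aligned sequences of equal length, uses the $v_s$ form reconstructed above (an assumption, since the original formula comes from [13]), and falls back to the p-distance whenever no Stationary Markov Model estimate of $K_{ij}$ is supplied:

```python
import numpy as np

def weighted_distances(seqs, K=None):
    """Return the site-variability-weighted and entropy-weighted distance
    matrices for a list of aligned, equal-length sequences."""
    X = np.array([list(s) for s in seqs])          # (N, S) character matrix
    n, S = X.shape
    delta = X[:, None, :] != X[None, :, :]         # (N, N, S) site differences
    if K is None:
        K = delta.mean(axis=2)                     # fallback: p-distance
    iu = np.triu_indices(n, k=1)                   # each pair i < j once
    v = (delta[iu] / np.clip(K[iu], 1e-9, None)[:, None]).sum(axis=0)
    v = v / v.max()                                # normalize to v_max
    d_var = (delta * v).sum(axis=2)                # d_ij = sum_s delta_ij^s v_s

    E = np.empty(S)                                # E_s = -sum_l p_s^l log p_s^l
    for s in range(S):
        _, counts = np.unique(X[:, s], return_counts=True)
        p = counts / n
        E[s] = -(p * np.log(p)).sum()
    d_ent = (delta * E).sum(axis=2)                # d_ij = sum_s delta_ij^s E_s
    return d_var, d_ent
```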

Figure 4. Phenogram representing the evolution of the Pacific area dataset. The time scale is given in
terms of the resolution parameter θ. The cluster size at each branching point is represented by a number
and a circle of variable size. On the right, the final classification is reported for clusters of size > 4.

5 Applications to real sequence data

As an application of the CMC method, we report the analysis of a sample of 202
subjects from the Pacific area, known to generate 89 haplotypes, which has also been
thoroughly studied from the anthropological point of view (results of the
anthropological analysis can be found in Tommaseo et al.). The dataset consisted of
89 sequences of 71 variant sites taken from the mtDNA hypervariable region HVRI.
The CMC results are shown in Fig. 4 as a phenogram illustrating a temporal
evolution where the time scale is fixed in terms of the resolution parameter θ. New
clusters originate at each branching point, their cardinality being described by both a
number and a circle whose radius varies with the cluster size. The final classification
has been obtained with the distance weighted by site variability and with the external
parameters k and θ set at k = 10 and θ = 0.35 respectively.
The optimal parameters have been selected by applying the above described
resampling method to 100 resamplings randomly extracted from the original dataset
with r = 0.75. The optimal value of θ, corresponding to the terminal branching of the
phenogram, represents the final classification to be compared with the results from
known techniques. Clustering results were found to be consistent with the
anthropological data.
The same sample has been investigated with two other methods widely used in
DNA sequence classification: the Neighbor Joining method and the Reduced
Median Network (RMN).
The NJ method generates a tree starting from an estimate of the genetic distance
matrix, here calculated by the Stationary Markov Model. As for the CMC results, the NJ
tree evidenced three main subdivisions. The largest group of sequences (49
haplotypes), identified as Group I, is clearly distinguished from the two other
clusters, Group II and Group III (Table 1).

The RMN method was used to gain deeper insight into the haplotype genetic
relationships. It generates a network which harbours all the most parsimonious trees.
The resultant network (data not shown) is quite complex, as a consequence of the
high number of haplotypes considered in the analysis, while its reticulated structure
reflects the high rate of homoplasy in the dynamic evolution of the mtDNA HVRI
region. The topological structure of the RMN also evidences three major haplotype
clusters, which reflect the same "haplotype composition" shown by the NJ tree
constructed on the distance matrix computed by the Stationary Markov
Model (Table 1).

6 Conclusions

In this paper we propose a novel distance method for phylogeny reconstruction and
sequence classification that is based on the recently proposed CMC algorithm, as
well as on a biologically motivated definition of distance.
The main advantage of the algorithm lies in its high effectiveness and low
computational cost, which make it suitable for the analysis of large amounts of data in
high dimensional spaces. Simulations on artificial datasets show that the CMC
algorithm outperforms, on average, the well known NJ method in terms of the
measured Symmetric Distance between the true tree and the reconstructed one.

ref. V E S R ref. V E S R ref. V E S R

UNAJ15 ASMAT 404 I KETEN_134 II II II II


UNA_40 ASMAT_416 I MAPPI_302 II II II II
ASMAT_393 MUYU_428 I ASMAT_397 II II II II
AWYN_320 ASMAT_391 I UNA_93 II II II II
DANI_23 ASMAT_427 I UNA_70 II II II II
MAPPI378 KETEN_192 I UNA_35 II II II II
ASMAT_419 KETEN_223 I UNA_75 II II II II
LANI_17 ASMAT399 I UNA_44 II II II II
CITAK341 CITAK_357 I MAPPI_309 III u s s
CITAK352 ASMAT_389 I AWYN_385 III u HI HI
MAPPI_331 CITAK359 I AWYN_364 III u III III
LANI_15 UNA_38 I AWYN_374 III III III III
DANI_34 UNA_65 I CITAK_353 III III III III
MUYU_415 UNA_83 I ASMAT_422 III III III III
MUYU_345 CITAKJ34 I UNA_78 III III HI III
UNAJ72 UNA63 I UNA_102 III III III III
UNA_74 LANI_18 I CITAK_351 III III III III
DAN1_33 UNA_64 I MAPPI367 III III III III
LANI_8 CITAK_317 I CITAK_343 III III III III
DANI_24 UNA^94 II II DANI32 HI III HI III
MAPPI_387 UNA_98 II II LANI10 U u s S
ASMAT_403 MAPPI370 II II DANI_27 U u s s
CITAK350 CITAK_292 II II UNA_109 U u s s
ASM AT 411 KETEN_152 II II KETEN_220 U u s s
ASMAT_401 UNA_89 II II ASMAT_426 U u III s
ASMAT_402 CITAK_286 II II UNA 45 U u III s
AWYN_382 UNA_36 II II MUYU_347 U u s s
LANI49 DANI25 II II AWYN_315 U u III s
ASMAT_407 AWYNJ76 II II C1TAK_291 U III III III
ASMAT_396 KEPI384 II II

Table 1. Comparison of classification results obtained on the Pacific area dataset by the CMC method
(V = site variability distance, E = entropic distance), Neighbor Joining performed on the Stationary Markov
Model (S) distance matrix, and Reduced Median Network (R).

Since we are dealing with a distance method of general applicability, any prior
biological information has to be coded in an ad hoc distance definition, in order to
improve the reliability of the sequence grouping. That is the rationale for the
introduction of the site variability and entropy terms in the distance measures, which
account for the dependency of the classification on the different rates of variation
occurring at sites. The performances obtained by applying both distance definitions
to two population datasets have been compared with the classifications obtained
using the SMM and the Reduced Median Network [4].
We found that our method performs as well as the two known techniques, but at
lower complexity and computational cost. Moreover, compared to RMN, the
method has the main advantage of providing an easy reading and interpretation of the
results regardless of the dataset size.
Further investigations are currently being carried out regarding the use of the CMC
method for phylogenetic inference and the possibility of performing divergence time
estimates by relating internal node depths of CMC trees to the estimated number of
substitutions along lineages.

Acknowledgements

This work has been partially supported by MURST PRIN99 and by the "program
Biotecnologie, legge 95/95 (MURST 5%)", Italy.

References

1. Anderson, S., A. T. Bankier, B. G. Barrell, M. H. L. de Bruijn, A. R. Coulson,
et al., 1981 Sequence and organization of the human mitochondrial genome.
Nature 290:457-465.
2. Angelini, L., F. De Carlo, C. Marangi, M. Pellicoro and S. Stramaglia, 2000
Clustering data by inhomogeneous chaotic map lattices. Phys. Rev. Letters
85(3): 554-557.
3. Angelini, L., F. De Carlo, M. Mannarelli, C. Marangi, G. Nardulli, M.
Pellicoro, G. Satalino, S. Stramaglia, 2001 Chaotic neural network clustering:
an application to landmine detection by dynamic infrared imaging. Optical
Engineering Volume 40, Issue 12, pp. 2878-2884.
4. Bandelt, H. J., P. Forster, C. S. Bryan and M. B. Richards, 1995 Mitochondrial
portraits of human population using median network. Genetics 141: 743-753.
5. Felsenstein, J., 1993 PHYLIP (Phylogeny Inference Package), Department of
Genetics, University of Washington, Seattle.
6. Gilbert, D. G., Phylodendron, IUBio Archive for Biology Data and Software, USA.
7. Hasegawa, M. and Yano T., 1984 Maximum likelihood method of phylogenetic
inference from DNA sequence data. Bulletin of the Biometric Society of Japan
5:1-7.
8. Hasegawa, M. and S. Horai, 1991 Time of the deepest root for polymorphism in
human mitochondrial DNA. J. Mol. Evol. 32(l):37-42.
9. Lanave, C., G. Preparata, C. Saccone and G. Serio, 1984 A new method for
calculating evolutionary substitution rates. J. Mol. Evol. 20:86-93.
10. Levine, E. and E. Domany, 2000 Resampling Method For Unsupervised
Estimation Of Cluster Validity. Preprint arXiv:physics/0005046, 18 May 2000.
11. Manrubia, S.C. and A.S. Mikhailov, 1999 Mutual synchronization and clustering
in randomly coupled chaotic dynamical networks. Phys. Rev. E 60:1579-1589.
12. Pagel, M., 1999 Inferring the historical patterns of biological evolution. Nature
401: 877-884.
13. Pesole, G. and C. Saccone, 2001 A novel method to estimate substitution rate
variation among sites in large dataset of homologous DNA sequences. Genetics
157(2):859-865.
14. Rambaut, A. and Grassly, N. C. (1997) Seq-Gen: An application for the Monte
Carlo simulation of DNA sequence evolution along phylogenetic trees. Comput.
Applic. Biosci., 13: 235-238.
15. Robinson, D. F., and L. R. Foulds. 1981. Comparison of phylogenetic trees.
Math. BioSci. 53:131-147.
16. Saccone, C., C. Lanave, G. Pesole and G. Preparata, 1990 Influence of base
composition on quantitative estimates of gene evolution. Meth. Enzymol.
183:570-583.
17. Saitou, N. and M. Nei, 1987 The neighbor-joining method: a new method for
reconstructing phylogenetic trees. Mol. Biol. Evol. 4:406-425.
18. Sole, R.V., S.C. Manrubia, J. Bascompte, J. Delgado and B. Luque, 1996 Phase
transitions and complex systems. Complexity 4:13-26.
19. Wiggins, S., 1990 Introduction to applied nonlinear dynamical systems and
chaos. Springer, Berlin.

FINDING REGULATORY SITES FROM STATISTICAL ANALYSIS OF
NUCLEOTIDE FREQUENCIES IN THE UPSTREAM REGION OF
EUKARYOTIC GENES

M. Caselle(a) and P. Provero(a,b)

(a) Dipartimento di Fisica Teorica, Universita di Torino, and INFN,
sezione di Torino, Via P. Giuria 1, I-10125 Torino, Italy.
e-mail: caselle@to.infn.it, provero@to.infn.it

(b) Dipartimento di Scienze e Tecnologie Avanzate, Universita del
Piemonte Orientale, I-15100 Alessandria, Italy.

F. Di Cunto and M. Pellegrino

Dipartimento di Genetica, Biologia e Biochimica, Universita di Torino,
Via Santena 5 bis, I-10100 Torino, Italy.
e-mail: ferdinando.dicunto@unito.it

We discuss two new approaches to extract relevant biological information on the
Transcription Factors (and in particular to identify their binding sequences) from
the statistical distribution of oligonucleotides in the upstream region of the genes.
Both methods are based on the notion of a "regulatory network" responsible
for the various expression patterns of the genes. In particular we concentrate
on families of coregulated genes and look for the simultaneous presence in the
upstream regions of these genes of the same set of transcription factor binding sites.
We discuss two instances which well exemplify the features of the two methods:
the coregulation of glycolysis in Drosophila melanogaster and the diauxic shift in
Saccharomyces cerevisiae.

1 Introduction
As more and more complete genomic sequences are decoded it is becoming
of crucial importance to understand how gene expression is regulated.
A central role in our present understanding of gene expression is played by the
notion of "regulatory network". It is by now clear that a particular expression
pattern in the cell is the result of an intricate network of interactions among
genes and proteins which cooperate to enhance (or depress) the expression
rate of the various genes. It is thus important to address the problem of gene
expression at the level of the whole regulatory network and not at the level of
the single gene (refs. 1-5).
In particular, most of the available information about such interactions
concerns the transcriptional regulation of protein coding genes. Even if this
is not the only regulatory mechanism of gene expression in eukaryotes it is
certainly the most widespread one.

In these last years, thanks to the impressive progress in DNA array
technology, several results on these regulatory networks have been obtained.
Various transcription factors (TF's in the following) have been identified and
their binding motifs in the DNA chain (see below for a discussion) have been
characterized. However it is clear that we are only at the very beginning of
such a program and that much more work still has to be done in order to
reach a satisfactory understanding of the regulatory network in eukaryotes
(the situation is somewhat better for the prokaryotes, whose regulatory network
is much simpler).
In this contribution we want to discuss a new method which allows one to
reconstruct these interactions by comparing existing biological information with
the statistical properties of the sequence data. This is a line of research which
has been pursued in the last few years, with remarkable results, by several
groups in the world; for a (unfortunately largely incomplete) list of references
see refs. 2-9. In particular, the biological input that we shall use is the fact
that some genes, being involved in the same biological process, are likely to be
"coregulated", i.e. they should show the same expression pattern. The simplest
way for this to happen is that they are all regulated by the same set of TF's.
If this is the case we should find in the upstream(a) region of these genes the
same TF binding sequences. This is a highly non-trivial occurrence from a
statistical point of view and could in principle be recognized by simple statistical
analysis.

(a) With this term we denote the portion of the DNA chain which is immediately before the
starting point of the open reading frame (ORF). We shall characterize this region more
precisely in sect. 3 below.
As a matter of fact the situation is much more complex than this idealized
picture suggests. TF's do not necessarily bind only to the upstream region.
They often recognize more than one sequence (even if there is usually
a "core" sequence which is highly conserved). Coregulation could be achieved
by a complex interaction of several TF's rather than by following the simple
pattern suggested above. Notwithstanding this, we think that it is worthwhile
to explore this simplified picture of coregulation, for at least three reasons.

• Even if in this way we only find a subset of the TF's involved in the coregulation,
this would still be an important piece of information: it
would add a new link to the regulatory network that we are studying.

• Analyses based on this picture, being very simple, can be easily performed
on any gene set, from the few genes involved in glycolysis
(the first example that we shall discuss below) up to the whole genome
(this will be the case of the second example that we shall discuss). This
feature is going to be more and more important as more and more DNA
array experiments appear in the literature. As the quantity of available
data increases, so does the need for analytical tools to analyze it.

• Such analyses could be easily improved to include some of the features
outlined above, taking into account, say, the sequence variability or the
synergic interaction of different TF's.

To this end we have developed two different (and complementary) approaches.
The first one (which we shall discuss in detail in sect. 3 below) follows
a more traditional line of reasoning: we start from a set of genes which
are known to be coregulated (this is our "biological input") and then try to
recognize the possible binding sites for the TF's. We call this approach the
"direct search" for coregulating TF's.
The second approach (which we shall briefly sketch in sect. 4 below and is
discussed in full detail in ref. 10) is completely different and is particularly suitable
for the study of genome-wide DNA array experiments. In this case the biological
input is taken into account only at the end of the analysis. We start
by organizing all the genes in sets on the basis of the overrepresented common
sequences, and then filter them with the expression patterns of some DNA array
experiment. We call this second approach the "inverse search" for coregulating
TF's.
It is clear that all the candidate gene interactions which we identify with
our two methods have to be tested experimentally. However our results may
help selecting among the huge number of possible candidates and could be
used as a preliminary test to guide the experiments.
This contribution is organized as follows. In sect. 2 we shall briefly introduce
the reader to the main features of the regulatory network (this introduction
will necessarily be very short; the interested reader can find a thorough
discussion for instance in ref. 11). We shall then devote sects. 3 and 4 to explaining
our "direct" and "inverse" search methods respectively. Then we shall discuss
two instances which well exemplify the two strategies. First, in sect. 5 we shall
study the coregulation of glycolysis in Drosophila melanogaster. Second, in
sect. 6 we shall discuss the diauxic shift in Saccharomyces cerevisiae. The last
section will be devoted to some concluding remarks.

2 Transcription factors
As mentioned in the introduction, a major role in the regulatory network is
played by the Transcription Factors, which may have in general a twofold action
on gene transcription. They can activate it by recruiting the transcription
212

machinery to the transcription starting site by binding enhancer sequences


in the upstream noncoding region, or by modifying chromatine structure, but
they can also repress it by negatively interfering with the transcriptional control
mechanisms.
The main point is that in both cases TFs act by binding to specific, often
short DNA sequences in the upstream noncoding region. It is exactly this
feature which allows TF's to perform a specific regulatory functions. These
binding sequences can be considered somehow as the fingerprints of the various
TF's. The main goal of our statistical analysis will be the identification and
characterization of such binding sites.

2.1 Classification
Even if TF's show a wide variability it is possible to attempt a (very rough) classification.
Let us see it in some more detail, since it will help in understanding
the examples which we shall discuss in the following sections. There are four
main classes of binding sites in eukaryotes.

• Promoters
These are localized in the region immediately upstream of the coding
region (often within 200 bp from the transcription starting point). They
can be of two types:

- short sequences like the well known CCAAT-box, TATA-box and GC-box,
which are not tissue specific and are recognized by ubiquitous
TF's;
- tissue specific sequences, which are only recognized by tissue specific
TF's.

• Response Elements
These appear only in those genes whose expression is controlled by an
external factor (like hormones or growth factors). They are usually located
within 1 kb from the transcription starting point. Binding of a response
element by the appropriate factor may induce a relevant enhancement
in the expression of the corresponding gene.

• Enhancers
These are regulatory elements which, differently from the promoters, can
act in both orientations and (to a large extent) at any distance from the
transcription starting point (there are examples of enhancers located even
50-60 kb upstream). They enhance the expression of the corresponding
gene.

• Silencers
Same as the enhancers, but their effect is to repress the expression of the
gene.

2.2 Combinatorial regulation.


The main feature of TF's activity is its "combinatorial" nature. This means
that:

• a single gene is usually regulated by many independent TF's which bind
to sites that may be very far from each other in the upstream region;

• it often happens that several TF's must be simultaneously present in
order to perform their regulatory function. This phenomenon is usually
referred to as the "recruitment model for gene activation" (for a review
see ref. 1) and represents the common pattern of action of the TF's. It is
so important that it has recently been adopted as a guiding principle for
various computer based approaches to detect regulatory sites (see for
instance ref. 4);

• the regulatory activity of a particular TF is enhanced if it can bind to
several (instead of only one) binding sites in the upstream region. This
"overrepresentation" of a given binding sequence is also used in some
algorithms which aim to identify TF's. It will also play a major role in
our approach.

3 The "direct" search method


In this case the starting point is the selection of a set of genes which are known
to be involved in the same biological process (see the example of sect. 5).
Let us start by fixing some notation:

• Let us denote with M the number of genes in the coregulated set and
with g_i, i = 1,...,M the genes belonging to the set.

• Let us denote with L the number of base pairs (bp) of the upstream non-coding
region on which we shall perform our analysis. It is important to
define precisely what we mean by "upstream region". With this term we
denote the non-coding portion of the DNA chain which is immediately
before the transcription start site. This means that we do not consider as
part of this region the UTR5 part of the ORF of the gene in which we are
interested. If we choose L large enough it may happen that other ORFs
are present in the upstream region. In this case we consider as upstream
region only the non-coding part of the DNA chain up to the nearest ORF
(even if it appears on the opposite strand). Thus L should be thought of
as an upper cutoff. In most cases the length of the upstream region is
much smaller and is gene dependent. We shall denote it in the following
as L(g).

• In this upstream region we shall be interested in studying short sequences
of nucleotides which we shall call words. Let n be the length of such a
word. For each value of n we have N = 4^n possible words w_i, i = 1,...,N.
The optimal choice of n (i.e. the one which optimizes the statistical
significance of our analysis) is a function of L and M. We shall see
some typical values in the example of sect. 5. In the following we shall
have to deal with words of varying size. When needed, in order to avoid
confusion, we shall call k-word a word made of k nucleotides.

Let us call U the collection of upstream regions of the M genes g_1,...,g_M. Our
goal is to see whether the number of occurrences of a given word w_i in each of the
upstream regions belonging to U shows a "statistically significant" deviation
(to be better defined below) from what is expected on the basis of pure chance.
To this end we perform two types of analyses.

First level of analysis

This first type of analysis is organized in three steps.

• Construction of the "reference samples". The first step is the construction
of a set of p "reference samples" which we call R_i, i = 1,...,p.
The R_i are nonoverlapping sequences of L_R nucleotides each, extracted
from a noncoding portion of the DNA sequence in the same region of
the genome to which the genes that we study belong, but "far" from any
ORF. From these reference samples we then extract for each word the
"background occurrence probability" that we shall then use as input for
the second step of our analysis. The rationale behind this approach is the
idea that the coding and regulating parts of the genome are immersed
in a large background sea of "silent" DNA and that we may recognize
that a portion of DNA has a biological function by looking at statistical
deviations in the word occurrences with respect to the background.
However it is clear that this is a rather crude description of the genome;
in particular there are some obvious objections to this approach:

- There is no clear notion of what "far" means. As we mentioned
in the introduction, one can sometimes find TF's which keep their
regulatory function even if they bind to sites which are as far as
~50 kb from the ORF.
- It is possible that in the reference samples the nucleotide frequencies
reflect some unknown biological function, thus inducing a bias in the
results.
- It is not clear how one should deal with the long repeated sequences
which very often appear in the genome of eukaryotes.

We shall discuss below how to overcome these objections.

• Background probabilities. For each word w we study the number of
occurrences n(w, i) in the i-th sample. These follow a Poisson distribution,
from which we extract the background occurrence probability of the
word. This method works only if p and L_R are large enough with respect
to the number of possible words N (we shall see in the example below
some typical values for p and L_R). However we have checked that our
results are robust with respect to different choices of these background
probabilities.

• Significant words. From these probabilities we can immediately construct,
for each n-word, the expected number of occurrences in each of
the upstream sequences of U and, from them, the probabilities p(n, s) of
finding at least one n-word simultaneously present in the upstream regions
of s (out of the M) genes. By suitably tuning L, s and n we may
reach very low probabilities. If, notwithstanding such a low probability,
we indeed find an n-word which appears in the upstream region of s genes,
then we consider this fact as a strong indication of its role as a binding
sequence for a TF. We may use the probability p(n, s) as an estimate
of the significance of such a candidate binding sequence (see the sketch
below).
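To make the order of magnitude of these probabilities concrete, here is a deliberately simplified sketch (it assumes equal upstream lengths, a single word-independent background rate and a crude union bound over the N = 4^n words, none of which is assumed in the actual analysis):

```python
import math

def word_presence_prob(rate_per_bp, length):
    """P(a given word occurs at least once) in a sequence of the given
    length, modelling its occurrences as a Poisson process."""
    return 1.0 - math.exp(-rate_per_bp * length)

def p_at_least_s(q, M, s):
    """Binomial tail: P(the word is present in at least s of M genes),
    each gene having presence probability q."""
    return sum(math.comb(M, j) * q**j * (1.0 - q)**(M - j)
               for j in range(s, M + 1))

def p_n_s(rate_per_bp, length, n, M, s):
    """Rough analogue of p(n, s): chance that *some* n-word reaches s of
    the M genes by chance, bounded by 4**n times the single-word tail."""
    q = word_presence_prob(rate_per_bp, length)
    return min(1.0, 4**n * p_at_least_s(q, M, s))
```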

As we have seen, the critical point of this analysis is the choice of the
reference sample. We try to avoid the bias induced by this choice by crossing
the above procedure with a second level of analysis.

Second level of analysis

The main change with respect to the previous analysis is that in this case we
extract the reference probabilities for the n-words from an artificial reference
sample constructed with a Markov chain algorithm based on the frequencies
of k-words with k << n (usually k = 1, 2 or 3) extracted from the upstream
regions themselves. The second and third steps of the previous analysis then
follow unchanged. The rationale behind this second approach is that we want
to see whether in the upstream region there are some n-words (with n = 7 or 8,
say) that occur much more often than what one would expect based on the
frequency of the k-words in the same region (a sketch of such a generator is
given below).
These two levels of analysis are both likely to give results that are biased
according to the different choices of reference probabilities that define them.
However, since these biases are likely to be very different from each other, it
is reasonable to expect that by comparing the results of the two methods one
can minimize the number of false positives found.
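A minimal sketch of such a Markov-chain generator (the helper name and the single concatenated input string are assumptions of this illustration, not the authors' code):

```python
import random

def markov_reference(sample, k=1, length=100_000, seed=0):
    """Generate an artificial reference sequence from an order-k Markov
    chain whose (k-word -> next nucleotide) frequencies are estimated
    from `sample`, e.g. the concatenated upstream regions themselves."""
    rng = random.Random(seed)
    trans = {}                                 # context -> observed next letters
    for i in range(len(sample) - k):
        trans.setdefault(sample[i:i + k], []).append(sample[i + k])
    seq = list(sample[:k])                     # seed with the first context
    for _ in range(length - k):
        ctx = "".join(seq[-k:])
        choices = trans.get(ctx)
        if not choices:                        # unseen context: pick a random one
            choices = trans[rng.choice(list(trans))]
        seq.append(rng.choice(choices))        # sample with empirical frequencies
    return "".join(seq)
```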

4 The "inverse" search method

A major drawback of the analysis discussed in the previous section is that it
requires a precise knowledge of the function of the genes examined. As a matter
of fact a large part of the genes of eukaryotes have no precisely known biological
function and could not be studied with our direct method. Moreover in these
last years the richest source of biological information on gene expression comes
from microarray experiments; thus it would be important to have a tool to
study gene coregulation starting from the output of such experiments. These
two observations suggested to us the inverse search method that we shall briefly
discuss in this section. We shall outline here only the main ideas of the method;
a detailed account can be found in ref. 10.
The method we propose has two main steps: first the ORFs of a eukaryote
genome are grouped in (overlapping) sets based on words that are overrepresented
in their upstream region, with respect to their frequencies in a reference
sample which is made of all the upstream regions of the whole genome. Each
set is labelled by a word. Then for each of these sets the average expression
in one or more microarray experiments is compared to the genome-wide average:
if a statistically significant difference is found, the word that labels the
set is a candidate regulatory site for the genes in the set, either enhancing or
inhibiting their expression.
An important feature is that the grouping of the genes into sets depends
only on the upstream sequences and not on the microarray experiment considered:
it needs to be done only once for each organism, and can then be used
to analyse an arbitrary number of microarray experiments.
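Schematically, and with the overrepresentation test and the set-versus-genome comparison left as placeholders for the actual statistics of ref. 10, the pipeline could look like this:

```python
from collections import defaultdict
import numpy as np

def inverse_search(upstream, expression, words, is_overrepresented):
    """upstream / expression: dicts gene -> upstream sequence / log-ratio.
    is_overrepresented(word, seq): placeholder for the cutoff-P criterion.
    Returns, for each word-labelled set, a z-like score of its mean
    expression against the genome-wide average."""
    sets = defaultdict(list)                   # word -> genes (overlapping sets)
    for gene, seq in upstream.items():
        for w in words:
            if is_overrepresented(w, seq):
                sets[w].append(gene)
    vals_all = np.array(list(expression.values()))
    mu, sigma = vals_all.mean(), vals_all.std()
    scores = {}
    for w, genes in sets.items():
        vals = np.array([expression[g] for g in genes])
        scores[w] = (vals.mean() - mu) / (sigma / np.sqrt(len(vals)))
    return scores   # large |score|: candidate enhancing/inhibiting site
```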

Table 1: Genes involved in glycolysis.

Gene     Description                             Locus      Chromosome
Ald      Aldolase                                AE003755   3R
Eno      Enolase                                 AE003585   2L
Gapdh1   Glyceraldehyde 3-ph. dehydrogenase 1    AE003839   2R
Gapdh2   Glyceraldehyde 3-ph. dehydrogenase 2    AE003500   X
Hex      Hexokinase                              AE003756   3R
ImpL3    L-lactate dehydrogenase                 AE003563   3L
Pfk      6-phosphofructokinase                   AE003755   2R

We refer to ref. 10 for a detailed description of how the sets are constructed; we
only stress here that this construction requires only three external parameters
which must be fixed by the user: the length L of the upstream region (see
sect. 3 for a discussion of this parameter), the length n of the words that we
use to group the sets, and a cutoff probability P which quantifies the notion of
"overrepresentation" mentioned above.

5 Example: glycolysis in Drosophila melanogaster


As an example of the analysis of sect. 3, we studied the 7 genes of Drosophila
melanogaster involved in glycolysis. These genes are listed in Tab. 1. We performed
our analysis with two choices of the parameters:

1] Promoter region. In this first test we decided to concentrate on the
promoter region. Thus we chose L < 100. With this choice, and since
M = 7, we are bound to study n-words with n = 3, 4, 5 in order to have a
reasonable statistical significance. In particular we concentrate on n = 3.
In the first level of analysis we chose L_R = 100 and p = 1000 (p is the
number of reference samples). In the second level of analysis we chose
k = 1 (k being the number of nucleotides of the k-words used to construct
the Markov chain). We found (among a few other motifs which we do not
discuss here for brevity) that a statistically relevant signal is reached by
the sequence GAG. This result has a clear biological interpretation since
it is the binding site of a ubiquitous TF known as the GAGA factor, which
belongs to the class of the so-called "zinc finger" TF's(b). We consider this
finding as a good validation test of the whole procedure.

(b) The commonly assumed binding site for the GAGA factor is the sequence GAGAG; however
it has been recently realized that the minimal binding sequence is actually the 3-word GAG (ref. 12).

Table 2: Probability p(n, 7) of finding an n-word in the upstream region of all the 7 genes
involved in glycolysis. In the first column the value of n; in the second the result obtained
using the background probabilities; in the last two columns the results obtained with the
Markov chains with k = 1 and k = 2 respectively.

n    p(n,7)     p(n,7), k = 1    p(n,7), k = 2
6    0.346      0.76             0.78
7    0.007      0.013            0.022
8    0.00025    0.000034         0.00011

2] Large scale analysis. In this second test we chose L = 5000. This allowed us
to address n-words with n = 6, 7, 8. For the reference samples we used L_R = 5000 and
p = 21. As a result of our analysis we obtained the probabilities p(n, s)
of finding at least one n-word in the upstream region of s out of the 7
genes that we are studying. As an example we list in Tab. 2 the values of
p(n, s) for s = 7 and n = 6, 7, 8. For the Markov chain analysis we used
k = 1, 2.
In this case we found a 7-word which appeared in the upstream region of
all the seven genes: a fact that, looking at the probabilities listed in Tab. 2,
certainly deserves more attention. The word is TTTAAAT. A survey
of the literature shows that this is indeed one of the binding sequences of
a TF known as "even-skipped", which is known to regulate segmentation
(and also the development of certain neurons) in Drosophila. This TF has
been widely studied due to its crucial role in the early stages of embryo
development, but it was not directly related up to now to the regulation
of glycolysis.

6 Example: diauxic shift in S. cerevisiae


As an example of the analysis of sect. 4, we studied the so-called diauxic shift
(i.e. the metabolic shift from fermentation to respiration) in S. cerevisiae;
the pattern of gene expression during the shift was measured with DNA microarray
techniques in ref. 13. In the experiment, gene expression levels were
measured for virtually all the genes at seven time-points while glucose in
the medium was progressively depleted. As a result of our analysis we found
29 significant words, that can be grouped into 6 motifs (i.e. groups of similar
words). Five of them correspond to known regulatory motifs (for a database of
known and putative TF binding sites in S. cerevisiae see ref. 4). In particular
three of them, STRE, MIG1 and UME6 (for the meaning of these abbreviations
see again ref. 4), were previously known to be involved in the glucose-induced
regulation process, while for the two other known motifs, PAC and RRPE,
this was a new result. We consider the fact of having found known regulatory
motifs a strong validation of our method.
Finally we also found a new binding sequence, ATAAGGG, which we
could not associate to any known regulatory motif.

7 Conclusions
We have proposed two new methods to extract biological information on the
Transcription Factors (and more generally on the mutual interactions among
genes) from the statistical distribution of oligonucleotides in the upstream region
of the genes. Both are based on the notion of a "regulatory network"
responsible for the various expression patterns of the genes, and aim at finding
common binding sites for TF's in families of coregulated genes.

• The methods can be applied both to selected sets of genes of known
biological function (direct search method) and to genome-wide microarray
experiments (inverse search method).

• They require a complete knowledge of the upstream oligonucleotide sequences,
and thus they can be applied for the moment only to those
organisms for which the complete genome has been sequenced.

• In the direct method, once the set of coregulated genes has been chosen,
no further external input is needed. The significance criterion for our
candidate binding sites depends only on the statistical distribution of
oligonucleotides in the upstream region (or in nearby regions used as test
samples).

• Both can be easily implemented and could be used as standard preliminary
tests, to guide a more refined analysis.

Even if they already give interesting results, both our methods are far from
being optimized. In particular there are three natural directions of improvement:

a] taking into account the variability of the binding sequences;

b] recognizing dyad-like binding sequences (see for instance ref. 7), which are
rather common in eukaryotes;

c] recognizing synergic interactions between TF's.

Work is in progress along these lines.

Needless to say, the candidate binding sequences that we find with our
methods will have to be tested experimentally. However our methods could help
to greatly reduce the number of possible candidates and could be used as a
guiding line for the experiments.

References
1. M. Ptashne and A. Gann, Nature 386 (1997) 569
2. A. Wagner, Nucleic Acids Research 25 3594-3604 (1997).
3. S. Tavazoie, J.D. Hughes, M.J. Campbell, R.J. Cho and G.M. Church,
Nature Genetics 22 281-285 (1999).
4. Y. Pilpel, P. Sudarsanam and G.M. Church, Nature Genetics 29 153-159
(2001). Web supplement:
http://genetics.med.harvard.edu/~tpilpel/MotComb.html
5. H.J. Bussemaker, H. Li and E.D. Siggia, Nature Genetics 27 167-171
(2001).
6. J. van Helden, B. Andre and J. Collado-Vides, J. Mol. Biol. 281 827-842
(1998).
7. J. van Helden, A. F. Rios and J. Collado-Vides, Nucleic Acids Research
28 1808-1818 (2000).
8. J. D. Hughes, P. W. Estep, S. Tavazoie and G. M. Church, J. Mol. Biol.
296 1205-1214 (2000).
9. R. Hu and B. Wang, Archive: http://xxx.sissa.it/abs/physics/0009002
10. M. Caselle, F. Di Cunto and P. Provero, "Correlating overrepresented upstream
motifs to gene expression: a computational approach to regulatory
element discovery in eukaryotes." Submitted to BMC Bioinformatics.
11. B. Alberts et al., Molecular Biology of the Cell (Garland Publishing
Inc., New York, 1994).
12. R.C. Wilkins and J.T. Lis Nucleic Acids Research 26 2672-2678 (1998).
13. J.L. DeRisi, V.R. Iyer and P.O. Brown, Science 278 680-686 (1997).

REGULATION OF EARLY GROWTH RESPONSE-1 GENE EXPRESSION


AND SIGNALING MECHANISMS IN NEURONAL CELLS:
PHYSIOLOGICAL STIMULATION AND STRESS

GIUSEPPE CIBELLI
Department of Pharmacology and Human Physiology, University of Bari, P.le G. Cesare 11,
70124 Bari, and Chair of Human Physiology, University of Foggia Medical School, V.le L.
Pinto, 71100 Foggia, Italy
E-mail: g.cibelli@unifg.it

Extracellular signals trigger important adaptive responses that enable an organism to cope
with a changing environment. The induction of immediate early genes is a key initial event in
the response to diverse stimuli, as changes in inducible transcription factor expression lead to a
complex array of transcriptional control signals for use in the coordination of late response
gene expression. The early growth response-1 gene (Egr-1), first identified as an immediate early
gene induced by mitogenic stimulation, and subsequently shown to be activated by diverse
exogenous stimuli including growth factors, hormones and neurotransmitters, encodes a
zinc finger transcription factor involved in the regulation of growth and differentiation. This
article reviews recent findings about the expression, the signaling pathways and the
biological activity of Egr-1 in neuronal cells, following differentiation and apoptosis, as
paradigms for nervous system functioning under physiological and stress-related conditions.

1 Introduction

Stimulated neurons process and transmit information either through short-term, cell
surface-dependent events that immediately process and convey information about
the stimulus, or through long-term events mediated by intracellular messenger systems,
inducing changes in gene expression. Immediate-early genes are the first
downstream nuclear targets, activated by different second messenger signaling
cascades linking membrane events to the nucleus, thus altering the neurons'
responses to subsequent stimuli. These genes are defined by rapid, often transient,
transcriptional induction occurring in the absence of de novo protein synthesis.
Immediate early genes encode many functionally different products, such as secreted
proteins and cytoplasmic enzymes. In particular, a subclass of these genes encodes
inducible transcription factors, proteins that control the expression of genes.
By now, the best characterized immediate-early gene-encoded transcription
factors include AP-1, composed of members of the fos and jun families, and the
early growth response (Egr) family of transcription factors. Here we focus on Egr-1,
the most extensively characterized member of the Egr gene family, first identified
as an immediate-early gene involved in the control of cellular growth and differentiation
and afterwards confirmed to be a transcriptional regulatory protein. The potential role of
Egr-1 in neuronal differentiation and programmed cell death, as naturally occurring
paradigms of plasticity, will be discussed.

2 The Egr-1 transcription factor

The Egr-1 transcription factor [45], also known as zif268 [11], Krox-24 [28], tis8
[31] or nerve growth factor induced (NGFI)-A [34], is a member of the early
growth response family of transcription factors, which also includes Egr-2, Egr-3
and Egr-4 [3]. The human Egr-1 consists of 533 amino acids with a calculated
molecular weight of 57 kDa [45], but because of extensive phosphorylation it runs
at 75-80 kDa [8, 55]. Egr-1 contains, in its C-terminal portion, three zinc finger
domains of the cysteine2-histidine2 subtype, suggesting that it is a DNA binding and
regulatory protein.
The expression of Egr-1 in cell culture systems can be induced by a range of
stimuli, as extensively reviewed by Gashler and Sukhatme [17], including growth
factors, phorbol esters, hypoxia, ionizing radiation, tissue damage and signals that
result in neuronal excitation, such as membrane depolarization or brain seizures.
Table 1 summarizes the stimuli which have been studied with respect to
the expression of Egr-1 in the nervous system.
The time course of Egr-1 expression is typical of inducible transcription
factor mRNAs, and resembles that of c-fos [8, 45]. The expression of Egr-1 has
been extensively studied in the mammalian brain. In the adult, low basal Egr-1
mRNA expression is detected in the rat cortex, amygdala, striatum, cerebellar cortex
and hippocampus [11]. Egr-1 mRNA is expressed at low levels in the early
postnatal rat cortex, midbrain, cerebellum, and brainstem. The Egr-1 message
increases throughout postnatal development to adult levels suggesting a role for
Egr-1 in postnatal maturation of the brain [56].

3 Second messenger systems and cis-acting elements

As a general mechanism, extracellular stimuli activate second messenger systems
whose end-kinases, e.g. ERK, JNK, SAPK, PKA, translocate to the nucleus and
phosphorylate transcription factors already bound to DNA, thus leading to the
activation of the general transcriptional machinery to initiate mRNA synthesis.
Multiple intracellular pathways contribute to the regulation of Egr-1 expression.
Egr-1 is induced by the calcium ionophore A23187 in PC12 cells [14], by dibutyryl
cAMP (dbcAMP) in intact cortical cell monolayers [52] and by L-type voltage-
sensitive calcium channel agonists in cultured cortical neurons [36].
Phosphorylation of the cAMP response element (CRE)-binding protein (CREB) is
required for Egr-1 transcriptional activation through the CRE in response to
interleukin-3 and the granulocyte macrophage colony-stimulating factor in myeloid
leukemia cells [43]. Activation of other second messenger systems leads to
increased Egr-1 mRNA expression. Phorbol ester (TPA) stimulation of COS and
PC12 cells transactivates an Egr-1 promoter-driven reporter construct either via the TPA-
response element present in the same position relative to the serum response
element (SRE), or via TPA stimulation of protein kinase C (PKC) and a
subsequent effect at the SRE [42]. The PKC pathway appears fundamental in
mediating Egr-1 induction in response to X-irradiation [19]. Induction of Egr-1 by
growth factors and stress is mediated through different subgroups of MAP kinases
which may also differentially affect Egr-1 function on its target genes [30].

Table 1. Stimuli which induce Egr-1 expression in the nervous system.


Stimulus                 Cell type/brain region                        Reference
NGF                      PC12 cells                                    [45]
Endothelin               Astrocytes                                    [22]
EGF, PDGF                Glioma cells                                  [23]
Glutamate                Cortico-striatal monolayer                    [53]
NMDA                     Cerebral cortex, hippocampus, hypothalamus    [4]
Dopamine agonists        Striatal neurons, caudate, putamen, cortex    [44, 6]
Amphetamine              Basal ganglia, n. accumbens, olfactory bulb   [35]
Cocaine                  Basal ganglia, n. accumbens                   [35]
Caffeine                 Striatum                                      [47]
Morphine withdrawal      Cerebral cortex, CA1-3, dentate gyrus         [5]
VIP                      Cortical neurons                              [52]
CRF                      LC neurons                                    [12]
Ethanol withdrawal       Cerebral cortex, cerebellum, brainstem        [57]
Light pulse              N. suprachiasmatic, visual cortex             [16]
Restraint stress         Cerebral cortex                               [7]
Axotomy                  Induced in denervated areas                   [27]
Focal cerebral injury    Cerebral cortex, hippocampus, basal ganglia   [21]
Focal ischemia           Nerve and non-nerve cells                     [1]
Electroconvulsive shock  Neocortex, hippocampus                        [13]
LTP                      Granule cells of ipsilateral dentate gyrus    [13]

The architecture of the Egr-1 promoter has been described by several groups
which have cloned the murine [11], rat [10] and human Egr-1 gene [42]. The
upstream region of the mouse Egr-1 gene contains five SREs. In addition, putative
regulatory elements in the Egr-1 promoter include Sp1-, CRE- and AP-1-like
elements, and two CCAATT sequences. The human Egr-1 gene promoter contains
these sequences in conserved positions. The SREs are the dominant regulators of
Egr-1 transcription [33]. The SREs mediate Egr-1 responses to TPA, growth factors
and serum [15]. The SRE is a 22 bp segment that contains the inner core sequence
CC(A/T)6GG, similar to the CArG box present in other inducible immediate early
genes, which is the binding element for the serum response factor (SRF), a nuclear
phosphoprotein present in most cell types [51]. As a homodimer, the SRF binds to
Elk-1, a member of the ternary complex factor family of Ets domain proteins, over
the SREs [33]. The phosphorylation of Elk-1 in response to growth factors and other
stimuli is responsible for the activation of transcription. Fig. 1 shows the schematic
representation of the mechanism of Egr-1 transcription induced by the SREs.

Figure 1. Mechanism of Egr-1 transcription induced by the SREs: a stimulus activates an
end-kinase, which upon phosphorylation translocates to the nucleus and leads to the
transcriptional activation of Egr-1.

A high affinity Egr-1 binding CG-rich DNA sequence, 5'-GCGGGGGCG-3',
termed EBS, is also found in the Egr-1 promoter [48], thus allowing Egr-1 to
positively regulate its own expression by binding with high affinity to the EBS in
the promoter [8, 43].

4 Structure-function mapping

The structure of a complex formed between the three zinc fingers of Egr-1 and
its cognate DNA binding site has been extensively analyzed [3]. Four distinct
activation domains have been identified within the Egr-1 molecule, three of them
localized in the N-terminal region [40]. Other investigators described an extensive
activation domain from amino acids 3 to 281 [18]. Finally, a domain for
transcriptional repression is contained between the activation domain and the DNA
binding domain [18]. This repression domain functions as a binding site for two
cellular inhibitors, NGFI-A binding protein 1 and 2 (NAB1 and NAB2) [41, 46],
which may negatively modulate transactivation by Egr-1 [49], thus conferring on the
Egr-1 protein a bipartite function in alternatively activating or repressing
transcription. The structural features and the defined activity domains of the Egr-1
protein are depicted in fig. 2.

Figure 2. The modular structure of the zinc finger protein Egr-1: the activation domain,
the repressor domain and the DNA-binding domain, spanning residues 1 to 533.

5 Egr-1 and neuronal differentiation

Egr-1 induction has been correlated with the onset of differentiation in several cell
types. In particular, monocytic differentiation of U-937 and HL-60 myeloid
leukemia cells induces Egr-1 expression [26, 25]. Neuronal differentiation has been
extensively investigated by using PC12 cells as a model cell line. Nerve growth factor
causes an initial mitogenic response in PC12 cells, followed by growth arrest and
differentiation into sympathetic neuron-like cells with extended neurites, and
induces sustained activation of extracellular signal-regulated protein kinases (ERK)
[50]. In addition, NGF stimulation of PC12 cells induces expression of Egr-1 [34, 45].
We have recently reported that the neuropeptide corticotropin-releasing factor
(CRF) induces neurite outgrowth in immortalized locus coeruleus-like CATH.a
cells, suggesting a potential role for CRF as a neurotrophic factor for noradrenergic
locus coeruleus neurons [12]. In addition, we used CRF-induced neurite outgrowth of
CATH.a cells as a bioassay to study CRF signaling in these cells. Our results,
which are summarized in fig. 3, indicate that cAMP-dependent protein kinase
(PKA) inhibitors block CRF-induced differentiation entirely. Likewise, dbcAMP induces
neurite outgrowth of CATH.a cells indistinguishable from that of CRF-treated cells.
Moreover, we found that CRF induces the transcriptional activity of CREB. The
inhibition of the MAP kinase pathway, in particular inhibition of ERK, also blocks
CRF-induced neurite outgrowth. Furthermore, CRF stimulates the transcriptional
activity of the transcription factor Elk-1.
In PC12 cells, NGF activates ERK; the kinase translocates to the nucleus and
phosphorylates transcription factors such as Elk-1 [54, 58]. Elk-1 and other
activated transcription factors subsequently induce transcription of those genes
whose gene products are required for the differentiation process. Inhibition of MAP
kinase kinase (MEK) blocks the differentiation of PC12 cells by nerve growth factor
[39]. We obtained very similar data with CRF-differentiated CATH.a cells using the
MEK inhibitor PD98059. Neuronal differentiation of PC12 cells can be induced not
only by NGF but also by an increase in the intracellular cAMP concentration. In
CATH.a cells, CRF and dbcAMP induced the differentiation of the cells. While
NGF activates ERK as discussed, cAMP activates the cAMP-dependent protein
kinase via binding to the regulatory subunit of the holoenzyme. In many cell types,
cAMP antagonizes the ERK pathway. In PC12 cells, however, a positive cross-
talk exists between the cAMP and the ERK signaling pathways. cAMP does not only
activate the cAMP-dependent protein kinase in PC12 cells. Interestingly, cAMP
activates MAP kinase and Elk-1 in PC12 cells through a pathway involving the
small G-protein Rap-1 [54], which is, in turn, activated by a family of cAMP
binding proteins termed cAMP-GEFs in a cAMP-dependent, but PKA-independent,
manner [24]. Our data suggest that both PKA-dependent and PKA-independent
effects of cAMP could account for the activation of ERK in CATH.a cells as well as
in PC12 cells, indicating that both the cAMP and the ERK signaling pathways are
involved in signal transduction of CRF.
By using the Egr-1 DNA binding domain as a selective antagonist of Egr-
1-mediated transcription, Levkovitz et al. reported that the expression of this Egr-1
inhibitor construct suppresses neurite outgrowth elicited in PC12 cells by NGF, but
not by dbcAMP, indicating that Egr-1 expression is necessary, but not sufficient, for
eliciting neurite outgrowth [29]. Conversely, the neuron-specific activator of cyclin-
dependent kinase 5, p35, has been identified as one of the targets of the NGF-
stimulated ERK pathway that are essential for neurite outgrowth in PC12 cells [20].
The transcription factor Egr-1 is required for induction of p35, as the activation of
ERK by NGF correlates with the observed expression patterns for Egr-1 mRNA and
p35 mRNA and protein. To further define an essential signaling pathway,
downstream of ERK, that leads to CRF-induced neuronal differentiation, we
analyzed the effect of CRF on the transcriptional activity of the Egr-1 promoter.
We showed that CRF strikingly activates the Egr-1 reporter, most likely via the
upstream SREs. The fact that CRF very strongly activated the Egr-1 promoter
suggests that the transcription factor Egr-1 is necessary for the CRF-initiated
CATH.a differentiation process.
Figure 3. Proposed mechanism of CRF-induced neurite outgrowth of CATH.a cells via Egr-1:
CRF raises cAMP, which signals both through Rap-1, MEK and ERK to Elk-1 (blocked by the
MEK inhibitor PD98059) and through PKA to CREB (blocked by H-89), leading to the induction
of Egr-1 and of Egr-1-responsive genes and, ultimately, to neurite outgrowth.



6 Egr-1 in neuronal programmed cell death

In recent years, several reports have described Egr-1 as a proapoptotic molecule [32].
Egr-1 biosynthesis was reported to be stimulated in melanoma cells treated with the
apoptotic stimulus thapsigargin [37]. A p53-dependent and a p53-independent
pathway have been proposed to explain the proapoptotic activity of Egr-1. In melanoma
cells expressing a wild-type p53 protein, Egr-1 directly upregulated transcription of
the p53 gene, followed by the synthesis of p53 mRNA and protein [38]. In contrast,
transcriptional upregulation of the tumor necrosis factor α promoter was proposed
as a mechanism by which Egr-1 may induce apoptosis in cells expressing a non-
functional p53 protein [2]. In the nervous system, an enhanced expression of Egr-1
has been connected with neuronal apoptosis of cerebellar granule cells [9].
We have studied nitric oxide (NO)-induced changes in gene transcription
in the human neuroblastoma cell line SH-SY5Y, which undergoes cell death upon
treatment with NOC-18 as an NO donor [Cibelli G., Policastro V., Rossler O. and
Thiel G., Nitric oxide-induced programmed cell death in human neuroblastoma cells
is accompanied by the synthesis of Egr-1, a zinc finger transcription factor, J.
Neurosci. Res., in press]. Our results indicate that NO-induced signaling specifically
elevates the transcriptional activation potential of the ternary complex factor Elk-1.
The finding that Elk-1 is part of an NO-induced signaling cascade in neuronal cells
prompted a search for Elk-1-regulated genes. Therefore, we measured Egr-
1 promoter activities, following administration of NOC-18, and detected an increase
in Egr-1 promoter controlled reporter gene transcription, indicating that the Egr-1
gene is a nuclear target for NO signaling in SH-SY5Y cells. Following the NO-
induced signaling cascade in SH-SY5Y cells, we demonstrated that NO stimulates
the biosynthesis of Egr-1. Furthermore, a striking increase in the transcriptional
activation potential of Egr-1-responsive genes was measured, due to elevated
concentrations of Egr-1. Taken together, these findings suggest that Egr-1 may be
an integral part of the NO-triggered apoptosis signaling cascade in SH-SY5Y
neuroblastoma cells. A model for the mechanism of action of Egr-1 following NO-
induced apoptosis in SH-SY5Y cells is proposed in fig. 4.

Figure 4. Mechanism of action of Egr-1 following NO-induced apoptosis in SH-SY5Y cells:
NO activates ERK and the ternary complex factor Elk-1, which induces Egr-1; Egr-1 in turn
activates Egr-1-responsive genes, leading to apoptosis.

7 Acknowledgements

The author wishes to thank Prof. Gerald Thiel for scientific collaboration and
helpful discussion, Dr. Beatrice Greco for critical reading of the manuscript and
Prof. Carlo Di Benedetta for continuous support on this project.

References

1. Abe K., Kawagoe J., Sato S., Sahara M., Kogure K., Induction of the zinc
finger gene after transient focal ischemia in rat cerebral cortex, Neurosci. Lett.
123 (1991) pp. 248-250.
2. Ahmed M. M., Sells S. F., Venkatasubbarao K., Fruitwala S. M., Muthukkumar
S., Harp C., Mohiuddin M., Rangnekar V. M., Ionizing radiation-inducible
apoptosis in the absence of p53 linked to transcription factor EGR-1, J. Biol.
Chem. 272 (1997) pp. 33056-33061.
3. Beckmann A. M. and Wilce P. A., Egr transcription factor in the nervous
system, Neurochem. Int. 31 (1997) pp. 477-510.
4. Beckmann A. M., Matsumoto I. and Wilce P. A., AP-1 and Egr DNA-binding
activities are increased in rat brain during ethanol withdrawal, J. Neurochem.
69 (1997) pp. 306-314.
5. Beckmann A. M., Matsumoto I., Wilce P. A., Immediate early gene expression
during morphine withdrawal, Neuropharmacology 34 (1995) pp. 1183-1189.
6. Bhat R. V., Worley P. F., Cole A. J., Baraban J. M., Activation of the zinc finger
encoding gene krox-20 in adult rat brain: comparison with zif268, Brain Res.
Mol. Brain Res. 13 (1992) pp. 263-266.
7. Bing G. Y., Filer D., Miller J. C., Stone E. A., Noradrenergic activation of
immediate early genes in rat cerebral cortex, Brain Res. Mol. Brain Res. 11
(1991) pp. 43-46.

8. Cao X., Koski R. A., Gashler A., McKiernan M., Morris C. F., Gaffney R., Hay
R. V., Sukhatme V. P., Identification and characterization of the Egr-1 gene
product, a DNA-binding zinc finger protein induced by differentiation and
growth signals, Mol. Cell. Biol. 10 (1990) pp. 1931-1939.
9. Catania M. V., Copani A., Calogero A., Ragonese G. I., Condorelli D. F.,
Nicoletti F., An enhanced expression of the immediate early gene, Egr-1, is
associated with neuronal apoptosis in culture, Neuroscience 91 (1999) pp.
1529-1538.
10. Changelian P. S., Feng P., King T. C., Milbrandt J., Structure of the NGFI-A
gene and detection of upstream sequences responsible for its transcriptional
induction by nerve growth factor, Proc. Natl. Acad. Sci. USA 86 (1989) pp.
377-381.
11. Christy B. A., Lau L. F., Nathans D., A gene activated in mouse 3T3 cells by
serum growth factors encodes a protein with zinc finger sequences, Proc. Natl.
Acad. Sci. USA 85 (1988) pp. 7857-7861.
12. Cibelli G., Corsi P., Diana G., Vitiello F., Thiel G., Corticotropin-releasing
factor triggers neurite outgrowth of a catecholaminergic immortalized neuron
via cAMP and MAP kinase signalling pathways, Eur. J. Neurosci. 13 (2001)
pp. 1339-1348.
13. Cole A. J., Saffen D. W., Baraban J. M., Worley P. F., Rapid increase of an
immediate early gene messenger RNA in hippocampal neurons by synaptic
NMDA receptor activation, Nature 340 (1989) pp. 474-476.
14. Day M. L., Fahrner T. J., Aykent S., Milbrandt J., The zinc finger protein
NGFI-A exists in both nuclear and cytoplasmic forms in nerve growth factor-
stimulated PC12 cells, J. Biol. Chem. 265 (1990) pp. 15253-15260.
15. DeFranco C., Damon D. H., Endoh M., Wagner J. A., Nerve growth factor
induces transcription of NGFIA through complex regulatory elements that are
also sensitive to serum and phorbol 12-myristate 13-acetate, Mol. Endocrinol. 7
(1993) pp. 365-379.
16. Ebling F. J., Maywood E. S., Staley K., Humby T., Hancock D. C., Waters C.
M., Evan G. I. and Hastings M. H., The role of N-methyl-D-aspartate-type
glutamatergic neurotransmission in the photic induction of immediate-early gene
expression in the suprachiasmatic nuclei of the Syrian hamster, J.
Neuroendocrinol. 3 (1991) pp. 641-652.
17. Gashler A. and Sukhatme V. P., Early growth response protein 1 (Egr-1):
prototype of a zinc-finger family of transcription factors, Prog. Nucleic Acid Res.
Mol. Biol. 50 (1995) pp. 191-224.
18. Gashler A. L., Swaminathan S., Sukhatme V. P., A novel repression module, an
extensive activation domain, and a bipartite nuclear localization signal defined
in the immediate-early transcription factor Egr-1, Mol. Cell. Biol. 13 (1993) pp.
4556-4571.

19. Hallahan D. E., Sukhatme V. P., Sherman M. L., Virudachalam S., Kufe D.,
Weichselbaum R. R., Protein kinase C mediates x-ray inducibility of nuclear
signal transducers EGR1 and JUN, Proc. Natl. Acad. Sci. USA 88 (1991) pp.
2156-2160.
20. Harada T., Morooka T., Ogawa S., Nishida E., ERK induces p35, a neuron-
specific activator of Cdk5, through induction of Egrl, Nat. Cell. Biol. 3 (2001)
pp. 453-459.
21. Honkaniemi J., Sagar S. M., Pyykonen I., Hicks K. J., Sharp F. R., Focal brain
injury induces multiple immediate early genes encoding zinc finger
transcription factors, Brain Res. Mol. Brain Res. 28 (1995) pp. 157-163.
22. Hu R. M., Levin E. R., Astrocyte growth is regulated by neuropeptides through
Tis8 and basic fibroblast growth factor, J. Clin. Invest. 93 (1994) pp. 1820-1827.
23. Kaufmann K., Thiel G., Epidermal growth factor and platelet-derived growth
factor induce expression of Egr-1, a zinc finger transcription factor, in human
malignant glioma cells, J. Neurol. Sci. 189 (2001) pp. 83-91.
24. Kawasaki H., Springett G. M., Mochizuki N., Toki S., Nakaya M., Matsuda M.,
Housman D. E., Graybiel A. M., A family of cAMP-binding proteins that
directly activate Rap1, Science 282 (1998) pp. 2275-2279.
25. Kharbanda S., Nakamura T., Stone R., Hass R., Bernstein S., Datta R.,
Sukhatme V. P., Kufe D., Expression of the early growth response 1 and 2 zinc
finger genes during induction of monocytic differentiation, J. Clin. Invest. 88
(1991) pp. 571-577.
26. Kharbanda S., Rubin E., Datta R., Hass R., Sukhatme V., Kufe D.,
Transcriptional regulation of the early growth response 1 gene in human
myeloid leukemia cells by okadaic acid, Cell Growth Differ. 4 (1993) pp. 17-23.
27. Leah J. D., Herdegen T., Murashov A., Dragunow M., Bravo R., Expression of
immediate early gene proteins following axotomy and inhibition of axonal
transport in the rat central nervous system, Neuroscience 57 (1993) pp. 53-66.
28. Lemaire P., Revelant O., Bravo R., Charnay P., Two mouse genes encoding
potential transcription factors with identical DNA-binding domains are
activated by growth factors in cultured cells, Proc. Natl. Acad. Sci. USA 85
(1988) pp. 4691-4695.
29. Levkovitz Y., O'Donovan K. J., Baraban J. M., Blockade of NGF-induced
neurite outgrowth by a dominant-negative inhibitor of the egr family of
transcription regulatory factors, J Neurosci. 21 (2001) pp. 45-52.
30. Lim C. P., Jain N., Cao X., Stress-induced immediate-early gene, egr-1,
involves activation of p38/JNK1, Oncogene 16 (1998) pp. 2915-2926.
31. Lim R. W., Varnum B. C., Herschman H. R., Cloning of tetradecanoyl phorbol
ester-induced primary response sequences and their expression in density-
arrested Swiss 3T3 cells and a TPA nonproliferative variant, Oncogene 1 (1987)
pp. 263-270.

32. Liu C., Rangnekar V. M., Adamson E., Mercola D., Suppression of growth and
transformation and induction of apoptosis by EGR-1, Cancer Gene Ther. 5
(1998) pp. 3-28.
33. McMahon S. B., Monroe J. G., A ternary complex factor-dependent mechanism
mediates induction of egr-1 through selective serum response elements
following antigen receptor cross-linking in B lymphocytes, Mol. Cell. Biol. 15
(1995) pp. 1086-1093.
34. Milbrandt J., A nerve growth factor-induced gene encodes a possible transcriptional
regulatory factor, Science 238 (1987) pp. 797-799.
35. Moratalla R., Robertson H. A., Graybiel A. M., Dynamic regulation of NGFI-A
(zif268, egrl) gene expression in the striatum, J. Neurosci. 12 (1992) pp. 2609-
2622.
36. Murphy T. H., Worley P. F., Baraban J. M., L-type voltage-sensitive calcium
channels mediate synaptic activation of immediate early genes, Neuron. 7
(1991) pp. 625-635.
37. Muthukkumar S., Nair P., Sells S. F., Maddiwar N. G., Jacob R. J., Rangnekar
V. M., Role of EGR-1 in thapsigargin-inducible apoptosis in the melanoma cell
line A375-C6, Mol. Cell. Biol. 15 (1995) pp. 6262-6272.
38. Nair P., Muthukkumar S., Sells S. F., Han S.-S., Sukhatme V. P., Rangnekar V.
M., Early growth response-1-dependent apoptosis is mediated by p53, J. Biol.
Chem. 272 (1997) pp. 20131-20138.
39. Pang, L., Sawada, T., Decker, S. J. and Saltiel, A. R., Inhibition of MAP kinase
kinase blocks the differentiation of PC-12 cells induced by nerve growth factor,
J. Biol. Chem. 270 (1995) pp. 13585-13588.
40. Russo M. W., Matheny C , Milbrandt J., Transcriptional activity of the zinc
finger protein NGFI-A is influenced by its interaction with a cellular factor,
Mol. Cell. Biol. 13 (1993) pp. 6858-6865.
41. Russo M. W., Sevetson B. R., Milbrandt J., Identification of NAB1, a repressor
of NGFI-A- and Krox20-mediated transcription, Proc. Natl. Acad. Sci. USA
92 (1995) pp. 6873-6877.
42. Sakamoto K. M., Bardeleben C., Yates K. E., Raines M. A., Golde D. W.,
Gasson J. C., 5' upstream sequence and genomic structure of the human
primary response gene, EGR-1/TIS8, Oncogene 6 (1991) pp. 867-871.
43. Sakamoto K. M., Fraser J. K., Lee H. J., Lehman E., Gasson J. C., Granulocyte-
macrophage colony-stimulating factor and interleukin-3 signaling pathways
converge on the CREB-binding site in the human egr-1 promoter, Mol. Cell.
Biol. 14 (1994) pp. 5975-5985.
44. Simpson C. S., Morris B. J., Stimulation of zif/268 gene expression by basic
fibroblast growth factor in primary rat striatal cultures, Neuropharmacology 34
(1995) pp. 515-520.
45. Sukhatme V. P., Cao X., Chang L. C., Tsai-Morris C., Stamenkovich D.,
Ferreira P. C. P., Cohen D. R., Edwards S. A., Shows T. B., Curran T., Le Beau
M. M., Adamson E. D., A zinc finger-encoding gene coregulated with c-fos
during growth and differentiation, and after cellular depolarization, Cell 53
(1988) pp. 37-43.
46. Svaren J., Sevetson B. R., Apel E. D., Zimonjic D. B., Popescu N. C.,
Milbrandt J., NAB2, a corepressor of NGFI-A (Egr-1) and Krox20, is induced
by proliferative and differentiative stimuli, Mol. Cell. Biol. 16 (1996) pp. 3545-
3553.
47. Svenningsson P., Johansson B., Fredholm B. B., Caffeine-induced expression
of c-fos mRNA and NGFI-A mRNA in caudate putamen and in nucleus
accumbens are differentially affected by the N-methyl-D-aspartate receptor
antagonist MK-801, Brain Res. Mol. Brain Res. 35 (1996) pp. 183-189.
48. Swirnoff A. H. and Milbrandt J., DNA-binding specificity of NGFI-A and
related zinc finger transcription factors, Mol. Cell. Biol. 15 (1995) pp. 2275-
2287.
49. Thiel G., Kaufmann K., Magin A., Lietz M., Bach K., Cramer M., The human
transcriptional repressor protein NAB1: expression and biological activity,
Biochim. Biophys. Acta. 1493 (2000) pp. 289-301.
50. Traverse S., Gomez N., Paterson H., Marshall C., Cohen P., Sustained
activation of the mitogen-activated protein (MAP) kinase cascade may be
required for differentiation of PC12 cells. Comparison of the effects of nerve
growth factor and epidermal growth factor, Biochem J. 288 (1992) pp. 351-355.
51. Treisman R., Identification and purification of a polypeptide that binds to the c-
fos serum response element, EMBO J. 6 (1987) pp. 2711-2717.
52. Vaccarino F. M., Hayward M. D., Le H. N., Hartigan D. J., Duman R. S.,
Nestler E. J., Induction of immediate early genes by cyclic AMP in primary
cultures of neurons from rat cerebral cortex, Brain Res. Mol. Brain Res. 19
(1993) pp. 76-82.
53. Vaccarino F. M., Hayward M. D., Nestler E. J., Duman R. S. and Tallman J. F.,
Differential induction of immediate early genes by excitatory amino acid
receptor types in primary cultures of cortical and striatal neurons, Mol. Brain
Res. 12 (1992) pp. 233-241.
54. Vossler M. R., Yao H., York R. D., Pan M.-G., Rim C. S. and Stork P. J. S.,
cAMP activates MAP kinase and Elk-1 through a B-Raf- and Rap1-dependent
pathway, Cell 89 (1997) pp. 73-82.
55. Waters C. M., Hancock D. C., Evan G. I., Identification and characterisation of
the egr-1 gene product as an inducible, short-lived, nuclear phosphoprotein,
Oncogene 5 (1990) pp. 669-674.
56. Watson M. A., Milbrandt J., Expression of the nerve growth factor-regulated
NGFI-A and NGFI-B genes in the developing rat, Development 110 (1990) pp.
173-183.

57. Wilce P. A., Le F., Matsumoto I., Shanley B. C., Ethanol inhibits NMDA-
receptor mediated regulation of immediate early gene expression, Alcohol
Alcohol Suppl. 2 (1993) pp. 359-363.
58. York, R. D., Yao, H., Dillon, T., Ellig, C. L., Eckert, S. P., McCleskey, E. W.
and Stork, P. J. S., Rap1 mediates sustained MAP kinase activation induced by
nerve growth factor, Nature 392 (1998) pp. 622-628.

GEOMETRICAL ASPECTS OF PROTEIN FOLDING

CRISTIAN MICHELETTI
International School for Advanced Studies (S.I.S.S.A.) and INFM,
Via Beirut 2-4, 34014 Trieste, Italy
E-mail: michelet@sissa.it

An increasing amount of experimental evidence supports the view that certain
aspects of protein folding are influenced more by topological features than by
chemical details. Here we focus on two questions stimulated by these observations:
(a) is it possible to exploit the information contained in the native shape of proteins
to obtain clues about the main events of the folding process? (b) can one identify a
(a) is it possible to exploit the information contained in the native shape of proteins
to obtain clues about the main events of the folding process? (b) can one identify a
general mechanism, based on geometrical considerations, that can account for the
ubiquitous presence of secondary motifs (such as helices) in proteins? We tackle
both questions with concepts and tools that are particularly apt for revealing the
role exerted by the native-state topology in the folding process. In particular we
show that the mere knowledge of the native shape of a viral enzyme, the HIV-1
protease, allows a reliable identification of the key sites that ought to be targeted
by inhibiting drugs. Finally, concerning the wide presence of secondary motifs in
proteins, we present a selection criterion based on optimal packing requirements
that is able to single out protein-like helices among all possible three-dimensional
structures. This may hint at a general criterion adopted by nature to promote
viable protein motifs.

1 Introduction

Two of the properties that distinguish small globular proteins from random
heteropolymers are the ubiquitous presence of recurrent geometrical motifs
and the ability to fold rapidly and reversibly into the native state, i.e. the
shape providing maximum biological activity. It is generally believed that
these special properties are the result of evolutionary pressure to optimise the
protein chemical composition. Recently, an increasing amount of evidence has
accumulated showing that, besides the detailed chemistry, the geometrical
shape of native states has also been especially selected to optimise the folding
process 1. Here we focus on two questions that arise spontaneously from these
considerations. Firstly we try to characterize the main events of the folding
process by using schematic topology-based models. It is found that there are
a number of obligatory steps that heavily influence the whole folding process.
The knowledge of such crucial stages is not only of theoretical interest but
could be used to develop drugs tailored to target viral enzymes. In the next
section we report a validation of such strategy for the HIV-1 protease.
In the last section we focus on a more general problem, namely the ubiquitous
presence of secondary motifs (such as helices and sheets) in natural
proteins 2. The presence of secondary structures was first predicted by
Pauling 3 with a reasoning involving saturation of hydrogen-bonds. It is in-
teresting to note, however, that the number of hydrogen bonds is nearly the
same when a sequence is in an unfolded structure in the presence of a polar
solvent or in its native state rich in secondary structure content 4 . More re-
cently, a number of studies 4,5,6 have attempted to re-visit the emergence of
secondary motifs in terms of general geometric criteria (rather than invoking
chemical affinities and propensities) but failed to observe a realistic secondary
content. In our study we have considered a novel perspective where "thick"
three-dimensional structures are selected in terms of their ability to be opti-
mally packed, i.e. space-filling. It will be shown that this simple requirement
is sufficient to select helical shapes with the same aspect ratio observed in
natural proteins.
It is a pleasure to acknowledge the collaboration with Jayanth Banavar,
Paolo Carloni, Fabio Cecconi, Amos Maritan, Flavio Seno and Antonio
Trovato who have contributed to the results discussed here.

2 Topology-based study of the HIV-1 protease folding process

A major advancement in the characterization and understanding of the folding
process was the discovery that small proteins under physiological conditions
can fold reproducibly into their unique native state 7 . This result posed the
folding problem, that is the prediction of the native structure from the knowl-
edge of the protein chemical composition. Despite the enormous progress
made in the field, protein folding from first principles is still an unsolved prob-
lem. This is probably due to the fact that the detailed characterization of the
folding process entails the study of non-equilibrium dynamics in a rugged free
energy landscape 8. Ultimately, the advent of more powerful computers will
certainly make it possible to simulate the detailed folding dynamics of entire proteins.
In striking contrast, several recent advancements have been possible thanks
to the introduction of concepts and folding models of surprising simplicity,
as recently pointed out by D. Baker 1. At the heart of this line of investiga-
tion there are some recent theoretical and experimental studies showing that
the topology of the native structure of a protein plays an important role in
determining many of the attributes of the folding process 9,1,10,11,12,13,14 (also
see the contribution in this volume of Lattanzi and Maritan "The physics of
motor proteins").
The various theoretical models that are able to capture the influence of the
native geometry on the folding process are generally referred to as "topology-
based". Their common starting point is the knowledge of the native conformation,
which is exploited in order to construct effective energy functions that
admit the target native state as the one with lowest energy. The characteri-
zation of the folding process is then carried out in terms of the most probable
routes that lead to the target structure starting from an arbitrary unfolded
(disordered) polymer configuration.
Here we focus on a model 9 that ascribes a favorable attractive energy
to the native contacts, so as to bias the folding dynamics towards the known
native state, Γ0. One of the simplest (or perhaps, the simplest) energy scoring
functions that accomplishes this is defined as follows:

E(\Gamma) = -\sum_{ij} \Delta^{\Gamma_0}_{ij} \, \Delta^{\Gamma}_{ij} ,   (1)

where Γ0 is the known native state and Γ is a trial structure of the same
length as Γ0. Δ^S is the contact matrix of structure S, whose element Δ_ij is 1
if residues i and j are in contact in S (i.e. their Cα separation is below the
cutoff r = 6.5 Å) and 0 otherwise. This symmetric matrix encodes
the topology of the protein. The energy-scoring function of Eq. (1) ensures
that the state of lowest energy is attained in correspondence of structures
with the same contact map as Γ0. This, in principle, may lead to a degenerate
ground state since more than one structure can be compatible with a given
contact matrix. In practice, however, unless one uses unreasonably small
values of r, the degenerate structures are virtually identical. In fact, for
r ≈ 6.5 Å the number of distinct contacts is about twice the protein length;
this number of constraints nicely matches the number of degrees of freedom
of the peptide (two dihedral angles for each non-terminal Cα), thus avoiding
both under- and over-constraining the ground states.
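
As a concrete illustration, the following minimal Python sketch (our own toy
example, not the code used in the study; the "native" chain below is a random
placeholder) builds the Cα contact matrix at the 6.5 Å cutoff and scores a
trial structure according to Eq. (1):

import numpy as np

def contact_map(ca_coords, cutoff=6.5):
    # Delta_ij = 1 when the C-alpha separation of residues i, j is below the cutoff.
    d = np.linalg.norm(ca_coords[:, None, :] - ca_coords[None, :, :], axis=-1)
    contacts = d < cutoff
    np.fill_diagonal(contacts, False)  # discard trivial self-contacts
    return contacts

def go_energy(trial_coords, native_map):
    # E(Gamma) = -sum_ij Delta^Gamma0_ij Delta^Gamma_ij (Eq. 1); each unordered
    # contact is counted twice, an irrelevant constant factor.
    return -float(np.sum(native_map & contact_map(trial_coords)))

rng = np.random.default_rng(0)
native = np.cumsum(rng.normal(scale=2.0, size=(50, 3)), axis=0)  # toy chain
print(go_energy(native, contact_map(native)))  # the native map minimizes E

Any distortion of the trial structure can only raise this score, which is what
biases the dynamics towards Γ0.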
The introduction of this type of topology-based folding model can be
traced back to the work of Go and Scheraga 9. A long-recognized interesting
property of these systems is the presence of an all-or-none folding process,
that is the finite-size equivalent of first-order transitions in infinite systems.
This is illustrated in the example of Fig. 1 where we have reported energy
and specific heat of the model applied to the target protein 1HJA; the code
refers to the Protein Data Bank tag. The plotted data were obtained through
stochastic (Monte Carlo) equilibrium (constant-temperature) samplings.
It is interesting to note the presence of a peak, which can be identified with
the folding transition of the model system. At the peak, about 50 % of the
native structure (measured as the fraction of formed native contacts 16,17) is
formed, consistently with analogous results on different proteins 15. It is, how-
ever, possible to investigate the equilibrium properties of the system in finer

Figure 1. Plots of the energy (top) and specific heat, d<E>/dT (bottom), as a function of
temperature for protein 1hja. The curves were obtained through histogram reweighting
techniques.

detail, for example by examining the probabilities of individual native contacts
to be formed at the various temperatures. Naturally, at high temperatures
all contacts will be poorly formed, while at sufficiently low temperatures they
will all be established. It is then tempting, and physically appealing, to draw
an analogy between this progressive establishment of native structure and the
one observed in a real folding process. However, in principle, the equilibrium
properties of our model need not parallel the dynamical ones of the real sys-
tem. Thus, it was a striking surprise when we established that, indeed, a
qualitative and even quantitative connection between the two processes could
be drawn 10 . In the past years other groups have used similar or alternative
techniques to elucidate the role of the native state topology in the folding
process n. 12 . 13 . 14 ^ confirming the picture outlined here.
An initial validation of this strategy was carried out by considering two
target proteins, chymotrypsin inhibitor and barnase, that have been widely
investigated in experiments. For each of them we generated several hundred
structures having about 40 % native content. It turned out that the most
frequent contacts shared by the native conformation of 2ci2 with the oth-
ers involved the helical residues 30-42 (see Fig. 2). Contacts involving such

residues were shared by 56% of the sampled structures. On the other hand,
the rarest contacts pertained to interactions between the helix and β-strands
and between the β-strands themselves. A different behaviour (see Fig. 2) was
found for barnase, where, again, for overlap of ≈ 40%, we find many contacts
pertaining to the nearly complete formation of helix 1 (residues 8-18), a par-
tial formation of helix 2, and bonds between residues 26-29 and 29-32 as well
as several non-local contacts bridging the β-strands, especially residues 51-55
and 72-75.

Figure 2. Ribbon plot (obtained with RASMOL) of 2ci2 (left) and barnase (right). The
residues involved in the most frequent contacts of alternative structures that form ≈ 40%
of the native interactions are highlighted in black. The majority of these coincide with
contacts that are formed at the early stages of folding.

Both this picture and the one described for CI2 are fully consistent with
the experimental results obtained by Fersht and co-workers in mutagenesis
experiments 18,19. In such experiments, the key role of an amino acid at a given
site is probed by mutating it and measuring the changes in the folding and
equilibrium characteristics. By measuring the change of the folding/unfolding
equilibrium constant one can introduce a parameter, termed Φ-value, which
is zero if the mutation is irrelevant to the folding kinetics and 1 if the change
in folding propensity mirrors the change in the relative stability of the folded
and unfolded states (intermediate values are, of course, possible). Ideally, the
sensitivity to a given site should be measured as a suitable
susceptibility to a small perturbation of the same site (or its environment).
Unfortunately, this is not easily accomplished experimentally, since substitu-
tion by mutation can rarely be regarded as a perturbation. Notwithstanding
this difficulty, from the analysis of the Φ-values obtained by Fersht, a clear
picture of the folding stages of CI2 and barnase emerges. In both cases,
the crucial regions are the same as those identified through
the analysis of contact formation probability reported above. This provides
a sound a posteriori justification that it is possible to extract a wealth of
information about the sites involved in crucial stages of the folding process.
Despite the fact that such sites are determined from the analysis of their cru-
cial topological role with respect to the native state, with no input of the actual
protein composition, they correlate very well with the key sites determined
experimentally. A striking example is provided in the following subsection,
which focuses on an enzyme encoded by the HIV virus. In the following we
shall show that from the mere knowledge of the contact map of the enzyme,
one can isolate a handful of important sites which correlate extremely well
with the key mutating sites determined in clinical trials of anti-AIDS drugs.

2.1 Application to HIV-1 protease: drug resistance and folding pathways


To further corroborate the validity of the proposed model in capturing the
most delicate folding steps we consider an application to an important enzyme,
the protease of the HIV-1 virus (pdb code 1aid), which plays an essential role
in the spreading of the viral infection. Through extensive clinical trials 20, it
has been established that there is a well-defined set of sites in the enzyme
that are crucial for developing, through suitable mutations, resistance against
drugs and which play a crucial role in the folding process 21 .
To identify the key folding sites we looked for contacts whose establish-
ment further enhances the probability of other contacts to be formed. A
possible criterion to identify such contacts is through their contribution to
the overall specific heat. At a fixed temperature, T, the average energy of the
system described by the Hamiltonian (1) can be written as:

\langle E(T) \rangle = -\sum_{ij} \Delta^{\Gamma_0}_{ij} \, p_{ij}(T) ,   (2)

where p_{ij}(T) is the equilibrium probability of residues i and j to be in
contact. Hence, the specific heat of the system is:

C_v(T) = \frac{d\langle E \rangle}{dT} = -\sum_{ij} \Delta^{\Gamma_0}_{ij} \, \frac{dp_{ij}(T)}{dT} .   (3)

Thus, the contribution of the various contacts to the specific heat will be
proportional to how rapidly each contact forms as the temperature is lowered.
The contacts relevant for the folding process will be those giving the largest
contribution to Cv at (or above) the folding transition temperature. Armed
with this insight, we can use this deterministic criterion to rank the contacts
in order of importance.
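
As an illustration of this criterion, here is a minimal sketch (ours, not the
original analysis code) that ranks contacts from contact probabilities p_ij
estimated by equilibrium sampling on a ladder of temperatures; temps, p and
t_fold are assumed inputs:

import numpy as np

def rank_contacts(temps, p, t_fold):
    # p has shape (n_temperatures, n_native_contacts). Following Eq. (3),
    # each contact is scored by -dp_ij/dT evaluated near the specific-heat peak.
    dp_dt = np.gradient(p, temps, axis=0)    # finite-difference derivative in T
    row = np.argmin(np.abs(temps - t_fold))  # sampled temperature nearest T_fold
    scores = -dp_dt[row]                     # contacts form as T is lowered
    return np.argsort(scores)[::-1], scores  # most important contacts first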
Our simulations on the protease of HIV-1 21 are based on an energy-
scoring function that is more complex than Eq. (1). As usual, amino acids are
represented as effective centroids placed on Cα atoms, while the peptide bond
between two consecutive amino acids, i, i+1, at distance r_{i,i+1} is described
by the anharmonic potential adopted by Clementi et al. 22, with parameters
a = 20, b = 2000. The interaction among non-consecutive residues is treated
again in Go-like schemes 9 which reward the formation of native contacts with
a decrease of the energy scoring function. Each pair of non-consecutive amino
acids, i and j, contributes to the energy scoring function by an amount:

V_{ij} = V_0 \, \Delta^{\Gamma_0}_{ij} \left[ 5 \left( \frac{\bar r_{ij}}{r_{ij}} \right)^{12} - 6 \left( \frac{\bar r_{ij}}{r_{ij}} \right)^{10} \right] + V_1 \left( 1 - \Delta^{\Gamma_0}_{ij} \right) \left( \frac{r_0}{r_{ij}} \right)^{12} ,   (4)

where r_0 = 6.8 Å, r̄_ij denotes the distance of amino acids i and j in the na-
tive structure and Δ^Γ0 is the native contact matrix built with an interaction
cutoff, r, equal to 6.5 Å. V_0 and V_1 are constants controlling the strength of
the interactions (V_0 = 20, V_1 = 0.05 in our simulations). Constant temperature
molecular dynamics simulations were carried out where the equations of mo-
tion are integrated by a velocity-Verlet algorithm combined with the standard
Gaussian isokinetic scheme 23,21 . Unfolding processes can be studied within
the same framework by warming up starting from the native conformation
(heat denaturation).
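
A minimal sketch of the resulting pair term, using the functional form as
reconstructed in Eq. (4) (a standard Go-like 12-10 well; the constants are the
ones quoted above, but the code is our illustration, not the simulation program
of ref. 21):

V0, V1, R0 = 20.0, 0.05, 6.8   # interaction strengths and repulsion scale

def pair_energy(r, r_native, is_native):
    # Native pairs: a 12-10 well with its minimum, of depth V0, at r = r_native.
    # Non-native pairs: soft 12th-power steric repulsion of range R0.
    if is_native:
        x = r_native / r
        return V0 * (5.0 * x**12 - 6.0 * x**10)
    return V1 * (R0 / r)**12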
The free energy, the total specific heat, Cv, and the contributions of the in-
dividual contacts to Cv were obtained combining data sampled at different
equilibrium temperatures with multiple histogram techniques 24. The ther-
modynamic quantities obtained through such deconvolution procedures did
not depend, within the numerical accuracy, on whether unfolding or refolding
paths were followed.
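
The reweighting step underlying these deconvolutions can be sketched as
follows (a minimal single-histogram version, not the multiple-histogram code of
ref. 24; energies is assumed to be a series sampled in equilibrium at inverse
temperature beta0, with k_B = 1):

import numpy as np

def reweight(energies, beta0, beta):
    # Boltzmann reweighting of samples drawn at beta0 to a nearby beta.
    w = np.exp(-(beta - beta0) * (energies - energies.mean()))  # shifted for stability
    w /= w.sum()
    e_mean = float(np.sum(w * energies))
    e_var = float(np.sum(w * energies**2)) - e_mean**2
    return e_mean, beta**2 * e_var   # <E> and Cv = beta^2 Var(E)

Scanning beta over a grid and locating the maximum of the returned Cv gives a
single-run estimate of the folding transition temperature.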
The contacts that contribute most to the specific heat peak are identified
as the key ones belonging to the folding bottleneck, and sites sharing them as
the most likely to be sensitive to mutations. Furthermore, by following several
individual folding trajectories (obtained by suddenly quenching unfolded conformations
below the folding transition temperature, T_fold) we ascertained that all such
dynamical pathways encountered the same kinetic bottlenecks determined as
above.
For the β sheets, the bottlenecks involve amino acids that are typically
3-4 residues away from the turns - specifically, residues 61, 62, 72, 74 for β3,
10, 11, 12, 21, 22, 23 for β1 and 44, 45, 46, 55, 56, 57 for β2. At the folding
transition temperature, T_fold, the formation of contacts around residues 30
and 86 is observed. The largest contribution to the specific heat peak is
observed from contacts 29-86 and 32-76 which are, consequently, identified as
the most crucial for the folding/unfolding process; we denote this set as the
"transition bottleneck" (TB).
Such sites are physically located at the active site of HIV-1 PR, which is
targeted by anti-AIDS drugs 25. Hence, within the limitations of our simplified
approach, we predict that changes in the detailed chemistry at the active site
also ruin key steps of the folding process. To counteract the drug action, the
virus has to perform some very delicate mutations at the key sites; within
a random mutation scheme this requires many trials (occurring over several
months). The time required to synthesize a mutated protein with native-like
activity is even longer if the drug attack correlates with several bottlenecks
simultaneously.
This is certainly the case for several anti-AIDS drugs. Indeed Table 1
summarizes the mutations for the FDA approved drugs 20. In Table 2, we
list the sites taking part in the three most important contacts in each of the
four bottlenecks: TB, β1, β2 and β3. Remarkably, among the first 23 most
crucial sites predicted by our method, there are 6 sites in common with the
16 distinct mutating sites of Table 1. The relevance of these matches can
be assessed by calculating the probability of occurrence by chance. By using
simple combinatorial calculations, it is found that the probability to observe
at least 6 matches with the key sites of Table 1 by picking 12 contacts at
random among the native ones is approximately 1 %.
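The quoted figure can also be checked by direct simulation of the null model; a
minimal sketch (the contact list and the key sites below are random placeholders
standing in for the real protease data, so only the procedure, not the number,
is meaningful here):

import numpy as np

rng = np.random.default_rng(1)
n_res = 99                                    # length of HIV-1 protease
contacts = [(i, j) for i in range(n_res)      # placeholder native contact list
            for j in range(i + 3, min(i + 12, n_res))]
key_sites = set(rng.choice(n_res, 16, replace=False).tolist())

trials, hits = 100_000, 0
for _ in range(trials):
    picks = rng.choice(len(contacts), 12, replace=False)
    touched = {r for k in picks for r in contacts[k]}   # residues hit by the picks
    hits += len(touched & key_sites) >= 6               # at least 6 matches
print(f"chance probability of >= 6 matches: {hits / trials:.3%}")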
This result highlights the high statistical correlation between our predic-
tion and the evidence accumulated from clinical trials.
In conclusion, the strategy presented here, which is entirely based on
the knowledge of the native structure of HIV-1 protease, allows one both to
identify the bottlenecks of the folding process and to explain their highly sig-
nificant match with known mutating residues 21 . This and similar approaches
should be applicable to identify the kinetic bottlenecks of other viral enzymes
of pharmaceutical interest. This could allow a fast development of novel in-
hibitors targetting the kinetic bottlenecks. This is expected to dramatically
enhance the difficulty for the virus to express mutated proteins which still fold
efficiently into the same native state with unaltered functionality.

Name          Point Mutations                              Bottlenecks
RTN 26,27     20, 33, 35, 36, 46, 54, 63, 71, 82, 84, 90   TB, β1, β2, β3
NLF 28        30, 46, 63, 71, 77, 84                       TB, β2, β3
IND 29,30     10, 32, 46, 63, 71, 82, 84                   TB, β1, β2, β3
SQV 29,30,31  10, 46, 48, 63, 71, 82, 84, 90               TB, β1, β2, β3
APR 32        46, 63, 82, 84                               TB, β2, β3

Table 1. Mutations in the protease associated with FDA-approved drug resistance 20. Sites
highlighted in boldface are those involved in the folding bottlenecks as predicted by our
approach. βi refers to the bottleneck associated with the formation of the i-th β-sheet,
whereas TB refers to the bottleneck occurring at the folding transition temperature T_fold
(see next Table).

Bottleneck   Key sites
TB           22, 29, 32, 76, 84, 86
β1           10, 11, 13, 20, 21, 23
β2           44, 45, 46, 55, 56, 57
β3           61, 62, 63, 72, 74

Table 2. Key sites for the four bottlenecks. For each bottleneck, only the sites in the top
three pairs of contacts have been reported.

3 Optimal shape of a compact polymeric chain

Optimal geometrical arrangements, such as the stacking of atoms, are of rel-
evance in diverse disciplines. A classic problem is the determination of the
optimal arrangement of spheres in three dimensions in order to achieve the
highest packing fraction; only recently has it been proved 33,34 that the answer
for infinite systems is a succession of tightly packed triangular layers, as con-
jectured by Kepler several centuries ago. This problem has had a profound
impact in many areas ranging from the crystallization and melting of atomic
systems, to optimal packing of objects and subdivision of space 33,34,35,36,37.
The close-packed hard sphere problem is simply stated: given N hard spheres
of radius R, how should we arrange them so that they can fit in the box
with smallest possible side, L? Interestingly, the role of R and L can be re-
versed in the following alternative, but equivalent, formulation: given a set
of N points inside a box of side L, how should we arrange them so that the
spheres centred in them have the (same) maximum radius, R? Also in this
second case, as in the first one, the spheres are not allowed to self-intersect or
cross the box boundaries.


Here we study an analogous problem, that of determining the optimal
shapes of closely packed compact strings. This problem is a mathematical
idealization of situations commonly encountered in biology, chemistry and
physics, involving the optimal structure of folded polymeric chains. Biopoly-
mers like proteins have three dimensional structures which are rather compact.
Furthermore, they are the result of evolution and one may think that their
shape may satisfy some optimality criterion. This naturally leads one to con-
sider a generalization of the packing problem of hard spheres to the case of
flexible tubes with a uniform cross section. The packing problem then con-
sists in finding the tube configuration which can be enclosed in the minimum
volume without violating any steric constraints. As for the "free spheres"
case, also this problem admits a simple equivalent re-formulation that we
found more apt for numerical implementation. More precisely we sought the
curve which is the axis, or centerline, of the thickest tube (the analog of the
sphere centers in the hard sphere packing problem) that can be confined in
the pre-assigned volume 38 .
The maximum thickness associated with a given centerline is elegantly de-
fined in terms of concepts recently developed in the context of ideal knot
shapes 39,40,41,42,43,44. The thickness Δ denotes the maximum radius of a uni-
form tube with the string passing through its axis, beyond which the tube
either ceases to be smooth, owing to tight local bends, or it self-intersects.
The presence of tight local bends is revealed by inspecting the local radius of
curvature along the centerline. In our numerical attempt to solve the problem,
our centerline was represented as a succession of equidistant beads. The local
radius of curvature was then measured as the radius of the circumcircle going
through three consecutive points. Remarkably, the same framework can be
used to deal with the non-local restrictions to the maximum thickness occur-
ring when two points, at a finite arclength separation, come in close approach.
In this case one can consider the smallest radius of circles going through any
non-local triplet of points. When both local and non-local effects are taken
into account, one is naturally led to define the thickness of the chain by
considering all triplets of particles and selecting the smallest among all the
radii 42. For smooth centerlines, an appreciable reduction of the complexity of
the algorithm can be obtained by considering only triplets where at least two of
the points are consecutive 42.
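
A minimal sketch of this construction (our illustration, not the original
program): the circumradius of every triplet containing a pair of consecutive
beads is computed, and the thickness is the smallest such radius.

import numpy as np

def circumradius(p1, p2, p3):
    # Radius of the circle through three points: R = abc / (4 * area).
    a = np.linalg.norm(p2 - p3)
    b = np.linalg.norm(p1 - p3)
    c = np.linalg.norm(p1 - p2)
    s = 0.5 * (a + b + c)
    area_sq = max(s * (s - a) * (s - b) * (s - c), 0.0)  # Heron's formula
    return np.inf if area_sq == 0.0 else a * b * c / (4.0 * np.sqrt(area_sq))

def thickness(points):
    # Smallest circumradius over the reduced triplet set described in the text:
    # a consecutive pair (i, i+1) plus any third bead, local or non-local.
    n, best = len(points), np.inf
    for i in range(n - 1):
        for j in range(n):
            if j not in (i, i + 1):
                best = min(best, circumradius(points[i], points[i + 1], points[j]))
    return best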
Besides these intrinsic limitations to the thickness, one also needs to con-
sider the extrinsic ones due to the presence of a confining geometry. In fact,
the close proximity of the centerline to the walls of the confining box may
further limit the maximum thickness.

As for the packing of free spheres, also the present case is sensitive to the
details of the confining geometries when the system is finite. An example of
the variety of shapes resulting from the choice of different confining geometries
is given in Fig. 3.

Figure 3. Examples of optimal strings. The strings in the figure were obtained starting
from a random conformation of a chain made up of N equally spaced points (the spacing
between neighboring points is defined to be 1 unit) and successively distorting the chain
with pivot, crankshaft and slithering moves. A stochastic optimization scheme (simulated
annealing) is used to promote structures that have larger and larger thickness. Top row:
optimal shapes obtained by constraining strings of 30 points with a radius of gyration less
than R. a) R = 6.0, Δ = 6.42 b) R = 4.5, Δ = 3.82 c) R = 3.0, Δ = 1.93. Bottom
row: optimal shapes obtained by confining a string of 30 points within a cube of side L. d)
L = 22.0, Δ = 6.11 e) L = 9.5, Δ = 2.3 f) L = 8.1, Δ = 1.75.

In order to reveal the "true" bulk solution one needs to adopt suitable
boundary conditions. The one that we found most useful and robust was to
replace the constraint on the overall chain density with one working at a local
level. In fact, we substituted the fixed box containing the whole chain with
the requirement that any succession of n beads be contained in a smaller box
of side l. The results were insensitive (unless the discretization of the chain
was poor) to the choice of n, l and even to replacing the box with a sphere etc.

The solutions that emerged out of the optimization procedure were perfectly
helical strings, corresponding to discretised approximations to the continuous
helix represented in Fig. 4b, confirming that this is the optimal arrangement.
In all cases, the geometry of the chosen helix is such that there is an
equality of the local radius of curvature (determined by the local bending of
the curve) and the radius associated with a suitable triplet of non-consecutive
points lying in two successive turns of the helix. In other words, among all
possible shapes of linear helices, the one selected by the optimization pro-
cedure has the peculiarity that the local radius of curvature is equal to the
distance of successive turns. Hence, if we inflate uniformly the centerline of
this helix, one observes that the tube contacts itself near the helix axis ex-
actly when successive turns touch. This is a feature that is observed only for
a special ratio c* = 2.512... of the pitch, p, and the radius, r, of the circle
projected by the helix on a plane perpendicular to its axis. As this packing
problem is considerably more complicated than the hard spheres one, we have
little hope to prove analytically that, among all possible three-dimensional
chains, the helix of Fig. 4b is the optimally packed one. However, if we
assume that the optimal shape is a linear helix, it is not too difficult to
explain why the "magic" ratio p/r = c* is observed. In fact, when p/r > c*
the local radius of curvature, given by \rho = r \left( 1 + p^2/(2\pi r)^2 \right),
is smaller than half the distance of closest approach of points on successive
turns of the helix (see Fig. 4a). The latter is given (in units of r) by the
first minimum of \frac{1}{2} \sqrt{2 - 2\cos(2\pi t) + p^2 t^2} for t > 0.
Thus \Delta = \rho in this case.
On the other hand, if p/r < c*, the global radius of curvature is strictly
lower than the local radius, and the helix thickness is determined basically by
the distance between two consecutive helix turns: \Delta \approx p/2 if p/r \ll 1 (see Fig. 4c).
Optimal packing selects the very special helices corresponding to the transition
between the two regimes described above. A visual example is provided by
the optimal helix of Fig. 4b. An instructive quantity to monitor is the ratio,
f, of the minimum radius of the circles going through each point and any two
non-adjacent points to the local radius. For discretized strings, f = 1 just
at the transition described above, whereas f > 1 in the "local" regime and
f < 1 in the "non-local" regime. In our computer-generated optimal strings,
the value of f averaged over all sites in the chain differed from unity by less
than a part in a thousand.
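
Under the stated assumptions (a linear helix, with lengths measured in units
of the radius, r = 1), the transition value can be verified numerically: c* is
the ratio at which the local radius ρ = 1 + (c/2π)² equals half the first-minimum
distance between successive turns. A minimal sketch:

import numpy as np

def half_gap(c, t=np.linspace(0.05, 1.5, 20001)):
    # d(t)^2 = 2 - 2 cos(2 pi t) + (c t)^2 is the squared distance between the
    # point at t = 0 and the point t turns away; return half its first local minimum.
    d = np.sqrt(2.0 - 2.0 * np.cos(2.0 * np.pi * t) + (c * t) ** 2)
    mins = np.flatnonzero((d[1:-1] < d[:-2]) & (d[1:-1] < d[2:]))
    return 0.5 * d[mins[0] + 1] if mins.size else np.inf

def mismatch(c):
    # Positive when the local radius of curvature exceeds the half-gap (c < c*).
    return 1.0 + (c / (2.0 * np.pi)) ** 2 - half_gap(c)

lo, hi = 2.0, 3.0                      # mismatch changes sign inside this bracket
for _ in range(60):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if mismatch(mid) > 0.0 else (lo, mid)
print(f"c* ~= {0.5 * (lo + hi):.4f}")  # approaches the quoted 2.512...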
It is interesting to note that, in nature, there are many instances of the ap-
pearance of helices. It has been shown 10 that the emergence of such motifs in
proteins (unlike in random heteropolymers which, in the melt, have structures
conforming to Gaussian statistics) is the result of the evolutionary pressure
Figure 4. Maximally inflated helices with different pitch to radius ratio, c. (a) c = 3.77: the
thickness is given by the local radius of curvature. (b) c = 2.512...: for this optimal value
the local and non-local radii of curvature match. (c) c = 1.26: the maximum thickness is
limited by non-local effects (close approach of points in successive turns). Note the optimal
use of space in situation (b), while in cases (a) and (c), empty space is left between the
turns or along the helix axis.

exerted by nature in the selection of native state structures that are able to
house sequences of amino acids which fold reproducibly and rapidly 38 and
are characterized by a high degree of thermodynamic stability 17 . Further-
more, because of the interaction of the amino acids with the solvent, globular
proteins attain compact shapes in their folded states.
It is then natural to measure the shape of these helices and assess if they
are optimal in the sense described here. The measure of f in α-helices found in
naturally-occurring proteins yields an average value for f of 1.03 ± 0.01, hinting
that, despite the complex atomic chemistry associated with the hydrogen bond
and the covalent bonds along the backbone, helices in proteins satisfy optimal
packing constraints. An example is provided in Fig. 5 where we report the
value of f for a particularly long α-helix encountered in a heavily-investigated
membrane protein, bacteriorhodopsin.

[Figure 5 plots: local radius (R local), non-local radius (R non-local) and f values vs. helix site for the first helix of 1c3w.]

Figure 5. Top. Local and non-local radii of curvature for sites in the first helix of bacteri-
orhodopsin (PDB code 1c3w). Bottom. Plot of f values for the same sites.

This result implies that the backbone sites in protein helices have an
associated free volume distributed more uniformly than in any other confor-
mation with the same density. This is consistent with the observation 10 that
secondary structures in natural proteins have a much larger configurational
entropy than other compact conformations. This uniformity in the free vol-
ume distribution seems to be an essential feature because the requirement
of a maximum packing of backbone sites by itself does not lead to secondary
structure formation 5,6. Furthermore, the same result also holds for the helices
appearing in the collagen native state structure, which have a rather different
geometry (in terms of local turn angles, residues per turn and pitch 45 ) from
average a-helices. In spite of these differences, we again obtained an average
f = 1.01 ± 0.03, very close to the optimal situation.

4 Conclusions

In summary, we have shown that topology-based models can lead to a vivid
picture of the folding process. In particular, they allow not only an overall
qualitative characterization of the rate-limiting steps of the folding process,
but also the pinpointing of crucial sites that, for viral enzymes, should be
targeted by effective drugs. We have carried out a successful validation of this
strategy against data from clinical trials on the HIV-1 protease.
We have then addressed the question of whether there exists a simple vari-
ational principle accounting for the emergence of secondary motifs in natural
proteins. A possible selection mechanism has been identified in terms of op-
timal packing requirements. The numerical evidence presented here support
unambiguously the fact that, among all three-dimensional structures with uni-
form thickness, the ones that make the most economic use of the space are
helices with a well-defined geometry. Strikingly, the optimal aspect ratio is
precisely the same that is observed in helices of naturally-occurring proteins.
This provides a hint that, besides detailed chemical interactions, a more fun-
damental mechanism promoting the selection and use of secondary motifs in
proteins is associated with simple geometric criteria 38 ' 46 .

Acknowledgments

Support from INFM, MURST Cofin 1999 and Cofin 2001 is acknowledged.

References

1. D. Baker, Nature 405, 39, (2000).


2. C. Chothia, Nature 357, 543, (1992).
3. Pauling L., Corey R. B. and Branson H. R., Proc. Nat. Acad. Sci. 37,
205 (1951).
4. Hunt N. G., Gregoret L. M. and Cohen F. E., J. Mol. Biol. 241, 214
(1994).
5. Yee D. P., Chan H. S., Havel T. F. and Dill K. A. J. Mol. Biol. 241, 557
(1994).
6. Socci N. D., Bialek W. S. and Onuchic J. N. Phys. Rev. E 49, 3440
(1994).
7. Anfinsen C. Science 181, 223 (1973).
8. P. G. Wolynes J. N. Onuchic and D. Thirumalai, Science 267, 1619,
(1995)
9. N. Go & H. A. Scheraga, Macromolecules 9, 535, (1976).
10. C. Micheletti, J. R. Banavar, A. Maritan and F. Seno, Phys. Rev. Lett.
82, 3372, (1999).
11. Galzitskaya O. V. and Finkelstein A. V. Proc. Natl. Acad. Sci. USA 96,
11299 (1999).

12. Munoz V. , Henry E. R., Hofrichter J. and Eaton W. A. Proc. Natl.


Acad. Sci. USA 95, 5872 (1998).
13. Alm E. and Baker D. Proc. Natl. Acad. Sci. USA 96, 11305 (1999).
14. Clementi C., Nymeyer H. and Onuchic J. N., J. Mol. Biol., in press
(2000).
15. Lazaridis T. and Karplus M. Science 278, 1928 (1997).
16. Kolinski A. and Skolnick J. J. Chem. Phys. 97, 9412 (1992).
17. Sali A., Shakhnovich E. and Karplus M. Nature 369, 248 (1994).
18. Fersht A. R. Proc. Natl. Acad. Sci. USA 92, 10869 (1995).
19. Itzhaki L. S., Otzen D. E. and Fersht A. R., J. Mol. Biol. 254, 260
(1995).
20. Ala P.J, et al. Biochemistry 37, 15042-15049, (1998).
21. Cecconi F., Micheletti C., Carloni P. and Maritan A. Proteins: Str. Funct.
Gen., 43, 365-372 (2001).
22. Clementi C., Carloni P. and Maritan A. Proc. Natl. Acad. Sci. USA 96,
9616 (1999).
23. Evans D. J., Hoover W. G., Failor B. H., Moran B., Ladd A. J. C. Phys.
Rev. A 28, 1016 (1983).
24. Ferrenberg A. M. and Swendsen R. H. Phys. Rev. Lett. 63, 1195 (1989).
25. Brown A. J., Korber B. T., Condra J. H. AIDS Res. Hum. Retroviruses
15, 247 (1999).
26. Molla A. et al. Nat. Med. 2, 760 (1996).
27. Markowitz M. et al. J. Virol. 69, 701 (1995).
28. Patick A. K. et al. Antimicrob. Agents Chemother. 40, 292 (1996).
29. Condra J.H. et al. Nature 374, 569 (1995).
30. Tisdale M. et al. Antimicrob. Agents Chemother. 39, 1704 (1995).
31. Jacobsen H. et al. J. Infect. Dis. 173, 1379 (1996).
32. Reddy P. and Ross J. Formulary 34, 567 (1999).
33. Sloane N. J. A. Nature 395, 435 (1998).
34. Mackenzie D. Science 285, 1339 (1999).
35. Woodcock L.V. Nature 385, 141 (1997).
36. Car R. Nature 385, 115 (1997).
37. Cipra B. Science 281, 1267 (1998).
38. A. Maritan, C. Micheletti, A. Trovato and J. R. Banavar, Nature 406,
287, (2000).
39. Buck G. and Orloff J. Topol. Appl. 61, 205 (1995).
40. Katritch V., Bednar J., Michoud D., Scharein R.G., Dubochet J. and
Stasiak A., Nature 384, 142 (1996).
41. Katritch V., Olson W. K., Pieranski P., Dubochet J. and Stasiak A.
Nature 388, 148 (1997).

42. Gonzalez O. and Maddocks J. H. Proc. Natl. Acad. Sci. USA 96, 4769
(1999).
43. Buck G., Nature 392, 238 (1998).
44. Cantarella J., Kusner R. B. and Sullivan J. M., Nature 392, 237 (1998).
45. Creighton T. E., Proteins - Structures and Molecular Properties, W.H.
Freeman and Company, New York (1993), pag. 182-188.
46. A. Maritan, C. Micheletti and J. R. Banavar, Phys. Rev. Lett. 84, 3009,
(2000).

THE PHYSICS OF MOTOR PROTEINS

G. LATTANZI
International School for Advanced Studies (S.I.S.S.A.) and INFM, via Beirut 2-4,
34013 Trieste, Italy
E-mail: lattanzi@sissa.it

A. MARITAN
International School for Advanced Studies (S.I.S.S.A.) and INFM, via Beirut 2-4,
34013 Trieste, Italy
The Abdus Salam International Center for Theoretical Physics, Strada Costiera 11,
34100 Trieste, Italy

Motor proteins are able to transform the chemical energy of ATP hydrolysis into
useful mechanical work, which can be used for several purposes in living cells.
The paper is concerned with problems raised by the current experiments on mo-
tor proteins, focusing on the main question of conformational changes. A simple
coarse-grained theoretical model is sketched and applied to the motor domain of
the kinesin protein; regions of functional relevance are identified and compared with
up-to-date information from experiments. The analysis also predicts the functional
importance of regions not yet investigated by experiments.

1 Introduction to the biological problem

The increasing precision in the observation of single cells and their components
can be compared to the approach of one of our cities by air 1 : at first we notice a
complex network of urban arteries (streets, highways, railroad tracks). Then,
we may have a direct look at traffic in its diverse forms: trains, cars, trucks
and buses traveling to their destinations. We do not know the reason for that
traffic, but we know that it is essential to the welfare of the entire city. If we
want to understand the rationale for every single movement, we need to be at
ground level, and possibly drive a single element of the traffic flow.
In the same way, biologists have observed the complex network of fila-
ments that constitute the cytoskeleton, the structure that is responsible also
for the mechanical sustain of the cell. Advances in experimental techniques
have finally cast the possibility of observing traffic inside the cell. This trans-
port system is of vital importance to the functioning of the entire cell; just as an
ordinary traffic jam, or a defect in the transportation network of a city, can impair
its organized functioning, occasional problems in the transport of chemical
components inside the cell can be the cause of serious cardiovascular diseases
or neurological disorders.

The study of the transportation system and its molecular components is


therefore of great relevance to medicine. Recent advances in single molecule
experiments 2,3 allowed us to be spectators at the ground level for the first
time, i.e. to observe the single molecular elements of the traffic flow inside
the cells. These components are called protein motors.

1.1 Fuel: ATP


The fuel for such motors is ATP (adenosine triphosphate). ATP is an organic
molecule, formed by the base adenine, a ribose sugar and three phosphate
groups, the latter held together by two high-energy phosphoanhydride bonds 4.
Removal of a phosphate group from ATP leaves ADP (adenosine diphosphate)
and an inorganic phosphate molecule, Pᵢ, as in the hydrolysis reaction:

ATP + H₂O → ADP + Pᵢ + H⁺.   (1)


This reaction corresponds to a release of 7.3 kcal/mol of free energy. In-
deed, under standard chemical conditions, this reaction requires almost one
week to occur 5 , but it is accelerated, or catalyzed, by proteins. These pro-
teins are called motors, since they are able to transduce one form of energy
(chemical) into useful mechanical work.
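To put the 7.3 kcal/mol on the scale relevant for a single molecular machine, it can be converted into units of the thermal energy k_BT; the following back-of-the-envelope conversion is ours, with body temperature T = 310 K assumed:

# Free energy of ATP hydrolysis per molecule, in units of k_B*T.
KCAL_PER_MOL_IN_J = 4184.0      # 1 kcal = 4184 J
N_A = 6.022e23                  # Avogadro's number, 1/mol
K_B = 1.381e-23                 # Boltzmann constant, J/K
T = 310.0                       # body temperature, K (assumed)

dg = 7.3 * KCAL_PER_MOL_IN_J / N_A      # ~5.1e-20 J per ATP molecule
print(dg / (K_B * T))                   # ~12 k_B*T per hydrolysis event

Each hydrolysis event therefore delivers roughly a dozen times the typical thermal energy of the surroundings.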

1.2 Characteristics of motor proteins


Protein motors are very different from our everyday life motors: first, they
are microscopic, and therefore they are subject to a totally different physical
environment, where, for instance, thermal agitation has a strong influence on
their motion.
In addition, their design has been driven by evolutionary principles op-
erating for millions of years, and therefore they are optimized to have a high
efficiency, and specialized to the many different purposes required in the func-
tioning of living cells.
Our everyday motors usually operate with temperature differences, there-
fore, no matter how clever we are in designing the motor, its efficiency is
always limited by the Carnot theorem 6 . This is no longer true for motor pro-
teins. Indeed any temperature difference, on the length scale of proteins (the
nanometer) would disappear in a few picoseconds, therefore they are isother-
mal machines, operating at the constant temperature of our body. They are
not limited by Carnot theorem, and their efficiency could be rather close to
1, meaning that they are able to convert chemical energy almost entirely into
useful work.
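A one-line illustration of why the Carnot bound is irrelevant here (our numbers, purely hypothetical): even if a motor protein could somehow sustain a 1 K temperature difference at body temperature, a heat engine would be capped at a fraction of a percent:

# Carnot efficiency for a hypothetical 1 K gradient at body temperature.
T_hot, T_cold = 311.0, 310.0            # assumed temperatures, K
print(1.0 - T_cold / T_hot)             # ~0.0032, i.e. less than 0.4%

An isothermal chemo-mechanical machine, by contrast, is limited only by the free energy of the reaction it catalyzes.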

1.3 Different families, different tasks


Most molecular motors perform sliding movements along tracks using the
energy released from the hydrolysis of ATP, to produce macroscopic motion,
such as muscle contraction, and to maintain cell activities. Among these, the
most important families are: myosins, kinesins and dyneins.
The study of myosin dates back to 1864. It is usually found in bundles,
as in the thick filaments in our muscle cells and is extremely important for
muscle contraction.
Kinesin, discovered in 1985, is a highly processive motor, i.e. it can take
several hundred steps on a filament called a microtubule without detaching 7,8,
whereas muscle myosin was shown to execute a single "stroke" and then dis-
sociate. Kinesins form a large superfamily, and the individual superfamily
proteins operate as motor molecules in various cell types with diverse cargoes.
Given that transportation requirements are particularly demanding and com-
plex in neurons, it is not a surprise that the highest diversity of kinesins is
found in the brain 1.
The discovery of kinesin only partially explained how membrane vesicles
are transported in cells. Some movements, such as retrograde axonal trans-
port, are in the direction opposite to this kinesin-dependent movement. Thus
there must be a second group of motor proteins responsible for this motility.
Such a group exists; it is composed of the dyneins, a superfamily of excep-
tionally huge proteins.
In neurons, kinesin and dynein motor molecules have not only been in-
volved in intracellular axonal and dendritic transport, but also in neuronal
pathfinding and migration. Given the various fundamental cellular functions
they serve in neurons, such mechanisms, if defective, are expected to con-
tribute to onset or progression of neurological disorders.
But these are not the only motor proteins. Another important track is
DNA: specific machines move along DNA filaments, unzip them and
copy them into RNA.

1.4 Structure
Until 1992, it appeared as though kinesin and myosin had little in common 9 .
In addition to moving on different filaments, kinesin's motor domain is less
than one-half the size of myosin's, and initial sequence comparisons failed to
reveal any important similarities between these two motors. Their motile
properties also appeared to be quite different.
In the last few years of research, however, the crystal structures of kinesin
have revealed a striking similarity to myosin, the structural overlap pointing

to short stretches of sequence conservation 10,11 . This suggested that myosin


and kinesin originated from a common ancestor.
The opportunity to study and compare numerous kinesin and myosin mo-
tors provides a valuable resource for understanding the mechanism of motility.
Because kinesin and myosin share a similar core structure and evolutionary
ancestry, comparison of these motors has the potential to reveal common
principles by which they convert chemical energy into motion 9 .
Members of each family have similar motor domains of about 30-50%
identical residues that can function as autonomous units. The proteins are
differentiated by their nonmotor or tail domains. Across families, the motor
domains, also called head domains, have no significant identity in amino acid
sequence, but they have a common fold for binding the nucleotide (ATP). Adjacent to the
head domain lies the highly α-helical neck region. It regulates the binding of
the head domain by binding either calmodulin or calmodulin-like regulatory
light chain subunits (called essential or regulatory light chains, depending on
their function). The tail domain contains the binding sites that determine
whether the tail binds to the membrane or binds to other tails to form a
filament, or attaches to a cargo.
Motor proteins are composed of one or two (or, more rarely, three) motor
domains, linked together by the neck regions, which form the neck linker part
of the motor. To understand how hydrolysis of ATP is coupled to the move-
ment of a motor head along filaments, we need to know the three-dimensional
structure of the head domain.
An important feature in the structure of the myosin head is the presence
of two clefts on its surface. One cleft is bound to the filament, while the other
contains the ATP binding site. The clefts are separated by 3.5 nm, a long
distance in a protein. The presence of surface clefts provides a mechanism
for generating large movements of the head domain: we can imagine how
opening or closing of a cleft in the head domain, by binding or releasing ATP,
causes the head domain to pivot about the neck region, so that a change in
the conformation of the protein may occur.

1.5 Conformational change: experiments


Conformational changes have been detected using advanced experimental
techniques such as Fluorescence Resonance Energy Transfer12 (FRET); FRET
determines the distance between two probes on a protein, called donor (D)
and acceptor (A). When the emission spectrum of the donor fluorophore and
the excitation spectrum of the acceptor fluorophore overlap, and they are lo-
cated close to each other (in the order of nanometers), the excited energy of

the donor is transferred to the acceptor without radiation, resulting in ac-


ceptor fluorescence. When the donor and acceptor are far apart, the donor
fluoresces.
It is therefore possible to determine the distance between the donor and
acceptor fluorophores attached to two different sites on a protein by monitor-
ing the color of the fluorescence. For myosin, it has been shown 13 that fluores-
cence intensities of the donor and acceptor vary spontaneously in a flip-flop
fashion, indicating that the distance between the donor and acceptor changes
in the range of hundreds of angstroms; that is, the structure of myosin is not
stable but instead thermally fluctuates. These results suggest that myosin can
go through several metastable states, undergoing slow transitions between the
different states.

1.6 The problem of conformational change


The ATP binding pocket is a rather small region in the myosin (or kinesin)
motor domain. Yet the information that ATP is bound to this well-localized
site can be transferred to very distant regions of the domain, so that the
entire protein may undergo a conformational change. FRET can be used to
monitor the distance between parts of the motor domain, but it is not possible
to probe all of the regions, because of time and budget limitations. The
identification of possible targets for FRET experiments would be therefore
a task for theoretical modeling. A theoretical model would be of great help
in answering some of the following questions: which parts are expected to
undergo the largest displacements in a conformational change? Which are
sensitive to ATP binding? Which are important for the biological function of
the protein? Which are responsible for the transfer of information?

2 Gaussian Network Model (GNM)

Many theoretical models have been proposed for the analysis of protein struc-
ture and properties. The problem with protein motors is that they are huge
proteins, whose size prevents any attack by present all-atom computer sim-
ulations. To make things worse, even under an optimistic assumption of im-
mense computer memory to store all necessary coordinates, the calculations
needed would show only few nanoseconds of the dynamics, but the conforma-
tional rearrangements usually lie in the time range of milliseconds. Therefore,
a detailed simulation of the dynamics is not feasible and furthermore it is not
guaranteed that all the details can shed light on the general mechanism.
Yet, in recent years, dynamical studies have increased our appreciation of

the importance of protein structure and have shed some light on the central
problem of protein folding14'15'16. Interestingly, coarse grained models proved
to be very reliable for specific problems in this field. The scheme is as follows.
Proteins are linear polymers assembled from about 20 amino acid monomers
or residues. The sequence of aminoacids (primary structure) varies for dif-
ferent molecules. Sequences of amino acid residues fold into typical patterns
(secondary structure), consisting mostly of helical (α helices) and sheetlike (β
sheets) patterns. These secondary structure elements bundle into a roughly
globular shape (tertiary structure) in a way that is unique to each protein
(native state). Therefore, the information on the detailed sequence of amino
acids composing the protein uniquely encodes its native state. Once the lat-
ter is known, one may forget about the former (this is the topological point of
view).
The GNM is a recently developed simple technique which drives this prin-
ciple to its extreme. It has been applied with success to a number of large
proteins 17 and even to nucleic acids 18,19 .

2.1 Theory
Bahar et al. 20 proposed a model for the equilibrium dynamics of the folded
protein in which interactions between residues in close proximity are replaced
by linear springs. The model assumes that the protein in the folded state
is equivalent to a three dimensional elastic network. The nodes are identi-
fied with the Cα atoms^a in the protein. These undergo Gaussian distributed
fluctuations, hence the name Gaussian Network Model.
The native structure of a given protein, together with amplitudes of
atomic thermal fluctuations, measured by x-ray crystallography, is reported
in the Brookhaven Protein Data Bank 21 (PDB). Given the structure of a
protein, the Kirchhoff matrix of its contacts is defined as follows:

Γ_ij = −1 if r_ij ≤ r_c,   Γ_ij = 0 if r_ij > r_c   (i ≠ j)   (2)

Γ_ii = − Σ_{k≠i} Γ_ik   (3)

where the non-zero off-diagonal elements refer to residue pairs i and j

^a Carbon atoms in amino acids are labelled with Greek letters: for each residue there is
at least one carbon atom, Cα, but there can also be additional carbon atoms, called Cβ, Cγ, etc.

that are connected via springs, their separation r_ij being shorter than a cutoff
value r_c for inter-residue interactions. The diagonal elements are found from
the negative sum of the off-diagonal terms in the same row (or column); they
represent the coordination number, i.e. the number of individual residues
found within a sphere of radius rc. The Kirchhoff matrix is conveniently
used 22 for evaluating the overall conformational potential of the structure:

V = (γ/2) ΔRᵀ Γ ΔR.   (4)

Here ΔR is the N-dimensional vector whose elements are the 3-
dimensional fluctuation vectors ΔR_i of the individual residues around their
native positions, while γ represents a free parameter of the model.
The cross-correlations between residue fluctuations are found from the
simple Gaussian integral:

⟨ΔR_i · ΔR_j⟩ = (1/Z_N) ∫ (ΔR_i · ΔR_j) δ(Σ_k ΔR_k) e^(−V/k_BT) d{ΔR}   (5)

where the integration is carried over all possible fluctuation vectors ΔR, Z_N
is the partition function, and δ(Σ_i ΔR_i) accounts for the constraint of fixing
the position of the center of mass. This integral can be calculated exactly,
yielding:

⟨ΔR_i · ΔR_j⟩ = (3 k_B T / γ) [Γ⁻¹]_ij   (6)

where Γ⁻¹ is the inverse of Γ in the space orthogonal to the eigenvector with


zero eigenvalue. This inverse can be expressed in a more elegant way as a sum
over all non-zero eigenvalues λ_k and eigenvectors u_k of Γ, so that ⟨ΔR_i · ΔR_j⟩
can finally be expressed as the sum over the contributions [ΔR_i · ΔR_j]_k of
the individual modes:

⟨ΔR_i · ΔR_j⟩ = Σ_k [ΔR_i · ΔR_j]_k = (3 k_B T / γ) Σ_k λ_k⁻¹ [u_k u_kᵀ]_ij.   (7)

The summation is performed over all N − 1 non-zero eigenvalues of Γ.


The mean square (ms) fluctuations of individual residues can be readily found
from eq. (7), taking i = j .
The analysis of single modes indeed yields much more information; it has
been argued 20 that residues active in the fastest modes (largest λ_k) have a

very strong resistance to conformational changes and are therefore thought


to be important in maintaining the structure, or in underlying the stability
of the folded state. Residues active in the slowest modes, on the other hand,
are susceptible to large scale (global) motions. It is reasonable to conceive
that such motions are associated with the collective dynamics of the overall
tertiary structure, and thereby relevant to biological function.

2.2 Application: Kinesin Motor Domain


The coordinates of the backbone of human kinesin motor domain were down-
loaded from the Brookhaven Protein Data Bank. This protein structure has
been resolved in presence of the ADP nucleotide and a magnesium ion 11 . We
concentrated only on the backbone chain and assumed that each amino acid is
well represented by an effective centroid coinciding with the Cα atom. Hence,
in this simplified model, each amino acid is represented by one bead. The
ADP molecule is represented by 4 beads, located at the center of mass of the
adenine, at the center of mass of ribose and at the positions occupied by the
two P atoms. The obtained structure is modeled by 327 beads.
The Kirchhoff matrix was constructed assuming a cutoff distance of 7
Å, which has been shown to be a reasonable assumption for a wide range of
folded proteins 20 . The matrix was diagonalized; its eigenvalues were sorted in
ascending order and the corresponding eigenvectors were normalized so that:

Σ_{i=1}^{N} (u_k u_kᵀ)_ii = 1   ∀k,   (8)

where N = 323, i.e. we normalized the fluctuations so that the sum of


the residue fluctuations is 1. We focus on the first 4 vibrational modes, which
might be related to the biological function of the protein.
The normalized fluctuations corresponding to the first vibrational mode
of the kinesin structure are reported in figure 1(a).
The first vibrational mode, which should be the most relevant for the
biological function of the motor domain, implies a large motion of the loop
L11 (230-253) coupled to the helix α4 (254-268). Indeed this helix is known
as the relay helix9 and is thought to play a very important role in the bio-
logical function of the motor domain. Its motion is coupled to the binding of
ATP at the phosphate switches (residues 199-200 for switch I in the current
nomenclature, and residue 232 for switch II), which have been identified to
constitute the γ-phosphate sensor, i.e. the regions that sense the presence of
the third phosphate group on ATP.

[Figure 1(a) plot: normalized fluctuation (0-0.09) vs. residue index (0-350), with the loop L11 peak and the relay helix labelled.]

Figure 1. Normalized fluctuations in the first four vibrational modes of the motor domain
of kinesin. (a) Mode 1: loop L11 experiences the largest fluctuation; (b) Mode 2: vibration
of both ends; (c) Mode 3: microtubule binding regions; (d) Mode 4: switch I.

Inspection of the switch regions of myosin, kinesin, and G proteins indeed


suggested that ATP binding and phosphate release trigger the most critical
structural changes in the cycle. Comparison of the myosin and kinesin struc-
tures revealed that small movements of the γ-phosphate sensor are transmit-
ted to distant regions of the protein using a similar element: a long helix that
is connected to the switch II loop 9 . This highly conserved helix was called re-
lay helix. The relay helix is the key structural element in the communication
pathway linking the catalytic site, the binding site of the microtubule and the
mechanical elements in both kinesin and myosin. Therefore, the first mode of
GNM is in agreement with the current picture of a switch-based mechanism.
It describes the transmission of elastic energy between the most important
regions of the protein: the switches sense the presence of ATP; their vibra-
tion is coupled to that of loop L11, which lies on the microtubule, to weaker
fluctuations of other microtubule binding regions and to structural elements
located at the tip of the protein (loops L6 and L10 with adjacent secondary

motifs), as shown in figure 1(a).


It is also remarkable to observe that the fluctuations of residues in the
relay helix are not uniform: those closer to loop L11 experience a larger
amplitude of vibration, when compared to the farther ones. This is in striking
agreement with the recent observation 23 of a 20° rotation of the relay helix in
the monomeric kinesin motor KIF1A.
The second vibrational mode is still of fundamental importance in the
biological function of the motor domain of kinesin. It is reported in figure 1(b).
It is evident that the largest fluctuations are experienced by those residues in
close proximity to the N- and C-termini: in particular, loop L14 (301-303)
and the helix α6 (304-319) for the C-terminus, and loop L1 (14-47) with the
adjacent β2 (48-50) for the N-terminus.
The analysis of the second vibrational mode draws our attention to the
transmission of information through the structure of the motor domain.
The phosphate sensor, in fact, interacts with the structural elements pointed
out in the first mode. The same structural elements interact with both ter-
mini in the second vibrational mode. This might be relevant for negative
processivity. In fact, the opposite directions of motion of kinesin and Ncd
(a motor from Drosophila) have inspired various chimera experiments aimed
at mapping a directionality element. In conventional plus-end-directed ki-
nesins, in fact, the neck-linker region is attached to the C-terminus, while in
minus-end-directed kinesins, it is usually joined to the N-terminus. In these
experiments 9, the motor domain of Ncd was joined at its C-terminus to the
neck of kinesin. Surprisingly the resultant chimeras moved to the plus-end,
even though the catalytic core belonged to the minus-end motor, Ncd.
The converse was also shown to be true: in successive experiments, the mo-
tor domain of kinesin was joined to the neck of Ncd at its N-terminus. This
chimera moved to the minus-end, despite the presence of a catalytic core be-
longing to a plus-end-directed motor. The same experiments also showed that
the correct junction between the motor domain and the neck was important
for allowing the motor chimera to move towards the minus-end.
Therefore, the experiments indicate that the direction of motion is not
determined by the motor domain, but rather by the adjacent joint with the
neck region. The second vibrational mode indeed shows that the two regions
where the neck-linker may bind have the same importance, and undergo the
same vibrational mean square displacement, upon the vibration of mechanical
elements correlated to the phosphate sensors.
The third vibrational mode is represented in figure 1(c). This vibrational
mode involves the microtubule binding regions in the motor domain, in par-
ticular again the microtubule binding loop L11, part of loop L8 (147-173), the

relay helix, loop L12 (269-278), helix α5 (279-290). The tip of the protein is
again involved in a large amplitude vibration, which is now correlated with
the microtubule binding elements and also with the C-terminal of the protein.
The fourth vibrational mode is depicted in figure 1(d). This mode draws
our attention to the vibrations of the two switches of the motor domain
(switch I: residues 199 and 200, and switch II, residue 232) and the mechanical
elements in their neighborhoods, in particular, those in proximity of switch
I, helix α3 (174-189) and loop L9 (190-202), but there is also a lower peak
corresponding to switch II. This may explain how the chemistry is indeed
affected by a mechanical force acting on the protein. If we suppose that
this force is transmitted through the neck-linker to the C-terminus, then the
elastic structure of the protein transmits these vibrations to the switches, and
therefore the rate of binding and/or dissociation of nucleotides can be affected
by the mechanical force acting on the protein, as observed by Visscher et al. 24 .

3 Conclusions

In this paper we analyzed the motor domain of kinesin with the simple Gaus-
sian Network Model^b. This analysis relies on the fact that the conformational
change of the kinesin protein should not be sought in a conformational change
of the motor domain. Indeed, as proposed by Vale et al. 9 , it is likely that mo-
tions within the motor domain are small. It is also unlikely that the catalytic
core undergoes large interdomain motions which are needed to drive efficient
unidirectional motility.
As shown by experiments, the directionality is not determined by the cat-
alytic core, but rather by the adjacent neck-linker region. Therefore the search
for a conformational change of the kinesin motor domain might be fruitless,
since the conformational change may not be observable experimentally, or
detectable by computer simulations. Instead, the transmission of mechanical
strain between regions in the motor domain, is of extreme importance.
Despite its simplicity, the GNM has been used to address such an issue.
The slowest modes drew our attention to structural elements which were in-
deed shown to be important in recent experiments, but also to other elements
which have not been investigated yet. In particular the GNM analysis seems
to indicate that the tip region (which is also thought to interact with the
neck-linker) plays an important role not only to counterbalance motions of
the other parts, but mostly as a possible mechanical communication channel

^b A more comprehensive analysis, where also the kinetics of motor proteins is taken into
account 26,27, can be found in GL's PhD Thesis 28, available on request.

among the slowest vibrational modes and therefore the structural elements
that are most important for the biological function of the motor.
Our opinion on the way the motor domain of kinesin may effectively make
use of the binding energy of ATP to generate strain on the neck-linker is as
follows: the phosphate sensor senses the presence of ATP by direct contacts
with the third phosphate. These newly formed contacts activate some of the
slowest vibrational modes of the motor domain, the first and fourth in our
model, for instance.
The vibration of the switch regions and their adjacent parts is accompa-
nied by the activation of other regions which may be far apart in the structure,
yet their vibrations are strongly correlated with those in the proximity of ATP,
in particular the tip.
The tip works as a mechanical amplifier: its vibrations activate all the slowest
vibrational modes, the first one activating the relay helix, the
second one activating the neck-linker joined at one of the termini, and the
third one allowing the motor domain to rotate on the microtubule binding
site.
This scheme is consistent with previous experiments and with the current
switch based mechanism, as recently proposed by Kikkawa et al. 23 ; it is also
consistent with the picture obtained by Wriggers25 using all atoms computer
simulations, but requires only a negligible fraction of the corresponding CPU
time (the only CPU intensive calculation being the Jacobi diagonalization of
a symmetric matrix).
In addition our analysis suggests a direct correlation between switches I and
II and the C-terminal part of the domain. This dependence could effectively
explain how the chemistry could be affected by a mechanical force, as observed
by Visscher et al. 24 .
The correlation was weaker (essentially active only in mode 2) for the
N-terminus, which seems to be more stable to vibrational motions, at least in
the available structure. This may imply that chimeras with the neck-linker
attached to the N-terminal of this motor domain, could be less efficient than
their natural counterparts.
More importantly our analysis seems to suggest that a particularly well
designed experiment aimed at constraining the tip of the protein, could af-
fect the communication among mechanical elements of the motor domain, by
killing the main communication channel among the slowest vibrational modes;
therefore such an experiment, if possible, could affect mobility and/or rate of
ATP binding/ADP dissociation.
Our conclusion is that the GNM, or similar coarse grained models, could
be extremely useful in predicting the pathway along which mechanical strain

could be transported, reduced or amplified in motor proteins and, in general,


in all other cases of extremely massive macromolecules involved in complex
reactions upon binding of a nucleotide, or any chemical substance.

References

1. H. Tiedge et al, Proc. Natl. Acad. Sci. USA 98, 6997 (2001).
2. Y. Ishii et al, TRENDS Biotech. 19, 211 (2001)
3. A. Ishijima and T. Yanagida, TRENDS Bioch. Sci. 26, 438 (2001)
4. H. Lodish et al, Molecular Cell Biology (Scientific American Books, New
York, 2001).
5. J. Howard, Mechanics of Motor Proteins and the Cytoskeleton (Sinauer
Associates, Sunderland, MA, 2001).
6. R.P. Feynman et al, The Feynman Lectures on Physics (Addison-Wesley,
Reading, MA, 1966).
7. J. Howard et al, Nature 342, 154 (1989)
8. S. M. Block et al, Nature 348, 348 (1990)
9. R. D. Vale and R. A. Milligan, Science 288, 88 (2000) and references
therein.
10. I. Rayment et al, Science 261, 50 (1993)
11. F. J. Kull et al, Nature 380, 550 (1996)
12. S. Weiss, Science 283, 1689 (1999)
13. Y. Ishii et al, Chem. Phys. 247, 163 (1999)
14. C. Micheletti et al, Proteins 42, 422 (2001)
15. A. Maritan et al, Phys. Rev. Lett. 84, 3009 (2000)
16. A. Maritan et al, Nature 406, 6793 (2000)
17. A. R. Atilgan et al, Biophys. J. 80, 505 (2001) and references therein.
18. I. Bahar and R. L. Jernigan, J. Mol. Biol. 281, 871 (1998)
19. B. Lustig et al, Nucl. Ac. Res. 26, 5212 (1998)
20. I. Bahar et al, Phys. Rev. Lett. 80, 2733 (1998) and references therein.
21. F. C. Bernstein et al, J. Mol. Biol. 112, 535 (1977)
22. P. J. Flory, Proc. Roy. Soc. London A 351, 351 (1976)
23. M. Kikkawa et al, Nature 411, 439 (2001)
24. K. Visscher et al, Nature 400, 184 (1999)
25. W. Wriggers and K. Schulten, Bioph. J. 75, 646 (1998)
26. G. Lattanzi and A. Maritan, Phys. Rev. Lett. 86, 1134 (2001)
27. G. Lattanzi and A. Maritan, Phys. Rev. E 64, 061905 (2001)
28. G. Lattanzi, Statistical Physics Approach to Protein Motors, PhD thesis,
International School for Advanced Studies, SISSA, Trieste, 2001.

PHASING PROTEINS: EXPERIMENTAL LOSS OF INFORMATION AND
ITS RECOVERY

C. GIACOVAZZO 1,2, F. CAPITELLI 2, C. GIANNINI 2, C. CUOCCI 1 AND M. IANIGRO 2

1 Dipartimento Geomineralogico, Universita di Bari, Campus Universitario, via Orabona 4,
70125 Bari, Italy
2 IRMEC (Istituto di Ricerca per lo Sviluppo di MEtodologie Cristallografiche) c/o
Dipartimento Geomineralogico, Universita di Bari, Campus Universitario, via Orabona 4,
70125 Bari, Italy
E-mail: c.giacovazzo@area.ba.cnr.it

1 Introduction

X-ray (and neutron) diffraction is a classical example where some relevant


information on the object under study is lost. A three-dimensional crystal (see Fig.
1), with electron density distribution described by the function ρ(r), interacts with
an incident X-ray beam, so producing thousands of secondary diffracted beams. Just
before the screen the situation is fully described by the Fourier transform of ρ(r),

F(r*) = T[ρ(r)] = ∫_S ρ(r) exp(2πi r*·r) dr

where S denotes the crystal space.

Figure 1. Genesis of diffracted beams by interaction of a three-dimensional crystal with an incident x-ray
beam. Reconstruction of the crystal by inverse Fourier transform.

If we were able to measure F(r*) in modulus and phase [F(r*) is a complex
quantity], the trivial calculation of the inverse Fourier transform would provide us
with the complete information on ρ(r):

ρ(r) = T⁻¹[F(r*)] = T⁻¹T[ρ(r)].

Unfortunately in the experiment we lose the phase value of F(r*) and we are
only able to measure its modulus. Indeed the intensity of each diffracted beam (the
only observable), marked by a triple of integers (h k l), is related to |F_hkl|² by

I_hkl = k₁ k₂ I₀ L P T E |F_hkl|²   (1)

where I₀ is the intensity of the incident beam;

k₁ = e⁴/(m²c⁴) takes into account universal constants (charge and mass of the
electron, light velocity);
k₂ = λ³Ω/V² is a constant for a given diffraction experiment (Ω is the volume
of the crystal, V is the volume of the unit cell);
P is the polarization factor;
T is the transmission factor and depends on the capacity of the crystal to absorb
the radiation;
L is the Lorentz factor and depends on the diffraction technique;
E is the extinction coefficient, which depends on the mosaic structure of the
crystal.

F_hkl = Σ_{j=1}^{N} f_j exp(2πi h·r_j) = |F_hkl| exp(iφ_hkl)   (2)

is the structure factor with vectorial index h = (h k l), f_j is the scattering factor of the
j-th atom (thermal factor included), r_j is its position in the unit cell, N is the number
of atoms in the cell and φ_hkl is the phase of the structure factor F_hkl.
A typical experimental outcome is shown in Fig. 2, where a set of diffracted
beam intensities are collected over an area detector.

Figure 2. Reciprocal-space plane of an oxidized form of the enzyme rhodanese, space group C2,
a=156.2 Å, b=49.04 Å, c=42.25 Å, β=98.6° [from Gliubich F., Gazerro M., Zanotti G., Delbono S.,
Bombieri G. and Berni R., Active Site Structural Features for Chemically Modified Forms of Rhodanese.
J. Biol. Chem. 271 (1996) pp. 21054-21061].

The question now is: how to obtain the phases from the moduli? If the inverse
Fourier transform of |F(r*)|2 is calculated the result is the Patterson function:

P(u) = T⁻¹[|F(r*)|²] = T⁻¹[F(r*) F̄(r*)]

where F̄(r*) is the complex conjugate of F(r*). Owing to the convolution theorem
we have

P(u) = T⁻¹[F(r*)] * T⁻¹[F̄(r*)] = ρ(r) * ρ(−r).   (3)

Thus P(u) is the autoconvolution of the electron density: its maxima correspond
to the interatomic vectors, not to the atomic positions. These last quantities may be
obtained only if the Patterson is "deconvoluted", which is a quite difficult problem

for large structures. In spite of this difficulty, a general suggestion comes out from
eq. (3). If we assume that:
i) the Patterson function is univocally defined from the collected diffraction
moduli;
ii) the Patterson function univocally defines (in principle) the interatomic
vectors;
iii) the interatomic vectors univocally define the crystal structure,
then we can conclude that the set of diffraction moduli contain all the necessary
information to define the crystal structure. The above conclusion encouraged
several scientists to directly obtain phases from the moduli without passing through
the Patterson function: these methods are called "direct methods", and their basic
concepts are described in section 3.
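A toy illustration of point i) (our sketch): for a one-dimensional "crystal" sampled on a grid, the inverse FFT of |F|² alone returns the autocorrelation of the density, whose peaks sit at interatomic vectors rather than at atomic positions.

import numpy as np

# 1D toy density: three "atoms" (hypothetical weights) in a 64-point cell.
rho = np.zeros(64)
rho[[5, 20, 33]] = [6.0, 8.0, 7.0]

F = np.fft.fft(rho)                          # complex structure factors
patterson = np.fft.ifft(np.abs(F)**2).real   # uses |F|^2 only, no phases

# The strongest peaks sit at u = 0 and at the interatomic separations
# 13, 15, 28 (and their negatives modulo 64), not at 5, 20, 33.
print(np.sort(np.argsort(patterson)[-7:]))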
Since

P(u) = Σ_{h,k,l} |F_hkl|² exp(−2πi h·u),

the larger the number of measured moduli, the larger the amount of saved
experimental information. Accordingly, the aim of each diffraction experiment is to
collect a set of experimental intensities as extended as possible. The ideal situation
occurs when the number of observations is much larger than the number of
structural parameters to find (in this case we say that the structure is overdetermined
by the data). It is usual to allocate nine parameters per atom into the asymmetric
unit: i.e., the three coordinates x, y, z and the six anisotropic thermal parameters b_ij.
In case of scarcity of experimental data four parameters per atom are defined: the
three spatial coordinates and the isotropic vibrational factor B. We will see in
section 4 that for proteins a single-wavelength diffraction experiment
does not usually provide sufficient information to overdetermine the crystal
structure via one set of the experimental data.

2 From the moduli to the phases

The prior information available before a diffraction experiment usually reduces to:
the positivity of the electron density, i.e. ρ(r) > 0, and consequently f_j > 0 for
j = 1, ..., N;
the atomicity: i.e., the electrons are concentrated around nuclei.
The above information, even if apparently trivial, constitutes a strong restraint
on the allowed phase values. Indeed, let:
a) S = {h₀ = 0, h₁, ..., h_n}
be a finite set of indices, origin included;

b) H[ρ(r)] = Σ_{i,j=0}^{n} F_{h_i−h_j} u_i ū_j

be the Hermitian form of order n associated to ρ(r). Since ρ is non-negative
definite, then also H is non-negative definite: as a consequence all the Toeplitz
determinants

D_S = det[(F_{h_i−h_j})] =

    | F_0          F_{h_1−h_2}  ...  F_{h_1−h_n} |
    | F_{h_2−h_1}  F_0          ...  F_{h_2−h_n} |
    |    ...          ...       ...     ...      |
    | F_{h_n−h_1}  F_{h_n−h_2}  ...  F_0         |    (4)

are non-negative. The converse is also true: if D_S ≥ 0 for all S then ρ is non-negative.
Since the analytical expression of D_S may involve phases, (4) may be considered as
a mathematical restraint for the phase values, generated by the positivity of the
electron density distribution.
The above result has been exploited in the crystallographic literature by several
authors: we quote Harker and Kasper [19], Karle and Hauptman [26], Goedkoop
[16]. More recently the determinantal techniques have been integrated with
probabilistic methods, giving rise to effective procedures for the solution of the
phase problem (Tsoucaris [34]; de Graaf and Vermin [3]).
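The determinantal constraint can be verified numerically; in this sketch (ours, one-dimensional and with hypothetical atoms) the Karle-Hauptman matrix (F_{h_i−h_j}) built from a positive point-atom density turns out to be non-negative definite, as eq. (4) demands.

import numpy as np

rng = np.random.default_rng(0)
pos = rng.random(5)                        # 5 atoms, fractional coordinates
Z = np.array([6.0, 7.0, 8.0, 6.0, 16.0])   # hypothetical atomic numbers

def F(h):
    # 1D structure factor of point atoms at rest.
    return np.sum(Z * np.exp(2j * np.pi * h * pos))

S = [0, 1, 2, 3, 5]                        # index set, origin h0 = 0 included
KH = np.array([[F(hi - hj) for hj in S] for hi in S])

# All eigenvalues of the Hermitian Karle-Hauptman matrix are >= 0.
print(np.linalg.eigvalsh(KH).min())

Conversely, phase sets that would make some D_S negative are ruled out by the positivity of ρ.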
Let us now come to the problem of extracting the phase information from the
moduli. We observe that the direct passage

{|F|} → {φ}

is not allowed.


Figure 3. A unit cell with origin in O. A shift of origin to O' changes the positional vector r_j into r'_j.

The generic j-th atom is in P_j, and r_j is its positional vector, F_h the structure factor
with vectorial index h when the origin is assumed in O. If we move the origin to O',
the new structure factor will be

F'_h = Σ_{j=1}^{N} f_j exp[2πi h·(r_j − X₀)] = exp(−2πi h·X₀) F_h,   (5)

where X₀ is the origin shift. We observe that the change of origin generates a phase
change equal to (−2π h·X₀). Many sets {φ_h} are therefore compatible with the same
set of magnitudes, each set corresponding to a given choice of the origin. This
implies that single phases (origin dependent) cannot be determined from the
diffraction moduli alone (observable quantities, and therefore origin independent).
Luckily there are combinations of phases which are origin independent: they only
depend on the crystal structure and therefore can be estimated via the diffraction
moduli. Let us consider the product

F_{h_1} F_{h_2} ... F_{h_n}.   (6)

According to (5) an origin translation will modify (6) into

F'_{h_1} F'_{h_2} ... F'_{h_n} = F_{h_1} F_{h_2} ... F_{h_n} exp[−2πi(h_1 + h_2 + ... + h_n)·X₀].   (7)

The relation (7) suggests that the product of structure factors (6) is invariant
under origin translation if

h_1 + h_2 + ... + h_n = 0.

These products are called structure invariants. The simplest examples are:
a) for n=1, F_000 = Σ_{j=1}^{N} Z_j is the simplest structure invariant (Z_j is the
atomic number of the j-th atom);
b) for n=2 eq. (6) reduces to |F_h|²;
c) for n=3 eq. (6) reduces to F_{h_1} F_{h_2} F_{−(h_1+h_2)};
d) for n=4 the relation (6) reduces to F_h F_k F_l F_{−(h+k+l)}.
Quintet, sextet, ... invariants are defined by analogy. Invariants of order 3 or
larger are phase dependent and therefore potentially useful to solve the phase
problem. Triplets and quartets are the most important invariants.
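A quick numerical check of this invariance (our sketch, one-dimensional for simplicity): an origin shift changes every individual phase, but leaves the triplet sum φ_{h1} + φ_{h2} + φ_{−(h1+h2)} unchanged.

import numpy as np

rng = np.random.default_rng(1)
pos = rng.random(6)                   # six atoms in a 1D unit cell
f = np.ones(6)                        # unit scattering factors (assumed)

def F(h, shift=0.0):
    # Structure factor after moving the origin by `shift`.
    return np.sum(f * np.exp(2j * np.pi * h * (pos - shift)))

h1, h2 = 3, 5
for shift in (0.0, 0.1234):
    triplet = F(h1, shift) * F(h2, shift) * F(-h1 - h2, shift)
    print(np.angle(triplet))          # same phase for both origins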

3 Basic concepts of Direct Methods

The so-called direct methods may be divided into two categories:


a) reciprocal space techniques;
b) real space techniques.
Both of them try to find phases directly from the moduli. Let us first describe
the set a). The properties of the structure invariants mentioned in section 2
encouraged the calculation of the conditional distribution functions

P(Φ | {R})   (8)

where

Φ = φ_{h_1} + φ_{h_2} + ... + φ_{h_n}

is a structure invariant and {R} is a suitable set of diffraction magnitudes. The


mathematical technique to calculate (8) is the following: the set of reflections

{E} = {E_1, E_2, ..., E_n, ..., E_p},   p ≥ n

which is considered useful for the estimation of the structure invariant is defined.
This may be made via the neighbourhoods principle by Hauptman [20] or by the
representation theory by Giacovazzo [9, 11].
Then the joint probability distribution function

P(E_1, E_2, ..., E_n, ..., E_p) = P(φ_{h_1}, φ_{h_2}, ..., φ_{h_n}, ..., φ_{h_p}, R_{h_1}, R_{h_2}, ..., R_{h_p})   (9)

is derived. This distribution is of basic interest since R_{h_1}, R_{h_2}, ..., R_{h_p} are known


from experiments, and therefore they constitute the prior information of the
probabilistic approach. Finally, the distribution

P(Φ | R_{h_1}, R_{h_2}, ..., R_{h_p}) = P(Φ | {R})

is calculated.
The mathematical approach for the calculation of (9) is the classical one: the
characteristic function of (9) is first derived (the atomic positional vectors may be
used as random variables uniformly distributed in the unit cell), then its Fourier
transform provides the required distribution. The most commonly used structure invariants
for which the distribution (9) is calculated are the triplet and the quartet invariants.
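As a concrete example of such a conditional distribution (a standard textbook result, not derived in this paper): for triplet invariants of an equal-atom structure the procedure sketched above leads to the Cochran distribution

P(Φ | R_{h_1}, R_{h_2}, R_{h_1+h_2}) = [2π I₀(G)]⁻¹ exp(G cos Φ),   G = (2/√N) R_{h_1} R_{h_2} R_{h_1+h_2},

where I₀ is the modified Bessel function of order zero and N the number of atoms: the larger the three magnitudes, the more sharply Φ is concentrated around zero.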
Real space techniques [the set b)] are based on the cycle

{F}_i → ρ_i(r) → ρ_mod,i(r) → {F}_{i+1}

where {F}_i is the set of structure factors at the i-th cycle, ρ_i(r) the corresponding
electron density, ρ_mod,i(r) the modified electron density function (to match the
expected behaviour of ρ), {F}_{i+1} the set of structure factors in the (i+1)-th cycle.
These techniques directly exploit the positivity condition without transforming it
into complex reciprocal space relationships.

4 Phasing proteins

The impressive achievements in protein phasing obtained in these last years are
mainly due (see the discussion below) to simultaneous advances in (see Fig. 4):
theoretical developments, increasing computer power, new radiation sources, and
sophisticated experimental techniques.

[Figure 4 diagram: "THE PHASE PROBLEM" at the centre, surrounded by the contributing factors listed above.]

Figure 4. Phasing proteins: factors allowing new advances.

There are intrinsic difficulties in recovering the protein phases directly from the
diffraction moduli:
a) the large number of atoms in the unit cell (i.e., only in a few cases does the
asymmetric unit contain fewer than 500 atoms, often more than 5000). Under these
conditions direct methods provide rather flat phase probability distributions;
b) in accordance with equation (1), the relation

I_hkl ∝ 1/V

holds. Since the unit cell volume of a protein is large, the diffraction intensities will
be weaker than for small molecules, and therefore their measurement less accurate.
Quite efficient experimental techniques are therefore necessary for collecting data for
which the ratio I/σ(I) is sufficiently good;
c) protein molecules are irregular in shape and they pack together with gaps
between them filled by a liquid. This constitutes an unordered region which ranges
from 35 to 75 per cent of the volume of the unit cell, giving rise to diffuse
scattering;
d) protein molecules are intrinsically flexible (to secure their biological
function), and their thermal vibration is generally high. Consequently their atoms
are bad scatterers.
The above drawbacks limit the number of observations available by a
diffraction experiment. While, for small molecules, the available number of
reflections per atom in the asymmetric unit is about 100 (i.e., for data up to 0.77 Å
resolution), the same number for proteins lowers to about 12.5 if the data resolution
is 1.54 Å (the number of measurable reflections grows as the inverse cube of the
resolution limit, and (0.77/1.54)³ = 1/8). Unfortunately the resolution for proteins is
usually between 3 and 1.5 Å, so that we are often in the case in which diffraction
data do not overdetermine the crystal structure (number of observations comparable
with or inferior to the number of parameters to define). We will briefly consider two cases: the first occurs when
data resolution is better than or equal to 1 Å. In this case the ab initio crystal structure
solution of the protein may be directly attempted without any use of supplementary
data. The second case occurs when the resolution is worse than 1 Å: supplementary
information is then necessary.
Reciprocal space techniques were able to extend the complexity of the solvable
structures up to 200 atoms in the asymmetric unit. This extreme success has been
overcome in recent years (Weeks et al. [37]) when Shake-and-Bake introduced a
new approach: reciprocal and direct space techniques are cyclically and repeatedly
alternated. An effective variant of Shake-and-Bake is the program Half-baked
SHELX-D (Sheldrick [32]) which preserves the cyclic combination of direct and
reciprocal space techniques, but relies more on real space techniques. A third
program, SIR2000 (Burla et al. [2]) proved able to solve crystal structures with
more than 2000 atoms in the asymmetric unit without any user intervention. It is
mainly based on real space techniques: the role of tangent formula is ancillary.
In all the above mentioned programs the procedure is the following: random
phases are given to a subset of structure factors, and direct methods are applied to
drive them towards the correct values. The approach is a multisolution one: several
random sets are explored to obtain the structure. The computing time necessary to
succeed may be remarkable. As an example, in Table 1 we show, for some protein
structures, the cpu time needed to find the correct solution by application of
SIR2000: N_asym is the number of non-hydrogen atoms in the asymmetric unit, N_H2O
is the number of bound water molecules.

Table 1. Large size structures (up to 2000 atoms in the a.u.); the average structure solution time is
76.1 hours. Structure references in square brackets.

STRUCTURE CODE     Reference   N_asym - N_H2O   SIR2000 TIME (H)
TOXIN II           [33]        508 - 96         6.3
LACTAL             [18]        935 - 164        52.9
LYSOZIME           [4]         1001 - 108       1.0
OXIDOREDUCTASE     [7]         1106 - 283       78.4
HIPIP              [30]        1229 - 334       76.2
MYOGLOBINE         [35]        1241 - 186       19.1
CUTINASE           [28]        1141 - 264       293.2
ISD                [6]         1910 - 374       87.4

Non-ab-initio methods
The supplementary information is generally provided by:
isomorphous replacement techniques (Green et al. [17]; Bragg and Perutz [1]);
anomalous dispersion techniques (Hoppe and Jakubowski [25]; Hendrickson et
al [23]);
molecular replacement (Rossmann and Blow [31]; Navaza [29])
crystallochemical restraints.
Let us first examine the nature of the first three techniques.
Isomorphous Replacement. The method requires the preparation of one or more
heavy-atom-containing derivatives in the crystalline state. The most common
technique is soaking the protein crystal in a solution of the reagent. Then X-ray
intensity data are collected both for the native protein and for its derivatives. One
speaks of SIR (Single Isomorphous Replacement) or MIR (Multiple Isomorphous
Replacement) according to whether one or more derivatives are available.
Anomalous dispersion. Atomic electrons can be considered as oscillators with
natural frequencies. If the frequency of the primary beam is near to some of these
natural frequencies resonance will take place, and the scattering factor may be
analytically expressed via the complex quantity

f = f₀ + Δf′ + i f″   (10)

where f₀ is the atomic scattering factor in the absence of anomalous scattering. Δf′
and f″ are called the real and the imaginary dispersion corrections and assume
specific values for each wavelength. Owing to (10) the Friedel law |F_hkl| = |F_{−h−k−l}|, and
the widely accepted rule f_j = f̄_j, are no longer fulfilled.
wavelength Anomalous Scattering) and MAD (Multiple-wavelength Anomalous
Dispersion) according to whether diffraction data are collected at one or more
wavelengths.

Molecular replacement. The same protein, under different crystallization


conditions, may crystallize in different space groups. Analogously, homologous
proteins (representatives of a divergent evolution from a single primitive protein,
but very similar in tertiary structure) often crystallize in different space groups. One
can expect that the diffraction patterns of such similar proteins will be related to
each other, and that the knowledge of a protein structure can provide useful
information to find the structure of other homologous proteins. The problem is a
six-dimensional one (three rotation angles and three components of the translation
vector have to be fixed). The problem however may be approached as the sum of
two three-dimensional ones: first the rotation angles are determined, then the
translation is searched.
Let us now examine the nature of the supplementary information provided by
the three techniques, allowing to solve the protein crystal structure. As soon as the
diffraction data of the native and of one derivative are available, the differences

Δ_iso = |F_d| − |F_p|

can be obtained, where F_d represents the generic structure factor of the derivative, F_p
the corresponding structure factor of the native protein. Magnitudes and signs of the
Δ_iso are determined by the heavy-atom substructure; however Δ_iso does not coincide
with F_H (the generic structure factor of the heavy-atom substructure), owing to the
fact that

F_H = F_d − F_p.

SIR and MIR techniques aim first at finding the heavy-atom substructure: then
they use this information as a prior to phase the native protein. Alternative
techniques directly phase the protein from the Δ_iso (Hauptman [21]; Giacovazzo and
Siliqui [8]; Giacovazzo et al. [10, 12-14]). The overall problem is not trivial, mostly
when lack of isomorphism occurs between the native and the derivative (i.e., the
introduction of the heavy atoms into the native crystal structure framework generates
too many conformational changes). In general, the signal (say the Δ_iso) is of the
same size as the error, and sophisticated techniques have to be used to succeed.
Also SAS and MAD techniques use a two steps procedure: first the substructure
of the anomalous scatterers is found, and then this information is used as prior for
phasing the protein. Alternative techniques directly phasing the protein from the
experimental data have also been proposed (Hauptman [22]; Giacovazzo [15]).
If SAS is used, only the anomalous differences

Δ_ano = |F⁺| − |F⁻|

are employed to locate the anomalous scatterers; however Δ_ano does not coincide
with

F″ = Σ_{j=1}^{N} f″_j exp(2πi h·r_j).

If MAD is used, besides the anomalous differences, also the dispersion


differences (i.e., differences between diffraction moduli measured at different
wavelengths) may be employed. Unlike for SIR and MIR techniques, no lack of
isomorphism occurs when MAD or SAS techniques are used, but the signal is
smaller.
SAS and MAD are becoming the methods of choice for an ever-increasing
number of structural biologists. The reasons are manifold: among others, the
tunability of synchrotron beamlines, which allows one to select wavelengths at
which the signal-to-noise ratio is a maximum, and the capability of modern
molecular biology techniques to produce selenomethionine-containing proteins
easily and in large quantities.
Molecular replacement. The rotation and translation searches may fail
because the model molecule differs too much from the unknown structure.
Additional difficulties arise from the necessity of limiting the data resolution. In
general, very low-resolution reflections are omitted because they strongly depend on
the solvent, while the high-resolution cut-off depends on the similarity between the
model and the structure under study (high-resolution reflections are
too sensitive to differences between the model and the target protein). As
more and more protein crystal structures are solved, more model structures become
available for applying molecular replacement techniques.
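A minimal sketch of such a resolution window follows; the 15-3.5 Å cut-offs are typical illustrative values, not a prescription of this chapter:

    import numpy as np

    d_spacing = np.array([48.0, 15.2, 8.0, 3.9, 3.6, 2.1, 1.4])  # Å, one per reflection
    low_cut, high_cut = 15.0, 3.5                                # Å (illustrative)
    keep = (d_spacing <= low_cut) & (d_spacing >= high_cut)
    print(d_spacing[keep])  # [8.  3.9 3.6]: solvent-dominated and model-sensitive shells excluded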
Let us now discuss the role of the information provided by
crystallochemistry. Suppose that an imperfect structural model of the protein is
available and that the crystal structure is underdetermined (i.e., there is a low ratio
between the number of observations and the number of parameters to refine). The classical
tool used in crystallography to optimize structural parameters, namely minimizing by least
squares the quantity

S = Σj wj (|Fj|obs - |Fj|calc)²,

where the summation extends over all the measured reflections, is then of little use.
Luckily, bond lengths and valence angles in amino acids are very well known, so
they can be held fixed at their theoretical values during the refinement, and only the
torsion angles around single bonds are allowed to vary (Diamond [5]). A group
of atoms can also be treated as a rigid entity when the geometry of the group is believed
to be insensitive to the environment. This is the classical case of the phenyl ring: the
eighteen positional variables are then reduced to only six (three rotational, defining
the orientation of the ring, and three translational, locating it). In this way
the number of parameters to refine decreases and the ratio (number of
observations)/(number of parameters to refine) increases, thereby improving the
efficiency of the least-squares procedure.
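A minimal sketch of the rigid-group parametrization of the phenyl ring just described, with hypothetical coordinates and SciPy assumed for the rotation:

    import numpy as np
    from scipy.spatial.transform import Rotation

    # Ideal phenyl ring in a local frame: regular hexagon of C atoms, C-C = 1.39 Å
    theta = np.arange(6) * np.pi / 3
    ring_local = 1.39 * np.stack([np.cos(theta), np.sin(theta), np.zeros(6)], axis=1)

    def place_ring(euler_angles, centroid):
        # map the 6 refinable parameters to the 18 atomic coordinates
        R = Rotation.from_euler("zyx", euler_angles).as_matrix()
        return ring_local @ R.T + centroid

    atoms = place_ring([0.3, -0.1, 0.7], np.array([12.0, 8.5, 4.2]))
    print(atoms.shape)  # (6, 3): eighteen coordinates generated from six parameters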
A different but equally efficient alternative is to increase the number of
observations: any piece of crystallochemical information may be used as a supplementary
observation in the least-squares procedure (Konnert [27]; Hendrickson and Konnert
[24]; Waser [36]). Since distances and valence angles are not expected to deviate
significantly from their ideal values, one can minimize

S2 = Σj wj (dj(ideal) - dj(calc))²,

where dj(calc) is calculated from the structural model and dj(ideal) is the expected value.
Deviations from planarity can also be minimized (for planar groups), as well as
the volume of the chiral atoms (defined, for an α-carbon, by the triple product of the
interatomic vectors of the three atoms bound to it). The above restraints introduce
the amount of information necessary to obtain quite reliable structural models of
proteins.
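The following schematic Python fragment (a toy model, not a crystallographic refinement program; all numbers are invented) shows how such restraints can simply be stacked under the diffraction residuals as extra weighted observations handed to a standard least-squares driver:

    import numpy as np
    from scipy.optimize import least_squares

    d_ideal, w_geom = 1.53, 20.0  # ideal "bond length" and restraint weight (invented)
    f_obs = np.array([1.2, 0.8])  # stand-in for observed moduli

    def f_calc(params):
        # stand-in for the calculated structure-factor moduli
        return np.array([np.sum(np.cos(params)), np.sum(np.sin(params))])

    def residuals(params):
        r_xray = f_obs - f_calc(params)                   # diffraction observations
        d_calc = np.abs(params[0] - params[1])            # toy geometric quantity
        r_geom = np.array([w_geom * (d_ideal - d_calc)])  # restraint as extra observation
        return np.concatenate([r_xray, r_geom])

    fit = least_squares(residuals, x0=np.array([0.1, 1.5, 0.4]))
    print(np.round(fit.x, 3))

The restraint weight plays the role of the wj in S2: the more rigid the expected geometry, the larger the weight.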

References

1. Bragg W. L. and Perutz M. F., The structure of haemoglobin. VI. Fourier
projections on the 010 plane. Proc. R. Soc. London Ser. A 225 (1954) pp. 315-
329.
2. Burla M. C., Cavalli M., Carrozzini B., Cascarano G., Giacovazzo C., Polidori
G. and Spagna R., SIR2000, a program for the automatic ab initio crystal
structure solution of proteins. Acta Cryst. A 56 (2000) pp. 451-457.
3. de Graaff R. A. G. and Vermin, W. J., The use of Karle-Hauptman
determinants in small-structure determinations. II. Acta Cryst. A 38 (1982) pp.
464-470.
4. Deacon A. M., Weeks C. M., Miller R. and Ealick S. E., The Shake-and-Bake
structure determination of triclinic lysozyme. Proc. Natl. Acad. Sci. USA 95
(1998) pp. 9284-9289.
5. Diamond R., A real-space refinement procedure for proteins. Acta Cryst. A 27
(1971) pp. 436-452.
6. Esposito L., Vitagliano L., Sica F., Sorrentino G., Zagari A. and Mazzarella L.,
The Ultrahigh Resolution Crystal Structure of Ribonuclease A Containing an
Isoaspartyl Residue: Hydration and Stereochemical Analysis. J. Mol. Biol. 297
(2000) pp. 713-732.
7. Ferraroni M., Rypniewski W., Wilson K. S., Viezzoli M. S., Banci L., Bertini I.
and Mangani S., The Crystal Structure of the Monomeric Human SOD Mutant
F50E/G51E/E133Q at Atomic Resolution. The Enzyme Mechanism Revisited.
J. Mol. Biol. 288 (1999) pp. 413-426.
8. Giacovazzo C. and Siliqi D., Improving Direct-Methods Phases by Heavy-
Atom Information and Solvent Flattening. Acta Cryst. A 53 (1997) pp. 789-798.
9. Giacovazzo C., A general approach to phase relationships: the method of
representations. Acta Cryst. A 33 (1977) pp. 933-944.
10. Giacovazzo C., Cascarano G. and Zheng C.-D., On integrating the techniques
of direct methods and isomorphous replacement. A new probabilistic formula
for triplet invariants. Acta Cryst. A 44 (1988) pp. 45-51.
11. Giacovazzo C., Direct Methods in Crystallography (Academic Press, London 1980).
12. Giacovazzo C., Siliqi D. and Spagna R., The ab initio crystal structure solution
of proteins by direct methods. II. The procedure and its first applications. Acta
Cryst. A 50 (1994) pp. 609-621.
13. Giacovazzo C., Siliqi D. and Zanotti G., The ab initio crystal structure solution
of proteins by direct methods. III. The phase extension process. Acta Cryst. A
51 (1995) pp. 177-188.
14. Giacovazzo C., Siliqi D., Gonzalez Platas J., Hecht H.-J., Zanotti G. and York
B., The Ab Initio Crystal Structure Solution of Proteins by Direct Methods. VI.
Complete Phasing up to Derivative Resolution. Acta Cryst. D 52 (1996) pp.
813-825.
15. Giacovazzo C., The estimation of two-phase and three-phase invariants in P1
when anomalous scatterers are present. Acta Cryst. A 39 (1983) pp. 585-592.
16. Goedkoop J. A., Remarks on the theory of phase-limiting inequalities and
equalities. Acta Cryst. 3 (1950) pp. 374-378.
17. Green D. W., Ingram V. M. and Perutz M. F., The structure of haemoglobin. IV.
Sign determination by the isomorphous replacement method. Proc. R. Soc.
London Ser. A 225 (1954) pp. 287-307.
18. Harata K., Abe Y. and Muraki M., Crystallographic Evaluation of Internal
Motion of Human α-Lactalbumin Refined by Full-matrix Least-squares
Method. J. Mol. Biol. 287 (1999) pp. 347-358.
19. Harker D. and Kasper J. S., Phases of Fourier coefficients directly from crystal
diffraction data. Acta Cryst. 1 (1948) pp. 70-75.
20. Hauptman H., A new method in the probabilistic theory of the structure
invariants. Acta Cryst. A 31 (1975) pp. 680-687.
21. Hauptman, H., On integrating the techniques of direct methods and
isomorphous replacement. I. The theoretical basis. Acta Cryst. A 38 (1982) pp.
289-294.
22. Hauptman, H., On integrating the techniques of direct methods with anomalous
dispersion. I. The theoretical basis. Acta Cryst. A 38 (1982) pp. 632-641.
23. Hendrickson W. A., Pähler A., Smith J. L., Satow Y., Merritt E. A. and
Phizackerley R. P., Crystal structure of core streptavidin determined from
multiwavelength anomalous diffraction of synchrotron radiation. Proc. Natl.
Acad. Sci. USA 86 (1989) pp. 2190-2194.
24. Hendrickson W. A. and Konnert J. H., Incorporation of Stereochemical
Restraints into Crystallographic Refinement. In Computing in Crystallography,
ed. by R. Diamond, R. Ramaseshan and K. Venkatesan (The Indian Academy
of Sciences, Bangalore 1980) pp. 13.01-13.23.
25. Hoppe W. and Jakubowski U., The determination of phases of erythrocruorin
using the two-wavelength method with iron as anomalous scatterer. In
Anomalous Scattering, ed. by S. Ramaseshan and S. C. Abrahams
(Munksgaard, Copenhagen 1975) pp. 437-461.
26. Karle J. and Hauptman H., The phases and magnitudes of the structure factors.
Acta Cryst. 3 (1950) pp. 181-187.
27. Konnert J. H., A restrained-parameter structure-factor least-squares refinement
procedure for large asymmetric units. Acta Cryst. A 32 (1976) pp. 614-617.
28. Longhi S., Czjzek M., Lamzin V., Nicolas A. and Cambillau C., Atomic
Resolution (1.0 Å) Crystal Structure of Fusarium solani Cutinase:
Stereochemical Analysis. J. Mol. Biol. 268 (1997) pp. 779-799.
29. Navaza J., AMoRe: an automated package for molecular replacement. Acta
Cryst. A 50 (1994) pp. 157-163.
30. Parisini E., Capozzi F., Lubini P., Lamzin V., Luchinat C. and Sheldrick G. M.,
Ab initio solution and refinement of two high-potential iron protein structures
at atomic resolution. Acta Cryst. D 55 (1999) pp. 1773-1784.
31. Rossmann M. G. and Blow D. M., The detection of sub-units within the
crystallographic asymmetric unit. Acta Cryst. 15 (1962) pp. 24-31.
32. Sheldrick G. M., SHELX: applications to macromolecules. In Direct Methods
for Solving Macromolecular Structures, ed. S. Fortier (Kluwer Academic
Publishers, Dordrecht, The Netherlands, 1998) pp. 401-411.
33. Smith G. D., Pangborn W. A. and Blessing R. H., Phase changes in T3R3f
human insulin: temperature or pressure induced? Acta Cryst. D 57 (2001), pp.
1091-1100.
34. Tsoucaris G., A new method for phase determination. The maximum
determinant rule. Acta Cryst. A 26 (1970) pp. 492-499.
35. Vojtechovsky J., Berendzen J., Chu K., Schlichting I. and Sweet R.M.,
Implications for the Mechanism of Ligand Discrimination and Identification of
Substates Derived from Crystal Structures of Myoglobin-Ligand Complexes at
Atomic Resolution. To be Published (PDB code 1A6M).
36. Waser J., Least-squares refinement with subsidiary conditions. Acta Cryst. 16
(1963) pp. 1091-1094.
37. Weeks C. M., DeTitta G. T., Hauptman H. A., Thuman P. and Miller R.,
Structure solution by minimal-function phase refinement and Fourier filtering.
II. Implementation and applications. Acta Cryst. A 50 (1994) pp. 210-220.

LIST OF PARTICIPANTS

N. Accornero - Dipartimento di Neurologia, Universita di Roma I, Italy
N. Ancona - IESI-CNR, Bari, Italy
L. Angelini - Dipartimento di Fisica, Universita di Bari, Italy
E.O. Ayoola - ICTP Trieste, and Mathematics Department, University of Nigeria
A. Bazzani - Dipartimento di Fisica, Universita di Bologna, Italy
D. Bellomo - Dipartimento di Elettronica, Politecnico di Bari, Italy
R. Bellotti - Dipartimento di Fisica, Universita di Bari, Italy
A. Bertolino - Dipartimento di Psichiatria, Universita di Bari, Italy
G. Bhanot - IBM and Princeton University, USA
M. Bilancia - Dipartimento di Scienze Statistiche, Universita di Bari, Italy
P. Blonda - IESI-CNR, Bari, Italy
F. Bovenga - Dipartimento di Fisica, Universita di Bari, Italy
M. Caselle - Dipartimento di Fisica, Universita di Torino, Italy
P. Cea - Dipartimento di Fisica, Universita di Bari, Italy
G. Cibelli - Dipartimento di Fisiologia Umana, Universita di Foggia, Italy
P. Colangelo - INFN, Sezione di Bari, Italy
E. Conte - Dipartimento di Farmacologia e Fisiologia, Universita di Bari, Italy
L. Cosmai - INFN, Sezione di Bari, Italy
N. Cufaro Petroni - Dipartimento di Fisica, Universita di Bari, Italy
F. De Carlo - Dipartimento di Fisica, Universita di Bari, Italy
C. De Marzo - Dipartimento di Fisica, Universita di Bari, Italy
M. De Tommaso - Dipartimento di Scienze Neurologiche, Universita di Bari, Italy
E. Domany - Weizmann Institute, Israel
M.R. Falanga - Dipartimento di Fisica, Universita di Salerno, Italy
A. Federici - Dipartimento di Farmacologia e Fisiologia, Universita di Bari, Italy
F. Franci - Dipartimento di Matematica Applicata, Universita di Firenze, Italy
C. Giacovazzo - IRMEC-CNR, Universita di Bari, Italy
R. Giuliani - Dipartimento per le Emergenze e Trapianti d'Organo, Bari, Italy
G. Gonnella - Dipartimento di Fisica, Universita di Bari, Italy
L. Guerriero - Dipartimento di Fisica, Politecnico di Bari, Italy
P.Ch. Ivanov - CPS, Boston U. & Harvard Medical School, USA
H.J. Kappen - Nijmegen University, The Netherlands
A. Lamura - Dipartimento di Fisica, Universita di Bari, Italy
G. Lattanzi - SISSA, Trieste, Italy
S. Lecchini - INI, ETH Zurich, Switzerland
K. Lehnertz - Dept. of Epileptology, Medical Center, University of Bonn, Germany
M. Leone - SISSA Trieste, Italy
P. Livrea - Dipartimento di Scienze Neurologiche, Universita di Bari, Italy
M. Mannarelli - Dipartimento di Fisica, Universita di Bari, Italy
C. Marangi - IRMA-CNR, Bari, Italy
E. Marinari - Dipartimento di Fisica, Universita di Roma I, Italy
C. Micheletti - SISSA, Trieste, Italy
G. Nardulli - Dipartimento di Fisica, Universita di Bari, Italy
L. Nitti - Dipartimento per le Emergenze e Trapianti d'Organo, Bari, Italy
G. Paiano - Dipartimento di Fisica, Universita di Bari, Italy
G. Palasciano - Dipartimento di Medicina Interna, Universita di Bari, Italy
A.M. Papagni - Dipartimento di Farmacologia e Fisiologia, Universita di Bari, Italy
S. Pascazio - Dipartimento di Fisica, Universita di Bari, Italy
G. Pasquariello - IESI-CNR, Bari, Italy
M. Pellicoro - Dipartimento di Fisica, Universita di Bari, Italy
G. Perchiazzi - Dipartimento per le Emergenze e Trapianti d'Organo, Bari, Italy
M.V. Pitzalis - Dipartimento di Metodologie Cliniche, Universita di Bari, Italy
A. Refice - Dipartimento di Fisica, Universita di Bari, Italy
P. Rizzon - Dipartimento di Metodologie Cliniche, Universita di Bari, Italy
R. Santostasi - Dipartimento di Scienze Neurologiche, Universita di Bari, Italy
G. Satalino - IESI-CNR, Bari, Italy
E. Scrimieri - Dipartimento di Fisica, Universita di Bari, Italy
F. Simone - Dipartimento di Scienze Neurologiche, Universita di Bari, Italy
R. Stoop - INI, ETH Zurich, Switzerland
S. Stramaglia - Dipartimento di Fisica, Universita di Bari, Italy
F. Tecchio - IESS-CNR, Unita MEG, Ospedale Fatebenefratelli, Roma, Italy
G. Turchetti - Dipartimento di Fisica, Universita di Bologna, Italy
A. Xu - Dipartimento di Fisica, Universita di Bari, Italy
A. Vena - Dipartimento per le Emergenze e Trapianti d'Organo, Bari, Italy
M. Villani - Dipartimento di Fisica, Universita di Bari, Italy
A. Zenzola - Dipartimento di Scienze Neurologiche, Universita di Bari, Italy
AUTHOR INDEX

Accornero, N. 93
Andrzejak, R. G. 17
Angelini, L. 196
Attimonelli, M. 196
Baraldi, A. 157
Bazzani, A. 123
Bellotti, R. 144
Bertolino, A. 132
Bhanot, G. 67
Blasi, G. 132
Blonda, P. 157
Capitelli, F. 264
Capozza, M. 93
Caselle, M. 209
Castellani, G. 123
Chung-Chuan, Lo 28
Cibelli, G. 221
Conte, E. 51
Cuocci, C. 264
D'Addabbo, A. 157
David, P. 17
de Blasi, R. 157
de Carlo, F. 144
de Robertis, M. 196
de Tommaso, M. 144
di Cunto, F. 209
Difruscolo, O. 144
Domany, E. 175
Elger, C. E. 17
Federici, A. 51
Fiore, T. 60, 165
Giacovazzo, C. 264
Giannini, C. 264
Giuliani, R. 60, 165
Hedenstierna, G. 165
Ianigro, M. 264
Insolera, G. M. 60
Intrator, N. 123
Ivanov, P. Ch. 28
Kappen, H. J. 3
Kreuz, T. 17
Lattanzi, G. 251
Lecchini, S. 107
Lehnertz, K. 17
Luciani, F. 80
Mannarelli, M. 196
Marangi, C. 196
Mariani, L. 80
Maritan, A. 251
Massafra, R. 144
Micheletti, C. 234
Mormann, F. 17
Nitti, L. 196
Pasquariello, G. 157
Pellegrino, M. 209
Pellicoro, M. 196
Perchiazzi, G. 60, 165
Pesole, G. 196
Provero, P. 209
Remondini, D. 123
Rieke, C. 17
Ruggiero, L. 165
Saccone, C. 196
Satalino, G. 157
Sciruicchio, V. 144
Stoop, R. 107
Stramaglia, S. 144, 196
Tommaseo, M. 196
Turchetti, G. 80
Vena, A. 60, 165