Modelling Biomedical Signals
Bari, Italy 19-21 September 2001
Editors
Giuseppe Nardulli
Sebastiano Stramaglia
Center of Innovative Technologies for
Signal Detection and Processing
University of Bari, Italy
World Scientific
New Jersey • London • Singapore • Hong Kong
Published by
World Scientific Publishing Co. Pte. Ltd.
P O Box 128, Farrer Road, Singapore 912805
USA office: Suite 1B, 1060 Main Street, River Edge, NJ 07661
UK office: 57 Shelton Street, Covent Garden, London WC2H 9HE
For photocopying of material in this volume, please pay a copying fee through the Copyright
Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA. In this case permission to
photocopy is not required from the publisher.
ISBN 981-02-4843-1
Preface
Giuseppe Nardulli
Sebastiano Stramaglia
University of Bari
CONTENTS
Preface
THE CLUSTER VARIATION METHOD FOR APPROXIMATE
REASONING IN MEDICAL DIAGNOSIS
H.J. KAPPEN
Laboratory of Biophysics, University of Nijmegen
E-mail: bert@mbfys.kun.nl
In this paper, we discuss the rule based and probabilistic approaches to computer
aided medical diagnosis. We conclude that the probabilistic approach is superior to
the rule based approach, but due to its intractability, it requires approximations for
large scale applications. Subsequently, we review the Cluster Variation Method and
derive a message passing scheme that is efficient for large directed and undirected
graphical models. When the method converges, it gives close to optimal results.
1 Introduction
Medical diagnosis is the process by which a doctor searches for the cause
(disease) that best explains the symptoms of a patient. The search process is
sequential, in the sense that patient symptoms suggest some initial tests to
be performed. Based on the outcome of these tests, a tentative hypothesis is
formulated about the possible cause(s). Based on this hypothesis, subsequent
tests are ordered to confirm or reject this hypothesis. The process may pro-
ceed in several iterations until the patient is finally diagnosed with sufficient
certainty and the cause of the symptoms is established.
A significant part of the diagnostic process is standardized in the form
of protocols. These are sets of rules that prescribe which tests to perform
and in which order, based on the patient symptoms and previous test results.
These rules form a decision tree, whose nodes are intermediate stages in the
diagnostic process and whose branches point to additional testing, depending
on the current test results. The protocols are defined in each country by a
committee of medical experts.
The use of computer programs to aid in the diagnostic process has been
a long term goal of research in artificial intelligence. Arguably, it is the most
typical application of artificial intelligence.
The different systems that have been developed so far use a variety of
modeling approaches, which can be roughly divided into two categories: rule-
based approaches with or without uncertainty, and probabilistic methods. The
rule-based systems can be viewed as computer implementations of the pro-
tocols described above. They consist of a large data base of rules of the
form A → B, meaning that "if condition A is true, then perform action B"
or "if condition A is true, then condition B is also true". The rules may be
deterministic, in which case they are always true, or 'fuzzy' in which case they
are true to a (numerically specified) degree. Examples of such programs are
Meditel [1], Quick Medical Reference (QMR) [2], DXplain [3], and Iliad [4].
In Berner et al. [5] a detailed study was reported that assesses the perfor-
mance of these systems. A panel of medical experts collected 110 patient
cases, and consensus was reached on the correct diagnosis for each of these
patients. For each disease, there typically exists a highly specific test that
will unambiguously identify the disease. Therefore, based on such complete
data, diagnosis is easy. A more challenging task was defined by removing this
defining test from each of the patient cases. The patient cases were presented
to the above 4 systems. Each system generated its own ordered list of most
likely diseases. In only 10-20 % of the cases, the correct diagnosis appeared
on the top of these lists and in approximately 50 % of the cases the correct
diagnosis appeared in the top 20 list. Many diagnoses that appeared in the
top 20 list were considered irrelevant by the experts. It was concluded that
these systems are not suitable for use in clinical practice.
There are two reasons for the poor performance of the rule based systems.
One is that the rules that need to be implemented are very complex in the
sense that the precondition A above is a conjunction of many factors. If each
of these factors can be true or false, there is a combinatoric explosion of con-
ditions that need to be described. It is difficult, if not impossible, to correctly
describe all these conditions. The second reason is that evidence is often not
deterministic (true or false) but rather probabilistic (likely or unlikely). The
above systems provide no principled approach for the combination of such
uncertain sources of information.
A very different approach is to use probability theory. In this case, one
does not model the decision tree directly, but instead models the relations
between diseases and symptoms in one large probability model. As a (too)
simplified example, consider a medical domain with a number of diseases
d = (d_1, \ldots, d_n) and a number of symptoms or findings f = (f_1, \ldots, f_m).
One estimates the probability of each of the diseases p(d_i) as well as the
probability of each of the findings given a disease, p(f_j|d). If diseases are
independent, and if findings are conditionally independent given the diseases,
the joint probability model is given by:

    p(d, f) = p(d) p(f|d) = \prod_i p(d_i) \prod_j p(f_j|d)    (1)

    p(d_i|f_t) = p(d_i, f_t) / p(f_t)    (2)

where f_t is the list of findings that has been measured up to diagnostic itera-
tion t. Computing this for different d_i gives the list of most probable diseases
given the current findings f_t and provides the tentative diagnosis of the pa-
tient. Furthermore, one can compute which additional test is expected to be
most informative about any one of the diagnoses, say d_i, by computing

    I_{ij} = \sum_{f_j} p(f_j|f_t) \left[ -\sum_{d_i} p(d_i|f_j, f_t) \log p(d_i|f_j, f_t) \right]

for each test j that has not been measured so far. The test j that minimizes
I_{ij} is the most informative test since, averaged over its possible outcomes, it
gives the distribution over d_i with the lowest entropy.
Thus, one sees that whereas the rule based systems model the diagnos-
tic process directly, the probabilistic approach models the relations between
diseases and findings. The diagnostic decisions (which test to measure next)
are then computed from this model. The advantage of this latter approach
is that the model is much more transparent about the medical knowledge,
which facilitates maintenance (changing probability tables, adding diseases or
findings), as well as evaluation by external experts.
One of the main drawbacks of the probabilistic approach is that it is
intractable for large systems. The computation of marginal probabilities re-
quires summation over all other variables. For instance, in Eq. 2

    p(f_t) = \sum_{d, f \setminus f_t} p(d, f)

and the sum over d, f contains exponentially many terms. Therefore, prob-
abilistic models for medical diagnosis have been restricted to very small
domains [6,7] or, when covering a large domain, at the expense of the level of
detail at which the disease areas are modeled [8].
In order to make the probabilistic approach feasible for large applications
one therefore needs to make approximations. One can use Monte Carlo sam-
pling, but one finds that accurate results require very many iterations. An
alternative is to use analytical approximations, such as mean field
theory [9,10]. This approach works well for probability distributions that resem-
ble spin systems (so-called Boltzmann Machines) but, as we will see, it
performs poorly for directed probability distributions of the form Eq. 1.
    H(x) = \sum_{\alpha \in P} H_\alpha(x_\alpha)

    1 = \sum_{\alpha \in U, \alpha \supset \beta} a_\alpha, \quad \forall \beta \in U

    H^1_i = H_i = \theta_i x_i
Whereas ⟨H⟩ can be written exactly in terms of p_α, this is not the case
for the entropy term in Eq. 3. The approach is to decompose the entropy of
a cluster α in terms of 'connected entropies' in the following way:^c

    S_\alpha = -\sum_{x_\alpha} p_\alpha(x_\alpha) \log p_\alpha(x_\alpha) = \sum_{\beta \subseteq \alpha} S^c_\beta    (5)

where β runs over all subsets of the variables in α.^d The cluster variation method
approximates the total entropy by restricting this sum to only clusters in
U and re-expressing S_β in terms of S_α, using the Moebius formula and the
definition Eq. 5:

    S \approx \sum_{\beta \in U} S^c_\beta = \sum_{\beta \in U} \sum_{\alpha \in U, \alpha \supset \beta} a_\alpha S^c_\beta = \sum_{\alpha \in U} a_\alpha S_\alpha    (7)

Since S_α is a function of p_α (Eq. 5), we have expressed the entropy in terms
of the cluster probabilities p_α.
The quality of this approximation is illustrated in Fig. 2. Note that
both the Bethe and Kikuchi approximations strongly deteriorate around
J = 1, which is where the spin-glass phase starts. For J < 1, the Kikuchi
approximation is superior to the Bethe approximation. Note, however, that
this figure only illustrates the quality of the truncations in Eq. 7, assuming that
the exact marginals are known. It does not say anything about the accuracy
of the approximate marginals using the approximate free energy.
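The truncation Eq. 7 can be checked numerically at the Bethe level in the spirit of Fig. 2, but on a smaller system (n = 4 rather than 10, with weak illustrative couplings). Exact marginals are obtained by enumeration; on the complete graph with B the set of all pairs, the Moebius counting numbers are a = 1 for pairs and 2 − n for single nodes, so the Bethe entropy is Σ S_ij − (n−2) Σ S_i.

```python
import itertools
import math
import random

random.seed(0)
n = 4
# weak random couplings and fields (illustrative values)
J = {(i, j): random.gauss(0, 0.3 / math.sqrt(n))
     for i in range(n) for j in range(i + 1, n)}
theta = [random.gauss(0, 0.1) for _ in range(n)]

states = list(itertools.product([-1, 1], repeat=n))

def energy(x):
    return -(sum(J[i, j] * x[i] * x[j] for (i, j) in J)
             + sum(theta[i] * x[i] for i in range(n)))

w = [math.exp(-energy(x)) for x in states]
Z = sum(w)
p = [wi / Z for wi in w]

def entropy(dist):
    return -sum(q * math.log(q) for q in dist if q > 0)

S_exact = entropy(p)

def marg(idx):
    """Exact marginal over the variables in idx, by enumeration."""
    d = {}
    for x, q in zip(states, p):
        key = tuple(x[i] for i in idx)
        d[key] = d.get(key, 0.0) + q
    return list(d.values())

S_i = [entropy(marg((i,))) for i in range(n)]
S_ij = [entropy(marg(k)) for k in J]

# Bethe truncation of Eq. (7) on the complete graph:
# pair counting numbers a = 1, single-node numbers a = 2 - n
S_bethe = sum(S_ij) - (n - 2) * sum(S_i)
print(S_exact, S_bethe)
```

For this weak-coupling regime (the J < 1 side of Fig. 2) the two entropies agree closely, as the figure suggests.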
Substituting Eqs. 4 and 7 into the free energy Eq. 3, we obtain the ap-
proximate free energy of the Cluster Variation Method. This free energy must
be minimized subject to normalization constraints \sum_{x_\alpha} p_\alpha(x_\alpha) = 1 and con-
sistency constraints

    p_\alpha(x_\beta) = p_\beta(x_\beta), \quad \beta \in M, \; \alpha \in B, \; \beta \subset \alpha.    (8)

Note that we have excluded constraints between clusters in M. This is suf-
ficient because when \beta, \beta' \in M, \beta \subset \beta' and \beta' \subset \alpha \in B: p_\alpha(x_{\beta'}) = p_{\beta'}(x_{\beta'})
^c This decomposition is similar to writing a correlation in terms of means and covariances.
For instance, when \alpha = (i), S^c_i = S_i is the usual mean field entropy, and S_{ij} = S^c_i +
S^c_j + S^c_{ij} defines the two-node correction.
^d On n variables this sum contains 2^n terms.
Figure 2. Exact and approximate entropies for the fully connected Boltzmann-Gibbs dis-
tribution on n = 10 variables with random couplings (SK model) as a function of mean
coupling strength. Couplings w_{ij} are chosen from a Gaussian distribution with mean
zero and standard deviation J/\sqrt{n}. External fields \theta_i are chosen from a Gaussian
distribution with mean zero and standard deviation 0.1. The exact entropy is computed
from Eq. 6. The Bethe and Kikuchi approximations are computed using the approximate
entropy expression Eq. 7 with exact marginals and by choosing B as the set of all pairs and
all triplets, respectively.
and p_\alpha(x_\beta) = p_\beta(x_\beta) implies p_{\beta'}(x_\beta) = p_\beta(x_\beta). In the following, \alpha and \beta will
be from B and M respectively, unless otherwise stated.^e
Adding Lagrange multipliers for the constraints, we obtain the Cluster
Variation free energy:

    F_{cvm}(\{p_\alpha(x_\alpha)\}, \{\lambda_\alpha\}, \{\lambda_{\alpha\beta}(x_\beta)\}) =
        \sum_{\alpha \in U} a_\alpha \sum_{x_\alpha} p_\alpha(x_\alpha) (H_\alpha(x_\alpha) + \log p_\alpha(x_\alpha))
        - \sum_\alpha \lambda_\alpha \left( \sum_{x_\alpha} p_\alpha(x_\alpha) - 1 \right)
        - \sum_\beta \sum_{\alpha \supset \beta} \sum_{x_\beta} \lambda_{\alpha\beta}(x_\beta) (p_\alpha(x_\beta) - p_\beta(x_\beta))    (9)
Since the Moebius numbers can have arbitrary sign, Eq. 9 consists of a sum of
convex and concave terms, and is therefore a non-convex optimization prob-
lem. One can separate F_{cvm} into a convex and a concave term and derive an

^e In fact, additional constraints can be removed when clusters in M contain subclusters in
M. See Kappen and Wiegerinck [16].
    p_\alpha(x_\alpha) = \frac{1}{Z_\alpha} \exp\left( -H_\alpha(x_\alpha) + \frac{1}{a_\alpha} \sum_{\beta \subset \alpha} \lambda_{\alpha\beta}(x_\beta) \right)    (10)

    p_\beta(x_\beta) = \frac{1}{Z_\beta} \exp\left( -H_\beta(x_\beta) - \frac{1}{a_\beta} \sum_{\alpha \in B_\beta} \lambda_{\alpha\beta}(x_\beta) \right)    (11)
The remaining task is to solve for the Lagrange multipliers such that all
constraints (Eq. 8) are satisfied. There are two ways to do this. One is to
define an auxiliary cost function that is zero when all constraints are satisfied
and positive otherwise and minimize this cost function with respect to the
Lagrange multipliers. This method is discussed in Kappen and Wiegerinck [16].
Alternatively, one can substitute Eqs. 10-11 into the constraint Eqs. 8
and obtain a system of coupled non-linear equations. In Yedidia et al. [12] a
message passing algorithm was proposed to find a solution to this problem.
Here, we will present an alternative method that solves directly in terms of
the Lagrange multipliers.
Consider the constraints Eq. 8 for some fixed cluster \beta and all clusters
\alpha \supset \beta, and define B_\beta = \{\alpha \in B \mid \alpha \supset \beta\}. We wish to solve for all constraints
\alpha \supset \beta, with \alpha \in B_\beta, by adjusting \lambda_{\alpha\beta}, \alpha \in B_\beta. This is a sub-problem
with |B_\beta||x_\beta| equations and an equal number of unknowns, where |B_\beta| is
the number of elements of B_\beta and |x_\beta| is the number of values that x_\beta can
take. The probability distribution p_\beta (Eq. 11) depends only on these Lagrange
multipliers, up to normalization. p_\alpha (Eq. 10) depends also on other Lagrange
multipliers. However, we consider only its dependence on \lambda_{\alpha\beta}, \alpha \in B_\beta, and
consider all other Lagrange multipliers as fixed. Taking logarithms of the
constraints then yields a linear system for these \lambda_{\alpha\beta} (Eq. 12), whose solution
involves the normalization factor a_\beta + |B_\beta|.
We update the probabilities with the new values of the Lagrange multipliers
using Eqs. 11 and 12. We repeat the above procedure for all \beta \in M until
convergence.
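At the Bethe level (U containing pairs and single nodes), the message-passing alternative of Yedidia et al. [12] reduces to loopy belief propagation. The sketch below runs it on a toy pairwise model on a 4-cycle (potentials invented for illustration) and compares the resulting beliefs with exact marginals obtained by enumeration; it illustrates the message-passing baseline, not the Lagrange-multiplier scheme presented in this section.

```python
import itertools
import math

# Toy pairwise model on a 4-cycle; couplings are invented for illustration.
n = 4
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
J, h = 0.4, 0.2

def pair_pot(xi, xj):
    return math.exp(J * xi * xj)

def node_pot(xi):
    return math.exp(h * xi)

vals = (-1, 1)
# m[(i, j)][x_j]: message from node i to node j
m = {(i, j): {x: 1.0 for x in vals}
     for (a, b) in edges for (i, j) in ((a, b), (b, a))}

def neighbors(i):
    return [b if a == i else a for (a, b) in edges if i in (a, b)]

for _ in range(100):                        # parallel message updates
    new = {}
    for (i, j) in m:
        new[(i, j)] = {}
        for xj in vals:
            s = 0.0
            for xi in vals:
                prod = node_pot(xi) * pair_pot(xi, xj)
                for k in neighbors(i):
                    if k != j:
                        prod *= m[(k, i)][xi]
                s += prod
            new[(i, j)][xj] = s
        z = sum(new[(i, j)].values())        # normalize for stability
        for xj in vals:
            new[(i, j)][xj] /= z
    m = new

def belief(i):
    b = {x: node_pot(x) * math.prod(m[(k, i)][x] for k in neighbors(i))
         for x in vals}
    z = sum(b.values())
    return {x: v / z for x, v in b.items()}

# exact marginal by enumeration, for comparison
states = list(itertools.product(vals, repeat=n))
def weight(x):
    return (math.prod(node_pot(xi) for xi in x)
            * math.prod(pair_pot(x[a], x[b]) for (a, b) in edges))
Z = sum(weight(x) for x in states)
exact0 = sum(weight(x) for x in states if x[0] == 1) / Z
print(belief(0)[1], exact0)
```

On a single loop with weak couplings the fixed point is close to, though not equal to, the exact marginals, consistent with the accuracy discussion around Fig. 2.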
4 Numerical results
Figure 3. a) The Chest Clinic model describes the relations between diagnoses, findings and
prior conditions for a small medical domain. An arrow a → b indicates that the probability
of b depends on the value of a. b) Inference of single node marginals using the MF, TAP and
LMI methods, comparing the results with the exact marginals.
5 Conclusion
Figure 4. BayesBuilder graphical software environment, showing part of the Anemia net-
work. The network consists of 91 variables and models some specific anemias.
Table 1. Comparison of CVM method for large directed graphical models. Each
node is connected to k = 5 parents. \C\ is the tree width of the triangulated graph
required for the exact computation. Iter is the number of conjugate gradient descent
iterations of the CVM method. Potential error and marginal error are the maximum
absolute error in any of the cluster probabilities and single variable marginals com-
puted with CVM, respectively. Constraint error is the maximum absolute error in
any of the constraints Eq. 8 after termination of CVM.
relevance and we have given several reasons that could account for this failure.
The alternative approach uses a probabilistic model to describe the rela-
tions between diagnoses and findings. This approach has the great advantage
that it provides a principled approach for the combination of different sources
Acknowledgments
References
ANALYSIS OF EEG IN EPILEPSY
E-mail: Klaus.Lehnertz@ukb.uni-bonn.de
1 Introduction
2 EEG analysis
states of brain function and dysfunction, provided that limitations of the re-
spective analysis techniques are taken into consideration and, thus, results are
interpreted with care (e.g., only relative measures with respect to recording
time and recording site are assumed reliable).
In the following, we will concentrate on nonlinear EEG analysis tech-
niques and illustrate potential applications of these techniques in the field of
epileptology.
neurons within the focal area as well as with neurons surrounding this area.
Indeed, there is converging evidence from different laboratories that nonlinear
analysis is capable of characterizing this collective behavior of neurons from the
gross electrical activity and hence allows one to define a critical transition state,
at least in a high percentage of cases [63,64,65,67,34,68,69,70,55,71,28].
4 Future perspectives
Results obtained so far are promising and emphasize the high value of nonlin-
ear EEG analysis techniques both for clinical practice and basic science. Up
to now, however, findings have been mainly obtained from retrospective stud-
ies in well-elaborated cases and using invasive recording techniques. Thus,
on the one hand, evaluation of more complicated cases as well as prospective
studies on a larger population of patients are necessary.
The possibility of defining a critical transition state can be regarded as the
most prominent contribution of nonlinear EEG analysis to advance knowledge
about seizure generation in humans. This possibility has recently been ex-
panded by studies indicating accessibility of critical pre-seizure changes from
non-invasive EEG recordings [65,71]. Nonetheless, in order to achieve an un-
equivocal definition of a pre-seizure state from either invasive or non-invasive
recordings, a variety of influencing factors have to be evaluated. Most studies
carried out so far have concentrated on EEG recordings just prior to seizures.
Other studies [33,48,66,55,47,28], however, have shown that there are phases of dy-
namical changes even during the seizure-free interval, pointing to abnormalities
that are not followed by a seizure. Moreover, pathologically or physiologically
induced dynamical interactions within the brain are not yet fully understood.
Among others, these include different sleep stages, different cognitive states,
as well as daily activities that clearly vary from patient to patient. In order
to evaluate specificity of possible seizure anticipation techniques, analyses of
long-lasting multi-channel EEG recordings covering different pathological and
physiological states are therefore mandatory [67,34,28].
Along with these studies, EEG analysis techniques have to be further
improved. New techniques are needed that allow a better characterization
of non-stationarity and high-dimensionality in brain dynamics, techniques
disentangling even subtle dynamical interactions between pathological dis-
turbances and surrounding brain tissue as well as refined artifact detection
and elimination. Since the methods currently available allow a differentiated
characterization of the epileptogenic process, the combined use of these tech-
niques along with appropriate classification schemes [72,73,74] can be regarded
as a promising venture.
Acknowledgments
References
STOCHASTIC APPROACHES TO MODELING OF
PHYSIOLOGICAL RHYTHMS
PLAMEN CH. IVANOV
Center for Polymer Studies and Department of Physics,
Boston University, Boston, MA 02215, USA
Cardiovascular Division, Beth Israel Deaconess Medical Center,
Harvard Medical School, Boston, MA 02215, USA
E-mail: plamen@argento.bu.edu
CHUNG-CHUAN LO
Center for Polymer Studies and Department of Physics,
Boston University, Boston, MA 02215, USA
E-mail: cclo@argento.bu.edu
1.1 Introduction
successfully accounts for key characteristics of the cardiac variability not fully
explained by traditional models: (i) the 1/f power spectrum, (ii) the stable scaling
form for the distribution of the variations in the beat-to-beat intervals, and
(iii) Fourier phase correlations [11,12,13,14,15,16,17]. Furthermore, the reported
scaling properties arise over a broad zone of parameter values rather than at
a sharply-defined "critical" point.
of the crossover to longer time scales (lower frequencies) when stronger noise
is present. For weak noise the walker never leaves the close vicinity of the
attraction level, while for stronger noise, larger drifts can occur leading to
longer trends and longer time scales. However, in both cases, P(A) follows
the Rayleigh distribution because the wavelet transform filters out the drifts
and trends in the random walk (Fig. 1b). For intermediate values of the noise
there is a deviation from the Rayleigh distribution and the appearance of an
exponential tail.
We find that Eqs. (2 & 3) do not reproduce the statistical properties of
the empirical data (Fig. 1b). We therefore generalize them to include several
inputs I_k (k = 0, 1, \ldots, m), with different preferred levels T_k, which compete
in biasing the walker:

    \tau(n+1) - \tau(n) = \sum_{k=0}^{m} I_k(n),    (4)

where

    I_k(n) = \begin{cases} w_k (1 + \eta), & \text{if } \tau(n) < T_k \\ -w_k (1 + \eta), & \text{if } \tau(n) > T_k \end{cases}    (5)
From a biological or physiological point of view, it is clear that the pre-
ferred levels T_k of the inputs I_k cannot remain constant in time, for otherwise
the system would not be able to respond to varying external stimuli. We
assume that each preferred interval T_k is a random function of time, with
values correlated over a time scale \tau^k_{lock}. We next coarse grain the system
and choose T_k(n) to be a random step-like function, constrained to have val-
ues within a certain interval and with the length of the steps drawn from a
distribution with an average value \tau_{lock} (Fig. 1c). This model yields several
interesting features, including a 1/f power spectrum, scaling of the distribution
of variations, and correlations in the Fourier phases.
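The dynamics of Eqs. (4)-(5), with random step-like preferred levels T_k(n), can be simulated directly. All parameter values below (number of inputs, weights, level ranges) are illustrative only; the physiologically motivated parameterization of the cardiac model is given later in the text.

```python
import random

random.seed(1)
m, steps = 3, 5000                   # number of extra inputs; walk length (illustrative)
w = [0.01] * (m + 1)                 # input weights w_k (values invented)
sigma = 0.5                          # scale of the noise eta
tau_lock = 200                       # average step length of the preferred levels T_k(n)

T = [random.uniform(0.5, 1.5) for _ in range(m + 1)]
switch = [random.expovariate(1.0 / tau_lock) for _ in range(m + 1)]

tau, series = 1.0, []
for n in range(steps):
    for k in range(m + 1):           # T_k(n): random step-like preferred levels
        switch[k] -= 1
        if switch[k] <= 0:
            T[k] = random.uniform(0.5, 1.5)
            switch[k] = random.expovariate(1.0 / tau_lock)
    # symmetric exponential noise with zero mean
    eta = random.expovariate(1.0 / sigma) * random.choice((-1, 1))
    for k in range(m + 1):           # Eq. (5): each input biases tau toward its T_k
        tau += w[k] * (1 + eta) * (1.0 if tau < T[k] else -1.0)
    series.append(tau)

print(min(series), max(series))
```

The walker stays confined near the band of preferred levels, while the slow switching of the T_k(n) generates the long drifts and trends responsible for the crossovers discussed below.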
Figure 2. Stochastic feedback regulation of the cardiac rhythm. We compare the predictions
of the model with the healthy heart rate. Sequences of interbeat intervals τ from (a) a healthy
individual and (b) from the simulation exhibit an apparent visual similarity. (c) Power spectra
of the interbeat intervals τ(n) from the data and the model. To a first approximation, these
power spectra can be described by the relation S(f) ~ 1/f. The presence of patches in
both heart and model signals leads to observable crossovers embedded in this 1/f behavior
at different time scales. We calculated the local exponent β from the power spectrum of
24h records (≈ 10^5 beats) for 20 healthy subjects and found that the local value of β
shows a persistent drift, so no true scaling exists. (This is not surprising, due to the non-
stationarity of the signals.) (d) Power spectra of the increments in τ(n). The model and
the data both scale as power laws with exponents close to one. Since the non-stationarity
is reduced, crossovers are no longer present. We also calculated the local exponent for the
power spectrum of the increments for the same group of 20 healthy subjects as in the top
curve, and found that the exponent β fluctuates around an average value close to one, so
true scaling does exist.
(ii) The PS fibers conduct impulses that slow the heart rate. Suppression
of SS stimuli, while under PS regulation, can result in the increase of the
interbeat interval to as much as 1.5 s [20,21]. The activity of the PS system
changes with external stimuli. We model these features of the PS input, I_{PS},
by the following conditions: (1) a preferred interval, T_{PS}(n), randomly chosen
from a uniform distribution with an average value larger than T_{SA}, and (2) a
correlation time, \tau_{PS}, during which T_{PS} does not change, where \tau_{PS} is drawn
where the structure of each input is identical to the one in Eq. (5). Equa-
tion (6) cannot fully reflect the complexity of the human cardiac system. How-
ever, it provides a general framework that can easily be extended to include
other physiological systems (such as breathing, baroreflex control, different
locking times for the inputs of the SS and PS systems [5,22], etc.). We find
that Eq. (6) captures the essential ingredients responsible for a number of
important statistical and scaling properties of the healthy heart rate.
Next we generate a realization of the model with parameters N = 7 and
w_{SA} = w_{SS} = w_{PS}/3 = 0.01 sec (Fig. 2b). We choose the step lengths randomly from
an exponential distribution with average \tau_{lock} = 1000 beats. (We find that a
different form of this distribution does not change the results.) The noise
\eta is drawn from a symmetrical exponential distribution with zero average and
standard deviation \sigma = 0.5. We define the preferred values of the interbeat
intervals for the different inputs according to the following rules: (1) T_{SA} =
0.6 sec, (2) the T_{PS}'s are randomly selected from a uniform distribution in the
interval [0.9, 1.5] sec, and (3) the T_{SS}'s are randomly selected from a uniform
distribution in the interval [0.2, 1.0] sec. The actual values of the preferred
interbeat intervals of the different inputs and the ratio between their weights
are physiologically justified and are of no significance for the dynamics;
they just set the range for the fluctuations of τ, chosen to correspond to the
empirical data.
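This parameterization can be sketched as follows. Reading N = 7 as the number of SS inputs (and likewise PS inputs) alongside a single SA input is our assumption; the text does not spell out the input count explicitly, and the initial value of τ is arbitrary.

```python
import random

random.seed(2)
N, steps = 7, 20000
w_SA, w_SS, w_PS = 0.01, 0.01, 0.03      # w_SA = w_SS = w_PS/3 = 0.01 sec
sigma, tau_lock = 0.5, 1000              # noise scale; mean level-switching time (beats)

def draw_level(k):
    """Preferred interbeat interval of input k, in seconds."""
    if k == 0:
        return 0.6                        # T_SA, fixed
    if k <= N:
        return random.uniform(0.2, 1.0)   # T_SS
    return random.uniform(0.9, 1.5)       # T_PS

weights = [w_SA] + [w_SS] * N + [w_PS] * N
T = [draw_level(k) for k in range(2 * N + 1)]
switch = [random.expovariate(1.0 / tau_lock) for _ in T]

tau, series = 0.8, []
for n in range(steps):
    for k in range(1, len(T)):            # SS and PS levels switch; T_SA stays fixed
        switch[k] -= 1
        if switch[k] <= 0:
            T[k] = draw_level(k)
            switch[k] = random.expovariate(1.0 / tau_lock)
    # symmetric exponential noise with zero mean
    eta = random.expovariate(1.0 / sigma) * random.choice((-1, 1))
    for k, Tk in enumerate(T):            # Eq. (6): competing biased inputs
        tau += weights[k] * (1 + eta) * (1.0 if tau < Tk else -1.0)
    series.append(tau)

mean_tau = sum(series) / steps
print(mean_tau)
```

The mean interbeat interval settles where the weighted up- and down-pushes of the competing inputs balance, i.e., in the physiological range set by the preferred levels.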
Figure 3. (top) Effect of the correlation time \tau_{lock} on the scaling of the power spectrum of
\tau(n) for a signal comprising 10^6 beats. (bottom) Schematic diagram illustrating the origin of
the different scaling regimes in the power spectrum of \tau(n).
regimes in the power spectrum of τ(n) (Figs. 2c and 3). We find that with
increasing \tau_{lock}, the power spectrum does not follow a single power law but
actually crosses over from a behavior of the type 1/f^2 at very small time
scales (or high frequencies), to a behavior of the type 1/f^0 for intermediate
time scales, followed by a new regime with 1/f^2 for larger time scales (Fig. 3).
At very large time scales, another regime appears with a flat power spectrum.
In the language of random walkers, τ is determined by the competition of dif-
ferent neuroautonomic inputs. For very short time scales, the noise will domi-
nate, leading to a simple random walk behavior and 1/f^2 scaling (regime A in
Fig. 3(bottom)). For time scales longer than T_A, the deterministic attraction
towards the "average preferred level" of all inputs will dominate, leading to a
flat power spectrum (regime B in Fig. 3(bottom), see also Fig. 1b). However,
after a time T_B (of the order of \tau_{lock}/N), the preferred level of one of the
inputs will have changed, leading to the random drift of the average preferred
level and the consequent drift of the walker towards it. So, at these time
scales, the system can again be described as a simple random walker and we
expect a power spectrum of the type 1/f^2 (regime C in Fig. 3(bottom)). Fi-
nally, for time scales larger than T_C, the walker will start to feel the presence
of the bounds on the fluctuations of the preferred levels of the inputs. Thus,
the power spectrum will again become flat (regime D). Since the crossovers
are not sharp in the data or in the numerical simulations, they can easily be
misinterpreted as a single power law scaling with an exponent β ≈ 1. By re-
ducing the strength of the noise, we decrease the size of regime A and extend
regime B into higher frequencies. In the limit σ → 0, the power spectrum of
τ(n), which would coincide with the power spectrum of the "average preferred
level", would have only regimes B, C and D. The stochastic feedback mecha-
nism thus enables us to explain the formation of regions (patches) in the time
series with different characteristics.
(b) By studying the power spectrum of the increments we are able to
circumvent the effects of the non-stationarity. Our results show that true
scaling behavior is indeed observed for the power spectrum of the increments,
both for the data and for the model (Fig. 2).
(c) We calculate the probability density P(A) of the amplitudes A of the
variations of interbeat intervals through the wavelet transform. It has been
shown that the analysis of sequences of interbeat intervals with the wavelet
transform [25] can reveal important scaling properties [26] for the distributions of
the variations in complex nonstationary signals. In agreement with the results
of Ref. [27], we find that the distribution P(A) of the amplitudes A of inter-
beat interval variations for the model decays exponentially, as is observed
for healthy heart dynamics (Fig. 4). We hypothesize that this decay arises
Figure 4. Analysis of the amplitudes A of variations in τ(n). We apply to the signal gener-
ated by the model the wavelet transform with fixed scale a, then use the Hilbert transform
to calculate the amplitude A. The top left panel shows the normalized histogram P(A) for
the data (6h daytime) and for the model (with the same parameter values as in Fig. 2),
and for wavelet scale a = 8 beats, i.e., ≈ 40 s. (Derivatives of the Gaussian are used as
a wavelet function.) We test the generated signal for nonlinearity and Fourier phase cor-
relations, creating a surrogate signal by randomizing the Fourier phases of the generated
signal but preserving the power spectrum (thus leaving the results of Fig. 2 unchanged).
The histogram of the amplitudes of variations for the surrogate signal follows the Rayleigh
distribution, as expected theoretically (see inset). Thus the observed distribution, which is
universal for healthy cardiac dynamics and reproduced by the model, reflects the Fourier
phase interactions. The top right panel shows a similar plot for data collected during sleep
and for the model with a reduced number of sympathetic inputs. We note that the distribution
is broader for the amplitudes of heartbeat interval variations during sleep compared to wake
activity, indicating counterintuitively a higher probability for large variations, with large
values deviating from the exponential tail [28]. Our model reproduces this behavior when
the number of sympathetic inputs is reduced in accordance with the physiological
observations of decreased sympathetic tone during sleep [20]. The bottom panel tests the
stability of the analysis for the model at different time scales a. The distribution is stable
over a wide range of time scales, identical to the range observed for heart data [27]. The
stability of the distributions indicates statistical self-similarity in the variations at
different time scales.
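The amplitude analysis described in the caption (a fixed-scale derivative-of-Gaussian wavelet transform followed by a Hilbert transform) can be sketched as follows. Here a Gaussian white-noise stand-in replaces the model signal, which lets us verify the Rayleigh baseline quoted for phase-randomized surrogates; the scale a = 8 follows the caption, everything else is illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(2 ** 14)        # stand-in signal (linear, Gaussian)

a = 8                                   # wavelet scale, in beats (≈ 40 s in the caption)
t = np.arange(-4 * a, 4 * a + 1)
wavelet = -t * np.exp(-t ** 2 / (2 * a ** 2))   # first derivative of a Gaussian
wavelet -= wavelet.mean()

coef = np.convolve(x, wavelet, mode="same")     # wavelet transform at fixed scale

# analytic amplitude via an FFT-based Hilbert transform
X = np.fft.fft(coef)
h = np.zeros_like(X)
n = len(X)
h[0] = X[0]
h[1:n // 2] = 2 * X[1:n // 2]
h[n // 2] = X[n // 2]
analytic = np.fft.ifft(h)
A = np.abs(analytic)

# For Gaussian (phase-randomized) signals the amplitudes follow a Rayleigh
# distribution, whose std/mean ratio is sqrt(4/pi - 1) ~= 0.523.
ratio = A.std() / A.mean()
print(ratio)
```

For the model or data signal, deviations of P(A) from this Rayleigh shape (the exponential tail) are precisely what signals Fourier phase correlations.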
1.5 Conclusions
Scaling behavior for physical systems is generally obtained for fixed values of
the parameters, corresponding to a critical point or phase transition [35]. Such
fixed values seem unlikely in biological systems exhibiting power law scaling.
Moreover, such critical point behavior would imply perfect identity among
individuals; our results are more consistent with the robust nature of healthy
systems, which appear to be able to maintain their complex dynamics over
a wide range of parameter values, accounting for the adaptability of healthy
systems.
The model we review here, and the data which it fits, support a revised
view of homeostasis that takes into account the fact that healthy systems
under basal conditions, while being continuously driven away from extreme
values, do not settle down to a constant output. Rather, a more realistic
picture may involve nonlinear stochastic feedback mechanisms driving the
system.
2.1 Introduction
In this Section we investigate the dynamics of the awakening during the night
for healthy subjects and find that the wake and the sleep periods exhibit
completely different behavior: the durations of wake periods are characterized
by a scale-free power-law distribution, while the durations of sleep periods
have an exponential distribution with a characteristic time scale. We find
that the characteristic time scale of sleep periods changes throughout the
night. In contrast, there is no measurable variation in the power-law behavior
for the durations of wake periods. We develop a stochastic model, based
on a biased random walk approach, which agrees with the data and suggests
that the difference in the dynamics of sleep and wake states arises from the
constraints on the number of microstates in the sleep-wake system.
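The asymmetry this model predicts, power-law wake durations from an unbiased walk against exponential sleep durations from a walk biased toward the transition boundary, can be illustrated with first-passage times. The bias value, episode counts, and cap below are invented for illustration; this is our sketch of the mechanism, not the authors' exact model.

```python
import random

random.seed(3)

def duration(p_up, cap=100_000):
    """First-passage time to 0 of a +/-1 walk started at 1 (+1 w.p. p_up)."""
    x, t = 1, 0
    while x > 0 and t < cap:
        x += 1 if random.random() < p_up else -1
        t += 1
    return t

b = 0.1   # restoring bias in the sleep domain (illustrative value)
# wake: unbiased walk -> heavy (power-law) durations
wake = [duration(0.5) for _ in range(500)]
# sleep: drift toward the boundary -> short, exponentially distributed durations
sleep = [duration((1 - b) / 2) for _ in range(500)]

mean_wake = sum(wake) / len(wake)
mean_sleep = sum(sleep) / len(sleep)
print(mean_wake, mean_sleep, max(wake), max(sleep))
```

Even a small bias gives the sleep durations a characteristic time scale (of order 1/b), while the unbiased wake side produces occasional very long episodes, matching the scale-free versus exponential contrast reported for the data.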
In clinical sleep centers, the "total sleep time" and the "total wake time"
during the night are used to evaluate sleep efficacy and to diagnose sleep dis-
orders. However, the total wake time during a longer period of nocturnal sleep
is actually comprised of many short wake intervals (Fig. 5). This fact suggests
that the "total wake time" during sleep is not sufficient to characterize the
complex sleep-wake transitions and that it is important to ask how periods
of the wake state distribute during the course of the night. Although recent
studies have focused on sleep control at the neuronal level 36>37,38,39^ v e r y little
is known about the dynamical mechanisms responsible for the time structure
or even the statistics of the abrupt sleep-wake transitions during the night.
Furthermore, different scaling behavior between sleep and wake activity and between different sleep stages has been observed 40,41. Hence, investigating
the statistical properties of the wake and sleep states throughout the night
may provide not only a more informative measure but also insight into the
mechanisms of the sleep-wake transition.
Sleep and wake durations are characterized through the cumulative distribution of durations, defined as

P(t) = ∫_t^∞ p(τ) dτ, (7)

where p(τ) is the probability density of the durations.
a For the stretched exponential y = a exp(−bx^c), where a, b and c are constants, the log(|log y|) versus log x plot is not a straight line unless a = 1. Since we do not know the corresponding value of a in our data, we cannot rescale y so that a = 1. The solution is to shift x by a suitable value to make y = 1 when x = 0, in which case a = 1. In our data, P(t) = 1 when t = 0.5, so we shift t by −0.5 before plotting log(|log P(t)|) versus log t.
b According to Eq. (7), if P(t) is a power-law function, so is p(t). We also separately checked the functional form of p(t) for the data with the same procedure and found that the power law provides the best description of the data.
Figure 6. Cumulative probability distribution P(t) of sleep and wake durations for individual and pooled data. Double-logarithmic plot of P(t) of wake durations (a) and semi-logarithmic plot of P(t) of sleep durations (b) for pooled data and for data from one typical subject. P(t) for three typical subjects is shown in the insets. Note that, due to the limited number of sleep-wake periods for each subject, it is difficult to determine the functional form for individual subjects. We perform a K-S test comparing the probability density p(t) of each individual data set with that of the pooled data, for both wake and sleep periods. For both sleep and wake, fewer than 10% of the individual data sets fall below the 0.05 significance level, so p(t) for each individual subject is very likely drawn from the same distribution. The K-S statistics improve significantly if we use recordings only from the second night. Therefore, pooling all data improves the statistics while preserving the form of p(t).
We calculate the time constants τ for the 20 subjects, and find an average τ = 20 min with σ = 5 min. Using the Kolmogorov-Smirnov test, we find that we cannot reject the null hypothesis that p(t) of the sleep state of each subject of our 39 data sets is drawn from the same distribution (Fig. 6b). We further find that P(t) of the sleep state for the pooled data is consistent with an exponential distribution with a characteristic time τ = 22 ± 1 min (Fig. 7b).
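The Kolmogorov-Smirnov comparison between one subject's durations and the pooled data can be reproduced with `scipy.stats.ks_2samp`; the sample sizes and the τ = 20 min scale below are illustrative stand-ins for the actual recordings:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(1)
# Stand-ins: one subject's sleep durations and the pooled data, both drawn
# here from the same exponential distribution (tau = 20 min).
subject_durations = rng.exponential(scale=20.0, size=60)
pooled_durations = rng.exponential(scale=20.0, size=2000)

statistic, p_value = ks_2samp(subject_durations, pooled_durations)
# A p-value above 0.05 means the null hypothesis -- that both samples are
# drawn from the same distribution -- cannot be rejected at that level.
```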
In order to verify that P(t) of the sleep state is better described by an exponential rather than by a stretched exponential functional form, we fit both forms to the P(t) from the pooled data. Using the Levenberg-Marquardt method, we find that the stretched exponential form leads to a worse fit: the χ² errors of the exponential and stretched exponential fits are 8 × 10⁻⁵ and 2.7 × 10⁻², respectively. We also check the results by plotting log(|log P(t)|) versus log t (see the footnote above) and find that the data are clearly more curved than when we plot log P(t) versus log t, indicating that an exponential form provides the best description of the data.
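The mechanics of this comparison can be sketched with `scipy.optimize.curve_fit`, which uses Levenberg-Marquardt for unconstrained problems. The synthetic P(t) below (a clean exponential with τ = 22 min) stands in for the pooled data, so only the procedure, not the reported χ² values, is reproduced:

```python
import numpy as np
from scipy.optimize import curve_fit

def exponential(t, tau):
    return np.exp(-t / tau)

def stretched_exponential(t, b, c):
    return np.exp(-((b * t) ** c))

# Synthetic stand-in for the pooled cumulative distribution, tau = 22 min
t = np.linspace(0.5, 120.0, 240)
P = np.exp(-t / 22.0)

(tau_fit,), _ = curve_fit(exponential, t, P, p0=[10.0])
params_stretched, _ = curve_fit(stretched_exponential, t, P, p0=[0.05, 1.0])

# Sum of squared residuals for each functional form
chi2_exp = float(np.sum((P - exponential(t, tau_fit)) ** 2))
chi2_str = float(np.sum((P - stretched_exponential(t, *params_stretched)) ** 2))
```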
Sleep is not a "homogeneous process" throughout the course of the night
Figure 7. Cumulative distribution of durations P(t) of sleep and wake states from data. (a) Double-logarithmic plot of P(t) from the pooled data. For the wake state, the distribution closely follows a straight line with a slope α = 1.3 ± 0.1, indicating power-law behavior of the form of Eq. (8). (b) Semi-logarithmic plot of P(t). For the sleep state, the distribution follows a straight line with a slope 1/τ, where τ = 22 ± 1 min, indicating exponential behavior of the form of Eq. (9). It has been reported that the individual sleep stages have exponential distributions of durations 53,54,55. Hence we expect an exponential distribution of durations for the sleep state.
42,43, so we ask if there is any change of α and τ during the night. We study
sleep and wake durations for the first two hours, middle two hours, and the
last two hours of nocturnal sleep using the pooled data from all 39 records
(Fig. 8). Our results suggest that α does not change for these three portions of the night, while τ decreases from 27 ± 1 min in the first two hours to 22 ± 1 min in the middle two hours, and then to 18 ± 1 min in the last two hours. The decrease in τ implies that the number of wake periods increases as the night proceeds, and we indeed find that the average number of wake periods for the last two hours is 1.4 times larger than for the first two hours.
2.3 Model
Figure 8. P(t) of sleep and wake states in the first two hours, middle two hours and last two hours of sleep. (a) P(t) of wake states; the power-law exponent α does not change in a measurable way. (b) P(t) of sleep states; the characteristic time τ decreases in the course of the night.
We model these observations by assuming that the random walker moves in a logarithmic potential V(x) = b ln x, yielding a force f(x) = −dV(x)/dx = −b/x, where the bias b quantifies the strength of the force.
Assumptions 1-3 can be written compactly as:
δx(t) = x(t + 1) − x(t) = ε(t) if −Δ < x(t) < 0 (sleep); δx(t) = −b/x(t) + ε(t) if x(t) > 0 (wake). (10)
where ε(t) is an uncorrelated Gaussian-distributed random variable with zero mean and unit standard deviation. In our model, the bias b and the threshold Δ may change during the course of the night due to physiological variations such as the circadian cycle 44-46.
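A minimal simulation of the update rule of Eq. (10) can be sketched as follows. The step budget, the initial condition, and the small constant `eps` regularizing the force near x = 0 (cf. Fig. 10) are illustrative assumptions; b and Δ are set to values of the order estimated later in this Section.

```python
import numpy as np

def simulate_sleep_wake(n_steps=200_000, b=0.8, delta=7.9, eps=0.1, seed=0):
    """Biased random walk of Eq. (10): free diffusion in the sleep region
    -delta < x < 0 (with a reflecting boundary at -delta) and a restoring
    force -b/(x + eps) in the wake region x > 0."""
    rng = np.random.default_rng(seed)
    x = -delta / 2.0                    # start inside the sleep region
    wake, sleep = [], []                # completed wake / sleep durations
    state, count = "sleep", 0
    for _ in range(n_steps):
        if x > 0:                       # wake: step biased toward x = 0
            x += -b / (x + eps) + rng.normal()
        else:                           # sleep: unbiased step
            x += rng.normal()
            x = max(x, -delta)          # reflecting boundary
        new_state = "wake" if x > 0 else "sleep"
        if new_state == state:
            count += 1
        else:                           # record the finished period
            (wake if state == "wake" else sleep).append(count)
            state, count = new_state, 1
    return wake, sleep

wake, sleep = simulate_sleep_wake()
# Histogramming `wake` and `sleep` should reproduce the power-law and
# exponential behaviors of Eqs. (8) and (9), respectively.
```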
In our model, the distribution of durations of the wake state is identical to
the distribution of return times of a random walk in a logarithmic potential.
For large times, this distribution is of a power-law form 48,49,50,51. Hence, for large times, the cumulative distribution of return times is also a power law, Eq. (8), and the exponent is predicted to be

α = 1/2 + b. (11)
From Eq. (11) it follows that the cumulative distribution of return times for a
random walk without bias (b = 0) decreases as a power law with an exponent
α = 1/2. Note that introducing a restoring force of the form f(x) = −b/x^γ with γ ≠ 1 yields stretched exponential distributions 51, so γ = 1 is the only case yielding a power-law distribution.
Similarly, the distribution of durations of the sleep state is identical to
the distribution of return times of a random walk in a space with a reflecting
boundary. Hence P(t) has an exponential distribution, Eq. (9), in the large-time region, with the characteristic time τ predicted to be

τ ~ Δ². (12)
Equations (11) and (12) indicate that the values of α and τ in the data can be reproduced in our model by "tuning" the threshold Δ and the bias b (Fig. 10). The decrease of the characteristic duration of the sleep state as the night proceeds is consistent with the possibility that Δ decreases (Fig. 9). Our calculations suggest that Δ decreases from 7.9 ± 0.2 in the first hours of sleep, to 6.6 ± 0.2 in the middle hours, and then to 5.5 ± 0.2 in the final hours of sleep. Accordingly, the number of wake periods in the model increases by a factor of 1.3 from the first two hours to the last two hours, consistent with the data. However, the apparent constancy of the power-law exponent for the wake state suggests that the bias b may remain approximately constant during the night. Our best estimate is b = 0.8 ± 0.1.
Figure 9. Schematic representation of the dynamics of the model. The model can be viewed as a random walk in a potential well, illustrated in (a), where the flat bottom region −Δ < x < 0 corresponds to the area without field, and the region x > 0 corresponds to the area with logarithmic potential. (b) The state x(t) of the sleep-wake system evolves as a random walk with the convention that x > 0 corresponds to the wake state and −Δ < x < 0 corresponds to the sleep state, where Δ gradually changes with time to account for the decrease of the characteristic duration of the sleep state with the progression of the night. In the wake state there is a "restoring force," f(x) = −b/x, "pulling" the system towards the sleep state. The lower panel in (b) illustrates sleep-wake transitions from the model. (c) Comparison of typical data and of a typical output of the model. The visual similarity between the two records is confirmed by quantitative analysis (Fig. 10).
Figure 10. Comparison of P(t) for data and model (two runs with the same parameters). (a) P(t) of the wake state. (b) P(t) of the sleep state. Note that the choice of Δ depends on the choice of the time unit of the step in the model. We choose the time unit to be 30 seconds, which corresponds to the time resolution of the data. To avoid big jumps in x(t) due to the singularity of the force when x(t) approaches x = 0, we introduce a small constant δ in the definition of the restoring force f(x) = −b/(x + δ). We find that the value of δ does not change α or τ.
2.4 Conclusions
Our findings of a power-law distribution for wake periods and an exponen-
tial distribution for sleep periods are intriguing because the same sleep-control
mechanisms give rise to two completely different types of dynamics—one with-
out a characteristic scale and the other with. Our model suggests that the
difference in the dynamics of the sleep and wake states (e.g. power law versus
exponential) arises from the distinct number of microstates that can be ex-
plored by the sleep-wake system for these two states. During the sleep state,
the system is confined in the region —A < x < 0. The parameter A imposes
a scale which causes an exponential distribution of durations. In contrast,
for the wake state the system can explore the entire half-plane x > 0. The
lack of constraints leads to a scale-free power-law distribution of durations. In
addition, the 1/x restoring force in the wake state does not change the functional form of the distribution, but its magnitude determines the power-law exponent of the distribution (see Eq. (11)).
Although in our model the sleep-wake system can explore the entire half-plane x > 0 during wake periods, the "real" biological system is unlikely to generate very large values (i.e., extremely long wake durations). There must be
3 Summary
Acknowledgments
for Research Resources (P41 RR13622), NSF, NASA, and The G. Harold and
Leila Y. Mathers Charitable Foundation.
References
E. CONTE, A. FEDERICI
Department of Pharmacology and Human Physiology, University of Bari, P.zza G. Cesare, 70100 Bari, Italy; Center of Innovative Technologies for Signal Detection and Processing, Bari, Italy.
E-mail: fisio2@fisiol.uniba.it
Correlation dimension, Lyapunov exponents and Kolmogorov entropy were calculated by analysis of the ECG, respiratory movements and arterial pressure of normal subjects under spontaneous and forced conditions of respiration. We considered the cardiovascular system as arranged in a model of five oscillators having variable coupling strengths, and we found that such a system, as well as its components, exhibits chaotic activity. In particular, we found that respiration acts as a nonlinear input into heart dynamics, which explains why it is a source of chaotic nonlinearity in heart rate variability.
1. Introduction
2. Methods
We measured ECG signals, respiratory movements and arterial pressure in six normal non-smoking subjects under normal (NR) and forced (FR) conditions of respiration, respectively. The FR condition was obtained by asking the subjects to perform inspiratory acts with a 5 s periodicity, at a given signal. The signal for expiration was given 2 s after every inspiration. The measured ECG signals were
sampled at 500 Hz for 300 s. Signal-versus-time tracings of respiration, ECG, Doppler, and R-R intervals are given in Fig. 1 for subject #13-07. Peak-to-peak values were considered for the time series. Noise-reduction programs were used so that only noise-reduced time series data entered the analysis. In order to follow the variability in time of the collected data, the obtained time series were re-sampled into five intervals (sub-series), each containing 30,000 points. All the data were routinely analyzed by the methods of nonlinear prediction and surrogate data.
Correlation dimension, Lyapunov spectrum and Kolmogorov entropy were estimated after determination of the time delay T by auto-correlation and mutual information. The embedding dimension in phase space was established by the method of False Nearest Neighbors (FNN) (for chaotic analysis see, e.g., refs. 3, 8).
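As a sketch of the delay-selection step, the first zero crossing of the autocorrelation function can be located as follows (the mutual-information and FNN criteria used above are not shown, and the sinusoidal test signal is purely illustrative):

```python
import numpy as np

def delay_from_autocorrelation(x, max_lag=500):
    """Return the first lag at which the autocorrelation of x crosses
    zero, a standard choice for the embedding time delay T."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    var = float(np.dot(x, x))
    for lag in range(1, max_lag):
        ac = float(np.dot(x[:-lag], x[lag:])) / var
        if ac <= 0.0:
            return lag
    return max_lag

# For a sinusoid sampled at 500 Hz (the ECG sampling rate used above),
# the first zero of the autocorrelation falls near a quarter period.
fs = 500
t = np.arange(0, 10, 1 / fs)
delay = delay_from_autocorrelation(np.sin(2 * np.pi * 1.0 * t))
```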
3. The Results
The main results of the chaotic analysis are reported in Table 1. For cardiac oscillations, time delays ranged from 14 to 60 msec in both cases of subjects in NR and FR. The embedding dimension in phase space was found to be d = 4, thus establishing that we need four degrees of freedom to correctly describe heart dynamics. The correlation dimension, D2, established by saturation in a D2-d plot, proved to be very stable during the selected intervals of experimentation. It assumed values ranging from 3.609 ± 0.257 to 3.714 ± 0.246 over the five intervals for normal subjects in NR, and D2 values ranging from 3.735 ± 0.228 to 3.761 ± 0.232 in the case of subjects in FR. On the basis of such results, we concluded that normal cardiac oscillations, as well as cardiac oscillations of subjects under FR, follow deterministic dynamics of a chaotic nature. We then estimated the Lyapunov exponents: λ1 and λ2 proved to be positive; λ3 and λ4 assumed negative values; the sum of all the calculated exponents was negative, as required for dissipative systems. We concluded that cardiac oscillations of normal subjects under NR and FR represent hyper-chaotic dynamics. The positive exponents λ1 and λ2 in
Table 1 represent the rates of divergence of the attractor in the directions of maximum expansion. These are the directions in which the cardiac oscillating system realizes its chaoticity. The negative values λ3 and λ4 in Table 1 represent the rates of convergence of the attractor in the contracting directions. The emerging picture is that cardiac oscillations, as measured by ECG in normal subjects, are representative of a large ability of the heart to continuously cope with rapid changes, corresponding to the high values of its chaoticity. Looking in Table 1 at the calculated values of λ1 and λ2 along the five different time intervals that we analysed, we deduce that such values remained substantially stable across the intervals. Thus, we may conclude that, due to the constant action of the oscillators defined in our model, in
subjects, and the analysis was executed following the same methodological criteria as before. Time delays varied from 4 to 76 msec, and the embedding dimension in phase space was found to be d = 3. As previously said, such a dimension reflects the number of degrees of freedom necessary for a description of the respiratory system. From d = 3 we deduced that it is necessary to consider the action of three possible oscillators determining the behaviour of this system. The mean value of the correlation dimension, D2, was 2.740 ± 0.390 in the case of NR in the first interval of investigation. A rather stable mean value, ranging from D2 = 2.579 ± 0.340 to D2 = 2.665 ± 0.346, was also obtained in the four remaining intervals. We concluded that the respiratory system of the examined normal subjects exhibits chaotic determinism. As expected, during FR we obtained a reduction of the mean values of the correlation dimension with respect to NR. The mean value was D2 = 2.414 ± 0.417 in the first interval and varied between D2 = 2.339 ± 0.314 and D2 = 2.389 ± 0.383 in the remaining four intervals, a decrease of about 10-12%
with respect to NR. Thus, we observed a reduction of the chaotic dynamics of respiration during FR with respect to the NR physiological condition. A clear discrimination between these two conditions was also obtained by calculation of the dominant Lyapunov exponent, λD. We obtained a mean value λD = 0.028 ± 0.023 in the case of NR and λD = 0.009 ± 0.004 in the case of FR in the first interval of experimentation, a decrease of about 68% in the case of FR. Evident discrimination was also obtained in the other four intervals: in the second interval, λD = 0.029 ± 0.020 for NR and λD = 0.012 ± 0.004 for FR (a decrease of about 59%); in the third interval, λD = 0.030 ± 0.022 for NR against λD = 0.008 ± 0.003 for FR (a decrease of about 73%); in the fourth interval, λD = 0.026 ± 0.022 for NR against λD = 0.009 ± 0.004 for FR (a decrease of about 65%); and in the fifth interval, λD = 0.022 ± 0.020 for NR and λD = 0.011 ± 0.008 for FR (a decrease of about 50%).
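The correlation dimension D2 quoted above comes from the Grassberger-Procaccia correlation sum; a self-contained sketch (tested here on a trivial data set whose dimension is known, not on the physiological recordings) is:

```python
import numpy as np

def correlation_sum(series, dim, delay, r):
    """Grassberger-Procaccia correlation sum C(r) on a delay embedding.
    D2 is estimated as the slope of log C(r) versus log r in the region
    where C(r) scales as r**D2 (the saturation analysis in a D2-d plot)."""
    series = np.asarray(series, dtype=float)
    n = len(series) - (dim - 1) * delay
    # Delay-embedded vectors: rows are points in reconstructed phase space
    emb = np.column_stack([series[i * delay: i * delay + n] for i in range(dim)])
    dists = np.linalg.norm(emb[:, None, :] - emb[None, :, :], axis=-1)
    iu = np.triu_indices(n, k=1)                 # distinct pairs only
    return float(np.mean(dists[iu] < r))

# Sanity check on a set of known dimension: points on a line have D2 = 1,
# so C(r) should grow linearly with r.
x = np.linspace(0.0, 1.0, 400)
c1 = correlation_sum(x, dim=2, delay=1, r=0.1)
c2 = correlation_sum(x, dim=2, delay=1, r=0.2)   # roughly 2 * c1
```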
In conclusion, the dominant Lyapunov exponents calculated along the intervals of experimentation were very stable in both the NR and FR cases, while their values decreased in FR with respect to NR. These results indicate that the respiratory system exhibits chaotic dynamics, and that its chaoticity is strongly reduced during FR with respect to NR. This result clearly supports our thesis, based on the model of five oscillators: during forced respiration there is a reduction of the chaoticity of the respiratory system with respect to spontaneous respiration, and to this reduction corresponds an increase of the chaoticity of cardiac oscillations, as a consequence of a greater nonlinear input from the respiratory system to heart dynamics. In other terms, a stronger coupling between the two oscillators is realized
Fig. 1
Acknowledgements
The authors wish to thank Ms Anna Maria Papagni for her technical assistance.
References
1. Akselrod S, Gordon D, Ubel FA, Shannon DC, Barger AC, Cohen RJ. Science 1981; 213: 220-225.
2. Babloyantz A, Destexhe A. Is the normal heart a periodic oscillator? Biological Cybernetics 1988; 58: 203-211.
3. Badii R, Politi A. Dimensions and Entropies in Chaotic Systems. Springer, Berlin, 1986.
G. PERCHIAZZI
Department of Clinical Physiology,
Uppsala University Hospital, S-75185 Uppsala, Sweden
Evaluation of breath sounds is a basic step of the patient physical examination. Auscultation of the respiratory system gives direct information about the structure and function of lung tissue that cannot be obtained with any other simple and non-invasive method. Recently, the application of computer technology and new mathematical techniques has supplied alternative methodologies for respiratory sound analysis. We present a new computerized approach to analyzing respiratory sounds.
1 Introduction
Acoustic respiratory signals have been the subject of considerable research over recent years; however, their origin is still not completely known. It is now generally accepted that, during respiration, turbulent motion of a compressible fluid in the larger airways with rough walls (trachea and bronchi) generates acoustic energy [5]. This energy is transmitted through the airways and lung parenchyma to the chest wall, which represents a non-stationary system [1,4]. Pulmonary diseases induce anatomical and functional alterations in the respiratory system; changes in the quality of lung sounds (loudness, length and frequency) are often directly correlated to pathological changes in the lung.
The traditional method of auscultation is based on the stethoscope and the human auditory system; however, due to the poor response of the human auditory system to lung sounds (low frequency and low signal-to-noise ratio) and the subjective character of the technique, it is common to find different clinical descriptions of the same respiratory sounds.
Lung-sound nomenclature has long been unclear: until recent decades, the names used derived from the originals given by Laennec [10] and translated into English by Forbes [2]. In 1985, the International Lung Sounds Association (I.L.S.A.) composed an international standard classification of lung sounds that includes fine and coarse crackles, wheezes and rhonchi; each of these terms can be described acoustically [13].
2 Respiratory Sounds
Lung sounds in general are classified into three major categories: "normal"
(vesicular, bronchial and bronchovesicular breath sounds), "abnormal" and
"adventitious" lung sounds.
Vesicular breath sounds consist of a quiet and soft inspiratory phase followed
by a short, almost silent expiratory phase. They are low pitched and normally heard
over most lung fields of a healthy subject. These sounds are not generated by gas
flow moving through the alveoli (vesicles) but are the result of attenuation of breath
sound produced in the larger bronchi. Bronchial breath sounds are normally heard
over the trachea and reflect turbulent airflow in the main-stem bronchi. They are
loud, high-pitched, and the expiratory phase is generally longer than the inspiratory phase, with a typical pause between the phases. Bronchial sounds heard over the
thorax suggest lung consolidation and pulmonary disease. Bronchovesicular breath
sounds are normally heard on both sides of the sternum in the first and second
intercostal spaces. They should be quieter than the bronchial breath sounds and
increased intensity of these sounds is often associated with increased ventilation.
Abnormal lung sounds include the decrease/absence of normal lung sounds or
their presence in areas where they are normally not heard (bronchial breath sounds
in peripheral areas where only vesicular sounds should be heard). This is
characteristic of parenchyma consolidation (pneumonia), which transmits sound from the lung bronchi much more efficiently than the air-filled alveoli of the normal lung.
The term "adventitious" (adventitious lung sounds) refers to extra or additional
sounds that are heard over normal lung sounds and their presence always indicates a
pulmonary disease. These sounds are classified into discontinuous (crackles) or
continuous (wheezes) adventitious sounds. Crackles are discontinuous, intermittent
and nonmusical noises that may be classified as "fine" (high pitched, low amplitude,
and short in duration) and "coarse" (low pitched, higher in amplitude, and long in
duration). Crackles are generated by fluid in the small airways or by sudden
opening of closed airways. Their presence is often associated with inflammation or
infection of the small bronchi, bronchioles, and alveoli, with pulmonary fibrosis,
with heart failure and many other cardiorespiratory disorders. Wheezes are continuous (their duration is much longer than that of crackles), lower-pitched and musical breath sounds, which are superimposed on the normal lung sounds. They originate from air moving through small airways narrowed by constriction or
swelling of the airway, or by partial airway obstruction. They are often heard (during expiration, or during both inspiration and expiration) in patients with asthma or other obstructive diseases. Other respiratory sounds are: rhonchi (continuous sounds that indicate partial obstruction by thick mucus in the bronchial lumen, oedema,
spasm or a local lesion of the bronchial wall); stridor (high-pitched harsh sound
heard during inspiration and caused by obstruction of the upper airway); snoring
(acoustical signals produced by a constriction in the upper airway, usually during
sleep) and pleural rubs (low-pitched sounds that occur when inflamed pleural
surfaces rub together during respiration).
3 Review of literature
The sensor was placed over the bronchial regions of the anterior chest (second intercostal space on the mid-clavicular line), the vesicular regions of the posterior chest (apex and base of the lung fields, bilaterally) and the trachea at the lower part of the neck, 1-2 cm to the right of the midline.
Sounds were amplified, low-pass filtered and recorded in digital format (Sony Minidisc MZ-37, Japan) using a sampling rate of 44.1 kHz and 16-bit quantization. The signal was transferred to a computer (Intel Pentium 500 MHz, Intel Corp., Santa Clara, CA, USA) and then analyzed by specific Fourier-transform-based spectral analysis software (CoolEdit Pro 1.0, Syntrillium Software Corp., Phoenix, USA).
Because of the clinical need to correlate the acoustic phenomenon with the phases of human respiration, a method of analysis dedicated to the time/frequency plane was applied: the STFT (Short Time Fourier Transform). It provided "spectrograms" related to different respiratory acoustic patterns that were analyzed offline according to the intensity and frequency changes in the time domain. The spectrogram shows, in a three-dimensional coordinate system, the acoustic energy of a signal versus time and frequency.
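The same STFT analysis can be sketched with `scipy.signal.spectrogram`. The two-second synthetic tone below stands in for a recorded lung sound, and the window length is an arbitrary illustrative choice, not a setting of the original software:

```python
import numpy as np
from scipy.signal import spectrogram

fs = 44_100                     # Hz, the sampling rate used for the recordings
t = np.arange(0.0, 2.0, 1.0 / fs)
# Stand-in signal: a 400 Hz component present only during the first second
sound = np.sin(2 * np.pi * 400.0 * t) * (t < 1.0)

# STFT-based spectrogram: acoustic energy versus time and frequency.
# Sxx[i, j] is the power at frequency freqs[i] and time times[j]; plotting
# it on the time/frequency plane gives the spectrogram described above.
freqs, times, Sxx = spectrogram(sound, fs=fs, nperseg=1024)
```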
We studied normal breath sounds (vesicular and tracheal) from healthy subjects without pulmonary diseases, and adventitious lung sounds (crackles and wheezes) from spontaneously breathing patients with pneumonia and COPD (chronic obstructive pulmonary disease). The signals were examined for artifacts (generally arising from defective contact between the sensor and chest wall, or from background noise) and contaminated segments were excluded from further analysis.
5 Results
Normal breath sounds (vesicular and tracheal) showed typical spectra with a frequency content extending up to 700 Hz (vesicular sounds) and 1600 Hz (tracheal sounds). Generally, at frequencies below 75-100 Hz there are artefacts from heart and muscle sounds. Inspiratory amplitude was higher than expiratory amplitude for vesicular sounds and lower than expiratory amplitude for tracheal sounds (Fig. 1, Fig. 2).
6 Conclusions
7 References
16. Piirila P., Sovijarvi A., Kaisla T., Rajala H.M., Katila T. Crackles in Patients
with Fibrosing Alveolitis, Bronchiectasis, COPD, and Heart Failure. Chest
99(5) (1991) pp. 1076-1083.
17. Pasterkamp H., Sanchez I. Tracheal Sounds in Upper Airway Obstruction.
Chest 102 (1992) pp. 963-965.
THE IMMUNE SYSTEM: B CELL BINDING TO MULTIVALENT ANTIGEN
Gyan Bhanot
This is a description of work done in collaboration with Yoram Louzoun and Martin Weigert at Princeton University. Experiments in the late 1980s by Dintzis et al. revealed puzzling aspects of the activation of B-Cells as a function of the valence (number of binding sites) and concentration of presented antigen. Through computer modeling, we are able to explain these puzzles if we make an additional (novel) hypothesis about the rate of endocytosis of B-Cell receptors. The first puzzling result we can explain is why there is no activation for low valence (less than 10-20). The second is why the activation is limited to a narrow range of
antigen concentration. We performed a computer experiment to model the B-Cell
surface with embedded receptors diffusing in the surface lipid layer. We presented
these surface receptors with antigen with varying concentration and valence. Using
experimentally reasonable values for the binding and unbinding probabilities for
the binding sites on the antigens, we simulated the dynamics of the binding pro-
cess. Using the single hypothesis that the rate of endocytosis of bound receptors
is significantly higher than that of unbound receptors, and that this rate varies
inversely as the square of the mass of the bound, connected receptor complex, we
are able to reproduce all the qualitative features of the Dintzis experiment and
resolve both the puzzles mentioned above. We were also able to generate some
testable predictions on how chimeric B-Cells might be non-immunogenic.
1 Introduction
The human immune system 2, on encountering a pathogen, has two distinct but
related responses. There is an immediate response, called the Innate Response
and there is also a slower, dynamic response, called the Adaptive Response.
The Innate Response, created over aeons by the slow evolutionary process, is
the first line of defense against bacterial infections, chemicals and parasites. It
comes into effect immediately and acts mostly by phagocytosis (engulfment).
The Adaptive Response evolves even within an individual and is slower in its action (with a latency of 4-7 days), but is much more versatile. This Adaptive Response is created by a complex process involving cells called lymphocytes.
A single microliter of fluid in the body contains about 2500 lymphocytes.
All cellular components of the Immune System arise in the bone marrow
from hematopoietic stem-cells, which differentiate to produce the other more
specialized cells of the immune system. Lymphocytes derive from a lymphoid
progenitor cell and differentiate into two cell types called the B-Cell and the T-
Cell. These are distinguished by their site of differentiation, the B-Cells in the
bone marrow and the T-Cell in the thymus. B and T Cells both have receptors
on their surface that can bind to antigen (pieces of chemical, peptides, etc.) An
important difference between B and T Cell receptors is that B-Cell receptors
are bivalent (have two binding areas) while T-Cell receptors are monovalent
(with a single binding area). In the bone marrow, B-Cells are presented with self antigen, e.g. pieces of the body's own molecules. Those B-Cells that react to such self antigen are killed. Those that do not are released into the blood and lymphatic systems. T-Cells, on the other hand, are presented with self antigen in the thymus and are likewise killed if they react to it.
Cells of the body present on their surface pieces of protein from inside the
cell in special structures called the MHC (Major Histocompatibility Complex)
molecules. MHC molecules are distinct between individuals and each individ-
ual carries several different alleles of MHC molecules. T-Cells are selected in
the thymus to bind to some MHC of self but not to any self peptides that are
presented on these MHC molecules. Thus, only T-Cells that might bind to
foreign peptides presented on self MHC molecules are released from the thy-
mus. There are two types of T-Cells, distinguished by their surface proteins.
They are called CD8 T-Cells (also called killer T-Cells) and CD4 T-Cells (also
called helper T-Cells).
When a virus infects a cell, it uses the cell's DNA/RNA machinery to repli-
cate itself. However, while this is going on, the cell will present on its surface
pieces of viral protein on MHC molecules. CD8 T-Cells in the surrounding
medium are programmed to bind strongly to such MHC molecules presenting
non-self peptides. After they bind to the MHC molecule, they send a signal to
the cell to commit suicide (apoptose) and then unbind from the infected cell.
Also, once activated in this way, the CD8 T-Cell will replicate aggressively and
seek out other infected cells to send them the suicide signal. The CD4 T-Cells
on the other hand, recognize viral peptides on B-cells and macrophages (spe-
cialized cells which phagocytose or engulf pathogens, digest them and present
their peptide pieces on MHC molecules). The role of the CD4 T-Cell, when
it binds in this way, is to signal the B-Cell and macrophages to activate and
proliferate.
B-Cells that are non-reactive to self antigens in the bone marrow are released into the blood and secondary lymphoid tissue. They have a lifetime of about three days unless they successfully enter lymphoid follicles, germinal
centers or the spleen and get activated by binding to antigen presented to them
there. Those that have the correct antibody receptors to bind strongly to viral
peptide (antigen), will become activated and will start to divide, thereby pro-
ducing multiple copies of themselves with their specific high affinity receptors.
This process is called 'clonal selection' as the clone which is fittest (binds most
strongly to presented antigen) is selected to multiply. The B-Cells that bind
to antigen will also endocytose their own receptors with bound antigen and
present it on their surface on MHC-II molecules for an activation signal from
CD4 T-Cells. Once a clone is selected, the B-Cells also mutate and proliferate
to produce variations of receptors to achieve an even better binding specificity
to the presented antigen. B-Cells whose mutation results in improved bind-
ing will receive a stronger activation signal from the CD4 T-Cells and will
out-compete the rest. This process is called 'affinity maturation'. Once the
optimum binding specificity B-cells are produced, they are released from the
germinal centers. Some of these differentiate into plasma cells which release
large numbers of antibodies (receptors) with high binding affinity for the anti-
gen. These antibodies mark the virus for elimination by macrophages. Some
B-Cells go into a latent phase (become memory B-Cells) from which they may
be activated if the infection recurs.
It is clear from the above discussion that there are two competing pressures
in play when antigen binds to B-Cells. One pressure is to maximize the number
of surface bound receptors, until a critical threshold is reached at which the
B-Cell is activated and will proliferate. The other pressure is to endocytose the
receptor-antigen complex, followed by presentation of the antigen peptide on
MHC-II molecules, binding to CD4 T-Cells and an activation signal from that
binding. To function optimally, the immune system must carefully balance
these two processes of binding and endocytosis.
A given antigen will bind either to the λ or the κ chain, or to neither, but
not to both. Thus a κλ B-Cell receptor is effectively monovalent. Furthermore,
a B-Cell with mixed κκ and λλ receptors would effectively have fewer receptors
available for a given antigen.
It is possible to experimentally enhance the probability of such genetic
errors and study the immunogenicity of the resulting B-Cells. This has been
done in mice. The surprising result from such experiments is that chimeric
B-Cells are non-immunogenic [9]. We shall attempt to explain how this may
come about as a result of our assumption about endocytosis.
Dintzis et al. [3,4,5] did an in-vivo (mouse) experiment using five different flu-
oresceinated polymers as antigen (Ag). The results of the experiment were
startling. It was found that to be immunogenic, the Ag mass had to be in a
range of 10^5-10^6 Daltons (1 Dalton = 1 Atomic Mass Unit) and have a valence
(number of effective binding sites) greater than 10-20. Antigen with mass or
valence outside this range elicited no immune response for any concentration.
Within this range of mass and valence, the response was limited to a finite
range of antigen concentration.
A model based on the concept of an Immunon was proposed to explain
the results [6,7]. The hypothesis was that the B-Cell response is quantized, i.e.
to trigger an immune response, it is necessary that a minimum number of
receptors be connected in a cluster cross linked by binding to antigen. This
linked cluster of receptors was called an Immunon and the model came to be
called the 'Immunon Model'. However, a problem immediately presents itself:
Why are low valence antigens non immunogenic? Why can one not form large
clusters of receptors using small valence antigen? The Immunon model had no
answer for this question.
Subsequently, Perelson et al. [8] developed mathematical models (rate equa-
tions) to study the antigen-receptor binding process. Assuming that B-Cell
response is quantized, they were able to show that at low concentration, be-
cause of antigen depletion (too many receptors, too little antigen), an Immunon
would not form. However, the rate equations made the flaws in the Immunon
model apparent. They were not able to explain why large valence antigens were
necessary for an immune response, nor why even such antigens were tolerogenic
(non-immunogenic) at high concentration.
move. At every time step, receptors which have free binding sites can bind to
other haptens on the antigen or to any other antigen already present on the
surface. They can also unbind from hapten to which they are bound. Once an
antigen unbinds from all receptors, it is released within 5 time steps on average.
Once every 20 time steps, the receptors were presented with new antigen at
a constant rate which was a measure of the total antigen concentration. We
varied this concentration rate in our modeling.
The normal rate of endocytosis of unbound receptors is once every half
hour. If this were also the rate of endocytosis for bound receptors, it would be
too small to play a role in antigen presentation. Thus we must assume that
a bound receptor has a higher probability of being endocytosed than an
unbound receptor. A receptor can bind two haptens and every antigen
can bind multiple receptors. This cross linking leads to the creation of large
complexes. We assume that the probability to endocytose a receptor-antigen
complex is inversely proportional to the square of its mass. The mass of the B
cell receptor is much higher than the mass of the antigens, so, when computing
the mass of the complex, we can ignore the mass of the antigen. We thus set the
endocytosis rate only as a function of the number of bound receptors. The rate
of endocytosis for the entire complex was chosen to be inversely proportional to
the square of the number of receptors in the complex. More specifically, we set
the probability to endocytose an aggregate of receptors to be 0.0005 divided
by the square of the number of receptors in the aggregate. For chimeric B-cells
we reduced the numerator in this probability by a factor of 100.
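The stochastic endocytosis rule above can be written as a short Monte Carlo update. The sketch below is ours, not the paper's actual code; the data structure (a list of aggregate sizes) and function name are illustrative.

```python
import random

def endocytosis_step(complexes, chimeric=False, base=0.0005):
    """One time step of the endocytosis rule: an aggregate of n cross-linked
    receptors is internalized with probability base / n**2, where the base
    probability is reduced 100-fold for chimeric B-cells."""
    factor = base / 100.0 if chimeric else base
    survivors, endocytosed = [], 0
    for n in complexes:
        if random.random() < factor / n**2:
            endocytosed += n          # the whole complex is internalized
        else:
            survivors.append(n)       # the complex stays on the surface
    return survivors, endocytosed
```

Note how the n² penalty captures the competition described above: large cross-linked clusters stay on the surface (activation), while small complexes are preferentially endocytosed (antigen presentation).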
5 Results
The results of our computer study are shown in figure (1), where the solid
line shows the number of bound receptors after 10 seconds of simulation as a
function of antigen valence. The data are average values over several simula-
tions with different initial positions for receptors and random number seeds.
The dashed line shows the number of endocytosed receptors. One observes a
clear threshold below which the number of bound surface receptors stays close
to zero followed by a region where the number of bound receptors increases
and flattens out. This establishes that we can explain the threshold in antigen
valence in the Dintzis experiment.
The reason for the threshold is easy to understand qualitatively. Once
an antigen binds to a receptor, the probability that its other haptens bind to
the other arm of the same receptor or to one of the other receptors present in
the vicinity is an exponentially increasing function of the number of haptens.
Also, once an antigen is multiply bound in a complex, the probability of all
Figure 1: Dependence of the number of bound receptors and number of endocytosed receptors
on antigen valence for medium levels of concentration after 10 seconds.
Figure 2: Dependence of the number of bound receptors and number of endocytosed receptors
on antigen concentration for high valence (valence = 20) antigens after 10 seconds of simulation.
Figure 3: Dependence of the number of bound receptors and number of endocytosed receptors
on antigen concentration for low valence antigens after 10 seconds of simulation.
Figure 4: The number of bound and endocytosed receptors as a function of time for a cell
with 50% κκ and 50% λλ receptors. These cells would be non-immunogenic because of low
levels of activation from the low binding.
Figure 5: The number of bound and endocytosed receptors as a function of time for a cell
with only κλ receptors. These cells would be non-immunogenic because of low levels of
endocytosis and consequent lack of T-Cell help.
References
1. Y. Louzoun, M. Weigert and G. Bhanot, "A New Paradigm for B
Cell Activation and Tolerance", Princeton University, Molecular Biology
Preprint, June 2001.
2. C. A. Janeway, P. Travers, M. Walport and J. D. Capra, "Immunobiology
- The Immune System in Health and Disease", Elsevier Science London
and Garland Publishing New York, 1999.
3. R. Z. Dintzis, M. Okajima, M. H. Middleton, G. Greene, H. M. Dintzis,
"The Immunogenicity of Soluble Haptenated Polymers is determined by
Molecular Mass and Hapten Valence", J. Immunol. 143:4, Aug. 15, 1989.
4. J. W. Reim, D. E. Symer, D. C. Watson, R. Z. Dintzis, H. M. Dintzis,
"Low Molecular Weight Antigen Arrays Delete High Affinity Memory B
cells Without Affecting Specific T-cell Help", Mol. Immunol., 33:17-18,
Dec. 1996.
5. R. Z. Dintzis, M. H. Middleton and H. M. Dintzis, "Studies on the
Immunogenicity and Tolerogenicity of T-independent Antigens", J. Im-
munol., 131, 1983.
6. B. Vogelstein, R. Z. Dintzis, H. M. Dintzis, "Specific Cellular Stimulation
in the Primary Immune Response: a Quantized Model", PNAS 79:2, Jan.
1982.
7. H. M. Dintzis, R. Z. Dintzis and B. Vogelstein, "Molecular Determinants
of Immunogenicity, the Immunon Model of Immune Response", PNAS
73, 1976.
8. B. Sulzer, A. S. Perelson, "Equilibrium Binding of Multivalent Ligands
to Cells: Effects of Cell and Receptor Density", Math. Biosci. 135:2,
July, 1996; ibid. "Immunons Revisited: Binding of Multivalent Antigens
to B Cells", Mol. Immunol. 34:1, Jan. 1997.
9. Y. Li, H. Li and M. Weigert, "Autoreactive B Cells in the Marginal Zone
that Express Dual Receptors", Princeton University Molecular Biology
Preprint, June 2001.
S T O C H A S T I C MODELS OF I M M U N E S Y S T E M A G I N G
L. MARIANI, G. TURCHETTI
Department of Physics, Via Irnerio 46, 40126 Bologna, Italy
Centro Interdipartimentale L. Galvani, Universita di Bologna, Bologna, Italy
E-mail: turchetti@bo.infn.it frida@economia.unibo.it
F. LUCIANI
Max Planck Institute for the Physics of Complex Systems, Nöthnitzer Str. 38, Dresden, Germany
E-mail: luciani@mpipks-dresden.mpg.de
1 Biological Complexity
The Immune System (IS) preserves the integrity of the organism, continuously
challenged by internal and external agents (antigens). The large variety of anti-
gens ranging from mutated cells and parasites to viruses, bacteria and fungi
requires a rapid and efficient antagonistic response of the organism. At the
top of the phylogenetic tree, evolution has developed a specific (clonotypic)
immunity which cooperates with the ancestral innate immunity to control the
antigenic insults. The innate system has an arsenal of dendritic cells and
macrophages with a limited number of receptors capable of recognizing and neu-
tralizing classes of antigens. With the appearance of vertebrates the increase
of complexity stimulated the development of a system based on two new types
of cells, B and T lymphocytes, with three distinct tasks: to recognize the anti-
gens, to destroy them and to keep track of their structure through a learning
process. This kind of immunological memory is the key for a more efficient
response to any subsequent antigenic insult caused by an antigen that the or-
ganism has already experienced (this is the basis of vaccination). The specific
response is based on a variety of memory cells which are activated, by specific
2 T lymphocytes
Figure 1: Markers of virgin (ANE) and memory plus effector (AE) T lymphocytes. Schematic
organization of the main T cell pools: CD4+ (helper) and CD8+ (cytotoxic) lymphocytes.
3 Modeling immunosenescence
In this note we propose a mathematical model to describe the time variation
of the ANE and AE T cells compartments due to the antigenic load and to
the remodeling of the system itself. The antigenic load has sparse peaks of
high intensity (acute insults) and a permanent low intensity profile with rapid
random variations (chronic antigenic stress). In a previous work [6] a simple
model for the time evolution of the AE and ANE T cells concentrations was proposed
on the basis of Franceschi's theory of immunosenescence, which sees the entire
IS undergoing a very deep reshaping during the life span. The exchanges be-
tween the compartments were considered due to antigen stimulated conversion
and to reconversion due to secondary stimulation. The average antigenic load
contributed to define these conversion rates jointly with a genetic average. The
deterministic part of the model described the decrease of the ANE CD8+ T cells
dV/dt = -αV - βM + μ e^(-λt) + ε cos²(θ) ξ(t)

dM/dt = (αV + βM)/(1 + γM+) + ε sin²(θ) ξ(t)        (1)
where V denotes the number of ANE (virgin) CD8+ cells and M the number
of AE (effector + memory) CD8+ cells, with M+ = M if M > 0 and M+ = 0
if M < 0. The parameter α gives the conversion rate of virgin cells due to
primary antigenic insult, whereas β is the reactivation rate of memory cells due
to secondary antigenic stimulus, which has an inhibitory effect on the virgin
cells. In the primary production of AE cells we have taken the conversion
and reconversion terms proportional to (1 + γM+)^(-1), in order to take into
account the shrinkage of the T cells compartment. The term μe^(-λt) describes
the production by the thymus, which is assumed to decay exponentially. Finally,
εξ(t), where

⟨ξ(t)⟩ = 0,   ⟨ξ(t)ξ(t')⟩ = δ(t - t')        (2)

is the contribution of stochastic fluctuations to the conversion rates. The
mixing angle θ gives the weight of this term on the ANE and AE compartments.
The results are compared with experimental data in figure 2.

Figure 2: Comparison of the model with experimental data for the virgin (ANE) and memory
plus effector (AE) CD8+ T cells for the following parameters: α = 0.025, β = 0.01, ε = 15,
θ = 35°, γ = 0.004, λ = 0.05, μ = 15 and V(0) = 50. The curves are ⟨V(t)⟩ + kσ_V(t) and
⟨M(t)⟩ + kσ_M(t) with k = -2, 0, 2.
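As a numerical illustration, model (1) can be integrated with a simple Euler-Maruyama step. The sketch below is ours, not the authors' code; it assumes the cos²θ / sin²θ weighting of a single noise source and borrows the parameter values quoted for figure 2.

```python
import math
import random

def simulate(T=120.0, dt=0.05, alpha=0.025, beta=0.01, eps=15.0,
             theta=math.radians(35.0), gamma=0.004, lam=0.05, mu=15.0,
             V0=50.0, M0=0.0, seed=1):
    """Euler-Maruyama integration of model (1) for the virgin (V) and
    antigen-experienced (M) CD8+ compartments, t in years."""
    rng = random.Random(seed)
    V, M, t = V0, M0, 0.0
    while t < T:
        # Discretized white noise with <xi(t) xi(t')> = delta(t - t')
        xi = rng.gauss(0.0, 1.0) / math.sqrt(dt)
        Mp = max(M, 0.0)                        # M_+ in the text
        dV = (-alpha*V - beta*M + mu*math.exp(-lam*t)
              + eps*math.cos(theta)**2 * xi)
        dM = ((alpha*V + beta*M) / (1.0 + gamma*Mp)
              + eps*math.sin(theta)**2 * xi)
        V, M, t = V + dV*dt, M + dM*dt, t + dt
    return V, M
```

Averaging runs with different seeds reproduces the mean curves, while the run-to-run spread plays the role of σ_V and σ_M in the figures.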
Figure 3: Comparison with CD95 data [2] of virgin (ANE) and memory plus effector (AE)
populations for the model without shrinkage and thymus for two different mixing angles.
The parameters are α = 0.02, β = 0.005, ε = 10, V(0) = 400, and θ ≈ 0 (left figures)
and θ = 45° (right figures). The curves are ⟨V(t)⟩ + kσ_V(t) and ⟨M(t)⟩ + kσ_M(t) with
k = -2, 0, 2.
⟨V(t)⟩ = V0 (α e^((β-α)t) - β)/(α - β)        ⟨M(t)⟩ = V0 α (e^((β-α)t) - 1)/(β - α)        (4)
and the T cell population is conserved, M(t) + V(t) = V(0). The deterministic
solution with no shrinkage (γ = 0) can be obtained analytically [11]. Since
0 ≤ β < α, choosing for simplicity β = 0 one has
⟨V(t)⟩ = V(0) e^(-αt) + μ (e^(-αt) - e^(-λt))/(λ - α)        ⟨M(t)⟩ = V(0) - ⟨V(t)⟩ + (μ/λ)(1 - e^(-λt))        (5)
The solution with shrinkage and no thymic term (μ = 0), choosing for simplicity
β = 0, reads

⟨V(t)⟩ = V(0) e^(-αt)        ⟨M(t)⟩ = γ^(-1) ([1 + 2γV(0)(1 - e^(-αt))]^(1/2) - 1)        (6)
The graph of ⟨V(t)⟩ in (5) exhibits a peak at t = (log λ - log α)/(λ - α) if
V(0) = 0. The peak disappears when the thymus contribution vanishes. Con-
versely, ⟨M(t)⟩ is monotonically increasing, but the thymus enhances its value. The
shrinkage considerably reduces the increase of ⟨M(t)⟩, whereas it does not af-
fect ⟨V(t)⟩. The stochastic term generates a family of immunological histories,
whose spread is measured by the variance. In figure 3 we show the effect of the
mixing angle for the same set of parameters chosen for the simplified model.
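The mean in (6) can be checked directly against the rate equation it solves, dM/dt = αV/(1 + γM). A minimal finite-difference check (ours), using the shrinkage parameters quoted in the figures (α = 0.025, γ = 0.004, V(0) = 400):

```python
import math

ALPHA, GAMMA, V0 = 0.025, 0.004, 400.0

def V_mean(t):
    # <V(t)> = V(0) exp(-alpha t), the beta = mu = 0 solution
    return V0 * math.exp(-ALPHA*t)

def M_mean(t):
    # <M(t)> from (6): gamma^{-1} ([1 + 2 gamma V(0)(1 - e^{-alpha t})]^{1/2} - 1)
    return (math.sqrt(1.0 + 2.0*GAMMA*V0*(1.0 - math.exp(-ALPHA*t))) - 1.0) / GAMMA

# The central-difference derivative of M_mean should equal alpha*V/(1 + gamma*M)
h, t = 1e-6, 10.0
lhs = (M_mean(t + h) - M_mean(t - h)) / (2.0*h)
rhs = ALPHA * V_mean(t) / (1.0 + GAMMA * M_mean(t))
assert abs(lhs - rhs) < 1e-5
```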
Figure 4: Comparison with CD95 data [2] of virgin (ANE) and memory plus effector (AE)
populations for the model with α = 0.025, β = 0.01, ε = 15, θ = 35°. On the left side we
consider the contribution of thymus with λ = 0.05, μ = 15, γ = 0 and V(0) = 50. On the
right side the contribution of shrinkage is shown for γ = 0.004, μ = 0 and V(0) = 400. The
curves are ⟨V(t)⟩ + kσ_V(t) and ⟨M(t)⟩ + kσ_M(t) with k = -2, 0, 2.
When θ grows, the rms spread σ_V of V decreases, whereas the rms spread σ_M
of M increases. The separate effects of thymus and shrinkage are shown in figure
4, for the same parameters as figure 2.
5 Survival curves
The simplified model with noise given by equation (3) corresponds to the
Ornstein-Uhlenbeck process. The probability density satisfies the Fokker-
Planck equation and has an explicit solution

p(v, t) = (2πσ²(t))^(-1/2) exp[-(v - ⟨v⟩(t))²/(2σ²(t))]        ⟨v⟩(t) = v∞ + (1 - v∞) e^(-t/T)        (7)

where T = (α - β)^(-1), v∞ = -βT and σ²(t) = (ε²T/2)(1 - e^(-2t/T)). In figure
5 we compare the results of the model with demographic data. Assuming that
the depletion v = 0 of the virgin T cells compartment marks the end of life, it
is possible to compute from (7) the survival probability up to age t

S(t) = (2π)^(-1/2) ∫_{x(t)}^{+∞} e^(-u²/2) du        x(t) = -⟨v⟩(t)/σ(t)        (8)
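Equation (8) is a Gaussian tail integral, so S(t) can be evaluated with the complementary error function. A sketch (ours, not the authors' code) using the normalized parameters quoted for figure 5 (v∞ = -0.5, T = 67, ε = 0.016), valid for t > 0:

```python
import math

def survival(t, vinf=-0.5, T=67.0, eps=0.016):
    """S(t) = (2*pi)^(-1/2) * integral from x(t) to +inf of exp(-u^2/2) du,
    with x(t) = -<v>(t)/sigma(t); valid for t > 0 (sigma vanishes at t = 0)."""
    v_mean = vinf + (1.0 - vinf) * math.exp(-t/T)
    sigma = eps * math.sqrt(T/2.0) * math.sqrt(1.0 - math.exp(-2.0*t/T))
    x = -v_mean / sigma
    return 0.5 * math.erfc(x / math.sqrt(2.0))
```

With these numbers S stays near 1 through early life, crosses 1/2 at t = T log 3 ≈ 74 years (where ⟨v⟩(t) = 0), and tends to a small but non-zero limit, consistent with the occurrence of centenarians discussed below.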
Neglecting the thymus and shrinkage effects, the concentrations obtained from
equation (1) are close to the values of the simplified model if θ = 0. Indeed, when
γ = μ = 0 we have M(t) + V(t) = V(0) + M(0) + εw(t), where w(t) denotes the
Wiener process. Setting v = V/(V + M) and V = V₀ + εV₁ and M = M₀ + εM₁,
where V₁, M₁ are the fluctuating parts with zero mean, we have

⟨v⟩ = v₀ + ε²(v₀⟨w²⟩ - ⟨V₁w⟩)        ⟨(v - ⟨v⟩)²⟩ = ε²⟨(V₁ - v₀w)²⟩        (9)
Figure 5: Virgin cells population ⟨V(t)⟩ ± 2σ_V(t) with v∞ = -0.5, T = 67 and ε =
0.016 (left). Corresponding survival curve (center) and mortality rate (right). These values
correspond to α = 0.022, β = 0.0075, ε = 6.5, γ = μ = 0 and V(0) = 400.
x(t) = C (e^(-t*/T) - e^(-t/T)) / (1 - e^(-2t/T))^(1/2)        (10)

where t* is the death age, ⟨v(t*)⟩ = v*, namely e^(-t*/T) = (v* - v∞)/(1 - v∞). In
figure 5 we fit the demographic data of human males using (8) and the values
of the parameters are close to the ones obtained from the fit of CD8 + T cells
concentrations.
Our mortality rate is not monotonically increasing, as in Gompertz's law, but
decreases after reaching a maximum, see figure 5. It is better suited to describe
the survival of human populations, in agreement with demographic data. This
property, due to the randomly varying antigenic load on the organism, explains
the occurrence of very long lived persons (centenarians). We notice that x(t) ∝
-t^(-1/2) as t → 0 and x(∞) = C, so that

lim_{t→0} S(t) = 1        S(T) = 1/2        (13)

(dS/dt)|_{t=T} = -(C/T) e^(-C²/2) / [2π(1 - e^(-2))]^(1/2)        (14)
The meaning of the parameters is obvious: T is the age at which the survival
probability is exactly 50% and the slope of the curve there is proportional to
C. We can say that C measures the flatness of the graph of S(t). For the
mortality rate R = -S′(t)/S(t) we have the following asymptotic behavior.
7 Conclusions
We consider the long time behavior of the CD8+ virgin T cells and CD8+ anti-
gen experienced T cells compartments and the remodeling of the IS.
The stochastic variations of the antigenic load determine a spread in the time
evolution of the cell numbers, in agreement with experiments. The results are
compatible with a previous simplified model for the virgin T cells concentra-
tions and provide survival curves compatible with demographic data. The
effect of thymus and remodeling improves the description of the early stage for
the virgin T cells and of the late stage of the antigen experienced T cells.
8 Acknowledgments
We would like to thank Prof. Franceschi for useful discussions on the immune
system and ageing.
9 References
1. A. Lanzavecchia, F. Sallusto, Dynamics of T Lymphocyte Responses: In-
termediates, Effectors and Memory, Science 290, 92 (2000)
2. F. Fagnoni, R. Vescovini, G. Passeri, G. Bologna, M. Pedrazzoni, G. La-
vagetto, A. Casti, C. Franceschi, M. Passeri & P. Sansoni, Shortage of
circulating naive CD8 T cells provides new insights on immunodeficiency
in aging. Blood 95, 2860 (2000)
3. F. Luciani, S. Valensin, R. Vescovini, P. Sansoni, F. Fagnoni, C.
Franceschi, M. Bonafè, G. Turchetti, A Stochastic Model for CD8+ T cell
Dynamics in Human Immunosenescence: implications for survival and
longevity, J. Theor. Biol. 213 (2001)
4. A. Lanzavecchia, F. Sallusto, Antigen decoding by T lymphocytes: from
synapse to fate determination, Nature Immunology 2, 487 (2001)
5. G. Pawelec et al., T Cells and Aging, Frontiers in Bioscience 3, 59 (1998)
6. F. Luciani, G. Turchetti, C. Franceschi, S. Valensin, A Mathematical
Model for the Immunosenescence, Biology Forum 94, 305 (2001).
7. C. Franceschi, S. Valensin, M. Bonafe, G. Paolisso, A. I. Yashin, D.
Monti, G. De Benedictis, The network and the remodeling theories of
aging: historical background and new perspectives., Exp. Gerontol. 35,
879 (2000)
8. C. Franceschi, M. Bonafè, S. Valensin, Human immunosenescence: the
prevailing of innate immunity, the failing of clonotypic immunity, and the
filling of immunological space, Vaccine 18, 1717 (2000)
9. B. Gompertz, On the nature of the function expressive of the law of hu-
man mortality, and on a new mode of determining the value of life
contingencies, Philos. Trans. R. Soc. London 115, 513 (1825)
10. F. Luciani, Modelli fisico-matematici per la memoria immunologica
e l'immunosenescenza, Master thesis, Univ. Bologna (2000)
11. L. Mariani, Modelli stocastici dell'immunologia: risposta adattativa,
memoria e longevità, Master thesis, Univ. Bologna (2001)
12. L. A. Gavrilov, N. S. Gavrilova, The Biology of Life Span: A Quantitative
Approach (Harwood Academic Publishers, London, 1991)
13. L. Piantanelli, G. Rossolini, A. Basso, A. Piantanelli, M. Malavolta,
A. Zaia, Use of mathematical models of survivorship in the study of
biomarkers of aging: the role of heterogeneity, Mechanisms of Ageing and
Development 122, 1461 (2001)
NEURAL NETWORKS AND
NEUROSCIENCES
N. ACCORNERO, M. CAPOZZA
Dipartimento di Scienze Neurologiche, Universita di Roma LA SAPIENZA
We present a review of the architectures and training algorithms of Artificial Neural Networks and their
role in Neurosciences.
The way an organism possessing a nervous system behaves depends on how the
network of neurons making up that system functions collectively. Singly, these
neurons spatially and temporally summate the electrochemical signals produced by
other cells. Together they generate highly complex and efficient behaviors for the
organism as a whole. These operational abilities are defined as "emergent" because
they result from interactions between computationally simple elements. In other
words, the whole is more complex than the sum of its parts.
Our understanding of these characteristics in biological systems comes largely from
studies conducted with artificial neural networks early in the 1980s [1]. Yet the
biological basis of synaptic modulation and plasticity were perceived by intuition 40
years earlier by Hebb, and the scheme for a simple artificial neuronal network, the
perceptron, was originally proposed by Rosenblatt [2] and discussed by Minsky [3]
in the 1960s.
An artificial neural network, an operative model simulated electronically (hardware)
or mathematically (software) on a digital processor, consists of simple processing
elements (artificial neurons, nodes, units) that perform algorithms (stepwise linear
and sigmoid functions) on the sum or product of a series of numeric values coming
from the various input channels (connections, synapses). The processing elements
distribute their results along the output connections, multiplied by the individual
connection "weights", to the other interconnected processors. The final
complex computational result therefore depends on how the processing units
function, on the connection weights, and on how the units are interconnected (the
network architecture).
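A single processing element of this kind, a weighted sum over the input channels passed through a sigmoid transfer function, fits in a few lines; the function name and the bias term are our illustrative choices.

```python
import math

def unit(inputs, weights, bias=0.0):
    """One artificial neuron: a weighted sum of the input channel values,
    squashed by a sigmoid transfer function into the range (0, 1)."""
    s = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-s))
```

With zero net input the unit sits at 0.5; large positive or negative sums saturate the output toward 1 or 0.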
The functions of the processing elements (units) and the network architecture are
often pre-determined: the automated algorithms alter only the connection weights
during training. Other training methods entail altering the architecture, or less
frequently, the function of each processing unit.
The architecture of a neural network may keep to a pre-determined scheme (for
example with the processing elements, artificial neurons, grouped into layers, with a
single input layer, several internal layers, and a single output layer). Otherwise it
starts from completely random connections that are adjusted during the training
process.
Network training may simply involve increasing the differences between the various
network responses to the various input stimuli (unsupervised learning) so that the
network automatically identifies "categories" of input [4, 5, 6]. Another training
method guides the network towards a specific task (making a diagnosis or
classifying a set of patterns). Networks designed for pattern classification are trained
by trial and error. Training by trial and error can be done in two ways. In the first, an
external supervisor measures the output error then changes the connection weights
between the units in a way that minimizes the error of the network (supervised
learning) [7, 8]. The second training method involves selective mechanisms similar
to those underlying the natural selection of biological species — a process that makes
random changes in a population of similar individuals, then eliminates those
individuals having the highest error and reproduces and interbreeds those with the
lowest error. Reiterating the training examples leads to genetic learning of the
species [9].
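The first trial-and-error scheme, an external supervisor measuring the output error and adjusting the weights to reduce it, can be illustrated with a single sigmoid unit trained by the delta rule. The task (logical OR), learning rate, and epoch count are our arbitrary choices, not taken from the text.

```python
import math
import random

def train_delta(samples, lr=0.5, epochs=3000, seed=0):
    """Supervised learning: gradient descent on the squared output error
    of a single sigmoid unit (the 'delta rule')."""
    rng = random.Random(seed)
    w = [rng.uniform(-0.5, 0.5) for _ in samples[0][0]]   # one weight per input
    b = 0.0
    for _ in range(epochs):
        for x, target in samples:
            s = sum(wi * xi for wi, xi in zip(w, x)) + b
            y = 1.0 / (1.0 + math.exp(-s))
            delta = (target - y) * y * (1.0 - y)   # error times sigmoid slope
            w = [wi + lr * delta * xi for wi, xi in zip(w, x)]
            b += lr * delta
    return w, b

# Logical OR: a linearly separable classification task
data = [([0, 0], 0.0), ([0, 1], 1.0), ([1, 0], 1.0), ([1, 1], 1.0)]
w, b = train_delta(data)
```

After training, the unit classifies all four patterns correctly; the same loop with a population of weight vectors, random mutation, and selection of the lowest-error individuals would give the genetic variant described above.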
[Figure: training schemes - supervised learning (delta rule, error back-propagation), unsupervised learning (competitive learning), and genetic algorithms (mutation, crossover).]
The choice of training method depends on the aim proposed. If the network is
intended to detect recurrent signal patterns in a "noisy" environment, then excellent
results can be obtained with a system trained through unsupervised learning. If one
aims to train a diagnostic net on known knowledge, or to train a manipulator robot
on precise trajectories, then one should choose a multilayered network trained
through supervised learning. If the net is designed for use as a model, that is to
simulate biologic nervous system functions, then the ideal solution is probably
genetic learning. Adding the genetic method to either of the other two methods will
improve the overall results.
In summary, biological and artificial neural networks are pattern transformers. An
input stimulation-pattern produces an output pattern-response, especially suited to a
given aim. To give an example from biology: a pattern of sensory stimuli, such as
heat localized on the extremity of a limb, results in a sequence of limb movements
that serve to remove the limb from the source of heat. A typical example of an
artificial network is a system that transforms a pattern of pathologic symptoms into a
medical diagnosis.
Input and output variables can be encoded as a vectorial series in which the value of
a single vector represents the strength of a given variable. The power of vectorial
coding becomes clear if we imagine how some biological sensory systems code the
reality of nature. The four basic receptors located on the tongue (bitter-sweet-salt-
acid) allow an amazing array of taste sensations. If each receptor had only ten
discrimination levels - and they certainly have more - we could distinguish as many
as 10,000 different flavors. On this basis, each flavor corresponds to a point in a
four-dimensional space identified by the 4 coordinates of the basic tastes. Similar
vectorial coding could make up the input of an artificial neural network designed to
identify certain categories of fruit. One or more internal (hidden) layers would
transform this coding first into numerous arbitrary hidden (internal) codes, and
ultimately into output codes that classify or recognize the information presented.
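The arithmetic is simply 10^4 combinations of receptor levels, and a fruit then occupies a point of the 4-dimensional taste space. The sketch below uses invented prototype coordinates and a nearest-prototype lookup as a crude stand-in for the trained network's input-to-output mapping; the numbers are illustrative, not measured values.

```python
# Hypothetical prototypes on a 0-9 scale for (bitter, sweet, acid, salty);
# the coordinates are invented for illustration.
PROTOTYPES = {
    "apple":  (1, 6, 4, 0),
    "banana": (0, 8, 1, 0),
    "cherry": (1, 7, 3, 0),
    "grape":  (0, 7, 5, 0),
}

def distinguishable_flavors(receptors=4, levels=10):
    """Ten discrimination levels on each of four receptors span 10**4 points."""
    return levels ** receptors

def classify(taste):
    """Nearest prototype in the 4-dimensional taste space."""
    d2 = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    return min(PROTOTYPES, key=lambda k: d2(PROTOTYPES[k], taste))
```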
[Figure: vectorial input coding (bitter, sweet, acid, salty) mapped to positional output coding (apple, banana, cherry, grape).]
rapidly, and the way in which these data follow one another can provide information
that is essential for recognizing the phenomenon sought or for predicting how the
system will behave in the future. Enabling a network to detect structure in a time
series, in other words to encode "time", means also inserting "recurrent"
connections (carrying output back to input). These connections relay back to the
input unit the values computed by units in the next layer thus providing information
on the pattern of preceding events. Changing of the connection weights during
training is therefore also a function of the chain of events.
The nervous system is rich in recurrent connections. Indeed, it is precisely these
connections that are responsible for perceiving "time". If a "forward" network
allows the input pattern to be placed in a single point of the multidimensional vector
space, a recurrent network will evaluate this point's trajectory in time.
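A minimal recurrent unit makes the point concrete: feeding the previous activation back to the input side makes the response depend on the order of the inputs, not just on their summed values. The weights below are fixed, illustrative numbers, not trained ones.

```python
import math

def recurrent_response(sequence, w_in=1.0, w_rec=0.9):
    """A single unit whose previous activation is carried back as an extra
    input (a 'recurrent' connection), so its final state encodes the
    temporal structure of the whole input sequence."""
    h = 0.0
    for x in sequence:
        h = math.tanh(w_in * x + w_rec * h)   # h feeds back at every step
    return h

# Same inputs, different order, different response:
a = recurrent_response([1.0, 0.0, 0.0])
b = recurrent_response([0.0, 0.0, 1.0])
```

A purely forward unit would return the same value for both sequences; the recurrent connection is what lets the network "perceive time".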
Advances in neural networks, the unceasing progress in computer science, and the
intense research in this field have brought about more profound changes in
technology than are obvious at first glance. Paradoxically, because these systems
have been developed thanks to digital electronics and by applying strict
mathematical rules, the way artificial neural networks function runs counter to
classic computational, cognitive-based theories. A distinguishing feature of neural
networks is that knowledge is distributed throughout the network itself rather than
being physically localized or explicitly written into the program. A network has no
central processing unit (CPU), no memory modules or pointing device. Instead of
being coded in symbol (algorithmic-mathematical) form, a neural network's
computational ability resides in its structure (architecture, connection weights and
operational units). To take an example from the field of mechanics, a network
resembles a series of gears that calculates the differential between the rotation of the
two input axles and relays the result to the output axle. The differential is executed
by the structure of the system with no symbolic coding.
4) Performance plasticity.
Connectionist systems still have limited spread in the various fields of human
applications, partly because their true potential remains to be discovered, and partly
because the first connectionist systems were proposed as alternatives to traditional
artificial-intelligence techniques (expert systems) for tasks involving diagnosis or
classification. This prospect met with mistrust or bewilderment, due paradoxically to
their inherent adaptiveness and plasticity, insofar as misuse of these qualities can
lead to catastrophic results. Disbelievers also objected that the internal logic of
artificial neural networks remains largely unknown: networks do not indicate how
they arrive at a conclusion. These weaknesses tend to disconcert researchers with a
determinist background (engineers, mathematicians and physicists) and drive away
researchers in medical and biological fields, whose knowledge of mathematics and
computer science is rarely sufficient to deal with the problems connectionist systems
pose.
Neural networks now have innumerable uses in neuroscience. Rather than listing the
many studies here, we consider it more useful to give readers some reference points
for general guidance. For this purpose, though many categories overlap or remain
borderline, we have distinguished three: applications, research, and modelling.
Applications essentially use neural networks to reproduce, in an automated, faster
or more economical manner (or all three at once), tasks typically undertaken by human
experts. The category of applications includes diagnostic neural networks (trained to
Figure 8: 3d-images computed by a forward-layered neural network. (Panels: EMG-NET, a neural network for the diagnosis of surface muscle activity; BRAINSTEM-NET, clinical neurophysiological data, 5268 voxels.)
Neural networks of the diagnostic type have many practical uses (for example,
automated recognition of EEG epileptic abnormalities speeds up the tedious
examination of a 24-48 hour dynamic EEG recording). They have the advantage of
being able to process even noisy data, and their response degrades smoothly with
excessive signal deterioration, whereas traditional expert systems function well up
to a given signal-to-noise ratio but then stop responding.
Physicians often seem reluctant to make use of neural networks. The reasons for
this reluctance are many and merit an in-depth analysis that is outside the scope of
this chapter. In brief, we suspect that physicians fear that a diagnosis provided
automatically by a machine will diminish their authority.
An extremely interesting field is the use of a neural network as a research tool in
areas where more traditional research tools seem to have exhausted their potential.
These cases essentially call for unsupervised networks, able to discover correlations,
regularity, and hidden patterns that other methods fail to disclose. In this context,
neural networks function as powerful multivariate and non-linear statistical tools.
Research conducted by Roberts and Tarassenko [12] shows, for example, that the
traditional division of sleep into five stages according to visual inspection of the
EEG is an arbitrary classification. Using a neural network they identified seven EEG
attractors during sleep (and one during wakefulness). Each sleep stage arises from a
characteristic trajectory between some of these attractors, and all these dynamic
events originate from the competitive interactions between three processes.
Last, the research field of greatest interest to the neuroscientist is also the area where
the connectionist method of analysis finds widest agreement, namely the use of
neural networks in nervous system modeling. In this field, neural networks are
strictly speaking used not as tools but as models, though simplified ones, of
biological nervous system functioning.
Because these models are simulated on computers, they make it possible to study the
properties of such systems in a dynamic, quantitative manner, from single local
phenomena up to those emerging from collective interactions: memory, imagination,
language and, perhaps in the future, consciousness.
In effect, artificial neural network simulation offers the first, and at present the only,
chance of bringing the study of higher nervous system functions back from
psychology and philosophy into the realm of the natural quantitative sciences. No
longer will research be limited to observing and formulating hypotheses on nervous
system function: it will also be able to test function experimentally, though for the
time being in a limited way.
Artificial sensory systems behave similarly to their biological counterparts (the artificial retina). Artificial
neuronal systems for motor control display an ability for unsupervised learning on
the control of a physical artificial arm or one simulated on a computer [13, 14].
Such systems can also control a double-inverted-pendulum model of standing
posture, which appears able to learn the upright posture spontaneously and to
compensate for perturbations in balance due to environmental conditions.
Studies of this type have helped us to understand some of the essential mechanisms
underlying the development of central nervous system sensorimotor control. This
knowledge has been put to various practical uses including neurorehabilitation and
attempts to construct sensory and motorized prostheses.
A connectionist (neural network) approach allows one to investigate not only the
functioning, but also the birth and evolution of simple nervous systems [15, 16].
The unavoidable, spontaneous affinity between connectionism and the
computational branch of evolutionary biology, which studies the formation and
transformation of elementary organisms simulated by genetic algorithms, has led to
extraordinary results in the simulation of "artificial organogenesis" (the eye) and of
"artificial life".
The term artificial life refers to the computer simulation and study of dynamic
ecosystems. These systems are described as artificial in the sense that they are
originally designed by humans and are immaterial, but they evolve and reproduce in
an autonomous and often unpredictable manner. Their lack of material constitution
limits the simulation to describing the physicochemical properties of organic and
inorganic materials in mathematical terms.
To date, most studies focus on macroscopic behaviors including reproduction,
movement strategies, and energy exchange with the simulated environment (the
search for food), or the appearance of cooperative behaviors (including swarms and
schools of fish).
The search for criteria that distinguish living from non-living organisms is as old as
man, and it seems ever more arduous as investigational techniques become
increasingly refined. Currently the borderline between the two, if the question is
legitimate, lies between crystalline mineral structures and self-replicating biological
structures (DNA-RNA). At the behavioral level, a useful definition is that proposed
by Monod, indicating three essential features of the living world: teleonomy (the
presence of a structural plan), autonomous morphogenesis and reproductive
invariance. But again, these three characteristics are wholly interdependent. Most
probably, the only law that distinguishes organic from inorganic matter is the
tendency of the former to saturate the environment with copies of itself, thus
modifying the environment, whenever possible to its own advantage.
Because the many species found in nature compete with one another and cooperate
towards the same aim, local situations of dynamic equilibrium, termed ecosystems,
are reached. The scale of observation is obviously an important variable, since the
living world seems to be organized into ecosystems within other ecosystems, rather
like Chinese boxes. From this viewpoint, expecting to simulate fully a biologic
ecosystem, however small, within the isolation and immaterial setting of a
mathematical process, may seem absurd and misleading. Yet if the aim is not to
simulate the ecosystem exactly but to understand only some of the rules governing
the biological world then the method is right and can be enlightening.
The opportunities for investigation in this field stem from at least three determinant
coincidences:
1) the development and spread of sufficiently powerful digital computers;
2) progress in studies on connectionism; and
3) an improved understanding of genetic biological mechanisms that allowed basic
laws to be simulated with mathematical formulas termed "genetic algorithms".
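The basic loop of a genetic algorithm (evaluate fitness, select the best-fitting individuals, recombine and mutate) can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions (bit-string genomes and a toy "one-max" fitness that counts 1-bits); it is not drawn from the studies cited in this chapter:

```python
import random

def evolve(pop_size=40, genome_len=18, generations=60, seed=1):
    """Minimal genetic algorithm: bit-string genomes, 'one-max' fitness
    (number of 1-bits), truncation selection, crossover and mutation."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(genome_len)]
           for _ in range(pop_size)]
    for _ in range(generations):
        # best-fitting organisms are selected to reproduce ("natural selection")
        pop.sort(key=sum, reverse=True)
        parents = pop[: pop_size // 2]
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, genome_len)   # single-point crossover
            child = a[:cut] + b[cut:]
            i = rng.randrange(genome_len)        # point mutation
            child[i] ^= 1
            children.append(child)
        pop = parents + children
    return max(sum(g) for g in pop)

print(evolve())  # fitness of the best individual after 60 generations
```

Since the selected parents survive unchanged, the best fitness can only increase over the generations; with these toy parameters the loop typically reaches the optimum within a few dozen generations.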
ARTIFICIAL LIFE: computer simulation of the evolution and behaviour of simple organisms in an environment. Genetic algorithms and the rules of "natural selection" are embedded in the simulated system; starting conditions are random; species evolve because the best-fitting organisms, those that can utilize the resources of the environment to reproduce efficiently, are selected.
Fig 10: Model of artificial life simulated using genetic algorithms on populations
(hundreds to thousands) of unsupervised neural networks that compete in an artificial
environment.
3. Conclusions
After a fatiguing course, with trials and tribulations lasting more than 50 years,
connectionism has at last achieved the recognition it deserves. It is now beginning to
show its enormous potential for managing and understanding complex phenomena
that the deterministic means hitherto available could not approach. The coming
years should therefore witness a major breakthrough in scientific investigation.
We recommend that readers who wish to approach this technology, besides reading
explanatory texts (1,8), begin experimenting with one of the inexpensive
commercial software programs suitable for a personal computer of modest
performance. These programs enable even inexpert users to set up and train a neural
network suitable for multiple applications.
Innumerable Internet sites are dedicated to neural networks: in response to keywords
such as "neural networks", any search engine will supply hundreds or often
thousands of links. Many of these sites offer neural nets as "freeware" (software that
can be downloaded free and used without restrictions) or "shareware" (software
that can be downloaded free of charge for personal evaluation and then either
deleted or purchased).
References
When interaction among regularly firing neurons is simulated (using measured cortical
response profiles as experimental input), besides complex network-dominated
behavior, embedded periodicity is observed. This is the starting point for our
theoretical analysis of the potential of neocortical neuronal networks for synchronized
firing. We start from the model that complex behavior, as observed in natural
neural firing, is generated from such periodic behavior, lumped together in time.
We address the question of whether, during periods of quasistatic activity, different
local centers of such behavior could synchronize, as required, e.g., by binding
theory. It is shown that methods of self-organization are insufficient to achieve this:
additional structure is needed. As a candidate for this task, thalamic input into
layer IV is proposed, which, due to the layer's recurrent architecture, may trigger
macroscopically synchronized bursting among intrinsically non-bursting neurons,
leading in this way to a robust synchronization paradigm. This collective behavior
in layer IV is hyperchaotic; its characteristic statistical descriptors agree well with
the characterizations obtained from in vivo time-series measurements of cortical
response to visual stimuli. When we evaluate a novel, biologically relevant measure
of complexity, we find indications that the natural system has a tendency to tune
itself to the regions of highest complexity.
1 Introduction
2 Results
Figure 1. In vivo measurements (cat, visual cortex V1): log-log correlation plots and interspike probability distributions of a) a noisy, and b) a pattern-firing neuron.
1) The qualitative behavior of the CML-model of layers I-II and V-VI is
independent of the local lattice map;
2) In the absence of noise, the CML-model is unable to synchronize in a self-
organized manner. Synchronization, however, could emerge in an input-driven
way;
3) Collective, roughly synchronized behavior can be generated in a model of
layer IV, which is also able to generate responses that are very close to in vivo
measured neuronal firing;
4) The highest complexity of the CML-model is found where the global behavior
of the network is close to the border between order and disorder. It is possible
that this property serves to facilitate the transmission of firing rhythms from
layer IV to the other layers.
3 Methods
where $\phi$ is the phase of the phase-return map at the indexed site, and $nn$
again denotes the cardinality of the set of all nearest neighbors of site $i,j$.
$k_2$ describes the overall coupling among the site maps. This global coupling
strength is locally modified by realizations $k_{ij}$, taken from some distribution,
which may or may not have a first moment (in the first case, $k_2$ can be
normalized to be the global average over the local coupling strengths). In Eq.
3, the first term reflects the degree of self-determination of the phase at site
$\{i,j\}$; the second term reflects the influence of the nearest-neighboring (i.e., the
ones producing the strongest interactions) centers.
$$\begin{pmatrix} |(1-k_2)a| & a k_2 \\ a k_2 & |(1-k_2)a| \end{pmatrix}$$
where $a$ is the (absolute) slope of the local tent maps and $k_2$ is the diffusive
coupling strength. The thermodynamic formalism formally proceeds by raising
the (matrix) entries to the (inverse) temperature $\beta$, and then focusing, as
the dominating effect, on the largest eigenvalue as a function of the inverse
temperature. For large network sizes, the latter converges towards
$$\mu(\beta, k_2) = (|(1-k_2)a|)^\beta + (a k_2)^\beta. \qquad (5)$$
This expression explicitly shows that the network behavior is determined
by two sources: by the coupling ($k_2$) and by the site instability ($a$). Using this
expression of the largest eigenvalue, we obtain the free energy of our model as
$F(\beta) = \log\big((|a(1-k_2)|)^\beta + (a k_2)^\beta\big)$. From the free energy, the largest network
Lyapunov exponent is derived as a function of the diffusive coupling strength
$k_2$ and the slope of the local maps $a$, according to the formula
$$\lambda = \frac{\partial}{\partial\beta} F(\beta, k_2)\Big|_{\beta=1}, \qquad (6)$$
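Writing $x = |a(1-k_2)|$ and $y = a k_2$, the free energy is $F(\beta) = \log(x^\beta + y^\beta)$ and the derivative at $\beta = 1$ evaluates to $\lambda = (x\ln x + y\ln y)/(x+y)$. The following scan of the $(a,k_2)$ plane is our own numerical illustration of this formula, not part of the paper:

```python
import math

def lyapunov(a, k2):
    """Largest network Lyapunov exponent, lambda = F'(1), from the free
    energy F(beta) = log(x**beta + y**beta), with x = |a*(1 - k2)| (site
    instability term) and y = a*k2 (coupling term)."""
    x, y = abs(a * (1.0 - k2)), a * k2
    xlogx = x * math.log(x) if x > 0 else 0.0  # x*log(x) -> 0 as x -> 0
    ylogy = y * math.log(y) if y > 0 else 0.0
    return (xlogx + ylogy) / (x + y)

# scan the (a, k2) plane for the border between order (lambda < 0)
# and disorder (lambda > 0)
for a in (0.5, 1.0, 1.5, 3.0):
    border = next((k / 100 for k in range(1, 100)
                   if lyapunov(a, k / 100) < 0), None)
    print(f"a = {a}: first k2 with lambda < 0: {border}")
```

Since $x + y = a$ for $0 \le k_2 \le 1$, the exponent can only become negative when $a < 2$; for larger site instability the scan finds no ordered region, consistent with chaotic network behavior emerging as the local instability grows.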
in the form of statistical cycling, see [11]). Upon further increasing the local
instability, chaotic network behavior of turbulent characteristics emerges.
of the tent map model with the numerically calculated Lyapunov exponent of
the biological network, which shows the identical qualitative behavior. Based
on our insight into the tent-map model behavior, we conclude that in the
biologically motivated network, a notable degree of global synchronization would
require a large subset of all binary connections to be in the chaotic interaction
regime. This possibility, however, only exists for the inhibitory connections
(excitatory connections are unable to reach this state [10]). Moreover, the
part of the phase space on which the chaotic maps dwell is rather small
(although of nonzero measure, see [11]). It is then reasonable to expect that for
the network including biological variability, statistical cycling is of vanishing
measure, and therefore cannot provide a means of synchronizing neuron firing
on a macroscopic scale. To phrase it more formally: this implies that by
methods of self-organization, the network cannot achieve states of macroscopic
synchronization. In addition, we also investigated whether Hebbian [13] learning
rules acting on the weak connections between centers of stronger coupling
could be a remedy for this lack of coherent behavior. Even with this additional
mechanism, the model does not show macroscopic synchronization.
The observation that the tent and the biological response site maps yield
qualitatively identical properties has some additional bearing. In simple systems
of nearest-neighbor coupled map lattices, it is found that the order parameter
corresponding to the average phase displays a phase transition at high
enough coupling strength, as the system is essentially equivalent to an Ising
model at finite temperature. This is qualitatively similar to our model, where
a first-order phase transition is observed at the coupling $k_2 = 1$, for all values
of the local instability.
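The onset of synchronization in diffusively coupled map lattices can be illustrated with a toy example (our own sketch, not the model of the paper: four tent maps of slope 1.9 on a ring, with arbitrarily chosen parameters):

```python
import random

def tent(x, a=1.9):
    """Tent map with slope a, mapping [0, 1] into itself for a < 2."""
    return a * min(x, 1.0 - x)

def cml_spread(k2, n=4, steps=2000, seed=0):
    """Diffusively coupled tent maps on a ring of n sites; returns the
    site-to-site spread (max - min) after the transient, as a crude
    order/disorder measure (zero spread = fully synchronized)."""
    rng = random.Random(seed)
    x = [rng.random() for _ in range(n)]
    for _ in range(steps):
        f = [tent(v) for v in x]
        # each site keeps a (1 - k2) share of its own map and takes a k2
        # share of the average of its two nearest neighbors
        x = [(1.0 - k2) * f[i] + k2 * 0.5 * (f[i - 1] + f[(i + 1) % n])
             for i in range(n)]
    return max(x) - min(x)

for k2 in (0.0, 0.2, 0.5):
    print(f"k2 = {k2}: spread = {cml_spread(k2):.3g}")
```

With these toy parameters the spread collapses to zero once the coupling is strong enough, while weakly coupled sites stay desynchronized; the transition point depends on the slope of the local map and on the lattice size.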
Layer IV's task is believed to be centered more on amplification and coordination
than on computation. Accurately modeling layer IV is difficult since, because of
the smaller size of its typical neurons, in-vitro response profiles are hard to
measure. Our model of the "amplifier" layer IV is based on biophysically detailed,
variable model neurons that are connected in an all-to-all fashion. This ansatz is
partially motivated by the facts that layer IV is more densely connected than the
other layers and that natural, measured responses can easily be reproduced in this
setting. If synchronization (understood as an emergent, not a feed-forward,
property) is needed for computational and cognitive tasks, the question remains
how this property is generated. In our simulations of biophysically detailed models
of layer IV cortical architecture [3,14], we discovered a strong tendency of this
layer to respond to stimula-
When we compared the responses from the layer IV model with data from
in vivo anesthetized cat (17 time series from 4 neurons of unspecified type from
unspecified layers), we found corresponding behavior. Not only do the Lyapunov
exponents (generally hyperchaotic, $\lambda_{max} \approx 0.8$, the second exponent slightly
above zero) correspond well to the simulated ones from the model ($\lambda_{max} \approx 0.5$,
second exponent also slightly above zero); the measured dimensions were also
in the range predicted by the model of layer IV, and specific characteristic patterns
found in vivo could be reproduced by our simulation model with ease. Of
particular interest are step-wise structures found in the log-log-plots used for
the evaluation of the dimensions [16] (see Fig. 1). However, as the majority of
the measured in vivo neurons could not be attributed to layer IV, the natural
hypothesis is that the other layers inherit these characteristics from the latter.
To investigate in more detail the relation between layer IV with the remaining
layers, we calculated for the coupled map lattice model a recently proposed,
biologically relevant complexity measure, $C_S(1,0)$ [17]. To evaluate this quantity,
we first calculated from the largest eigenvalue the free energy of the network,
and from this quantity, $C_S(1,0)$. We find that the highest complexities
are situated in the area beyond the line separating negative and positive Lya-
punov exponents (see Fig. 4, and compare with Fig. 2). Moreover, the area of
highest complexity roughly coincides with the area where the model Lyapunov
exponents agree with the in vivo measured exponents. This suggests that the
natural system has the tendency of being tuned to a state of high complexity.
These coincidences of modeling and experimental aspects lead us to believe
that the ability of the network to fire in well-separated characteristic
time scales or in whole patterns is not accidental, but serves to evoke
corresponding responses by means of resonant cortical circuits. However, as has
been mentioned above, not every neuron shares this property. In our recent
studies of in vivo anesthetized cat data, we found, in evoked or spontaneous
firing experiments, essentially three different classes of behavior. The neurons
of the first class show no patterns in their firing at all. The neurons of the
second class are able to pick up stimulation patterns and convert them into
well-defined firing patterns. Neurons of the third class respond with smeared
patterns that seem not to be compatible with the stimulation paradigms (for
the first two classes see Fig. 1). With regard to the interspike distributions,
long-tail behavior is characteristic of the first class. For the second class, a
clean separation of the distribution into two regimes, dominated by individual
interspike intervals and by compound pat-
Figure 4. Region in the parameter space (site map slope $a$ and coupling $k_2$) where the
biologically relevant complexity measure $C_S(1,0)$ is maximal. The contour lines increase in
steps of 0.1, starting from the line at the right upper corner with $C_S(1,0) = 0.1$. From the
comparison of Lyapunov exponents, it is suggestive that the measured biological neurons are
tuned to working points of maximal complexity.
Chaotic firing emerges from the proposed model, as well as from the in vivo
data that we compare with, with nearly identical Lyapunov exponents and
fractal dimensions. The agreement between the Kaplan-Yorke and correlation
dimensions [12] corroborates the consistency of the results obtained. The
question then arises: with what functional, possibly computational, relevance
could this phenomenon be associated? Cortical chaos essentially reflects the
ability of the system to express its internal states (e.g., a result of computation) by
choosing among different interspike intervals (ISI) or, more generally, among
distinct patterns of firing. This mechanism can be viewed in a broader context.
Chaotic dynamics is generated through the interplay of distinct unstable
periodic orbits, where the system follows a particular orbit until, due to the
instability of the orbit, the orbit is lost and the system follows another orbit,
and so on. It is then natural to exploit this wealth of structures hidden within
chaos, especially for technical applications. The task that needs to be solved to
do so is the so-called targeting and chaos-control problem: the chaotic dynamics
first needs to be directed onto a desired orbit, on which it then needs to
be stabilized, until another choice of orbit is submitted. From an information-
theoretic point of view, information content can be associated with the different
periodic orbits. This view is related to the belief that information is essentially
contained in the patterns of neuronal firing. If well-resolved interspike intervals
can be extracted from the spike trains of a neuron, the interspike lengths can
directly be mapped onto symbols. A suitable transition matrix then specifies
the allowed, and the forbidden, successions of interspike intervals. That is, this
transition matrix provides an approximation to the grammar of the natural
system. In the case of collective bursting, it may be more useful to associate
information content with firing patterns consisting of characteristic intermittent
successions of spikes. In a broader context, the two approaches can be
interpreted as realizations of a statistical mechanics description by means of
different types of ensembles [18-19].
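The symbol-mapping idea can be made concrete with a schematic sketch (the three interval classes and the particular transition matrix below are invented for illustration, not measured grammars):

```python
# Sketch: map interspike intervals (ISIs, in ms) onto symbols and use a
# transition matrix as an approximate "grammar" of allowed successions.
# The interval classes and the matrix below are illustrative assumptions.

BINS = {"S": (0, 20), "M": (20, 60), "L": (60, float("inf"))}  # short/medium/long

# ALLOWED[a][b] == 1 means symbol b may follow symbol a
ALLOWED = {
    "S": {"S": 1, "M": 1, "L": 0},
    "M": {"S": 1, "M": 0, "L": 1},
    "L": {"S": 0, "M": 1, "L": 1},
}

def to_symbols(isis):
    """Quantize each interspike interval into its symbol class."""
    out = []
    for isi in isis:
        for sym, (lo, hi) in BINS.items():
            if lo <= isi < hi:
                out.append(sym)
                break
    return "".join(out)

def obeys_grammar(symbols):
    """Check every successive pair of symbols against the transition matrix."""
    return all(ALLOWED[a][b] for a, b in zip(symbols, symbols[1:]))

train = [12, 15, 45, 80, 30, 10]   # hypothetical spike-train ISIs
syms = to_symbols(train)
print(syms, obeys_grammar(syms))
```

A transition matrix estimated from data would play the same role: successions with zero entries are treated as forbidden by the approximate grammar.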
In the case of artificial systems or technical applications, strategies on how
to use chaos to transmit messages and, more generally, information are well
developed. One basic principle used is that small perturbations applied to a
chaotic trajectory are sufficient to make the system follow a desired symbol
sequence containing the message [20]. This control strategy is based upon the
property of chaotic systems known as "sensitive dependence on initial
conditions". Another approach, which is currently the focus of applications in areas
of telecommunication, is the addition of hard limiters to the system's evolution
[21-22]. This very robust control mechanism can, due to its simplicity, even
be applied to systems running at gigahertz frequencies. It has been shown
[23] that optimal hard-limiter control leads to convergence onto periodic
orbits in less than exponential time. In spite of these insights into the nature
of chaos control, which kind of control measures should be associated with
cortical chaos, however, is unclear. In the collective bursting case of layer IV,
one possible biophysical mechanism would be a small excitatory post-synaptic
current. When the membrane of an excitatory neuron is perturbed at the
end of a collective burst with an excitatory pulse, the cell may fire additional
spikes. Alternatively, at this stage inhibitory input may prevent the appearance
of spikes and terminate bursts abruptly. In a similar way, the firing of
inhibitory neurons can also be controlled. Another possibility is the use of local
recurrent loops to establish delay-feedback control [24]. In fact, such control
loops could be one explanation for the abundantly occurring recurrent
connections among neurons. The relevant parameters in this approach are the
time delay of the re-fed signal and the synaptic efficacy, of which especially the
latter seems biologically realistic. In addition to the encoding of information,
one also needs read-out mechanisms able to decode the signal at the receiver's
side. Thinking in terms of the encoding strategies outlined above, this would
amount to the implementation of spike-pattern detection mechanisms. Besides
simple straightforward implementations based on decay times, more
sophisticated approaches, such as the recently discovered activity-dependent synapses
[25-27], seem natural candidates for this task. The interactions of synapses
with varying degrees of short-term depression and facilitation could also provide
the selectivity for certain spike patterns. Small populations of neurons which
(due to variable axonal and synaptic potential propagation delays) achieve
supra-threshold summation only for particular input spike sequences are yet
another possible mechanism.
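The hard-limiter mechanism of [21-23] can be illustrated on a textbook chaotic map (our own toy example, not the circuit-level schemes of those references): clipping the output of the fully chaotic logistic map at a threshold $h$ collapses the dynamics onto a short periodic orbit.

```python
def limiter_step(x, h=0.9):
    """One step of the logistic map x -> 4x(1-x), with a hard limiter
    clipping the output at the threshold h."""
    return min(4.0 * x * (1.0 - x), h)

def orbit(x0=0.3, h=0.9, transient=100, length=6):
    """Iterate past the transient, then record a few successive values."""
    x = x0
    for _ in range(transient):   # let the limiter control lock in
        x = limiter_step(x, h)
    out = []
    for _ in range(length):
        out.append(round(x, 4))
        x = limiter_step(x, h)
    return out

# without the limiter the logistic map at parameter 4 is fully chaotic;
# with the limiter the trajectory collapses onto a short periodic orbit
print(orbit())
```

With $h = 0.9$ the trajectory locks onto a period-2 orbit alternating between $0.9$ and $f(0.9) = 0.36$; varying $h$ selects different orbits, which is what makes the limiter usable for encoding symbol sequences.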
6 Conclusion
We find that the origin of synchronized firing in the cortex, if its existence can
be proven experimentally, would most likely lie in layer IV, and be heavily
based on recurrent connections and simultaneous thalamic feed-forward
input. We expect that firing in patterns, in this layer, is able to trigger specific
resonant circuits in other layers, where the actual computation is then done
(which we propose to be based on the symbol set of an infinity of locked states
[4]). It is a fact that the majority of ISI measurements from in vivo cat
visual cortex neurons and simple statistical models of neuron interaction show
emergent long-tail behavior, but this might also be the result of the interaction
between different areas of the brain, or a consequence of the input structure,
or a mixture of all of these. Long-tail interspike interval distributions are
in full contrast to the current assumption of Poissonian behavior that
originates from the assumption of random spike coding. We propose that in the
measurements that are compatible with the Poissonian spike-train assumption,
layer IV explicitly shuts down the long-range interactions via inhibitory
connections or by pumping energy into new temporal scales that no longer
sustain the ongoing activity. Long-tailed ISI distributions could, however, also
be of relevance from the point of view of the tendency of the system to tune
itself to a state of maximal complexity $C_S(1,0)$. At such a state, due to the
structure of the measure, long-range and slowly decaying correlations can be
expected to dominate the dynamical behavior.
Recently, it has been reported that noise can synchronize even coupled map
lattices with variable lattice maps [28]. It will be worthwhile to investigate
whether this property also holds for our site maps (which we are inclined to
believe), and whether this can finally be attributed to a kind of mean-field
activity generated by the noise. To what extent the assumptions made are the
relevant ones will have to be explored in continued work, from the biological
as well as from the mathematical side.
References
1. Chaos and Noise in Biology and Medicine, eds. M. Barbi and S. Chillemi
(World Scientific, Singapore, 1998).
2. J.K. Douglass, L. Wilkens, E. Pantazelou, and F. Moss, Nature 365, 337-
340 (1993).
3. R. Stoop, D. Blank, A. Kern, J.-J. v.d. Vyver, M. Christen, S. Lecchini,
and C. Wagner, Cog. Brain Res., in press (2001).
4. R. Stoop, L.A. Bunimovich, and W.-H. Steeb, Biol. Cybern. 83, 481-489
(2000).
5. C. Von der Malsburg in Models of Neural Networks II, eds. E. Domany,
J. van Hemmen, and K. Schulten, 95-119 (Springer, Berlin, 1994).
6. W. Singer in Large-Scale Neuronal Theories of the Brain, eds. C. Koch
and J. Davis, 201-237 (Bradford Books, Cambridge MA, 1994).
7. R. Stoop, K. Schindler, and L.A. Bunimovich, Acta Biotheoretica 48,
149-171 (2000).
8. R. Stoop in Nonlinear Dynamics of Electronic Systems, eds. G. Setti, R.
Rovatti, G. Mazzini, 278-282 (World Scientific, Singapore, 2000).
9. F.C. Hoppensteadt and E.M. Izhikevich, Weakly Connected Neural Networks
(Springer, New York, 1997).
10. R. Stoop, K. Schindler, and L.A. Bunimovich, Nonlinearity 13, 1515-1529
(2000).
11. J. Losson and M. Mackey, Phys. Rev. E 50, 843-856 (1994).
12. R. Stoop and P.F. Meier, J. Opt. Soc. Am. B 5, 1037-1045 (1988); J.
Peinke, J. Parisi, O.E. Roessler, and R. Stoop, Encounter with Chaos
(Springer, Berlin, 1992).
13. D. Hebb, The Organization of Behavior (Wiley and Sons, New York,
1949).
14. D. Blank, PHD thesis (Swiss Federal Institute of Technology ETHZ, 2001).
15. O.E. Roessler, Phys. Lett. A 71, 155-159 (1979).
16. A. Celletti and A. Villa, Biol. Cybern. 74, 387-393 (1996).
17. R. Stoop and N. Stoop, submitted (2001).
18. R. Stoop, J. Parisi, and H. Brauchli, Z. Naturforsch. a 46, 642-646 (1991).
19. C. Beck and F. Schlögl, Thermodynamics of Chaotic Systems: An
Introduction (Cambridge University Press, Cambridge, 1993).
20. S. Hayes, C. Grebogi, E. Ott, and A. Mark, Phys. Rev. Lett. 73, 1781-
1784 (1994).
21. N. Corron, S. Pethel, and B. Hopper, Phys. Rev. Lett. 84, 3835-3838
(2000).
22. C. Wagner and R. Stoop, Phys. Rev. E 63, 017201 (2000).
23. C. Wagner and R. Stoop, J. Stat. Phys. 106, 97-107 (2002).
24. K. Pyragas, Phys. Lett. A 170, 421-428 (1992).
25. L. Abbott, J. Varela, K. Sen, and S.B. Nelson, Science 275, 220-224
(1997).
26. M.V. Tsodyks and H. Markram, Proc. Natl. Acad. Sci. USA 94, 719-723
(1997).
27. A. M. Thomson, J. Physiol. 502, 131-147 (1997).
28. C. Zhou, J. Kurths, and B. Hu, Phys. Rev. Lett. 87, 098101 (2001).
SELECTIVITY PROPERTY OF A CLASS OF ENERGY BASED
LEARNING RULES IN PRESENCE OF NOISY SIGNALS
A. BAZZANI, D. REMONDINI
Dep. of Physics and Centro Interdipartimentale Galvani, Univ. of Bologna,
v. Irnerio 46, 40126 Bologna, ITALY,
and INFN sezione di Bologna. E-mail: bazzani@bo.infn.it
N. INTRATOR
Inst. for Brain and Neural Systems, Brown Univ., Providence 02912 RI, USA
G. CASTELLANI
DIMORFIPA and Centro Interdipartimentale Galvani, Univ. of Bologna, v. Tolara
di Sopra 50, 40064 Ozzano dell'Emilia, ITALY
We consider the selectivity property of a class of energy based learning rules with
respect to the presence of clusters in the input distribution. These rules are a
generalization of the BCM learning rule and use the distribution momenta of order
> 2. The analytical results show that selective solutions are possible for noisy input
data up to a certain signal-to-noise ratio, and that the introduction of a bias in
the input signal could improve the selectivity. We illustrate this effect with some
numerical simulations in a simple case.
The BCM neuron 1 was introduced to analyze the plasticity property of a
biological neuron. In particular, the model takes into account the LTP (Long
Term Potentiation) 7 and LTD (Long Term Depression) 4 phenomena
observed in the visual neuron response under modification of experience. In
its simplest formulation the BCM model assumes that the neuron response $c$
to an external stimulus $d \in \mathbb{R}^n$ is linear: $c = m \cdot d$, where $m$ are the synaptic
weights. The change of the weights $m$ is described by the equation
$$\dot m = \Phi(c, \theta)\, d \qquad (1)$$
where $\theta$ is an internal threshold of the neuron and the typical shape of the
function $\Phi$ is given in figure 1. At each time the external signal $d$ can be
considered the realization of a random variable with a given distribution. If the
threshold $\theta$ is fixed, the equilibrium position $c = \theta$ is unstable and only
the LTD behavior is described by eq. (1). To overcome this difficulty one
introduces the hypothesis that the threshold $\theta$ depends on the external
environment, according to
Figure 1. Typical shape of the function $\Phi(c,\theta)$ that defines the BCM neuron.
$$\Phi(c,\theta) = c\,(c^{p-2} - \theta) \qquad (5)$$
and the averaged equation (4) reads $\dot m = \partial\mathcal{E}/\partial m$, where we have introduced
the energy function
$$f(y) = \frac{y^p}{p}, \qquad y \in \mathbb{R}^n \qquad (7)$$
and $C$ is the metric defined by the second moment matrix $C_{ij} = \langle d_i d_j \rangle$;
the correspondence is explicitly given by $m^* = p\, f(y^*)\, y^*$.
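The selectivity discussed below can be observed in a schematic simulation of Eq. (1). The sketch uses the standard $p = 3$ form $\Phi(c,\theta) = c(c-\theta)$ with a sliding threshold $\theta$ tracking the running average of $c^2$; this concrete threshold dynamics, the two-stimulus environment and all parameters are our own assumptions, not the paper's general energy-based formulation:

```python
# Schematic BCM plasticity sketch (p = 3): linear neuron c = m . d, update
# dm = eta * Phi(c, theta) * d with Phi(c, theta) = c * (c - theta), and a
# sliding threshold theta tracking the running average of c**2.
# Environment and parameters are illustrative assumptions.

def train_bcm(steps=20000, eta=0.01):
    m = [0.4, 0.3]                        # slightly asymmetric initial weights
    stimuli = [[1.0, 0.0], [0.0, 1.0]]    # two orthogonal input vectors
    theta = 0.0
    for t in range(steps):
        d = stimuli[t % 2]                # stimuli presented alternately
        c = sum(mi * di for mi, di in zip(m, d))
        theta += (c * c - theta) / 5.0    # fast running average of c^2
        phi = c * (c - theta)
        m = [mi + eta * phi * di for mi, di in zip(m, d)]
    return m

weights = train_bcm()
print([round(w, 2) for w in weights])  # selective: one large weight, one near zero
```

Starting from nearly symmetric weights, the neuron ends up responding strongly to one stimulus and hardly at all to the other, which is the selectivity property analyzed below.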
constrained on the unit sphere $\sum_j o_j^2 = n$. Let us suppose $o_1 > 0$; then by
differentiating eq. (9) we get the system
$$o_1 \frac{\partial o_1}{\partial o_j} + o_j = 0, \qquad \frac{d\mathcal{E}}{do_j} = \frac{\partial \mathcal{E}}{\partial o_j} - \frac{o_j}{o_1}\frac{\partial \mathcal{E}}{\partial o_1}, \qquad j = 2, \ldots, n \qquad (10)$$
Then the critical points are computed from the equations
$$o_j\,\big(o_j^{p-2} - o_1^{p-2}\big) = 0, \qquad j = 2, \ldots, n \qquad (11)$$
It is easy to check the existence of a local maximum o_j = 0, j = 2, .., n and
o_1 = √n, so that the BCM neuron is able to select only the first vector v_1
among all the possible input vectors. According to the relation o_j = y · v_j,
the vectors y defined by the equilibrium solutions are directed as the dual
base of the input vectors v_j. We call this property the selectivity property of
the BCM neuron 3. We observe that the values of the selective solutions o are
independent of the lengths and the orientations of the input vectors v; this
is not true for the corresponding outputs c = m · v of the neuron, whose values
depend on the choice of the signals v. However, the numerical simulations
show that the basin of attraction of the stable selective solutions depends
strongly on the norms of the vectors v; this effect makes it very difficult
to distinguish signals of different magnitude, and we assume that a procedure
could be applied in order to normalize the input signals.
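The dual-base statement can be checked directly on a concrete, non-orthogonal set of input vectors (the numbers below are only an illustration); the selective direction is a column of V⁻¹, rescaled onto the sphere Σ_j o_j² = n:

```python
import numpy as np

# Three linearly independent, non-orthogonal input vectors (rows of V).
V = np.array([[1.0, 0.2, 0.0],
              [0.1, 1.0, 0.3],
              [0.0, 0.2, 1.0]])

n = V.shape[0]

# Columns of inv(V) form the dual base: V @ inv(V)[:, k] = e_k.
dual = np.linalg.inv(V)

# Direction selective for v_1, rescaled onto the sphere sum_j o_j^2 = n.
y = dual[:, 0].copy()
o = V @ y                        # outputs o_j = y . v_j, here e_1
y *= np.sqrt(n) / np.linalg.norm(o)
o = V @ y

print(np.round(o, 6))            # only o_1 is non-zero, equal to sqrt(n)
```

The output o has a single non-zero component o_1 = √n, exactly the selective solution described in the text.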
The situation becomes more complicated when we consider the effect of an
additive noise on the selectivity property of a BCM neuron. Let us define the
random variable

d = Σ_j v_j ξ_j + η    (12)

where ξ is a random vector that selects with equal probability one of the vectors
v_j and η is a gaussian random vector with ⟨η_j⟩ = 0 and ⟨η_i η_j⟩ = δ_ij σ²/n,
i, j = 1, .., n. For the sake of simplicity we consider the case p = 3, q = 2, but the
calculations can be generalized in a straightforward way. It is convenient to
introduce the matrix V whose lines are the vectors v_j and the positive definite
symmetric matrix S = V Vᵀ, where Vᵀ is the transposed matrix. The cubic
function (7) can then be written in terms of S and of the outputs o_j.
After some algebraic manipulations we get an equation, eq. (15), for the
component o_1.
According to our ansatz the r.h.s. of eq. (15) is of order O(σ²)/o_1, so that we
can estimate

o_1 ≃ σ² ( O(1)/o_1 + a O(1) o_1 )    (16)

where we have defined a = max_{l=2,..,n} |(S⁻¹)_{1l}| to take into account the leading
term of (S⁻¹o)_1, and O(σ²) denotes a term of order σ². Eq. (16) shows that
we can have a selective solution only if σ ≪ √n and a ∼ 1/n. If we substitute
eq. (16) into equation (15) and consider the leading terms, we get a
biquadratic equation for o_1

o_1⁴ + σ² (n − 1) [ O(1) + a O(1) o_1² ] + O(n) = 0    (17)
the different directions of the unperturbed signals v_j. The role of the noise
amplitude σ is to reduce the "attraction force" of the stable solutions and
eventually to change the stability property through a bifurcation phenomenon
when it exceeds a critical value.
The quantity a (see eq. (17)) plays a crucial role in the selectivity property
of the BCM neuron. It has an easy geometrical interpretation in the space of
the input signals d (cf. eq. (12)). According to the definition of the matrix
S, a = max_{l=2,..,n} |(S⁻¹)_{1l}| is directly related to the projection of the vector v_1
on the hyper-plane defined by the vectors v_2, ..., v_n. Indeed a is equal to
0 when v_1 is orthogonal to the other vectors v_l, l = 2, .., n. The equations
(15) and (17) indicate that for a fixed level σ of the noise the selectivity
of the BCM neuron for the vector v_1 is maximum when v_1 is orthogonal
to the other unperturbed vectors v_2, ..., v_n. This remark suggests that for a
BCM neuron whose input distribution has the form (12), the selectivity
property for a given unperturbed vector v_1 is optimized
if one introduces a strategy to satisfy the condition v_1 · v_l = 0 for l = 2, .., n.
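The distribution (12) and the matrices S and C introduced above can be checked numerically; the sketch below uses arbitrary illustrative vectors v_j and compares the empirical second moment matrix C_ij = ⟨d_i d_j⟩ with its exact value for this mixture:

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma = 3, 0.1

# Arbitrary illustrative input vectors v_j (rows of V); S = V V^T.
V = rng.normal(size=(n, n))
S = V @ V.T
assert np.all(np.linalg.eigvalsh(S) > 0)   # S is positive definite

# Sample d = sum_j v_j xi_j + eta (eq. (12)): xi selects one vector with
# equal probability, eta is gaussian with <eta_i eta_j> = delta_ij sigma^2/n.
N = 200000
picks = rng.integers(n, size=N)
eta = rng.normal(scale=sigma / np.sqrt(n), size=(N, n))
d = V[picks] + eta

# Empirical second moment matrix C_ij = <d_i d_j> against its exact value
# (1/n) V^T V + (sigma^2/n) I for this mixture.
C = d.T @ d / N
C_exact = V.T @ V / n + (sigma**2 / n) * np.eye(n)
print(np.abs(C - C_exact).max())
```

The maximum deviation shrinks as 1/√N, confirming that the metric C of the averaged dynamics is the mixture second moment plus the isotropic noise term.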
A common procedure to optimize the performance of a neural network is to
introduce a bias b in the input signals. This is equivalent to a translation of the
input distribution (cf. eq. (12)):

d = Σ_j v_j ξ_j − b + η = Σ_j (v_j − b) ξ_j + η    (18)
the selectivity with respect to the different input signals v_j, and an exhaustive
analysis of the input space would require a neural network of n neurons with
inhibitory connections.
The introduction of a bias b changes the norms of the input vectors, so
that it is necessary to apply a normalization procedure that could decrease the
signal to noise ratio; this may destroy the advantages of the bias. At the moment
an efficient procedure which automatically computes the bias is not available;
this problem is at present under consideration.
4 Numerical simulations
In order to show the selectivity property of a BCM neuron and the effect
of a bias we have considered the plane case; the input distribution has
been defined by eq. (12), where the vectors v_1, v_2 lie on the unit circle at
different angles α. We have normalized each input vector d on the unit circle,
so that the effect of the noise enters only in the phase; the noise level σ has been
varied in the interval [0, 1]. We study two cases for the energy function (6):
p = 3 and q = 2, which corresponds to the BCM neuron, and p = 4 and q = 3,
which simulates a kurtosis-like learning rule for the neuron. The initial synaptic
weights are chosen in a neighborhood of the origin near the vertical axis. To
quantify the separability we introduce the quantity

Figure 2. Separability property for a BCM (black circles) and kurtosis-like (red squares)
neuron; the left plot refers to a separation angle α = 90° between the input signals, whereas
the right plot refers to a separation angle α = 10°; we have used a statistics of 10^6 input
vectors and the threshold Δ = 1 is also plotted.
Figure 3. Normalized neuron outputs o_j = y · v_j, j = 1, 2 for the selective solution in the
BCM case (left plot) and in the kurtosis-like case (right plot); the separation angle between
the input vectors is α = 90°.
Figure 4. Comparison of the separability property without (circles) and with (squares) a
bias for a separation angle α = 10° between the input vectors: the left plot refers to the
BCM neuron whereas the right plot to the kurtosis-like neuron.
5 Conclusions
References
Functional Magnetic Resonance Imaging (fMRI) is an imaging technique with high spatial and temporal
resolution that allows in vivo investigation of the functionality of discrete neuronal
groups during their activity, utilizing the magnetic properties of oxy- and deoxy-hemoglobin. fMRI
permits the study of the normal and pathological brain during the performance of various neuropsychological
functions. Several research groups have investigated prefrontal cognitive abilities (including working
memory) in schizophrenia using functional imaging. Despite some contradictions, most of these
studies have reported a relative decrease of prefrontal cortex activity during working memory, defined as
hypofrontality. However, hypofrontality is still one of the most debated aspects of the pathophysiology of
schizophrenia because the results can be influenced by pharmacotherapy, performance and chronicity.
The first fMRI studies in patients with schizophrenia seemed to confirm hypofrontality. However, more
recent studies during a range of working memory loads showed that patients are hypofrontal at some
segments of this range, while they are hyperfrontal at others. These studies seem to suggest that the
alterations of prefrontal functionality are not only due to a reduction of neuronal activity, but are probably
the result of complex interactions among various neuronal systems.
Like its functional brain imaging forebears single photon emission tomography
(SPECT) and positron emission tomography (PET), fMRI seeks to satisfy a long-
term desire in psychiatry and psychology to define the neurophysiological (or
functional) underpinnings of the so-called 'functional' illnesses.
For much of the last century, attempts to define the 'lesions' causing these
illnesses, such as schizophrenia, major depression and bipolar disorder, have been
elusive, leading to their heuristic differentiation from 'organic' illnesses, like stroke
and epilepsy, with more readily identifiable pathogenesis.
FMRI offers several advantages in comparison to functional nuclear medicine
techniques, including low invasiveness, no radioactivity, widespread availability
and virtually unlimited study repetitions [49]. These characteristics, plus the relative
ease of creating individual brain maps, offer the unique potential to address a
number of long-standing issues in psychiatry and psychology, including the
distinction between state and trait characteristics, confounding effects of medication
and reliability [80]. Finally, the implementation of 'realtime' fMRI will allow
investigators to tailor examinations individually while a subject is still in the
scanner, promising true interactive studies or 'physiological interviews' [26].
The physical basis of fMRI is the blood oxygenation level dependent (BOLD)
effect, which is due to the oxygenation-dependent magnetic susceptibility of hemoglobin.
Deoxyhemoglobin is paramagnetic, causing slightly attenuated signal intensity in
MRI image voxels containing deoxygenated blood. During neuronal firing, localized
increases in blood flow and oxygenation, and the consequent reduction in deoxyhemoglobin,
cause the MRI signal to increase. It is therefore assumed that these localized
increases in BOLD contrast reflect increases in neuronal activity.
The BOLD mechanism has been further clarified by more recent experiments.
By using imaging spectroscopy, which allows selective measurement of both
deoxyhemoglobin and oxyhemoglobin, Malonek and Grinvald [52] demonstrated
that hemoglobin-oxygenation changes in response to neuronal activation are
biphasic: an early (<3 s), localized increase in deoxyhemoglobin (often referred to
as the 'initial dip') is followed by a delayed decrease in deoxyhemoglobin and a
concomitant increase in oxyhemoglobin. Malonek et al. showed that the initial
increase in deoxyhemoglobin is caused by an increase in the cerebral metabolic rate of
oxygen without a matching cerebral blood flow response. The later increase in
cerebral blood flow causes the subsequent decrease in deoxyhemoglobin and the
concomitant increase in oxyhemoglobin [51].
Working memory
Working memory is a construct that describes the ability to transiently store and
manipulate information on line to be used for cognition or for behavioral guidance
[2,40]. A key aspect of working memory is its capacity limitation, usually reflected
in cognitive testing as decreasing performance in response to increasing working
memory load [31,45,56,65]. Numerous functional neuroimaging studies have used
the spatial location and temporal characteristics of the 'activation' response during
working memory to localize this cognitive phenomenon to regionally distinct
components within a larger distributed network [8,16,17,18,19,44,54]. For example,
activation in dorsolateral prefrontal cortex (DLPFC) appears to be related to the
active maintenance of information over a delay [17,19] and/or the manipulation of
this information [68]. In contrast, activation in areas like the anterior cingulate is
more the result of increased effort or task complexity [3,12,58].
Parametric working memory tasks, most notably the popular 'n-back' task [32],
are ideally suited to examine issues of dynamic range since working memory load
can be increased during the same experiment. The 'no back' control task simply
requires the identification of the number currently seen. The working memory
conditions require the encoding of currently seen numbers and the concurrent recall
of numbers previously seen and retained over a delay: as memory load increases,
the task requires the recollection of respectively one stimulus ('one back'), two
stimuli ('two back') or three stimuli ('three back') previously seen, while encoding
additional incoming stimuli.
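The load manipulation is easy to state programmatically. A hypothetical helper (not taken from the cited studies) that marks the target positions of an n-back sequence might look like:

```python
def n_back_targets(stimuli, n):
    """Mark each position whose stimulus matches the one n steps back.

    For n = 0 the control condition described in the text asks for the
    currently seen number instead; as a simplification, n = 0 here marks
    no positions.
    """
    if n <= 0:
        return [False] * len(stimuli)
    return [i >= n and stimuli[i] == stimuli[i - n]
            for i in range(len(stimuli))]

seq = [1, 3, 1, 3, 1, 2, 2, 4]
print(n_back_targets(seq, 2))   # 2-back: positions 2, 3, 4 are targets
```

Raising n increases the number of items that must be held and updated concurrently, which is exactly the parametric load manipulation used in these studies.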
Callicott et al. have identified characteristics of working memory capacity
using this parametric 'n-back' working memory task involving increasing cognitive
load and ultimately decreasing task performance during fMRI in healthy subjects.
Loci within dorsolateral prefrontal cortex (DLPFC) evinced exclusively an
'inverted-U' shaped neurophysiological response from lowest to highest load,
consistent with a capacity-constrained response. Regions outside of DLPFC, in
contrast, were more heterogeneous in response and often showed early plateau or
continuously increasing responses, which did not reflect capacity constraints.
However, sporadic loci, including the premotor cortex, thalamus and superior
parietal lobule, also demonstrated putative capacity-constrained responses, perhaps
arising as an upstream effect of DLPFC limitations or as a part of a broader
network-wide capacity limitation. These results demonstrate that regionally specific
nodes within the working memory network are capacity-constrained in the
physiological domain [10].
arises during demanding cognitive tasks that tax PFC function [13,79]. A corollary
of this explanation is that cortical activation is relatively 'normal' during cognitive
tasks that are less taxing on PFC [4,74]. On the other hand, critics have raised a
number of objections regarding the relationship between hypofrontality and
schizophrenia, invoking issues of experimental design and related inconsistencies.
For example, an alternative interpretation of hypofrontality is that it arises as an
epiphenomenon of patient behavior, specifically task performance, which is typically
abnormal in patients with schizophrenia [28,42,61]. Thus, while many studies of
PFC function in schizophrenia have reported reduced PFC activation when patients
perform poorly [11,13,25,27,69,76,77], others have observed normal [21,28,55],
reduced [20,81] and even increased PFC activation [53,69] when patients'
performance is near normal. Regardless of these uncertainties, most authors agree
that the physiological responses of the schizophrenic brain are abnormal when
cognitive challenges are beyond these patients' behavioural capacity [75].
The interpretation of hypofrontality in the context of capacity limitations is
further complicated by recent studies in healthy subjects. For example, Goldberg et
al. found that healthy subjects performing a dual task paradigm became relatively
hypofrontal when pushed beyond their capacity to maintain accuracy [34]. In the
above cited study, Callicott et al. found evidence of an inverted-U shaped PFC
response to parametrically increasing working memory difficulty in healthy subjects
who became relatively hypofrontal as they were pushed beyond their working
memory capacity [10]. In addition, diminished PFC activity coincident with
diminished behavioural capacity has been found in single-unit recording studies in
non-human primates during working memory tasks [29,30] and in
electrophysiological studies in humans attempting complex motor tasks [33]. Thus,
under certain circumstances, hypofrontality can be a normal physiological response
to excessive load. Collectively, these data make it difficult to resolve whether
hypofrontality as a 'finding' in schizophrenic patients is a direct (i.e. disease
dependent) manifestation of PFC pathology or whether hypofrontality simply reflects
diminished behavioural capacity as might occur for any subject pushed beyond
capacity (i.e. disease independent).
To complicate matters further, there is also evidence that in healthy subjects the
relationship between reduced working memory capacity and PFC neuronal function
could take the form of over-activation of PFC (i.e. relative hyperfrontality). Rypma and
D'Esposito recently demonstrated that healthy controls who have longer reaction
times during a working memory task respond by increasing activation in dorsal but
not in ventral PFC [63]. They interpreted these results as a reflection of reduced
efficiency of working memory information manipulation within dorsal PFC.
Further, they interpreted the failure of reaction time to correlate with fMRI
activation in ventral PFC as a reflection of the putative link between ventral PFC
and working memory maintenance functions [22,57,66,67,72]. Thus, it is
conceivable that under certain circumstances schizophrenic patients might evidence
over-activation especially in dorsal PFC given their poor performance.
Working memory deficits are well documented in many studies of patients with
schizophrenia [14,23,24,35,36,37,46,59,70,80]. While working memory is thought
to be capacity limited in all subjects [45,56], schizophrenic patients appear to have
additional capacity limitations presumed to arise from dorsal PFC dysfunction
[38,39,41].
Callicott et al. [9], using fMRI, mapped the response to varying working
memory difficulty in patients with schizophrenia and healthy comparison subjects
using the above cited parametric version of the 'n-back' task. Consistent with earlier
neuropsychological studies, they found that patients with schizophrenia have limited
working memory capacity compared to healthy subjects. Although patients
activated the same distributed working memory network, the response of patients
with schizophrenia to increasing working memory difficulty was abnormal in dorsal
PFC. The salient characteristic of PFC dysfunction in schizophrenia in this
paradigm was not that the PFC was relatively 'up' or 'down' in terms of activation
when compared to healthy subjects; rather, the salient characteristic was an
inefficient dynamic modulation of dorsal PFC neuronal activity. While several
regions within a larger cortical network also showed abnormal dynamic responses
to varying working memory difficulty, the fMRI response in dorsal PFC (areas 9-
10, 46) met additional criteria for a disease-dependent signature of PFC neuronal
pathology. In fact, at higher memory difficulties (1-back and 2-back) wherein patients
showed diminished working memory capacity, dorsal PFC was consistently hyper-
responsive. Furthermore, there was a functional distinction between the response of
ventral and dorsal PFC, even though both were abnormal to some extent. In contrast
to dorsal PFC, ventral PFC (BA47) was hypo-responsive to varying memory
difficulty [9].
While hypofrontality as a finding generates continued debate, there is less
debate that PFC neuronal pathology exists in schizophrenia and that this pathology
may be more prominent in dorsal PFC (areas 9, 46). Similarities between some of
the clinical symptoms of schizophrenia - particularly between the negative or
deficit symptoms in schizophrenics and those of patients with frontal lobe lesions -
have long implicated PFC in schizophrenia [48,60]. Even though the heterogeneity
of clinical symptomatology implicates multiple brain regions, evidence that
schizophrenia fundamentally involves dorsal PFC neuronal pathology continues to
accumulate from many directions [50,64]. For example, proton magnetic resonance
spectroscopy studies have repeatedly found reduced concentrations of an
intraneuronal chemical N-acetylaspartate (NAA) in PFC [7,5,6,15,71]. Furthermore,
those studies that have examined sub-regions within PFC have found NAA
reductions in dorsal but not ventral PFC [5,6,7]. In addition, Callicott et al
demonstrated dorsal but not ventral PFC NAA reductions specifically predicted the
extent of negative symptoms in schizophrenic patients [9]. These and other data
provide a strong basis for the assumption that specific neurocognitive abnormalities
in schizophrenia (particularly working memory) result from physiological
dysfunctions of PFC neurons.
References
1. Andreasen N.C., Rezai K., Alliger R., Swayze V.W., Flaum M., Kirchner P.,
Cohen G., O'Leary D., Hypofrontality in neuroleptic-naive patients and in
patients with chronic schizophrenia: assessment with Xenon 133 single-photon
emission computed tomography and the Tower of London, Arch. Gen.
Psychiat. 49 (1992) pp. 943-958.
2. Baddeley A., Working memory (Clarendon Press, Oxford, 1986).
3. Barch D.M., Braver T.S., Nystrom L.E., Forman S.D., Noll D.C. and Cohen
J.D., Dissociating working memory from task difficulty in human prefrontal
cortex, Neuropsychologia 35 (1997) pp. 1373-1380.
4. Berman K.F., Illowsky B.P., Weinberger D.R., Physiological dysfunction of
dorsolateral prefrontal cortex in schizophrenia. IV. Further evidence for
regional and behavioral specificity, Arch. Gen. Psychiat. 45 (1988) pp. 616-
622.
5. Bertolino A., Callicott J.H., Elman I., Mattay V.S., Tedeschi G., Frank J.A.,
Breier A. and Weinberger D.R., Regionally specific neuronal pathology in
untreated patients with schizophrenia: a proton magnetic resonance
spectroscopic imaging study, Biol. Psychiat., 43 (1998) pp. 641-648.
6. Bertolino A., Callicott J.H., Nawroz S., Mattay V.S., Duyn J.H., Tedeschi G.,
Frank J.A. and Weinberger D.R., Reproducibility of proton magnetic resonance
spectroscopic imaging in patients with schizophrenia,
Neuropsychopharmacology, 18 (1998) pp. 1-9.
7. Bertolino A., Nawroz S., Mattay V.S., Barnett A.S., Duyn J.F., Moonen C.T.,
Frank J.A., Tedeschi G. and Weinberger D.R., Regionally specific pattern of
neurochemical pathology in schizophrenia as assessed by multislice proton
magnetic resonance spectroscopic imaging, Am. J. Psychiat, 153 (1996) pp.
1554-1563.
8. Braver T.S., Cohen J.D., Nystrom L.E., Jonides J., Smith E.C. and Noll D.C, A
parametric study of prefrontal cortex involvement in human working memory,
Neuroimage 5 (1997) pp. 49-62.
9. Callicott J.H., Bertolino A., Mattay V.S., Langheim F.J.P., Duyn J., Coppola
R., Goldberg T.E. and Weinberger D.R., Physiological dysfunction of the
35. Goldberg T.E., Patterson K.J., Taqqu Y. and Wilder K., Capacity limitations in
short-term memory in schizophrenia: tests of competing hypotheses, Psychol.
Med., 28 (1998) pp. 665-673.
36. Goldberg T.E., Weinberger D.R., Berman K.F., Pliskin N.H. and Podd M.H.,
Further evidence for dementia of the prefrontal type in schizophrenia? A
controlled study of teaching the Wisconsin Card Sorting Test, Arch. Gen.
Psychiat., 44 (1987) pp. 1008-1014.
37. Goldberg T.E. and Weinberger D.R., Thought disorder, working memory and
attention: interrelationships and the effects of neuroleptic medications, Int.
Clin. Psychopharmacol., 10(Suppl 3) (1995) pp. 99-104.
38. Goldberg T.E. and Weinberger D.R., Probing prefrontal function in
schizophrenia with neuropsychological paradigms, Schizophr. Bull, 14 (1988)
pp. 179-183.
39. Goldman-Rakic P.S., Prefrontal cortical dysfunction in schizophrenia: the
relevance of working memory. In Psychopathology and the brain, ed. by
Carroll B.J. and Barnett J.E. (Raven Press, New York, 1991).
40. Goldman-Rakic P.S., Regional and cellular fractionation of working memory,
PNAS 93 (1996) pp. 13473-13480.
41. Goldman-Rakic P.S., Working memory dysfunction in schizophrenia, J.
Neuropsychiat. Clin. Neurosci., 6 (1994) pp. 348-357.
42. Gur R.C., Gur R.E., Hypofrontality in schizophrenia: RIP, Lancet 345 (1995)
pp. 1383-1384.
43. Ingvar D. and Franzen G., Distribution of cerebral activity in chronic
schizophrenia, Lancet 2 (1974) pp. 1484-1486.
44. Jonides J., Smith E.E., Koeppe R.A., Awh E., Minoshima S. and Mintun M.A.,
Spatial working memory in humans as revealed by PET, Nature 363 (1993) pp.
583-584.
45. Just M.A. and Carpenter P.A., A capacity theory of comprehension: individual
differences in working memory, Psychol. Rev. 99 (1992) pp. 122-149.
46. Keefe R.S., Roitman S.E., Harvey P.D., Blum C.S., DuPre R.L., Prieto D.M.,
Davidson M. and Davis K.L., A pen-and-paper human analogue of a monkey
prefrontal cortex activation task: spatial working memory in patients with
schizophrenia, Schizophr. Res, 17 (1995) pp. 25-33.
47. Kidd K.K., Can we find genes for schizophrenia?, Am J Med Genet 74 (1997)
pp. 104-111.
48. Kraepelin E., Dementia praecox and paraphrenia (E.&S. Livingstone,
Edinburgh, 1919).
49. Levin J.M., Ross M.H. and Renshaw P.F., Clinical applications of functional
MRI in neuropsychiatry, J Neuropsychiatry Clin Neurosci 7 (1995) pp. 511-
522.
50. Lewis, D.A., Development of the prefrontal cortex during adolescence: insights
into vulnerable neural circuits in schizophrenia, Neuropsychopharmacology, 16
(1997) pp. 385-398.
51. Malonek D., Dirnagl U., Lindauer U., Yamada K., Kanno I. and Grinvald A.,
Vascular imprints of neuronal activity: relationships between the dynamics of
cortical blood flow, oxygenation, and volume changes following sensory
stimulation, PNAS 94 (1997) pp. 14826-14831.
52. Malonek D. and Grinvald A., Interactions between electrical activity and
cortical microcirculation revealed by imaging spectroscopy: implications for
functional brain mapping, Science 272 (1996) pp. 551-554.
53. Manoach D.S., Press D.Z., Thangaraj V., Searl M.M., Goff D.C., Halpern E.,
Saper C.B. and Warach S., Schizophrenic subjects activate dorsolateral
prefrontal cortex during a working memory task as measured by MRI, Biol.
Psychiat, 45 (1999) pp. 1128-1137.
54. McCarthy G., Blamire A.M. Puce A., Nobre A.C., Bloch G., Hyder F.,
Goldman-Rakic P.S. and Shulman R.G., Functional magnetic resonance
imaging of human prefrontal cortex activation during a spatial working
memory task, PNAS 91 (1994) pp. 8690-8694.
55. Mellers J.D.C., Adachi N., Takei N., Cluckie A., Toone B.K. and Lishman
W.A., PET study of verbal fluency in schizophrenia and epilepsy, Br. J.
Psychiat. 173 (1998) pp. 69-74.
56. Miller G.A., The magical number seven, plus or minus two: some limits on our
capacity for processing information, Psychol. Rev. 63 (1956) pp. 81-97.
57. Owen A.M., Evans A.C. and Petrides M., Evidence for a two-stage model of
spatial working memory processing within the lateral frontal cortex: a positron
emission tomography study, Cereb. Cortex, 6 (1996) pp. 31-38.
58. Pardo J.V., Pardo P.J., Janer K.W. and Raichle M.E., The anterior cingulate
cortex mediates processing selection in the Stroop attentional conflict
paradigm, PNAS 87 (1990) pp. 256-259.
59. Park S. and Holzman P.S., Schizophrenics show spatial working memory
deficits, Arch. Gen. Psychiat, 49 (1992) pp. 975-982.
60. Piercy M., The effects of cerebral lesions on intellectual function: a review of
current research trends, Br. J. Psychiat., 110 (1964) pp. 310-352.
61. Price M., Friston K.J., Scanning patients with tasks they can perform,
Hum.Brain Map., 8 (1999) pp. 102-108.
62. Risch N. and Merikangas K., The future of genetic studies of complex human
diseases, Science 273 (1996) pp. 1516-1517.
63. Rypma B. and D'Esposito M., The role of prefrontal brain regions in
components of working memory: effects of memory load and individual
differences, PNAS, 96 (1999) pp. 6558-6563.
64. Selemon L.D. and Goldman-Rakic P.S., The reduced neuropil hypothesis: a
circuit based model of schizophrenia, Biol. Psychiat, 45 (1999) pp. 17-25.
65. Shallice T., From neuropsychology to mental structure (Cambridge University
Press, Cambridge, 1988).
66. Smith E.E. and Jonides J., Neuroimaging analyses of human working memory,
PNAS, 95 (1998) pp. 12061-12068.
67. Smith E.E. and Jonides J., Storage and executive processes in the frontal lobes,
Science, 283 (1999) pp. 1657-1661.
68. Smith E.E., Jonides J., Marshuetz C. and Koeppe R.A., Components of verbal
working memory: evidence from neuroimaging, PNAS 95 (1998) pp. 876-882.
69. Stevens A.A., Goldman Rakic P.S., Gore J.C., Fulbright R.K. and Wexler B.E.,
Cortical dysfunction in schizophrenia during auditory word and tone working
memory demonstrated by functional magnetic resonance imaging, Arch. Gen.
Psychiat. 55 (1998) pp. 1097-1103.
70. Stone M., Gabrieli J.D., Stebbins G.T. and Sullivan E.V., Working strategic
memory deficits in schizophrenia, Neuropsychology, 12 (1998) pp. 278-288.
71. Thomas M.A., Ke Y., Levitt J., Caplan R., Curran J., Asarnow R. and
McCracken J., Preliminary study of frontal lobe 1H MR spectroscopy in
childhood onset schizophrenia, J. Magn. Reson. Imag., 8 (1998) pp. 841-846.
72. Wagner A.D., Working memory contributions to human learning and
remembering, Neuron, 22 (1999) pp. 19-22.
73. Weinberger D.R., Implications of normal brain development for the
pathogenesis of schizophrenia, Arch. Gen. Psychiatry 44 (1987) pp. 660-669.
74. Weinberger D.R. and Berman K.F., Prefrontal function in schizophrenia:
confounds and controversies, Phil. Trans. R. Soc. Med. 351 (1996) pp. 1495-
1503.
75. Weinberger D.R. and Berman K.F., Speculation on the meaning of cerebral
metabolic hypofrontality in schizophrenia, Schizophr. Bull., 14 (1988) pp. 157-
168.
76. Weinberger D.R., Berman K.F. and Illowsky B.P., Physiological dysfunction of
dorsolateral prefrontal cortex in schizophrenia III. A new cohort and evidence
for a monoaminergic mechanism, Arch. Gen. Psychiat. 45 (1988) 609-615.
77. Weinberger D.R., Berman K.F., Suddath R. and Torrey E.F. Evidence of
dysfunction of a prefrontal-limbic network in schizophrenia: a magnetic
resonance imaging and regional cerebral blood flow study of discordant
monozygotic twins, Am J Psychiatry 149 (1992) pp. 890-897.
78. Weinberger D.R., Berman K.F. and Zec R.F., Physiologic dysfunction of
dorsolateral prefrontal cortex in schizophrenia. I. Regional cerebral blood flow
evidence, Arch. Gen. Psychiat. 43 (1986) pp. 114-124.
79. Weinberger D.R., Mattay V., Callicott J., Kotrla K., Santha A.,van Gelderen P.,
Duyn J., Moonen C. and Frank J., fMRI applications in schizophrenia research,
Neuroimage 4 (1996) pp. 118-126.
80. Wexler B.E., Stevens A.A., Bowers A.A., Sernyak M.J. and Goldman-Rakic
P.S., Word and tone working memory deficits in schizophrenia, Arch. Gen.
Psychiat. 55 (1998) pp. 1093-1106.
81. Yurgelun-Todd D.A., Waternaux C.M., Cohen B.M., Gruber S.A., English
C.D. and Renshaw P.F., Functional magnetic resonance imaging of
schizophrenic patients and comparison subjects during word production, Am. J.
Psychiat, 153 (1996) pp. 200-205.
The aim of this study was to develop a discriminant analysis based both on classical
linear methods, such as Fisher's Linear Discriminant (FLD) and the Likelihood Ratio
Method (LRM), and on a non-linear Artificial Neural Network (ANN) classifier, in
order to distinguish between patients affected by Huntington's disease (HD) and
normal subjects. R.O.C. curve analysis revealed the ANN to be the best classifier.
Moreover, the network classified gene-carrier relatives as normal, thus suggesting
the EEG to be a marker of the evolution of HD.
1 Introduction
2 Data set
The data set here considered refers to 8 patients affected by HD, 7 gene-carrier
first-degree relatives and 7 controls.
The EEG signal was sampled at 512 Hz in 2-second epochs on 19 electrodes
positioned on the FP1, FP2, F7, F3, FZ, F4, F8, T3, C3, CZ, C4, T4,
T5, P3, PZ, P4, T6, O1, O2 derivations, according to the 10-20 system.
Artifact-free random samples were selected to form a data set constituted
of 160 epochs from patients' recordings, 160 epochs from controls and 71 from
gene-carrier relatives. These were Fast-Fourier transformed and the power
of the brain rhythms α (8-12.5 Hz), β (13-30 Hz), ϑ (4-7.5 Hz) and δ
(0.5-3.5 Hz) was considered.
Due to the limited availability of the data, the cross-validation technique 6
was applied, which consists in considering all the possible 8 different partitions
of the data into a training set of 140 elements and a test set of 20 elements:
each partition is submitted to the classification systems and the corresponding
outputs are summed in the case of the signals-controls classification and averaged
in the case of the gene-carrier relatives' analysis.
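The partitioning scheme can be sketched as follows (assuming, as the counts 140/20 suggest, that the 160 epochs are simply split into eight disjoint test blocks of 20):

```python
def cross_validation_folds(n_items=160, n_folds=8):
    """Split indices 0..n_items-1 into (train, test) pairs, one per fold."""
    fold_size = n_items // n_folds
    indices = list(range(n_items))
    folds = []
    for k in range(n_folds):
        test = indices[k * fold_size:(k + 1) * fold_size]
        train = indices[:k * fold_size] + indices[(k + 1) * fold_size:]
        folds.append((train, test))
    return folds

folds = cross_validation_folds()
print(len(folds), len(folds[0][0]), len(folds[0][1]))  # 8 140 20
```

Each epoch appears in exactly one test set, so the eight test outputs together cover the whole data set once.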
w = S_W⁻¹ (m_s − m_c)    (1)

where

m_s = (1/N_s) Σ_{i∈C_s} x_i ,   m_c = (1/N_c) Σ_{i∈C_c} x_i    (2)

are the mean vectors of the two classes (signals and controls) and

S_W = Σ_{i∈C_c} (x_i − m_c)(x_i − m_c)ᵀ + Σ_{i∈C_s} (x_i − m_s)(x_i − m_s)ᵀ    (3)
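Eqs. (1)-(3) can be sketched in a few lines; the data below are synthetic two-class samples standing in for the spectral feature vectors, not the EEG data of this study:

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic two-class data ("signals" vs "controls").
xs = rng.normal(loc=[2.0, 0.0], size=(100, 2))
xc = rng.normal(loc=[0.0, 0.0], size=(100, 2))

# Eq. (2): class mean vectors.
ms, mc = xs.mean(axis=0), xc.mean(axis=0)

# Eq. (3): within-class scatter matrix S_W.
Sw = (xc - mc).T @ (xc - mc) + (xs - ms).T @ (xs - ms)

# Eq. (1): discriminant direction w = S_W^{-1} (m_s - m_c).
w = np.linalg.solve(Sw, ms - mc)

# Projecting on w and thresholding at the midpoint separates the classes.
t = (ms + mc) @ w / 2.0
accuracy = ((xs @ w > t).mean() + (xc @ w < t).mean()) / 2.0
print(accuracy)
```

Projecting on w maximizes the between-class separation relative to the within-class scatter, which is exactly what the FLD output histograms discussed below display.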
Due to the high dimensionality of the feature space, estimation of the probability
density (4) by the histogram method would not work (see, e.g., the discussion
on the curse of dimensionality in Bishop 4, p. 51). It is then estimated by a
non-parametric approach such as the kernel-based method 8, in which the functional
form is not specified a priori but relies on the data itself. In particular, the
estimate is given by the sum over the whole training set of normal multivariate
distributions, each one centered on a training data point:

p(x) = (1/N) Σ_{n=1}^{N} (2πh²)^(−d/2) exp( −‖x − x_n‖² / (2h²) )
The weights are updated according to the gradient descent learning rule 9.
6 R . O . C . curves' analysis
Figures (2), (3) and (4) show typical output histograms in the α frequency
region from the three classification systems. In particular, for the LRM analysis, the
output histogram of the likelihood L_s, relative to the signal class, is considered
both for controls (figure (3), top) and for signals (figure (3), bottom).
Even by visual comparison it is clear that the FLD output gives the worst
discrimination between the classes due to the strong overlap of the distributions
while both LRM and ANN histograms are more separated and peaked, which
means a better classification.
The subsequent step is to put an appropriate threshold on the output histograms, so that once a new data point (not known in advance to be a signal or a control) is submitted to the classifier, a decision can be taken on its class depending on which side of the threshold the corresponding output falls.
In order to have a quantitative measure of the performance of the algorithms we use R.O.C. curve analysis, a standard technique to estimate the quality of a classification in the case of a binary hypothesis test. Given a threshold value on the output histogram, the sensitivity e and the specificity s are defined as

$$ e = \frac{n_{ss}}{n_{ss} + n_{cs}}\,, \qquad s = \frac{n_{cc}}{n_{cc} + n_{sc}}\,, $$

where $n_{ss}$ and $n_{cc}$ are the numbers of correctly classified signal and control data, respectively, and $n_{cs}$ and $n_{sc}$ the numbers of misclassifications. Sweeping the threshold parameter through the [0, 1] interval, the graphical representation of the sensitivity e versus the specificity s gives the R.O.C. curve.
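The threshold sweep and the area computation can be sketched as follows; classifier outputs are assumed to lie in [0, 1], and all names and data are illustrative.

```python
import numpy as np

def roc_points(out_signals, out_controls, n_thr=201):
    """Sweep a threshold over [0, 1]; at each value compute the
    sensitivity e (fraction of signals above the threshold) and the
    specificity s (fraction of controls below it)."""
    pts = []
    for t in np.linspace(0.0, 1.0, n_thr):
        e = np.mean(out_signals >= t)
        s = np.mean(out_controls < t)
        pts.append((s, e))
    return np.array(pts)

def roc_area(pts):
    """Area under the sensitivity-vs-specificity curve (trapezoidal rule)."""
    order = np.argsort(pts[:, 0])
    s, e = pts[order, 0], pts[order, 1]
    return float(np.sum((s[1:] - s[:-1]) * 0.5 * (e[1:] + e[:-1])))

# well-separated synthetic outputs give an area close to 1
rng = np.random.default_rng(2)
a = roc_area(roc_points(0.9 + 0.05 * rng.random(100),
                        0.1 * rng.random(100)))
```

For a perfect classifier both e and s tend to 1 at every threshold, and the area tends to 1, which is why it serves below as the comparison index.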
In the case of a perfect classification the two terms $n_{sc}$ and $n_{cs}$ tend to zero, and therefore the sensitivity, the specificity and the area under the R.O.C. curve all tend to 1: this area is, therefore, an index of the goodness of the classification system and will be used to compare our three classifiers.

R.O.C. curves are shown for FLD, LRM (for different values of the parameter h) and ANN in the frequency regions α (figure 5), β (figure 6), θ (figure 7) and δ (figure 8). In figure 9 the R.O.C. curves relative to the ANN are drawn together to compare the performances of the network in the different regions. By computing the areas a for the α frequencies, we find FLD to be the worst algorithm (a = 0.7954) and ANN the best one (a = 0.9877), while LRM has an intermediate performance increasing as h decreases (a = 0.8163 for h = 0.5, a = 0.9314 for h = 0.1 and a = 0.945 for h = 0.05); this LRM behaviour is verified also for the β, δ and θ frequencies. In the other three regions FLD outperforms LRM for h = 0.5, while the relative order of ANN and LRM (h = 0.1, h = 0.05) is the same as in α. Concerning the ANN, its performance increases from δ (a = 0.9396) to θ (a = 0.9661) to β (a = 0.9864) to α (a = 0.9877).

Therefore we are led to the conclusion that the ANN is, for each frequency region, the best of the three classifiers, with a minimum performance for the δ rhythms.
Figure 5: α R.O.C. curves for FLD (stars), LRM with h = 0.5 (squares), h = 0.1 (triangles), h = 0.05 (rhombi) and ANN (circles).
Figure 6: β R.O.C. curves for FLD (stars), LRM with h = 0.5 (squares), h = 0.1 (triangles), h = 0.05 (rhombi) and ANN (circles).
Figure 7: θ R.O.C. curves for FLD (stars), LRM with h = 0.5 (squares), h = 0.1 (triangles), h = 0.05 (rhombi) and ANN (circles).
Figure 8: δ R.O.C. curves for FLD (stars), LRM with h = 0.5 (squares), h = 0.1 (triangles), h = 0.05 (rhombi) and ANN (circles).
Figure 9: ANN R.O.C. curves for the δ (stars), θ (triangles), β (rhombi) and α (circles) regions.
Figure 12: ANN gene-carrier relatives' θ histogram.
8 Conclusions

A comparison between the classical statistical methods (FLD and LRM) and the ANN approach in the classification of EEG data taken from patients affected by HD was presented.

R.O.C. curve analysis clearly showed the superiority of the non-linear ANN approach over the classical linear methods (FLD and LRM).

Moreover, the ANN classified gene-carrier relatives as controls, thus leading to the conclusion that the EEG is a marker of the phenotypic manifestation of HD.
Acknowledgments
We thank Carmela Marangi (I.R.M.A.-C.N.R.) and Fabio Bovenga (Physics
Department, University of Bari) for helpful discussions.
References
1. For general aspects see, e.g., S.E. Folstein, R.J. Leigh, I.M. Parhad, M.F. Folstein, Neurology 36, 1986, 1279-1283.
2. R.A. Fisher, Annals of Eugenics 7, 1936, 179-188. Reprinted in Contributions to Mathematical Statistics, John Wiley, New York (1950).
3. See, e.g., R. Bellotti, M. Castellano, C. De Marzo, N. Giglietto, G. Pasquariello, P. Spinelli, Computer Physics Communications 78, 1993, 17-22, and references therein.
4. C.M. Bishop, Neural Networks for Pattern Recognition, Oxford University Press, Oxford (1995).
5. J.A. Swets, "Measuring the accuracy of diagnostic systems", Science 240, 1285-1293 (1988).
6. M. Stone, Journal of the Royal Statistical Society B 36 (1), 1974, 111-147. M. Stone, Math. Operat. Statist. Ser. Statistics 9 (1), 1978, 127-139. G. Wahba, S. Wold, Comm. in Statistics, Series A 4 (1), 1975, 1-17.
7. See, e.g., for applications dealing with electron-hadron discrimination, K.K. Tang, Astrophysics Journal 278, 1984, 881; A. Bungener et al., Nuclear Instruments Methods 214, 1983, 261.
8. M. Rosenblatt, Annals of Mathematical Statistics 27, 1956, 832-837. E. Parzen, Annals of Mathematical Statistics 33, 1962, 1065-1076.
9. J. Hertz, A. Krogh, R.G. Palmer, Introduction to the Theory of Neural Computation, Addison-Wesley, 1991.
10. D.E. Rumelhart, J.L. McClelland, Parallel Distributed Processing, Vol. 1, MIT Press, Cambridge, MA (1986), p. 318.
A. BARALDI
I.S.A.O.-C.N.R., Via Gobetti 101, 3100 Bologna, Italy; baraldi@imga.bo.cnr.it
R. DE BLASI
Cattedra e Servizio di Neuroradiologia, University of Bari, P.za G. Cesare, 70126 Bari, Italy
The objective of this paper is to demonstrate the effectiveness of a two-stage learning classification system in the automatic detection of small lesions in Magnetic Resonance Images (MRIs) of a patient affected by multiple sclerosis. The first classification stage consists of an unsupervised neural network module for data clustering. The second classification stage consists of a supervised learning module employing a plurality-vote mechanism to relate each unsupervised cluster to the supervised output class having the largest number of representatives inside the cluster. In this paper two different neural network algorithms, i.e. the Enhanced Linde-Buzo-Gray (ELBG) algorithm and the well-known Self-Organizing Map (SOM), have been employed in turn as the clustering module in the first stage of the system. The results obtained with the two clustering algorithms have been qualitatively and quantitatively compared in a set of classification experiments. In these experiments, ELBG is equivalent to SOM in terms of classification accuracy and superior to SOM with respect to the visual quality of the output map and robustness to changes in the order and composition of the data presentation sequence. The results confirm the usefulness of the neural classification system in the automatic detection of small lesions.
1 Introduction
In this data set, supervised (labelled) image areas are manually selected by an expert
neuroradiologist to provide the learning algorithms with training and testing data
samples. In this classification framework, the first classification stage consists of a
pixel-based data clustering algorithm. In the second classification stage, a
supervised learning module employing a plurality vote mechanism relates each
unsupervised cluster to the supervised output class having the largest number of
representatives inside the cluster. Classification accuracy is assessed on a test set.
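The plurality-vote second stage amounts to a frequency count per cluster; a minimal sketch, with illustrative cluster ids and class labels:

```python
from collections import Counter

def plurality_vote(cluster_ids, class_labels):
    """Relate each unsupervised cluster to the supervised class with the
    largest number of labelled representatives inside the cluster."""
    votes = {}
    for c, y in zip(cluster_ids, class_labels):
        votes.setdefault(c, Counter())[y] += 1
    return {c: cnt.most_common(1)[0][0] for c, cnt in votes.items()}

mapping = plurality_vote(
    [0, 0, 0, 1, 1],
    ["lesion", "lesion", "white matter", "grey matter", "grey matter"])
```

At test time, a pixel is assigned the class mapped to its nearest cluster, which is how the test-set accuracy is computed.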
The Enhanced Linde-Buzo-Gray (ELBG) clustering algorithm and the Self-Organizing Map (SOM) are employed, in turn, as the first classification stage providing unsupervised learning. ELBG is a novel quantization algorithm capable of providing a near-optimal solution to the Mean Square Error (MSE) minimisation problem [10]. Owing to their complementary functional features, the Fully self-Organizing Simplified Adaptive Resonance Theory (FOSART) clustering network may be adopted to initialize ELBG [1], [2]. On the one hand, FOSART is on-line learning, constructive (i.e., the number of processing elements is not fixed by the user on an a priori basis before processing the data; rather, it is set by the algorithm depending on the complexity of the clustering task according to an optimization framework) and cannot shift codewords through Voronoi regions. On the other hand, ELBG is non-constructive, batch learning and capable of moving codewords through Voronoi regions to reduce the MSE.
For comparison with ELBG, SOM is selected from the literature as a well-
known and successful clustering network. SOM is on-line learning, soft-to-hard
(fuzzy-to-crisp) competitive, non-constructive and capable of employing topological
relationships between output nodes belonging to a 2-D output array [7]. The rest of
this paper is organized as follows. A brief overview of SOM, FOSART and ELBG
is provided in section 2. The data set, the classification method and the results are
illustrated in section 3. Conclusions follow in section 4.
2 Clustering networks
2.1 SOM
SOM and FOSART are both (fuzzy-to-crisp) competitive clustering networks, but,
unlike FOSART, SOM employs inter-node distances in a fixed output lattice rather
than inter-pattern distances in input space to compute learning rates. Noticeably,
unlike FOSART, SOM deals with topological relationships (e.g., adjacency) among
output nodes without explicitly dealing with inter-node (lateral) connections [2].
Despite its many successes in practical applications, SOM has some limitations [7]:
termination is not based on optimising any model of the process or its data [1], [2];
the size of the output lattice, the learning rate and the size of the resonance
neighbourhood must be varied empirically from one data set to another to achieve
useful results [3]; prototype parameter estimates may be severely affected by noise
points and outliers.
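For concreteness, a bare-bones on-line SOM update, with a fixed learning rate and neighbourhood width; real implementations anneal both, and all parameter values here are illustrative:

```python
import numpy as np

def train_som(data, grid=(8, 8), epochs=5, lr=0.02, sigma=1.5, seed=0):
    """On-line SOM: for each input, the best-matching unit and its
    lattice neighbours move toward the input, weighted by a Gaussian of
    the inter-node distance in the fixed 2-D output grid."""
    rng = np.random.default_rng(seed)
    h, w = grid
    codebook = rng.random((h * w, data.shape[1]))
    coords = np.array([(i, j) for i in range(h) for j in range(w)], float)
    for _ in range(epochs):
        for x in rng.permutation(data):
            bmu = int(np.argmin(((codebook - x) ** 2).sum(axis=1)))
            d2 = ((coords - coords[bmu]) ** 2).sum(axis=1)
            neigh = np.exp(-d2 / (2.0 * sigma ** 2))
            codebook += lr * neigh[:, None] * (x - codebook)
    return codebook

book = train_som(np.random.default_rng(1).random((200, 3)))
```

Note how the learning rate depends on inter-node distance in the output lattice, not on inter-pattern distance in input space, which is exactly the contrast with FOSART drawn above.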
2.2 FOSART
FOSART is a soft-to-hard (fuzzy-to-crisp) competitive, minimum-distance-to-means clustering algorithm capable of: i) generating processing units and lateral (intra-layer) connections on an example-driven basis, and ii) removing processing units and lateral connections on a mini-batch basis (i.e., based on statistics collected over subsets of the input sequence, to average information over the noise on the data). Potential advantages of FOSART are the following [2]: a) owing to its soft-to-hard competitive learning strategy, FOSART is expected to be less prone to being trapped in local minima and less likely to generate dead units than hard competitive alternatives [3]; b) owing to its neuron-removal strategy, it is robust against noise; c) feed-back interaction between attentional and orienting subsystems allows FOSART to self-adjust its network size depending on the complexity of the clustering task; d) the expressive power of networks that incorporate competition among lateral connections in a constructive framework, like FOSART and the Growing Neural Gas (GNG) [2], is superior to that of traditional constructive or non-constructive clustering systems (e.g., SOM) which employ no lateral connections explicitly [2]. As a consequence, FOSART features an application domain extended to vector quantization, entropy maximization, and detection of structure in input data to be mapped in a topologically correct way onto submaps of an output lattice, pursuing dimensionality reduction [1].
2.3 ELBG
ELBG is non-constructive, batch learning and capable of moving codewords through Voronoi regions to reduce the MSE. In ELBG, the templates eligible for being shifted and split are those whose "local" contribution to the MSE value is, respectively, below and above the mean distortion. Templates eligible for being shifted are selected sequentially, and those eligible for being split are selected stochastically (in a way similar to roulette-wheel selection in genetic algorithms). Each selected pair of templates is adjusted locally based on the traditional LBG (c-means) batch clustering algorithm [8]. In [10] ELBG is initialized either randomly or with the splitting-by-two technique proposed in [8]. In this work ELBG is initialised with the FOSART network.
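The inner LBG (batch c-means) adjustment that ELBG applies to each selected pair of templates can be sketched as follows; the data are synthetic, and the enhancement steps themselves (shift and split selection) are omitted.

```python
import numpy as np

def lbg(data, codebook, iters=10):
    """Batch LBG (c-means): assign each vector to its nearest codeword,
    then move every codeword to the centroid of its Voronoi region."""
    codebook = codebook.copy()
    for _ in range(iters):
        d2 = ((data[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        nearest = d2.argmin(axis=1)
        for k in range(len(codebook)):
            members = data[nearest == k]
            if len(members):              # leave empty Voronoi regions untouched
                codebook[k] = members.mean(axis=0)
    return codebook

pts = np.vstack([np.zeros((50, 2)), np.ones((50, 2))])  # two tight clusters
cb = lbg(pts, np.array([[0.2, 0.2], [0.8, 0.8]]))
```

ELBG's contribution on top of this loop is the stochastic relocation of low-utility codewords across Voronoi regions, which plain LBG cannot do.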
3 Experimental results
Figure 1: Input images: (a) T1 MP-RAGE; (b) T1-SE; and (c) labeled image areas.
Table 1
and ii) assess the utility of T1 MP-RAGE vs T1-SE images. Let us identify a labelled (supervised) pixel as an input-output vector pair $(X_i, Y_i)$, where $X_i = (f_{i,1}, \ldots, f_{i,D}) \in R^D$ is an input data vector, $D$ is the input space dimensionality, $f_{i,k} \in R$, $i = 1, \ldots, M$, $k = 1, \ldots, D$, is a feature component, $M$ represents the number of input patterns, while $Y_i = (y_{i,1}, \ldots, y_{i,L})$, $i = 1, \ldots, M$, is the output labelling vector and $L$ is the total number of classes. Classification results are averaged over three runs, each adopting a different selection of training and testing data sets. During learning, the unsupervised first stage of the TSH classifier employs a training set where data labels are ignored, while the supervised second classification stage is trained with labelled data, i.e., with (input, output) data pairs. Once the first classification stage reaches convergence, the second classification stage is trained to relate each cluster detected by the first stage to the supervised output class having the largest number of representatives inside the cluster.
In the first set of classification experiments, the unsupervised first stage of the TSH classifier is implemented as an ELBG module initialised by a FOSART clustering network. In the first classification stage the number of input units is equal to the number D of input spectral bands considered, whereas the number of output nodes depends on the FOSART input parameter ρ. Increasing values of the input vigilance threshold ρ are employed until the testing Overall Classification Accuracy (OA) of the TSH classification system remains constant or starts decreasing. Table 2 shows the training and testing results of this first TSH classifier when the T1 MP-RAGE, T2-SE and PD-SE image bands are used as input. The number of output nodes detected by FOSART and the different ρ values employed by FOSART are reported in the first and second columns of Table 2, respectively. The number of epochs required to train the TSH system is shown in the third column of Table 2 as the sum of the FOSART and ELBG training epochs. The OA percentage values of FOSART and ELBG are reported in columns 4 and 5 for the training data and columns 6 and 7 for the testing data, respectively. Figure 2(a) shows the classified image obtained by the first TSH system, which employs 174 output clusters and the T1 MP-RAGE, T2-SE and PD-SE bands as input. A lower qualitative and quantitative performance is obtained when the traditional T1-SE image replaces the T1 MP-RAGE band.
Table 2: ELBG Average Results. Input data: T1_MPR, PD_SE, T2_SE.
In the second set of experiments, the unsupervised first stage of the second TSH
classifier is implemented as a SOM where the number of input nodes is set equal to
the input space dimensionality D=3, while the number of output nodes is set equal
to the number of nodes detected in the first experiment by FOSART, to make any
classification comparison between the two experiments consistent. Table 3 shows
the training and testing results obtained by this second TSH classification system
when the T1 MP-RAGE, T2-SE and PD-SE image bands are used as input. These
results are almost equivalent to those shown in Table 2. The SOM learning rate a is
set to 0.02 in all simulations and the number of training epochs is set equal to the
total number of epochs required by the first TSH system to train. This number of
epochs is considered sufficient for SOM (in fact, the OA values of SOM do not
change significantly when the number of training epochs is increased). Figure 2(b) shows the image obtained by the second TSH classifier with the T1 MP-RAGE, T2-SE, PD-SE input bands and 174 output clusters. In terms of performance stability
with respect to changes in the order and composition of the presentation sequence,
SOM features an OA standard deviation of 0.7 % during testing.
Besides the quantitative evaluations, Figures 2(a) and 2(b), generated by the two TSH classification systems employing 174 clusters, are qualitatively compared by an expert neuroradiologist, who considers Figure 2(a), generated by ELBG, more significant than Figure 2(b), produced by SOM. In this example, SOM detects more false positives than ELBG, i.e., SOM tends to overestimate the lesion class, to which many interface areas located between white and grey matter are assigned. Both ELBG and SOM are incapable of detecting a right frontal lesion, which is visible in SE sequences but has a normal grey matter appearance in MP-RAGE.
Our experiments in multi-spectral MR image labelling seem to indicate that: a) the ELBG and SOM clustering networks employed in the TSH classification scheme are equivalent in terms of classification accuracy; b) ELBG is better than SOM in minimizing the MSE at small epoch numbers; c) ELBG is less sensitive to noise and/or false positives, and this allows a more correct identification of multiple sclerosis lesions; d) ELBG is more stable than SOM with respect to small changes in the order and composition of the presentation sequence.

Future work will assess the utility of interslice and intersubject MR data in the detection of multiple sclerosis lesions by means of two-stage supervised learning classifiers where both classification stages employ labelled data pairs for training. In this type of classifier, the density of clusters (basis functions) is made independent of the input vector density but dependent on the complexity of the (input, output) mapping at hand, to avoid the generation of mixed clusters of input vectors that are closely spaced in input space but belong to different classes.
G. PERCHIAZZI, G. HEDENSTIERNA
Department of Clinical Physiology,
Uppsala University Hospital, S-75185 Uppsala, Sweden
e-mail: zperchiazzi@yahoo.com
1 Introduction
In animal species, the major task of the respiratory system is to exchange gases between blood and atmosphere. In order to perform this task, the system works like a bellows: when the inspiratory muscles contract, the intra-thoracic volume increases and a negative pressure is generated in the airways. The difference in pressure between atmosphere and airways determines a gas flow towards the internal, gas-exchanging part of the lung (the alveoli).

Different pathologic conditions can affect this system. Situations that impair the capacity of exchanging gas ("lung failure") or the efficiency of the gas flow dynamics ("pump failure") may require mechanical ventilation. This consists in making the patient exchange gases through an endotracheal tube connected to an external cyclic pump (the "mechanical ventilator").
The respiratory system is composed of a conduction system (devoted mainly to conveying gas to the respiratory part of the lung) and a respiratory part (where the gas exchange effectively takes place). In relation to gas dynamics during artificial ventilation, the mechanical properties of the respiratory system that have medical importance are resistance (RRS) and compliance (CRS). The change of RRS and CRS from their normal values is an indicator of potential pathology.

Different techniques have been proposed to monitor RRS and CRS during ongoing mechanical ventilation. Among them, the most widely used is the Interrupted Flow Technique (IFT); see figure 1. When the flow is constant, the interruption of its delivery causes a fall in pressure that is related to the resistive properties of the respiratory system. Maintaining a constant gas volume in the lungs (preventing the patient from expiring) for some seconds, the pressure recorded in the airways after a transient is related mainly to the elastic components of the lung. Although the described technique remains the gold standard for measuring respiratory mechanics in ventilated patients, new approaches are necessary. The weak point of IFT is the necessity of performing a maneuver on a ventilated patient, interrupting the sequence of ventilation and requiring an operator who pushes an inspiratory-hold button (end-Inspiratory Hold Maneuver, e-IHM).
[Figure 1: airway pressure [cmH2O] and inspiratory flow tracings during an end-inspiratory hold, showing PPEAK, PDROP, PPLAT and PEEP.]

$$ C_{RS} = \frac{\text{Tidal Volume}}{P_{PLAT} - PEEP}\,, \qquad R_{RS} = \frac{P_{PEAK} - P_{DROP}}{\text{Inspiratory Flow}} $$
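The two IFT relations read off figure 1 are straightforward to evaluate; the numerical values below are illustrative only, not data from the study.

```python
def ift_mechanics(p_peak, p_drop, p_plat, peep, tidal_volume, insp_flow):
    """Interrupted Flow Technique estimates: compliance from the plateau
    pressure reached during the end-inspiratory hold, resistance from the
    rapid pressure drop at flow interruption."""
    c_rs = tidal_volume / (p_plat - peep)     # e.g. L/cmH2O
    r_rs = (p_peak - p_drop) / insp_flow      # e.g. cmH2O/(L/s)
    return c_rs, r_rs

# plausible adult values: pressures in cmH2O, volume in L, flow in L/s
c_rs, r_rs = ift_mechanics(p_peak=25.0, p_drop=20.0, p_plat=15.0,
                           peep=5.0, tidal_volume=0.5, insp_flow=0.5)
```

These manually computed values are exactly what served later as the expected outputs during ANN training.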
2 Review of Literature

The aim of the experiments reported here was to test whether ANNs can assess respiratory system compliance and resistance starting from airway pressure and flow (PAW, FAW). To train an ANN it is necessary to provide examples of the tracings to be faced during its use.
In a preliminary phase, we used a software model of the respiratory system, inspired by the studies of Otis [13,17] (see figure 2). The model provided curves obtained under different mechanical conditions. The ANN had to learn to associate the curves with the RRS and CRS that determined them. We implemented simulations of mechanical ventilation, varying the mechanical parameters and the ventilatory support. These first experiments showed the applicability of the method [15,16]. Then we decided to evaluate the performance of the method in noise-affected conditions.
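A single-compartment, constant-flow sketch of the kind of synthetic tracing such a software model can generate; this is a simplification of the Otis model, and all parameter values are illustrative.

```python
import numpy as np

def constant_flow_inspiration(r_rs, c_rs, flow=0.5, t_insp=1.0,
                              peep=5.0, dt=0.01):
    """Airway pressure during a constant-flow inspiration in a
    single-compartment model: Paw = RRS*flow + V/CRS + PEEP."""
    t = np.linspace(0.0, t_insp, int(t_insp / dt) + 1)
    volume = flow * t                       # delivered volume in litres
    paw = r_rs * flow + volume / c_rs + peep
    return t, paw

# RRS in cmH2O/(L/s), CRS in L/cmH2O, flow in L/s
t, paw = constant_flow_inspiration(r_rs=10.0, c_rs=0.05)
```

Sweeping r_rs and c_rs over physiological ranges yields a family of labelled (tracing, mechanics) pairs, which is the role the software model played in the preliminary ANN training.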
In a joint project of the Department of Clinical Physiology of Uppsala
University (Sweden) and the Department of Emergency and Transplantation of Bari
University (Italy), we studied an animal model of acute lung injury (ALI). When
moving to biological models, the first problem to face is to provide the large amount
of examples to be used for ANN training.
Our idea was to use a well-known effect of a substance, oleic acid (OA), when injected into a central vein of an animal. By acting on the lung structures, it modifies the mechanical properties of the lung, creating time-related damage starting at its administration (see Neumann et al. [12]). Ten pigs were ventilated in Volume Controlled - Constant Flow Mechanical Ventilation (VC-CFMV) and ALI was induced by multiple OA injections. We recorded PAW and FAW at different time intervals, in order to have different snapshots of respiratory mechanics while the damage was developing.
The ANN had to extract RRS and CRS from the recorded curves (which presented an e-IHM). During the training phase, the curves plus the expected RRS and CRS were given at the same time to the ANN. The expected RRS and CRS were obtained by applying to each curve the IFT, performed manually by an expert. Then the trained ANN was tested: only the tracings were given, and the yielded results were compared to the expected ones. The ANN was successfully trained. At this point we fed the ANN with tracings coming from a new group of four pigs, the aim being to observe its performance in a prospective way. Performance on the assessment of CRS remained very high, while an adjustment of the ANN implementation was suggested for the assessment of RRS. The results were published in the Journal of Applied Physiology [14]. The described experiments demonstrated the applicability of the method by comparing the gold standard (the IFT) and ANN-based technologies on curves having an e-IHM.
A further step was to train an ANN to extract CRS from breaths not having an e-IHM. Twenty-four pigs, ventilated in VC-CFMV, were studied. They underwent ALI induction by multiple OA injections. At different time intervals, recordings of more than ten breaths were obtained during steady state (see figure 3). At the end of each series, an e-IHM was performed. This last breath was used to calculate CRS according to the IFT. The breath preceding the one having the e-IHM (and not having any flow interruption) was given to the ANN (Dynamic Breath, DB). We gave to the ANN the Pressure/Volume loop of each DB and the CRS calculated on the successive breath (this last having an e-IHM). The ANN had to associate the DB with the static CRS obtained by IFT on the successive breath. The results showed that ANNs were able to extract the static CRS without needing to stop the inspiratory flow [18].
Figure 3: Experimental design for training ANNs without using e-IHM
Conclusions

References
1. Abel, E.W., P.C. Zacharia, A. Forster, and T.L. Farrow. Neural network analysis of the EMG interference pattern. Med Eng Phys 18 (1996) pp. 12-17.
2. Allen, J. and A. Murray. Comparison of three arterial pulse waveform classification techniques. J Med Eng Technol 20 (1996) pp. 109-114.
3. Baxt, W.G. Application of artificial neural networks to clinical medicine. Lancet 346 (1995) pp. 1135-1138.
4. Bright, P., M.R. Miller, J.A. Franklyn, and M.C. Sheppard. The use of a neural network to detect upper airway obstruction caused by goiter. Am J Respir Crit Care Med 157 (1998) pp. 1885-1891.
5. Cross, S.S., R.F. Harrison, and R.L. Kennedy. Introduction to neural networks. Lancet 346 (1995) pp. 1075-1079.
6. Dybowski, R. and V. Gant. Artificial neural networks in pathology and medical laboratories. Lancet 346 (1995) pp. 1203-1207.
7. Heden, B., H. Olin, R. Rittner, and L. Edenbrandt. Acute myocardial infarction detected in the 12-lead ECG by artificial neural networks. Circulation 96 (1997) pp. 1798-1802.
8. Huang, J., Y. Lu, A. Nayak, and R.J. Roy. Depth of anesthesia estimation and control. IEEE Transactions on Biomedical Engineering 46 (1999) pp. 76-81.
9. Jando, G., R.M. Siegel, Z. Horvath, and G. Buzsaki. Pattern recognition of the electroencephalogram by artificial neural networks. Electroencephalogr Clin Neurophysiol 86 (1993) pp. 100-109.
10. Leon, M.A. and F.L. Lorini. Ventilation mode recognition using artificial neural networks. Comp Biomed Res 30 (1997) pp. 373-378.
11. Leon, M.A., J. Rasanen, and D. Mangar. Neural network-based detection of esophageal intubation. Anesth Analg 78 (1994) pp. 548-553.
12. Neumann, P., J.E. Berglund, E.F. Mondejar, A. Magnusson, and G. Hedenstierna. Dynamics of lung collapse and recruitment during prolonged breathing in porcine lung injury. J Appl Physiol 85 (1998) pp. 1533-1543.
13. Otis, A.B., C.B. McKerrow, R.A. Bartlett, J. Mead, M.B. McIlroy, N.J. Selverstone, and E.P. Radford. Mechanical factors in distribution of pulmonary ventilation. J Appl Physiol (1956) pp. 427-443.
14. Perchiazzi, G., M. Hogman, C. Rylander, R. Giuliani, T. Fiore, and G. Hedenstierna. Assessment of respiratory system mechanics by artificial neural networks: an exploratory study. J Appl Physiol 90 (2001) pp. 1817-1824.
15. Perchiazzi, G., L. Indelicato, N. D'Onghia, C. Coniglio, A.M. Fanelli, and R. Giuliani. Assessing respiratory mechanics of inhomogeneous lungs using
EYTAN DOMANY
Department of Physics of Complex Systems, Weizmann Institute of Science,
Rehovot 76100, Israel
E-mail: eytan.domany@weizmann.ac.il
DNA chips are novel experimental tools that have revolutionized research in molec-
ular biology and generated considerable excitement. A single chip allows simul-
taneous measurement of the level at which thousands of genes are expressed. A
typical experiment uses a few tens of such chips, each focusing on one sample -
such as material extracted from a particular tumor. Hence the results of such an
experiment contain several hundred thousand numbers that come in the form of a table, of several thousand rows (one for each gene) and 50-100 columns (one
for each sample). We developed a clustering methodology to mine such data. I
provide here a very basic introduction to the subject, with no prior knowledge of
any biology assumed. I will explain what genes are, what is gene expression and
how it is measured by DNA chips. I will also explain what is meant by "cluster-
ing" and how we analyze the massive amounts of data from such experiments. I
will present results obtained from analysis of data obtained from brain tumors and
breast cancer.
1 Introduction
This talk and the accompanying paper have three parts, aimed at explaining the meaning of the title. The first part is a crash course in biology, starting from genes and transcription and ending with an explanation of what DNA chips are. The second part is an equally concise introduction to cluster analysis, leading to a recently introduced method, Coupled Two-Way Clustering (CTWC), that was designed for the analysis and mining of data obtained by DNA chips. The third section puts the two introductory parts together and demonstrates how CTWC is used to obtain insights from the analysis of gene expression data in several clinically relevant contexts, such as colon cancer and leukemia.
Figure 1. Caricature of a eucaryotic cell: its nucleus contains DNA, whereas the ribosomes
are in the cytoplasm.
Figure 2. Transcription involves synthesis of mRNA, a copy of the gene encoded on the
DNA (left). The mRNA molecules leave the nucleus and serve as the template for protein
synthesis by the ribosomes (right).
tions and the proteins that perform these functions are very different; cells
in our retina need photosensitive molecules, whereas our livers do not make
much use of these. A gene is expressed in a cell when the protein it codes
for is actually synthesized.
There will be differences between the expression profiles of different cells,
and even in a single cell there are variations of expression that are dictated
by external and internal signals that reflect the state of the organism and the
cell itself.
Synthesis of proteins takes place at the ribosomes. These are enormous machines (made also of proteins) that read the chemical formulae written on the DNA and synthesize the protein according to the instructions. The ribosomes are in the cytoplasm, whereas the DNA is in the protected environment of the nucleus. This poses an immediate logistic problem: how does the information get transferred from the nucleus to the ribosome?
2.2 Transcription
The obvious solution of information transfer would be to rip out the piece of
DNA that contains the gene that is to be expressed, and transport it to the
cytoplasm. The engineering analogue of this strategy is the following. Imagine
an architect, who has a single copy of a design for a building, stored on the
hard disk of his PC. Now he has to transfer the blueprint to the construction
site, in a different city. He probably will not opt for tearing out his hard
disk and mailing it to the site, risking it being irreversibly lost or corrupted.
Rather, he will prepare several diskettes, that contain copies of his design,
and mail these in separate envelopes.
This is precisely the strategy adopted by cells.
When a gene receives a command to be expressed, the corresponding
double helix of DNA opens, and a precise copy of the information, as written on one of the strands, is prepared (see Fig 2). This "diskette" is a linear molecule called messenger RNA (mRNA), and the process of its production, subsequent reading by the ribosome and synthesis of the corresponding protein^a is called transcription. In fact, when many molecules of a certain protein are needed, the cell produces many corresponding mRNAs, which are transferred through the nucleus' membrane to the cytoplasm and are "read" by several ribosomes. Thus the single master copy of the instructions, contained in the DNA, generates many copies of the protein (see Fig 2). This transcription strategy is prudent and safe, preserving the precious master copy; at the same time it also serves as a remarkable amplifier of the genetic information.
A cell may need a large number of some proteins and a small number of
others. That is, every gene may be expressed at a different level. The man-
ner in which the instructions to start and stop transcription are given for a
certain gene is governed by regulatory networks, which constitute one of the
most intricate and fascinating subjects of current research. Transcription is
regulated by special proteins, called transcription factors, which bind to spe-
cific locations on the DNA, upstream from the coding region. Their presence
at the right site initiates or suppresses transcription.
This leads us to the basic paradigm of gene expression analysis:
The " biological state" of a cell (or tisue) and the ongoing biological
processes are reflected by its expression profile: the expression levels
of all the genes of the genome. These, in turn, are reflected in the
concentrations of the corresponding mRNA molecules.
"Actually the mRNA is "read" by one end of another molecule, transfer RNA; the amino
acid that corresponds to the triplet of bases that has just been read is attached to the other
end of the tRNA. This process, and the formation of the peptide bond between subsequent
amino acids, takes place on the ribosome, which moves along the mRNA as it is read.
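In practice, the expression profiles introduced above are collected into a genes × samples matrix. A minimal illustration (gene names taken from later in the text, all numbers invented):

```python
import numpy as np

# Hypothetical genes x samples expression matrix (all numbers invented):
# rows are genes, columns are samples, entries are expression levels.
genes = ["VEGF", "PTN", "IGFBP2"]
samples = ["tumor_1", "tumor_2", "normal"]
expression = np.array([
    [8.1, 7.9, 1.2],   # VEGF
    [6.5, 7.2, 1.0],   # PTN
    [5.9, 6.8, 0.9],   # IGFBP2
])

# The expression profile of a sample is its column of the matrix, and
# the profile of a gene across samples is the corresponding row.
tumor_1_profile = expression[:, samples.index("tumor_1")]
print(tumor_1_profile)  # expression of the three genes in tumor_1
```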
- how does one measure, for a given cell or tissue, the expression levels of
thousands of genes?
such vast amounts of data are "mined", to extract from them biologically rele-
vant meaning. Several obvious aims of the data analysis are the following:
2. Group the tumors into classes that can be differentiated on the basis of
their expression profiles, possibly in a way that can be interpreted in
terms of clinical classification. If one can partition tumors, on the basis
of their expression levels, into relevant classes (such as positive vs
negative responders to a particular treatment), the classification obtained
from expression analysis can be used as a diagnostic and therapeutic
tool 6.
3. Finally, the analysis can provide clues and guesses for the function of
genes (proteins) of yet unknown role [c].
This concludes the brief and very oversimplified review of the biology back-
ground that is essential to understand the aims of this research. In what
follows I present a method designed for mining such expression data.
3 Cluster Analysis
"For example one hopes to use the expression profile of a tumor to select the most effective
therapy.
[c] The statement "the human genome has been solved" means that the sequences of 40,000
genes are known, from which the chemical formulae of 40,000 proteins can be obtained.
Their biological function, however, remains largely unknown.
Figure 3. Left: Each zebra or giraffe is represented as a point on the neck length - coloration
shape plane. The points form two clouds marked by the black ellipses. At higher resolution
(controlled by the parameter T), we notice that the cloud of the giraffes is in fact composed
of two slightly separated sub clouds. The corresponding dendrogram is presented on the
right hand side.
in fact the data break into two clear clouds; one with small values of L and
E, corresponding to the zebras, and the second - the giraffes - with large L
and E ≈ 1. The child, not having been instructed, will not know the names
of the two kinds of animals he was exposed to, but I have no doubt that he
will realize that the pictures were taken of two different kinds of creatures. He
has performed a clustering operation on the visual data he has been presented
with.
Let us pause and consider the data and the statements that were made.
Are there indeed two clouds in Fig 3? As we already said, when the data are
seen with low resolution, they appear to belong to a single cloud of animals.
Improved resolution leads to two clouds - and closer inspection reveals that
in fact the cloud of giraffes breaks into two sub-clouds, of points that have
similar colorations but different neck lengths! Apparently there were mature
fully developed giraffes with long necks, and a group of young giraffes with
shorter necks. Finally, when resolution is improved to the level of discerning
individual differences between animals, each one forms his own cluster. Thus
the proper way of representing the structure of the data is in the form of a
dendrogram, also shown in Fig 3. The vertical axis corresponds to a parameter
T that represents the resolution at which the data are viewed. The horizontal
axis is nominal - it presents a linear ordering of the individual data points
(as identified by the final partition, in which each cluster consists of one
individual point). The ordering is determined by the entire dendrogram - it
can be thought of as a highly nonlinear mapping of the data from D to one
dimension. In any clustering algorithm that we use, we should look for the
two features mentioned here, of (a) yielding a dendrogram that starts with
a single cluster of N points and ends with N single-point clusters, and (b)
providing a one-dimensional ordering of the data.
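The dendrogram construction described above can be illustrated with standard agglomerative (hierarchical) clustering; the sketch below uses SciPy on two synthetic "animal clouds" (data and parameters are invented for illustration, and average linkage stands in for whatever specific algorithm is used):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Two synthetic clouds in the (neck length L, coloration E) plane,
# standing in for the zebras and the giraffes (numbers invented):
rng = np.random.default_rng(0)
zebras = rng.normal([1.0, 0.0], 0.05, size=(20, 2))
giraffes = rng.normal([3.0, 1.0], 0.05, size=(20, 2))
data = np.vstack([zebras, giraffes])

# Agglomerative clustering; the merge heights play the role of the
# resolution parameter T on the vertical axis of the dendrogram.
Z = linkage(data, method="average")

# Cutting the tree at coarse resolution yields the two animal clouds;
# the leaf order of Z provides the one-dimensional ordering of points.
labels = fcluster(Z, t=2, criterion="maxclust")
print(len(set(labels)))  # 2
```

Cutting the same tree at finer resolutions (larger `t`) reproduces the successive splits of the dendrogram, down to one cluster per animal.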
troids. There are several physics related clustering algorithms, e.g. Deter-
ministic Annealing 11 and Coupled Maps 12. Deterministic Annealing uses
the same cost function as K-means, but rather than minimizing it for a fixed
number of clusters K, it performs a statistical mechanics type analysis, using a
maximum entropy principle as its starting point. The resulting free energy is
a complex function of the number of centroids and their locations, which are
calculated by a minimization process. This minimization is done by lowering
the temperature variable slowly and following minima that move and every
now and then split (corresponding to a second order phase transition). Since
it has been proved that in the generic case the free energy function exhibits
first order transitions, the deterministic annealing procedure is likely to follow
one of its local minima.
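The annealing scheme just described can be sketched as follows. This is a minimal illustration with our own parameter names, cooling schedule and symmetry-breaking jitter, not the implementation of ref. 11:

```python
import numpy as np

def deterministic_annealing(X, K, T0=5.0, Tmin=0.05, cooling=0.9, seed=0):
    """Sketch of deterministic annealing for clustering. At temperature T
    the points are softly assigned to centroids with Boltzmann weights
    exp(-||x - y||^2 / T); centroids are then updated as weighted means,
    and T is lowered slowly, so that centroids split as the system cools."""
    rng = np.random.default_rng(seed)
    Y = X[rng.choice(len(X), K, replace=False)].astype(float)
    T = T0
    while T > Tmin:
        for _ in range(30):
            d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
            d2 = d2 - d2.min(axis=1, keepdims=True)   # numerical stability
            P = np.exp(-d2 / T)
            P /= P.sum(axis=1, keepdims=True)         # soft assignments p(y|x)
            Y = (P.T @ X) / P.sum(axis=0)[:, None]    # weighted centroid update
        Y = Y + rng.normal(0.0, 1e-4, Y.shape)        # break symmetry at splits
        T *= cooling
    return Y

# On two well-separated synthetic blobs the centroids track the split:
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.1, (30, 2)), rng.normal(5, 0.1, (30, 2))])
centers = deterministic_annealing(X, 2)
print(centers)  # the two centroids end up near the two blob means
```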
We use another physics-motivated algorithm, which maps the clustering
problem onto the statistical physics of granular ferromagnets 13 .
Figure 4. Two-way clustering of brain tumor data; the two dendrograms, of genes and
samples, are shown next to the expression matrix.
the genes which is much more relevant to our aims could have been obtained
had we used only a subset of the samples.
Both these examples have to do with reducing the size of the feature
space. Sometimes it is important to use the reduced set of features to cluster
only a subset of the objects. For example, when we have expression profiles
from two kinds of leukemia patients, ALL and AML, with the ALL patients
breaking further into two sub-families, of T-ALL and B-ALL, the separation
of the latter two subclouds of points may be masked by the interpolating
presence of the AML group. In other words, a special set of genes will reveal
an internal structure of the ALL cloud only when the AML cloud is removed.
These two statements amount to a need to work with special submatrices
of the full expression matrix. The number of such submatrices is, however,
exponential in the size of the dataset, and the obvious question that arises is -
how can one select the "right" submatrices in an unsupervised and yet efficient
way? The CTWC algorithm provides a heuristic answer to this question.
usually generates a few stable sample clusters. Hence the next stage typically
involves less than a hundred clustering operations. These iterative steps stop
when no new stable clusters beyond a preset minimal size are generated, which
usually happens after the first or second level of the process.
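The iterative scheme described above can be put into a heavily reduced sketch of the CTWC loop. The "stable cluster" criterion of the real algorithm comes from superparamagnetic clustering; here a crude minimal-size threshold stands in for it, and average-linkage clustering stands in for the actual clustering engine:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def ctwc_sketch(M, min_size=3, n_levels=2):
    """Very reduced sketch of Coupled Two-Way Clustering on a
    genes x samples matrix M. Every gene (sample) subset found so far is
    used to re-cluster every stored sample (gene) subset; subsets that
    pass the (stand-in) stability test are added to the pool."""
    gene_sets = [np.arange(M.shape[0])]    # G1: all genes
    sample_sets = [np.arange(M.shape[1])]  # S1: all samples
    for _ in range(n_levels):
        new_genes, new_samples = [], []
        for G in gene_sets:
            for S in sample_sets:
                sub = M[np.ix_(G, S)]
                # cluster genes over the samples S, and samples over G
                for A, store, idx in ((sub, new_genes, G),
                                      (sub.T, new_samples, S)):
                    if len(A) < 2 * min_size:
                        continue
                    labels = fcluster(linkage(A, "average"), 2, "maxclust")
                    for c in np.unique(labels):
                        members = idx[labels == c]
                        if min_size <= len(members) < len(idx):
                            store.append(members)
        gene_sets += new_genes
        sample_sets += new_samples
    return gene_sets, sample_sets

# Demo on a synthetic 12x10 block matrix with two gene groups and two
# sample groups (values invented):
rng = np.random.default_rng(1)
M = np.zeros((12, 10))
M[:6, :5] = 5.0
M[6:, 5:] = 5.0
M += rng.normal(0, 0.01, M.shape)
gene_sets, sample_sets = ctwc_sketch(M)
print(any(set(g.tolist()) == set(range(6)) for g in gene_sets))  # True
```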
In a typical analysis we generate between 10 and 100 interesting partitions,
which are searched for biologically or clinically interesting findings, on the
basis of the genes that gave rise to the partition and on the basis of available
clinical labels of the samples. It is important to note that these labels are used
a posteriori, after the clustering has taken place, to interpret and evaluate the
results.
So far CTWC has been applied primarily to analysis of data from various
kinds of cancer. In some cases we used publicly available data, with no prior
contact with the groups that did the original acquisition and analysis. Our
initial work on colon cancer 6 and leukemia 20 falls in this category.
Subsequently we collaborated with a group at the University Hospital
at Lausanne (CHUV) on Glioblastoma - in this work we were involved from
early in the data acquisition stage. Our current collaborations include work
on colon cancer and breast cancer. In the latter case we worked with publicly
available data, but its choice and the challenge to improve on existing analysis
came from our collaborators. We are also involved in work on leukemia and
on meiosis 21 in yeast; finally, the same method was applied successfully 22
to analyze data obtained from an "antigen chip", used to study the antibody
repertoire of subjects that suffer from autoimmune diseases, such as diabetes.
I will limit the discussion here to presentation of a few selected results obtained
for glioblastoma 23 and for breast cancer.
Figure 5. The operation S1(G5), clustering all tumors on the basis of their expression
profiles over the genes of cluster G5. A stable cluster, S11, emerges, containing all the
non-primary tumors and only two of the primaries.
1176 genes. For each gene g the measured expression value for tumor sample
s was divided by its value in a reference sample composed of a mixture of
normal brain tissue. We filtered the genes by keeping only those for which the
maximal value of this ratio (over the 36 samples) exceeded its minimal value
by at least a factor of two. 358 genes passed this filter and constituted our
full gene set G1, which was clustered using expression ratios over S1. The
G1(S1) clustering operation (see Fig 4) yielded 15 stable gene clusters. The
complementary operation S1(G1) did not yield any partition of the samples
that could be given clear clinical interpretation.
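The two-fold variation filter described above is straightforward to state in code (the numbers below are synthetic, not the 1176-gene dataset):

```python
import numpy as np

def filter_variable_genes(ratios, fold=2.0):
    """Keep the genes whose maximal expression ratio over all samples
    exceeds the minimal one by at least the given factor (the filter
    described in the text)."""
    ratios = np.asarray(ratios)
    keep = ratios.max(axis=1) >= fold * ratios.min(axis=1)
    return np.where(keep)[0]

# Synthetic example: gene 0 varies 4-fold across samples, gene 1 is flat.
R = np.array([[0.5, 1.0, 2.0],
              [1.0, 1.1, 0.9]])
print(filter_variable_genes(R))  # [0]
```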
One of the stable gene clusters, G5, contained 9 genes. When the ex-
pression levels of only these genes are used to characterize the tumors [in the
operation denoted S1(G5)], a large and stable cluster, S11, of 21 tumors,
emerged (see Fig 5). This cluster contained all the 12 astrocytoma and all 4
SC tumors. Three of the remaining 5 tumors of S11 were cell lines and two
were registered as PR GBMs. Pathological diagnosis was redone for these two
tumors; one was found to contain a significant oligoastrocytoma component,
and much of the piece of the other, that was used for RNA extraction, was
diagnosed as normal brain infiltrative zone. Hence the expression levels of
G5 gave rise to a nearly perfect separation of PR from non-PR (A and SC)
tumors. The genes of G5 were significantly upregulated in PR and downreg-
ulated in A and SC.
These findings made good biological sense, since three of the genes in
G5 (VEGF, VEGFR and PTN) are related to angiogenesis. Angiogenesis is
the process of development of blood vessels, which are essential for growth
of tumors beyond a certain critical size, bringing nutrition to and removing
waste from the growing tissue. Upregulation of genes that are known to be
involved in angiogenesis is thus a logical consequence of the fact that PR
GBMs are large tumors.
An important application of the method concerns investigation of the
genes that belong to G5; in particular, one of the genes of G5, IGFBP2,
was of considerable interest with little existing clues to its function and role
in cancer development. Our finding, that its expression is strongly correlated
with the angiogenesis related genes came as a surprise that was worth detailed
further study. The co-expression of genes from the IGFBP family with VEGF
and VEGFR has been demonstrated in an independent experiment that tested
this directly for cell lines under different conditions.
This example demonstrates the power of CTWC; a subgroup of genes
with correlated expression levels was found to be able to separate PR from
non-PR GBM, whereas using all the genes introduced noise that wiped out
this separation. In addition, by looking at the genes of this correlated set,
we provided an indication for the role that a gene with previously unknown
function may play in the evolution of tumors.
For other findings of interest in this data set we refer the reader to the
paper by Godard et al 23 .
Figure 6. The operation S1(G46), clustering all tumors on the basis of the proliferation
related genes of G46. We found a cluster (b) which contained all three samples from
patients for whom chemotherapy was successful, taken before the treatment. Cluster (b)
contained 10 out of the 20 "before" samples.
(a) samples with low proliferation rates - these are 'normal breast-like'; (b) samples
with intermediate, and (c) with high proliferation rates. Interestingly, the "before
treatment" samples taken from all three tumors for which chemotherapy did
succeed were in cluster (b), whereas the corresponding 'after treatment'
samples were in (a), the 'normal breast-like' cluster. Therefore the genes of
G46 can perhaps be used a posteriori, to indicate success of treatment on
the basis of their expression measured after treatment and, more importantly,
may have predictive power with respect to the probability of success of the
doxorubicin therapy that was used. Intermediate expression of the G46 genes
may serve as a marker for a relatively high success rate of the doxorubicin
treatment (3/10 versus 3/20 for the entire set of "before treatment" samples).
Clearly these statements are backed only by statistics based on small
samples, but they do indicate possible clinical applications of the method,
provided experiments on more samples strengthen the statistical reliability of
these preliminary findings.
6 Summary
DNA chips provide a new, previously unavailable glimpse into the manner
in which the expression levels of thousands of genes vary as a function of
time, tissue type and clinical state. Coupled Two Way Clustering provides
a powerful tool to mine large scale expression data by identifying groups of
correlated (and possibly co-regulated) genes which, in turn, are used to divide
the samples into biologically and clinically relevant groups. The basic "engine"
used by CTWC is a clustering algorithm rooted in the methodology of and
insight gained from Statistical Physics.
The extracted information may enlarge our body of general basic knowl-
edge and understanding, especially of gene regulatory networks and processes.
In addition, it may provide clues about the function of genes and their role
in various pathologies; one can also hope to develop powerful diagnostic and
prognostic tools based on gene microarrays.
Acknowledgments
References
C. MARANGI
Dipartimento Interateneo di Fisica, Universita di Bari, 70126 Bari, Italy
Istituto per le Applicazioni del Calcolo "M. Picone ", Sezione di Bari, CNR,
70126 Bari, Italy
E-mail: c.marangi@area.ba.cnr.it
M. ATTIMONELLI
Dipartimento di Biochimica e di Biologia Molecolare, Universita di Bari, 70126 Bari, Italy
M. DE ROBERTIS
Dipartimento di Genetica ed Anatomia Patologica, Universita di Bari, 70126 Bari, Italy
L. NITTI
D.E.T.O., Universita di Bari, 70126 Bari, Italy
Center of Innovative Technologies for Signal Detection and Processing, 70126 Bari, Italy
G. PESOLE
Dipartimento di Fisiologia e Biochimica Generali, Universita di Milano, 20133 Milano, Italy
C. SACCONE
Dipartimento di Biochimica e di Biologia Molecolare, Universita di Bari, 70126 Bari, Italy
M. TOMMASEO
Dipartimento di Zoologia, Universita di Bari, 70126 Bari, Italy
A novel distance method for sequence classification and intraspecies phylogeny reconstruction
is proposed. The method incorporates biologically motivated definitions of DNA sequence
distance in the recently proposed Chaotic Map Clustering (CMC) algorithm, which performs a
hierarchical partition of data by exploiting the cooperative behavior of an inhomogeneous
lattice of chaotic maps living in the space of data. Simulation results show that our method
outperforms, on average, the simplest and most widely used approach to intraspecies phylogeny
reconstruction, based on the Neighbor Joining (NJ) algorithm. The method has also been tested
on real data, by applying it to two distinct datasets of human mtDNA HVRI haplotypes from
different geographical origins. A comparison with results from other well known methods,
such as the Stochastic Stationary Markov method and the Reduced Median Network, has also
been performed.
1 Introduction
The study of genetic diversity provides a powerful instrument to infer the historical
patterns of human evolution by assessing relationships among populations on the
basis of the nucleotide composition of specific DNA sequences [12]. Limiting ourselves
to intraspecies evolution, we assume that a molecular clock exists, so that
DNA mutations appear at a more or less constant speed (on a large time scale) for
all evolutionary lines. This results in a correlation between the mutation rate and the
length of time intervals: the differences at the molecular level would play the role of
estimators of the divergence time among groups belonging to the same species. The
final goal is the reconstruction of a phylogenetic tree, i.e. the evolutionary temporal
lines through which human groups differentiate.
In the debate about the appropriate genetic analysis for evolution studies, a
prominent role has been achieved by analysis of mitochondrial DNA (mtDNA).
Although the mtDNA contains only a small percentage of the total information of
the human genome (0.0006%), it is known to represent an efficient marker of
biological intraspecific diversity. This haploid genome is not recombinant and is
transmitted through maternal lines, i.e. it is inherited as a single block or haplotype.
Moreover the mtDNA exists in a large number of copies in each cell and shows a
higher mutation rate than nuclear genes, which appears to be a relevant feature for
estimation of genetic distances and for ancient DNA studies [1]. In particular the
HVRI and HVRII hypervariable regions of the human mtDNA D-loop have been
extensively used to study human population history and to estimate the age of
MRCA (Most Recent Common Ancestor), a still controversial problem [8].
In order to reconstruct a phylogeny within a human population of a given
geographical area, individuals belonging to different groups and sharing the same
pattern of variant sites (haplotype) are clustered in extended macro classes
(haplogroups) according to the measured genetic distance among different
haplotypes. If the haplogroup discrimination is performed in a hierarchical way,
results at different hierarchical levels can be identified as different branch levels of a
phylogenetic tree. It is clear that the choice of a clustering methodology is crucial to
obtain a classification hierarchy which is coherent with the anthropological
observation. Moreover, since we are typically dealing with datasets in high
dimensional spaces (genetic sequences may be as long as several thousands of
nucleotide bases) we are looking for clustering algorithms with low computational
complexity.
In this paper we propose a novel approach to phylogeny reconstruction based
on the recently proposed Chaotic Map Clustering algorithm (CMC) [2,3] which
relies on the cooperative behaviour of an inhomogeneous lattice of coupled chaotic
maps. In the original formulation, CMC is a clustering tool to process an input
dataset of arbitrary nature. To tailor CMC to the specific application we define new
2 CMC algorithm
A new clustering algorithm has been recently proposed [2], which is based on the
cooperative behaviour of an inhomogeneous lattice of coupled chaotic maps whose
dynamics leads to the formation of clusters of synchronized maps sharing the same
chaotic trajectory [11]. The cluster structure is biased by the architecture of the
couplings among the maps, and a full hierarchy of clusters can be achieved using the
mutual information of map pair states as a similarity index. Chaotic Map
Clustering (CMC) performs a non-parametric partition of the data without prior
assumptions about the number of classes and the geometric distribution of clusters.
In the following we briefly review the basics of the CMC algorithm.
Let us consider a set of N points (representing here DNA sequences) in a D-
dimensional space (with D equal to the number of variant sites in the sequence). We
assign a real dynamical variable x_i ∈ [-1,1] to each point and define pair-interactions
J_ij = exp(-d_ij^2 / 2a^2), where a is the local length scale and d_ij is a suitable measure of
distance between points i and j in our D-dimensional space. The time evolution of
the system is given by:

x_i(t+1) = (1/C_i) Σ_{j≠i} J_ij f(x_j(t))

where C_i = Σ_{j≠i} J_ij and f(x) = 1 - 2x^2. Due to the choice of the function f, the equation
respect to noise. Hereafter we only describe the guidelines of the method, which can
be viewed as an alternative to bootstrap for assessing the reliability of a cluster
analysis. Further details and applications to different algorithms can be found in
[10,3].
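The coupled-map dynamics defined above, x_i(t+1) = (1/C_i) Σ_{j≠i} J_ij f(x_j(t)) with f(x) = 1 - 2x^2 and J_ij = exp(-d_ij^2/2a^2), can be implemented directly. The synchronization check on two well-separated clouds is our own illustration; the data and the value of a are invented:

```python
import numpy as np

def couplings(points, a):
    """Pair interactions J_ij = exp(-d_ij^2 / (2 a^2)), with a the local
    length scale and d_ij the Euclidean distance between points i and j."""
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * a ** 2))

def cmc_step(x, J):
    """One synchronous update x_i(t+1) = (1/C_i) sum_{j!=i} J_ij f(x_j(t)),
    with f(x) = 1 - 2 x^2 and C_i = sum_{j!=i} J_ij."""
    f = 1.0 - 2.0 * x ** 2
    J = J.copy()
    np.fill_diagonal(J, 0.0)   # exclude the j == i term from both sums
    return (J @ f) / J.sum(axis=1)

# Two tight, well-separated clouds (synthetic; 'a' chosen by hand):
rng = np.random.default_rng(0)
pts = np.vstack([rng.normal(0, 0.03, (10, 2)), rng.normal(3, 0.03, (10, 2))])
J = couplings(pts, a=0.3)
x = rng.uniform(-1, 1, len(pts))
for _ in range(100):
    x = cmc_step(x, J)
# Maps inside each strongly coupled cloud synchronize onto a common
# chaotic trajectory, while the two clouds stay mutually independent.
print(np.std(x[:10]), np.std(x[10:]))
```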
The method can be easily implemented as follows. A set of values V for the
cluster parameters is used to perform a clustering on the N points of a given dataset
and on a number of subsets of size rN (0 < r < 1) randomly generated from it.
Clustering results for each resample are compared with the ones obtained on the
initial dataset, and a suitably defined measure of the overlap of the solutions,
averaged over all resamplings, can be calculated as a function of the algorithm
parameters. The step is repeated for all the values in V, and the optimal parameters
are selected as those which maximize the average overlap of the clustering solutions.
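The resampling scheme above can be sketched as follows. The co-clustering agreement used below is one simple choice of overlap measure, not necessarily the one of refs. [10,3], and average-linkage clustering stands in for the algorithm under study:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def copair(labels):
    """Matrix with entry True where two points share a cluster."""
    L = np.asarray(labels)
    return L[:, None] == L[None, :]

def resampling_score(X, k, r=0.8, n_resamples=20, seed=0):
    """Cluster the full dataset and n_resamples random fractions r of it,
    and return the mean agreement of co-clustering relations between each
    subsample solution and the full one."""
    rng = np.random.default_rng(seed)
    full = fcluster(linkage(X, "average"), k, "maxclust")
    M = copair(full)
    scores = []
    for _ in range(n_resamples):
        idx = rng.choice(len(X), int(r * len(X)), replace=False)
        sub = fcluster(linkage(X[idx], "average"), k, "maxclust")
        scores.append((copair(sub) == M[np.ix_(idx, idx)]).mean())
    return float(np.mean(scores))

# On two clean blobs the natural choice k = 2 is maximally reproducible:
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.1, (20, 2)), rng.normal(5, 0.1, (20, 2))])
print(resampling_score(X, 2), resampling_score(X, 5))
```

Scanning such a score over the parameter set V and keeping the maximizer is the selection rule described in the text.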
3 Simulations
Figure 1. Trees constructed by connecting 64 arbitrary taxonomic units for simulation purposes.
well known and widely used algorithm belonging to the same class of distance
methods, namely the Neighbor Joining [17].
Two arbitrary trees (tree1 and tree2), each connecting 64 taxonomic units, have
been constructed and displayed in Fig. 1 using the web application for drawing
phylogenetic trees, Phylodendron [6]. For each tree, 200 random datasets of
sequences with length of 80 units have been generated by Monte Carlo simulations
using the program SeqGen [14] with the simple Kimura two-parameter generation
model. The variability has been assumed to be uniform throughout the sequence and
low, with a transition versus transversion ratio of 2, and an equal starting
probability for the four nucleotide bases is imposed.
In order to determine the pairwise distances we used the simple Kimura two-
parameter model as for the purpose of the simulation there was no need to obtain
accurate genetic distance estimates.
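For reference, the Kimura two-parameter distance used here has the closed form d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q), with P and Q the observed fractions of transitions and transversions; a minimal sketch:

```python
from math import log

TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}

def kimura2p(seq1, seq2):
    """Kimura two-parameter distance between two aligned sequences,
    d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q),
    with P and Q the observed fractions of transitions and transversions."""
    n = len(seq1)
    p = sum((a, b) in TRANSITIONS for a, b in zip(seq1, seq2)) / n
    q = sum(a != b and (a, b) not in TRANSITIONS
            for a, b in zip(seq1, seq2)) / n
    return -0.5 * log(1 - 2 * p - q) - 0.25 * log(1 - 2 * q)

# One transition out of ten aligned sites:
print(round(kimura2p("AAAAAAAAAA", "GAAAAAAAAA"), 4))  # 0.1116
```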
The sequence distance calculation, as well as the NJ tree reconstruction, have
been performed by the PHYLIP package's programs DNADIST and NEIGHBOR,
respectively [5]. We applied a routine of the same package (TREEDIST) to compute
the Symmetric Distance (SD) of Robinson and Foulds [15] between each
reconstructed tree and the initial tree used for sequence generation.
The Symmetric Distance between two trees is defined as the number of the
partitions (unrooted trees) or clades (rooted trees) that are on one tree and not on the
other. For fully resolved, i.e. bifurcating, trees the Symmetric Distance must be an
even number ranging from 0 to twice the number of internal branches, which for n
units is 4n-6. Odd numbers can be obtained if the input trees have
multifurcations.
Figure 2. Left plot: the difference between CMC symmetric distance from tree1 and the corresponding
measure for NJ is reported for each simulation. Right plot: the histogram of CMC symmetric distances
from tree1 (white) is compared with the corresponding one by NJ (black). Overlap regions are displayed
in grey.
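The Symmetric Distance definition can be made concrete by representing each tree as its set of leaf bipartitions (a toy sketch on precomputed splits; real packages such as TREEDIST parse full tree files):

```python
def symmetric_distance(splits1, splits2, taxa):
    """Robinson-Foulds Symmetric Distance from precomputed splits.
    Each tree is represented by the leaf bipartitions induced by its
    internal branches; a split is normalized to the side NOT containing
    a fixed reference taxon, so both sides of a branch get the same key.
    The distance counts splits present in one tree but not in the other."""
    taxa = frozenset(taxa)
    ref = min(taxa)
    def norm(side):
        side = frozenset(side)
        return taxa - side if ref in side else side
    s1 = {norm(s) for s in splits1}
    s2 = {norm(s) for s in splits2}
    return len(s1 ^ s2)

# Toy unrooted trees on five taxa: ((a,b),(c,d),e) vs ((a,c),(b,d),e);
# each tree has two internal branches, so the maximal distance is 4.
t1 = [{"a", "b"}, {"c", "d"}]
t2 = [{"a", "c"}, {"b", "d"}]
print(symmetric_distance(t1, t2, {"a", "b", "c", "d", "e"}))  # 4
```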
The results obtained by NJ have then been compared with the ones produced by
CMC for the same pairwise distance matrix. Before analyzing the results we have to
stress that the absolute value of symmetric distance is not related to any statistical
interpretation and that only tree topologies are used in the computation, neglecting
branch length information. In the left plots of Fig. 2 and Fig. 3 we report for both
initial trees the difference, computed for each simulation, between NJ and CMC
symmetric distance from the tree. In the right plots of Fig. 2 and Fig. 3 the
comparison between the SD histograms obtained by CMC and NJ is shown, for the
first and the second tree respectively. The k parameter has been fixed in the range
(3,10) in both cases. We note that, since NJ produces only bifurcating trees and the
bin size is set to one, there are no counts in bins corresponding to odd values of SD.
Even with the above mentioned restrictions on the quantitative interpretation of the
SD measure, we observe that, on average, CMC outperforms the NJ method.
Figure 3. Left plot: the difference between CMC symmetric distance from tree2 and the corresponding
measure for NJ is reported for each simulation. Right plot: the histogram of CMC symmetric distances
from tree2 (white) is compared with the corresponding one by NJ (black). Overlap regions are displayed
in grey.
4 Distance Measures
On account of several human population studies based on the HVRI and HVRII
regions, doubts arise about the reliability of classical evolutionary models such as
Jukes-Cantor, Kimura and Maximum Likelihood [7], mainly due to the strong
assumptions they make about a constant mutation rate at different sites.
Here we propose a distance measure that incorporates the biological evidence
of heterogeneous variation rates at different sites, which results from a recent
theoretical analysis of site variability, supported by simulation and experimental
data [13].
E_s = - Σ_i p_i^s log(p_i^s)

where the index i runs over the different nucleotides and p_i^s represents the
frequency of the nucleotide i at site s, calculated with respect to the given dataset.
An appealing feature of the 'entropic' distance is its lack of bias from any biological
model of genetic distance, although, depending on the dataset, the information
provided without any complementary assumption on sequence generating processes
could be insufficient to resolve sequence classification ambiguities. Of course the
'entropic' distance is strictly related to the specific context of haplogroup
discrimination. Depending on the dataset under investigation, the two distance
measures can appear as more or less correlated, although they cannot be considered
as equivalent. The main difference is in the correlation among sites introduced by
the site variability definition. Since it is questionable whether site variations along a
sequence have to be considered as really independent, this could be regarded as an
intriguing feature of a sequence classification based on the site variability concept.
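The per-site entropy term E_s can be computed directly from the alignment; how it is combined into the full 'entropic' distance is specified in the paper and not reproduced here:

```python
import math

def site_entropies(sequences):
    """Per-site entropy E_s = -sum_i p_i^s log(p_i^s), where p_i^s is the
    frequency of nucleotide i at site s over the given dataset of aligned
    sequences."""
    n_sites = len(sequences[0])
    entropies = []
    for s in range(n_sites):
        column = [seq[s] for seq in sequences]
        E = 0.0
        for nuc in set(column):
            p = column.count(nuc) / len(column)
            E -= p * math.log(p)
        entropies.append(E)
    return entropies

# Toy alignment: site 0 is conserved (zero entropy), site 1 is variable.
data = ["AC", "AG", "AT", "AC"]
print(site_entropies(data))  # site 0 gives 0.0, site 1 a positive value
```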
(Figure legend: Group I, Group II, Group III)
Figure 4. Phenogram representing the evolution of the Pacific area's data set. The time scale is given in
terms of the resolution parameter 0. The cluster size at each branching point is represented by a number
and a circle of variable size. On the right, the final classification is reported for clusters of size > 4.
The RMN method was used to gain deeper insight into the haplotype genetic
relationships. It generates a network which harbours all most parsimonious trees.
The resulting network (data not shown) is quite complex, as a consequence of the
high number of haplotypes considered in the analysis, while its reticulated structure
reflects the high rate of homoplasy in the dynamic evolution of the mtDNA HVRI
region. The topological structure of the RMN also reveals three major haplotype
clusters, which reflect the same "haplotype composition" shown by the NJ tree that
has been constructed on the distance matrix computed by the Stationary Markov
Model (Table 1).
6 Conclusions
In this paper we propose a novel distance method for phylogeny reconstruction and
sequence classification that is based on the recently proposed CMC algorithm, as
well as on a biologically motivated definition of distance.
The main advantage of the algorithm lies in its high effectiveness and low
computational cost, which make it suitable for analysis of large amounts of data in
high dimensional spaces. Simulations on artificial datasets show that CMC
Table 1. Comparison of classification results obtained on the Pacific area data set by the CMC method
(V = site variability distance, E = entropic distance), Neighbor Joining performed on the Stationary
Markov Model (S) distance matrix, and Reduced Median Network (R).
Since we are dealing with a distance method of general applicability, any prior
biological information has to be coded in an ad hoc distance definition, in order to
improve the reliability of sequence grouping. That is the rationale for the
introduction of site variability and entropy terms in distance measures that account
for the dependency of classification on the different rates of variation occurring on
sites. Performances obtained by applying both distance definitions on two
population datasets have been compared with classification obtained using SMM
and Reduced Median Network [4].
We found that our method performs as well as the two known techniques, but at
lower complexity and computational cost. Moreover, compared to RMN, the
method has the main advantage of providing an easy reading and interpretation of
results regardless of the dataset size.
Further investigations are currently being carried out regarding the use of the CMC
method for phylogenetic inference and the possibility of performing divergence time
estimates by relating internal node depths of CMC trees to the estimated number of
substitutions along lineages.
Acknowledgements
This work has been partially supported by MURST PRIN99 and by the "program
Biotecnologie, legge 95/95 (MURST 5%)", Italy.
References
1 Introduction
As more and more complete genomic sequences are being decoded it is becom-
ing of crucial importance to understand how the gene expression is regulated.
A central role in our present understanding of gene expression is played by the
notion of "regulatory network". It is by now clear that a particular expression
pattern in the cell is the result of an intricate network of interactions among
genes and proteins which cooperate to enhance (or depress) the expression
rate of the various genes. It is thus important to address the problem of gene
expression at the level of the whole regulatory network and not at the level of
the single gene 1,2,3,4,5.
In particular, most of the available information about such interactions
concerns the transcriptional regulation of protein coding genes. Even if this
is not the only regulatory mechanism of gene expression in eukaryotes it is
certainly the most widespread one.
In recent years, thanks to the impressive progress in DNA array
technology, several results on these regulatory networks have been obtained.
Various transcription factors (TF's in the following) have been identified and
their binding motifs in the DNA chain (see below for a discussion) have been
characterized. However it is clear that we are only at the very beginning of
such a program and that much more work still has to be done in order to
reach a satisfactory understanding of the regulatory network in eukaryotes
(the situation is somehow better for the prokaryotes whose regulatory network
is much simpler).
In this contribution we discuss a new method which allows one to reconstruct
these interactions by comparing existing biological information with
the statistical properties of the sequence data. This line of research
has been pursued in the last few years, with remarkable results, by several
groups; for a (unfortunately largely incomplete) list of references
see 2,3,4,5,6,7,8,9. In particular, the biological input that we shall use is the fact
that some genes, being involved in the same biological process, are likely to be
"coregulated", i.e. they should show the same expression pattern. The simplest
way for this to happen is that they are all regulated by the same set of TF's.
If this is the case, we should find in the "upstream"(a) region of these genes the
same TF binding sequences. This is a highly non-trivial occurrence from a sta-
tistical point of view and could in principle be recognized by simple statistical
analysis.
As a matter of fact, the situation is much more complex than this idealized
picture suggests. TF's do not necessarily bind only to the upstream
region. They often recognize more than one sequence (even if there is usually
a "core" sequence which is highly conserved). Coregulation could be achieved
by a complex interaction of several TF's rather than following the simple
pattern suggested above. Notwithstanding this, we think it is worthwhile
to explore this simplified picture of coregulation, for at least three reasons.
• Even if in this way we only find a subset of the TF's involved in the coreg-
ulation, this would nonetheless be an important piece of information: it
would add a new link to the regulatory network that we are studying.
• Analyses based on this picture, being very simple, can be easily per-
formed on any gene set, from the few genes involved in the Glycolysis
(the first example that we shall discuss below) up to the whole genome
(this will be the case of the second example that we shall discuss). This
(a) With this term we denote the portion of the DNA chain which is immediately before the
starting point of the open reading frame (ORF). We shall characterize this region more
precisely in sect. 3 below.
feature is going to become more and more important as more and more DNA
array experiments appear in the literature. As the quantity of available
data increases, so does the need for analytical tools to analyze it.
2 Transcription factors
As mentioned in the introduction, a major role in the regulatory network is
played by the transcription factors, which in general may have a twofold action
on gene transcription. They can activate it by recruiting the transcription
2.1 Classification
Even if TF's show a wide variability, it is possible to attempt a (very rough)
classification. Let us see it in some more detail, since it will help in understanding
the examples which we shall discuss in the following sections. There are four
main classes of binding sites in eukaryotes.
• Promoters
These are localized in the region immediately upstream of the coding
region (often within 200 bp of the transcription starting point). They
can be of two types:
• Response Elements
These appear only in those genes whose expression is controlled by an
external factor (such as hormones or growth factors). They are usually found
within 1 kb of the transcription starting point. Binding of a response
element by the appropriate factor may induce a relevant enhancement
in the expression of the corresponding gene.
• Enhancers
These are regulatory elements which, unlike promoters, can
act in both orientations and (to a large extent) at any distance from the
transcription starting point (there are examples of enhancers located even
• Let us denote by M the number of genes in the coregulated set and
by g_i, i = 1, ..., M, the genes belonging to the set.
• Let us denote by L the number of base pairs (bp) of the upstream non-
coding region on which we shall perform our analysis. It is important to
define precisely what we mean by "upstream region". With this term we
denote the non-coding portion of the DNA chain which is immediately
before the transcription start site. This means that we do not consider as
part of this region the UTR5 part of the ORF of the gene in which we are
interested. If we choose L large enough, it may happen that other ORFs
are present in the upstream region. In this case we consider as upstream
region only the non-coding part of the DNA chain up to the nearest ORF
(even if it appears on the opposite strand). Thus L should be thought of
as an upper cutoff. In most cases the length of the upstream region is
much smaller and is gene dependent. We shall denote it in the following
as L(g).
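As a concrete sketch of the truncation rule just described (our illustration, not the authors' code; the `orfs` mapping and the use of the ORF start as a stand-in for the transcription start site are simplifying assumptions):

```python
def upstream_region(sequence, orfs, gene, L_max):
    """Upstream region of `gene` as defined in the text: the DNA
    immediately before the gene, truncated at the nearest other ORF
    (on either strand), with L_max acting as an upper cutoff.
    `orfs` maps a gene name to (start, end) coordinates on `sequence`."""
    start, _ = orfs[gene]
    left = max(0, start - L_max)
    # shrink the window if another ORF ends inside it
    for other, (_, end) in orfs.items():
        if other != gene and left < end <= start:
            left = end
    return sequence[left:start]   # its length is L(gene) <= L_max
```

The length of the returned string is exactly the gene-dependent L(g) of the text: at most L_max, smaller whenever another ORF intrudes into the window.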
Let us call U the collection of upstream regions of the M genes g_1, ..., g_M. Our
goal is to see if the number of occurrences of a given word w_i in each of the
upstream regions belonging to U shows a "statistically significant" deviation
(to be defined more precisely below) from what is expected on the basis of pure
chance. To this end we perform two types of analyses.
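One simple version of such a counting analysis might look as follows (a sketch under our own assumptions, using an iid background model for illustration rather than the reference sample actually used in the paper):

```python
from collections import Counter

def word_counts(upstream_regions, n):
    """Total occurrences of every n-word over the set U of upstream regions."""
    counts = Counter()
    for region in upstream_regions:
        for i in range(len(region) - n + 1):
            counts[region[i:i + n]] += 1
    return counts

def word_probability(word, base_freq):
    """Per-position probability of `word` under an illustrative iid
    background model with the given single-nucleotide frequencies."""
    p = 1.0
    for base in word:
        p *= base_freq[base]
    return p
```

Overrepresented words are then those whose observed count greatly exceeds the expectation implied by `word_probability` times the number of positions scanned.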
As we have seen, the critical point of this analysis is the choice of the
reference sample. We try to avoid the bias induced by this choice by crossing
the above procedure with a second level of analysis.
The main change with respect to the previous analysis is that in this case we
extract the reference probabilities for the n-words from an artificial reference
sample constructed with a Markov chain algorithm based on the frequencies
of k-words with k << n (usually k = 1, 2 or 3) extracted from the upstream
regions themselves. Then the second and third steps of the previous analysis
follow unchanged. The rationale behind this second approach is that we want
to see if in the upstream region there are some n-words (with n = 7 or 8,
say) that occur much more often than one would expect based on the
frequency of the k-words in the same region.
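A minimal sketch of such a Markov-chain reference generator (ours, not the paper's implementation; function and variable names are illustrative) trains order-(k-1) transition counts on the k-words of a region and samples an artificial sequence with matching k-word statistics:

```python
import random
from collections import Counter, defaultdict

def markov_reference(region, k, length, seed=0):
    """Generate an artificial reference sequence whose k-word statistics
    match those of `region`: an order-(k-1) Markov chain trained on the
    observed k-word frequencies (k << n, e.g. k = 1, 2 or 3)."""
    rng = random.Random(seed)
    transitions = defaultdict(Counter)
    for i in range(len(region) - k + 1):
        kword = region[i:i + k]
        transitions[kword[:-1]][kword[-1]] += 1   # prefix -> next base
    state = region[:k - 1]                        # empty string when k = 1
    out = list(state)
    while len(out) < length:
        counter = transitions.get(state)
        if counter is None:                       # dead end: restart anywhere
            counter = transitions[rng.choice(list(transitions))]
        bases, weights = zip(*counter.items())
        nxt = rng.choices(bases, weights=weights)[0]
        out.append(nxt)
        state = (state + nxt)[-(k - 1):] if k > 1 else ""
    return "".join(out)
```

Reference probabilities for the n-words can then be estimated by applying the same counting step to sequences produced by this generator.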
These two levels of analysis are both likely to give results that are biased
by the different choices of reference probabilities that define them.
However, since these biases are likely to be very different from each other, it
is reasonable to expect that by comparing the results of the two methods one
can minimize the number of false positives found.
Table 2: Probability p(n, 7) of finding an n-word in the upstream regions of all the 7 genes
involved in glycolysis. In the first column, the value of n; in the second, the result obtained
using the background probabilities. In the last two columns, the results obtained with the
Markov chains with k = 1 and k = 2 respectively.
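To fix ideas about what a quantity like p(n, 7) measures, here is an illustrative null-model calculation (our sketch, not the computation behind Table 2): assuming independent, uniformly distributed bases, the probability that a fixed n-word occurs at least once in each of M regions is the product over regions of 1 - (1 - 4^(-n))^(L - n + 1), ignoring overlap corrections:

```python
def p_all_regions(n, lengths, q=None):
    """Illustrative null probability that a fixed n-word occurs at least
    once in each of the given upstream regions, assuming independent,
    uniformly distributed bases (q = 4**-n per position) and ignoring
    overlaps between occurrences."""
    if q is None:
        q = 4.0 ** (-n)
    p = 1.0
    for L in lengths:
        p *= 1.0 - (1.0 - q) ** (L - n + 1)
    return p
```

As expected, the probability drops sharply as n grows, which is why finding the same long word in all seven upstream regions is statistically significant.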
Three of them, STRE, MIG1 and UME6 (for the meaning of these abbrevi-
ations see again 4), were previously known to be involved in the glucose-induced
regulation process, while for the two other known motifs, PAC and RRPE,
this was a new result. We consider the fact of having found known regulatory
motifs a strong validation of our method.
Finally, we also found a new binding sequence, ATAAGGG, which we
could not associate with any known regulatory motif.
7 Conclusions
We have proposed two new methods to extract biological information on the
transcription factors (and more generally on the mutual interactions among
genes) from the statistical distribution of oligonucleotides in the upstream re-
gions of the genes. Both are based on the notion of a "regulatory network"
responsible for the various expression patterns of the genes, and aim to find
common binding sites for TF's in families of coregulated genes.
• In the direct method, once the set of coregulated genes has been chosen,
no further external input is needed. The significance criterion for our
candidate binding sites depends only on the statistical distribution of
oligonucleotides in the upstream region (or in nearby regions used as test
samples).
Even if they already give interesting results, both our methods are far from
being optimized. In particular, there are three natural directions of improve-
ment.
b) Recognizing dyad-like binding sequences (see for instance 7), which are
rather common in eukaryotes,
Needless to say, the candidate binding sequences that we find with our
method will have to be tested experimentally. However, our method could
greatly reduce the number of possible candidates and could serve as a
guideline for experiments.
References
1. M. Ptashne and A. Gann, Nature 386 (1997) 569.
2. A. Wagner, Nucleic Acids Research 25 (1997) 3594-3604.
3. S. Tavazoie, J.D. Hughes, M.J. Campbell, R.J. Cho and G.M. Church, Nature Genetics 22 (1999) 281-285.
4. Y. Pilpel, P. Sudarsanam and G.M. Church, Nature Genetics 29 (2001) 153-159. Web supplement: http://genetics.med.harvard.edu/~tpilpel/MotComb.html
5. H.J. Bussemaker, H. Li and E.D. Siggia, Nature Genetics 27 (2001) 167-171.
6. J. van Helden, B. Andre and J. Collado-Vides, J. Mol. Biol. 281 (1998) 827-842.
7. J. van Helden, A.F. Rios and J. Collado-Vides, Nucleic Acids Research 28 (2000) 1808-1818.
8. J.D. Hughes, P.W. Estep, S. Tavazoie and G.M. Church, J. Mol. Biol. 296 (2000) 1205-1214.
9. R. Hu and B. Wang, archive: http://xxx.sissa.it/abs/physics/0009002
10. M. Caselle, F. Di Cunto and P. Provero, "Correlating overrepresented upstream motifs to gene expression: a computational approach to regulatory element discovery in eukaryotes", submitted to BMC Bioinformatics.
11. B. Alberts et al., Molecular Biology of the Cell (Garland Publishing Inc., New York, 1994).
12. R.C. Wilkins and J.T. Lis, Nucleic Acids Research 26 (1998) 2672-2678.
13. J.L. DeRisi, V.R. Iyer and P.O. Brown, Science 278 (1997) 680-686.
GIUSEPPE CIBELLI
Department of Pharmacology and Human Physiology, University of Bari, P.le G. Cesare 11,
70124 Bari, and Chair of Human Physiology, University of Foggia Medical School, V.le L.
Pinto, 71100 Foggia, Italy
E-mail: g.cibelli@unifg.it
Extracellular signals trigger important adaptive responses that enable an organism to cope
with a changing environment. The induction of immediate early genes is a key initial event in
responses to diverse stimuli, as changes in inducible transcription factor expression lead to a
complex array of transcriptional control signals for use in the coordination of late response
gene expression. The early growth response-1 (Egr-1) gene, first identified as an immediate early
gene induced by mitogenic stimulation and subsequently shown to be activated by diverse
exogenous stimuli including growth factors, hormones and neurotransmitters, encodes a
zinc finger transcription factor involved in the regulation of growth and differentiation. This
article reviews recent findings about the expression, the signaling pathways and the
biological activity of Egr-1 in neuronal cells, following differentiation and apoptosis, as
paradigms for nervous system functioning under physiological and stress-related conditions.
1 Introduction
Stimulated neurons process and transmit information either by short-term, cell
surface-dependent events that immediately process and convey information about
the stimulus, or by long-term events mediated by intracellular messenger systems,
inducing changes in gene expression. Immediate-early genes are the first
downstream nuclear targets, activated by different second messenger signaling
cascades, linking membrane events to the nucleus and thus altering the neurons'
responses to subsequent stimuli. These genes are defined by rapid, often transient,
transcriptional induction occurring in the absence of de novo protein synthesis.
Immediate early genes encode many functionally different products, such as secreted
proteins and cytoplasmic enzymes. In particular, a subclass of these genes encodes
inducible transcription factors, proteins that control the expression of genes.
By now, the best characterized immediate-early gene-encoded transcription
factors include AP-1, composed of members of the fos and jun families, and the
early growth response (Egr) family of transcription factors. Here, we focus on Egr-
1, the most extensively characterized member of the Egr gene family, first identified
as an immediate-early gene involved in the control of cellular growth and differentiation
and subsequently confirmed to be a transcriptional regulatory protein. The potential role of
The Egr-1 transcription factor [45], also known as zif268 [11], Krox-24 [28], tis8
[31] or nerve growth factor induced (NGFI)-A [34], is a member of the early
growth response family of transcription factors, which also includes Egr-2, Egr-3
and Egr-4 [3]. The human Egr-1 consists of 533 amino acids with a calculated
molecular weight of 57 kDa [45], but because of extensive phosphorylation it runs
at 75-80 kDa [8, 55]. Egr-1 contains three zinc finger domains of the
cysteine2-histidine2 subtype in its C-terminal portion, suggesting that it is a DNA-binding
regulatory protein.
The expression of Egr-1 in cell culture systems can be induced by a range of
stimuli, as extensively reviewed by Gashler and Sukhatme [17], including growth
factors, phorbol esters, hypoxia, ionizing radiation, tissue damage and signals that
result in neuronal excitation, such as membrane depolarization or brain seizures.
Table 1 summarizes the stimuli which have been studied with respect to
the expression of Egr-1 in the nervous system.
The time course of Egr-1 expression is typical of inducible transcription
factor mRNAs, and resembles that of c-fos [8, 45]. The expression of Egr-1 has
been extensively studied in the mammalian brain. In the adult, low basal Egr-1
mRNA expression is detected in the rat cortex, amygdala, striatum, cerebellar cortex
and hippocampus [11]. Egr-1 mRNA is expressed at low levels in the early
postnatal rat cortex, midbrain, cerebellum and brainstem. The Egr-1 message
increases throughout postnatal development to adult levels, suggesting a role for
Egr-1 in postnatal maturation of the brain [56].
The architecture of the Egr-1 promoter has been described by several groups
that have cloned the murine [11], rat [10] and human [42] Egr-1 genes. The
upstream region of the mouse Egr-1 gene contains five SREs. In addition, putative
regulatory elements in the Egr-1 promoter include an Sp1, a CRE and an AP-1-like
element, and two CCAATT sequences. The human Egr-1 gene promoter contains
these sequences in conserved positions. The SREs are the dominant regulators of
Egr-1 transcription [33], and mediate Egr-1 responses to TPA, growth factors
and serum [15]. The SRE is a 22 bp segment that contains the inner core sequence
CC(A/T)6GG, similar to the CArG box present in other inducible immediate early
genes, which is the binding element for the serum response factor (SRF), a nuclear
phosphoprotein present in most cell types [51]. As a homodimer, SRF binds together with
Elk-1, a member of the ternary complex factor family of Ets domain proteins, over
the SREs [33]. The phosphorylation of Elk-1 in response to growth factors and other
stimuli is responsible for the activation of transcription. Fig. 1 shows a schematic
representation of the mechanism of Egr-1 transcription induced by the SREs.
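The CC(A/T)6GG core lends itself to simple pattern matching; the following sketch (an illustration of motif scanning, not a tool from the work reviewed here) locates CArG-box-like cores on one strand of a promoter sequence:

```python
import re

# CArG-box consensus: CC, six A/T bases, GG -- the inner core of the SRE
CARG = re.compile(r"CC[AT]{6}GG")

def find_carg_boxes(promoter):
    """Return (position, sequence) pairs for every CArG-box-like core
    found on the given strand (non-overlapping matches only)."""
    return [(m.start(), m.group()) for m in CARG.finditer(promoter)]
```

A full scan would also need to search the reverse complement and tolerate the degeneracy of real SREs, which this consensus regex ignores.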
Figure 1. Mechanism of Egr-1 transcription induced by the SREs.
4 Structure-function mapping
The structure of a complex formed between the three zinc fingers of Egr-1 and
its cognate DNA binding site has been extensively analyzed [3]. Four distinct
activation domains have been identified within the Egr-1 molecule, three of them
localized in the N-terminal region [40]. Other investigators described an extensive
activation domain spanning amino acids 3 to 281 [18]. Finally, a domain for
transcriptional repression is contained between the activation domain and the DNA
binding domain [18]. This repression domain functions as a binding site for two
cellular inhibitors, NGFI-A binding proteins 1 and 2 (NAB1 and NAB2) [41, 46],
which may negatively modulate transactivation by Egr-1 [49], thus conferring on
the Egr-1 protein a bipartite function, alternatively activating or repressing
transcription. The structural features and the defined activity domains of the Egr-1
protein are depicted in fig. 2.
Egr-1 induction has been correlated with the onset of differentiation in several cell
types. In particular, monocytic differentiation of U-937 and HL-60 myeloid
leukemia cells induces Egr-1 expression [26, 25]. Neuronal differentiation has been
extensively investigated using PC12 cells as a model cell line. Nerve growth factor
causes an initial mitogenic response in PC12 cells, followed by growth arrest and
differentiation into sympathetic neuron-like cells with extended neurites, and
induces sustained activation of extracellular signal-regulated protein kinases (ERK)
[50]. In addition, NGF stimulation of PC12 cells induces expression of Egr-1 [34, 45].
We have recently reported that the neuropeptide corticotropin-releasing factor
(CRF) induces neurite outgrowth in immortalized locus coeruleus-like CATH.a
cells, suggesting a potential role for CRF as a neurotrophic factor for noradrenergic
locus coeruleus neurons [12]. In addition, we used CRF-induced neurite outgrowth
of CATH.a cells as a bioassay to study CRF signaling in these cells. Our results,
which are summarized in fig. 3, indicate that cAMP-dependent protein kinase
(PKA) inhibitors block CRF-induced differentiation entirely. Likewise, dbcAMP induces
neurite outgrowth of CATH.a cells indistinguishable from that of CRF-treated cells.
Moreover, we found that CRF induces the transcriptional activity of CREB. The
inhibition of the MAP kinase pathway, in particular inhibition of ERK, also blocks
CRF-induced neurite outgrowth. Furthermore, CRF stimulates the transcriptional
activity of the transcription factor Elk-1.
In PC12 cells, NGF activates ERK; the kinase translocates to the nucleus and
phosphorylates transcription factors such as Elk-1 [54, 58]. Elk-1 and other
activated transcription factors subsequently induce transcription of those genes
whose products are required for the differentiation process. Inhibition of MAP
kinase kinase (MEK) blocks the differentiation of PC12 cells by nerve growth factor
[39]. We obtained very similar data with CRF-differentiated CATH.a cells using the
MEK inhibitor PD98059. Neuronal differentiation of PC12 cells can be induced not
only by NGF but also by an increase in the intracellular cAMP concentration. In
CATH.a cells, CRF and dbcAMP induced the differentiation of the cells. While
NGF activates ERK as discussed, cAMP activates the cAMP-dependent protein
kinase via binding to the regulatory subunit of the holoenzyme. In many cell types,
cAMP antagonizes the ERK pathway. In PC12 cells, however, a positive cross-
talk exists between the cAMP and the ERK signaling pathways. cAMP does not only
activate the cAMP-dependent protein kinase in PC12 cells. Interestingly, cAMP
activates MAP kinase and Elk-1 in PC12 cells through a pathway involving the
small G-protein Rap-1 [54], which is, in turn, activated by a family of cAMP
binding proteins termed cAMP-GEFs in a cAMP-dependent, but PKA-independent,
manner [24]. Our data suggest that both PKA-dependent and PKA-independent
effects of cAMP could account for the activation of ERK in CATH.a cells as well as
in PC12 cells, indicating that both the cAMP and the ERK signaling pathways are
involved in signal transduction of CRF.
By using the Egr-1 DNA binding domain as a selective antagonist of Egr-
1-mediated transcription, Levkovitz et al. reported that the expression of this Egr-1
inhibitor construct suppresses neurite outgrowth elicited in PC12 cells by NGF, but
not by dbcAMP, indicating that Egr-1 expression is necessary, but not sufficient, for
eliciting neurite outgrowth [29]. Conversely, the neuron-specific activator of cyclin-
dependent kinase 5, p35, has been identified as one of the targets of the NGF-
stimulated ERK pathway that are essential for neurite outgrowth in PC12 cells [20].
The transcription factor Egr-1 is required for the induction of p35, as the activation of
ERK by NGF correlates with the observed expression patterns of Egr-1 mRNA and
p35 mRNA and protein. To further define an essential signaling pathway,
downstream of ERK, that leads to CRF-induced neuronal differentiation, we
analyzed the effect of CRF on the transcriptional activity of the Egr-1 promoter.
We showed that CRF strikingly activates the Egr-1 reporter, most likely via the
upstream SREs. The fact that CRF very strongly activated the Egr-1 promoter
suggests that the transcription factor Egr-1 is necessary for the CRF-initiated
differentiation process of CATH.a cells.
Figure 3. Schematic summary of CRF signaling in CATH.a cells: CRF raises cAMP, which acts both through PKA (blocked by H-89) and through Rap-1/MEK/ERK (MEK blocked by PD98059); these pathways converge on Elk-1 and CREB, inducing Egr-1 and Egr-1-responsive genes and leading to neurite outgrowth.
In recent years, several reports have described Egr-1 as a proapoptotic molecule [32].
Egr-1 biosynthesis was reported to be stimulated in melanoma cells treated with the
apoptotic stimulus thapsigargin [37]. Both a p53-dependent and a p53-independent
pathway have been proposed to explain the proapoptotic activity of Egr-1. In melanoma
cells expressing a wild-type p53 protein, Egr-1 directly upregulated transcription of
the p53 gene, followed by the synthesis of p53 mRNA and protein [38]. In contrast,
transcriptional upregulation of the tumor necrosis factor α promoter was proposed
as a mechanism by which Egr-1 may induce apoptosis in cells expressing a non-
functional p53 protein [2]. In the nervous system, an enhanced expression of Egr-1
has been connected with neuronal apoptosis of cerebellar granule cells [9].
We have studied nitric oxide (NO)-induced changes in gene transcription
in the human neuroblastoma cell line SH-SY5Y, which undergoes cell death upon
treatment with the NO donor NOC-18 [Cibelli G., Policastro V., Rossler O. and
Thiel G., Nitric oxide-induced programmed cell death in human neuroblastoma cells
is accompanied by the synthesis of Egr-1, a zinc finger transcription factor, J.
Neurosci. Res., in press]. Our results indicate that NO-induced signaling specifically
elevates the transcriptional activation potential of the ternary complex factor Elk-1.
The finding that Elk-1 is part of an NO-induced signaling cascade in neuronal cells
prompted a search for Elk-1-regulated genes. We therefore measured Egr-1
promoter activity following administration of NOC-18, and detected an increase
in Egr-1 promoter-controlled reporter gene transcription, indicating that the Egr-1
gene is a nuclear target for NO signaling in SH-SY5Y cells. Following the NO-
induced signaling cascade in SH-SY5Y cells, we demonstrated that NO stimulates
the biosynthesis of Egr-1. Furthermore, a striking increase in the transcriptional
activation potential of Egr-1-responsive genes was measured, due to the elevated
concentration of Egr-1. Taken together, these findings suggest that Egr-1 may be
an integral part of the NO-triggered apoptosis signaling cascade in SH-SY5Y
neuroblastoma cells. A model for the mechanism of action of Egr-1 following NO-
induced apoptosis in SH-SY5Y cells is proposed in fig. 4.
Figure 4. Mechanism of action of Egr-1 following NO-induced apoptosis in SH-SY5Y cells.
7 Acknowledgements
The author wishes to thank Prof. Gerald Thiel for scientific collaboration and
helpful discussions, Dr. Beatrice Greco for critical reading of the manuscript, and
Prof. Carlo Di Benedetta for continuous support of this project.
References
1. Abe K., Kawagoe J., Sato S., Sahara M., Kogure K., Induction of the zinc
finger gene after transient focal ischemia in rat cerebral cortex, Neurosci. Lett.
123 (1991) pp. 248-250.
2. Ahmed M. M., Sells S. F., Venkatasubbarao K., Fruitwala S. M., Muthukkumar
S., Harp C., Mohiuddin M., Rangnekar V. M., Ionizing radiation-inducible
apoptosis in the absence of p53 linked to transcription factor EGR-1, J. Biol.
Chem. 272 (1997) pp. 33056-33061.
3. Beckmann A. M. and Wilce P. A., Egr transcription factor in the nervous
system, Neurochem. Int. 31 (1997) pp. 477-510.
4. Beckmann A. M., Matsumoto I. and Wilce P. A., AP-1 and Egr DNA-binding
activities are increased in rat brain during ethanol withdrawal, J. Neurochem.
69 (1997) pp. 306-314.
5. Beckmann A. M., Matsumoto I., Wilce P. A., Immediate early gene expression
during morphine withdrawal, Neuropharmacology 34 (1995) pp. 1183-1189.
6. Bhat R. V., Worley P. F., Cole A. J., Baraban J. M., Activation of the zinc finger
encoding gene krox-20 in adult rat brain: comparison with zif268, Brain Res.
Mol. Brain Res. 13 (1992) pp. 263-266.
7. Bing G. Y., Filer D., Miller J. C., Stone E. A., Noradrenergic activation of
immediate early genes in rat cerebral cortex, Brain Res. Mol. Brain Res. 11
(1991) pp. 43-46.
8. Cao X., Koski R. A., Gashler A., McKiernan M., Morris C. F., Gaffney R., Hay
R. V., Sukhatme V. P., Identification and characterization of the Egr-1 gene
product, a DNA-binding zinc finger protein induced by differentiation and
growth signals, Mol. Cell. Biol. 10 (1990) pp. 1931-1939.
9. Catania M. V., Copani A., Calogero A., Ragonese G. I., Condorelli D. F.,
Nicoletti F., An enhanced expression of the immediate early gene, Egr-1, is
associated with neuronal apoptosis in culture, Neuroscience 91 (1999) pp.
1529-1538.
10. Changelian P. S., Feng P., King T. C., Milbrandt J., Structure of the NGFI-A
gene and detection of upstream sequences responsible for its transcriptional
induction by nerve growth factor, Proc. Natl. Acad. Sci. USA 86 (1989) pp.
377-381.
11. Christy B. A., Lau L. F., Nathans D., A gene activated in mouse 3T3 cells by
serum growth factors encodes a protein with zinc finger sequences, Proc. Natl.
Acad. Sci. USA 85 (1988) pp. 7857-7861.
12. Cibelli G., Corsi P., Diana G., Vitiello F., Thiel G., Corticotropin-releasing
factor triggers neurite outgrowth of a catecholaminergic immortalized neuron
via cAMP and MAP kinase signalling pathways, Eur. J. Neurosci. 13 (2001)
pp. 1339-1348.
13. Cole A. J., Saffen D. W., Baraban J. M., Worley P. F., Rapid increase of an
immediate early gene messenger RNA in hippocampal neurons by synaptic
NMDA receptor activation, Nature 340 (1989) pp. 474-476.
14. Day M. L., Fahrner T. J., Aykent S., Milbrandt J., The zinc finger protein
NGFI-A exists in both nuclear and cytoplasmic forms in nerve growth factor-
stimulated PC12 cells, J. Biol. Chem. 265 (1990) pp. 15253-15260.
15. DeFranco C., Damon D. H., Endoh M., Wagner J. A., Nerve growth factor
induces transcription of NGFIA through complex regulatory elements that are
also sensitive to serum and phorbol 12-myristate 13-acetate, Mol. Endocrinol. 7
(1993) pp. 365-379.
16. Ebling F. J., Maywood E. S., Staley K., Humby T., Hancock D. C., Waters C.
M., Evan G. I. and Hastings M. H., The role of N-methyl-D-aspartate-type
glutamatergic neurotransmission in the photic induction of immediate-early gene
expression in the suprachiasmatic nuclei of the Syrian hamster, J.
Neuroendocrinol. 3 (1991) pp. 641-652.
17. Gashler A. and Sukhatme V. P., Early growth response protein 1 (Egr-1):
prototype of a zinc-finger family of transcription factors, Prog. Nucleic Acid
Res. Mol. Biol. 50 (1995) pp. 191-224.
18. Gashler A. L., Swaminathan S., Sukhatme V. P., A novel repression module, an
extensive activation domain, and a bipartite nuclear localization signal defined
in the immediate-early transcription factor Egr-1, Mol. Cell. Biol. 13 (1993) pp.
4556-4571.
19. Hallahan D. E., Sukhatme V. P., Sherman M. L., Virudachalam S., Kufe D.,
Weichselbaum R. R., Protein kinase C mediates x-ray inducibility of nuclear
signal transducers EGR1 and JUN, Proc. Natl. Acad. Sci. U S A. 88 (1991) pp.
2156-2160.
20. Harada T., Morooka T., Ogawa S., Nishida E., ERK induces p35, a neuron-
specific activator of Cdk5, through induction of Egr1, Nat. Cell Biol. 3 (2001)
pp. 453-459.
21. Honkaniemi J., Sagar S. M., Pyykonen I., Hicks K. J., Sharp F. R., Focal brain
injury induces multiple immediate early genes encoding zinc finger
transcription factors, Brain Res. Mol. Brain Res. 28 (1995) pp. 157-163.
22. Hu R. M., Levin E. R., Astrocyte growth is regulated by neuropeptides through
Tis 8 and basic fibroblast growth factor, J. Clin. Invest. 93 (1994) pp. 1820-1827.
23. Kaufmann K., Thiel G., Epidermal growth factor and platelet-derived growth
factor induce expression of Egr-1, a zinc finger transcription factor, in human
malignant glioma cells, J. Neurol. Sci. 189 (2001) pp. 83-91.
24. Kawasaki H., Springett G. M., Mochizuki N., Toki S., Nakaya M., Matsuda M.,
Housman D. E., Graybiel A. M., A family of cAMP-binding proteins that
directly activate Rap1, Science 282 (1998) pp. 2275-2279.
25. Kharbanda S., Nakamura T., Stone R., Hass R., Bernstein S., Datta R.,
Sukhatme V. P., Kufe D., Expression of the early growth response 1 and 2 zinc
finger genes during induction of monocytic differentiation, J. Clin. Invest. 88
(1991) pp. 571-577.
26. Kharbanda S., Rubin E., Datta R., Hass R., Sukhatme V., Kufe D.,
Transcriptional regulation of the early growth response 1 gene in human
myeloid leukemia cells by okadaic acid, Cell Growth Differ. 4 (1993) pp. 17-
23.
27. Leah J. D., Herdegen T., Murashov A., Dragunow M., Bravo R., Expression of
immediate early gene proteins following axotomy and inhibition of axonal
transport in the rat central nervous system, Neuroscience 57 (1993) pp. 53-66.
28. Lemaire P., Revelant O., Bravo R., Charnay P., Two mouse genes encoding
potential transcription factors with identical DNA-binding domains are
activated by growth factors in cultured cells, Proc. Natl. Acad. Sci. USA 85
(1988) pp. 4691-4695.
29. Levkovitz Y., O'Donovan K. J., Baraban J. M., Blockade of NGF-induced
neurite outgrowth by a dominant-negative inhibitor of the egr family of
transcription regulatory factors, J Neurosci. 21 (2001) pp. 45-52.
30. Lim C. P., Jain N., Cao X., Stress-induced immediate-early gene, egr-1,
involves activation of p38/JNK1, Oncogene 16 (1998) pp. 2915-2926.
31. Lim R. W., Varnum B. C., Herschman H. R., Cloning of tetradecanoyl phorbol
ester-induced primary response sequences and their expression in density-
arrested Swiss 3T3 cells and a TPA nonproliferative variant, Oncogene 1 (1987)
pp. 263-270.
32. Liu C., Rangnekar V. M., Adamson E., Mercola D., Suppression of growth and
transformation and induction of apoptosis by EGR-1, Cancer Gene Ther. 5
(1998) pp. 3-28.
33. McMahon S. B., Monroe J. G., A ternary complex factor-dependent mechanism
mediates induction of egr-1 through selective serum response elements
following antigen receptor cross-linking in B lymphocytes, Mol. Cell. Biol. 15
(1995) pp. 1086-1093.
34. Milbrandt J., A nerve growth factor-induced gene encodes a possible transcriptional
regulatory factor, Science 238 (1987) pp. 797-799.
35. Moratalla R., Robertson H. A., Graybiel A. M., Dynamic regulation of NGFI-A
(zif268, egr1) gene expression in the striatum, J. Neurosci. 12 (1992) pp.
2609-2622.
36. Murphy T. H., Worley P. F., Baraban J. M., L-type voltage-sensitive calcium
channels mediate synaptic activation of immediate early genes, Neuron. 7
(1991) pp. 625-635.
37. Muthukkumar S., Nair P., Sells S. F., Maddiwar N. G., Jacob R. J., Rangnekar
V. M., Role of EGR-1 in thapsigargin-inducible apoptosis in the melanoma cell
line A375-C6, Mol. Cell. Biol. 15 (1995) pp. 6262-6272.
38. Nair P., Muthukkumar S., Sells S. F., Han S.-S., Sukhatme V. P., Rangnekar V.
M., Early growth response-1-dependent apoptosis is mediated by p53, J. Biol.
Chem. 272 (1997) pp. 20131-20138.
39. Pang, L., Sawada, T., Decker, S. J. and Saltiel, A. R., Inhibition of MAP kinase
kinase blocks the differentiation of PC-12 cells induced by nerve growth factor,
J. Biol. Chem. 270 (1995) pp. 13585-13588.
40. Russo M. W., Matheny C , Milbrandt J., Transcriptional activity of the zinc
finger protein NGFI-A is influenced by its interaction with a cellular factor,
Mol. Cell. Biol. 13 (1993) pp. 6858-6865.
41. Russo M. W., Sevetson B. R., Milbrandt J., Identification of NAB1, a repressor
of NGFI-A- and Krox20-mediated transcription, Proc. Natl. Acad; Sci. U S A .
92 (1995) pp. 6873-6877.
42. Sakamoto K. M., Bardeleben C , Yates K. E., Raines M. A., Golde D. W.,
Gasson J. C , 5' upstream sequence and genomic structure of the human
primary response gene, EGR-1/TIS8, Oncogene. 6 (1991) pp. 867-871.
43. Sakamoto KM, Fraser JK, Lee HJ, Lehman E, Gasson J C , Granulocyte-
macrophage colony-stimulating factor and interleukin-3 signaling pathways
converge on the CREB-binding site in the human egr-1 promoter, Mol. Cell.
Biol. 14 (1994) pp. 5975-5985.
44. Simpson C. S., Morris B. J., Stimulation of zif/268 gene expression by basic
fibroblast growth factor in primary rat striatal cultures, Neuropharmacology 34
(1995) pp. 515-520.
45. Sukhatme V. P., Cao X., Chang L. C , Tsai-Morris C , Stamenkovich D.,
Ferreira P. C. P., Cohen D. R., Edward S. A., Shows T. B., Curran T., Le Beau
232
M. M., Adamson E. D., A zinc finger -encoding gene coregulated with c-fos
during growth and differentiation, and after cellular depolarization, Cell 53
(1988) pp. 37-43.
46. Svaren J., Sevetson B. R., Apel E. D., Zimonjic D. B., Popescu N. C ,
Milbrandt J., NAB2, a corepressor of NGFI-A (Egr-1) and Krox20, is induced
by proliferative and differentiative stimuli, Mol. Cell. Biol. 16 (1996) pp. 3545-
3553.
47. Svenningsson P., Johansson B., Fredholm B. B., Caffeine-induced expression
of c-fos mRNA and NGFI-A mRNA in caudate putamen and in nucleus
accumbens are differentially affected by the N-methyl-D-aspartate receptor
antagonist MK-801, Brain Res. Mol. Brain Res. 35 (1996) pp. 183-189.
48. Swirnoff A. H. and Milbrandt J., DNA-binding specificity of NGFI-A and
related zinc finger transcription factors, Mol. Cell. Biol. 15 (1995) pp. 2275-
2287.
49. Thiel G., Kaufmann K., Magin A., Lietz M., Bach K., Cramer M., The human
transcriptional repressor protein NAB1: expression and biological activity,
Biochim. Biophys. Acta. 1493 (2000) pp. 289-301.
50. Traverse S., Gomez N., Paterson H., Marshall C , Cohen P., Sustained
activation of the mitogen-activated protein (MAP) kinase cascade may be
required for differentiation of PC 12 cells. Comparison of the effects of nerve
growth factor and epidermal growth factor, Biochem J. 288 (1992) pp. 351-355.
51. Treisman R., Identification and purification of a polypeptide that binds to the c-
fos serum response element, EMBO J. 6 (1987) pp. 2711-2717.
52. Vaccarino F. M., Hayward M. D., Le H. N., Hartigan D. J., Duman R. S.,
Nestler E. J., Induction of immediate early genes by cyclic AMP in primary
cultures of neurons from rat cerebral cortex, Brain Res. Mol. Brain Res. 19
(1993) pp. 76-82.
53. Vaccarino F. M., Hayward M. D., Nestler E. J., Duman R. S. and Tallman J. F.,
Differential induction of immediate early genes by excitatory amino acid
receptor types in primary cultures of cortical and striatal neurons, Mol. Brain
Res. 12 (1992) pp. 233-241.
54. Vossler, M. R., Yao, H., York, R. D , Pan, M.-G., Rim, C. S. and Stork, P. J. S,
cAMP activates MAP kinase and Elk-1 through a B-Raf- and Rap 1-dependent
pathway, Cell 89 (1997) pp. 73-82.
55. Waters C. M., Hancock D. C , Evan G. J., Identification and characterisation of
the egr-1 gene product as an inducible, short-lived, nuclear phosphoprotein,
Oncogene 5 (1990) pp 669-674.
56. Watson M. A., Milbrandt J., Expression of the nerve growth factor-regulated
NGFI-A and NGFI-B genes in the developing rat, Development 110 (1990) pp.
173-183.
233
57. Wilce P. A., Le F., Matsumoto I., Shanley B. C , Ethanol inhibits NMDA-
receptor mediated regulation of immediate early gene expression, Alcohol
Alcohol Suppl. 2 (1993) pp. 359-363.
58. York, R. D., Yao, H., Dillon, T., Ellig, C. L., Eckert, S. P., McCleskey, E. W.
and Stork, P. J. S., Rapl mediates sustained MAP kinase activation induced by
nerve growth factor, Nature 392 (1998) pp. 622-628.
234
GEOMETRICAL ASPECTS OF PROTEIN FOLDING
CRISTIAN MICHELETTI
International School for Advanced Studies (S.I.S.S.A.) and INFM,
Via Beirut 2-4, 34014 Trieste, Italy
E-mail: michelet@sissa.it
1 Introduction
Two of the properties that distinguish small globular proteins from random
heteropolymers are the ubiquitous presence of recurrent geometrical motifs
and the ability to fold rapidly and reversibly into the native state, i.e. the
shape providing maximum biological activity. It is generally believed that
these special properties are the result of evolutionary pressure to optimise the
protein chemical composition. Recently, an increasing amount of evidence has
accumulated showing that, besides the detailed chemistry, the geometrical
shape of native states has also been selected to optimise the folding
process 1. Here we focus on two questions that arise naturally from these
considerations. First, we try to characterize the main events of the folding
process by using schematic topology-based models. It is found that there are
a number of obligatory steps that heavily influence the whole folding process.
The knowledge of such crucial stages is not only of theoretical interest but
could be used to develop drugs tailored to target viral enzymes. In the next
section we report a validation of such a strategy for the HIV-1 protease.
In the last section we focus on a more general problem, namely the ubiquitous
presence of secondary motifs (such as helices and sheets) in naturally-occurring
proteins.
where Γ0 is the known native state and Γ is a trial structure of the same
length as Γ0. Δ^S is the contact matrix of structure S, whose element Δ_ij is 1
if residues i and j are in contact (i.e. their Cα separation is
below the cutoff r = 6.5 Å) and 0 otherwise. This symmetric matrix encodes
the topology of the protein. The energy-scoring function of Eq. (1) ensures
that the state of lowest energy is attained for structures
with the same contact map as Γ0. This, in principle, may lead to a degenerate
ground state, since more than one structure can be compatible with a given
contact matrix. In practice, however, unless one uses unreasonably small
values of r, the degenerate structures are virtually identical. In fact, for
r ≈ 6.5 Å the number of distinct contacts is about twice the protein length;
this number of constraints nicely matches the number of degrees of freedom
of the peptide (two dihedral angles for each non-terminal Cα), thus avoiding
both under- and over-constraining the ground states.
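The contact-map machinery described above can be sketched in a few lines. This is our illustration, not the chapter's code; since Eq. (1) itself is not reproduced in the text, the specific scoring form used below (each shared native contact lowers the energy by one unit) is an assumption, chosen as the standard topology-based form consistent with the surrounding description:

```python
import numpy as np

def contact_map(ca_coords, cutoff=6.5):
    """Delta_ij = 1 if the C-alpha separation of residues i and j is
    below the cutoff (in Angstrom), 0 otherwise; the diagonal is zeroed."""
    d = np.linalg.norm(ca_coords[:, None, :] - ca_coords[None, :, :], axis=-1)
    delta = (d < cutoff).astype(int)
    np.fill_diagonal(delta, 0)
    return delta

def topology_energy(trial_coords, native_map):
    """Assumed Go-type score: each native contact that is also present in
    the trial structure lowers the energy by one unit, so the minimum is
    reached by any structure sharing the native contact map."""
    trial_map = contact_map(trial_coords)
    return -int(np.sum(np.triu(native_map * trial_map, k=1)))
```

For a straight chain of five Cα centroids spaced 3.8 Å apart, only the four (i, i+1) pairs fall below the cutoff, so scoring the native structure against itself gives -4.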
The introduction of this type of topology-based folding models can be
traced back to the work of Go and Scheraga 9. For a long time, the most
interesting property of these systems was the presence of an all-or-none
folding process, which is the finite-size equivalent of first-order transitions
in infinite systems.
This is illustrated in the example of Fig. 1 where we have reported energy
and specific heat of the model applied to the target protein 1HJA; the code
refers to the Protein Data Bank tag. The plotted data were obtained through
stochastic (Monte Carlo) equilibrium (constant-temperature) samplings.
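The reweighting idea behind such samplings can be illustrated with the single-histogram variant (a minimal sketch of ours, with k_B = 1; the chapter's curves use the multiple-histogram generalization):

```python
import numpy as np

def reweight(energies, t0, t):
    """Single-histogram reweighting (k_B = 1): reuse energy samples drawn
    at temperature t0 to estimate <E> and the specific heat at t."""
    e = np.asarray(energies, dtype=float)
    # subtract the mean energy before exponentiating, for numerical stability
    w = np.exp(-(e - e.mean()) * (1.0 / t - 1.0 / t0))
    w /= w.sum()
    e_mean = np.sum(w * e)
    e2_mean = np.sum(w * e * e)
    cv = (e2_mean - e_mean**2) / t**2   # fluctuation formula for the specific heat
    return e_mean, cv
```

As a sanity check, samples from a two-level system (E = 0 or 1) equilibrated at t0 = 1 reweight to the exact Boltzmann average e^(-1/t)/(1 + e^(-1/t)) at a lower temperature.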
It is interesting to note the presence of a peak, which can be identified with
the folding transition of the model system. At the peak, about 50% of the
native structure (measured as the fraction of formed native contacts 16,17) is
formed, consistent with analogous results on different proteins 15. It is,
however, possible to investigate the equilibrium properties of the system in
finer detail.
Figure 1. Plots of the energy (top) and specific heat (bottom) as a function of temperature
for protein 1HJA. The curves were obtained through histogram reweighting techniques.
residues were shared by 56% of the sampled structures. On the other hand,
the rarest contacts pertained to interactions between the helix and β-strands
and between the β-strands themselves. A different behaviour (see Fig. 2) was
found for barnase, where, again, for overlap of ≈ 40%, we find many contacts
pertaining to the nearly complete formation of helix 1 (residues 8-18), a par-
tial formation of helix 2, and bonds between residues 26-29 and 29-32 as well
as several non-local contacts bridging the β-strands, especially residues 51-55
and 72-75.
Figure 2. Ribbon plot (obtained with RASMOL) of 2ci2 (left) and barnase (right). The
residues involved in the most frequent contacts of alternative structures that form ≈ 40%
of the native interactions are highlighted in black. The majority of these coincide with
contacts that are formed at the early stages of folding.
Both this picture and the one described for CI2 are fully consistent with
the experimental results obtained by Fersht and co-workers in mutagenesis
experiments 18,19. In such experiments, the key role of an amino acid at a given
site is probed by mutating it and measuring the changes in the folding and
equilibrium characteristics. By measuring the change of the folding/unfolding
equilibrium constant one can introduce a parameter, termed Φ-value, which
is zero if the mutation is irrelevant to the folding kinetics and 1 if the change
in folding propensity mirrors the change in the relative stability of the folded
and unfolded states (intermediate values are, of course, possible). Ideally, the
sensitivity to a given site should be measured as a suitable
susceptibility to a small perturbation of the same site (or its environment).
Thus, the contribution of the various contacts to the specific heat will then be
proportional to how rapidly the contact forms as the temperature is lowered.
The contacts relevant for the folding process will be those giving the largest
contribution to C_V at (or above) the folding transition temperature. Armed
with this insight, we can use this deterministic criterion to rank the contacts
in order of importance.
Our simulations on the protease of HIV-1 21 are based on an energy-
scoring function that is more complex than Eq. (1). As usual, amino acids are
represented as effective centroids placed on Cα atoms, while the peptide bond
between two consecutive amino acids, i and i+1, at distance r_{i,i+1}, is described
by the anharmonic potential adopted by Clementi et al. 22, with parameters
a = 20, b = 2000. The interaction among non-consecutive residues is treated
again in Go-like schemes 9, which reward the formation of native contacts with
a decrease of the energy-scoring function. Each pair of non-consecutive amino
acids, i and j, contributes to the energy-scoring function by an amount:
V_ij = V0 Δ°_ij [ (r°_ij / r_ij)^12 − 2 (r°_ij / r_ij)^6 ] + V1 (1 − Δ°_ij) (r0 / r_ij)^12 ,   (4)

where r0 = 6.8 Å, r_ij denotes the distance of amino acids i and j, r°_ij their
distance in the native structure, and Δ° is the native contact matrix built with
an interaction cutoff, r, equal to 6.5 Å. V0 and V1 are constants controlling
the strength of the interactions (V0 = 20, V1 = 0.05 in our simulations). Constant-temperature
molecular dynamics simulations were carried out where the equations of mo-
tion are integrated by a velocity-Verlet algorithm combined with the standard
Gaussian isokinetic scheme 23,21 . Unfolding processes can be studied within
the same framework by warming up starting from the native conformation
(heat denaturation).
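The pair potential and the integration scheme just described can be sketched together as follows. This is our toy version: the 12-6 form mirrors the reconstruction of the partly corrupted Eq. (4) (take the exact exponents as an assumption), masses are set to one, and the Gaussian isokinetic thermostat is replaced by an explicit velocity rescaling that pins the kinetic energy:

```python
import numpy as np

R0 = 6.8                      # repulsion range for non-native pairs (Angstrom)
V0, V1 = 20.0, 0.05           # interaction strengths quoted in the text

def pair_energy(r, r_native, is_native):
    """Go-like contribution of one non-consecutive pair: native pairs feel
    a well of depth V0 with its minimum at the native separation, while
    non-native pairs feel only a soft repulsion of range R0."""
    if is_native:
        x = r_native / r
        return V0 * (x**12 - 2.0 * x**6)
    return V1 * (R0 / r)**12

def velocity_verlet_isokinetic(x, v, force, dt, n_steps, kin_target):
    """Constant-temperature dynamics: velocity-Verlet steps followed by a
    velocity rescaling that fixes the kinetic energy (a crude stand-in for
    the Gaussian isokinetic scheme; unit masses assumed)."""
    f = force(x)
    for _ in range(n_steps):
        v = v + 0.5 * dt * f                      # first half kick
        x = x + dt * v                            # drift
        f = force(x)
        v = v + 0.5 * dt * f                      # second half kick
        kin = 0.5 * np.sum(v * v)
        if kin > 0.0:
            v = v * np.sqrt(kin_target / kin)     # back onto the isokinetic shell
    return x, v
```

A native pair sits at the bottom of its well, at energy -V0, exactly at the native distance, and the rescaling keeps 0.5 Σ v² pinned at the requested value throughout the trajectory.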
The free energy, the total specific heat, C_V, and the contributions of the in-
dividual contacts to C_V were obtained by combining data sampled at different
equilibrium temperatures with multiple-histogram techniques 24. The ther-
modynamic quantities obtained through such deconvolution procedures did
not depend, within the numerical accuracy, on whether unfolding or refolding
paths were followed.
The contacts that contribute most to the specific-heat peak are identified
as the key ones belonging to the folding bottleneck, and the sites sharing them as
the most likely to be sensitive to mutations. Furthermore, by following several
individual folding trajectories (by suddenly quenching unfolded conformations
below the folding transition temperature, T_fold) we ascertained that all such
RTN 26,27 | 20, 33, 35, 36, 46, 54, 63, 71, 82, 84, 90 | TB, β1, β2, β3
NLF 28 | 30, 46, 63, 71, 77, 84 | TB, β2, β3
IND 29,30 | 10, 32, 46, 63, 71, 82, 84 | TB, β1, β2, β3
SQV 29,30,31 | 10, 46, 48, 63, 71, 82, 84, 90 | TB, β1, β2, β3
APR 32 | 46, 63, 82, 84 | TB, β2, β3
Table 1. Mutations in the protease associated with FDA-approved drug resistance 20. Sites
highlighted in boldface are those involved in the folding bottlenecks as predicted by our
approach. βi refers to the bottleneck associated with the formation of the i-th β-sheet,
whereas TB refers to the bottleneck occurring at the folding transition temperature T_fold
(see next Table).
Table 2. Key sites for the four bottlenecks. For each bottleneck, only the sites in the top
three pairs of contacts have been reported.
As in the packing of free spheres, the present case too is sensitive to the
details of the confining geometries when the system is finite. An example of
the variety of shapes resulting from the choice of different confining geometries
is given in Fig. 3.
Figure 3. Examples of optimal strings. The strings in the figure were obtained starting
from a random conformation of a chain made up of N equally spaced points (the spacing
between neighboring points is defined to be 1 unit) and successively distorting the chain
with pivot, crankshaft and slithering moves. A stochastic optimization scheme (simulated
annealing) is used to promote structures that have larger and larger thickness. Top row:
optimal shapes obtained by constraining strings of 30 points with a radius of gyration less
than R. a) R = 6.0, Δ = 6.42; b) R = 4.5, Δ = 3.82; c) R = 3.0, Δ = 1.93. Bottom
row: optimal shapes obtained by confining a string of 30 points within a cube of side L. d)
L = 22.0, Δ = 6.11; e) L = 9.5, Δ = 2.3; f) L = 8.1, Δ = 1.75.
In order to reveal the "true" bulk solution one needs to adopt suitable
boundary conditions. The one that we found most useful and robust was to
replace the constraint on the overall chain density with one working at a local
level. In fact, we substituted the fixed box containing the whole chain with
the requirement that any succession of n beads be contained in a smaller box
of side l. The results were insensitive (unless the discretization of the chain
was poor) to the choice of n, l and even to replacing the box with a sphere etc.
The solutions that emerged out of the optimization procedure were perfectly
helical strings, corresponding to discretised approximations to the continuous
helix represented in Fig. 4b, confirming that this is the optimal arrangement.
In all cases, the geometry of the chosen helix is such that there is an
equality of the local radius of curvature (determined by the local bending of
the curve) and the radius associated with a suitable triplet of non-consecutive
points lying in two successive turns of the helix. In other words, among all
possible shapes of linear helices, the one selected by the optimization pro-
cedure has the peculiarity that the local radius of curvature is equal to the
distance between successive turns. Hence, if the centerline of this helix is
inflated uniformly, one observes that the tube contacts itself near the helix axis
exactly when successive turns touch. This is a feature that is observed only for
a special ratio c* = 2.512... of the pitch, p, and the radius, r, of the circle
projected by the helix on a plane perpendicular to its axis. As this packing
problem is considerably more complicated than the hard-spheres one, we have
little hope of proving analytically that, among all possible three-dimensional
chains, the helix of Fig. 4b is the optimally packed one. However, if we
assume that the optimal shape is a linear helix, it is not too difficult to
explain why the "magic" ratio p/r = c* is observed. In fact, when p/r > c*
the local radius of curvature, given by ρ = r(1 + p²/(2πr)²), is smaller than
half of the distance of closest approach of points on successive turns of the
helix (see 4a). The latter is given (in units of r) by the first minimum, for
t > 0, of (1/2)√(2 − 2cos(2πt) + p²t²). Thus Δ = ρ in
this case.
On the other hand, if p/r < c*, the global radius of curvature is strictly
lower than the local radius, and the helix thickness is determined basically by
the distance between two consecutive helix turns: Δ ≈ p/2 if p/r ≪ 1 (see 4c).
Optimal packing selects the very special helices corresponding to the transition
between the two regimes described above. A visual example is provided by
the optimal helix of Fig. 4b. An instructive quantity to monitor is the ratio,
f, of the minimum radius of the circles going through each point and any two
non-adjacent points to the local radius. For discretized strings, f = 1 just
at the transition described above, whereas f > 1 in the 'local' regime and
f < 1 in the 'non-local' regime. In our computer-generated optimal strings,
the value of f averaged over all sites in the chain differed from unity by less
than a part in a thousand.
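The claim that c* marks the point where the two radii coincide can be checked numerically. In the sketch below (ours; unit helix radius, with the closest-approach distance minimized on a grid over roughly one turn), a bisection on the pitch recovers c* ≈ 2.512:

```python
import numpy as np

def local_radius(p):
    """Local radius of curvature of a unit-radius helix with pitch p."""
    return 1.0 + (p / (2.0 * np.pi))**2

def half_closest_approach(p):
    """Half the minimal distance between a point of the helix and the
    points roughly one turn away (t is measured in turns, r = 1)."""
    t = np.linspace(0.5, 1.5, 20001)
    d = np.sqrt(2.0 - 2.0 * np.cos(2.0 * np.pi * t) + (p * t)**2)
    return 0.5 * d.min()

def critical_ratio(lo=2.0, hi=3.0, iters=50):
    """Bisection for the pitch-to-radius ratio at which the local radius
    equals half the closest-approach distance (the tube touches itself
    near the axis exactly when successive turns touch)."""
    g = lambda p: half_closest_approach(p) - local_radius(p)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if g(lo) * g(mid) > 0 else (lo, mid)
    return 0.5 * (lo + hi)
```

Below the root the thickness is limited by the inter-turn distance, above it by the local curvature, matching the two regimes of Fig. 4.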
It is interesting to note that, in nature, there are many instances of the ap-
pearance of helices. It has been shown 10 that the emergence of such motifs in
proteins (unlike in random heteropolymers which, in the melt, have structures
conforming to Gaussian statistics) is the result of the evolutionary pressure
Figure 4. Maximally inflated helices with different pitch-to-radius ratio, c. (a) c = 3.77: the
thickness is given by the local radius of curvature. (b) c = 2.512...: for this optimal value
the local and non-local radii of curvature match. (c) c = 1.26: the maximum thickness is
limited by non-local effects (close approach of points in successive turns). Note the optimal
use of space in situation (b), while in cases (a) and (c), empty space is left between the
turns or along the helix axis.
exerted by nature in the selection of native state structures that are able to
house sequences of amino acids which fold reproducibly and rapidly 38 and
are characterized by a high degree of thermodynamic stability 17 . Further-
more, because of the interaction of the amino acids with the solvent, globular
proteins attain compact shapes in their folded states.
It is then natural to measure the shape of these helices and assess whether they
are optimal in the sense described here. The measure of f in α-helices found in
naturally-occurring proteins yields an average value for f of 1.03 ± 0.01, hinting
that, despite the complex atomic chemistry associated with the hydrogen bond
and the covalent bonds along the backbone, helices in proteins satisfy optimal
packing constraints. An example is provided in Fig. 5 where we report the
value of f for a particularly long α-helix encountered in a heavily-investigated
membrane protein, bacteriorhodopsin.
Figure 5. Top: local and non-local radii of curvature for sites in the first helix of bacteri-
orhodopsin (PDB code 1c3w). Bottom: plot of f values for the same sites.
This result implies that the backbone sites in protein helices have an
associated free volume distributed more uniformly than in any other confor-
mation with the same density. This is consistent with the observation 10 that
secondary structures in natural proteins have a much larger configurational
entropy than other compact conformations. This uniformity in the free-vol-
ume distribution seems to be an essential feature, because the requirement
of a maximum packing of backbone sites by itself does not lead to secondary
structure formation 5,6. Furthermore, the same result also holds for the helices
appearing in the collagen native-state structure, which have a rather different
geometry (in terms of local turn angles, residues per turn and pitch 45) from
average α-helices. In spite of these differences, we again obtained an average
f = 1.01 ± 0.03, very close to the optimal situation.
4 Conclusions
Acknowledgments
Support from INFM, Murst Cofin 1999 and Cofm 2001 is acknowledged.
References
42. Gonzalez O. and Maddocks J. H., Proc. Natl. Acad. Sci. USA 96, 4769
(1999).
43. Buck G., Nature 392, 238 (1998).
44. Cantarella J., Kusner R. B. and Sullivan J. M., Nature 392, 237 (1998).
45. Creighton T. E., Proteins: Structures and Molecular Properties, W. H.
Freeman and Company, New York (1993), pp. 182-188.
46. Maritan A., Micheletti C. and Banavar J. R., Phys. Rev. Lett. 84, 3009
(2000).
T H E PHYSICS OF M O T O R P R O T E I N S
G. L A T T A N Z I
International School for Advanced Studies (S.I.S.S.A.) and INFM, via Beirut, 2~4,
34013 Trieste, Italy
E-mail: lattanzi@sissa.it
A. M A R I T A N
InternationalSchool for Advanced Studies (S.I.S.S.A.) and INFM, via Beirut, 2~4,
34013 Trieste, Italy
The Abdus Salam International Center for Theoretical Physics, Strada Costiera 11,
34100 Trieste, Italy
Motor proteins are able to transform the chemical energy of ATP hydrolysis into
useful mechanical work, which can be used for several purposes in living cells.
The paper is concerned with problems raised by the current experiments on mo-
tor proteins, focusing on the main question of conformational changes. A simple
coarse-grained theoretical model is sketched and applied to the motor domain of
the kinesin protein; regions of functional relevance are identified and compared with
up-to-date information from experiments. The analysis also predicts the functional
importance of regions not yet investigated by experiments.
The increasing precision in the observation of single cells and their components
can be compared to the approach of one of our cities by air 1 : at first we notice a
complex network of urban arteries (streets, highways, railroad tracks). Then,
we may have a direct look at traffic in its diverse forms: trains, cars, trucks
and buses traveling to their destinations. We do not know the reason for that
traffic, but we know that it is essential to the welfare of the entire city. If we
want to understand the rationale for every single movement, we need to be at
ground level, and possibly drive a single element of the traffic flow.
In the same way, biologists have observed the complex network of fila-
ments that constitute the cytoskeleton, the structure that is also responsible
for the mechanical support of the cell. Advances in experimental techniques
have finally opened the possibility of observing traffic inside the cell. This trans-
port system is of vital importance to the functioning of the entire cell; just as an
ordinary traffic jam, or a defect in the transportation network of a city, can impair
its organized functioning, occasional problems in the transport of chemical
components inside the cell can be the cause of serious cardiovascular diseases
or neurological disorders.
1.4 Structure
Until 1992, it appeared as though kinesin and myosin had little in common 9.
In addition to moving on different filaments, kinesin's motor domain is less
than one-half the size of myosin's, and initial sequence comparisons failed to
reveal any important similarities between these two motors. Their motile
properties also appeared to be quite different.
In the last few years of research, however, the crystal structures of kinesin
have revealed a striking similarity to myosin, the structural overlap pointing
Many theoretical models have been proposed for the analysis of protein struc-
ture and properties. The problem with protein motors is that they are huge
proteins, whose size prevents any attack by present all-atom computer sim-
ulations. To make things worse, even under the optimistic assumption of im-
mense computer memory to store all necessary coordinates, the calculations
needed would cover only a few nanoseconds of the dynamics, whereas the conforma-
tional rearrangements usually lie in the time range of milliseconds. Therefore,
a detailed simulation of the dynamics is not feasible, and furthermore it is not
guaranteed that all the details can shed light on the general mechanism.
Yet, in recent years, dynamical studies have increased our appreciation of
the importance of protein structure and have shed some light on the central
problem of protein folding 14,15,16. Interestingly, coarse-grained models proved
to be very reliable for specific problems in this field. The scheme is as follows.
Proteins are linear polymers assembled from about 20 amino acid monomers
or residues. The sequence of amino acids (primary structure) varies for dif-
ferent molecules. Sequences of amino acid residues fold into typical patterns
(secondary structure), consisting mostly of helical (α helices) and sheetlike (β
sheets) patterns. These secondary structure elements bundle into a roughly
globular shape (tertiary structure) in a way that is unique to each protein
(native state). Therefore, the information on the detailed sequence of amino
acids composing the protein uniquely encodes its native state. Once the lat-
ter is known, one may forget about the former (this is the topological point of
view).
The GNM is a recently developed simple technique which drives this prin-
ciple to its extreme. It has been applied with success to a number of large
proteins 17 and even to nucleic acids 18,19.
2.1 Theory
Bahar et al. 20 proposed a model for the equilibrium dynamics of the folded
protein in which interactions between residues in close proximity are replaced
by linear springs. The model assumes that the protein in the folded state
is equivalent to a three-dimensional elastic network. The nodes are identi-
fied with the Cα atoms^a in the protein. These undergo Gaussian-distributed
fluctuations, hence the name Gaussian Network Model.
The native structure of a given protein, together with the amplitudes of
atomic thermal fluctuations measured by x-ray crystallography, is reported
in the Brookhaven Protein Data Bank 21 (PDB). Given the structure of a
protein, the Kirchhoff matrix of its contacts is defined as follows:
Γ_ij = −1 if r_ij ≤ r_c ,  Γ_ij = 0 if r_ij > r_c  (i ≠ j),   (2)

Γ_ii = − Σ_{k≠i} Γ_ik .   (3)

The off-diagonal elements are equal to −1 for pairs of residues
"Carbon atoms in amino acids are numbered with Greek letters: for each residue there is
at least one carbon atom, Ca, but there could be also additional carbon atoms, called Cg,
257
that are connected via springs, their separation r_ij being shorter than a cutoff
value r_c for inter-residue interactions. The diagonal elements are found from
the negative sum of the off-diagonal terms in the same row (or column); they
represent the coordination number, i.e. the number of individual residues
found within a sphere of radius r_c. The Kirchhoff matrix is conveniently
used 22 for evaluating the overall conformational potential of the structure:
V = (γ/2) ΔR^T Γ ΔR ,   (4)

where ΔR is the vector of residue fluctuations and γ the uniform spring
constant. Thermal averages of the fluctuations then follow from the Gaussian
integral

⟨ΔR_i · ΔR_j⟩ = (1/Z_N) ∫ ΔR_i · ΔR_j e^(−V/k_B T) δ(Σ_i ΔR_i) d{ΔR} ,   (5)

where the integration is carried over all possible fluctuation vectors ΔR, Z_N
is the partition function, and δ(Σ_i ΔR_i) accounts for the constraint of fixing
the position of the center of mass. This integral can be calculated exactly,
yielding:

⟨ΔR_i · ΔR_j⟩ = (3 k_B T / γ) [Γ⁻¹]_ij .   (6)
In practice Γ⁻¹ is evaluated through the eigenvalue decomposition of Γ,

[Γ⁻¹]_ij = Σ_k (1/λ_k) [u_k]_i [u_k]_j ,   (7)

where the sum runs over the non-zero eigenvalues λ_k of Γ, and the correspond-
ing eigenvectors u_k (the vibrational modes of the network) are normalized so
that

Σ_{α=1}^{N} ([u_k]_α)² = 1   ∀k.   (8)
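Eqs. (2)-(8) amount to a few lines of linear algebra. The sketch below (ours, not the authors' code) builds the Kirchhoff matrix from Cα coordinates, inverts it on the non-zero modes, and returns the Eq. (6) fluctuations together with normalized eigenmodes; the 7 Å cutoff is an arbitrary illustrative choice:

```python
import numpy as np

def kirchhoff(ca_coords, r_c=7.0):
    """Kirchhoff matrix of Eqs. (2)-(3): Gamma_ij = -1 for residue pairs
    closer than r_c, and each diagonal element is minus the sum of the
    off-diagonal terms of its row (the coordination number)."""
    d = np.linalg.norm(ca_coords[:, None, :] - ca_coords[None, :, :], axis=-1)
    gamma = -(d < r_c).astype(float)
    np.fill_diagonal(gamma, 0.0)
    np.fill_diagonal(gamma, -gamma.sum(axis=1))
    return gamma

def gnm_modes(gamma):
    """Eigenmodes of Gamma, discarding the zero (center-of-mass) mode;
    eigh returns eigenvectors already normalized to unit length."""
    w, u = np.linalg.eigh(gamma)
    return w[1:], u[:, 1:]

def fluctuations(gamma, kT_over_spring=1.0):
    """Mean-square fluctuations <dR_i . dR_i> of Eq. (6), from the
    pseudo-inverse of Gamma restricted to the non-zero modes."""
    w, u = gnm_modes(gamma)
    return 3.0 * kT_over_spring * np.diag((u / w) @ u.T)
```

On a straight toy chain the ends, having the fewest spring constraints, show the largest predicted fluctuations, while symmetry-related sites fluctuate identically.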
Figure 1. Normalized fluctuations in the first four vibrational modes of the motor domain
of kinesin. (a) Mode 1: loop L11 experiences the largest fluctuation; (b) Mode 2: vibration
of both ends; (c) Mode 3: microtubule binding regions; (d) Mode 4: switch I.
relay helix, loop L12 (269-268), helix α5 (279-290). The tip of the protein is
again involved in a large-amplitude vibration, which is now correlated with
the microtubule binding elements and also with the C-terminal of the protein.
The fourth vibrational mode is depicted in figure 1(d). This mode draws
our attention to the vibrations of the two switches of the motor domain
(switch I: residues 199 and 200, and switch II: residue 232) and the mechanical
elements in their neighborhoods, in particular those in proximity of switch
I, helix α3 (174-189) and loop L9 (190-202), but there is also a lower peak
corresponding to switch II. This may explain how the chemistry is indeed
affected by a mechanical force acting on the protein. If we suppose that
this force is transmitted through the neck-linker to the C-terminus, then the
elastic structure of the protein transmits these vibrations to the switches, and
therefore the rate of binding and/or dissociation of nucleotides can be affected
by the mechanical force acting on the protein, as observed by Visscher et al. 24.
3 Conclusions
In this paper we analyzed the motor domain of kinesin with the simple Gaus-
sian Network Model 6. This analysis relies on the idea that the conformational
change of the kinesin protein should not be sought in a conformational change
of the motor domain. Indeed, as proposed by Vale et al. 9, it is likely that mo-
tions within the motor domain are small. It is also unlikely that the catalytic
core undergoes the large interdomain motions which would be needed to drive
efficient unidirectional motility.
As shown by experiments, the directionality is not determined by the cat-
alytic core, but rather by the adjacent neck-linker region. Therefore the search
for a conformational change of the kinesin motor domain might be fruitless,
since the conformational change may not be observable experimentally, or
detectable by computer simulations. Instead, the transmission of mechanical
strain between regions in the motor domain is of extreme importance.
Despite its simplicity, the GNM has been used to address such an issue.
The slowest modes drew our attention to structural elements which were in-
deed shown to be important in recent experiments, but also to other elements
which have not been investigated yet. In particular, the GNM analysis seems
to indicate that the tip region (which is also thought to interact with the
neck-linker) plays an important role, not only to counterbalance motions of
the other parts, but mostly as a possible mechanical communication channel^b
^b A more comprehensive analysis, where also the kinetics of motor proteins is taken into
account 26,27, can be found in GL's PhD Thesis 28, available on request.
among the slowest vibrational modes and therefore the structural elements
that are most important for the biological function of the motor.
Our opinion on the way the motor domain of kinesin may effectively make
use of the binding energy of ATP to generate strain on the neck-linker is as
follows: the phosphate sensor senses the presence of ATP by direct contacts
with the third phosphate. These newly formed contacts activate some of the
slowest vibrational modes of the motor domain, the first and fourth in our
model, for instance.
The vibration of the switch regions and their adjacent parts is accompa-
nied by the activation of other regions which may be far apart in the structure,
yet whose vibrations are strongly correlated with those in the proximity of ATP,
in particular the tip.
This works as a mechanical amplifier: its vibrations activate all the slowest
vibrational modes, the first one activating the relay helix, the
second one activating the neck-linker joined at one of the termini, and the
third one allowing the motor domain to rotate on the microtubule binding
site.
This scheme is consistent with previous experiments and with the current
switch-based mechanism, as recently proposed by Kikkawa et al. 23; it is also
consistent with the picture obtained by Wriggers 25 using all-atom computer
simulations, but requires only a negligible fraction of the corresponding CPU
time (the only CPU-intensive calculation being the Jacobi diagonalization of
a symmetric matrix).
In addition, our analysis suggests a direct correlation between switches I and
II and the C-terminal part of the domain. This dependence could effectively
explain how the chemistry could be affected by a mechanical force, as observed
by Visscher et al. 24.
The correlation was weaker (essentially active only in mode 2) for the
N-terminus, which seems to be more stable against vibrational motions, at
least in the available structure. This may imply that chimeras with the
neck-linker attached to the N-terminus of this motor domain could be less
efficient than their natural counterparts.
More importantly, our analysis seems to suggest that a suitably designed
experiment aimed at constraining the tip of the protein could affect the
communication among the mechanical elements of the motor domain, by
suppressing the main communication channel among the slowest vibrational
modes; therefore such an experiment, if feasible, could affect mobility and/or
the rate of ATP binding/ADP dissociation.
Our conclusion is that the GNM, or similar coarse-grained models, could
be extremely useful in predicting the pathway along which mechanical strain
propagates.
References
1. H. Tiedge et al., Proc. Natl. Acad. Sci. USA 98, 6997 (2001).
2. Y. Ishii et al., TRENDS Biotech. 19, 211 (2001).
3. A. Ishijima and T. Yanagida, TRENDS Bioch. Sci. 26, 438 (2001).
4. H. Lodish et al., Molecular Cell Biology (Scientific American Books, New
York, 2001).
5. J. Howard, Mechanics of Motor Proteins and the Cytoskeleton (Sinauer
Associates, Sunderland, MA, 2001).
6. R. P. Feynman et al., The Feynman Lectures on Physics (Addison-Wesley,
Reading, MA, 1966).
7. J. Howard et al., Nature 342, 154 (1989).
8. S. M. Block et al., Nature 348, 348 (1990).
9. R. D. Vale and R. A. Milligan, Science 288, 88 (2000), and references
therein.
10. I. Rayment et al., Science 261, 50 (1993).
11. F. J. Kull et al., Nature 380, 550 (1996).
12. S. Weiss, Science 283, 1689 (1999).
13. Y. Ishii et al., Chem. Phys. 247, 163 (1999).
14. C. Micheletti et al., Proteins 42, 422 (2001).
15. A. Maritan et al., Phys. Rev. Lett. 84, 3009 (2000).
16. A. Maritan et al., Nature 406, 6793 (2000).
17. A. R. Atilgan et al., Biophys. J. 80, 505 (2001), and references therein.
18. I. Bahar and R. L. Jernigan, J. Mol. Biol. 281, 871 (1998).
19. B. Lustig et al., Nucl. Ac. Res. 26, 5212 (1998).
20. I. Bahar et al., Phys. Rev. Lett. 80, 2733 (1998), and references therein.
21. F. C. Bernstein et al., J. Mol. Biol. 112, 535 (1977).
22. P. J. Flory, Proc. Roy. Soc. London A 351, 351 (1976).
23. M. Kikkawa et al., Nature 411, 439 (2001).
24. K. Visscher et al., Nature 400, 184 (1999).
25. W. Wriggers and K. Schulten, Biophys. J. 75, 646 (1998).
26. G. Lattanzi and A. Maritan, Phys. Rev. Lett. 86, 1134 (2001).
27. G. Lattanzi and A. Maritan, Phys. Rev. E 64, 061905 (2001).
28. G. Lattanzi, Statistical Physics Approach to Protein Motors, PhD thesis,
International School for Advanced Studies, SISSA, Trieste, 2001.
1 Introduction
Figure 1. Genesis of diffracted beams by interaction of a three-dimensional crystal with an incident x-ray
beam. Reconstruction of the crystal by inverse Fourier transform.
Unfortunately we lose in the experiment the phase of F(r*) and are only able
to measure its modulus. Indeed the intensity of each diffracted beam (the only
observable), marked by a triple of integers (h k l), is related to |F_hkl|^2 by

I_hkl ∝ |F_hkl|^2,  where  F_h = Σ_{j=1}^{N} f_j exp(2πi h·r_j) = |F_h| exp(iφ_h)   (1)

is the structure factor with vectorial index h = (h k l), f_j is the scattering factor of the
jth atom (thermal factor included), r_j is its position in the unit cell, N is the number
of atoms in the cell and φ_hkl is the phase of the structure factor F_hkl.
A typical experimental outcome is shown in Fig. 2, where a set of diffracted
beam intensities is collected over an area detector.
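The content of eq. (1) can be made concrete with a short numerical sketch. The one-dimensional "cell", scattering factors and coordinates below are invented for illustration: the measured intensity fixes |F_h| but erases φ_h.

```python
import numpy as np

# Toy one-dimensional cell: F_h = sum_j f_j exp(2*pi*i*h*x_j).
# The experiment records only I_h ~ |F_h|^2, so the phase phi_h is lost.
f = np.array([6.0, 8.0, 1.0])      # illustrative scattering factors
x = np.array([0.10, 0.35, 0.70])   # illustrative fractional coordinates

def structure_factor(h):
    return np.sum(f * np.exp(2j * np.pi * h * x))

F1 = structure_factor(1)
intensity = abs(F1) ** 2           # the observable quantity
phase = np.angle(F1)               # lost in the measurement
```

Note that F_000 reduces to the sum of the scattering factors, consistent with case a) of the structure invariants discussed later in the chapter.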
Figure 2. Reciprocal-space plane of an oxidized form of the enzyme rhodanese, space group C2,
a=156.2 Å, b=49.04 Å, c=42.25 Å, β=98.6° [from Gliubich F., Gazerro M., Zanotti G., Delbono S.,
Bombieri G. and Berni R., Active Site Structural Features for Chemically Modified Forms of Rhodanese,
J. Biol. Chem. 271 (1996) pp. 21054-21061].
The question now is: how to obtain the phases from the moduli? If the inverse
Fourier transform of |F(r*)|^2 is calculated the result is the Patterson function:

P(u) = T^{-1}[|F(r*)|^2] = ∫ F(r*) F̄(r*) exp(-2πi r*·u) dr*,   (2)

where F̄(r*) is the complex conjugate of F(r*). Owing to the convolution theorem
we have

P(u) = ρ(r) ⊗ ρ(-r) = ∫ ρ(r) ρ(r + u) dr.   (3)

Thus P(u) is the autoconvolution of the electron density: its maxima correspond
to the interatomic vectors, not to the atomic positions. The atomic positions may be
obtained only if the Patterson is "deconvoluted", which is a quite difficult problem
for large structures. In spite of this difficulty, a general suggestion emerges from
eq. (3). If we assume that:
i) the Patterson function is univocally defined from the collected diffraction
moduli;
ii) the Patterson function univocally defines (in principle) the interatomic
vectors;
iii) the interatomic vectors univocally define the crystal structure,
then we can conclude that the set of diffraction moduli contains all the
information necessary to define the crystal structure. This conclusion encouraged
several scientists to obtain phases directly from the moduli, without passing through
the Patterson function: these methods are called "direct methods", and their basic
concepts are described in Section 3.
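The Patterson construction of eqs. (2)-(3) can be checked numerically on a one-dimensional grid (the grid size, atom positions and weights below are arbitrary): the inverse transform of |F|^2 peaks at the interatomic vectors, not at the atomic positions.

```python
import numpy as np

# Patterson function on a 1-D grid: the inverse transform of |F|^2 is the
# (circular) autocorrelation of the density.
n = 256
rho = np.zeros(n)
rho[20] = 6.0
rho[80] = 8.0                             # two point 'atoms' at 20 and 80
F = np.fft.fft(rho)
P = np.fft.ifft(np.abs(F) ** 2).real      # Patterson on the same grid
# P peaks at u = 0 (self-vectors) and at u = +-(80 - 20) = +-60,
# i.e. at grid points 0, 60 and 256 - 60 = 196 -- not at 20 or 80.
```

The peak heights are products of the atomic weights (100 = 6^2 + 8^2 at the origin, 48 = 6*8 at the interatomic vector), which is exactly why deconvoluting a crowded Patterson is hard for large structures.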
The larger the number of measured moduli, the larger the amount of available
experimental information. Accordingly, the aim of each diffraction experiment is to
collect a set of experimental intensities as extended as possible. The ideal situation
occurs when the number of observations is much larger than the number of
structural parameters to find (in this case we say that the structure is overdetermined
by the data). It is usual to allocate nine parameters per atom in the asymmetric
unit: i.e., the three coordinates x, y, z and the six anisotropic thermal parameters b_ij.
In case of scarcity of experimental data, four parameters per atom are defined: the
three spatial coordinates and the isotropic vibrational factor B. We will see in
Section 4 that for proteins a single-wavelength diffraction experiment does not
usually provide sufficient information to overdetermine the crystal structure via
one set of experimental data.
The prior information available before a diffraction experiment usually reduces to:
the positivity of the electron density, i.e. ρ(r) > 0, and consequently f_j > 0 for
j = 1, ..., N;
the atomicity: i.e., the electrons are concentrated around the nuclei.
The above information, even if apparently trivial, constitutes a strong restraint
on the allowed phase values. Indeed, let:
a) S = {h_0 = 0, h_1, ..., h_n}
be a finite set of indices, origin included;
b) H[ρ(r)] = Σ_{i,j=0}^{n} F_{h_i - h_j} u_i ū_j
be the associated Hermitian form. Then the positivity of the electron density
implies that the determinants

D_S = det[(F_{h_i - h_j})] = det | F_0           F_{h_1-h_2}   ...   F_{h_1-h_n} |
                                 | F_{h_2-h_1}   F_0           ...   F_{h_2-h_n} |   (4)
                                 | ...           ...           ...   ...         |
                                 | F_{h_n-h_1}   F_{h_n-h_2}   ...   F_0         |

are non-negative. The converse is also true: if D_S ≥ 0 for all S then ρ is non-negative.
Since the analytical expression of D_S may involve phases, (4) may be considered as
a mathematical restraint for the phase values, generated by the positivity of the
electron density distribution.
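Inequality (4) can be verified directly on a toy one-dimensional structure with positive point atoms (the weights and positions below are made up): the Karle-Hauptman matrix is Hermitian and positive semidefinite, so its determinant is non-negative.

```python
import numpy as np

# Toy check of inequality (4): for a non-negative density the matrix
# (F_{h_i - h_j}) is Hermitian positive semidefinite, hence det >= 0.
f = np.array([6.0, 8.0, 7.0])        # positive 'atomic' weights
x = np.array([0.12, 0.41, 0.77])     # arbitrary fractional positions

def F(h):
    return np.sum(f * np.exp(2j * np.pi * h * x))

S = [0, 1, 3]                        # index set {h_0 = 0, h_1, h_2}
M = np.array([[F(hi - hj) for hj in S] for hi in S])
eigvals = np.linalg.eigvalsh(M)      # real, since M is Hermitian
D_S = np.linalg.det(M).real
```

Positive semidefiniteness follows because the Hermitian form equals Σ_a f_a |Σ_i u_i exp(2πi h_i x_a)|^2, a sum of non-negative terms.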
The above result has been exploited in the crystallographic literature by several
authors: we quote Harker and Kasper [19], Karle and Hauptman [26], and Goedkoop
[16]. More recently the determinantal techniques have been integrated with
probabilistic methods, giving rise to effective procedures for the solution of the
phase problem (Tsoucaris [34]; de Graaf and Vermin [3]).
Let us now come to the problem of extracting the phase information from the
moduli. We observe that the direct passage from the moduli {|F_h|} to the phases
{φ_h} is not allowed.
Figure 3. A unit cell with origin in O. A shift of origin to O' changes the positional vector r_j into r_j'.
The generic jth atom is in P_j, and r_j is its positional vector; F_h is the structure factor
with vectorial index h when the origin is taken in O. If we move the origin to O',
the new structure factor will be

F'_h = Σ_{j=1}^{N} f_j exp(2πi h·r'_j) = Σ_{j=1}^{N} f_j exp[2πi h·(r_j - X_0)] = exp(-2πi h·X_0) F_h,   (5)

where X_0 is the origin shift. We observe that the change of origin generates a phase
change equal to (-2π h·X_0). Many sets {φ_h} are therefore compatible with the same
set of magnitudes, each set corresponding to a given choice of the origin. This
implies that single phases (origin dependent) cannot be determined from the
diffraction moduli alone (observable quantities, and therefore origin independent).
Luckily there are combinations of phases which are origin independent: they only
depend on the crystal structure and therefore can be estimated via the diffraction
moduli. Let us consider the product

F_{h_1} F_{h_2} ... F_{h_n}.   (6)

Under the origin shift (5) this product transforms as

F'_{h_1} F'_{h_2} ... F'_{h_n} = exp[-2πi (h_1 + h_2 + ... + h_n)·X_0] F_{h_1} F_{h_2} ... F_{h_n}.   (7)

The relation (7) shows that the product of structure factors (6) is invariant
under origin translation if

h_1 + h_2 + ... + h_n = 0.
These products are called structure invariants. The simplest examples are:
a) for n=1, F_000 = Σ_{j=1}^{N} Z_j is the simplest structure invariant (Z_j is the
atomic number of the jth atom);
b) for n=2, eq. (6) reduces to |F_h|^2;
c) for n=3, eq. (6) reduces to F_{h_1} F_{h_2} F_{-h_1-h_2};
d) for n=4, eq. (6) reduces to F_h F_k F_l F_{-h-k-l}.
Quintet, sextet, ... invariants are defined by analogy. Invariants of order 3 or
larger are phase dependent and therefore potentially useful for solving the phase
problem. Triplets and quartets are the most important invariants.
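The origin independence of the triplet invariant (case c) is easy to confirm numerically; the weights, coordinates and origin shift below are arbitrary illustrative values.

```python
import numpy as np

# Check that phi_{h1} + phi_{h2} + phi_{-h1-h2} does not change when the
# origin is shifted, although each individual phase does.
f = np.array([6.0, 8.0, 7.0])
x = np.array([0.12, 0.41, 0.77])

def phase(h, shift=0.0):
    # shifting the origin by `shift` replaces x by x - shift, cf. eq. (5)
    return np.angle(np.sum(f * np.exp(2j * np.pi * h * (x - shift))))

h1, h2 = 1, 2

def triplet(shift):
    s = phase(h1, shift) + phase(h2, shift) + phase(-h1 - h2, shift)
    return np.exp(1j * s)            # compare on the unit circle (mod 2*pi)

invariant_unchanged = abs(triplet(0.0) - triplet(0.237)) < 1e-9
single_phase_moves = abs(phase(h1, 0.0) - phase(h1, 0.237)) > 1e-3
```

Comparing exp(iΦ) rather than Φ itself avoids spurious 2π wrap-around differences.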
For each structure invariant Φ = φ_{h_1} + φ_{h_2} + ... + φ_{h_n}, a set of
diffraction magnitudes {R} = {R_{h_1}, R_{h_2}, ..., R_{h_p}} which is considered
useful for the estimation of the structure invariant is defined.
This may be made via the neighbourhoods principle by Hauptman [20] or via the
representation theory by Giacovazzo [9, 11].
Then the joint probability distribution function

P(Φ | R_{h_1}, R_{h_2}, ..., R_{h_p}) ≡ P(Φ | {R})   (9)

is calculated.
The mathematical approach for the calculation of (9) is the classical one: the
characteristic function of (9) is first derived (the atomic positional vectors may be
used as random variables uniformly distributed in the unit cell), then its Fourier
transform provides the required distribution. The most commonly used structure
invariants for which the distribution (9) is calculated are the triplet and the quartet
invariants.
Real space techniques [the set b)] are based on the cycle

{F}_i → ρ_i(r) → ρ_mod,i(r) → {F}_{i+1},

where {F}_i is the set of structure factors at the ith cycle, ρ_i(r) the corresponding
electron density, ρ_mod,i(r) the modified electron density function (modified to match
the expected behaviour of ρ), and {F}_{i+1} the set of structure factors at the (i+1)th
cycle. These techniques directly exploit the positivity condition without transforming
it into complex reciprocal-space relationships.
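A minimal sketch of one such cycle, using positivity as the density modification, is shown below. This is a generic illustration on an invented one-dimensional example, not the algorithm of any specific program: known moduli |F_obs| are kept at each cycle, while the phases come from the modified density.

```python
import numpy as np

# One turn of the real-space cycle {F}_i -> rho_i -> rho_mod,i -> {F}_{i+1}:
# observed moduli are imposed, phases are taken from the positivised density.
n = 128
rho_true = np.zeros(n)
rho_true[[10, 50, 90]] = [6.0, 8.0, 7.0]
F_obs = np.abs(np.fft.fft(rho_true))          # 'observed' moduli only

rng = np.random.default_rng(0)
phases = rng.uniform(-np.pi, np.pi, n)        # random starting phases
F = F_obs * np.exp(1j * phases)
for _ in range(200):
    rho = np.fft.ifft(F).real                 # current density estimate
    rho_mod = np.clip(rho, 0.0, None)         # enforce positivity p(r) >= 0
    F = F_obs * np.exp(1j * np.angle(np.fft.fft(rho_mod)))
```

Real programs use far more sophisticated density modifications and convergence criteria; the sketch only shows how the positivity condition enters the cycle without any reciprocal-space inequality.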
4 Phasing proteins
The impressive achievements in protein phasing obtained in the last few years are
mainly due (see the discussion below) to simultaneous advances in (see Fig. 4):
theoretical developments, increased computer power, new radiation sources, and
sophisticated experimental techniques.
There are intrinsic difficulties in recovering the protein phases directly from the
diffraction moduli:
a) the large number of atoms in the unit cell (i.e., only in a few cases does the
asymmetric unit contain fewer than 500 atoms; often there are more than 5000).
Under these conditions direct methods provide rather flat phase probability
distributions;
b) in accordance with equation (1), the relation

I_hkl ∝ 1/V

holds. Since the unit cell volume of a protein is large, the diffraction intensities will
be weaker than for small molecules, and therefore their measurement less accurate.
Quite efficient experimental techniques are therefore necessary for collecting data
for which the ratio I/σ(I) is sufficiently good;
c) protein molecules are irregular in shape and they pack together with gaps
between them filled by liquid. This constitutes an unordered region which ranges
from 35 to 75 per cent of the volume of the unit cell, giving rise to diffuse
scattering;
d) protein molecules are intrinsically flexible (to secure their biological
function), and their thermal vibration is generally high. Consequently their atoms
are poor scatterers.
The above drawbacks limit the number of observations available from a
diffraction experiment. While, for small molecules, the available number of
reflections per atom in the asymmetric unit is about 100 (i.e., for data up to 0.77 Å
resolution), the same number for proteins drops to about 12.5 if the data resolution
is 1.54 Å. Unfortunately the resolution for proteins is usually between 3 and 1.5 Å,
so that we are often in the case in which the diffraction data do not overdetermine
the crystal structure (the number of observations is comparable with or smaller than
the number of parameters to define). We will briefly consider two cases: the first
occurs when the data resolution is better than or equal to 1 Å. In this case the ab
initio crystal structure solution of the protein may be attempted directly, without
any use of supplementary data. The second case occurs when the resolution is worse
than 1 Å: supplementary information is then necessary.
Reciprocal space techniques were able to extend the complexity of the solvable
structures up to 200 atoms in the asymmetric unit. This limit has been overcome
in recent years (Weeks et al. [37]), when Shake-and-Bake introduced a new
approach: reciprocal and direct space techniques are cyclically and repeatedly
alternated. An effective variant of Shake-and-Bake is the "half-baked" approach of
the program SHELX-D (Sheldrick [32]), which preserves the cyclic combination of
direct and reciprocal space techniques, but relies more on real space techniques. A
third program, SIR2000 (Burla et al. [2]), proved able to solve crystal structures
with more than 2000 atoms in the asymmetric unit without any user intervention. It
is mainly based on real space techniques: the role of the tangent formula is
ancillary.
In all the above mentioned programs the procedure is the following: random
phases are given to a subset of structure factors, and direct methods are applied to
drive them towards the correct values. The approach is a multisolution one: several
random sets are explored to obtain the structure. The computing time necessary to
succeed may be considerable. As an example, in Table 1 we show, for some protein
structures, the CPU time needed to find the correct solution by application of
SIR2000: Nasym is the number of non-hydrogen atoms in the asymmetric unit,
N_H2O is the number of bound water molecules.
Table 1. Large size structures (up to 2000 atoms in the a.u.); the average structure solution time is
76.1 hours. Structure references in square brackets.

STRUCTURE CODE    Reference   Nasym - N_H2O   SIR2000 TIME (h)
TOXIN II          [33]        508 - 96        6.3
LACTAL            [18]        935 - 164       52.9
LYSOZIME          [4]         1001 - 108      1.0
OXIDOREDUCTASE    [7]         1106 - 283      78.4
HIPIP             [30]        1229 - 334      76.2
MYOGLOBINE        [35]        1241 - 186      19.1
CUTINASE          [28]        1141 - 264      293.2
ISD               [6]         1910 - 374      87.4
Non-ab-initio methods
The supplementary information is generally provided by:
isomorphous replacement techniques (Green et al. [17]; Bragg and Perutz [1]);
anomalous dispersion techniques (Hoppe and Jakubowski [25]; Hendrickson et
al. [23]);
molecular replacement (Rossmann and Blow [31]; Navaza [29]);
crystallochemical restraints.
Let us first examine the nature of the first three techniques.
Isomorphous Replacement. The method requires the preparation of one or more
heavy-atom derivatives of the protein in the crystalline state. The most common
technique is soaking the protein crystal in a solution of the reagent. Then X-ray
intensity data are collected both for the native protein and for its derivatives. One
speaks of SIR (Single Isomorphous Replacement) or MIR (Multiple Isomorphous
Replacement) according to whether one or more derivatives are available.
Anomalous dispersion. Atomic electrons can be considered as oscillators with
natural frequencies. If the frequency of the primary beam is near to some of these
natural frequencies resonance will take place, and the scattering factor may be
analytically expressed via the complex quantity f = f_0 + f' + i f''.
From the measured intensities the isomorphous differences

Δ_iso = |F_d| - |F_p|

can be obtained, where F_d represents the generic structure factor of the derivative
and F_p the corresponding structure factor of the native protein. Magnitudes and
signs of the Δ_iso are determined by the heavy-atom substructure; however Δ_iso
does not coincide with |F_H| (the modulus of the generic structure factor of the
heavy-atom substructure), owing to the fact that

F_H = F_d - F_p.
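A short numerical check of this point (the moduli and phases below are invented): because F_H = F_d - F_p is a relation between complex quantities, the scalar difference of moduli generally underestimates |F_H|.

```python
import numpy as np

# F_H = F_d - F_p is a vectorial (complex) relation, so the scalar
# difference D_iso = |F_d| - |F_p| does not reproduce |F_H|.
Fp = 10.0 * np.exp(1j * 0.8)   # hypothetical native structure factor
FH = 3.0 * np.exp(1j * 2.1)    # hypothetical heavy-atom contribution
Fd = Fp + FH                   # derivative structure factor
D_iso = abs(Fd) - abs(Fp)
# |D_iso| <= |F_H|, with equality only when F_p and F_H are collinear
```

By the triangle inequality |D_iso| can never exceed |F_H|, which is why Δ_iso only constrains, rather than determines, the heavy-atom substructure.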
SIR and MIR techniques aim first at finding the heavy-atom substructure: then
they use this information as a prior to phase the native protein. Alternative
techniques directly phase the protein from the Δ_iso (Hauptman [21]; Giacovazzo
and Siliqi [8]; Giacovazzo et al. [10, 12-14]). The overall problem is not trivial,
especially when lack of isomorphism occurs between the native and the derivative
(i.e., the introduction of the heavy atoms into the native crystal structure framework
generates too many conformational changes). In general, the signal (say the Δ_iso)
is of the same size as the error, and sophisticated techniques have to be used to
succeed.
Also SAS and MAD techniques use a two-step procedure: first the substructure
of the anomalous scatterers is found, and then this information is used as a prior for
phasing the protein. Alternative techniques directly phasing the protein from the
experimental data have also been proposed (Hauptman [22]; Giacovazzo [15]).
If SAS is used, only the anomalous differences

Δ_ano = |F_h^+| - |F_h^-|

are employed to locate the anomalous scatterers; however Δ_ano does not coincide
with

F'' = Σ_{j=1}^{N} f_j'' exp(2πi h·r_j).
The straightforward least-squares minimization of

S = Σ_j w_j (|F_j|_obs - |F_j|_calc)^2,

where the summation is extended to all the measured reflections, is not quite useful.
Luckily, bond lengths and valence angles in amino acids are very well known, so
they can be held fixed to their theoretical values during the refinement, and only
torsion angles around single bonds are allowed to vary (Diamond [5]). Also a group
of atoms can be treated as a rigid entity when the geometry of the group is believed
to be insensitive to the environment. This is the classical case of the phenyl ring: the
eighteen positional variables are then reduced to only six (three rotational, to define
the orientation of the ring, and three of translational type, to locate it). In this way
restraint terms of the type

Σ_j w_j (d_j(calc) - d_j(ideal))^2

are added to the function to be minimized, where d_j(calc) is calculated from the
structural model and d_j(ideal) is the expected value.
Also deviations from planarity can be minimized (for planar groups), as well as
the volumes of the chiral atoms (defined, for an α-carbon, by the product of the
interatomic vectors of the three atoms bound to it). The above restraints introduce
the amount of information necessary to obtain quite reliable structural models of
proteins.
References