

EUROPHYSICS LETTERS 1 December 1988
Europhys. Lett., 7 (7), pp. 663-669 (1988)

The Hebb Rule: Storing Static and Dynamic Objects in an Associative Neural Network.

A. HERZ, B. SULZER, R. KÜHN and J. L. VAN HEMMEN
Sonderforschungsbereich 123 an der Universität Heidelberg
D-6900 Heidelberg, Germany

(received 18 July 1988; accepted 20 September 1988)

PACS. 87.10 - General, theoretical, and mathematical biophysics.

Abstract. - The Hebb rule (Hebb, 1949) indicates how information presented to a neural
network during a learning session is stored in the synapses, local elements which act as
mediators between neurons. In this paper we demonstrate that the Hebb rule can be used to
handle both stationary and dynamic objects such as single patterns and cycles. The two main
ideas are: a) a broad distribution of delays as they occur in the natural dynamics, and
b) incorporation of the very same delays during the learning session. Our work shows that the
resulting procedure is robust and faithful.

For theoretical purposes a neural net is most conveniently described as a collection of
formal neurons, Ising spins S_i = ±1, connected by synapses whose efficacies are interpreted
as coupling strengths denoted by J_ij. Synapses transmit information between neurons.
According to Hebb's neurophysiological postulate for learning [1], they also store the
information content of data presented to a neural net during a learning session. More
precisely, the Hebb rule has two aspects. First, it is local. Given a synapse transmitting data
from neuron j to neuron i, only the local information presented to neurons i and j determines
the efficacy J_ij. Second, J_ij is allowed to increase if neuron i is active at the same time as it
receives the message of j's activity [2, 3]. In other words, J_ij is allowed to increase if the
activities of i and j, as noted by i, are positively correlated.
In this letter we show how Hebb's principle, if interpreted in a careful way, can be
implemented mathematically so as to handle both static objects (single patterns) and
temporal associations such as tunes and rhythms. That is, after the network has been taught
both single, stationary, patterns and temporal sequences - of course, by the same
principle! - it will reproduce them when triggered suitably.
The key idea is that signal delays, omnipresent in the brain and well known for their
stabilization of temporal sequences [4-6], have to be taken into account during the learning
phase as well. This kind of Hebbian learning presupposes a broad distribution of delays [3, 7]
and is, therefore, extremely robust. Neither does the performance of the network depend
significantly on the characteristics of the distribution nor is it deteriorated by imprecise
initial conditions such as noisy patterns or the theme BHCH instead of BACH. Our analysis

shows that pure Hebbian learning functions through synaptic selection, as advocated by
Changeux et al. [8, 9].
For the moment we focus our attention on two specific neurons, say i and j, with J_ij as the
synaptic efficacy for the information transport from j to i. If neuron j has fired, a solitonlike
pulse propagates via the axon to the synapse at neuron i and it is here, at the synapse,
where the information is stored, provided [2, 3] neuron i's activity is concurrent with or
slightly after the arrival of the pulse coming from j. The signal transport through the axon
takes a certain amount of time, the delay τ, whose distribution is very broad [3, 7]. So at
time t the signal S_j(t − τ) has to be paired with S_i(t), the state neuron i is in. In general,
delay can be modelled as a linear filter with a memory kernel [3].
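As an aside (ours, not part of the original text): in discrete time such a linear filter is just a convolution of the presynaptic history with a normalized memory kernel; a δ-kernel reproduces a pure axonal delay, an exponential kernel a synaptic one. A minimal Python sketch, with made-up kernel parameters:

```python
import numpy as np

def delayed_signal(s_history, kernel):
    """Filter a presynaptic spin history with a memory kernel.

    s_history[t] holds S_j(t) for t = 0..T-1; kernel[tau] is the weight of
    lag tau. Returns the filtered signal sum_tau kernel[tau] * S_j(t - tau).
    """
    T, L = len(s_history), len(kernel)
    out = np.zeros(T)
    for t in range(T):
        for tau in range(min(L, t + 1)):    # history before t = 0 is ignored
            out[t] += kernel[tau] * s_history[t - tau]
    return out

# Pure axonal delay of 5 steps: a delta-kernel concentrated at lag 5.
delta_kernel = np.zeros(10)
delta_kernel[5] = 1.0

# Synaptic (exponential) delay with a time constant of 3 steps.
lags = np.arange(10)
exp_kernel = np.exp(-lags / 3.0)
exp_kernel /= exp_kernel.sum()              # normalize the kernel weights
```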
Let us denote the size of the network by N. Throughout what follows, patterns are
specific Ising spin configurations {ξ_i^μ; 1 ≤ i ≤ N}, labelled by 1 ≤ μ ≤ q. It is only for the sake
of convenience that we work with unbiased random patterns where ξ_i^μ = ±1 with equal
probability. During a learning session, the system is offered a finite sequence of patterns
{ξ_i^{ν(t)}; 1 ≤ i ≤ N}, where ν(t) is a given function of t. According to Hebb's principle [1], at each
instant of time the synaptic efficacy J_ij increases or decreases by an infinitesimal amount
proportional to S_i(t) S_j(t − τ). Let us suppose that the learning session is of duration T and,
for the sake of simplicity, that we start with a tabula rasa (J_ij = 0). Then we obtain, adding
the partial increments,

$$ J_{ij}(\tau) = \frac{\varepsilon(\tau)}{N} \int_0^T \mathrm{d}t \; S_i(t)\, S_j(t-\tau). \tag{1} $$

In the integrand, S_i(t) = ξ_i^{ν(t)}. N^{-1} is a trivial scaling factor and ε(τ) is a weight of the
connection j → i with delay τ. In passing we note that the increments in (1) are linear (new
data are just added) and that by imposing upper and lower bounds [10-12] or by allowing
leakage one can induce forgetfulness. We will not pursue this issue here, however. It is also
worthwhile to realize that the prescription (1) in conjunction with a broad distribution of
delays τ strongly deviates from all the cycle-generating mechanisms which are known at
present [4-6, 9, 13, 14].
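In discrete time, the prescription (1) can be rendered as a few lines of code. The sketch below is ours, not the authors': the function name `hebb_learn`, the uniform default weights ε(τ), and the tabula rasa start are illustrative choices.

```python
import numpy as np

def hebb_learn(S, tau_max, eps=None):
    """Discrete-time version of eq. (1).

    S:       array of shape (T, N); S[t, i] = S_i(t) in {-1, +1} is the
             stimulus imposed on the network during the learning session.
    tau_max: largest delay present in the network.
    eps:     a priori weights eps[tau]; uniform and normalized if omitted.
    Returns J of shape (tau_max + 1, N, N) with
    J[tau, i, j] = eps[tau]/N * sum_t S_i(t) * S_j(t - tau).
    """
    T, N = S.shape
    if eps is None:
        eps = np.ones(tau_max + 1) / (tau_max + 1)
    J = np.zeros((tau_max + 1, N, N))       # tabula rasa: J_ij = 0
    for tau in range(tau_max + 1):
        for t in range(tau, T):             # pair S_i(t) with S_j(t - tau)
            J[tau] += np.outer(S[t], S[t - tau])
        J[tau] *= eps[tau] / N
    return J
```

Note that the increments are indeed linear: showing the network a second stimulus simply adds further outer products to J.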
If the pattern shown to the network is stationary, i.e. S_i(t) = ξ_i^μ for some fixed μ and
0 ≤ t ≤ T, while T exceeds all the τ's of the system (a natural consequence of the pattern
being stationary), then the J_ij become symmetric and, despite the delays, the performance
of the network turns out to be that of the Hopfield model [15]. Plainly, in general the J_ij are
asymmetric.
The Hebb rule (1) gives rise to a retrieval mechanism which is extremely robust. Neither
does its performance depend on the specific model of synaptic transmission nor on the
precise distribution of the delays τ. To illustrate this statement we study three models, to be
denoted by A), B), and C).
A) For each pair (i, j) there is a large number of axons (possibly, interneurons) whose
delays τ have a distribution independent of i and j. Summing over the incoming signals, we
obtain the local field

$$ h_i(t) = \sum_{j} \sum_{\tau} J_{ij}(\tau)\, S_j(t - \tau). \tag{2a} $$

The weights ε(τ) are chosen according to a given distribution of delays τ. Of course, one can
also imagine that only a single axon links each pair of neurons. This idea has led us to
consider two further models.

B) In addition to the axonal delay, which is assumed to depend on j only, we have a
synaptic (exponential) delay at i. Then

$$ h_i(t) = \sum_{j} J_{ij}(\tau_{ij})\, \tilde S_j(t), \tag{2b} $$

where τ_ij stands for the pair (τ_i, τ_j), and where S̃_j(t) incorporates the different types of
delay. To simplify the discussion, we only consider the case where τ_i vanishes so that
S̃_j(t) = S_j(t − τ_j). Furthermore, we take ε(τ) = 1.
C) For each pair (i, j) there exists a single axon with delay τ_ij which is sampled from a
given distribution independent of i and j, so that h_i(t) is of the form (2b), with
S̃_j(t) = S_j(t − τ_ij). Again, ε(τ) = 1.
The models A) and B) are random-site problems which are analytically soluble in terms of
sublattice magnetizations [16, 17]. Model C), however, is a random-bond problem and no
exact solution is known yet.
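For readers who wish to experiment, retrieval in model A) can be sketched as follows (our code, not the authors'; it combines couplings from the `hebb_learn` sketch above with single-spin-flip Glauber updates at inverse temperature β):

```python
import numpy as np

def retrieve(J, S_init, steps, beta, rng):
    """Sequential Glauber dynamics for model A.

    J:      couplings J[tau, i, j], e.g. as returned by hebb_learn.
    S_init: spin history of shape (tau_max + 1, N) prescribing S(t) for t <= 0.
    steps:  number of Monte Carlo steps per spin (MCS) to run.
    Returns the full spin history, shape (tau_max + 1 + steps, N).
    """
    tau_max_p1, N = S_init.shape
    S = np.vstack([S_init, np.zeros((steps, N))])
    for t in range(tau_max_p1, tau_max_p1 + steps):
        S[t] = S[t - 1]                     # start the sweep from the last state
        for i in rng.permutation(N):        # one sweep = 1 MCS
            # local field h_i(t) = sum_tau sum_j J[tau, i, j] * S_j(t - tau)
            h = sum(J[tau, i] @ S[t - tau] for tau in range(tau_max_p1))
            p_up = 0.5 * (1.0 + np.tanh(beta * h))
            S[t, i] = 1.0 if rng.random() < p_up else -1.0
    return S
```

The overlaps m_μ(t) discussed below can then be read off as `xi[mu] @ S[t] / N`.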
The problem of learning and recalling a temporal pattern sequence is well suited to
illustrate the most salient features of the Hebb rule (1). For the sake of definiteness we
consider a cycle of q unbiased random patterns {ξ_i^μ; 1 ≤ i ≤ N} with 1 ≤ μ ≤ q, each of
duration Δ. During the learning session of duration T, we then impose upon the network a
time-dependent stimulus S_i(t), 0 ≤ t ≤ T, of the form S_i(t) = ξ_i^{ν(t)} with ν(t) = μ (mod q) for
(μ − 1)Δ ≤ t ≤ μΔ. The corresponding J_ij(τ) are readily calculated. We write τ = (n_τ + d_τ)Δ,
where n_τ is either zero or a positive integer and 0 ≤ d_τ < 1. After some algebra we find

$$ J_{ij}(\tau) = \frac{\varepsilon(\tau)\, T}{qN} \sum_{\mu=1}^{q} \xi_i^{\mu} \left[ (1 - d_\tau)\, \xi_j^{\mu - n_\tau} + d_\tau\, \xi_j^{\mu - n_\tau - 1} \right], \tag{3} $$

the upper pattern indices being counted modulo q.

Apart from the a priori weights ε(τ), eq. (3) exhibits a «resonance phenomenon»: the J_ij(τ)
with delays τ which are integer multiples of Δ and thus match the timing of the external
stimulus are the ones that receive maximum strength. Note that they are also the ones that
would support a stable cycle of exactly the same period (neglecting transition times) as that
of the external stimulus. Thus, due to a subtle interplay between external stimulus and
internal architecture (distribution of τ's), the Hebb rule (1), which prima facie appears to be
instructive in character, has in fact also pronounced selective aspects [8, 9].
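The resonance is easy to check numerically. The sketch below (ours; it reuses `hebb_learn` from above, with unnormalized weights ε(τ) = 1 so that the printed numbers are easy to read) measures how strongly J(τ) couples pattern μ − n_τ to pattern μ; the strength peaks at τ = 0, Δ, 2Δ:

```python
import numpy as np

rng = np.random.default_rng(0)
N, q, Delta, reps = 200, 3, 10, 20
xi = rng.choice([-1.0, 1.0], size=(q, N))   # q unbiased random patterns

# Stimulus: the q patterns as blocks of duration Delta, cycled `reps` times.
S = np.vstack([np.repeat(xi, Delta, axis=0)] * reps)

tau_max = 2 * Delta
J = hebb_learn(S, tau_max, eps=np.ones(tau_max + 1))

# Strength with which J(tau) realizes the cyclic shift mu - n -> mu.
for tau in range(tau_max + 1):
    n = round(tau / Delta)
    strength = np.mean([xi[mu] @ J[tau] @ xi[(mu - n) % q] / N
                        for mu in range(q)])
    print(tau, f"{strength:.1f}")           # maxima at tau = 0, Delta, 2*Delta
```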
The dynamics of the retrieval process is described most conveniently in terms of the
overlaps m_μ(t) = N^{-1} Σ_i ξ_i^μ S_i(t). We concentrate on one specific finite cycle and refer to the
literature [6] for the treatment of extensively many patterns outside the cycle. Model C) is a
random-bond problem whose analytic solution is not available. So we concentrate on models
A) and B). An analytic treatment of their dynamics can be formulated in terms of the
sublattices I(x) = {i; ξ_i = x}, where ξ_i = (ξ_i^1, ..., ξ_i^q), and the corresponding sublattice
magnetizations m(x; t). For full details, see Riedel et al. [6]. The overlaps m_μ(t) are then
reconstructed through m_μ(t) = Σ_x p(x) x_μ m(x; t), where x ranges through {−1, 1}^q and
p(x) = N^{-1} |I(x)| is the fraction of sites belonging to I(x). It, therefore, suffices to study
the m(x; t). For model A), one easily verifies that, as N → ∞, the local field h_i(t) does not
depend on the specific site i in I(x) and, therefore, may be written h(x; t). In the case of
parallel dynamics at inverse temperature β, where one updates all the spins after each
elementary time step Δt, one finds [6] the simple equation m(x; t + Δt) = tgh[βh(x; t)]. Here
we concentrate on sequential dynamics of the Glauber type, where [6]

$$ \dot m(x; t) = -\Gamma\{ m(x; t) - \operatorname{tgh}[\beta h(x; t)] \}. \tag{4} $$

The dot denotes differentiation of m with respect to time and Γ is the mean attempt
rate. We choose the unit of time so that one time unit equals 1 Monte Carlo step per spin
(MCS); whence Γ = 1. Because of the delays in h(x; t), eq. (4) is a so-called functional
differential equation. In case of model B), one has to extend the notion of sublattice slightly
[3] so as to take into account the delays through I(x; a) = {i; ξ_i = x, τ_i = a}. As for the rest,
no change.
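To see what the delays do to eq. (4), a toy instance can be integrated with a simple Euler scheme. The sketch below is ours: a single sublattice, a single δ-function delay τ_d, and made-up parameters. The delayed argument is precisely what makes (4) a functional differential equation: the whole history over [t − τ_d, t] is part of the state.

```python
import numpy as np

# Toy instance of eq. (4):
#   dm/dt = -Gamma * ( m(t) - tanh[ beta * J * m(t - tau_d) ] )
Gamma, beta, J, tau_d, dt = 1.0, 10.0, 1.0, 5.0, 0.01
n_delay = int(tau_d / dt)                   # delay measured in Euler steps
steps = 5000

m = np.zeros(steps + n_delay)
m[:n_delay] = 1.0                           # prescribed history for t <= 0
for k in range(n_delay, steps + n_delay - 1):
    h = J * m[k - n_delay]                  # delayed field h(t) = J m(t - tau_d)
    m[k + 1] = m[k] + dt * (-Gamma) * (m[k] - np.tanh(beta * h))
```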
If the learning session lasts long enough (T → ∞), then the overlaps may be used to derive
a more convenient expression for the J_ij than eq. (1) - even if the cycle taught consists of
pulses with a smoother form than the blocks used until now. We only assume that at each
transition ν → ν + 1 the spin flips are stochastic and such that m_ν + m_{ν+1} = 1. By the ergodic
theorem, we then get

$$ J_{ij}(\tau) = \frac{\varepsilon(\tau)}{N} \sum_{\mu,\nu=1}^{q} \xi_i^{\mu}\, \xi_j^{\nu}\, Q_{\mu\nu}(\tau), \tag{5} $$

where

$$ Q_{\mu\nu}(\tau) = \frac{T}{T'} \int_0^{T'} \mathrm{d}t \; m_{\mu}(t)\, m_{\nu}(t - \tau), \tag{6} $$

with T' = qΔ denoting the duration of the cycle presented to the network.
We now turn to the retrieval performance. As shown by figs. 1-4, the Hebb rule (1) allows
a faithful representation of a cycle which we have taught the network. It is satisfying, even
remarkable, that the retrieval quality hardly depends on the specific model (fig. 1).

Fig. 1. - Model dependence. The overlap with the first pattern of a cycle consisting of three patterns
(3-cycle) is shown as a function of time t: a) model A), b) model B), c) model C). Though the transients
are different, the long-term behaviour of the overlaps is strikingly similar. In all three cases, the
number of neurons is N = 256. Dynamics is sequential and β = 10. The initial conditions are m_1(t) = 1
for −1 MCS ≤ t ≤ 0 and m_μ(t) = 0, otherwise. We took a discrete distribution of axonal δ-function
delays at τ = 0, 1, ..., 30 MCS. The weights ε(τ) are uniform and normalized to 1. During the learning
session each pattern lasted Δ = 10 MCS.
Fig. 2. - Overlap m_1(t) with the first pattern of a 3-cycle as a function of time for model A). The dashed
line indicates the analytic solution (N = ∞), and the solid line the simulation result (N = 1024): a) m_1(t)
itself, b) ln[m_1(t)], c) ln[1 − m_1(t)]. Initial conditions and other parameters are as in fig. 1, except that
β = 20. Note that for the finite system m_1(t) is necessarily discrete with spacing 2/N, which is clearly
visible on the logarithmic scales of b) and c).

Fig. 3. - Dependence upon delay distribution for model B). Each track represents the time evolution of
the overlap with the first pattern of a 3-cycle, m_1(t), for a given distribution of axonal delays. The delay
distribution is displayed in the box to the right of the m_1(t)-plot to which it belongs. All the
distributions are discrete, with a spacing of one MCS. Note that a stable cycle is produced even in the
absence of symmetric synapses without delay. The proviso is that Δ < τ_max. If this condition is not
satisfied, as in a), the cycle is not retrieved. In b)-d) variations of the τ distributions do affect the
transients, but they hardly change the large-t behaviour. The system size is N = 512 and the dynamics
is sequential with β = 10. During the learning session, each pattern lasted Δ = 10 MCS.
Fig. 4. - Dependence upon initial conditions. Imagine the system has been taught the theme BACH,
all notes having the same duration Δ. The overlaps with B, A, C and H (from top to bottom) have
been plotted as a function of time. After the network has been given a pattern sequence with a wrong
timing as initial condition for −τ_max ≤ t ≤ 0 (A lasting much longer than B, C and H), the cycle
with its correct timing is spontaneously retrieved. The simulation result is shown for model A), with
N = 512 and a uniform discrete distribution of delays (τ = 0, 1, ..., 40 MCS). Here we have sequential
dynamics with β = 10.

The Hebbian performance, though, exhibits a dichotomy: either the job is done so well that the
dependence upon the distribution of the delays, though existent, is extremely hard to
discern, or the cycle does not run at all because, as a rule of thumb, the duration Δ of each
pattern exceeds τ_max, so that the system cannot determine the pattern lifetime (fig. 3).
Thermal noise, which is measured by the inverse temperature β, gives rise to a similar
dichotomic behaviour. Below a critical β_c, no stable cycle exists. On the other hand, if β > β_c,
the performance of the system is only marginally temperature dependent. The critical β_c,
however, is model dependent. For instance, with N = 128 and initial conditions as in fig. 1, β_c
turns out to be 8.0, 8.3, and 9.5 for models A), B), and C), respectively.
Figure 2 shows two things. First, the performance of the network agrees well with the
prediction of eq. (4). Second, as β becomes large (in fig. 2, β = 20) the overlaps m_μ(t) are
locally of the form a + b exp(−t) for suitable constants a and b. This is easily understood if
we scrutinize eq. (4), replace tgh[βh(x; t)] by sgn[h(x; t)], since β is large, and realize that
the sign function is piecewise constant. Then each m(x; t) is of the indicated form. The
overlap m_μ(t) is a linear combination of the m(x; t) and the statement follows.

Error correction is also performed by a Hebbian network. If the system has been taught
the theme BACH and afterwards it is presented a faulty version of it, say with wrong order
(BHCH) or wrong timing (A lasting much longer than B, C, and H), then it will readily
reproduce the correct tune (fig. 4). As in the static case [8, 15-17] the system functions as a
pattern recognizer except that here «pattern» is a spatio-temporal object, where space
refers to phase space and time to temporal order in the cycle. The «pattern» presented to
the network will be recognized, provided it is close enough in space and time to one of the
stored prototypes.

In view of fig. 2 and 4, some comment on the usual cycle-generating models [4-6] seems to
be in order. First, this type of model presupposes synaptic efficacies of the form
J_ij = J_ij^{(1)} + J_ij^{(2)}, where

$$ J_{ij}^{(1)} = N^{-1} \sum_{\mu} \xi_i^{\mu}\, \xi_j^{\mu} $$

is symmetric and

$$ J_{ij}^{(2)} = \varepsilon N^{-1} \sum_{\mu} \xi_i^{\mu+1}\, \xi_j^{\mu} $$

with ε > 1 is to be paired with a delayed S_j(t). The form of the J_ij suggests that i) the second
term «pushes» the system through an energy landscape created and stabilized by the first
and ii) the patterns neatly follow each other in the natural order 1, 2, 3, ... et cycl. Figure 2
shows, however, that for δ-function delay the symmetric, «stabilizing» term may be
completely absent and nevertheless a stable cycle may exist. We have also verified [6] that,
if exponential delay is dominating (which in the present context is not very plausible), no
stable cycle exists. Furthermore, J_ij^{(2)} alone does not induce the natural order. To wit,
suppose that S̃_j(t) = S_j(t − τ), while τ = 4Δ, and that we start with the initial condition
ν(t) = 1, 3, 4, 2 (or BCHA) for −τ ≤ t ≤ 0, each pattern lasting Δ. For t ≥ 0 a conventional
cycle generator [4-6] then will play 2, 4, 1, 3 and shifted variations thereof. Simply note that
in this case

$$ h_i(t) = \varepsilon \sum_{\nu} \xi_i^{\nu+1}\, m_{\nu}(t - \tau), $$

so that the natural order 1, 2, 3, 4 (or BACH) never appears. This picture does not change if
ξ_i^{ν+1} is replaced by ξ_i^{ν}, or if the symmetric term is added and ε > 1, as in ref. [4-6]. Hebbian
learning guarantees that faulty themes are transformed asymptotically into their correct
counterparts (see fig. 4).
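The ordering argument can be verified at the level of slot bookkeeping (our sketch; it only tracks which pattern label the J^(2) term pushes at each slot, not the full spin dynamics):

```python
# Patterns 1..4 stand for B, A, C, H; the faulty theme BCHA = 1, 3, 4, 2 is
# played during -tau <= t <= 0, one pattern per slot of length Delta.
history = [1, 3, 4, 2]

def play(history, n_slots, q=4, delay_slots=4):
    """A pure J^(2) generator with delay tau = 4*Delta: the pattern played
    one full delay ago is mapped onto its cyclic successor nu -> nu + 1."""
    played = list(history)
    for _ in range(n_slots):
        nu = played[-delay_slots]           # pattern seen one delay ago
        played.append(nu % q + 1)           # its successor (1-based, cyclic)
    return played[len(history):]

print(play(history, 8))                     # -> [2, 4, 1, 3, 3, 1, 2, 4]
```

The natural order 1, 2, 3, 4 indeed never appears; the generator keeps cycling through shifted variations of 2, 4, 1, 3.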
In summary, a prerequisite for the working of any learning scheme is that the structure
of the learning task is compatible with the network architecture and the learning algorithm.
In the present context, the task is to store spatio-temporal objects, such as stationary
patterns, sequences and cycles. The internal representability of these objects is guaranteed
by a broad distribution of delays τ in conjunction with a high connectivity. The
representation itself is accomplished by the Hebb rule, eq. (1), in that the Hebbian synapses
J_ij(τ) measure and store the correlations of the external stimulus in space (i, j) and time (τ).
The dynamics of the neural network, operating with the very same delays, is able to extract
the spatio-temporal information encoded in the J_ij(τ). Retrieval is therefore extremely
robust.
***
The authors are most grateful to A. AERTSEN and T. BONHOEFFER (MPI für Biologische
Kybernetik, Tübingen) for their help and advice. This work has been supported by the
Deutsche Forschungsgemeinschaft.

REFERENCES

[1] HEBB D. O., The Organization of Behavior (Wiley, New York, N.Y.) 1949, p. 60.
[2] KELSO S. R., GANONG A. H. and BROWN T. H., Proc. Natl. Acad. Sci. USA, 83 (1986) 5326.
[3] For further details and references, see: HERZ A., SULZER B., KÜHN R. and VAN HEMMEN J. L.,
to be submitted to Biol. Cybernet.
[4] KLEINFELD D., Proc. Natl. Acad. Sci. USA, 83 (1986) 9469.
[5] SOMPOLINSKY H. and KANTER I., Phys. Rev. Lett., 57 (1986) 2861.
[6] RIEDEL U., KÜHN R. and VAN HEMMEN J. L., Phys. Rev. A, 38 (1988) 1105, and to be published.
[7] BRAITENBERG V., in Brain Theory, edited by G. PALM and A. AERTSEN (Springer, Berlin) 1986,
p. 81.
[8] TOULOUSE G., DEHAENE S. and CHANGEUX J.-P., Proc. Natl. Acad. Sci. USA, 83 (1986) 1695.
[9] DEHAENE S., CHANGEUX J.-P. and NADAL J.-P., Proc. Natl. Acad. Sci. USA, 84 (1987) 2727.
[10] NADAL J.-P., TOULOUSE G., CHANGEUX J.-P. and DEHAENE S., Europhys. Lett., 1 (1986) 535.
[11] PARISI G., J. Phys. A, 19 (1986) L617.
[12] VAN HEMMEN J. L., KELLER G. and KÜHN R., Europhys. Lett., 5 (1988) 663.
[13] PERETTO P. and NIEZ J. J., in Disordered Systems and Biological Organization, edited by E.
BIENENSTOCK, F. FOGELMAN-SOULIE and G. WEISBUCH (Springer, Berlin) 1986, p. 171.
[14] BUHMANN J. and SCHULTEN K., Europhys. Lett., 4 (1987) 1205.
[15] HOPFIELD J. J., Proc. Natl. Acad. Sci. USA, 79 (1982) 2554; 81 (1984) 3088.
[16] VAN HEMMEN J. L. and KÜHN R., Phys. Rev. Lett., 57 (1986) 913; VAN HEMMEN J. L., Phys. Rev.
A, 36 (1987) 1959.
[17] VAN HEMMEN J. L., GRENSING D., HUBER A. and KÜHN R., J. Stat. Phys., 50 (1988) 231 and 259.
