The Hebb Rule: Storing Static and Dynamic Objects in an Associative Neural
Network
Abstract. - The Hebb rule (Hebb, 1949) indicates how information presented to a neural
network during a learning session is stored in the synapses, local elements which act as
mediators between neurons. In this paper we demonstrate that the Hebb rule can be used to
handle both stationary and dynamic objects such as single patterns and cycles. The two main
ideas are: a) a broad distribution of delays as they occur in the natural dynamics, and
b) incorporation of the very same delays during the learning session. Our work shows that the
resulting procedure is robust and faithful.
It also shows that pure Hebbian learning functions through synaptic selection, as advocated by
Changeux et al. [9].
For the moment we focus our attention on two specific neurons, say i and j, with J_ij as the
synaptic efficacy for the information transport from j to i. If neuron j has fired, a soliton-like
pulse propagates via the axon to the synapse at neuron i, and it is here, at the synapse,
where the information is stored, provided [2, 3] neuron i's activity is concurrent with or
slightly after the arrival of the pulse coming from j. The signal transport through the axon
takes a certain amount of time, the delay τ, whose distribution is very broad [3, 7]. So at
time t the signal S_j(t − τ) has to be paired with S_i(t), the state neuron i is in. In general,
delay can be modelled as a linear filter with a memory kernel [3].
Let us denote the size of the network by N. Throughout what follows, patterns are
specific Ising spin configurations {ξ_i^μ; 1 ≤ i ≤ N}, labelled by 1 ≤ μ ≤ q. It is only for the sake
of convenience that we work with unbiased random patterns, where ξ_i^μ = ±1 with equal
probability. During a learning session, the system is offered a finite sequence of patterns
{ξ_i^{ν(t)}; 1 ≤ i ≤ N}, where ν(t) is a given function of t. According to Hebb's principle [1], at each
instant of time the synaptic efficacy J_ij increases or decreases by an infinitesimal amount
proportional to S_i(t)S_j(t − τ). Let us suppose that the learning session is of duration T and,
for the sake of simplicity, that we start with a tabula rasa (J_ij = 0). Then we obtain, adding
the partial increments,

    J_ij(τ) = (ε(τ)/N) ∫_0^T dt S_i(t) S_j(t − τ).   (1)

In the integrand, S_i(t) = ξ_i^{ν(t)}. N^{−1} is a trivial scaling factor and ε(τ) is a weight of the
connection j → i with delay τ. In passing we note that the increments in (1) are linear (new
data are just added) and that by imposing upper and lower bounds [10-12] or by allowing
leakage one can induce forgetfulness. We will not pursue this issue here, however. It is also
worthwhile to realize that the prescription (1) in conjunction with a broad distribution of
delays τ strongly deviates from all the cycle-generating mechanisms which are known at
present [4-6, 9, 13, 14].
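A minimal numerical sketch of the delayed Hebb rule (1): the learning integral is replaced by a sum over discrete time steps while the network is clamped to a cyclic stimulus of q patterns, each shown for Δ steps. All concrete values (N, q, Δ, τ_max) and the uniform choice of ε(τ) are illustrative assumptions, not parameters from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

N, q = 64, 3          # network size, number of patterns in the cycle
delta = 10            # duration of each pattern during learning (time steps)
tau_max = 30          # largest axonal delay considered (here equal to the cycle period q*delta)
eps = np.ones(tau_max + 1) / (tau_max + 1)   # uniform a-priori delay weights (assumption)

xi = rng.choice([-1, 1], size=(q, N))        # unbiased random patterns xi[mu]

# Clamp the network to the cyclic stimulus: S(t) = xi[nu(t)]
T = 10 * q * delta                           # learning-session length (10 full cycles)
S = np.array([xi[(t // delta) % q] for t in range(T)])

# Discretized Hebb rule (1): J_ij(tau) = eps(tau)/N * sum_t S_i(t) * S_j(t - tau)
J = np.zeros((tau_max + 1, N, N))
for tau in range(tau_max + 1):
    for t in range(tau, T):
        J[tau] += np.outer(S[t], S[t - tau])
    J[tau] *= eps[tau] / N
```

Because the stimulus is periodic with period qΔ, the synapses at delay τ = qΔ reproduce (up to boundary terms) the symmetric τ = 0 synapses, which is the resonance between stimulus timing and delay structure discussed below.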
If the pattern shown to the network is stationary, i.e. S_i(t) = ξ_i^μ for some fixed μ and
0 ≤ t ≤ T, while T exceeds all the τ's of the system (a natural consequence of the pattern
being stationary), then the J_ij become symmetric and, despite the delays, the performance
of the network turns out to be that of the Hopfield model [15]. Plainly, in general the J_ij are
asymmetric.
The Hebb rule (1) gives rise to a retrieval mechanism which is extremely robust. Neither
does its performance depend on the specific model of synaptic transmission nor on the
precise distribution of the delays τ. To illustrate this statement we study three models, to be
denoted by A), B), and C).
A) For each pair (i, j) there is a large number of axons (possibly, interneurons) whose
delays τ have a distribution independent of i and j. Summing over the incoming signals, we
obtain the local field

    h_i(t) = Σ_j Σ_τ J_ij(τ) S_j(t − τ).   (2)

The weights ε(τ) are chosen according to a given distribution of delays τ. Of course, one can
also imagine that only a single axon links each pair of neurons. This idea has led us to
consider two further models.
A. HERZ et al.: THE HEBB RULE: STORING STATIC AND DYNAMIC OBJECTS ETC.
Apart from the a priori weights ε(τ), eq. (3) exhibits a «resonance phenomenon»: the J_ij(τ)
with delays τ which are integer multiples of Δ and thus match the timing of the external
stimulus are the ones that receive maximum strength. Note that they are also the ones that
would support a stable cycle of exactly the same period (neglecting transition times) as that
of the external stimulus. Thus, due to a subtle interplay between external stimulus and
internal architecture (distribution of τ's), the Hebb rule (1), which prima facie appears to be
instructive in character, has in fact also pronounced selective aspects [8, 9].
The dynamics of the retrieval process is described most conveniently in terms of the
overlaps m_μ(t) = N^{−1} Σ_i ξ_i^μ S_i(t). We concentrate on one specific finite cycle and refer to the
literature [6] for the treatment of extensively many patterns outside the cycle. Model C) is a
random-bond problem whose analytic solution is not available. So we concentrate on models
A) and B). An analytic treatment of their dynamics can be formulated in terms of the
sublattices I(x) = {i; ξ_i = x} and the corresponding sublattice magnetizations m(x; t). For
full details, see Riedel et al. [6]. The overlaps m_μ(t) are then reconstructed through
m_μ(t) = Σ_x p(x) x_μ m(x; t), where x ranges through {−1, 1}^q and p(x) = N^{−1}|I(x)| is the frac-
tion of sites belonging to I(x). It, therefore, suffices to study the m(x; t). For model A), one
easily verifies that, as N → ∞, the local field h_i(t) does not depend on the specific site i in I(x)
and, therefore, may be written h(x; t). In the case of parallel dynamics at inverse
temperature β, where one updates all the spins after each elementary time step Δt, one
finds [6] the simple equation m(x; t + Δt) = tgh[βh(x; t)]. Here we concentrate on sequential
dynamics of the Glauber type, where [6]

    ṁ(x; t) = −Γ{m(x; t) − tgh[βh(x; t)]}.   (4)
666 EUROPHYSICS LETTERS
The dot denotes differentiation of m with respect to time and Γ ≥ 1 is the mean attempt
rate. We choose the unit of time so that 1 s equals 1 Monte Carlo step per spin (MCS); whence
Γ = 1. Because of the delays in h(x; t), eq. (4) is a so-called functional differential equation.
In the case of model B), one has to extend the notion of sublattice slightly [3] so as to take into
account the delays through I(x; τ) = {i; ξ_i = x, τ_i = τ}. As for the rest, nothing changes.
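The functional differential equation (4) can be integrated with a plain Euler scheme once the delayed field is supplied from the stored trajectory. The sketch below uses Γ = 1 and a toy single-delay mean-field coupling h(t) = J·m(t − τ); the coupling, the delay, and all numerical values are illustrative assumptions, not the models A)-C) themselves.

```python
import numpy as np

def integrate_glauber(h_of_t, m0, beta=10.0, gamma=1.0, dt=0.05, steps=400):
    """Euler integration of dm/dt = -gamma * (m - tanh(beta * h)),
    where h may depend on the past trajectory (a functional differential equation)."""
    m = m0
    history = [m0]
    for n in range(steps):
        h = h_of_t(n * dt, history)
        m = m + dt * (-gamma) * (m - np.tanh(beta * h))
        history.append(m)
    return np.array(history)

# Toy delayed field: h(t) = J * m(t - tau), with a delay of 20 Euler steps;
# before the history reaches back that far, the earliest value is used.
J, tau_steps = 1.0, 20
def h_delayed(t, history):
    return J * history[max(len(history) - 1 - tau_steps, 0)]

m_t = integrate_glauber(h_delayed, m0=0.3)
```

In this toy setting the magnetization relaxes towards the low-temperature fixed point m ≈ tgh(βm) despite the delay, illustrating why the retrieval dynamics is insensitive to the precise delay structure once β is large.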
If the learning session lasts long enough (T → ∞), then the overlaps may be used to derive
a more convenient expression for the J_ij than eq. (1), even if the cycle taught consists of
pulses with a smoother form than the blocks used until now. We only assume that at each
transition ν → ν + 1 the spin flips are stochastic and such that m_ν + m_{ν+1} = 1. By the ergodic
theorem, we then get
where
with T′ = qΔ denoting the duration of the cycle presented to the network.
We now turn to the retrieval performance. As shown by fig. 1-4, the Hebb rule (1) allows
a faithful representation of a cycle which we have taught the network. It is satisfying, even
remarkable, that the retrieval quality hardly depends on the specific model (fig. 1).
[Fig. 3 and fig. 4 appear here; of the plots only stray axis tick labels, t (MCS) and τ (MCS), survived the transcription. See the captions below.]
Fig. 3. - Dependence upon delay distribution for model B). Each track represents the time evolution of
the overlap with the first pattern of a 3-cycle, m₁(t), for a given distribution of axonal delays. The delay
distribution is displayed in the box to the right of the m₁(t)-plot to which it belongs. All the
distributions are discrete, with a spacing of one MCS. Note that a stable cycle is produced even in the
absence of symmetric synapses without delay. The proviso is that Δ < τ_max. If this condition is not
satisfied, as in a), the cycle is not retrieved. In b)-d) variations of the τ distributions do affect the
transients, but they hardly change the large-t behaviour. The system size is N = 512 and the dynamics
is sequential with β = 10. During the learning session, each pattern lasted Δ = 10 MCS.
Fig. 4. - Dependence upon initial conditions. Imagine the system has been taught the theme BACH,
all notes having the same duration Δ. The overlaps with B, A, C and H (from top to bottom) have
been plotted as a function of time. After the network has been given a pattern sequence with a wrong
timing as initial condition for −T′ ≤ t ≤ 0 (A lasting much longer than B, C and H), the cycle
with its correct timing is spontaneously retrieved. The simulation result is shown for model A), with
N = 512 and a uniform discrete distribution of delays (τ = 0, 1, ..., 40 MCS). Here we have sequential
dynamics with β = 10.
Hebbian performance, though, exhibits a dichotomy: either the job is done so well that the
dependence upon the distribution of the delays, though existent, is extremely hard to
discern, or the cycle does not run at all because, as a rule of thumb, the duration Δ of each
pattern exceeds τ_max, so that the system cannot determine the pattern lifetime (fig. 3).
Thermal noise, which is measured by the inverse temperature β, gives rise to a similar
dichotomic behaviour. Below a critical β_c, no stable cycle exists. On the other hand, if β > β_c,
the performance of the system is only marginally temperature dependent. The critical β_c,
however, is model dependent. For instance, with N = 128 and initial conditions as in fig. 1, β_c
turns out to be 8.0, 8.3, and 9.5 for models A), B), and C), respectively.
Figure 2 shows two things. First, the performance of the network agrees well with the
prediction of eq. (4). Second, as β becomes large (in fig. 2, β = 20) the overlaps m_μ(t) are
locally of the form a + b exp[−t] for suitable constants a and b. This is easily understood if
we scrutinize eq. (4), replace tgh[βh(x; t)] by sgn[h(x; t)], since β is large, and realize that
the sign function is piecewise constant. Then each m(x; t) is of the indicated form. The
overlap m_μ(t) is a linear combination of the m(x; t) and the statement follows.
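The relaxation step can be made explicit. On any interval where sgn[h(x; t)] is constant, say equal to σ = ±1, eq. (4) with Γ = 1 reduces to a linear ordinary differential equation:

```latex
\dot{m}(x;t) = -\bigl[m(x;t) - \sigma\bigr]
\quad\Longrightarrow\quad
m(x;t) = \sigma + \bigl[m(x;t_0) - \sigma\bigr]\,e^{-(t-t_0)} ,
```

so that between sign changes each sublattice magnetization relaxes exponentially towards ±1, i.e. it is exactly of the form a + b exp[−t] with a = σ.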
Error correction is also performed by a Hebbian network. If the system has been taught
the theme BACH and afterwards it is presented a faulty version of it, say with wrong order
(BHCH) or wrong timing (A lasting much longer than B, C, and H), then it will readily
reproduce the correct tune (fig. 4). As in the static case [8, 15-17] the system functions as a
pattern recognizer, except that here «pattern» is a spatio-temporal object, where space
refers to phase space and time to temporal order in the cycle. The «pattern» presented to
the network will be recognized, provided it is close enough in space and time to one of the
stored prototypes.
In view of fig. 2 and 4, some comment on the usual cycle-generating models [4-6] seems to
be in order. First, this type of model presupposes synaptic efficacies of the form
J_ij = J_ij^{(1)} + J_ij^{(2)}, where J_ij^{(1)} = J_ji^{(1)} = N^{−1} Σ_μ ξ_i^μ ξ_j^μ is symmetric and
J_ij^{(2)} = εN^{−1} Σ_μ ξ_i^{μ+1} ξ_j^μ with ε > 1 is to be
paired with a delayed S_j(t). The form of the J_ij suggests that i) the second term «pushes» the
system through an energy landscape created and stabilized by the first and ii) the patterns
neatly follow each other in the natural order 1, 2, 3, ... et cycl. Figure 2 shows, however,
that for δ-function delay the symmetric, «stabilizing» term may be completely absent and
nevertheless a stable cycle may exist. We have also verified [6] that, if exponential delay is
dominating (which in the present context is not very plausible) no stable cycle exists.
Furthermore, J^{(2)} alone does not induce the natural order. To wit, suppose that
S_j(t) = S_j(t − τ), while τ = 4Δ, and that we start with the initial condition ν(t) = 1, 3, 4, 2 (or
BCHA) for −τ ≤ t ≤ 0, each pattern lasting Δ ms. For t ≥ 0 a conventional cycle
generator [4-6] then will play 2, 4, 1, 3 and shifted variations thereof. Simply note that in
this case the delayed term always pushes towards the successor of the pattern seen a time τ = 4Δ earlier,
so that the natural order 1, 2, 3, 4 (or BACH) never appears. This picture does not change if
ξ_j^{ν(t)} is replaced by ξ_j^μ, or if the symmetric term is added and ε > 1, as in ref. [4-6]. Hebbian
learning guarantees that faulty themes are transformed asymptotically into their correct
counterparts (see fig. 4).
In summary, a prerequisite for the working of any learning scheme is that the structure
of the learning task is compatible with the network architecture and the learning algorithm.
In the present context, the task is to store spatio-temporal objects, such as stationary
patterns, sequences and cycles. The internal representability of these objects is guaranteed
by a broad distribution of delays τ in conjunction with a high connectivity. The
representation itself is accomplished by the Hebb rule, eq. (1), in that the Hebbian synapses
J_ij(τ) measure and store the correlations of the external stimulus in space (i, j) and time (τ).
The dynamics of the neural network, operating with the very same delays, is able to extract
the spatio-temporal information encoded in the J_ij(τ). Retrieval is therefore extremely
robust.
***
The authors are most grateful to A. AERTSEN and T. BONHOEFFER (MPI für Biologische
Kybernetik, Tübingen) for their help and advice. This work has been supported by the
Deutsche Forschungsgemeinschaft.
REFERENCES
[1] HEBB D. O., The Organization of Behaviour (Wiley, New York, N.Y.) 1949, p. 60.
[2] KELSO S. R., GANONG A. H. and BROWN T. H., Proc. Natl. Acad. Sci. USA, 83 (1986) 5326.
[3] For further details and references, see: HERZ A., SULZER B., KÜHN R. and VAN HEMMEN J. L.,
to be submitted to Biol. Cybernet.
[4] KLEINFELD D., Proc. Natl. Acad. Sci. USA, 83 (1986) 9469.
[5] SOMPOLINSKY H. and KANTER I., Phys. Rev. Lett., 57 (1986) 2861.
[6] RIEDEL U., KÜHN R. and VAN HEMMEN J. L., Phys. Rev. A, 38 (1988) 1105, and to be published.
[7] BRAITENBERG V., in Brain Theory, edited by G. PALM and A. AERTSEN (Springer, Berlin) 1986,
p. 81.
[8] TOULOUSE G., DEHAENE S. and CHANGEUX J.-P., Proc. Natl. Acad. Sci. USA, 83 (1986) 1695.
[9] DEHAENE S., CHANGEUX J.-P. and NADAL J.-P., Proc. Natl. Acad. Sci. USA, 84 (1987) 2727.
[10] NADAL J.-P., TOULOUSE G., CHANGEUX J.-P. and DEHAENE S., Europhys. Lett., 1 (1986) 535.
[11] PARISI G., J. Phys. A, 19 (1986) L617.
[12] VAN HEMMEN J. L., KELLER G. and KÜHN R., Europhys. Lett., 5 (1988) 663.
[13] PERETTO P. and NIEZ J. J., in Disordered Systems and Biological Organization, edited by E.
BIENENSTOCK, F. FOGELMAN-SOULIÉ and G. WEISBUCH (Springer, Berlin) 1986, p. 171.
[14] BUHMANN J. and SCHULTEN K., Europhys. Lett., 4 (1987) 1205.
[15] HOPFIELD J. J., Proc. Natl. Acad. Sci. USA, 79 (1982) 2554; 81 (1984) 3088.
[16] VAN HEMMEN J. L. and KÜHN R., Phys. Rev. Lett., 57 (1986) 913; VAN HEMMEN J. L., Phys. Rev.
A, 36 (1987) 1959.
[17] VAN HEMMEN J. L., GRENSING D., HUBER A. and KÜHN R., J. Stat. Phys., 50 (1988) 231 and 259.