
PHYSICAL REVIEW E VOLUME 59, NUMBER 4 APRIL 1999

Hebbian learning and spiking neurons


Richard Kempter*
Physik Department, Technische Universität München, D-85747 Garching bei München, Germany

Wulfram Gerstner†
Swiss Federal Institute of Technology, Center of Neuromimetic Systems, EPFL-DI, CH-1015 Lausanne, Switzerland

J. Leo van Hemmen‡


Physik Department, Technische Universität München, D-85747 Garching bei München, Germany
(Received 6 August 1998; revised manuscript received 23 November 1998)
A correlation-based ("Hebbian") learning rule at a spike level with millisecond resolution is formulated, mathematically analyzed, and compared with learning in a firing-rate description. The relative timing of presynaptic and postsynaptic spikes influences synaptic weights via an asymmetric "learning window." A differential equation for the learning dynamics is derived under the assumption that the time scales of learning and neuronal spike dynamics can be separated. The differential equation is solved for a Poissonian neuron model with stochastic spike arrival. It is shown that correlations between input and output spikes tend to stabilize structure formation. With an appropriate choice of parameters, learning leads to an intrinsic normalization of the average weight and the output firing rate. Noise generates diffusion-like spreading of synaptic weights. [S1063-651X(99)02804-4]

PACS number(s): 87.19.La, 05.65.+b, 87.18.Sn

I. INTRODUCTION

Correlation-based or "Hebbian" learning [1] is thought to be an important mechanism for the tuning of neuronal connections during development and thereafter. It has been shown by various model studies that a learning rule which is driven by the correlations between presynaptic and postsynaptic neurons leads to an evolution of neuronal receptive fields [2-9] and topologically organized maps [10-12].

In all these models, learning is based on the correlation between neuronal firing rates, that is, a continuous variable reflecting the mean activity of a neuron. This is a valid description on a time scale of 100 ms and more. On a time scale of 1 ms, however, neuronal activity consists of a sequence of short electrical pulses, the so-called action potentials or spikes. During recent years experimental and theoretical evidence has accumulated which suggests that temporal coincidences between spikes on a millisecond or even submillisecond scale play an important role in neuronal information processing [13-24]. If so, a rate description may, and often will, neglect important information that is contained in the temporal structure of a neuronal spike train.

Neurophysiological experiments also suggest that the change of a synaptic efficacy depends on the precise timing of postsynaptic action potentials with respect to presynaptic input spikes on a time scale of 10 ms. Specifically, a synaptic weight is found to increase if presynaptic firing precedes a postsynaptic spike, and to decrease otherwise [25,26]; see also [27-33]. Our description of learning at a temporal resolution of spikes takes these effects into account.

In contrast to the standard rate models of Hebbian learning, we introduce and analyze a learning rule where synaptic modifications are driven by the temporal correlations between presynaptic and postsynaptic spikes. First steps towards a detailed modeling of temporal relations have been taken for rate models in [34] and for spike models in [22,35-43].

FIG. 1. Single neuron. We study the development of synaptic weights J_i (small filled circles, 1 ≤ i ≤ N) of a single neuron (large circle). The neuron receives input spike trains denoted by S_i^in and produces output spikes denoted by S^out.

*Electronic address: Richard.Kempter@physik.tu-muenchen.de
†Electronic address: Wulfram.Gerstner@di.epfl.ch
‡Electronic address: Leo.van.Hemmen@physik.tu-muenchen.de




II. DERIVATION OF THE LEARNING EQUATION

A. Specification of the Hebb rule

We consider a neuron that receives input from N ≫ 1 synapses with efficacies J_i, 1 ≤ i ≤ N; cf. Fig. 1. We assume that changes of J_i are induced by presynaptic and postsynaptic spikes. The learning rule consists of three parts. (i) Let t_i^f be the arrival time of the fth input spike at synapse i. The arrival of a spike induces the weight J_i to change by an amount η w^in, which can be either positive or negative. The quantity η > 0 is a "small" parameter. (ii) Let t^n be the nth output spike of the neuron under consideration. This event triggers the change of all N efficacies by an amount η w^out, which can also be positive or negative. (iii) Finally, time differences between all pairs of input and output spikes influence the change of the efficacies. Given a time difference s = t_i^f − t^n between input and output spikes, J_i is changed by an amount η W(s), where the learning window W is a real-valued function. It is to be specified shortly; cf. also Fig. 6.

Starting at time t with an efficacy J_i(t), the total change ΔJ_i(t) = J_i(t+T) − J_i(t) in a time interval T, which may be interpreted as the length of a learning trial, is calculated by summing the contributions of input and output spikes as well as all pairs of input and output spikes occurring in the time interval [t, t+T]. Denoting the input spike train at synapse i by a series of δ functions, S_i^in(t) = Σ_f δ(t − t_i^f), and, similarly, output spikes by S^out(t) = Σ_n δ(t − t^n), we can formulate the rules (i)-(iii) explicitly by setting

  \Delta J_i(t) = \eta \int_t^{t+T} dt'\, [w^{in} S_i^{in}(t') + w^{out} S^{out}(t')]
                + \eta \int_t^{t+T} dt' \int_t^{t+T} dt''\, W(t''-t')\, S_i^{in}(t'')\, S^{out}(t')        (1a)

              = \eta \Big[ \sum_{t_i^f}{}' w^{in} + \sum_{t^n}{}' w^{out} + \sum_{t_i^f,\,t^n}{}' W(t_i^f - t^n) \Big].        (1b)

In Eq. (1b) the prime indicates that only firing times t_i^f and t^n in the time interval [t, t+T] are to be taken into account; cf. Fig. 2.

FIG. 2. Hebbian learning and spiking neurons: schematic. In the bottom graph we plot the time course of the synaptic weight J_i(t) evoked through input and output spikes (upper graphs, vertical bars). An output spike, e.g., at time t^1, induces the weight J_i to change by an amount w^out, which is negative here. To consider the effect of correlations between input and output spikes, we plot the learning window W(s) (center graphs) around each output spike, where s = 0 matches the output spike times (vertical dashed lines). The three input spikes at times t_i^1, t_i^2, and t_i^3 (vertical dotted lines) increase J_i by an amount w^in each. There is no influence of correlations between these input spikes and the output spike at time t^1. This becomes visible with the aid of the learning window W centered at t^1: the input spikes are too far away in time. The next output spike at t^2, however, is close enough to the previous input spike at t_i^3. The weight J_i is changed by w^out < 0 plus the contribution W(t_i^3 − t^2) > 0, the sum of which is positive (arrowheads). Similarly, the input spike at time t_i^4 leads to a change w^in + W(t_i^4 − t^2) < 0.

Equation (1) represents a Hebb-type learning rule since it correlates presynaptic and postsynaptic behavior. More precisely, our learning scheme depends on the time sequence of input and output spikes. The parameters w^in, w^out as well as the amplitude of the learning window W may, and in general will, depend on the value of the efficacy J_i. Such a J_i dependence is useful so as to avoid unbounded growth of synaptic weights. Even though we have not emphasized this in our notation, most of the theory developed below is valid for J_i-dependent parameters; cf. Sec. V B.
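As an illustration of rule (1b), the following minimal Python sketch accumulates the three contributions (i)-(iii) from given lists of spike times. The piecewise-exponential window, its amplitudes, and the example numbers are purely illustrative assumptions; they are not the learning window of Appendix B.

```python
import numpy as np

def learning_window(s, A_p=1.0, A_m=-1.0, tau_p=1.0e-3, tau_m=20.0e-3):
    """Illustrative asymmetric learning window W(s), s = t_in - t_out in seconds.
    Positive when the presynaptic spike precedes the postsynaptic spike (s < 0)."""
    s = np.asarray(s, dtype=float)
    return np.where(s <= 0.0,
                    A_p * np.exp(np.minimum(s, 0.0) / tau_p),
                    A_m * np.exp(-np.maximum(s, 0.0) / tau_m))

def delta_J(t_in, t_out, w_in, w_out, eta, T, t0=0.0, W=learning_window):
    """Total weight change Delta J_i over one trial [t0, t0+T], cf. Eq. (1b)."""
    t_in = np.asarray([t for t in t_in if t0 <= t <= t0 + T])
    t_out = np.asarray([t for t in t_out if t0 <= t <= t0 + T])
    # (i) every input spike contributes w_in, (ii) every output spike w_out
    change = w_in * t_in.size + w_out * t_out.size
    # (iii) every pair of input and output spikes contributes W(t_i^f - t^n)
    if t_in.size and t_out.size:
        change += W(t_in[:, None] - t_out[None, :]).sum()
    return eta * change

# toy example: three input spikes, two output spikes, trial length T = 0.5 s
print(delta_J(t_in=[0.10, 0.20, 0.30], t_out=[0.105, 0.32],
              w_in=1.0, w_out=-1.0, eta=1e-5, T=0.5))
```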
B. Ensemble average

Given that input spiking is random but partially correlated and that the generation of spikes is in general a complicated dynamic process, an analysis of Eq. (1) is a formidable problem. We therefore simplify it. We have introduced a small parameter η > 0 into Eq. (1) with the idea in mind that the learning process is performed on a much slower time scale than the neuronal dynamics. Thus we expect that only averaged quantities enter the learning dynamics.

Considering averaged quantities may also be useful in order to disregard the influence of noise. In Eq. (1) spikes are discrete events that trigger a discontinuous change of the synaptic weight; cf. Fig. 2 (bottom). If we assume a stochastic spike arrival or if we assume a stochastic process for generating output spikes, the change ΔJ_i is a random variable, which exhibits fluctuations around some mean drift. Averaging implies that we focus on the drift and calculate the expected rate of change. Fluctuations are treated in Sec. VI.

1. Self-averaging of learning

Effective learning needs repetition over many trials of length T, each individual trial being independent of the previous ones. Equation (1) tells us that the results of the individual trials are to be summed. According to the (strong) law of large numbers [44] in conjunction with η being "small" [45] we can average the resulting equation, viz., Eq. (1), regardless of the random process. In other words, the learning procedure is self-averaging. Instead of averaging over several trials, we may also consider one single long trial during which input and output characteristics remain constant. Again, if η is sufficiently small, time scales are separated and learning is self-averaging.

The corresponding average over the resulting random process is denoted by angular brackets ⟨ ⟩ and is called an ensemble average, in agreement with physical usage. It is a probability measure on a probability space, which need not be specified explicitly. We simply refer to the literature [44]. Substituting s = t'' − t' on the right-hand side of Eq. (1a) and dividing both sides by T, we obtain

  \frac{\langle \Delta J_i \rangle(t)}{T} = \frac{\eta}{T} \int_t^{t+T} dt'\, [w^{in} \langle S_i^{in}\rangle(t') + w^{out} \langle S^{out}\rangle(t')]
    + \frac{\eta}{T} \int_t^{t+T} dt' \int_{t-t'}^{t+T-t'} ds\, W(s)\, \langle S_i^{in}(t'+s)\, S^{out}(t')\rangle.        (2)

2. Example: Inhomogeneous Poisson process

Averaging the learning equation before proceeding is justified if both input and output processes are taken to be inhomogeneous Poisson processes, as will be assumed throughout Secs. IV-VI. An inhomogeneous Poisson process with time-dependent rate function λ(t) ≥ 0 is characterized by two facts: (i) disjoint intervals are independent, and (ii) the probability of getting a single event at time t in an interval of length Δt is λ(t)Δt, more events having a probability o(Δt); see also [46], Appendix A, for a simple exposition of the underlying mathematics. The integrals in Eq. (1a) or the sums in Eq. (1b) therefore decompose into many independent events and, thus, the strong law of large numbers applies to them. The output is a temporally local process as well, so that the strong law of large numbers also applies to the output spikes at times t^n in Eq. (1).

If we describe input spikes by inhomogeneous Poisson processes with intensity λ_i^in(t), then we may identify the ensemble average over a spike train with the stochastic intensity, ⟨S_i^in⟩(t) = λ_i^in(t); cf. Fig. 3. The intensity λ_i^in(t) can be interpreted as the instantaneous rate of spike arrival at synapse i. In contrast to temporally averaged mean firing rates, the instantaneous rate may vary on a fast time scale in many biological systems; cf. Sec. III C. The stochastic intensity ⟨S^out⟩(t) is the instantaneous rate of observing an output spike, where ⟨ ⟩ is an ensemble average over both the input and the output. Finally, the correlation function ⟨S_i^in(t'')S^out(t')⟩ is to be interpreted as the joint probability density for observing an input spike at synapse i at time t'' and an output spike at time t'.

FIG. 3. Inhomogeneous Poisson process. In the upper graph we have plotted an example of an instantaneous rate λ_i^in(t) in units of Hz. The average rate is 10 Hz (dashed line). The lower graph shows a spike train S_i^in(t) which is a realization of an inhomogeneous Poisson process with rate λ_i^in(t). The spike times are denoted by vertical bars.
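To make the example concrete, the minimal sketch below (an assumed discretization, not part of the original text) draws one realization of an inhomogeneous Poisson spike train from a given intensity λ(t), using the defining property that a spike occurs in a small bin of width Δt with probability λ(t)Δt, as in Fig. 3.

```python
import numpy as np

def poisson_spike_train(rate_fn, t_max, dt=1e-4, rng=None):
    """Spike times of an inhomogeneous Poisson process with rate rate_fn(t) in Hz.
    One Bernoulli draw per bin of width dt (valid for rate_fn(t)*dt << 1)."""
    rng = np.random.default_rng() if rng is None else rng
    t = np.arange(0.0, t_max, dt)
    p = rate_fn(t) * dt                      # probability of a spike per bin, lambda(t)*dt
    return t[rng.random(t.size) < p]

# illustrative rate: 10 Hz mean with a 40 Hz modulation
rate = lambda t: 10.0 * (1.0 + np.cos(2 * np.pi * 40.0 * t))
spikes = poisson_spike_train(rate, t_max=1.0)
print(len(spikes), "spikes drawn in 1 s (expected about 10)")
```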
C. Separation of time scales

We require the length T of a learning trial in Eq. (2) to be much larger than typical interspike intervals. Both many input spikes at any synapse and many output spikes should occur on average in a time interval of length T. Then, using the notation \bar f(t) = T^{-1} \int_t^{t+T} dt'\, f(t'), we may introduce the mean firing rates ν_i^in(t) = \overline{⟨S_i^in⟩}(t) and ν^out(t) = \overline{⟨S^out⟩}(t). We call ν_i^in and ν^out mean firing rates in order to distinguish them from the previously defined instantaneous rates ⟨S_i^in⟩ and ⟨S^out⟩, which are the result of an ensemble average only. Because of their definition, mean firing rates ν always vary slowly as a function of time, that is, on a time scale of the order of T. The quantities ν_i^in and ν^out therefore carry hardly any information that may be present in the timing of discrete spikes.

For the sake of further simplification of Eq. (2), we define the width 𝒲 of the learning window W(s) and consider the case T ≫ 𝒲. Most of the "mass" of the learning window should be inside the interval [−𝒲, 𝒲]. Formally we require

  \int_{-\mathcal{W}}^{\mathcal{W}} ds\, |W(s)| \gg \int_{-\infty}^{-\mathcal{W}} ds\, |W(s)| + \int_{\mathcal{W}}^{\infty} ds\, |W(s)|.

For T ≫ 𝒲 the integration over s in Eq. (2) can be extended to run from −∞ to ∞. With the definition of a temporally averaged correlation function,

  C_i(s;t) = \overline{\langle S_i^{in}(t+s)\, S^{out}(t)\rangle},        (3)

the last term on the right in Eq. (2) reduces to \int_{-\infty}^{\infty} ds\, W(s)\, C_i(s;t). Correlations between presynaptic and postsynaptic spikes thus enter spike-based Hebbian learning through C_i convolved with the window W. We note that the correlation C_i(s;t), though being both an ensemble- and a temporally averaged quantity, may change as a function of s on a much faster time scale than T or the width 𝒲 of the learning window. The temporal structure of C_i depends essentially on the neuron (model) under consideration. An example is given in Sec. IV A.

We require learning to be a slow process; cf. Sec. II B 1. More specifically, we require that J values do not change much in the time interval T. Thus T separates the time scale 𝒲 (width of the learning window W) from the time scale of the learning dynamics, which is proportional to η^{-1}. Under those conditions we are allowed to approximate the left-hand side of Eq. (2) by the rate of change dJ_i/dt, whereby we have omitted the angular brackets for brevity. Absorbing η into the learning parameters w^in, w^out, and W, we obtain

  \frac{d}{dt} J_i(t) = w^{in}\, \nu_i^{in}(t) + w^{out}\, \nu^{out}(t) + \int_{-\infty}^{\infty} ds\, W(s)\, C_i(s;t).        (4)

The ensemble-averaged learning equation (4), which holds for any neuron model, will be the starting point of the arguments below.

III. SPIKE-BASED AND RATE-BASED HEBBIAN LEARNING

In this section we indicate the assumptions that are required to reduce spike-based to rate-based Hebbian learning and outline the limitations of the latter.

A. Rate-based Hebbian learning

In neural network theory, the hypothesis of Hebb [1] is usually formulated as a learning rule where the change of a synaptic efficacy J_i depends on the correlation between the mean firing rate ν_i^in of the ith presynaptic neuron and the mean firing rate ν^out of a postsynaptic neuron, viz.,

  \frac{dJ_i}{dt} \equiv \dot J_i = a_0 + a_1 \nu_i^{in} + a_2 \nu^{out} + a_3 \nu_i^{in}\nu^{out} + a_4 (\nu_i^{in})^2 + a_5 (\nu^{out})^2,        (5)

where a_0 < 0 and a_1, a_2, a_3, a_4, and a_5 are proportionality constants. Apart from the decay term a_0 and the "Hebbian" term ν_i^in ν^out proportional to the product of input and output rates, there are also synaptic changes which are driven separately by the presynaptic and postsynaptic rates. The parameters a_0, ..., a_5 may depend on J_i. Equation (5) is a general formulation up to second order in the rates; see, e.g., [3,47,12].

B. Spike-based Hebbian learning

To get Eq. (5) from the spike-based learning rule in Eq. (4), two approximations are required. First, if there are no correlations between input and output spikes apart from the correlations contained in the instantaneous rates, we can write ⟨S_i^in(t'+s) S^out(t')⟩ ≈ ⟨S_i^in⟩(t'+s) ⟨S^out⟩(t'). Second, if these rates change slowly as compared to T, then we have C_i(s;t) ≈ ν_i^in(t+s) ν^out(t). In addition, ν^out = \overline{⟨S^out⟩} varies only on a slow time scale; cf. the discussion after Eq. (3). Since we have T ≫ 𝒲, the rates ν_i^in also change slowly as compared to the width 𝒲 of the learning window and, thus, we may replace ν_i^in(t+s) by ν_i^in(t) in the correlation term \int_{-\infty}^{\infty} ds\, W(s)\, C_i(s;t). This yields \int_{-\infty}^{\infty} ds\, W(s)\, C_i(s;t) ≈ W̃(0) ν_i^in(t) ν^out(t), where W̃(0) := \int_{-\infty}^{\infty} ds\, W(s). Under the above assumptions we can identify W̃(0) with a_3. By further comparison of Eq. (4) with Eq. (5) we identify w^in with a_1 and w^out with a_2, and we are able to reduce Eq. (4) to Eq. (5) by setting a_0 = a_4 = a_5 = 0.
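To see the rate picture in isolation, the minimal sketch below (all numbers are illustrative placeholders) integrates Eq. (5) with the identification just obtained, a_1 = w^in, a_2 = w^out, a_3 = W̃(0), and a_0 = a_4 = a_5 = 0, for one synapse driven by prescribed, slowly varying rates.

```python
import numpy as np

# identification from Sec. III B: a1 = w_in, a2 = w_out, a3 = Wtilde(0), a0 = a4 = a5 = 0
w_in, w_out, Wtilde0 = 1e-5, -1.05e-5, 5e-8                    # illustrative magnitudes only
nu_in  = lambda t: 10.0 + 2.0 * np.sin(2 * np.pi * t / 50.0)   # slowly varying rates (Hz)
nu_out = lambda t: 10.0 + 1.0 * np.cos(2 * np.pi * t / 80.0)

J, dt, t_end = 0.02, 0.1, 1000.0                               # initial weight, Euler step (s), duration (s)
for step in range(int(t_end / dt)):
    t = step * dt
    # Euler step of the rate-based rule, Eq. (5)
    J += dt * (w_in * nu_in(t) + w_out * nu_out(t) + Wtilde0 * nu_in(t) * nu_out(t))
print("final weight:", J)
```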
C. Limitations of rate-based Hebbian learning

The assumptions necessary to derive Eq. (5) from Eq. (4), however, are not generally valid. According to the results of Markram et al. [25], the width 𝒲 of the Hebbian learning window in cortical pyramidal cells is in the range of 100 ms. At retinotectal synapses 𝒲 is also in the range of 100 ms [26].

A mean rate formulation thus requires that all changes of the activity are slow on a time scale of 100 ms. This is not necessarily the case. The existence of oscillatory activity in the cortex in the range of 40 Hz (e.g., [14,15,20,48]) implies activity changes every 25 ms. Retinal ganglion cells fire synchronously on a time scale of about 10 ms [49]; cf. also [50]. Much faster activity changes are found in the auditory system. In the auditory pathway of, e.g., the barn owl, spikes can be phase-locked to frequencies of up to 8 kHz [51-53]. Furthermore, beyond the correlations between instantaneous rates, additional correlations between spikes may exist. For all these reasons, the learning rule (5) in the simple rate formulation is insufficient to provide a generally valid description. In Secs. IV and V we will therefore study the full spike-based learning equation (4).

FIG. 4. The postsynaptic potential ε in units of [e^{-1} τ_0^{-1}] as a function of time s in milliseconds. We have ε ≡ 0 for s < 0 so that ε is causal. The kernel ε has a single maximum at s = τ_0. For s → ∞ the postsynaptic potential ε decays exponentially with time constant τ_0; cf. also Appendix B 2.

IV. STOCHASTICALLY SPIKING NEURONS

A crucial step in analyzing Eq. (4) is determining the correlations C_i between input spikes at synapse i and output spikes. The correlations, of course, depend strongly on the neuron model under consideration. To highlight the main points of learning, we study a simple toy model. Input spikes are generated by an inhomogeneous Poisson process and fed into a stochastically firing neuron model. For this scenario we are able to derive an analytical expression for the correlations between input and output spikes. The introduction of the model and the derivation of the correlation function are the topic of the first subsection. In the second subsection we use the correlation function in the learning equation (4) and analyze the learning dynamics. In the final two subsections the relation to the work of Linsker [3] (a rate formulation of Hebbian learning) and some extensions based on spike coding are considered.

A. Poisson input and stochastic neuron model

We consider a single neuron which receives input via N synapses 1 ≤ i ≤ N. The input spike trains arriving at the N synapses are statistically independent and generated by an inhomogeneous Poisson process with time-dependent intensities ⟨S_i^in⟩(t) = λ_i^in(t), 1 ≤ i ≤ N [46].

In our simple neuron model we assume that output spikes are generated stochastically with a time-dependent rate λ^out(t) that depends on the timing of input spikes. Each input spike arriving at synapse i at time t_i^f increases (or decreases) the instantaneous firing rate λ^out by an amount J_i(t_i^f) ε(t − t_i^f), where ε is a response kernel. The effect of an incoming spike is thus a change in probability density proportional to J_i. Causality is imposed by the requirement ε(s) = 0 for s < 0. In biological terms, the kernel ε may be identified with an excitatory (or inhibitory) postsynaptic potential. Throughout what follows, we assume excitatory couplings J_i > 0 for all i and ε(s) ≥ 0 for all s. In addition, the response kernel ε(s) is normalized to \int ds\, \varepsilon(s) = 1; cf. Fig. 4.

The contributions from all N synapses as measured at the axon hillock are assumed to add up linearly. The result gives rise to a linear inhomogeneous Poisson model with intensity

  \lambda^{out}(t) = \nu_0 + \sum_{i=1}^{N} \sum_f J_i(t_i^f)\, \varepsilon(t - t_i^f).        (6)

Here, ν_0 is the spontaneous firing rate and the sums run over all spike arrival times at all synapses. By definition, the spike generation process (6) is independent of previous output spikes. In particular, this Poisson model does not include refractoriness.
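A minimal simulation sketch of this linear Poisson model may help to fix ideas; the discretization, the alpha-shaped kernel (one convenient choice with a single maximum at τ_0, consistent with Fig. 4 but not taken from Appendix B), and all numbers are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
dt, t_max, N = 1e-4, 10.0, 50                     # time step (s), duration (s), number of synapses
tau0, nu0, nu_in = 10e-3, 0.0, 10.0               # EPSP time constant (s), spontaneous and input rates (Hz)
time = np.arange(0.0, t_max, dt)

def eps(s):
    """Assumed causal response kernel with a single maximum at tau0, normalized to unit area."""
    return np.where(s >= 0.0, (s / tau0**2) * np.exp(-s / tau0), 0.0)

J = np.full(N, 0.02)                               # weights, held fixed for this demonstration
S_in = rng.random((N, time.size)) < nu_in * dt     # independent Poisson input spike trains (0/1 per bin)

# Eq. (6): lambda_out(t) = nu0 + sum_i sum_f J_i eps(t - t_i^f), evaluated as one discrete convolution
kernel = eps(np.arange(0.0, 10 * tau0, dt))
weighted_spikes = (J[:, None] * S_in).sum(axis=0)
lam_out = nu0 + np.convolve(weighted_spikes, kernel, mode="full")[: time.size]

S_out = rng.random(time.size) < lam_out * dt       # output spikes of the inhomogeneous Poisson model
print("simulated output rate:", S_out.sum() / t_max,
      "Hz;  ensemble-averaged rate of Eq. (7) below:", nu0 + (J * nu_in).sum(), "Hz")
```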
In the context of Eq. (4), we are interested in ensemble averages over both the input and the output. Since Eq. (6) is a linear equation, the average can be performed directly and yields

  \langle S^{out}\rangle(t) = \nu_0 + \sum_{i=1}^{N} J_i(t) \int_0^{\infty} ds\, \varepsilon(s)\, \lambda_i^{in}(t-s).        (7)

In deriving Eq. (7) we have replaced J_i(t_i^f) by J_i(t) because efficacies are assumed to change adiabatically with respect to the width of ε. The ensemble-averaged output rate in Eq. (7) depends on the convolution of ε with the input rates. In what follows we denote

  \Lambda_i^{in}(t) = \int_0^{\infty} ds\, \varepsilon(s)\, \lambda_i^{in}(t-s).        (8)
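Equations (7) and (8) are plain causal convolutions and can be evaluated directly on a time grid, as in the short sketch below; the modulated rate and the alpha-shaped kernel are again illustrative assumptions.

```python
import numpy as np

dt, tau0 = 1e-4, 10e-3
t = np.arange(0.0, 0.5, dt)
lam_in = 10.0 + 10.0 * np.cos(2 * np.pi * 40.0 * t)           # example intensity lambda_i^in(t) in Hz
eps = (t / tau0**2) * np.exp(-t / tau0)                        # assumed causal kernel, unit area

# Eq. (8): Lambda_i^in(t) = int_0^inf ds eps(s) lam_in(t - s), a causal convolution
Lambda_in = dt * np.convolve(lam_in, eps, mode="full")[: t.size]

# Eq. (7) for N identical synapses with weight J
N, J, nu0 = 50, 0.02, 0.0
S_out_mean = nu0 + N * J * Lambda_in
print("time-averaged output rate:", S_out_mean[t.size // 2:].mean(), "Hz (about nu0 + N*J*nu_in = 10 Hz)")
```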
Equation (7) may suggest that input and output spikes are statistically independent, which is not the case. To show this explicitly, we determine the ensemble-averaged correlation ⟨S_i^in(t+s)S^out(t)⟩ in Eq. (3). Since ⟨S_i^in(t+s)S^out(t)⟩ corresponds to a joint probability, it equals the probability density λ_i^in(t+s) for an input spike at synapse i at time t+s times the conditional probability density of observing an output spike at time t given the above input spike at t+s,

  \langle S_i^{in}(t+s)\, S^{out}(t)\rangle = \lambda_i^{in}(t+s) \Big[ \nu_0 + J_i(t)\,\varepsilon(-s) + \sum_{j=1}^{N} J_j(t)\, \Lambda_j^{in}(t) \Big].        (9)

The first term inside the square brackets is the spontaneous output rate and the second term is the specific contribution caused by the input spike at time t+s, which vanishes for s > 0. We are allowed to write J_i(t) instead of the "correct" weight J_i(t+s); cf. the remark after Eq. (7). To understand the meaning of the second term, we recall that an input spike arriving before an output spike (i.e., s < 0) raises the output firing rate by an amount proportional to ε(−s); cf. Fig. 5. The sum in Eq. (9) contains the mean contributions of all synapses to an output spike at time t. For the proof of Eq. (9), we refer to Appendix A.

FIG. 5. Spike-spike correlations. To understand the meaning of Eq. (9) we have sketched ⟨S_i^in(t')S^out(t)⟩/λ_i^in(t') as a function of time t (full line). The dot-dashed line at the bottom of the graph is the contribution J_i(t') ε(t−t') of an input spike occurring at time t'. Adding this contribution to the mean rate contribution, ν_0 + Σ_j J_j(t) Λ_j^in(t) (dashed line), we obtain the rate inside the square brackets of Eq. (9) (full line). At time t'' > t' the input spike at time t' enhances the output firing rate by an amount J_i(t') ε(t''−t') (arrows). Note that in the main text we have taken t'' − t' = −s.

Inserting Eq. (9) into Eq. (3) we obtain

  C_i(s;t) = \sum_{j=1}^{N} J_j(t)\, \overline{\lambda_i^{in}(t+s)\, \Lambda_j^{in}(t)} + \overline{\lambda_i^{in}(t+s)}\, [\nu_0 + J_i(t)\,\varepsilon(-s)],        (10)

where we have assumed the weights J_j to be constant in the time interval [t, t+T]. Temporal averages are denoted by a bar; cf. Sec. II C. Note that \overline{\lambda_i^{in}}(t) = ν_i^in(t).

B. Learning equation

Before inserting the correlation function (10) into the learning rule (4) we define the covariance matrix

  q_{ij}(s;t) := \overline{[\lambda_i^{in}(t+s) - \nu_i^{in}(t+s)]\, [\Lambda_j^{in}(t) - \nu_j^{in}(t)]}        (11)

and its convolution with the learning window W,

  Q_{ij}(t) := \int_{-\infty}^{\infty} ds\, W(s)\, q_{ij}(s;t).        (12)

Using Eqs. (7), (10), and (12) in Eq. (4), we obtain

  \dot J_i = w^{in}\nu_i^{in} + w^{out}\Big[\nu_0 + \sum_j J_j \nu_j^{in}\Big] + \tilde W(0)\, \nu_i^{in}\nu_0
           + \sum_j J_j \Big[ Q_{ij} + \tilde W(0)\, \nu_i^{in}\nu_j^{in} + \delta_{ij}\, \nu_i^{in} \int_{-\infty}^{\infty} ds\, W(s)\, \varepsilon(-s) \Big].        (13)

For the sake of brevity, we have omitted the dependence upon time.

The assumption of identical and constant mean input rates, ν_i^in(t) = ν^in for all i, reduces the number of free parameters in Eq. (13) considerably and eliminates all effects coming from rate coding. We define

  k_1 = [w^{out} + \tilde W(0)\, \nu^{in}]\, \nu_0 + w^{in}\nu^{in},
  k_2 = [w^{out} + \tilde W(0)\, \nu^{in}]\, \nu^{in},        (14)
  k_3 = \nu^{in} \int ds\, W(s)\, \varepsilon(-s)

in Eq. (13) and arrive at

  \dot J_i = k_1 + \sum_j \big( Q_{ij} + k_2 + k_3\, \delta_{ij} \big)\, J_j.        (15)

Equation (15) describes the ensemble-averaged dynamics of synaptic weights for a spike-based Hebbian learning rule (1) under the assumption of a linear inhomogeneous Poissonian model neuron.

C. Relation to Linsker's equation

Linsker [3] has derived a mathematically equivalent equation starting from Eq. (5) and a linear graded-response neuron, a rate-based model. The difference between Linsker's equation and Eq. (15) is, apart from a slightly different notation, the term k_3 δ_{ij}.

Equation (15) without the k_3 term has been analyzed extensively by MacKay and Miller [5] in terms of eigenvectors and eigenfunctions of the matrix Q_{ij} + k_2. In principle, there is no difficulty in incorporating the k_3 term in their analysis, because Q_{ij} + k_2 + δ_{ij} k_3 contains k_3 times the unit matrix and thus has the same eigenvectors as Q_{ij} + k_2. All eigenvalues are simply shifted by k_3.

The k_3 term can be neglected if the number N of synapses is large. More specifically, the influence of the k_3 term as compared to the k_2 and Q_{ij} terms is negligible if for all i,

  \sum_j |Q_{ij} + k_2|\, J_j \gg |k_3|\, J_i.        (16)

This holds, for instance, if (i) we have many synapses, (ii) |k_3| is smaller than or at most of the same order of magnitude as |k_2 + Q_{ij}| for all i and j, and (iii) each synapse is weak as compared to the total synaptic weight, J_i ≪ Σ_j J_j. The assumptions (i)-(iii) are often reasonable neurobiological conditions, in particular when the pattern of synaptic weights is still unstructured. The analysis of Eq. (15) presented in Sec. V, focusing on normalization and structure formation, is therefore based on these assumptions. In particular, we neglect k_3.

Nevertheless, our approach even without the k_3 term is far more comprehensive than Linsker's rate-based ansatz (5) because we have derived Eq. (15) from a spike-based learning rule (1). Therefore correlations between spikes on time scales down to milliseconds or below can enter the driving term Q_{ij} so as to account for structure formation. Correlations on time scales of milliseconds or below may be essential for information processing in neuronal systems; cf. Sec. III C. In contrast to that, Linsker's ansatz is based on a firing-rate description where the term Q_{ij} contains correlations between mean firing rates only. If we use a standard interpretation of rate coding, a mean firing rate corresponds to a temporally averaged quantity which varies on a time scale of the order of hundreds of milliseconds. The temporal structure of spike trains is neglected completely.

Finally, our ansatz (1) allows the analysis of the influence of noise on learning. Learning results from stepwise weight changes. Each weight performs a random walk whose expectation value is described by the ensemble-averaged equation (15). Analysis of noise as a deviation from the mean is deferred to Sec. VI.
rates and structure formation. We take lower and upper
D. Stabilization of learning
bounds for the J values into account explicitly and consider
We now discuss the influence of the k 3 term in Eq. ~15!. the limiting case of weak correlations in the input. We will
It gives rise to an exponential growth or decay of weights, see that for a realistic scenario we need to require w in.0 and

TABLE I. Values of the parameters used for the numerical simulations (a) and derived quantities (b).

(a) Parameters
  Learning:            η = 10⁻⁵
                       w^in = η
                       w^out = −1.0475 η
                       A_+ = 1
                       A_− = −1
                       τ_+ = 1 ms
                       τ_− = 20 ms
                       τ_syn = 5 ms
  EPSP:                τ_0 = 10 ms
  Synaptic input:      N = 50
                       M_1 = 25
                       M_2 = 25
                       ν^in = 10 s⁻¹
                       δν^in = 10 s⁻¹
                       ω/(2π) = 40 s⁻¹
  Further parameters:  ν_0 = 0
                       ϑ = 0.1

(b) Derived quantities
  W̃(0) := ∫ ds W(s) = 4.75 × 10⁻⁸ s
  ∫ ds W(s)² = 3.68 × 10⁻¹² s
  ∫ ds W(s) ε(−s) = 7.04 × 10⁻⁶
  Q = 6.84 × 10⁻⁷ s⁻¹
  k_1 = 1 × 10⁻⁴ s⁻¹
  k_2 = −1 × 10⁻⁴ s⁻¹
  k_3 = 7.04 × 10⁻⁵ s⁻¹
  τ_av = 2 × 10² s
  τ_str = 2.93 × 10⁴ s
  τ_noise = 1.62 × 10⁵ s
  J_*^av = 2 × 10⁻²
  D = 2.47 × 10⁻⁹ s⁻¹
  D′ = 1.47 × 10⁻⁹ s⁻¹
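The derived quantities in Table I(b) follow from the listed parameters through the relations obtained in Secs. IV-VI. As a consistency check, the sketch below recomputes them from the tabulated values of W̃(0), ∫ds W(s)², ∫ds W(s)ε(−s), and Q (these four numbers depend on the window and kernel of Appendix B and are therefore taken as given here).

```python
# parameters from Table I(a)
eta = 1e-5
w_in, w_out = eta, -1.0475 * eta
nu_in, nu_out, nu0 = 10.0, 10.0, 0.0       # nu_out = nu_in once weights are normalized (cf. Sec. VI A)
N, M1, M2 = 50, 25, 25

# window/kernel integrals, taken from Table I(b)
Wt0, int_W2, int_Weps, Q = 4.75e-8, 3.68e-12, 7.04e-6, 6.84e-7

# Eq. (14)
k1 = (w_out + Wt0 * nu_in) * nu0 + w_in * nu_in
k2 = (w_out + Wt0 * nu_in) * nu_in
k3 = nu_in * int_Weps

# Eqs. (21), (22), (29): fixed point, normalization and structure-formation time constants
Q_av = (M2 / N) ** 2 * Q
J_av = -k1 / (N * (k2 + Q_av))
tau_av = -1.0 / (N * (k2 + Q_av))
tau_str = 1.0 / (N * Q)

# Eq. (31), the diffusion constant of the weight distribution (Sec. VI A), and Eq. (34)
D = (nu_in * w_in**2 + nu_out * w_out**2 + nu_in * nu_out * int_W2
     + nu_in * nu_out * Wt0 * (2 * (w_in + w_out) + Wt0 * (nu_in + nu_out)))
Dp = nu_in * w_in**2 + nu_in * nu_out * (int_W2 + 2 * w_in * Wt0 + Wt0**2 * nu_out)
tau_noise = (k1 / k2) ** 2 / (N**2 * D)

print(f"k1={k1:.2e}, k2={k2:.2e}, k3={k3:.2e}")
print(f"J*_av={J_av:.2e}, tau_av={tau_av:.0f} s, tau_str={tau_str:.2e} s")
print(f"D={D:.2e}, D'={Dp:.2e}, tau_noise={tau_noise:.2e} s")
```

Running this reproduces the values of Table I(b) to the quoted precision.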

A. Models of synaptic input

We divide the N statistically independent synapses, all converging onto the very same neuron, into two groups, M_1 and M_2. The numbers of synapses are M_1 and M_2, respectively, where M_1 + M_2 = N and M_1, M_2 ≫ 1. Since each group contains many synapses, we may assume that M_1 and M_2 are of the same order of magnitude. The spike input at synapses i in group M_1 is generated by a Poisson process with a constant intensity λ_i^in(t) ≡ ν^in, which is independent of t. We therefore have Q_{ij}(t) ≡ 0 for i or j ∈ M_1; cf. Eqs. (11) and (12). The synapses i in group M_2 are driven by an arbitrary time-dependent input, λ_i^in(t) = λ^in(t), with the same mean input rate ν^in = \overline{λ^in}(t) as in group M_1. Without going into details about the dependence of λ^in(t) upon the time t, we assume λ^in(t) to be such that the covariance q_{ij}(s;t) in Eq. (11) is independent of t. In this case it follows from Eq. (12) that Q_{ij}(t) ≡ Q for i, j ∈ M_2, regardless of t. For the sake of simplicity we require in addition that Q > 0. In summary, we suppose in the following

  Q_{ij}(t) = Q > 0  for i, j ∈ M_2,   and   Q_{ij}(t) = 0  otherwise.        (17)

We recall that Q_{ij} is a measure of the correlations in the input arriving at synapses i and j; cf. Eqs. (11) and (12). Equation (17) states that at least some of the synapses receive positively correlated input, a rather natural assumption. Three different realizations of Eq. (17) are now discussed in turn.

1. White-noise input

For all synapses in group M_2, let us consider the case of stochastic white-noise input with intensity λ^in(t) and mean firing rate \overline{λ^in}(t) = ν^in(t) ≥ 0. The fluctuations are \overline{[λ^in(t+s) − ν^in(t+s)][λ^in(t) − ν^in(t)]} = σ_0 δ(s). Due to the convolution (8) with ε, Eq. (11) yields q_{ij}(s;t) = σ_0 ε(−s), independently of t, i, and j. We use Eq. (12) and find Q_{ij}(t) ≡ Q = σ_0 ∫ ds W(s) ε(−s). We want Q > 0 and therefore arrive at ∫ ds W(s) ε(−s) = k_3/ν^in > 0. We have seen before in Sec. IV D that k_3 > 0 is a natural assumption and in agreement with experiments.

2. Colored-noise input

Let us now consider the case of an instantaneous and memoryless excitation, ε(s) = δ(s). We assume that λ^in − ν^in obeys a stationary Ornstein-Uhlenbeck process [62] with correlation time τ_c. The fluctuations are therefore q_{ij}(s;t) ∝ exp(−|s|/τ_c), independent of the synaptic indices i and j. Then Q > 0 implies ∫ ds W(s) exp(−|s|/τ_c) > 0.

3. Periodic input

Motivated by oscillatory neuronal activity in the auditory system and in the cortex (cf. Sec. III C), we now consider the scenario of periodically modulated rates, λ^in(t) − ν^in = δν^in cos(ωt), where ω ≫ 2π/T. Let us first study the case ε(s) = δ(s). We find Q = (δν^in)²/2 ∫ ds W(s) cos(ωs). Positive Q hence requires the real part of the Fourier transform W̃(ω) := ∫ ds W(s) exp(iωs) to be positive, i.e., Re[W̃(ω)] > 0. For a general interaction kernel ε(s), we find q_{ij}(s;t) = (δν^in)²/2 ∫ ds' ε(s') cos[ω(s+s')] and hence

  Q = \frac{(\delta\nu^{in})^2}{2}\, {\rm Re}[\tilde W(\omega)\, \tilde\varepsilon(\omega)],        (18)

independent of t. Then Q > 0 requires Re[W̃(ω) ε̃(ω)] > 0.
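For a concrete feel for condition (18), the sketch below evaluates Re[W̃(ω)ε̃(ω)] numerically for an assumed exponential learning window and an assumed alpha-shaped kernel (both placeholders; the actual forms are in Appendix B), at the 40 Hz modulation frequency and δν^in of Table I. Only the sign of the result matters here; the magnitude depends on the assumed window amplitude.

```python
import numpy as np

dt = 1e-5
s = np.arange(-0.2, 0.2, dt)                      # time-lag axis (s)

# assumed learning window: positive for s < 0 (pre before post), negative for s > 0
tau_p, tau_m = 1e-3, 20e-3
W = np.where(s <= 0, np.exp(np.minimum(s, 0.0) / tau_p),
                     -np.exp(-np.maximum(s, 0.0) / tau_m))

# assumed causal alpha-shaped kernel, unit area
tau0 = 10e-3
eps = np.where(s >= 0, (s / tau0**2) * np.exp(-s / tau0), 0.0)

def fourier(f, omega):
    """f_tilde(omega) = int ds f(s) exp(i omega s), evaluated on the grid."""
    return np.sum(f * np.exp(1j * omega * s)) * dt

omega, dnu = 2 * np.pi * 40.0, 10.0
Q = 0.5 * dnu**2 * np.real(fourier(W, omega) * fourier(eps, omega))
print("Q =", Q, "(structure formation requires Q > 0, i.e., Re[W~(w) e~(w)] > 0)")
```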
B. Normalization

Normalization is a very desirable property for any learning rule. It is a natural requirement that the average weight and the mean output rate do not blow up during learning but are stabilized at a reasonable value in an acceptable amount of time.

Standard rate-based Hebbian learning can lead to unlimited growth of the average weight. Several methods have been designed to control this unlimited growth, for instance, subtractive or multiplicative rescaling of the weights after each learning step so as to impose either Σ_j J_j = const or else Σ_j J_j² = const; cf., e.g., [2,7,55]. It is hard to see, however, where such a rescaling should come from. Furthermore, a J dependence of the parameters a_1, ..., a_5 in the learning equation (5) is often assumed. Higher-order terms in the expansion (5) may also be used to control unlimited growth.

In this subsection we show that under some mild conditions there is no need whatsoever to invoke a J dependence of the learning parameters, rescaling of weights, or higher-order correlations to get normalization, which means here that the average weight

  J^{av} = \frac{1}{N} \sum_{i=1}^{N} J_i        (19)

approaches a stable fixed point during learning. Moreover, in this case the mean output rate ν^out is also stabilized since ν^out = ν_0 + N J^av ν^in; cf. Eq. (7).

As long as the learning parameters do not depend on the J values, the rate of change of the average weight is obtained from Eqs. (15) and (19) with k_3 = 0 (Sec. IV C),

  \dot J^{av} = k_1 + N k_2 J^{av} + N^{-1} \sum_{i,j} Q_{ij} J_j.        (20)

In the following we consider the situation at the beginning of the learning procedure, where the set of weights {J_i} has not yet picked up any correlations with the set of Poisson intensities {λ_i^in} and is therefore independent of it. We may then replace J_i and Q_{ij} on the right-hand side of Eq. (20) by their average values J^av and Q^av = N^{-2} Σ_{i,j} Q_{ij}, respectively. The specific input (17) described in the preceding section yields Q^av = (M_2/N)² Q > 0. We rewrite Eq. (20) in the standard form \dot J^{av} = [J_*^{av} − J^{av}]/τ_{av}, where

  J_*^{av} = -k_1 / [N (k_2 + Q^{av})]        (21)

is the fixed point for the average weight and

  \tau_{av} = J_*^{av}/k_1 = -1/[N (k_2 + Q^{av})]        (22)

is the time constant of normalization. The fixed point in Eq. (21) is stable if and only if τ_av > 0.

During learning, weights {J_i} and rates {λ_i^in} may become correlated. In Appendix C we demonstrate that the influence of any interdependence between weights and rates on normalization can be neglected in the case of weak correlations in the input,

  0 < Q \ll -k_2.        (23)

The fixed point J_*^av in Eq. (21) and the time constant τ_av in Eq. (22) are then almost independent of the average correlation Q^av, which is always of the same order as Q.

In Figs. 7 and 8 we show numerical simulations with parameters as given in Appendix B. The average weight J^av always approaches J_*^av, independent of any initial conditions in the distribution of weights.

FIG. 7. Numerical simulation of weight normalization with parameters as given in Appendix B. The four graphs show the temporal evolution of synaptic weights J_i, 1 ≤ i ≤ 50, before (t = 0) and during learning (t = 200, 500, and 1000 s). Before learning, all weights are initialized at the upper bound ϑ = 0.1. During learning, weights decrease towards the fixed point of the average weight, J_*^av = 2.0 × 10⁻²; cf. also Fig. 8, topmost full line. The time constant of normalization is τ_av = 2.0 × 10² s, which is much smaller than the time constant of structure formation; cf. Sec. V C and Fig. 9. For times t ≤ 1000 s we can therefore neglect effects coming from structure formation.

FIG. 8. Development of the average weight J^av as a function of time t in units of 10³ s. The simulations were started at t = 0 with five different average weights, J^av ∈ {0, 0.01, 0.02 = J_*^av, 0.05, 0.1 = ϑ}. Full lines indicate homogeneous initial weight distributions, where J_i = J^av at t = 0 for all i; cf. also Fig. 7, upper left panel. In all five cases, J^av relaxes towards the fixed point J_*^av = 2.0 × 10⁻² with the time constant τ_av = 2.0 × 10² s describing the rate of normalization. Our theoretical prediction according to Sec. V B (crosses on the uppermost full line) is in good agreement with the numerical results. The dashed line indicates the development of J^av starting from an inhomogeneous initial weight distribution, J_i = 0 for 1 ≤ i ≤ 25 and J_i = ϑ for 25 < i ≤ 50 = N. In the inhomogeneous case, τ_av is enlarged as compared to the homogeneous case by a factor of 2 because only half of the synapses are able to contribute to normalization; cf. Appendix D. The insets (signatures as in Fig. 7) show the inhomogeneous weight distributions (arrows) at times t = 0, 200, 500, and 1000 s; the dotted line indicates the fixed point J_*^av = 0.2 ϑ. We note that here the distribution remains inhomogeneous.
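The following sketch integrates the averaged learning equation (15) with k_3 = 0 for the two-group input (17), with hard bounds 0 ≤ J_i ≤ ϑ, using the values of Table I. It is a simplified stand-in for the full spike-based simulations behind Figs. 7-9 and reproduces normalization on the scale τ_av and structure formation on the much slower scale τ_str.

```python
import numpy as np

# Table I values
N, M1, M2 = 50, 25, 25
k1, k2, Q, theta = 1e-4, -1e-4, 6.84e-7, 0.1

# Q_ij of Eq. (17): Q for i, j in group M_2 (the last M2 synapses), zero otherwise
Qmat = np.zeros((N, N))
Qmat[M1:, M1:] = Q

J = np.full(N, theta)                  # start with all weights at the upper bound, as in Fig. 7
dt, t_end = 1.0, 1e5                   # Euler step (s), total learning time (s)
for step in range(int(t_end / dt)):
    dJ = k1 + (Qmat + k2) @ J          # Eq. (15) with k_3 = 0
    J = np.clip(J + dt * dJ, 0.0, theta)
    t = (step + 1) * dt
    if t in (200.0, 1000.0, 3e4, 1e5):
        print(f"t={t:8.0f} s   J_av={J.mean():.4f}   "
              f"J(1)={J[:M1].mean():.4f}   J(2)={J[M1:].mean():.4f}")
```

The printout shows J^av relaxing to about 2 x 10^-2 within a few hundred seconds, while the difference between the two groups builds up only on the 10^4 s scale, consistent with Sec. V C.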

Weight constraints lead to further conditions on the learning parameters

We have seen that normalization is possible without a J dependence of the learning parameters. Even if the average weight J^av approaches a fixed point J_*^av, there is no restriction on the size of individual weights, apart from J_i ≥ 0 for excitatory synapses and J_i ≲ N J_*^av. This means that a single weight may at most comprise the total (normalized) weight of all N synapses. The latter case is, however, unphysiological, since almost every neuron holds many synapses with nonvanishing efficacies (weights), and efficacies of biological synapses seem to be limited. We take this into account in our learning rule by introducing a hard upper bound ϑ for each individual weight. As we will demonstrate, a reasonable value of ϑ does not influence normalization in that J_*^av remains unchanged. However, an upper bound ϑ > 0, whatever its value, leads to further constraints on the learning parameters.

To incorporate the restricted range of individual weights into our learning rule (1), we assume that we can treat the learning parameters w^in, w^out, and the amplitude of W as constant in the range 0 ≤ J_i ≤ ϑ. For J_i < 0 or J_i > ϑ, we take w^in = w^out = W = 0. In other words, we use Eq. (15) only between the lower bound 0 and the upper bound ϑ and set dJ_i/dt = 0 if J_i < 0 or J_i > ϑ.

Because of the lower and upper bounds for each synaptic weight, 0 ≤ J_i ≤ ϑ for all i, a realizable fixed point J_*^av has to be within these limits. Otherwise all weights saturate either at the lower or at the upper bound. To avoid this, we first of all need J_*^av > 0. Since τ_av = J_*^av/k_1 in Eq. (22) must be positive for stable fixed points, we also need k_1 > 0. The meaning becomes transparent from Eq. (14) in the case of vanishing spontaneous activity in the output, ν_0 = 0. Then k_1 > 0 reduces to

  w^{in} > 0,        (24)

which corresponds to neurobiological reality [28,56,31].

A second condition for a realizable fixed point arises from the upper bound, ϑ > J_*^av. This requirement leads to k_2 < −k_1/(Nϑ) − Q^av. Exploiting only k_2 < 0, we find from Eq. (14) that w^out + W̃(0) ν^in < 0, which means that postsynaptic spikes on average reduce the total weight of the synapses. This is one of our predictions that can be tested experimentally. Assuming W̃(0) > 0, which, with the benefit of hindsight, seems reasonable in terms of rate-coded learning in the manner of Hebb (Sec. III), we predict

  w^{out} < 0,        (25)

which has not been verified by experiments yet.

Weight constraints do not influence the position of the fixed point J_*^av (as long as it remains realizable) but may enlarge the value of the time constant τ_av of normalization (see details in Appendix D). The time constant τ_av changes because weights saturated at the lower (upper) bound cannot contribute to a decrease (increase) of J^av. If fewer than the total number of weights add to our (subtractive) normalization, then the fixed point is approached more slowly; cf. Fig. 8, dashed line and insets. The factor by which τ_av may be enlarged is, however, of order 1 if we take the upper bound to be ϑ = (1+d) J_*^av, where d > 0 is of order 1, which will be assumed throughout what follows; cf. Appendix D.

C. Structure formation

In our simple model with two groups of input, structure formation can be measured by the difference J^str between the average synaptic strengths in groups M_1 and M_2; cf. Sec. V A. We derive conditions under which this difference increases during learning. In the course of the argument we also show that structure formation takes place on a time scale τ_str considerably slower than the time scale τ_av of normalization.

We start from Eq. (15) with k_3 = 0 and randomly distributed weights. For the moment we assume that normalization has already taken place. Furthermore, we assume small correlations as in Eq. (23), which assures that the fixed point J_*^av ≈ −k_1/(N k_2) is almost constant during learning; cf. Eqs. (21) and (C1). If the formation of any structure in {J_i} is slow as compared to normalization, we are allowed to use J^av = J_*^av during learning. The consistency of this ansatz is checked at the end of this section.

The average weight in each of the two groups M_1 and M_2 is

  J^{(1)} = \frac{1}{M_1} \sum_{i \in M_1} J_i \quad \text{and} \quad J^{(2)} = \frac{1}{M_2} \sum_{i \in M_2} J_i.        (26)

If lower and upper bounds do not influence the dynamics of each weight, the corresponding rates of change are

  \dot J^{(1)} = k_1 + M_1 J^{(1)} k_2 + M_2 J^{(2)} k_2,
  \dot J^{(2)} = k_1 + M_2 J^{(2)} (k_2 + Q) + M_1 J^{(1)} k_2.        (27)

One expects the difference J^str = J^(2) − J^(1) between those average weights to grow during learning because group M_2 receives a stronger reinforcement than M_1. Differentiating J^str with respect to time, using Eq. (27) and the constraint J^av = J_*^av = N^{-1}(M_1 J^(1) + M_2 J^(2)), we find the rate of growth

  \dot J^{str} = \frac{M_1 M_2}{N}\, Q\, J^{str} + M_2\, Q\, J_*^{av}.        (28)

The first term on the right-hand side gives rise to an exponential increase (Q > 0) while the second term gives rise to a linear growth of J^str. Equation (28) has an unstable fixed point at J_*^str = −(N/M_1) J_*^av. Note that J_*^str is always negative and independent of Q.

We associate the time constant τ_str of structure formation with the time that is necessary for an increase of J^str from a typical initial value to its maximum. The maximum of J^str is of order J_*^av if M_1/M_2 is of order 1 (Sec. V A) and if ϑ = (1+d) J_*^av, where d > 0 is of order 1 (Sec. V B). At the beginning of learning (t = 0) we may take J^str(0) = 0. Using this initial condition, an integration of Eq. (28) leads to J^str(t) = (N/M_1) J_*^av [exp(t M_1 M_2 Q/N) − 1]. With t = τ_str and J^str(τ_str) = J_*^av we obtain τ_str = N/(M_1 M_2 Q) ln(M_1/N + 1).

Since we only need an estimate of τ_str, we drop the logarithm, which is of order 1. Finally, approximating N/(M_1 M_2) by 1/N we arrive at the estimate

  \tau_{str} = (N\, Q)^{-1}.        (29)

We could adopt a refined analysis similar to the one we have used for J^av to discuss the effects of the upper and lower bounds for individual weights. We will not do so, however, since the result (29) suffices for our purpose: the comparison of time constants.

A comparison of τ_av in Eq. (22) with τ_str in Eq. (29) shows that we have a separation of the fast time scale of normalization from the slow time scale of structure formation, if Eq. (23) holds.

A numerical example confirming the above theoretical considerations is presented in Fig. 9. Simulation parameters are as given in Appendix B.

FIG. 9. Temporal evolution of the average weights J^av, J^(1), and J^(2) as a function of the learning time t in units of 10⁴ s. The quantity J^av is the average weight of all synapses; J^(1) and J^(2) are the average weights in groups M_1 and M_2, respectively. Synapses i in group M_1, where 1 ≤ i ≤ 25, receive incoherent input, whereas synapses i in group M_2, where 26 ≤ i ≤ 50, are driven by a coherently modulated input intensity. Parameters are as given in Appendix B. Simulations started at time t = 0 with a homogeneous weight distribution, J_i = ϑ = 0.1 for all i. The normalization of the average weights takes place within a time of order O(100 s); see also the uppermost full line in Fig. 8. On the time scale of τ_str = 2.93 × 10⁴ s a structure in the distribution of weights emerges in that J^(2) grows at the expense of J^(1). The average weight J^av remains almost unaffected near J_*^av = 2 × 10⁻² (dashed line). The slight enlargement of J^av between t = 10⁴ s and t = 7 × 10⁴ s can be explained by using Eq. (C1) and taking also the k_3 term into account. The insets (signatures as in Figs. 7 and 8) show the weight distributions at times t = 10³, 10⁴, 2.93 × 10⁴, and 7 × 10⁴ s (arrows).

D. Stabilization of learning

Up to this point we have neglected the influence of the k_3 term in Eq. (15), which may lead to a stabilization of weight distributions, in particular when synapses are few and strong [22,54]; cf. Sec. IV D. This is the case, for example, in the scenario of Fig. 10, which shows the final result of the simulations described in Fig. 9. The shown weight distribution is stable, so that learning has terminated apart from minor rapid fluctuations due to noise.

FIG. 10. The asymptotic distribution of weights {J_i} at time t = 10⁵ s; signatures are as in Fig. 7. This distribution is the final result of the numerical simulation shown in Fig. 9 and remains stable thereafter apart from minor rapid fluctuations. All but one of the synapses are saturated either at the lower or at the upper bound.

We now show that in stable weight distributions all synapses but one are saturated either at the lower or the upper bound. In the scenario of Fig. 10, the k_3 term keeps a single weight J_{m_1} in group M_1 at the upper bound ϑ, even though there is a nonsaturated weight J_{m_2} in M_2. Group M_2 (in contrast to M_1) comprises most of the total weight and is driven by positively correlated input. Why does J_{m_1} not decrease in favor of J_{m_2}? The answer comes from Eq. (15). Weight J_{m_1} receives a stronger reinforcement than J_{m_2} if \dot J_{m_1} > \dot J_{m_2} holds. Using Eq. (15) we find J_{m_1} > (Q/k_3) Σ_{j∈M_2} J_j + J_{m_2}. Approximating Σ_{j∈M_2} J_j = N J_*^av = −k_1/k_2, we obtain J_{m_1} > −Q k_1/(k_2 k_3) + J_{m_2}. This condition is fulfilled because J_{m_1} ≈ 0.1, J_{m_2} ≈ 0.04 (cf. Fig. 10), and −Q k_1/(k_2 k_3) ≈ 0.01 (cf. Table I); here k_1 > 0, k_2 < 0, and k_3 > 0.

VI. NOISE

In this section we discuss the influence of noise on the evolution of each weight. Noise may be due to jitter of input and output spikes and to the fact that we deal with spikes per se (Sec. II B). This gives rise to a random walk of each weight around the mean trajectory described by Eq. (15). The variance var J_i(t) of this random walk increases linearly with time, as it does in free diffusion. From the speed of the variance increase we derive a time scale τ_noise. A comparison with the time constant τ_str of structure formation leads to further constraints on our learning parameters and shows that, in principle, any correlation in the input, however weak, can be learned if there is enough time available for learning.

The calculation of var J_i is based on four approximations. First, we neglect upper and lower bounds of the learning dynamics, as we have done for the calculation of the time constants of normalization (Sec. V B) and structure formation (Sec. V C). Second, we neglect spike-spike correlations between input and output and work directly with ensemble-averaged rates. As we have seen, spike-spike correlations show up in the term k_3 in Eq. (15) and have little influence on learning, given many weak synapses and an appropriate scenario for our learning parameters; cf. Sec. IV C. Third, we assume constant input rates, λ_i^in(t) = ν^in for all i. A temporal structure in the input rates is expected to play a minor role here. Fourth, as a consequence of constant input rates we assume a constant output rate, λ^out(t) = ν^out.

Despite such a simplified approach, we can study some interesting effects caused by neuronal spiking. Within the limits of our approximations, input and output spikes are generated by independent Poisson processes with constant intensities. The variance var J_i(t) increases basically because of shot noise at the synapses. We now turn to the details.

A. Calculation of the variance

We start with some weight J_i(t_0) at time t_0 and calculate the variance var J_i(t) := ⟨J_i²⟩(t) − ⟨J_i⟩²(t) as a function of t for t > t_0. Angular brackets ⟨ ⟩ again denote an ensemble average; cf. Sec. II B. A detailed analysis is outlined in Appendix E. The result is

  {\rm var}\, J_i(t) = (t - t_0)\, D \quad \text{for } t - t_0 \gg \mathcal{W},        (30)

where 𝒲 is the width of the learning window W (cf. Sec. II C) and

  D = \nu^{in} (w^{in})^2 + \nu^{out} (w^{out})^2 + \nu^{in}\nu^{out} \int ds\, W(s)^2
    + \nu^{in}\nu^{out}\, \tilde W(0)\, \big[ 2 (w^{in} + w^{out}) + \tilde W(0)\, (\nu^{in} + \nu^{out}) \big].        (31)

Thus, because of Poisson spike arrival and stochastic output firing with disjoint intervals being independent, each weight J_i undergoes a diffusion process with diffusion constant D.

To discuss the dependence of D upon the learning parameters, we restrict our attention to the case ν^in = ν^out in Eq. (31). Since mean input and output rates in biological neurons typically are not too different, this makes sense. Moreover, we do not expect that the ratio ν^in/ν^out is a critical parameter. We recall from Sec. V B that ν^out = −(k_1/k_2) ν^in once the weights are normalized and if ν_0 = Q^av = 0. With ν^in = ν^out this is equivalent to k_1 = −k_2. Using the definition of k_1 and k_2 in Eq. (14) we find w^in + w^out = −W̃(0) ν^in. If we insert this into Eq. (31), the final term vanishes. In what remains of Eq. (31) we identify the contributions due to input spikes, ν^in (w^in)², and output spikes, ν^out (w^out)². Weight changes because of correlations between input and output spikes enter Eq. (31) via ν^in ν^out ∫ ds W(s)².
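A brute-force check of Eqs. (30) and (31) under the stated approximation (independent input and output Poisson trains at constant rates) is straightforward: apply rule (1b) to many independent trials and compare the empirical variance with D(t − t_0). The window shape and the unit-sized learning steps below are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(1)
nu_in, nu_out = 10.0, 10.0          # constant input and output rates (Hz), independent processes
w_in, w_out = 1.0, -1.0             # illustrative learning steps (eta absorbed)
A_p, A_m, tau_p, tau_m = 1.0, -1.0, 1e-3, 20e-3   # illustrative window parameters

def W(s):
    s = np.asarray(s, dtype=float)
    return np.where(s <= 0, A_p * np.exp(np.minimum(s, 0.0) / tau_p),
                            A_m * np.exp(-np.maximum(s, 0.0) / tau_m))

# Eq. (31), using the analytic integrals of this illustrative window
Wt0 = A_p * tau_p + A_m * tau_m                     # int ds W(s)
intW2 = A_p**2 * tau_p / 2 + A_m**2 * tau_m / 2     # int ds W(s)^2
D = (nu_in * w_in**2 + nu_out * w_out**2 + nu_in * nu_out * intW2
     + nu_in * nu_out * Wt0 * (2 * (w_in + w_out) + Wt0 * (nu_in + nu_out)))

T, trials = 50.0, 300                               # trial length (s), number of independent trials
dJ = np.empty(trials)
for k in range(trials):
    t_in = rng.uniform(0, T, rng.poisson(nu_in * T))     # Poisson spike times
    t_out = rng.uniform(0, T, rng.poisson(nu_out * T))
    pair = W(t_in[:, None] - t_out[None, :]).sum() if t_in.size and t_out.size else 0.0
    dJ[k] = w_in * t_in.size + w_out * t_out.size + pair  # Eq. (1b) for one trial
print("empirical variance:", dJ.var(), "   prediction D*T from Eqs. (30)-(31):", D * T)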
Equation (30) describes the time course of the variance of a single weight. Estimating var J_i is numerically expensive because we have to simulate many independent learning trials. It is much cheaper to compute the variance of the distribution {J_i} of weights in a single learning trial. For the sake of a comparison of theory and numerics in Fig. 11, we plot

  {\rm var}\{J_i\}(t) := \frac{1}{N-1} \sum_{i=1}^{N} [J_i(t) - J^{av}(t)]^2,        (32)

which obeys a diffusion process with

  {\rm var}\{J_i\}(t) = (t - t_0)\, D',        (33)

in a way similar to Eq. (30). The diffusion constant D' is, however, different from D because the weights of a single neuron do not develop independently of each other. Each output spike triggers the change of all N weights by an amount w^out. Therefore, output spikes do not contribute to a change of var{J_i}(t) as long as upper and lower bounds have not been reached. Furthermore, all synapses "see" the same spike train of the postsynaptic neuron they belong to. In contrast to that, input spikes at different synapses are independent. Again we assume that input and output spikes are independent; cf. the second paragraph at the beginning of Sec. VI. Combining the above arguments, we obtain the diffusion constant D' by simply setting w^out = 0 and disregarding the term [ν^in W̃(0)]² ν^out in Eq. (31), which leads to

  D' = \nu^{in} (w^{in})^2 + \nu^{in}\nu^{out} \Big[ \int ds\, W(s)^2 + 2\, w^{in}\, \tilde W(0) + \tilde W(0)^2\, \nu^{out} \Big].

The boundaries of validity of Eq. (33) are illustrated in Fig. 12.

FIG. 11. Influence of noise. We compare numerical results for the evolution of the variance var{J_i}(t) defined in Eq. (32) (full lines) with our theoretical prediction (dashed line) based on Eq. (33) with D' = 1.47 × 10⁻⁹ s⁻¹. Learning starts at time t = 0 with a homogeneous distribution, J_i = J_*^av = 0.02 for all i. The thin line corresponds to the simulation of Fig. 8 with initial condition J^av = 0.02, viz., two groups of 25 synapses each. The thick line has been obtained with incoherent input for all 50 synapses, λ_i^in(t) = ν^in for all i (all other parameters in Appendix B being equal). Because the number of synapses is finite (N = 50), deviations from the dashed straight line are due to fluctuations. The overall agreement between theory and simulation is good.

B. Time scale of diffusion

The effects of shot noise in input and output show up on a time scale τ_noise which may be defined as the time interval necessary for an increase of the variance (30) from var J_i(t_0) = 0 to var J_i(t_0 + τ_noise) = (J_*^av)². We chose J_*^av as a reference value because it represents the available range for each weight. From Eq. (30) we obtain τ_noise = (J_*^av)²/D. We use J_*^av = −k_1/(N k_2) from Eq. (21) with Q^av = 0. This yields

  \tau_{noise} = \frac{1}{N^2 D} \left( \frac{k_1}{k_2} \right)^2.        (34)

C. Comparison of time scales

We now compare τ_noise in Eq. (34) with the time constant τ_str = 1/(N Q) of structure formation as it appears in Eq. (29). The ratio

  \frac{\tau_{noise}}{\tau_{str}} = \frac{Q}{N\, D} \left( \frac{k_1}{k_2} \right)^2        (35)

should exceed 1 so as to enable structure formation (in the sense of Sec. V C). Otherwise weight diffusion due to noise spreads the weights between the lower bound 0 and the upper bound ϑ and, consequently, destroys any structure.

VII. DISCUSSION

Changes of synaptic efficacies are triggered by the relative timing of presynaptic and postsynaptic spikes [25,26]. The learning rule (1) discussed in this paper is a first step towards a description and analysis of the effects of synaptic changes with single-spike resolution. Our learning rule can be motivated by elementary dynamic processes at the level of the synapse [54,57] and can also be implemented in hardware; cf. [40]. A phenomenological model of the experimental effects which is close to the model studied in the present paper has been introduced [42]. A compartmental model of the biophysics and ion dynamics underlying spike-based learning along the lines of [58] has not been attempted yet. As an alternative to changing synaptic weights, spike-based learning rules which act directly on the delays may also be considered [59–61].

The learning rule (1) discussed in the present paper is rather simple and contains only terms that are linear and quadratic in the presynaptic and postsynaptic spikes ("Hebbian" learning). This simple mathematical structure, which is based on experiment [25,26,30], has allowed us to derive analytical results and identify some key quantities.

First of all, if the input signal contains no correlations with the output at the spike level, and if we use a linear Poissonian neuron model, the spike-based learning rule reduces to Eq. (15), which is closely reminiscent of Linsker's linear learning equation for rate coding [3]. The only difference is an additional term k_3, which is not accounted for by pure rate models. It is caused by precise temporal correlations between an output spike and an input spike that has triggered the pulse. This additional term reinforces synapses that are already strong and hence helps to stabilize existing synapse configurations.

In the limit of rate coding, the form of the learning window W is not important but only the integral ∫ds W(s) counts: ∫ds W(s) > 0 would be called "Hebbian," ∫ds W(s) < 0 is sometimes called "anti-Hebbian" learning. In general, however, input rates may be modulated on a fast time scale or contain correlations at the spike level. In this case, the shape of the learning window does matter. A learning window with a maximum at s* < 0 (thus maximal increase of the synaptic strength for a presynaptic spike preceding a postsynaptic spike; cf. Fig. 6) picks up the correlations in the input. In this case a structured distribution of synaptic weights may evolve [22].

The mathematical approach developed in this paper leads to a clear distinction between different time scales. First, the fastest time scale is set by the time course of the postsynaptic potential ε and the learning window W. Correlations in the input may occur on the same fast time scale, but can also be slower or faster; there is no restriction. Second, learning occurs on a much slower time scale and in two phases: (i) an intrinsic normalization of total synaptic weight and the output firing rate, followed by (ii) structure formation. Third, if the learning rate is small enough, then diffusion of the weights due to noise is slow as compared to structure formation. In this limit, the learning process is described by the differential equation (4) for the expected weights.

Normalization is possible if at least w^in > 0 and w^out < 0 for ∫ds W(s) > 0 in Eq. (1) ("Hebbian" learning). In this case, the average weight may decay exponentially to a fixed point, though there is no decay term for individual weights. In other words, normalization is an intrinsic property since we do not invoke multiplicative or subtractive rescaling of weights after each learning step [2,7,55].
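A minimal numerical illustration of this intrinsic normalization is sketched below. It assumes, for ν_0 = 0, the identifications k_1 = w^in ν^in and k_2 = w^out ν^in + W̃(0)(ν^in)²; these expressions are consistent with the relations used in Secs. V B and VI and reproduce the fixed point J_*^av = 2×10^{-2} of Appendix B, but the exact definitions are those of Eq. (14). The averaged-weight dynamics dJ^av/dt = k_1 + N k_2 J^av is the unsaturated, Q^av = 0 form of the equation quoted in Appendix D.

import numpy as np

# Intrinsic normalization of the average weight (no explicit rescaling anywhere).
nu_in = 10.0        # [1/s]
w_in  = 1.0e-5
w_out = -1.0475e-5
W0    = 4.75e-8     # W~(0) from Eq. (B2)  [s]
N     = 50

k1 = w_in * nu_in                      # > 0   (assumed form, see lead-in)
k2 = w_out * nu_in + W0 * nu_in**2     # < 0   (assumed form, see lead-in)

Jav_fix = -k1 / (N * k2)               # fixed point of Eq. (21) with Q^av = 0
tau_av  = -1.0 / (N * k2)              # relaxation time of the average weight

dt, T = 1.0, 2000.0                    # Euler step and run length [s]
Jav = 0.05                             # arbitrary initial average weight
for _ in range(int(T / dt)):
    Jav += dt * (k1 + N * k2 * Jav)

print(f"fixed point -k1/(N k2) = {Jav_fix:.4f}, tau_av = {tau_av:.0f} s")
print(f"J^av after {T:.0f} s: {Jav:.4f}")

With these numbers the average weight relaxes exponentially to J_*^av = 0.02 on a time scale of about 200 s, even though no decay term acts on any individual weight.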

The fluctuations due to noise have been treated rather crudely in the present paper. In principle, it should be possible to include the effects of noise directly at the level of the differential equation, as is standard in statistics [62]. Such an approach would then lead to a Fokker-Planck equation for the evolution of weights as discussed in [63]. All this is in principle straightforward but in practice very cumbersome.

Finally, we emphasize that we have used a crudely oversimplified neuron model, viz., a linear stochastic unit. In particular, there is no spike emission threshold nor reset or spike afterpotential. Poisson firing is not as unrealistic as it may at first seem, though. Large networks of integrate-and-fire neurons with stochastic connectivity exhibit Poisson-like firing [64]. Experimental spike interval distributions are also consistent with Poisson firing [65]. In the present paper, the simple Poisson model has been chosen so as to grasp the mathematics and get an explicit expression for the correlation between input and output spikes. The formulation of the learning rule and the derivation of the learning equation (4) are general and hold for any neuron model. The calculation of the correlations that enter the definition of the parameter Q_ij in Eq. (15) is, however, much more difficult if a nonlinear neuron model is used.

Spike-based Hebbian learning has important implications for the question of neural coding since it allows us to pick up and stabilize fast temporal correlations [38,22,41]. A better understanding of spike-triggered learning may thus also contribute to a resolution of the problem of neural coding [17,19,65–67].
ACKNOWLEDGMENTS

The authors thank Christian Leibold for helpful comments and a careful reading of the manuscript. R.K. has been supported by the Deutsche Forschungsgemeinschaft (DFG) under Grant Nos. He 1729/8-1 and Kl 608/10-1 (FG Hörobjekte). The authors also thank the DFG for travel support (Grant No. He 1729/8-1).
APPENDIX A: PROOF OF EQ. (9)

In proving Eq. (9) there is no harm in setting ν_0 = 0. We then have to compute the average

    ⟨S_i^in(t + s) S^out(t)⟩ = ⟨S_i^in(t + s) [f_i(t) + Σ_{j(≠i)} f_j(t)]⟩,    (A1)

where f_i(t) = J_i(t) Σ_f ε(t − t_i^f) with the upper index f ranging over the firing times t_i^f < t of neuron i, which has an axonal connection to synapse i; here 1 ≤ i ≤ N. Since ε is causal, i.e., ε(s) ≡ 0 for s < 0, we can drop the restriction t_i^f < t. The synapses being independent of each other, the sum over j (≠ i) is independent of S_i^in and thus we obtain

    ⟨S_i^in(t + s) [Σ_{j(≠i)} f_j(t)]⟩ = ⟨S_i^in⟩(t + s) [Σ_{j(≠i)} ⟨f_j⟩(t)]
        = λ_i^in(t + s) [Σ_{j(≠i)} J_j(t) ∫_0^∞ dt′ ε(t′) λ_j^in(t − t′)].    (A2)

The term ⟨S_i^in(t + s) f_i(t)⟩ in Eq. (A1) has to be handled with more care as it describes the influence of synapse i on the firing behavior of the postsynaptic neuron,

    ⟨S_i^in(t + s) f_i(t)⟩ = ⟨[Σ_{f′} δ(t + s − t_i^{f′})] [J_i(t) Σ_f ε(t − t_i^f)]⟩.    (A3)

The first term on the right in Eq. (A3) samples spike events at time t + s. To be mathematically precise, we sample all spikes in a small interval of size Δt around t + s, average, and divide by Δt. We replace the first sum in Eq. (A3) by the (approximate) identity (Δt)^{-1} 1_{spike in [t+s, t+s+Δt)}, where 1_{ } is the indicator function of the set { }; i.e., it equals 1 when its argument is in the set { } and 0 elsewhere. Because the postsynaptic potential ε is a continuous function, we approximate the second sum by Σ_k 1_{spike in [t_k, t_k+Δt)} ε(t − t_k), where {[t_k, t_k + Δt), k ∈ Z} is a decomposition of the real axis. Since it is understood that Δt → 0, all events with two or more spikes in an interval [t_k, t_k + Δt) have a probability o(Δt) and, hence, can be neglected. It is exactly this property that is typical to a Poisson process — and to any biological neuron.

What we are going to compute is the correlation between S_i^in, the input at synapse i, and the output S^out, which is governed by all synapses, including synapse i. Here the simplicity of the linear Poissonian neuron model pays off as S^out is linear in the sum of the synaptic inputs and, hence, in each of them. Furthermore, whatever the model, the synaptic efficacies J_i(t) are changing adiabatically with respect to the neuronal dynamics so that they can be taken to be constant and, thus, out of the average. In the limit Δt → 0 we can therefore rewrite the right-hand side of Eq. (A3) so as to find

    J_i(t) (Δt)^{-1} Σ_k ε(t − t_k) ⟨1_{spike in [t+s, t+s+Δt)} 1_{spike in [t_k, t_k+Δt)}⟩.    (A4)

Without restriction of generality we can choose our partition so that t_k = s + t for some k, say k = l. Singling out k = l, the rest (k ≠ l) can be averaged directly, since events in disjoint intervals are independent. Because ⟨1_{ }⟩ = prob{spike in [t + s, t + s + Δt)} = λ_i^in(t + s) Δt, the result is J_i(t) λ_i^in(t + s) L_i^in(t), where we have used Eq. (8). As for the term k = l, we plainly have 1_{ }² = 1_{ }, as an indicator function assumes only two distinct values, 0 and 1. We obtain J_i(t) λ_i^in(t + s) ε(−s).

Collecting terms and incorporating ν_0 ≠ 0, we find Eq. (9).
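The correlation computed above can also be checked by a small Monte Carlo experiment. The sketch below is a minimal illustration for a single synapse driving a linear Poissonian neuron with constant input rate: the output intensity is taken as ν_0 + J Σ_f ε(t − t_f), and the estimated cross-correlation ⟨S^in(t + s) S^out(t)⟩ is compared with the stationary form of the result derived here, λ^in(ν_0 + Jλ^in) + Jλ^in ε(−s). The parameter values are arbitrary illustrations, not those of the simulations in the paper.

import numpy as np

rng = np.random.default_rng(1)

lam  = 10.0      # input rate lambda^in [1/s]
nu0  = 5.0       # spontaneous output rate nu_0 [1/s]
J    = 0.5       # synaptic weight (fixed; learning is switched off here)
tau0 = 10e-3     # EPSP time constant, eps(s) = s/tau0^2 * exp(-s/tau0) for s > 0
dt   = 1e-3      # time bin [s]
T    = 5000.0    # simulated time [s]
n    = int(T / dt)

# input spike train and linear Poissonian output
x = rng.poisson(lam * dt, size=n)                       # input spike counts per bin
t_k = np.arange(1, 200) * dt                            # EPSP kernel support (~200 ms)
eps_kernel = (t_k / tau0**2) * np.exp(-t_k / tau0)      # eps sampled on the bin grid
rate_out = nu0 + J * np.convolve(x, eps_kernel)[:n]     # output intensity per bin
y = rng.poisson(rate_out * dt)                          # output spike counts per bin

# estimate C(s) = <S^in(t+s) S^out(t)> for lags s = j*dt
lags = np.arange(-80, 81)
C_est = np.empty(len(lags))
for m, j in enumerate(lags):
    if j >= 0:
        num, cnt = np.dot(x[j:], y[:n - j]), n - j
    else:
        num, cnt = np.dot(x[:j], y[-j:]), n + j
    C_est[m] = num / (cnt * dt**2)

# prediction: baseline lam*(nu0 + J*lam) plus the eps(-s) term for s < 0
s = lags * dt
eps_minus_s = np.where(-s > 0, (-s) / tau0**2 * np.exp(s / tau0), 0.0)
C_theory = lam * (nu0 + J * lam) + J * lam * eps_minus_s

for j in (-40, -10, 0, 10):
    m = int(np.where(lags == j)[0][0])
    print(f"s = {j:+4d} ms: simulated {C_est[m]:8.1f}, predicted {C_theory[m]:8.1f}  [1/s^2]")

The estimated correlation shows the flat baseline λ^in ν^out for s > 0 and the extra ε(−s) contribution for input spikes preceding output spikes, as derived above.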

APPENDIX B: PARAMETERS FOR NUMERICAL SIMULATIONS

We discuss the parameter regime of the simulations as shown in Secs. V and VI. Numerical values and important derived
quantities are summarized in Table I.

1. Learning window
We use the learning window

    W(s) = η exp(s/τ_syn) [A_+ (1 − s/τ̃_+) + A_− (1 − s/τ̃_−)]   for s ≤ 0,
    W(s) = η [A_+ exp(−s/τ_+) + A_− exp(−s/τ_−)]                   for s > 0.    (B1)

Here s is the delay between presynaptic spike arrival and postsynaptic firing, η is a "small" learning parameter, and τ_syn, τ_+, τ_−, τ̃_+ := τ_syn τ_+/(τ_syn + τ_+), and τ̃_− := τ_syn τ_−/(τ_syn + τ_−) are time constants. The dimensionless constants A_+ and A_− determine the strength of synaptic potentiation and depression, respectively. Numerical values are η = 10^{-5}, τ_syn = 5 ms, τ_+ = 1 ms, τ_− = 20 ms, and A_+ = 1, A_− = −1; cf. Table I. The learning window (cf. Fig. 6) is in accordance with experimental results [25,26,28,29,33]. A detailed explanation of our choice of the learning window on a microscopical basis of Hebbian learning can be found elsewhere [54,57].

For the analysis of the learning process we need the integrals W̃(0) := ∫ds W(s) and ∫ds W(s)². The numerical result is listed in Table I. Using c_+ := τ_syn/τ_+ and c_− := τ_syn/τ_− we obtain

    ∫ds W(s) = η τ_syn [A_−(2 + c_− + c_−^{-1}) + A_+(2 + c_+ + c_+^{-1})]    (B2)

and

    ∫ds W(s)² = (η²/4) {A_−² τ_− [c_−³ + 4c_−² + 5c_− + 2] + A_+² τ_+ [c_+³ + 4c_+² + 5c_+ + 2]
                + 2 A_+ A_− τ_syn [c_+ c_− + 2(c_+ + c_−) + 5 + 4/(c_+ + c_−)]}.    (B3)

2. Postsynaptic potential

We use the excitatory postsynaptic potential (EPSP)

    ε(s) = (s/τ_0²) exp(−s/τ_0) H(s),    (B4)

where H( ) denotes the Heaviside step function, and ∫ds ε(s) = 1. For the membrane time constant we use τ_0 = 10 ms, which is reasonable for cortical neurons [68,69]. The EPSP has been plotted in Fig. 4. Using Eqs. (B1) and (B4) we obtain

    ∫ds W(s) ε(−s) = η (τ_syn)²/(τ_syn + τ_0)³ × [A_−(2 τ_syn τ_0/τ_− + τ_syn + 3 τ_0)
                     + A_+(2 τ_syn τ_0/τ_+ + τ_syn + 3 τ_0)].    (B5)

3. Synaptic input

The total number of synapses is N = 50. For the 1 ≤ i ≤ M_1 = 25 synapses in group M_1 we use a constant input intensity λ_i^in(t) = ν^in. The remaining M_2 = 25 synapses receive a periodic intensity, λ_i^in(t) = ν^in + δν^in cos(ωt) for i ∈ M_2; cf. also Sec. V A 3. Numerical parameters are ν^in = 10 Hz, δν^in = 10 Hz, and ω/(2π) = 40 Hz. For the comparison of theory and simulation we need the value of Q in Eq. (18). We numerically took the Fourier transforms of ε and W at the frequency ω. The time constant τ_str is calculated via Eq. (29); cf. Table I.

4. Parameters w^in, w^out, ν_0, and ϑ

We use the learning parameters w^in = η and w^out = −1.0475 η, where η = 10^{-5}. The spontaneous output rate is ν_0 = 0 and the upper bound for synaptic weights is ϑ = 0.1. These values have been chosen in order to fulfill the following five conditions for learning. First, the absolute values of w^in and w^out are of the same order as the amplitude of the learning window W; cf. Fig. 6. Furthermore, these absolute values are small as compared to the normalized average weight (see below). Second, the constraints on k_1 and k_2 for a stable and realizable fixed point are satisfied; cf. Sec. V B and Eq. (14). Third, the correlations in the input are weak so that 0 < Q ≪ −k_2; cf. Eq. (23). This implies that the time scale τ_av of normalization in Eq. (22) is orders of magnitude smaller than the time scale τ_str of structure formation in Eq. (29); cf. also Table I. Fourth, the k_3 term in Eq. (14) can be neglected in the sense of Sec. IV C. Proving this, we note that the fixed point for the average weight is J_*^av = 2×10^{-2} [cf. Eq. (21)] and k_3 = 7.04×10^{-5} s^{-1}. We now focus on Eq. (16). Since Q_ij (≪ |k_2| for all i, j) can be neglected and J_i ≤ ϑ for all i, we find from Eq. (16) the even more restrictive condition N |k_2| J_*^av/ϑ ≫ |k_3|, which is fulfilled in our parameter regime. Fifth, input and output rates are identical for normalized weights, ν^in = ν^out for ν_0 = 0; see Sec. V B.
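For convenience, the following Python sketch implements W(s) from Eq. (B1) and ε(s) from Eq. (B4) with the numerical values given above and compares brute-force numerical integration against the closed forms (B2), (B3), and (B5). The remark on k_3 in the final comment is a consistency check added here, not a definition taken from the text.

import numpy as np

eta     = 1e-5
tau_syn = 5e-3
tau_p, tau_m = 1e-3, 20e-3          # tau_+ and tau_-
A_p, A_m     = 1.0, -1.0            # A_+ and A_-
tau_pt = tau_syn * tau_p / (tau_syn + tau_p)   # tilde tau_+
tau_mt = tau_syn * tau_m / (tau_syn + tau_m)   # tilde tau_-
tau_0  = 10e-3

def W(s):
    s = np.asarray(s, dtype=float)
    neg = eta * np.exp(s / tau_syn) * (A_p * (1 - s / tau_pt) + A_m * (1 - s / tau_mt))
    pos = eta * (A_p * np.exp(-s / tau_p) + A_m * np.exp(-s / tau_m))
    return np.where(s <= 0, neg, pos)

def eps(s):
    s = np.asarray(s, dtype=float)
    return np.where(s > 0, s / tau_0**2 * np.exp(-s / tau_0), 0.0)

# brute-force integrals on a fine grid (Riemann sums)
ds = 1e-6
s  = np.arange(-0.5, 0.5, ds)
W_s = W(s)
int_W     = W_s.sum() * ds
int_W2    = (W_s**2).sum() * ds
int_WepsM = (W_s * eps(-s)).sum() * ds

# closed forms (B2), (B3), (B5)
c_p, c_m = tau_syn / tau_p, tau_syn / tau_m
B2 = eta * tau_syn * (A_m * (2 + c_m + 1/c_m) + A_p * (2 + c_p + 1/c_p))
B3 = eta**2 / 4 * (A_m**2 * tau_m * (c_m**3 + 4*c_m**2 + 5*c_m + 2)
                   + A_p**2 * tau_p * (c_p**3 + 4*c_p**2 + 5*c_p + 2)
                   + 2*A_p*A_m*tau_syn * (c_p*c_m + 2*(c_p + c_m) + 5 + 4/(c_p + c_m)))
B5 = eta * tau_syn**2 / (tau_syn + tau_0)**3 * (
        A_m * (2*tau_syn*tau_0/tau_m + tau_syn + 3*tau_0)
      + A_p * (2*tau_syn*tau_0/tau_p + tau_syn + 3*tau_0))

print(f"int W ds         : numeric {int_W:.4e}   closed form (B2) {B2:.4e}")
print(f"int W^2 ds       : numeric {int_W2:.4e}   closed form (B3) {B3:.4e}")
print(f"int W(s)eps(-s)ds: numeric {int_WepsM:.4e}   closed form (B5) {B5:.4e}")
# Note: nu_in times the last integral gives 10 Hz * 7.04e-6 = 7.04e-5 s^-1, which
# matches the k_3 value quoted in Sec. 4 of this appendix (a consistency check;
# the exact definition of k_3 is given in Eq. (14)).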

APPENDIX C: NORMALIZATION AND CORRELATIONS BETWEEN WEIGHTS AND INPUT RATES

The assumption of independence of the weights {J_i} and the rates {λ_i^in} used in Sec. V B for the derivation of a normalization property of Eq. (15) is not valid in general. During learning we expect weights to change according to their input. For the configuration of the input as introduced in Sec. V A this depends on whether synapses belong to groups M_1 or M_2. To show that even under the condition of interdependence of {J_i} and {λ_i^in} there is a normalization property of Eq. (15) similar to that derived in Sec. V B, we investigate the most extreme case in which the total mass of synaptic weight is, e.g., in M_2. Taking J_i = 0 for i ∈ M_1 into account, we replace N^{-1} Σ_{ij} J_j Q_ij in Eq. (20) by M_2^{-1} N J^av Q^av. The fixed point J_*^av is similar to that in Eq. (21) except for a multiplicative prefactor N/M_2 of order 1 preceding Q^av in Eq. (21),

    J_*^av = −k_1 / [N (k_2 + Q^av N/M_2)].    (C1)

Since N/M_2 > 1, k_1 > 0, and k_2 + Q^av N/M_2 < 0, J_*^av in Eq. (C1) is larger than J_*^av in Eq. (21), where we assumed independence of {J_i} and {λ_i^in}. Correlations between {J_i} and {λ_i^in} can be neglected, however, if we assume 0 < Q ≈ Q^av ≪ −k_2; cf. Eq. (23). In this case, J_*^av in Eqs. (21) and (C1) are almost identical and independent of Q^av.

APPENDIX D: NORMALIZATION AND WEIGHT CONSTRAINTS

Let us consider the influence of weight constraints (Sec. V B) on the position of the fixed point J_*^av in Eq. (21) and the time constant τ_av of normalization in Eq. (22). We call N_↓ and N_↑ the number of weights at the lower bound 0 and the upper bound ϑ > 0, respectively. By construction we have N_↓ + N_↑ ≤ N, where N is the number of synapses.

For example, if the average weight J^av approaches J_*^av from below, then only N − N_↑ weights can contribute to an increase of J^av. For the remaining N_↑ saturated synapses we have J̇_i = 0. Deriving from Eq. (15) an equation equivalent to Eq. (20), we obtain J̇^av = (1 − N_↑/N)(k_1 + N k_2 J^av + J^av Q^av/N). The fixed point J_*^av remains unchanged as compared to Eq. (21) but the time constant τ_av for an approach of J_*^av from below is increased by a factor (1 − N_↑/N)^{-1} ≥ 1 as compared to Eq. (22). Similarly, τ_av for an approach of J_*^av from above is increased by a factor (1 − N_↓/N)^{-1} ≥ 1.

The factor by which τ_av is increased is of order 1 if we use the upper bound ϑ = (1 + δ) J_*^av, where δ > 0 is of order 1. If J^av = J_*^av, at most N_↑ = N/(1 + δ) synapses can saturate at the upper bound comprising the total weight. The remaining N_↓ = N − N/(1 + δ) synapses are at the lower bound 0. The time constant τ_av is enhanced by at most 1 + 1/δ and 1 + δ for an approach of the fixed point from below and above, respectively.

APPENDIX E: RANDOM WALK OF SYNAPTIC WEIGHTS

We consider the random walk of a synaptic weight J_i(t) for t > t_0, where J_i(t_0) is some starting value. The time course of J_i(t) follows from Eq. (1),

    J_i(t) = J_i(t_0) + ∫_{t_0}^{t} dt′ [w^in S_i^in(t′) + w^out S^out(t′)]
             + ∫_{t_0}^{t} dt′ ∫_{t_0}^{t} dt″ W(t″ − t′) S_i^in(t″) S^out(t′).    (E1)

For a specific i, the spike trains S_i^in(t′) and S^out(t′) are now assumed to be statistically independent and generated by Poisson processes with constant rates ν^in for all i and ν^out, respectively; cf. Secs. II A and IV A. Here ν^in can be prescribed, whereas ν^out then follows; cf., for instance, Eq. (7). For large N the independence is an excellent approximation. The learning parameters w^in and w^out can be positive or negative. The learning window W is some quadratically integrable function with a width W as defined in Sec. II C. Finally, it may be beneficial to realize that spikes are described by δ functions.

The weight J_i(t) is a stepwise constant function of time; see Fig. 2 (bottom). According to Eq. (E1), an input spike arriving at synapse i at time t changes J_i at that time by a constant amount w^in and a variable amount ∫_{t_0}^{t} dt′ W(t − t′) S^out(t′), which depends on the sequence of output spikes in the interval [t_0, t]. Similarly, an output spike at time t results in a constant weight change w^out and a variable one that equals ∫_{t_0}^{t} dt″ W(t″ − t) S_i^in(t″). We obtain a random walk with independent steps but randomly variable step size. Suitable rescaling of this random walk leads to Brownian motion.

As in Sec. II B, we substitute s = t″ − t′ in the second line of Eq. (E1) and extend the integration over the new variable s so as to run from −∞ to ∞. This does not introduce a big error for t − t_0 ≫ W. The second line of Eq. (E1) then reduces to ∫ds W(s) ∫_{t_0}^{t} dt′ S_i^in(t′ + s) S^out(t′).

We denote ensemble averages by angular brackets ⟨ ⟩. The variance then reads

    var J_i(t) = ⟨J_i²⟩(t) − ⟨J_i⟩²(t).    (E2)

To simplify the ensuing argument, upper and lower bounds for each weight are not taken into account.

For the calculation of the variance in Eq. (E2), first of all we consider the term ⟨J_i⟩(t). We use the notation ⟨S_i^in⟩(t) = ν^in and ⟨S^out⟩(t) = ν^out because of constant input and output intensities. Stochastic independence of input and output leads to ⟨S_i^in(t′ + s) S^out(t′)⟩ = ν^in ν^out. Using Eq. (E1) and ∫ds W(s) = W̃(0) we then obtain

    ⟨J_i⟩(t) = J_i(t_0) + (t − t_0) [w^in ν^in + w^out ν^out + ν^in ν^out W̃(0)].    (E3)

Next, we consider the term ⟨J_i²⟩(t) in Eq. (E2). Using Eq. (E1) once again we obtain for t − t_0 ≫ W

    ⟨J_i²⟩(t) = −J_i(t_0)² + 2 J_i(t_0) ⟨J_i⟩(t) + ∫_{t_0}^{t} dt′ ∫_{t_0}^{t} du′
        × {⟨S_i^in(t′) S_i^in(u′)⟩ (w^in)²
           + ⟨S^out(t′) S^out(u′)⟩ (w^out)²
           + ⟨S_i^in(t′) S^out(u′)⟩ 2 w^in w^out
           + 2 ∫ds W(s) [⟨S_i^in(t′) S_i^in(u′ + s) S^out(u′)⟩ w^in
                         + ⟨S^out(t′) S_i^in(u′ + s) S^out(u′)⟩ w^out]
           + ∫ds ∫dv W(s) W(v) ⟨S_i^in(t′ + s) S_i^in(u′ + v) S^out(t′) S^out(u′)⟩}.    (E4)

Since input and output were assumed to be independent, we get

    ⟨S_i^in S_i^in S^out S^out⟩ = ⟨S_i^in S_i^in⟩ ⟨S^out S^out⟩,
    ⟨S_i^in S_i^in S^out⟩ = ⟨S_i^in S_i^in⟩ ⟨S^out⟩,
    ⟨S^out S^out S_i^in⟩ = ⟨S^out S^out⟩ ⟨S_i^in⟩.    (E5)

We note that ⟨S_i^in⟩ = ν^in for all i and ⟨S^out⟩ = ν^out. Input spikes at times t′ and u′ are independent as long as t′ ≠ u′. In this case we therefore have ⟨S_i^in(t′) S_i^in(u′)⟩ = ν^in ν^in. For arbitrary times t′ and u′ we find (cf. Appendix A)

    ⟨S_i^in(t′) S_i^in(u′)⟩ = ν^in [ν^in + δ(t′ − u′)].    (E6)

Similarly, for the correlation between output spike trains we obtain

    ⟨S^out(t′) S^out(u′)⟩ = ν^out [ν^out + δ(t′ − u′)].    (E7)

Using Eqs. (E5), (E6), and (E7) in Eq. (E4), performing the integrations, and inserting the outcome together with Eq. (E3) into Eq. (E2), we arrive at

    var J_i(t) = (t − t_0) D,    (E8)

where

    D = ν^in (w^in)² + ν^out (w^out)² + ν^in ν^out ∫ds W(s)²
        + ν^in ν^out W̃(0) [2 (w^in + w^out) + W̃(0) (ν^in + ν^out)],    (E9)

as announced.
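A direct Monte Carlo check of Eqs. (E8) and (E9) is straightforward. The sketch below generates many independent trials of a single weight driven by statistically independent Poisson input and output trains, applies the three weight-change terms of Eq. (E1) in discrete time bins, and compares the empirical variance at the end of the run with (t − t_0)D. The learning window used here is the one of Appendix B, sampled on the bin grid and truncated at ±100 ms; bounds on the weight are ignored, as in the derivation above. (The run takes on the order of a minute.)

import numpy as np

rng = np.random.default_rng(2)

# learning window of Eq. (B1), parameters of Appendix B
eta, tau_syn, tau_p, tau_m, A_p, A_m = 1e-5, 5e-3, 1e-3, 20e-3, 1.0, -1.0
tau_pt = tau_syn * tau_p / (tau_syn + tau_p)
tau_mt = tau_syn * tau_m / (tau_syn + tau_m)

def W(s):
    s = np.asarray(s, dtype=float)
    neg = eta * np.exp(s / tau_syn) * (A_p * (1 - s / tau_pt) + A_m * (1 - s / tau_mt))
    pos = eta * (A_p * np.exp(-s / tau_p) + A_m * np.exp(-s / tau_m))
    return np.where(s <= 0, neg, pos)

nu_in, nu_out = 10.0, 10.0
w_in, w_out   = 1e-5, -1.0475e-5
dt, T, L      = 1e-3, 50.0, 100         # bin size [s], run length [s], window support in bins
n_bins, n_tr  = int(T / dt), 300        # number of bins and independent trials

Wp = W(np.arange(1, L + 1) * dt)        # W(s) for s = t_in - t_out > 0
Wm = W(-np.arange(1, L + 1) * dt)       # W(s) for s < 0
W0 = float(W(0.0))

J  = np.zeros(n_tr)                     # one weight per trial, J_i(t_0) = 0
hx = np.zeros((L, n_tr))                # input spike history, hx[j] = bin (j+1) ago
hy = np.zeros((L, n_tr))                # output spike history

for k in range(n_bins):
    x = rng.poisson(nu_in * dt, size=n_tr)    # input spikes, independent of ...
    y = rng.poisson(nu_out * dt, size=n_tr)   # ... output spikes, by construction
    pair = x * (Wp @ hy) + y * (Wm @ hx) + x * y * W0   # all spike pairs within +-L bins
    J += w_in * x + w_out * y + pair
    hy = np.roll(hy, 1, axis=0); hy[0] = y
    hx = np.roll(hx, 1, axis=0); hx[0] = x

# diffusion constant of Eq. (E9), window integrals evaluated on the same grid
s_grid   = np.arange(-L, L + 1) * dt
W_tilde0 = W(s_grid).sum() * dt
int_W2   = (W(s_grid)**2).sum() * dt
D = (nu_in * w_in**2 + nu_out * w_out**2 + nu_in * nu_out * int_W2
     + nu_in * nu_out * W_tilde0 * (2 * (w_in + w_out) + W_tilde0 * (nu_in + nu_out)))

print(f"empirical var J_i(T) = {J.var(ddof=1):.3e}")
print(f"(T - t_0) * D        = {T * D:.3e}")

Within the statistical error of a few hundred trials, the empirical variance grows linearly with slope D, as stated in Eq. (E8).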

[1] D.O. Hebb, The Organization of Behavior (Wiley, New York, 1949).
[2] C. von der Malsburg, Kybernetik 14, 85 (1973).
[3] R. Linsker, Proc. Natl. Acad. Sci. USA 83, 7508 (1986).
[4] R. Linsker, Proc. Natl. Acad. Sci. USA 83, 8390 (1986).
[5] D.J.C. MacKay and K.D. Miller, Network 1, 257 (1990).
[6] K.D. Miller, Neural Comput. 2, 321 (1990).
[7] K.D. Miller and D.J.C. MacKay, Neural Comput. 6, 100 (1994).
[8] S. Wimbauer, O.G. Wenisch, K.D. Miller, and J.L. van Hemmen, Biol. Cybern. 77, 453 (1997).
[9] S. Wimbauer, O.G. Wenisch, J.L. van Hemmen, and K.D. Miller, Biol. Cybern. 77, 463 (1997).
[10] D.J. Willshaw and C. von der Malsburg, Proc. R. Soc. London, Ser. B 194, 431 (1976).
[11] K. Obermayer, G.G. Blasdel, and K. Schulten, Phys. Rev. A 45, 7568 (1992).
[12] T. Kohonen, Self-Organization and Associative Memory (Springer, Berlin, 1984).
[13] L.M. Optican and B.J. Richmond, J. Neurophysiol. 57, 162 (1987).
[14] R. Eckhorn et al., Biol. Cybern. 60, 121 (1988).
[15] C.M. Gray and W. Singer, Proc. Natl. Acad. Sci. USA 86, 1698 (1989).
[16] C.E. Carr and M. Konishi, J. Neurosci. 10, 3227 (1990).
[17] W. Bialek, F. Rieke, R.R. de Ruyter van Steveninck, and D. Warland, Science 252, 1855 (1991).
[18] A.K. Kreiter and W. Singer, Eur. J. Neurosci. 4, 369 (1992).
[19] M. Abeles, in Models of Neural Networks II, edited by E. Domany, J.L. van Hemmen, and K. Schulten (Springer, New York, 1994), pp. 121–140.
[20] W. Singer, in Models of Neural Networks II, edited by E. Domany, J.L. van Hemmen, and K. Schulten (Springer, New York, 1994), pp. 141–173.
[21] J.J. Hopfield, Nature (London) 376, 33 (1995).
[22] W. Gerstner, R. Kempter, J.L. van Hemmen, and H. Wagner, Nature (London) 383, 76 (1996).
[23] R.C. deCharms and M.M. Merzenich, Nature (London) 381, 610 (1996).
[24] M. Meister, Proc. Natl. Acad. Sci. USA 93, 609 (1996).
[25] H. Markram, J. Lübke, M. Frotscher, and B. Sakmann, Science 275, 213 (1997).
[26] L.I. Zhang et al., Nature (London) 395, 37 (1998).
[27] B. Gustafsson et al., J. Neurosci. 7, 774 (1987).
[28] T.V.P. Bliss and G.L. Collingridge, Nature (London) 361, 31 (1993).
[29] D. Debanne, B.H. Gähwiler, and S.M. Thompson, Proc. Natl. Acad. Sci. USA 91, 1148 (1994).
[30] T.H. Brown and S. Chattarji, in Models of Neural Networks II, edited by E. Domany, J.L. van Hemmen, and K. Schulten (Springer, New York, 1994), pp. 287–314.
[31] C.C. Bell, V.Z. Han, Y. Sugawara, and K. Grant, Nature (London) 387, 278 (1997).
[32] V. Lev-Ram et al., Neuron 18, 1025 (1997).

[33] H.J. Koester and B. Sakmann, Proc. Natl. Acad. Sci. USA 95, 9596 (1998).
[34] L.F. Abbott and K.I. Blum, Cereb. Cortex 6, 406 (1996).
[35] A.V.M. Herz, B. Sulzer, R. Kühn, and J.L. van Hemmen, Europhys. Lett. 7, 663 (1988).
[36] A.V.M. Herz, B. Sulzer, R. Kühn, and J.L. van Hemmen, Biol. Cybern. 60, 457 (1989).
[37] J.L. van Hemmen et al., in Konnektionismus in Artificial Intelligence und Kognitionsforschung, edited by G. Dorffner (Springer, Berlin, 1990), pp. 153–162.
[38] W. Gerstner, R. Ritz, and J.L. van Hemmen, Biol. Cybern. 69, 503 (1993).
[39] R. Kempter, W. Gerstner, J.L. van Hemmen, and H. Wagner, in Advances in Neural Information Processing Systems 8, edited by D. S. Touretzky, M. C. Mozer, and M. E. Hasselmo (MIT Press, Cambridge, MA, 1996), pp. 124–130.
[40] P. Häfliger, M. Mahowald, and L. Watts, in Advances in Neural Information Processing Systems 9, edited by M. C. Mozer, M. I. Jordan, and T. Petsche (MIT Press, Cambridge, MA, 1997), pp. 692–698.
[41] B. Ruf and M. Schmitt, in Proceedings of the 7th International Conference on Artificial Neural Networks (ICANN'97), edited by W. Gerstner, A. Germond, M. Hasler, and J.-D. Nicoud (Springer, Heidelberg, 1997), pp. 361–366.
[42] W. Senn, M. Tsodyks, and H. Markram, in Proceedings of the 7th International Conference on Artificial Neural Networks (ICANN'97), edited by W. Gerstner, A. Germond, M. Hasler, and J.-D. Nicoud (Springer, Heidelberg, 1997), pp. 121–126.
[43] R. Kempter, W. Gerstner, and J.L. van Hemmen, in Advances in Neural Information Processing Systems 11, edited by M. Kearns, S. A. Solla, and D. Cohn (MIT Press, Cambridge, MA, in press).
[44] J. W. Lamperti, Probability, 2nd ed. (Wiley, New York, 1996).
[45] J.A. Sanders and F. Verhulst, Averaging Methods in Nonlinear Dynamical Systems (Springer, New York, 1985).
[46] R. Kempter, W. Gerstner, J.L. van Hemmen, and H. Wagner, Neural Comput. 10, 1987 (1998).
[47] T.J. Sejnowski and G. Tesauro, in Neural Models of Plasticity: Experimental and Theoretical Approaches, edited by J. H. Byrne and W.O. Berry (Academic Press, San Diego, 1989), Chap. 6, pp. 94–103.
[48] R. Ritz and T.J. Sejnowski, Curr. Opin. Neurobiol. 7, 536 (1997).
[49] M. Meister, L. Lagnado, and D.A. Baylor, Science 270, 1207 (1995).
[50] M.J. Berry, D.K. Warland, and M. Meister, Proc. Natl. Acad. Sci. USA 94, 5411 (1997).
[51] W.E. Sullivan and M. Konishi, J. Neurosci. 4, 1787 (1984).
[52] C.E. Carr, Annu. Rev. Neurosci. 16, 223 (1993).
[53] C. Köppl, J. Neurosci. 17, 3312 (1997).
[54] R. Kempter, Hebbsches Lernen zeitlicher Codierung: Theorie der Schallortung im Hörsystem der Schleiereule, Naturwissenschaftliche Reihe, Vol. 17 (DDD, Darmstadt, 1997).
[55] L. Wiskott and T. Sejnowski, Neural Comput. 10, 671 (1998).
[56] P.A. Salin, R.C. Malenka, and R.A. Nicoll, Neuron 16, 797 (1996).
[57] W. Gerstner, R. Kempter, J.L. van Hemmen, and H. Wagner, in Pulsed Neural Nets, edited by W. Maass and C.M. Bishop (MIT Press, Cambridge, MA, 1998), Chap. 14, pp. 353–377.
[58] A. Zador, C. Koch, and T.H. Brown, Proc. Natl. Acad. Sci. USA 87, 6718 (1990).
[59] C.W. Eurich, J.D. Cowan, and J.G. Milton, in Proceedings of the 7th International Conference on Artificial Neural Networks (ICANN'97), edited by W. Gerstner, A. Germond, M. Hasler, and J.-D. Nicoud (Springer, Heidelberg, 1997), pp. 157–162.
[60] C.W. Eurich, U. Ernst, and K. Pawelzik, in Proceedings of the 8th International Conference on Artificial Neural Networks (ICANN'98), edited by L. Niklasson, M. Boden, and T. Ziemke (Springer, Berlin, 1998), pp. 355–360.
[61] H. Hüning, H. Glünder, and G. Palm, Neural Comput. 10, 555 (1998).
[62] N.G. van Kampen, Stochastic Processes in Physics and Chemistry (North-Holland, Amsterdam, 1992).
[63] H. Ritter, T. Martinez, and K. Schulten, Neural Computation and Self-Organizing Maps: An Introduction (Addison-Wesley, Reading, 1992).
[64] C. van Vreeswijk and H. Sompolinsky, Science 274, 1724 (1996).
[65] W. Softky and C. Koch, J. Neurosci. 13, 334 (1993).
[66] F. Rieke, D. Warland, R. de Ruyter van Steveninck, and W. Bialek, Spikes: Exploring the Neural Code (MIT Press, Cambridge, MA, 1997).
[67] M.N. Shadlen and W.T. Newsome, Curr. Opin. Neurobiol. 4, 569 (1994).
[68] Ö. Bernander, R.J. Douglas, K.A.C. Martin, and C. Koch, Proc. Natl. Acad. Sci. USA 88, 11 569 (1991).
[69] M. Rapp, Y. Yarom, and I. Segev, Neural Comput. 4, 518 (1992).
