Module 4 Notes
• The strength of the EPSP increases in response to the stimulus train, and this change can last for anything from hours to weeks.
• This facilitation of the EPSP is referred to as long-term potentiation (LTP).
• Interestingly, CA1 LTP displays three important properties: cooperativity, associativity, and specificity.
• Cooperativity implies that CA1 LTP cannot be produced by activating only one fibre: there is a minimum number of fibres that must be activated together to achieve LTP.
• This cooperativity has an associative feature: when both strong and weak excitatory inputs arrive in
the same dendritic region of a pyramidal cell, the weak input gets potentiated if it is activated in
association with the strong one. Finally, LTP is specific to the dendrites where it is produced.
• Amongst various other observations, the most important point to note is that LTP requires
depolarization of the postsynaptic neuron, and that this depolarization must coincide with the
stimulus from the presynaptic neuron.
• "When an axon of cell A... excite[s] cell B and repeatedly or persistently takes part in firing it, some
growth process or metabolic change takes place in one or both cells so that A 's efficacy as one of the
cells’ firing B is increased."
• The non-NMDA receptor channel is transmitter gated: the channel opens when the glutamate neurotransmitter binds to the designated site on the channel.
• The opening of the channel allows the flow of sodium ions into, and potassium ions out of the
postsynaptic dendrite.
• This eventually leads to a depolarization of the postsynaptic dendritic region.
• On the other hand, the NMDA receptor channel is a uniquely double-gated channel: it is usually blocked by $Mg^{++}$ ions, and so initially the channel remains blocked even when glutamate binds to it. This is shown in Fig. 4.2(a).
• Magnesium ions decouple from the channel only if the postsynaptic cell is sufficiently depolarized.
• Therefore, an NMDA channel can become unblocked only when
o glutamate binds to the receptor, and
o the postsynaptic cell is sufficiently depolarized by strong cooperative inputs from the
presynaptic neurons.
• Depolarization is achieved only when many non-NMDA receptors open in response to the simultaneous firing of many presynaptic inputs, since opening these receptors allows an influx of $Na^+$ into the postsynaptic cell.
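To make the coincidence requirement concrete, here is a tiny illustrative sketch (ours, not from the notes) of the NMDA channel as a molecular AND gate; the function name and threshold value are hypothetical:

```python
# Illustrative sketch only: the NMDA channel as a molecular AND gate.
# The unblock threshold is a hypothetical stand-in value.

def nmda_conducts(glutamate_bound: bool, v_post_mv: float,
                  unblock_threshold_mv: float = -50.0) -> bool:
    """The channel conducts only when BOTH gates are satisfied:
    glutamate is bound AND the postsynaptic membrane is depolarized
    enough to expel the Mg++ block."""
    mg_block_removed = v_post_mv > unblock_threshold_mv
    return glutamate_bound and mg_block_removed

# Presynaptic glutamate alone is not enough (Mg++ still blocks) ...
assert not nmda_conducts(glutamate_bound=True, v_post_mv=-70.0)
# ... nor is depolarization alone (no transmitter bound) ...
assert not nmda_conducts(glutamate_bound=False, v_post_mv=-30.0)
# ... only the coincidence of the two opens the channel.
assert nmda_conducts(glutamate_bound=True, v_post_mv=-30.0)
```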
Fig 4.2: Structure of the NMDA synapse and the molecular mechanism of LTP
Figure 4.3: A matrix associative memory encodes associations into a connection matrix
• Whether the system is auto associative or hetero-associative, the memory of the vectors that are to be
associated is stored in the connections of the network.
• Associative memory models enjoy properties such as fault tolerance and, when they are auto-associative, the ability to solve the pattern completion problem.
• Figure 4.4 shows two layers of neurons, one which receives input vectors and one which generates
output vectors.
where we assume that the signal vector A is generated using a non-linear function of activations in the input layer. In this case the memory is non-linear.
4.3.2. Hebb Association Matrix
• The Hebb postulate states that the change in synaptic efficacy is proportional to the conjunction
(or product) of pre and postsynaptic neuron signals.
• To formalize this idea, consider two layers 𝐿𝑥 and 𝐿𝑦 of n and m neurons respectively such that
neurons from 𝐿𝑥 connect to 𝐿𝑦 using a weight matrix W.
• If we assume that the weight matrix is constructed using Hebbian learning, it generates weight
values in accordance with the product of the presynaptic and postsynaptic activities of a neuron
as follows:
$$\delta w_{ij} = \eta\, a_i b_j \qquad (4.3)$$
for some learning rate η, and signal vectors $A = (a_1, \ldots, a_n)^T$ and $B = (b_1, \ldots, b_m)^T$ across $L_x$ and $L_y$, respectively. This rule is often referred to as the generalized Hebb rule.
• If we assume the learning rate $\eta = 1$, Hebbian learning essentially reduces to taking the outer products of signal vectors.
• For example, if we start out with a zero association matrix, then a single association between two vectors A and B is encoded as:
$$W = \begin{pmatrix} a_1 b_1 & \cdots & a_1 b_m \\ \vdots & \ddots & \vdots \\ a_n b_1 & \cdots & a_n b_m \end{pmatrix} = A B^T \qquad (4.4)$$
• The Hebb conjunctional learning rule should be clear from the above matrix: the weight 𝑤𝑖𝑗 that
connects neuron i of 𝐿𝑥 to neuron j of 𝐿𝑦 is the product of the respective signals 𝑎𝑖 and 𝑏𝑗 of both
the neurons. The association (A, B) is thus encoded into the matrix.
What if more than one association needs to be encoded? The natural course to follow is to superpose the individual association matrices. Assuming Q associations $\{A_k, B_k\}_{k=1}^{Q}$ are to be encoded, the resulting connection matrix is then
$$W = \sum_{k=1}^{Q} A_k B_k^T \qquad (4.6)$$
Suppose the vectors $A_k$ are orthonormal, that is, orthogonal to one another and normalized to unit magnitude. In such a case, when a vector $A_i$ is presented to neurons in layer $L_x$, a forward pass through the network yields a vector B′:
$$B'^T = A_i^T W \qquad (4.7)$$
Expanding W with Eq. (4.6) and using orthonormality ($A_i^T A_k = 1$ if $k = i$, and 0 otherwise) gives $B'^T = \sum_{k=1}^{Q} (A_i^T A_k)\, B_k^T = B_i^T$, so the stored association is recalled exactly.
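The encode-and-recall behaviour of Eqs. (4.6) and (4.7) is easy to check numerically. The following is a minimal NumPy sketch (our illustration; the dimensions and seed are arbitrary) that builds W from outer products of orthonormal keys and confirms perfect recall:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, Q = 8, 5, 3

# Orthonormal key vectors A_k (via QR decomposition of a random matrix),
# and arbitrary associated vectors B_k.
A = np.linalg.qr(rng.standard_normal((n, Q)))[0]   # columns A_1..A_Q
B = rng.standard_normal((m, Q))                    # columns B_1..B_Q

# Eq. (4.6): superpose the outer products A_k B_k^T into one matrix.
W = sum(np.outer(A[:, k], B[:, k]) for k in range(Q))   # shape (n, m)

# Eq. (4.7): a forward pass with key A_i recovers B_i exactly,
# because the cross-talk terms A_i^T A_k (k != i) vanish.
for i in range(Q):
    B_recalled = A[:, i] @ W
    assert np.allclose(B_recalled, B[:, i])
print("perfect recall for all", Q, "orthonormal keys")
```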
Layer F comprises n neurons which receive external inputs $I_i$, $i = 1, \ldots, n$. Each neuron feeds back signals $\mathcal{S}_i(x_i)$ through weights $w_{ij}$, $i = 1, \ldots, n$; $j = 1, \ldots, n$. The simplest form of dynamics is the additive activation dynamics model given by:
$$\dot{x}_i = -A_i x_i + \sum_{j=1}^{n} w_{ij}\, \mathcal{S}_j(x_j) + I_i, \qquad i = 1, \ldots, n \qquad (4.12)$$
where we assume that individual neurons can have distinct sigmoidal characteristics
$$\mathcal{S}_i(x) = \frac{1 - e^{-\lambda_i x}}{1 + e^{-\lambda_i x}} \qquad (4.13)$$
with an inverse function
$$\mathcal{S}_i^{-1}(s) = \frac{1}{\lambda_i} \ln\!\left(\frac{1+s}{1-s}\right) = \frac{1}{\lambda_i}\, \mathcal{S}^{-1}(s) \qquad (4.14)$$
where $s = \mathcal{S}(x) = \frac{1 - e^{-x}}{1 + e^{-x}}$. Usually, the gain scale factor $\lambda_i$ is common to all neurons.
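As an illustration of Eqs. (4.12) and (4.13), the following sketch (ours; the parameter choices are arbitrary) integrates the additive dynamics with a simple forward-Euler step and shows the activations settling to an equilibrium:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
lam = np.full(n, 2.0)                 # gain scale factors lambda_i
A_decay = np.ones(n)                  # passive decay terms A_i
I_ext = rng.standard_normal(n)        # external inputs I_i
W = rng.standard_normal((n, n))
W = 0.5 * (W + W.T)                   # symmetric weights, as assumed later
np.fill_diagonal(W, 0.0)              # no self-feedback

def S(x):
    """Bipolar sigmoid of Eq. (4.13)."""
    return (1.0 - np.exp(-lam * x)) / (1.0 + np.exp(-lam * x))

# Forward-Euler integration of Eq. (4.12): xdot = -A x + W S(x) + I.
x = rng.standard_normal(n)
dt = 0.01
for _ in range(5000):
    x = x + dt * (-A_decay * x + W @ S(x) + I_ext)

print("equilibrium activations:", np.round(x, 3))
print("residual |xdot|:",
      np.linalg.norm(-A_decay * x + W @ S(x) + I_ext))
```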
Assuming the ith input and output amplifier voltages to be 𝑥𝑖 and 𝑆𝑖 (𝑥𝑖 ) (where S(.) is the amplifier transfer
function) respectively, the Kirchhoff current law equation at the input node of the ith amplifier can be
written as:
-------------------(4.15)
Rearrangement yields
-------------(4.16)
------------------(4.17)
Where,
--------------------(4.18)
-----------------(4.19)
where $a_i(x_i) > 0$, $C_{ij} = C_{ji}$, and $d_j'(x_j) \geq 0$.
These models also admit the Lyapunov function
----------------(4.20)
and are globally asymptotically stable.
Rearranging equation (4.12) yields:
$$\dot{x}_i = \left(-A_i x_i + I_i\right) - \sum_{j=1}^{n} \left(-w_{ij}\right) \mathcal{S}_j(x_j) \qquad (4.21)$$
which has the Cohen-Grossberg form with the following substitutions:
$$a_i(x_i) = 1 \qquad (4.22)$$
$$b_i(x_i) = -A_i x_i + I_i, \qquad C_{ij} = -w_{ij}, \qquad d_j(x_j) = \mathcal{S}_j(x_j)$$
$$E = -\frac{1}{2} \sum_{j=1}^{n} \sum_{k=1}^{n} w_{jk}\, s_j s_k + \sum_{i=1}^{n} A_i \int_0^{s_i} \mathcal{S}_i^{-1}(\beta_i)\, d\beta_i - \sum_{i=1}^{n} I_i s_i \qquad (4.27)$$
-------------(4.28)
$$\dot{E} = \sum_{i=1}^{n} \frac{\partial E}{\partial s_i}\, \dot{s}_i \qquad (4.29)$$
From Eqs. (4.27) and (4.12), and noting that $x_i = \mathcal{S}_i^{-1}(s_i)$,
$$\frac{\partial E}{\partial s_i} = A_i\, \mathcal{S}_i^{-1}(s_i) - \sum_{j=1}^{n} w_{ij}\, s_j - I_i \qquad (4.30)$$
$$= -\dot{x}_i \qquad (4.31)$$
Substituting (4.31) in (4.29) yields
$$\dot{E} = -\sum_{i=1}^{n} \dot{x}_i\, \dot{s}_i \qquad (4.32)$$
The product 𝑥̇ 𝑖 𝑠̇𝑖 is always positive since both the activation and signal time derivatives have the same sign.
The positivity of this product also follows from the monotonicity of both signal function and its inverse.
$$\dot{E} = -\sum_{i=1}^{n} \frac{d}{dt}\left(\mathcal{S}_i^{-1}(s_i)\right) \dot{s}_i \qquad (4.33)$$
$$= -\sum_{i=1}^{n} \left(\mathcal{S}_i^{-1}\right)'(s_i)\, \dot{s}_i^{2} \qquad (4.34)$$
$$< 0 \qquad (4.35)$$
since $(\mathcal{S}_i^{-1})'(s_i) > 0$ and $\dot{s}_i^2 > 0$. Note that $\dot{E} = 0$ iff $\dot{s}_i = 0$ for all i: the time derivatives of the signals go to zero, i.e., the signals cease to evolve further in time, exactly when the time derivative of the energy goes to zero. The signal state trajectory must therefore eventually come to rest. This establishes the stability of the system in continuous time.
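This energy-descent argument can be checked numerically. The sketch below (our illustration, assuming a small Euler step and symmetric weights with no self-feedback) evaluates the energy of Eq. (4.27) along a simulated trajectory, using the closed form of the integral of $\mathcal{S}_i^{-1}$, and confirms that it never increases:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6
lam = np.full(n, 2.0)
A_decay = np.ones(n)
I_ext = rng.standard_normal(n)
W = rng.standard_normal((n, n)); W = 0.5 * (W + W.T)
np.fill_diagonal(W, 0.0)

S = lambda x: (1 - np.exp(-lam * x)) / (1 + np.exp(-lam * x))

def energy(s):
    """Energy of Eq. (4.27). The integral of S_i^{-1} has the closed
    form (1/lam_i) * [(1+s)ln(1+s) + (1-s)ln(1-s)]."""
    integral = ((1 + s) * np.log1p(s) + (1 - s) * np.log1p(-s)) / lam
    return -0.5 * s @ W @ s + A_decay @ integral - I_ext @ s

x = 0.1 * rng.standard_normal(n)
dt, E_prev = 0.005, np.inf
for _ in range(5000):
    x = x + dt * (-A_decay * x + W @ S(x) + I_ext)
    E = energy(S(x))
    assert E <= E_prev + 1e-6   # energy never increases along the path
    E_prev = E
print("final energy:", E_prev)
```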
The binary threshold signal function is actually a limiting case of the sigmoidal function as $\lambda_i \to \infty$. When the neuron gain scale factor $\lambda_i \to \infty$, we say the network operates in the high-gain limit. If we allow $\lambda_i \to \infty$, the energy function of Eq. (4.28) reduces to
$$E = -\frac{1}{2} \sum_{j=1}^{n} \sum_{k=1}^{n} w_{jk}\, s_j s_k - \sum_{i=1}^{n} I_i s_i \qquad (4.36)$$
In the absence of external inputs the energy function of Eq. (4.36) reduces to the familiar quadratic form
$$E = -\frac{1}{2} \sum_{j=1}^{n} \sum_{k=1}^{n} w_{jk}\, s_j s_k \qquad (4.37)$$
In this high-gain limit, a neuron switches its state from -1 to 1 or from 1 to -1 at discrete intervals of time by sampling its current activation. At the kth instant of time the neuron updates its activation and signal to instant k+1 in accordance with the difference equations:
$$x_i^{k+1} = \sum_{j=1}^{n} w_{ij}\, s_j^k + I_i, \qquad s_i^{k+1} = \begin{cases} +1 & x_i^{k+1} > 0 \\ s_i^k & x_i^{k+1} = 0 \\ -1 & x_i^{k+1} < 0 \end{cases} \qquad (4.38)$$
i = 1,... ,n where the subscript i on the signal function has been dropped since all neurons operate in the high
gain limit. Note that positive and negative activations translate unambiguously to 1 and -1 whereas the
neuron does not change state if its activation is zero.
At time instant k we assume the energy of the system to be 𝐸𝑘 :
$$E_k = -\frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij}\, s_i^k s_j^k - \sum_{i=1}^{n} I_i s_i^k \qquad (4.39)$$
Neuron state changes at time instant k will yield a new system state vector and consequently a new system
energy 𝐸𝑘+1 :
$$E_{k+1} = -\frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij}\, s_i^{k+1} s_j^{k+1} - \sum_{i=1}^{n} I_i s_i^{k+1} \qquad (4.40)$$
The change in energy Δ𝐸 is then defined as
$$\Delta E = E_{k+1} - E_k \qquad (4.41)$$
Without any loss of generality, we may assume that only one neuron, say the Ith, changes state. Note that in the double summation of Eq. (4.40) there are two sets of terms that involve index I: one where i = I and the other where j = I. For the moment we are only interested in the energy $E_I^k$ comprising terms that involve index I:
$$E_I^k = -\frac{1}{2} \sum_{j=1}^{n} w_{Ij}\, s_I^k s_j^k - \frac{1}{2} \sum_{i=1}^{n} w_{iI}\, s_i^k s_I^k - I_I s_I^k \qquad (4.42)$$
Replacing the index i by j in the second summation term and noting that 𝑤𝐼𝑗 = 𝑤𝑗𝐼 since network
connections are symmetric, yields
$$E_I^k = -\sum_{j=1}^{n} w_{Ij}\, s_I^k s_j^k - I_I s_I^k = -s_I^k \left( \sum_{j=1}^{n} w_{Ij}\, s_j^k + I_I \right) \qquad (4.43)$$
Since the change in energy $\Delta E_k$ involves only the Ith neuron, we have:
$$\Delta E_k = E_I^{k+1} - E_I^k = -\Delta s_I^{k+1} \left( \sum_{j=1}^{n} w_{Ij}\, s_j^k + I_I \right) = -\Delta s_I^{k+1}\, x_I^{k+1} \qquad (4.44)$$
where $x_I^{k+1}$ is the activation of neuron I at time instant k+1, and $\Delta s_I^{k+1} = s_I^{k+1} - s_I^k$.
Fig 4.7: Three 12 x 12-pixel patterns encoded into a Hopfield network
Recall that bipolar encoding of these patterns also encodes their complements. This means that there are going to be three complement-state spurious attractors amongst the other mixture states. We do, however, expect that partially distorted patterns will be cleaned up to their original form. In other words, as long as the distortion level is such that the initial point lies within the basin of attraction of the encoded memory, image clean-up and proper recall are guaranteed.
Figure 4.8 portrays a 40 per cent distorted version of the original plane image. This means that a random sample of 58 of the original 144 pixels of the plane image has been inverted. For the purposes of this simulation, we employed pure asynchronous update. We only have to ensure that each neuron in the network is updated the same number of times on average. In the present simulation, the order of update of the neurons was randomized, but it was ensured that each neuron was updated at least once during a field update cycle involving all 144 neurons.
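A compact simulation of this experiment can be written as follows. This sketch is ours: random bipolar patterns stand in for the three 12 x 12 images, but the Hebbian encoding, the 58-pixel distortion, and the randomized asynchronous update cycle follow the description above:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 144                                  # 12 x 12 bipolar pixels

# Three random bipolar "patterns" stand in for the plane/ship/gate images.
patterns = rng.choice([-1, 1], size=(3, n))

# Hebbian encoding with zero diagonal (no self-feedback).
W = sum(np.outer(p, p) for p in patterns).astype(float)
np.fill_diagonal(W, 0.0)

def recall(s, n_sweeps=10):
    """Pure asynchronous update: each field update cycle visits every
    neuron once, in randomized order, per Eq. (4.38) with zero input."""
    s = s.copy()
    for _ in range(n_sweeps):
        for i in rng.permutation(n):
            x = W[i] @ s
            if x != 0:                   # hold state if activation is zero
                s[i] = 1 if x > 0 else -1
    return s

# Distort 40% of the pixels (58 of 144) of pattern 0, then clean it up.
probe = patterns[0].copy()
flip = rng.choice(n, size=58, replace=False)
probe[flip] *= -1
out = recall(probe)
print("recovered pattern 0:",
      np.array_equal(out, patterns[0])
      or np.array_equal(out, -patterns[0]))  # complement is also an attractor
```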
Fig 4.9: Labelled regions of a slotted wedge in a particular view of the object
Labelling involves calculating the area of each region and tracing the boundary to determine corner points.
The area is a local feature of each region. Once each region has been labelled, a global relational feature is
computed. As shown in Figure 4.9 for region 2, this global feature is the distance from the centroid of region
2 to the centroids of all other regions. Both the area and the distances are suitably normalized.
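A rough sketch of this feature computation (ours, using scipy.ndimage on a made-up binary image in place of the wedge views) might look as follows:

```python
import numpy as np
from scipy import ndimage

# Hypothetical binary mask standing in for one view of the object.
img = np.zeros((64, 64), dtype=int)
img[10:30, 10:40] = 1            # region-like blobs
img[40:55, 20:30] = 1
img[35:45, 45:60] = 1

labels, n_regions = ndimage.label(img)          # label connected regions
idx = range(1, n_regions + 1)

# Local feature: area (pixel count) of each region.
areas = ndimage.sum(img, labels, index=idx)

# Global relational feature: distances from one region's centroid
# to the centroids of all the other regions.
centroids = np.array(ndimage.center_of_mass(img, labels, index=idx))
d = np.linalg.norm(centroids - centroids[0], axis=1)  # from region 1

# Normalize both features so they are comparable across views.
areas_n = areas / areas.sum()
d_n = d / (d.max() if d.max() > 0 else 1.0)
print("normalized areas:", np.round(areas_n, 3))
print("normalized centroid distances from region 1:", np.round(d_n, 3))
```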
Activation and Signal Function: Assume that the network has an activation vector $X = (x_1^k, \ldots, x_n^k)^T$ and a corresponding signal vector $S = (s_1^k, \ldots, s_n^k)^T$ at time instant k. Then BSB neurons update their signals in accordance with the following pair of update equations.
$$x_i^{k+1} = \gamma\, x_i^k + \alpha \sum_{j=1}^{n} w_{ij}\, s_j^k + \delta\, s_i^k, \qquad s_i^{k+1} = S\!\left(x_i^{k+1}\right) \qquad (4.45)$$
where γ, α and δ are constants that control the nature of the activation update equation. If γ = α = 1 and δ = 0, Eq. (4.45) reduces to
$$x_i^{k+1} = x_i^k + \sum_{j=1}^{n} w_{ij}\, s_j^k, \qquad s_i^{k+1} = S\!\left(x_i^{k+1}\right) \qquad (4.46)$$
The signal function S(.) in Eq. (4.46) is given by a bipolar version of the linear threshold signal function:
$$S(x) = \begin{cases} +1 & x \geq 1 \\ x & -1 < x < 1 \\ -1 & x \leq -1 \end{cases} \qquad (4.47)$$
Weight Matrix: As in the case of the Hopfield network, the connection matrix is symmetric, which implies
the existence of n mutually orthogonal eigenvectors 𝜂1 … … . 𝜂𝑛 . Then the matrix W is completely
determined by its eigenvalues and eigenvectors. If we start out with a zero-connection matrix, and assume
that Q orthonormal input vectors 𝐴𝑖 are presented, such that each 𝐴𝑖 appears 𝜆𝑖 times, then in accordance
with the Hebb encoding scheme the resulting connection matrix is
$$W = \sum_{i=1}^{Q} \lambda_i\, A_i A_i^T \qquad (4.48)$$
Note that each 𝐴𝑖 is an eigenvector of W with eigenvalue 𝜆𝑖 since
$$W A_i = \sum_{j=1}^{Q} \lambda_j\, A_j A_j^T A_i = \lambda_i A_i \qquad (4.49)$$
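The eigen-structure claimed in Eqs. (4.48) and (4.49) can be verified directly; the following NumPy sketch (ours; sizes and presentation counts are arbitrary) builds W and checks that each stored vector is an eigenvector:

```python
import numpy as np

rng = np.random.default_rng(4)
n, Q = 10, 4

# Q orthonormal vectors A_i (columns), each assumed presented lam_i times.
A = np.linalg.qr(rng.standard_normal((n, Q)))[0]
lam = np.array([5.0, 3.0, 2.0, 1.0])    # presentation counts / eigenvalues

# Eq. (4.48): Hebbian encoding piles up lam_i copies of each outer product.
W = sum(lam[i] * np.outer(A[:, i], A[:, i]) for i in range(Q))

# Eq. (4.49): each A_i is an eigenvector of W with eigenvalue lam_i.
for i in range(Q):
    assert np.allclose(W @ A[:, i], lam[i] * A[:, i])
print("each stored vector is an eigenvector; its eigenvalue is its count")
```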
Operational Summary
-----------------------------(4.50)
where 𝛿𝑗𝑖 is the Kronecker delta function, and we have defined
------------------------------(4.51)
Although equation 4.50 is specified in discrete time, one can easily rewrite it in continuous form as
-------------------------(4.52)
Now, consider a change of variables. Make the substitution:
------------------------------------(4.53)
in which case Eq. (4.52) modifies to
------------------------(4.54)
A Lyapunov function for the BSB model is now straightforward to derive. All we need to do is make the appropriate substitutions into the Lyapunov function of the standard Cohen-Grossberg form. This yields:
------------------(4.55)
Using the definitions from Eqs. (4.47), (4.51), and (4.53), Eq. (4.45) can be written in terms of the original variables $s_i$ as
--------------------(4.56)
The simulated annealing method for finding an optimal configuration of neuron states, given a set of weights, is based on the physical annealing metaphor. It involves the following basic steps:
1. Randomize neuron states once in the beginning and initialize the temperature to a high value.
2. Choose a neuron I randomly from the network.
3. Compute energy 𝐸𝐴 of the present configuration A.
4. Flip the state of neuron I to generate a new configuration B.
5. Compute the energy 𝐸𝐵 of configuration B.
6. If $E_B < E_A$ then accept the state change of neuron I; otherwise accept the state change for neuron I with probability $e^{-\Delta E^{(1)}/T}$, where $\Delta E^{(1)} = E_B - E_A$.
7. Continue selecting and testing neurons randomly, setting their states several times in this way, until thermal equilibrium is reached.
8. Finally, lower the temperature and repeat the procedure.
From an implementation point of view, note that the transition probability is dependent on ΔE. Therefore,
only those neurons need to be considered which are directly connected to the neuron I under consideration.
The change in energy $\Delta E^{(1)}$ which occurs when neuron I flips its state from $s_I$ to $-s_I$ is
$$\Delta E^{(1)} = 2\, s_I \sum_{j=1}^{n} w_{Ij}\, s_j \qquad (4.57)$$
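Putting steps 1-8 together, a minimal simulated-annealing sketch (ours; the temperature schedule and trial counts are arbitrary choices) that exploits the local energy change of Eq. (4.57) is:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 32
W = rng.standard_normal((n, n)); W = 0.5 * (W + W.T)
np.fill_diagonal(W, 0.0)

def energy(s):
    return -0.5 * s @ W @ s

s = rng.choice([-1, 1], size=n)          # 1. randomize states once
T = 10.0                                 #    ... and start hot
while T > 0.05:                          # 8. lower temperature and repeat
    for _ in range(20 * n):              # 7. many trials per temperature
        I = rng.integers(n)              # 2. pick a neuron at random
        dE = 2 * s[I] * (W[I] @ s)       # 3-5. energy change via Eq. (4.57)
        # 6. accept downhill moves always, uphill with prob e^{-dE/T}
        if dE < 0 or rng.random() < np.exp(-dE / T):
            s[I] = -s[I]
    T *= 0.9
print("annealed energy:", energy(s))
```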
• The model also introduces a powerful stochastic learning algorithm in place of the simple Hebbian
rule employed in Hopfield networks.
• Relaxation in Boltzmann machines is done using the simulated annealing procedure.
• The literature on Boltzmann machines differentiates between a completion network and an input-
output network (similar to the standard feedforward neural network architecture).
• These architectures are portrayed in Fig. 4.11. Note that there are hidden neurons that do not receive any external input. In addition, the connections are all bidirectional. Architectural differences from the Hopfield network and a standard feedforward network should be carefully observed.
• In a completion network architecture (Fig. 4.11(a)), a partially specified vector is expected to be completed (or corrected) by clamping (holding fixed) the neuron signals at the corresponding known values, and allowing the remaining unknown values to be determined through a network relaxation process that is based on simulated annealing. In an input-output network (Fig. 4.11(b)), the inputs are clamped to a particular value and the outputs are determined through the relaxation process.
Let us summarize the similarities and differences between the Boltzmann machine and the Hopfield
network. First the similarities. In both the models
1. neuron states are bipolar,
2. weights are symmetric,
3. neurons are selected at random for asynchronous update, and
4. there is no self-feedback.
----------------------------(4.58)
Neurons in a Boltzmann machine flip their states based on Glauber dynamics. Specifically, for the Ith neuron with signal $s_I$ and activation $x_I$,
$$s_I^{k+1} = \begin{cases} +1 & \text{with probability } P(x_I) \\ -1 & \text{with probability } 1 - P(x_I) \end{cases}, \qquad P(x_I) = \frac{1}{1 + e^{-2 x_I / T}} \qquad (4.59)$$
Alternatively, Eq. (4.59) can be recast into the form of a flip probability. For this, we expand Eq. (4.59) and observe the signal flip probabilities for neuron I:
$$P(s_I: +1 \to -1) = 1 - P(x_I), \qquad P(s_I: -1 \to +1) = P(x_I) \qquad (4.60)$$
which we conveniently combine using the neuron signal value before the flip, and the energy change $\Delta E_I$ that results from the flip:
$$P(s_I \to -s_I) = \frac{1}{1 + e^{\Delta E_I / T}}, \qquad \Delta E_I = 2\, s_I x_I \qquad (4.61)$$
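A single Glauber update is then only a few lines of code. This sketch is ours and assumes the flip-probability form above; note the contrast with the Metropolis-style acceptance used in the simulated annealing procedure, since here even downhill flips are accepted only probabilistically:

```python
import numpy as np

rng = np.random.default_rng(6)

def glauber_step(s, W, T, rng):
    """One Glauber update per Eqs. (4.59)-(4.61): flip neuron I with
    probability 1 / (1 + exp(dE_I / T)), where dE_I = 2 s_I x_I."""
    n = len(s)
    I = rng.integers(n)
    x_I = W[I] @ s                      # activation of the chosen neuron
    dE = 2 * s[I] * x_I                 # energy change if s_I flips
    p_flip = 1.0 / (1.0 + np.exp(min(dE / T, 60.0)))  # capped for overflow
    if rng.random() < p_flip:
        s[I] = -s[I]
    return s

n = 16
W = rng.standard_normal((n, n)); W = 0.5 * (W + W.T)
np.fill_diagonal(W, 0.0)
s = rng.choice([-1, 1], size=n)
for _ in range(5000):                   # relax at a fixed temperature
    s = glauber_step(s, W, T=1.0, rng=rng)
print("state after Glauber relaxation:", s)
```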
$$B'^T = \Phi\!\left( A'^T W \right) \qquad (4.62)$$
where the signal vector A′ of $L_x$ is composed with the weight matrix W to set up activations in $L_y$, which are then thresholded (by the operator $\Phi(\cdot)$) to generate the signal vector B′ across $L_y$. Assuming that A′ is closer to an encoded association $A_i$ than to all other encoded associations $A_j$, it is desired that the generated vector B′ be closer to $B_i$ than to any other encoded $B_j$, $B_i$ being the vector associated with $A_i$. If we wish to pass B′ back to $L_x$ in order to improve the accuracy of recall, we may do so by using $W^T$ in this reverse chain of activity transmission from $L_y$ to $L_x$:
$$A''^T = \Phi\!\left( B'^T W^T \right) \qquad (4.63)$$
A′′ can then be fed forward through W to generate B′′. Continuing this bidirectional process generates a sequence of vector pairs:
$$(A', B') \rightarrow (A'', B'') \rightarrow (A''', B''') \rightarrow \cdots \qquad (4.64)$$
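The ping-pong recall of Eq. (4.64) can be sketched as follows (our illustration, with small random bipolar pairs; with a lightly distorted key the process should typically settle on the encoded pair):

```python
import numpy as np

rng = np.random.default_rng(7)
n, m, Q = 12, 8, 2

# Bipolar pattern pairs (A_k, B_k) encoded as W = sum_k A_k B_k^T.
A = rng.choice([-1, 1], size=(Q, n))
B = rng.choice([-1, 1], size=(Q, m))
W = sum(np.outer(A[k], B[k]) for k in range(Q)).astype(float)

def threshold(x, prev):
    """Bipolar threshold; hold the previous signal where activation is 0."""
    return np.where(x > 0, 1, np.where(x < 0, -1, prev))

# Start from a noisy version of A_0 and ping-pong until (A', B') is stable.
a = A[0].copy(); a[:3] *= -1                  # distort a few components
b = threshold(a @ W, np.ones(m, dtype=int))   # forward pass through W
while True:
    a_new = threshold(W @ b, a)               # reverse pass through W^T
    b_new = threshold(a_new @ W, b)           # forward pass again
    if np.array_equal(a_new, a) and np.array_equal(b_new, b):
        break                                 # bidirectional fixed point
    a, b = a_new, b_new
print("recovered pair 0:",
      np.array_equal(a, A[0]) and np.array_equal(b, B[0]))
```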