CHAPTER 7
Supervised Hebbian Learning
MingFeng Yeh
Objectives
The Hebb rule, proposed by Donald Hebb in 1949, was one of the first neural network learning laws.
It was proposed as a possible mechanism for synaptic modification in the brain.
Use linear algebra concepts to explain why Hebbian learning works.
The Hebb rule can be used to train neural networks for pattern recognition.
Hebb's Postulate
Hebbian learning (The Organization of Behavior):
"When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is increased."
[Figure: neuron A synapsing onto neuron B]
Linear Associator
[Figure: linear associator network, input p (R×1), weight matrix W (S×R), net input n (S×1), output a (S×1)]

$$a = Wp, \qquad a_i = \sum_{j=1}^{R} w_{ij}\, p_j$$

The linear associator is an example of a type of neural network called an associative memory.
The task of an associative memory is to learn Q pairs of prototype input/output vectors: {p_1, t_1}, {p_2, t_2}, ..., {p_Q, t_Q}.
If p = p_q, then a = t_q, for q = 1, 2, ..., Q.
If the input changes slightly (p = p_q plus a small perturbation), the output should change only slightly from t_q.
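As a sketch of the computation (with illustrative numbers, not the chapter's), the linear associator's output is just a matrix-vector product:

```python
import numpy as np

# Linear associator: a = W p, where a_i = sum_j w_ij p_j.
# Shapes follow the slide: W is S x R, p is R x 1, a is S x 1.
# The numbers below are illustrative only.
R, S = 4, 2
W = np.array([[1.0, 0.0, 0.0, -1.0],
              [0.0, 1.0, -1.0, 0.0]])   # S x R weight matrix
p = np.array([0.5, -0.5, 0.5, -0.5])    # R-element prototype input

a = W @ p  # matrix-vector product: implements a_i = sum_j w_ij p_j

# element-wise summation form, for comparison
a_sum = np.array([sum(W[i, j] * p[j] for j in range(R)) for i in range(S)])
```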
Hebb Learning Rule
If two neurons on either side of a synapse are activated simultaneously, the strength of the synapse will increase.
The connection (synapse) between input p_j and output a_i is the weight w_ij.

Unsupervised learning rule:
$$w_{ij}^{new} = w_{ij}^{old} + \alpha\, f(a_{iq})\, g(p_{jq}), \qquad w_{ij}^{new} = w_{ij}^{old} + \alpha\, a_{iq}\, p_{jq}$$

Supervised learning rule (with α = 1):
$$w_{ij}^{new} = w_{ij}^{old} + t_{iq}\, p_{jq}, \qquad W^{new} = W^{old} + t_q p_q^T$$

Not only do we increase the weight when p_j and a_i are both positive, but we also increase the weight when they are both negative.
Supervised Hebb Rule
Assume that the weight matrix is initialized to zero and each of the Q input/output pairs is applied once to the supervised Hebb rule (batch operation). Then
$$W = t_1 p_1^T + t_2 p_2^T + \cdots + t_Q p_Q^T = \sum_{q=1}^{Q} t_q p_q^T = T P^T,$$
where $T = [\,t_1\; t_2\; \cdots\; t_Q\,]$ and $P = [\,p_1\; p_2\; \cdots\; p_Q\,]$.
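A minimal sketch of the batch rule (small illustrative vectors, not the chapter's): stacking the prototypes as columns of P and the targets as columns of T turns the sum of outer products into a single matrix product.

```python
import numpy as np

# Batch supervised Hebb rule: W = sum_q t_q p_q^T = T P^T,
# with prototypes as the columns of P and targets as the columns of T.
def hebb_weights(P, T):
    """P: R x Q prototype inputs, T: S x Q targets; returns S x R weights."""
    return T @ P.T

# Illustrative data (Q = 2 pairs, R = 3, S = 1)
P = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])
T = np.array([[1.0, -1.0]])

W = hebb_weights(P, T)
# The matrix product equals the explicit sum of outer products:
W_sum = sum(np.outer(T[:, q], P[:, q]) for q in range(P.shape[1]))
```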
Performance Analysis
Assume that the p_q vectors are orthonormal (orthogonal and unit length); then
$$p_q^T p_k = \begin{cases} 1, & q = k \\ 0, & q \neq k \end{cases}$$
If p_k is input to the network, the network output can be computed:
$$a = W p_k = \left( \sum_{q=1}^{Q} t_q p_q^T \right) p_k = \sum_{q=1}^{Q} t_q \left( p_q^T p_k \right) = t_k$$
If the input prototype vectors are orthonormal, the Hebb rule will produce the correct output for each input.
Performance Analysis
Assume that each p_q vector is unit length, but they are not orthogonal. Then
$$a = W p_k = t_k + \underbrace{\sum_{q \neq k} t_q \left( p_q^T p_k \right)}_{\text{error}}$$
The magnitude of the error will depend on the amount of correlation between the prototype input patterns.
Orthonormal Case
$$p_1 = \begin{bmatrix} 0.5 \\ -0.5 \\ 0.5 \\ -0.5 \end{bmatrix},\ t_1 = \begin{bmatrix} 1 \\ -1 \end{bmatrix}, \qquad p_2 = \begin{bmatrix} 0.5 \\ 0.5 \\ -0.5 \\ -0.5 \end{bmatrix},\ t_2 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$$
$$W = T P^T = \begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix} \begin{bmatrix} 0.5 & -0.5 & 0.5 & -0.5 \\ 0.5 & 0.5 & -0.5 & -0.5 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & -1 \\ 0 & 1 & -1 & 0 \end{bmatrix}$$
$$W p_1 = \begin{bmatrix} 1 \\ -1 \end{bmatrix} = t_1, \qquad W p_2 = \begin{bmatrix} 1 \\ 1 \end{bmatrix} = t_2$$
Success!!
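The orthonormal case can be checked numerically. The signs below are my reconstruction of this slide's example (the printed matrices are garbled in this copy), following the standard textbook version:

```python
import numpy as np

# Orthonormal prototypes: Hebb-rule recall is exact.
p1 = np.array([0.5, -0.5, 0.5, -0.5]); t1 = np.array([1.0, -1.0])
p2 = np.array([0.5, 0.5, -0.5, -0.5]); t2 = np.array([1.0, 1.0])

W = np.outer(t1, p1) + np.outer(t2, p2)   # W = T P^T
```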
Not Orthogonal Case
$$p_1 = \begin{bmatrix} 0.5774 \\ -0.5774 \\ -0.5774 \end{bmatrix},\ t_1 = \begin{bmatrix} -1 \end{bmatrix}, \qquad p_2 = \begin{bmatrix} 0.5774 \\ 0.5774 \\ -0.5774 \end{bmatrix},\ t_2 = \begin{bmatrix} 1 \end{bmatrix}$$
$$W = T P^T = \begin{bmatrix} -1 & 1 \end{bmatrix} \begin{bmatrix} 0.5774 & -0.5774 & -0.5774 \\ 0.5774 & 0.5774 & -0.5774 \end{bmatrix} = \begin{bmatrix} 0 & 1.1548 & 0 \end{bmatrix}$$
$$W p_1 = \begin{bmatrix} -0.6668 \end{bmatrix}, \qquad W p_2 = \begin{bmatrix} 0.6668 \end{bmatrix}$$
The outputs are close, but do not quite match the target outputs.
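Numerically (the slide's printed values are garbled in this copy, so the vectors below follow the standard textbook version of this example; the same prototypes reappear, unnormalized, in the pseudoinverse example later in the chapter): the cross-correlation p_1^T p_2 = 1/3 produces the recall error.

```python
import numpy as np

# Unit-length but non-orthogonal prototypes: recall picks up an error
# term t_q (p_q^T p_k) from the correlation between patterns.
p1 = np.array([1.0, -1.0, -1.0]) / np.sqrt(3.0); t1 = np.array([-1.0])
p2 = np.array([1.0, 1.0, -1.0]) / np.sqrt(3.0); t2 = np.array([1.0])

W = np.outer(t1, p1) + np.outer(t2, p2)   # W = T P^T
a1, a2 = W @ p1, W @ p2                    # close to t1, t2, but not exact
```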
Solved Problem P7.2
$$T = P = \begin{bmatrix} p_1 & p_2 \end{bmatrix}$$
$$p_1 = \begin{bmatrix} 1 & -1 & 1 & -1 & 1 & -1 \end{bmatrix}^T, \qquad p_2 = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 & 1 \end{bmatrix}^T$$
i. $p_1^T p_2 = 0$: the vectors are orthogonal, but not orthonormal, since $p_1^T p_1 = p_2^T p_2 = 6$.
ii.
$$W = T P^T = p_1 p_1^T + p_2 p_2^T = \begin{bmatrix} 2 & 0 & 2 & 0 & 2 & 0 \\ 0 & 2 & 0 & 2 & 0 & 2 \\ 2 & 0 & 2 & 0 & 2 & 0 \\ 0 & 2 & 0 & 2 & 0 & 2 \\ 2 & 0 & 2 & 0 & 2 & 0 \\ 0 & 2 & 0 & 2 & 0 & 2 \end{bmatrix}$$
Solutions of Problem P7.2
iii. Test pattern:
$$p_t = \begin{bmatrix} 1 & -1 & 1 & 1 & 1 & 1 \end{bmatrix}^T$$
$$a = \mathrm{hardlims}(W p_t) = \mathrm{hardlims}\!\left( \begin{bmatrix} 6 & 2 & 6 & 2 & 6 & 2 \end{bmatrix}^T \right) = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 & 1 \end{bmatrix}^T = p_2$$
The Hamming distance from p_t to p_1 is 2; from p_t to p_2 it is 1. The network recalls the stored pattern closest to the input.
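P7.2 can be verified directly (pattern signs are reconstructed here, since this copy of the slide dropped them):

```python
import numpy as np

def hardlims(n):
    return np.where(n >= 0, 1, -1)

# Autoassociative storage (T = P) of two orthogonal +-1 patterns.
p1 = np.array([1, -1, 1, -1, 1, -1])
p2 = np.array([1, 1, 1, 1, 1, 1])
W = np.outer(p1, p1) + np.outer(p2, p2)   # W = P P^T

# Test pattern: Hamming distance 2 from p1, 1 from p2.
pt = np.array([1, -1, 1, 1, 1, 1])
a = hardlims(W @ pt)                       # recalls the closer pattern, p2
```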
Pseudoinverse Rule
Goal: $W p_q = t_q$ for $q = 1, 2, \ldots, Q$.
Performance index:
$$F(W) = \sum_{q=1}^{Q} \| t_q - W p_q \|^2$$
Goal: choose the weight matrix W to minimize F(W).
When the input vectors are not orthogonal and we use the Hebb rule, then F(W) will not be zero, and it is not clear that F(W) will even be minimized.
If $WP = T$, then
$$F(W) = \| T - WP \|^2 = \sum_i \sum_j e_{ij}^2 = 0.$$
If the P matrix has an inverse, the solution is $W = T P^{-1}$, where $T = [\,t_1\; t_2\; \cdots\; t_Q\,]$ and $P = [\,p_1\; p_2\; \cdots\; p_Q\,]$.
Pseudoinverse Rule
P matrix has an inverse iff P must be a square matrix.
Normally the p
q
vectors (the column of P) will be
independent, but R (the dimension of p
q
, no. of rows)
will be larger than Q (the number of p
q
vectors, no. of
columns). P does not exist any inverse matrix.
The weight matrix W that minimizes the performance
index is given by the
pseudoinverse rule .
2
1
) (
=
=
Q
q
q q
F Wp t W
+
= TP W
where P
+
is the MoorePenrose pseudoinverse.
Moore–Penrose Pseudoinverse
The pseudoinverse of a real matrix P is the unique matrix $P^{+}$ that satisfies
$$P P^{+} P = P, \quad P^{+} P P^{+} = P^{+}, \quad (P^{+} P)^T = P^{+} P, \quad (P P^{+})^T = P P^{+}.$$
When R (the number of rows of P) > Q (the number of columns of P) and the columns of P are independent, the pseudoinverse can be computed by
$$P^{+} = (P^T P)^{-1} P^T.$$
Note that we do NOT need to normalize the input vectors when using the pseudoinverse rule.
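The four defining conditions, and the full-column-rank formula, can be checked numerically; numpy's `pinv` computes the same Moore–Penrose pseudoinverse.

```python
import numpy as np

# A tall matrix (R = 3 rows > Q = 2 columns) with independent columns.
P = np.array([[1.0, 1.0],
              [-1.0, 1.0],
              [-1.0, -1.0]])

# Full-column-rank formula for the pseudoinverse: P+ = (P^T P)^{-1} P^T
P_plus = np.linalg.inv(P.T @ P) @ P.T
```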
Example of Pseudoinverse Rule
$$p_1 = \begin{bmatrix} 1 \\ -1 \\ -1 \end{bmatrix},\ t_1 = \begin{bmatrix} -1 \end{bmatrix}, \qquad p_2 = \begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix},\ t_2 = \begin{bmatrix} 1 \end{bmatrix}$$
$$P^T = \begin{bmatrix} 1 & -1 & -1 \\ 1 & 1 & -1 \end{bmatrix}$$
$$P^{+} = (P^T P)^{-1} P^T = \begin{bmatrix} 3 & 1 \\ 1 & 3 \end{bmatrix}^{-1} \begin{bmatrix} 1 & -1 & -1 \\ 1 & 1 & -1 \end{bmatrix} = \begin{bmatrix} 0.25 & -0.5 & -0.25 \\ 0.25 & 0.5 & -0.25 \end{bmatrix}$$
$$W = T P^{+} = \begin{bmatrix} -1 & 1 \end{bmatrix} \begin{bmatrix} 0.25 & -0.5 & -0.25 \\ 0.25 & 0.5 & -0.25 \end{bmatrix} = \begin{bmatrix} 0 & 1 & 0 \end{bmatrix}$$
$$W p_1 = \begin{bmatrix} -1 \end{bmatrix} = t_1, \qquad W p_2 = \begin{bmatrix} 1 \end{bmatrix} = t_2$$
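The example can be reproduced with numpy's built-in pseudoinverse:

```python
import numpy as np

# Pseudoinverse rule on the chapter's example: exact recall even though
# the prototypes are not orthogonal.
p1 = np.array([1.0, -1.0, -1.0]); p2 = np.array([1.0, 1.0, -1.0])
P = np.column_stack([p1, p2])        # 3 x 2 prototype matrix
T = np.array([[-1.0, 1.0]])          # targets t1 = -1, t2 = 1

W = T @ np.linalg.pinv(P)            # W = T P+
```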
Autoassociative Memory
The linear associator using the Hebb rule is a type of associative memory. In an autoassociative memory the desired output vector is equal to the input vector (t_q = p_q).
An autoassociative memory can be used to store a set of patterns and then to recall these patterns, even when corrupted patterns are provided as input.
[Figure: three prototype patterns {p_1, t_1}, {p_2, t_2}, {p_3, t_3}, each stored as a 30-element vector; network with input p (30×1), weights W (30×30), output a (30×1)]
$$W = p_1 p_1^T + p_2 p_2^T + p_3 p_3^T$$
Corrupted & Noisy Versions
Recovery of 50% Occluded Patterns
Recovery of Noisy Patterns
Recovery of 67% Occluded Patterns
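A small sketch of the recovery idea (6-element illustrative patterns rather than the slides' 30-pixel ones): store patterns with the autoassociative Hebb rule, occlude part of a pattern, and threshold the recall.

```python
import numpy as np

def hardlims(n):
    return np.where(n >= 0, 1, -1)

# Store two illustrative +-1 patterns autoassociatively.
p1 = np.array([1, 1, -1, -1, 1, -1])
p2 = np.array([-1, 1, 1, -1, -1, 1])
W = np.outer(p1, p1) + np.outer(p2, p2)

# Occlude the lower half of p1 (unknown pixels set to -1) and recall.
occluded = p1.copy()
occluded[3:] = -1
recalled = hardlims(W @ occluded)   # recovers the stored pattern p1
```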
Variations of Hebbian Learning
Many of the learning rules used in practice have some relationship to the Hebb rule.
The weight matrices produced by the Hebb rule can have very large elements if there are many prototype patterns in the training set.
Basic Hebb rule:
$$W^{new} = W^{old} + \alpha\, t_q p_q^T$$
Filtered learning: add a decay term, so that the learning rule behaves like a smoothing filter, remembering the most recent inputs more clearly:
$$W^{new} = W^{old} + \alpha\, t_q p_q^T - \gamma W^{old} = (1 - \gamma)\, W^{old} + \alpha\, t_q p_q^T, \qquad 0 \le \gamma \le 1$$
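A sketch of filtered learning with illustrative α and γ: repeated presentation of a pair converges to (α/γ) t pᵀ instead of growing without bound as the basic rule would.

```python
import numpy as np

# Filtered Hebb rule: W_new = (1 - gamma) W_old + alpha t p^T.
def filtered_hebb(W, t, p, alpha=1.0, gamma=0.1):
    return (1.0 - gamma) * W + alpha * np.outer(t, p)

t = np.array([1.0, -1.0])
p = np.array([1.0, 0.0, 1.0])

W = np.zeros((2, 3))
for _ in range(200):
    W = filtered_hebb(W, t, p)
# W approaches the fixed point (alpha / gamma) * t p^T = 10 * t p^T
```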
Variations of Hebbian Learning
Delta rule: replace the desired output with the difference between the desired output and the actual output. It adjusts the weights so as to minimize the mean square error:
$$W^{new} = W^{old} + \alpha\, (t_q - a_q)\, p_q^T$$
The delta rule can update the weights after each new input pattern is presented.
Basic Hebb rule:
$$W^{new} = W^{old} + \alpha\, t_q p_q^T$$
Unsupervised Hebb rule:
$$W^{new} = W^{old} + \alpha\, a_q p_q^T$$
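A sketch of the delta rule on the chapter's non-orthogonal prototypes (α chosen for illustration): cycling through the patterns drives both recall errors toward zero, unlike the plain Hebb rule.

```python
import numpy as np

# Delta rule: W_new = W_old + alpha (t_q - a_q) p_q^T, one pattern at a time.
p1 = np.array([1.0, -1.0, -1.0]) / np.sqrt(3.0); t1 = np.array([-1.0])
p2 = np.array([1.0, 1.0, -1.0]) / np.sqrt(3.0); t2 = np.array([1.0])

W = np.zeros((1, 3))
alpha = 0.5
for _ in range(200):                     # repeated sweeps over the pairs
    for p, t in [(p1, t1), (p2, t2)]:
        a = W @ p
        W = W + alpha * np.outer(t - a, p)
```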
Solved Problem P7.6
[Figure: single-neuron network with input p (2×1), weights W (1×2), bias b, output a]
$$p_1 = \begin{bmatrix} 1 & 1 \end{bmatrix}^T, \qquad p_2 = \begin{bmatrix} 2 & 2 \end{bmatrix}^T$$
i. Why is a bias required to solve this problem?
The decision boundary for the perceptron network is Wp + b = 0. If there is no bias, then the boundary becomes Wp = 0, which is a line that must pass through the origin. Since p_1 and p_2 lie on the same line through the origin, no decision boundary that passes through the origin could separate these two vectors.
Solved Problem P7.6
ii. Use the pseudoinverse rule to design a network with bias to solve this problem.
Treat the bias as another weight, with a constant input of 1:
$$p_1' = \begin{bmatrix} 1 & 1 & 1 \end{bmatrix}^T,\ p_2' = \begin{bmatrix} 2 & 2 & 1 \end{bmatrix}^T, \qquad t_1 = 1,\ t_2 = -1$$
$$P = \begin{bmatrix} 1 & 2 \\ 1 & 2 \\ 1 & 1 \end{bmatrix}, \qquad T = \begin{bmatrix} 1 & -1 \end{bmatrix}$$
$$P^{+} = (P^T P)^{-1} P^T = \begin{bmatrix} -0.5 & -0.5 & 2 \\ 0.5 & 0.5 & -1 \end{bmatrix}$$
$$W' = T P^{+} = \begin{bmatrix} -1 & -1 & 3 \end{bmatrix}, \qquad \text{so } W = \begin{bmatrix} -1 & -1 \end{bmatrix},\ b = 3$$
[Figure: decision boundary Wp + b = 0 separating p_1 from p_2]
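The augmented-vector trick can be checked numerically with numpy's `pinv`:

```python
import numpy as np

# P7.6: fold the bias into the weight vector via a constant input of 1.
p1 = np.array([1.0, 1.0]); t1 = 1.0
p2 = np.array([2.0, 2.0]); t2 = -1.0

P = np.column_stack([np.append(p1, 1.0), np.append(p2, 1.0)])  # 3 x 2
T = np.array([[t1, t2]])

W_aug = T @ np.linalg.pinv(P)       # [w1 w2 b]
W, b = W_aug[:, :2], W_aug[0, 2]
```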
Solved Problem P7.7
Up to now, we have represented patterns as vectors by using 1 and −1 to represent dark and light pixels, respectively. What if we were to use 1 and 0 instead? How should the Hebb rule be changed?
Bipolar {−1, 1} representation: {p_1, t_1}, {p_2, t_2}, ..., {p_Q, t_Q}
Binary {0, 1} representation: {p_1', t_1}, {p_2', t_2}, ..., {p_Q', t_Q}
The two representations are related by
$$p_q' = \frac{1}{2} p_q + \frac{1}{2} \mathbf{1}, \qquad p_q = 2 p_q' - \mathbf{1},$$
where $\mathbf{1}$ is a vector of ones.
We want the binary network (weights W', bias b) to produce the same net input as the bipolar network: $W' p_q' + b = W p_q$. Substituting:
$$W' \left( \frac{1}{2} p_q + \frac{1}{2} \mathbf{1} \right) + b = \frac{1}{2} W' p_q + \frac{1}{2} W' \mathbf{1} + b = W p_q$$
$$\Rightarrow \quad W' = 2W, \qquad b = -W \mathbf{1}$$
Binary Associative Network
[Figure: single-layer network with binary input p' (R×1), weights W' (S×R), bias b (S×1), S neurons]
$$n = W' p' + b, \qquad a = \mathrm{hardlim}(W' p' + b)$$
$$W' = 2W, \qquad b = -W \mathbf{1}$$