
# Ming-Feng Yeh 1

CHAPTER 7: Supervised Hebbian Learning
Objectives
The Hebb rule, proposed by Donald Hebb in
1949, was one of the first neural network
learning laws.
It was proposed as a possible mechanism for synaptic
modification in the brain.
Use linear algebra concepts to explain
why Hebbian learning works.
The Hebb rule can be used to train neural
networks for pattern recognition.
Hebb's Postulate
Hebbian learning
(The Organization of Behavior)
"When an axon of cell A is near enough to excite a
cell B and repeatedly or persistently takes part in
firing it, some growth process or metabolic change
takes place in one or both cells such that A's
efficiency, as one of the cells firing B, is increased."

Linear Associator
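[Figure: linear associator network with input p (R×1), weight matrix W (S×R), net input n (S×1), and output a (S×1).]

$$\mathbf{a} = \mathbf{W}\mathbf{p}, \qquad a_i = \sum_{j=1}^{R} w_{ij}\, p_j$$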
The linear associator is an example of a type of neural
network called an associative memory.
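The task of an associator is to learn Q pairs of
prototype input/output vectors: $\{\mathbf{p}_1, \mathbf{t}_1\}, \{\mathbf{p}_2, \mathbf{t}_2\}, \ldots, \{\mathbf{p}_Q, \mathbf{t}_Q\}$.
If $\mathbf{p} = \mathbf{p}_q$, then $\mathbf{a} = \mathbf{t}_q$, for $q = 1, 2, \ldots, Q$.
If $\mathbf{p} = \mathbf{p}_q + \boldsymbol{\delta}$, then $\mathbf{a} = \mathbf{t}_q + \boldsymbol{\varepsilon}$.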
Hebb Learning Rule
If two neurons on either side of a synapse are
activated simultaneously, the strength of the synapse
will increase.

The connection (synapse) between input $p_j$ and
output $a_i$ is the weight $w_{ij}$.

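Unsupervised learning rule:

$$w_{ij}^{new} = w_{ij}^{old} + \alpha\, f_i(a_{iq})\, g_j(p_{jq}) \quad\Rightarrow\quad w_{ij}^{new} = w_{ij}^{old} + \alpha\, a_{iq}\, p_{jq}$$

Not only do we increase the weight when $p_j$ and $a_i$ are
both positive, but we also increase the weight when they
are both negative.

Supervised learning rule (the actual output $a_{iq}$ is replaced by the target output $t_{iq}$):

$$w_{ij}^{new} = w_{ij}^{old} + \alpha\, t_{iq}\, p_{jq} \quad\Rightarrow\quad \mathbf{W}^{new} = \mathbf{W}^{old} + \mathbf{t}_q \mathbf{p}_q^{T} \quad (\alpha = 1)$$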
Supervised Hebb Rule
Assume that the weight matrix is initialized to zero and
each of the Q input/output pairs is applied once to
the supervised Hebb rule (batch operation).

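$$\mathbf{W} = \mathbf{t}_1\mathbf{p}_1^{T} + \mathbf{t}_2\mathbf{p}_2^{T} + \cdots + \mathbf{t}_Q\mathbf{p}_Q^{T} = \sum_{q=1}^{Q} \mathbf{t}_q \mathbf{p}_q^{T}$$

$$\mathbf{W} = \begin{bmatrix} \mathbf{t}_1 & \mathbf{t}_2 & \cdots & \mathbf{t}_Q \end{bmatrix} \begin{bmatrix} \mathbf{p}_1^{T} \\ \mathbf{p}_2^{T} \\ \vdots \\ \mathbf{p}_Q^{T} \end{bmatrix} = \mathbf{T}\mathbf{P}^{T}$$

where $\mathbf{T} = [\mathbf{t}_1\ \mathbf{t}_2\ \cdots\ \mathbf{t}_Q]$ and $\mathbf{P} = [\mathbf{p}_1\ \mathbf{p}_2\ \cdots\ \mathbf{p}_Q]$.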
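The equivalence between applying the supervised Hebb rule once per pair and the batch product $\mathbf{T}\mathbf{P}^{T}$ can be checked numerically. The following is a minimal sketch using NumPy; the prototype values and the names `W_iter` and `W_batch` are made up for illustration and do not come from the slides.

```python
import numpy as np

# Hypothetical prototype pairs (illustration only): Q = 3, R = 4, S = 2.
P = np.array([[1., 0., 2.],
              [0., 1., 1.],
              [1., 1., 0.],
              [2., 0., 1.]])   # columns are p_1, p_2, p_3
T = np.array([[1., -1., 0.],
              [0.,  1., 2.]])  # columns are t_1, t_2, t_3

# Incremental form: start from zero and add t_q p_q^T once per pair (alpha = 1).
W_iter = np.zeros((T.shape[0], P.shape[0]))
for q in range(P.shape[1]):
    W_iter += np.outer(T[:, q], P[:, q])

# Batch form: a single matrix product W = T P^T.
W_batch = T @ P.T

print(np.allclose(W_iter, W_batch))
```

Both forms give the same S×R weight matrix, so the batch product is simply a compact way of writing the accumulated outer products.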
Performance Analysis
Assume that the $\mathbf{p}_q$ vectors are orthonormal
(orthogonal and unit length). Then

$$\mathbf{p}_q^{T}\mathbf{p}_k = \begin{cases} 1, & q = k \\ 0, & q \neq k \end{cases}$$
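If $\mathbf{p}_k$ is input to the network, then the network output
can be computed as

$$\mathbf{a} = \mathbf{W}\mathbf{p}_k = \left( \sum_{q=1}^{Q} \mathbf{t}_q \mathbf{p}_q^{T} \right) \mathbf{p}_k = \sum_{q=1}^{Q} \mathbf{t}_q \left( \mathbf{p}_q^{T} \mathbf{p}_k \right) = \mathbf{t}_k$$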
If the input prototype vectors are orthonormal, the Hebb
rule will produce the correct output for each input.

Performance Analysis
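Assume that each $\mathbf{p}_q$ vector is unit length, but they are
not orthogonal. Then

$$\mathbf{a} = \mathbf{W}\mathbf{p}_k = \sum_{q=1}^{Q} \mathbf{t}_q \left( \mathbf{p}_q^{T} \mathbf{p}_k \right) = \mathbf{t}_k + \underbrace{\sum_{q \neq k} \mathbf{t}_q \left( \mathbf{p}_q^{T} \mathbf{p}_k \right)}_{\text{error}}$$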
The magnitude of the error will depend on the amount
of correlation between the prototype input patterns.

Orthonormal Case

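$$\mathbf{p}_1 = \begin{bmatrix} 0.5 \\ -0.5 \\ 0.5 \\ -0.5 \end{bmatrix},\ \mathbf{t}_1 = \begin{bmatrix} 1 \\ -1 \end{bmatrix}, \qquad \mathbf{p}_2 = \begin{bmatrix} 0.5 \\ 0.5 \\ -0.5 \\ -0.5 \end{bmatrix},\ \mathbf{t}_2 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$$

$$\mathbf{W} = \mathbf{T}\mathbf{P}^{T} = \begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix} \begin{bmatrix} 0.5 & -0.5 & 0.5 & -0.5 \\ 0.5 & 0.5 & -0.5 & -0.5 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & -1 \\ 0 & 1 & -1 & 0 \end{bmatrix}$$

$$\mathbf{W}\mathbf{p}_1 = \begin{bmatrix} 1 \\ -1 \end{bmatrix}, \qquad \mathbf{W}\mathbf{p}_2 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$$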
Success!!
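The orthonormal example can be verified with a few lines of NumPy. This is a sketch that assumes the signed values $\mathbf{p}_1 = [0.5, -0.5, 0.5, -0.5]^T$, $\mathbf{t}_1 = [1, -1]^T$, $\mathbf{p}_2 = [0.5, 0.5, -0.5, -0.5]^T$, $\mathbf{t}_2 = [1, 1]^T$; the variable names are illustrative.

```python
import numpy as np

# Columns of P are the prototype inputs, columns of T the targets
# (signed values assumed as stated in the lead-in).
P = np.array([[ 0.5,  0.5],
              [-0.5,  0.5],
              [ 0.5, -0.5],
              [-0.5, -0.5]])
T = np.array([[ 1., 1.],
              [-1., 1.]])

gram = P.T @ P   # equals the identity when the columns are orthonormal
W = T @ P.T      # supervised Hebb rule in batch form
A = W @ P        # network outputs for p_1 and p_2, one per column

print(gram)
print(W)
print(A)
```

Because `gram` is the identity, `A` reproduces `T` exactly, which is the "Success!!" result on the slide.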
Not Orthogonal Case
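$$\mathbf{p}_1 = \begin{bmatrix} 0.5774 \\ -0.5774 \\ -0.5774 \end{bmatrix},\ \mathbf{t}_1 = [-1], \qquad \mathbf{p}_2 = \begin{bmatrix} 0.5774 \\ 0.5774 \\ -0.5774 \end{bmatrix},\ \mathbf{t}_2 = [1]$$

$$\mathbf{W} = \mathbf{T}\mathbf{P}^{T} = \begin{bmatrix} -1 & 1 \end{bmatrix} \begin{bmatrix} 0.5774 & -0.5774 & -0.5774 \\ 0.5774 & 0.5774 & -0.5774 \end{bmatrix} = \begin{bmatrix} 0 & 1.1548 & 0 \end{bmatrix}$$

$$\mathbf{W}\mathbf{p}_1 = [-0.6668], \qquad \mathbf{W}\mathbf{p}_2 = [0.6668]$$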
The outputs are close, but do not quite match the target
outputs.
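The size of the mismatch can be traced to the cross-correlation term $\sum_{q \neq k} \mathbf{t}_q(\mathbf{p}_q^{T}\mathbf{p}_k)$. The sketch below assumes the unit-length prototypes $\mathbf{p}_1 = [1, -1, -1]^T/\sqrt{3}$ and $\mathbf{p}_2 = [1, 1, -1]^T/\sqrt{3}$ with targets $-1$ and $1$; the names `cross` and `error` are illustrative.

```python
import numpy as np

s = 1.0 / np.sqrt(3.0)
# Unit-length but non-orthogonal prototypes (columns), values assumed as in the lead-in.
P = s * np.array([[ 1.,  1.],
                  [-1.,  1.],
                  [-1., -1.]])
T = np.array([[-1., 1.]])

W = T @ P.T   # Hebb rule
A = W @ P     # actual outputs, one column per prototype

# Error term for each p_k: sum over q != k of t_q (p_q^T p_k).
gram = P.T @ P
cross = gram - np.diag(np.diag(gram))   # keep only the off-diagonal correlations
error = T @ cross

print(W)      # middle weight is 2/sqrt(3)
print(A)      # outputs are -2/3 and 2/3 rather than -1 and 1
print(error)  # the gap between A and T
```

Here $\mathbf{p}_1^{T}\mathbf{p}_2 = 1/3$, so each output is pulled one third of the way toward the other target.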
Solved Problem P7.2
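[Figure: the two 6-pixel prototype patterns $\mathbf{p}_1$ and $\mathbf{p}_2$.]

$$\mathbf{p}_1 = \begin{bmatrix} 1 & 1 & -1 & 1 & -1 & -1 \end{bmatrix}^{T}, \qquad \mathbf{p}_2 = \begin{bmatrix} -1 & 1 & 1 & 1 & 1 & -1 \end{bmatrix}^{T}, \qquad \mathbf{P} = \mathbf{T} = [\mathbf{p}_1\ \mathbf{p}_2]$$

i. $\mathbf{p}_1^{T}\mathbf{p}_2 = 0$, so the patterns are orthogonal, but not orthonormal, since $\mathbf{p}_1^{T}\mathbf{p}_1 = \mathbf{p}_2^{T}\mathbf{p}_2 = 6$.

ii.

$$\mathbf{W} = \mathbf{T}\mathbf{P}^{T} = \begin{bmatrix} 2 & 0 & -2 & 0 & -2 & 0 \\ 0 & 2 & 0 & 2 & 0 & -2 \\ -2 & 0 & 2 & 0 & 2 & 0 \\ 0 & 2 & 0 & 2 & 0 & -2 \\ -2 & 0 & 2 & 0 & 2 & 0 \\ 0 & -2 & 0 & -2 & 0 & 2 \end{bmatrix}$$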
Solutions of Problem P7.2
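iii. [Figure: the test pattern $\mathbf{p}_t$.]

$$\mathbf{p}_t = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 & -1 \end{bmatrix}^{T}$$

$$\mathbf{a} = \text{hardlims}(\mathbf{W}\mathbf{p}_t) = \text{hardlims}\left( \begin{bmatrix} -2 \\ 6 \\ 2 \\ 6 \\ 2 \\ -6 \end{bmatrix} \right) = \begin{bmatrix} -1 \\ 1 \\ 1 \\ 1 \\ 1 \\ -1 \end{bmatrix} = \mathbf{p}_2$$

The Hamming distance from $\mathbf{p}_t$ to $\mathbf{p}_1$ is 2, and from $\mathbf{p}_t$ to $\mathbf{p}_2$ is 1, so the network recalls the closer prototype.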
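The three parts of Problem P7.2 can be reproduced numerically. This sketch assumes the bipolar patterns $\mathbf{p}_1 = [1, 1, -1, 1, -1, -1]^T$, $\mathbf{p}_2 = [-1, 1, 1, 1, 1, -1]^T$ and test input $\mathbf{p}_t = [1, 1, 1, 1, 1, -1]^T$; the helper `hardlims` is written here for illustration.

```python
import numpy as np

def hardlims(n):
    """Symmetric hard limit: +1 where n >= 0, otherwise -1."""
    return np.where(n >= 0, 1, -1)

# Bipolar patterns assumed as stated in the lead-in.
p1 = np.array([ 1, 1, -1, 1, -1, -1])
p2 = np.array([-1, 1,  1, 1,  1, -1])
pt = np.array([ 1, 1,  1, 1,  1, -1])

# i. Orthogonality check.
dot12 = int(p1 @ p2)

# ii. Autoassociator: T = P, so W = P P^T.
P = np.column_stack([p1, p2])
W = P @ P.T

# iii. Response to the test pattern, and Hamming distances to each prototype.
n = W @ pt
a = hardlims(n)
hamming = [int(np.sum(pt != p)) for p in (p1, p2)]

print(dot12, n, a, hamming)
```

The recalled pattern matches the prototype with the smaller Hamming distance to the test input.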
Pseudoinverse Rule
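$$\mathbf{W}\mathbf{p}_q = \mathbf{t}_q, \qquad q = 1, 2, \ldots, Q$$

Performance index:

$$F(\mathbf{W}) = \sum_{q=1}^{Q} \left\| \mathbf{t}_q - \mathbf{W}\mathbf{p}_q \right\|^2$$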
Goal: choose the weight matrix W to minimize F(W).
When the input vectors are not orthogonal and we use
the Hebb rule, F(W) will not be zero, and it is
not clear that F(W) will be minimized.

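$$\mathbf{W}\mathbf{P} = \mathbf{T}$$

$$F(\mathbf{W}) = \left\| \mathbf{T} - \mathbf{W}\mathbf{P} \right\|^2 = \left\| \mathbf{E} \right\|^2 = \sum_i \sum_j e_{ij}^2$$

If the P matrix has an inverse, the solution is

$$\mathbf{W} = \mathbf{T}\mathbf{P}^{-1}$$

where $\mathbf{T} = [\mathbf{t}_1\ \mathbf{t}_2\ \cdots\ \mathbf{t}_Q]$ and $\mathbf{P} = [\mathbf{p}_1\ \mathbf{p}_2\ \cdots\ \mathbf{p}_Q]$.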
Pseudoinverse Rule
The P matrix can have an inverse only if it is a square matrix.
Normally the $\mathbf{p}_q$ vectors (the columns of P) will be
independent, but R (the dimension of $\mathbf{p}_q$, the number of rows)
will be larger than Q (the number of $\mathbf{p}_q$ vectors, the number of
columns). In that case P is not square and has no inverse.

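The weight matrix W that minimizes the performance
index

$$F(\mathbf{W}) = \sum_{q=1}^{Q} \left\| \mathbf{t}_q - \mathbf{W}\mathbf{p}_q \right\|^2$$

is given by the pseudoinverse rule

$$\mathbf{W} = \mathbf{T}\mathbf{P}^{+}$$

where $\mathbf{P}^{+}$ is the Moore-Penrose pseudoinverse.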
Moore-Penrose Pseudoinverse
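The pseudoinverse of a real matrix P is the unique
matrix that satisfies

$$\begin{aligned} \mathbf{P}\mathbf{P}^{+}\mathbf{P} &= \mathbf{P} \\ \mathbf{P}^{+}\mathbf{P}\mathbf{P}^{+} &= \mathbf{P}^{+} \\ \mathbf{P}^{+}\mathbf{P} &= (\mathbf{P}^{+}\mathbf{P})^{T} \\ \mathbf{P}\mathbf{P}^{+} &= (\mathbf{P}\mathbf{P}^{+})^{T} \end{aligned}$$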
When R (number of rows of P) > Q (number of columns of P) and
the columns of P are independent, the
pseudoinverse can be computed by

$$\mathbf{P}^{+} = (\mathbf{P}^{T}\mathbf{P})^{-1}\mathbf{P}^{T}$$
Note that we do NOT need to normalize the input vectors
when using the pseudoinverse rule.
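The four defining conditions and the closed-form expression can be checked against each other. The sketch below uses a made-up 4×2 matrix with independent, deliberately unnormalized columns (R = 4 > Q = 2) and compares the formula with NumPy's `np.linalg.pinv`; the matrix values and the `conditions` dictionary are illustrative assumptions.

```python
import numpy as np

# Hypothetical P with R = 4 rows, Q = 2 independent columns (not normalized).
P = np.array([[1., 2.],
              [0., 1.],
              [3., 0.],
              [1., 1.]])

# Closed form, valid when the columns of P are independent.
P_plus = np.linalg.inv(P.T @ P) @ P.T

# The four Moore-Penrose conditions.
conditions = {
    "P P+ P = P":         np.allclose(P @ P_plus @ P, P),
    "P+ P P+ = P+":       np.allclose(P_plus @ P @ P_plus, P_plus),
    "P+ P is symmetric":  np.allclose(P_plus @ P, (P_plus @ P).T),
    "P P+ is symmetric":  np.allclose(P @ P_plus, (P @ P_plus).T),
}
matches_numpy = np.allclose(P_plus, np.linalg.pinv(P))

print(conditions, matches_numpy)
```

In practice `np.linalg.pinv` is usually the safer choice, since it does not require forming and inverting $\mathbf{P}^{T}\mathbf{P}$ explicitly.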

Example of Pseudoinverse Rule
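$$\mathbf{p}_1 = \begin{bmatrix} 1 \\ -1 \\ -1 \end{bmatrix},\ \mathbf{t}_1 = [-1], \qquad \mathbf{p}_2 = \begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix},\ \mathbf{t}_2 = [1]$$

$$\mathbf{P}^{T} = \begin{bmatrix} 1 & -1 & -1 \\ 1 & 1 & -1 \end{bmatrix}$$

$$\mathbf{P}^{+} = (\mathbf{P}^{T}\mathbf{P})^{-1}\mathbf{P}^{T} = \begin{bmatrix} 3 & 1 \\ 1 & 3 \end{bmatrix}^{-1} \begin{bmatrix} 1 & -1 & -1 \\ 1 & 1 & -1 \end{bmatrix} = \begin{bmatrix} 0.25 & -0.5 & -0.25 \\ 0.25 & 0.5 & -0.25 \end{bmatrix}$$

$$\mathbf{W} = \mathbf{T}\mathbf{P}^{+} = \begin{bmatrix} -1 & 1 \end{bmatrix} \begin{bmatrix} 0.25 & -0.5 & -0.25 \\ 0.25 & 0.5 & -0.25 \end{bmatrix} = \begin{bmatrix} 0 & 1 & 0 \end{bmatrix}$$

$$\mathbf{W}\mathbf{p}_1 = \begin{bmatrix} 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} 1 \\ -1 \\ -1 \end{bmatrix} = [-1] = \mathbf{t}_1, \qquad \mathbf{W}\mathbf{p}_2 = \begin{bmatrix} 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix} = [1] = \mathbf{t}_2$$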
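The worked example can be repeated in NumPy and compared with the plain Hebb rule on the same unnormalized inputs. This is a sketch assuming $\mathbf{p}_1 = [1, -1, -1]^T$, $\mathbf{t}_1 = -1$, $\mathbf{p}_2 = [1, 1, -1]^T$, $\mathbf{t}_2 = 1$; the name `W_hebb` is illustrative.

```python
import numpy as np

# Unnormalized prototypes as columns, values assumed as in the lead-in.
P = np.array([[ 1.,  1.],
              [-1.,  1.],
              [-1., -1.]])
T = np.array([[-1., 1.]])

# Pseudoinverse rule.
P_plus = np.linalg.inv(P.T @ P) @ P.T
W = T @ P_plus

# Hebb rule on the same data, for comparison.
W_hebb = T @ P.T

print(P_plus)
print(W, W @ P)            # pseudoinverse rule reproduces the targets
print(W_hebb, W_hebb @ P)  # Hebb rule overshoots on unnormalized, correlated inputs
```

The pseudoinverse rule reproduces both targets exactly, while the Hebb rule does not on this data.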
Autoassociative Memory
The linear associator using the Hebb rule is a type of
associative memory (in general $\mathbf{t}_q \neq \mathbf{p}_q$). In an autoassociative
memory the desired output vector is equal to the input
vector ($\mathbf{t}_q = \mathbf{p}_q$).

An autoassociative memory can be used to store a
set of patterns and then to recall these patterns, even
when corrupted patterns are provided as input.

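[Figure: three prototype pattern pairs $\{\mathbf{p}_1, \mathbf{t}_1\}$, $\{\mathbf{p}_2, \mathbf{t}_2\}$, $\{\mathbf{p}_3, \mathbf{t}_3\}$, and the autoassociative network with input p (30×1), weight matrix W (30×30), net input n (30×1), and output a (30×1).]

$$\mathbf{W} = \mathbf{p}_1\mathbf{p}_1^{T} + \mathbf{p}_2\mathbf{p}_2^{T} + \mathbf{p}_3\mathbf{p}_3^{T}$$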
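Recall from a corrupted input can be illustrated on a smaller scale. The sketch below stands in for the 30-pixel patterns with three made-up, mutually orthogonal 8-element bipolar prototypes, and assumes a symmetric hard-limit output, `hardlims`, as used in Solved Problem P7.2; none of these values come from the slides.

```python
import numpy as np

def hardlims(n):
    """Symmetric hard limit: +1 where n >= 0, otherwise -1."""
    return np.where(n >= 0, 1, -1)

# Hypothetical prototypes (rows), chosen to be mutually orthogonal.
protos = np.array([
    [1, -1,  1, -1,  1, -1,  1, -1],
    [1,  1, -1, -1,  1,  1, -1, -1],
    [1,  1,  1,  1, -1, -1, -1, -1],
])

# Autoassociative Hebb rule: W = p1 p1^T + p2 p2^T + p3 p3^T.
W = sum(np.outer(p, p) for p in protos)

# Corrupt the first prototype by flipping one pixel, then recall.
noisy = protos[0].copy()
noisy[0] *= -1
recalled = hardlims(W @ noisy)

print(noisy)
print(recalled)
```

With a single flipped pixel the stored pattern still dominates the net input, so the first prototype is recovered; heavier corruption is not guaranteed to be corrected in this small example.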
Corrupted & Noisy Versions
Recovery of 50% Occluded Patterns

Recovery of Noisy Patterns

Recovery of 67% Occluded Patterns

Variations of Hebbian Learning
Many of the learning rules have some relationship to the
Hebb rule.

The weight matrix produced by the Hebb rule can have very large
elements if there are many prototype patterns in the
training set.

Basic Hebb rule:

$$\mathbf{W}^{new} = \mathbf{W}^{old} + \alpha\, \mathbf{t}_q \mathbf{p}_q^{T}$$
Filtered learning: adding a decay term, so that the
learning rule behaves like a smoothing filter,
remembering the most recent inputs more clearly.

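$$\mathbf{W}^{new} = \mathbf{W}^{old} + \alpha\, \mathbf{t}_q \mathbf{p}_q^{T} - \gamma\, \mathbf{W}^{old} = (1 - \gamma)\, \mathbf{W}^{old} + \alpha\, \mathbf{t}_q \mathbf{p}_q^{T}, \qquad 0 \le \gamma \le 1$$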
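The effect of the decay term can be seen by presenting the same pair repeatedly. This is a minimal sketch with made-up values for the pair, the learning rate and the decay rate; `hebb_step` is a hypothetical helper. With decay, setting $\mathbf{W}^{new} = \mathbf{W}^{old}$ in the update gives the fixed point $(\alpha/\gamma)\,\mathbf{t}\mathbf{p}^{T}$, whereas without decay the weights grow with every presentation.

```python
import numpy as np

def hebb_step(W, t, p, alpha=0.1, gamma=0.0):
    """One update: W_new = (1 - gamma) * W_old + alpha * t p^T."""
    return (1.0 - gamma) * W + alpha * np.outer(t, p)

# Hypothetical input/target pair (illustration only).
t = np.array([1., -1.])
p = np.array([1., 0., -1.])

W_basic = np.zeros((2, 3))
W_filtered = np.zeros((2, 3))
for _ in range(200):
    W_basic = hebb_step(W_basic, t, p, alpha=0.1, gamma=0.0)
    W_filtered = hebb_step(W_filtered, t, p, alpha=0.1, gamma=0.1)

print(np.abs(W_basic).max())     # keeps growing with the number of presentations
print(np.abs(W_filtered).max())  # settles near alpha / gamma
```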
Variations of Hebbian Learning
Delta rule: replacing the desired output with the
difference between the desired output and the
actual output. It adjusts the weights so as to minimize
the mean square error.

$$\mathbf{W}^{new} = \mathbf{W}^{old} + \alpha\, (\mathbf{t}_q - \mathbf{a}_q)\, \mathbf{p}_q^{T}$$
The delta rule can update the weights after each new
input pattern is presented.

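Basic Hebb rule:

$$\mathbf{W}^{new} = \mathbf{W}^{old} + \alpha\, \mathbf{t}_q \mathbf{p}_q^{T}$$

Unsupervised Hebb rule:

$$\mathbf{W}^{new} = \mathbf{W}^{old} + \alpha\, \mathbf{a}_q \mathbf{p}_q^{T}$$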
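The delta rule can be applied pattern by pattern and, on a small problem, compared with the pseudoinverse solution. The sketch below assumes the prototypes $\mathbf{p}_1 = [1, -1, -1]^T$, $\mathbf{t}_1 = -1$, $\mathbf{p}_2 = [1, 1, -1]^T$, $\mathbf{t}_2 = 1$, a learning rate of 0.1, and 200 passes over the data; the learning rate and pass count are arbitrary choices for illustration.

```python
import numpy as np

# Prototype inputs (columns) and targets, values assumed as in the lead-in.
P = np.array([[ 1.,  1.],
              [-1.,  1.],
              [-1., -1.]])
T = np.array([[-1., 1.]])

alpha = 0.1                 # assumed learning rate
W = np.zeros((1, 3))        # start from zero weights

for epoch in range(200):
    for q in range(P.shape[1]):
        p = P[:, [q]]       # column vector, shape (3, 1)
        t = T[:, [q]]       # target, shape (1, 1)
        a = W @ p           # actual output of the linear associator
        W = W + alpha * (t - a) @ p.T   # delta rule update

W_pinv = T @ np.linalg.pinv(P)

print(W)
print(W_pinv)
```

Starting from zero weights, the incremental updates in this example approach the same weight matrix that the pseudoinverse rule gives in one step.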
Solved Problem P7.6
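[Figure: single-neuron network with bias: input p (2×1), weight matrix W (1×2), bias b (1×1), net input n (1×1), and output a (1×1).]

$$\mathbf{p}_1 = \begin{bmatrix} 1 & 1 \end{bmatrix}^{T}, \qquad \mathbf{p}_2 = \begin{bmatrix} 2 & 2 \end{bmatrix}^{T}$$

[Figure: $\mathbf{p}_1$ and $\mathbf{p}_2$ in the input plane with a boundary $\mathbf{W}\mathbf{p} = 0$ passing through the origin.]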
i. Why is a bias required to solve this problem?
The decision boundary for the perceptron network is
Wp + b = 0. If there is no bias, then the boundary
becomes Wp = 0, which is a line that must pass
through the origin. Because p_2 = 2 p_1, both vectors lie on the
same ray from the origin, so no decision boundary that passes
through the origin could separate these two vectors.
Solved Problem P7.6
ii. Use the pseudoinverse rule to design a network with
bias to solve this problem.
Treat the bias as another weight, with an input of 1.
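$$\mathbf{p}'_1 = \begin{bmatrix} 1 & 1 & 1 \end{bmatrix}^{T}, \qquad \mathbf{p}'_2 = \begin{bmatrix} 2 & 2 & 1 \end{bmatrix}^{T}, \qquad t_1 = 1, \quad t_2 = -1$$

$$\mathbf{P} = \begin{bmatrix} 1 & 2 \\ 1 & 2 \\ 1 & 1 \end{bmatrix}, \qquad \mathbf{T} = \begin{bmatrix} 1 & -1 \end{bmatrix}$$

$$\mathbf{P}^{+} = (\mathbf{P}^{T}\mathbf{P})^{-1}\mathbf{P}^{T} = \begin{bmatrix} -0.5 & -0.5 & 2 \\ 0.5 & 0.5 & -1 \end{bmatrix}$$

$$\mathbf{W}' = \mathbf{T}\mathbf{P}^{+} = \begin{bmatrix} -1 & -1 & 3 \end{bmatrix}, \qquad \mathbf{W} = \begin{bmatrix} -1 & -1 \end{bmatrix}, \quad b = 3$$

[Figure: $\mathbf{p}_1$ and $\mathbf{p}_2$ separated by the decision boundary $\mathbf{W}\mathbf{p} + b = 0$.]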
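The bias-as-extra-weight construction is easy to reproduce. This sketch assumes targets $t_1 = 1$, $t_2 = -1$ and a symmetric hard-limit output; the names `P_aug` and `W_aug` for the augmented matrices are illustrative.

```python
import numpy as np

def hardlims(n):
    """Symmetric hard limit: +1 where n >= 0, otherwise -1."""
    return np.where(n >= 0, 1, -1)

# Augmented inputs: each column is [p; 1], so the last weight acts as the bias.
P_aug = np.array([[1., 2.],
                  [1., 2.],
                  [1., 1.]])
T = np.array([[1., -1.]])   # targets assumed as in the lead-in

P_plus = np.linalg.inv(P_aug.T @ P_aug) @ P_aug.T
W_aug = T @ P_plus

# Split the augmented weight row into weights and bias.
W, b = W_aug[:, :2], W_aug[:, 2]

p1 = np.array([1., 1.])
p2 = np.array([2., 2.])
out1 = hardlims(W @ p1 + b)
out2 = hardlims(W @ p2 + b)

print(W, b, out1, out2)
```

The resulting boundary $\mathbf{W}\mathbf{p} + b = 0$ is the line $p_1 + p_2 = 3$, which passes between the two prototypes and does not go through the origin.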
Solved Problem P7.7
Up to now, we have represented patterns as vectors by
using 1 and −1 to represent dark and light pixels,
respectively. What if we were to use 1 and 0 instead?
How should the Hebb rule be changed?
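Bipolar {−1, 1} representation: $\{\mathbf{p}_1, \mathbf{t}_1\}, \{\mathbf{p}_2, \mathbf{t}_2\}, \ldots, \{\mathbf{p}_Q, \mathbf{t}_Q\}$
Binary {0, 1} representation: $\{\mathbf{p}'_1, \mathbf{t}'_1\}, \{\mathbf{p}'_2, \mathbf{t}'_2\}, \ldots, \{\mathbf{p}'_Q, \mathbf{t}'_Q\}$

$$\mathbf{p}'_q = \tfrac{1}{2}\mathbf{p}_q + \tfrac{1}{2}\mathbf{1}, \qquad \mathbf{p}_q = 2\mathbf{p}'_q - \mathbf{1}$$

where $\mathbf{1}$ is a vector of ones.

The binary network should produce the same net input as the bipolar network, $\mathbf{W}'\mathbf{p}' + \mathbf{b} = \mathbf{W}\mathbf{p}$:

$$\mathbf{W}'\left( \tfrac{1}{2}\mathbf{p} + \tfrac{1}{2}\mathbf{1} \right) + \mathbf{b} = \tfrac{1}{2}\mathbf{W}'\mathbf{p} + \tfrac{1}{2}\mathbf{W}'\mathbf{1} + \mathbf{b} = \mathbf{W}\mathbf{p}$$

$$\mathbf{W}' = 2\mathbf{W}, \qquad \mathbf{b} = -\mathbf{W}\mathbf{1}$$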
Binary Associative Network
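[Figure: binary associative network with input p' (R×1), weight matrix W' (S×R), bias b (S×1), net input n (S×1), and output a (S×1).]

$$\mathbf{n} = \mathbf{W}'\mathbf{p}' + \mathbf{b}, \qquad \mathbf{a} = \text{hardlim}(\mathbf{W}'\mathbf{p}' + \mathbf{b})$$

$$\mathbf{W}' = 2\mathbf{W}, \qquad \mathbf{b} = -\mathbf{W}\mathbf{1}$$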
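The conversion $\mathbf{W}' = 2\mathbf{W}$, $\mathbf{b} = -\mathbf{W}\mathbf{1}$ can be checked by feeding the same pattern to both networks in its two encodings. The sketch below reuses, as an assumption, the bipolar patterns of Solved Problem P7.2 ($\mathbf{p}_1 = [1, 1, -1, 1, -1, -1]^T$, $\mathbf{p}_2 = [-1, 1, 1, 1, 1, -1]^T$, test input $[1, 1, 1, 1, 1, -1]^T$); the helper functions are written here for illustration.

```python
import numpy as np

def hardlims(n):
    """Symmetric hard limit: +1 where n >= 0, otherwise -1."""
    return np.where(n >= 0, 1, -1)

def hardlim(n):
    """Hard limit: 1 where n >= 0, otherwise 0."""
    return np.where(n >= 0, 1, 0)

# Bipolar autoassociator built with the Hebb rule (patterns assumed as in the lead-in).
p1 = np.array([ 1, 1, -1, 1, -1, -1])
p2 = np.array([-1, 1,  1, 1,  1, -1])
W = np.outer(p1, p1) + np.outer(p2, p2)

# Binary network derived from the bipolar one.
W_bin = 2 * W
b = -W @ np.ones(6)          # b = -W 1

# The same test pattern in both encodings: p' = (p + 1) / 2.
pt = np.array([1, 1, 1, 1, 1, -1])
pt_bin = (pt + 1) // 2

n_bipolar = W @ pt
n_binary = W_bin @ pt_bin + b
a_bipolar = hardlims(n_bipolar)
a_binary = hardlim(n_binary)

print(n_bipolar, n_binary)
print(a_bipolar, a_binary)
```

The net inputs agree, so the binary output is the bipolar output with each −1 replaced by 0.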