NEURAL NETWORKS (PDFDrive) PDF
Vedat Tavşanoğlu
What Is a Neural Network?
3. Adaptivity.
Neural networks have a built-in capability to
adapt their synaptic weights to changes in the
surrounding environment. In particular, a neural
network trained to operate in a specific
environment can be easily retrained to deal with
minor changes in the operating environmental
conditions.
Properties and Capabilities of Neural
Networks
Moreover, when it is operating in a nonstationary
environment (i.e., one whose statistics change
with time), a neural network can be designed to
change its synaptic weights in real time. The
natural architecture of a neural network for
pattern classification, signal processing, and
control applications, coupled with the adaptive
capability of the network, make it an ideal tool for
use in adaptive pattern classification, adaptive
signal processing, and adaptive control.
Properties and Capabilities of
Neural Networks
Models of a Neuron
A neuron is an information-processing unit that is
fundamental to the operation of a neural network.
The figure on the next slide shows the model for a
neuron.
Models of a Neuron
Mathematical Model of a Neuron

u_k = Σ_{j=1}^{p} w_kj x_j

y_k = φ(u_k - θ_k)

where the x_j are the input signals; the w_kj are the synaptic weights of neuron k; u_k is the linear combiner output; θ_k is the threshold; φ(·) is the activation function; and y_k is the output signal of the neuron.
Models of a Neuron
Block-Diagram Representation of a Neuron
u_k = Σ_{j=1}^{p} w_kj x_j

y_k = φ(u_k - θ_k)
Models of a Neuron
The use of the threshold θ_k has the effect of applying an affine transformation to the output u_k of the linear combiner in the model of the figure, as shown by

v_k = u_k - θ_k
Models of a Neuron
In particular, depending on whether the threshold θ_k is positive or negative, the relationship between the effective internal activity level (activation potential) v_k of neuron k and the linear combiner output u_k is modified in the manner illustrated in the figure. Note that as a result of this affine transformation, the graph of v_k versus u_k no longer passes through the origin.
Models of a Neuron
The threshold θ_k is an external parameter of artificial neuron k. We may account for its presence as in the above equation. Equivalently, we may formulate the combination of the two equations as follows:

v_k = Σ_{j=0}^{p} w_kj x_j

y_k = φ(v_k)
Models of a Neuron
Here we have added a new synapse, whose input is

x_0 = -1

and whose weight is

w_k0 = θ_k
Models of a Neuron
We may therefore
reformulate the model
of neuron k as in the
figure, where the effect
of the threshold is
represented by doing
two things:
Models of a Neuron
(1) adding a new input signal fixed at -1, and
(2) adding a new synaptic weight equal to the threshold θ_k.
Models of a Neuron

Alternatively, the combination of fixed input x_0 = +1 and weight w_k0 = b_k accounts for the bias b_k. Although the models of the two figures are different in appearance, they are mathematically equivalent.
Models of a Neuron
Signal-Flow Graph Representation of a Neuron
u_k = Σ_{j=1}^{p} w_kj x_j

y_k = φ(u_k - θ_k)
Models of a Neuron
Signal-Flow Graph Representation of a Neuron
Two different types of links may be distinguished:
(a) Synaptic links, defined by a linear input-output relation. Specifically, the node signal x_j is multiplied by the synaptic weight w_kj to produce its contribution to the node signal v_k.
(b) Activation links, defined in general by a nonlinear input-output relation. This form of relationship is given by the nonlinear activation function φ(·).
Models of a Neuron
y = φ(v)

defines the output y of a neuron in terms of the activity level v at its input.
Models of a Neuron
We may identify three basic types of activation
functions:
1. Threshold Function
2. Piecewise-linear Function
3. Sigmoid Function
Models of a Neuron
1. Threshold Function (hard limiter, or binary activation, leading to the discrete perceptron)

(a) Unipolar:
φ(v) = 1/2 + (1/2) sgn(v), i.e., φ(v) = 1 for v ≥ 0 and φ(v) = 0 for v < 0.
Models of a Neuron
(b) Bipolar:
φ(v) = sgn(v), i.e., φ(v) = 1 for v ≥ 0 and φ(v) = -1 for v < 0.
Models of a Neuron
2. Piecewise-linear Function

(a) Unipolar:
φ(v) = 1 for v ≥ 1/2,  φ(v) = v + 1/2 for -1/2 < v < 1/2,  φ(v) = 0 for v ≤ -1/2.

Models of a Neuron

(b) Bipolar:
φ(v) = 1 for v ≥ 1,  φ(v) = v for -1 < v < 1,  φ(v) = -1 for v ≤ -1;
equivalently, φ(v) = (1/2)(|v + 1| - |v - 1|).
Models of a Neuron
3. Sigmoid Function

(a) Unipolar:
φ(v) = 1 / (1 + e^{-av}),  a > 0

Models of a Neuron

(b) Bipolar:
φ(v) = (1 - e^{-av}) / (1 + e^{-av}) = 2 / (1 + e^{-av}) - 1,  a > 0
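The activation functions above can be sketched in Python (a minimal sketch; the function names are mine, the slides only give the formulas, and a = 1 is an assumed default for the sigmoids):

```python
import math

def threshold_unipolar(v):
    # phi(v) = 1/2 + (1/2) sgn(v): 1 for v >= 0, else 0 (hard limiter)
    return 1.0 if v >= 0 else 0.0

def threshold_bipolar(v):
    # phi(v) = sgn(v)
    return 1.0 if v >= 0 else -1.0

def piecewise_bipolar(v):
    # phi(v) = (|v + 1| - |v - 1|) / 2: linear on (-1, 1), saturates at +/-1
    return (abs(v + 1) - abs(v - 1)) / 2

def sigmoid_unipolar(v, a=1.0):
    # phi(v) = 1 / (1 + e^{-a v}), a > 0
    return 1.0 / (1.0 + math.exp(-a * v))

def sigmoid_bipolar(v, a=1.0):
    # phi(v) = (1 - e^{-a v}) / (1 + e^{-a v}) = 2 / (1 + e^{-a v}) - 1
    return 2.0 / (1.0 + math.exp(-a * v)) - 1.0
```

The bipolar sigmoid approaches sgn(v) as the slope parameter a grows, which is why it is the usual smooth replacement for the hard limiter.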
Models of Artificial Neural Networks
DEFINITION OF Neural Network
(Jacek M. Zurada, ARTIFICIAL NEURAL SYSTEMS, 1992, West Publishing Company)
Σ_{j=1}^{p} w_j x_j - θ = 0
The Perceptron
This is illustrated here for the case of two input variables x1 and x2, for which the decision boundary takes the form of a straight line called the decision line. A point (x1, x2) that lies above the decision line is assigned to class C1, and a point (x1, x2) that lies below the decision line is assigned to class C2. Note also that the effect of the threshold is merely to shift the decision line away from the origin. The synaptic weights w1, w2, ..., wp of the perceptron can be fixed or adapted on an iteration-by-iteration basis. For the adaptation we may use an error-correction rule known as the perceptron convergence algorithm.
The Perceptron
We find it more convenient to work with the modified signal-flow graph given here. In this second model, which is equivalent to that of the previous figure, the threshold is treated as a synaptic weight connected to a fixed input equal to -1. We may thus define the (p + 1)-by-1 (augmented) input vector and the corresponding (augmented) weight vector as:

x = [x1  x2  ...  xp  -1]^t,   w = [w1  w2  ...  wp  θ]^t
The Perceptron
Pattern Space

Class 1:  (2, 0), (1.5, -1), (1, -2)
Class 2:  (0, 0), (-0.5, -1), (-1, -2)
The Perceptron
One possible decision line is given by x2= 2x1-2
which is drawn in the following figure.
x2 x = 2x -2
2 1
(0, 0) (2,0)
x1
(-0.5, -1) (1.5,-1)
x1
-2
x2 1 + v sgn(v) y
-2
-1 y sgn(2 x1 x2 2)
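The decision line can be checked numerically; this is a minimal sketch (helper names are mine), using the six patterns listed above:

```python
def perceptron_output(x1, x2):
    # TLU implementing y = sgn(2*x1 - x2 - 2)
    v = 2 * x1 - x2 - 2
    return 1 if v >= 0 else -1

class1 = [(2, 0), (1.5, -1), (1, -2)]    # expected output +1
class2 = [(0, 0), (-0.5, -1), (-1, -2)]  # expected output -1
```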
Single-Layer Feedforward Neural
Network
Example 2: Assume that a set of eight points, P0, P1, ..., P7, in three-dimensional space is available. The set consists of all vertices of a three-dimensional cube as follows:

{P0(-1, -1, -1), P1(-1, -1, 1), P2(-1, 1, -1), P3(-1, 1, 1), P4(1, -1, -1), P5(1, -1, 1), P6(1, 1, -1), P7(1, 1, 1)}

Elements of this set need to be classified into two categories. The first category is defined as containing points with two or more positive coordinates; the second category contains all the remaining points that do not belong to the first category.
Single-Layer Feedforward Neural
Network
Classification of points P3, P5, P6, and P7 into the first category can be based on the summation of coordinate values for each point. Notice that for each point Pi(x1, x2, x3), where i = 0, ..., 7, the category membership can be established by the following calculation:

If sgn(x1 + x2 + x3) = 1, then category 1; if sgn(x1 + x2 + x3) = -1, then category 2.
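The sign test can be verified over all eight cube vertices (a small sketch; the helper name is mine). Since each coordinate is ±1, the coordinate sum is never zero, and it is positive exactly when two or more coordinates are +1:

```python
def category(p):
    # category 1 if sgn(x1 + x2 + x3) = +1, category 2 if it is -1
    return 1 if sum(p) > 0 else 2

vertices = [(x, y, z) for x in (-1, 1) for y in (-1, 1) for z in (-1, 1)]
cat1 = [p for p in vertices if category(p) == 1]
```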
Single-Layer Feedforward Neural
Network
The neural network given below implements the
above expression:
Single-Layer Feedforward Neural
Network
The network above performs the three-dimensional Cartesian space partitioning as illustrated below:
Single-Layer Feedforward Neural
Network
Discriminant Functions
In Example 1,

x3 = 2x1 - x2 - 2

can be viewed as a discriminant function. We may also write

g(x1, x2) = 2x1 - x2 - 2

or

g(x) = 2x1 - x2 - 2,   where x = [x1; x2]
Single-Layer Feedforward Neural
Network
g(x1, x2) = 2x1 - x2 - 2

can also be viewed as the equation of a plane in 3-D Euclidean space. On the decision line g(x1, x2) = 0, we can write

dg(x1, x2) = (∂g(x1, x2)/∂x1) dx1 + (∂g(x1, x2)/∂x2) dx2 = 0

where dx1 and dx2 are the increments given to x1 and x2 along the decision line.
Single-Layer Feedforward Neural
Network
Now defining

∇g(x1, x2) = [∂g(x1, x2)/∂x1;  ∂g(x1, x2)/∂x2]   and   dr = [dx1; dx2]

we have dg = ∇g^t dr = 0, so the gradient ∇g is orthogonal to the tangent vector dr. For g(x1, x2) = 2x1 - x2 - 2,

∇1 = ∇g = [2; -1]
Single-Layer Feedforward Neural
Network
In fact ∇2 is obtained from -g(x1, x2) = 0. Note that ∇1 and ∇2 are the projections on the x-y plane of the normal vectors of two intersecting planes whose intersection line is given by g(x1, x2) = 0.
Single-Layer Feedforward Neural
Network
Although ∇1 and ∇2 are unique, there are infinitely many plane pairs whose intersection line is given by g(x1, x2) = 0. Plane pairs can be built by appropriately augmenting the 2-D normal vectors ∇1 and ∇2 to 3-D normal vectors, which will be the normal vectors of the two intersecting planes.
Single-Layer Feedforward Neural
Network
The 2-D normal vectors are plane vectors given in the x-y plane:

∇1 = [2; -1],   ∇2 = [-2; 1]
Single-Layer Feedforward Neural
Network
Note that ∇1 and ∇2 are normal vectors of the plane that is perpendicular to the x-y plane and intersects the x-y plane at the decision line. A plane is described by

n^t (x - x0) = 0

where:
• n is the normal vector of the plane,
• x is the vector connecting any point on the plane to the origin,
• x0 is the vector connecting a fixed point on the plane to the origin.
Single-Layer Feedforward Neural
Network
For n1 = [1; -1/2; 1] and x0 = [1; 0; 0] (a point on the decision line):

n1^t (x - x0) = (x1 - 1) - (1/2)x2 + x3 = 0   ⟹   x3 = g1(x1, x2) = -x1 + (1/2)x2 + 1

For n2 = [-1; 1/2; 1]:

n2^t (x - x0) = -(x1 - 1) + (1/2)x2 + x3 = 0   ⟹   x3 = g2(x1, x2) = x1 - (1/2)x2 - 1

Single-Layer Feedforward Neural Network

Because of the way g1(x) and g2(x) are built we can state the following: g1(x) = -g2(x), and both planes intersect the x1-x2 plane along the decision line g(x1, x2) = 0.
Single-Layer Feedforward Neural
g
Network
Decision
Decisio
n2 line
n1 g1
g2
x1
Decision
line x2
Single-Layer Feedforward Neural
Network
Now we can compute g1(x) and g2(x) for the selected patterns in Example 1.

Class 1:  (2, 0), (1.5, -1), (1, -2)
Class 2:  (0, 0), (-0.5, -1), (-1, -2)
Single-Layer Feedforward Neural
Network
Now we will derive the equation of the boundary line between the points xi and xj. Let the vectors x and x0 represent any point on this line and the midpoint x0 = (1/2)(xi + xj), respectively. Then the following must hold:

(xi - xj)^t (x - x0) = 0
Single-Layer Feedforward Neural
Network
and

(xi - xj)^t x - (1/2)(xi - xj)^t (xi + xj) = 0

or

(xi - xj)^t x - (1/2)(||xi||^2 - ||xj||^2) = 0
Single-Layer Feedforward Neural
Network
Now defining

gij(x) = (xi - xj)^t x - (1/2)(||xi||^2 - ||xj||^2)
We have already seen that the boundary (decision)
line can be taken as the intersection of two planes
gi and gj .
Single-Layer Feedforward Neural
Network
Therefore
gij(x) = gi(x) - gj(x)
where we have called gi (x) discriminant functions
and shown that they are associated with plane
equations.
Single-Layer Feedforward Neural
Network
Now using the two equations above we obtain
(xi - xj)^t x - (1/2)(||xi||^2 - ||xj||^2) = gi(x) - gj(x)

which can be used to make the following identification:

gi(x) = xi^t x - (1/2)||xi||^2,   gj(x) = xj^t x - (1/2)||xj||^2
Single-Layer Feedforward Neural
Network
gi(x) can also be expressed as:

gi(x) = wi^t x + wi,n+1

where

wi = xi,   wi,n+1 = -(1/2)||xi||^2
Single-Layer Feedforward Neural
Network
An alternative approach towards the construction
of discriminant functions may be taken as follows:
Let us assume that a minimum-distance classification is required to classify patterns into R categories. Each class is represented by its center point Pi, i = 1, 2, ..., R. The Euclidean distance between an input pattern x and the point Pi is given by the norm of the vector x - xi:

||x - xi|| = sqrt((x - xi)^t (x - xi))
Single-Layer Feedforward Neural
Network
A minimum–distance classifier computes the distance
from a pattern of unknown classification to each of the
center points Pi . Then the category number of the point
that yields the minimum distance is assigned to the
unknown pattern.
Since ||x - xi||^2 = x^t x - 2 xi^t x + xi^t xi, and the term x^t x is the same for every class, minimizing the distance is equivalent to maximizing

gi(x) = xi^t x - (1/2) xi^t xi

which is called a discriminant function.
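The minimum-distance classifier can be sketched directly from this discriminant (a sketch; function names are mine, and the centers below are the ones used in Example 3 on the following slides):

```python
def discriminant(x, xi):
    # g_i(x) = xi^t x - (1/2) xi^t xi; maximizing g_i over i is equivalent
    # to minimizing the Euclidean distance ||x - xi||
    return sum(a * b for a, b in zip(x, xi)) - 0.5 * sum(a * a for a in xi)

def classify(x, centers):
    # 1-based index of the center with the largest discriminant value
    scores = [discriminant(x, c) for c in centers]
    return scores.index(max(scores)) + 1

centers = [(10, 2), (2, -5), (-5, 5)]
```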
Single-Layer Feedforward Neural
Network
Example 3: A linear minimum-distance classifier will be designed for the three points given as:

x1 = [10; 2],   x2 = [2; -5],   x3 = [-5; 5]

It is also assumed that the index of each point (pattern) corresponds to its class number. The three points and the connecting lines constitute a triangle, which is shown on the next slide:
Single-Layer Feedforward Neural
Network
[Figure: the triangle with vertices P1(10, 2), P2(2, -5), P3(-5, 5) in the x1-x2 plane.]
Single-Layer Feedforward Neural
Network
Now let us draw the circle passing through all three
vertices of the triangle, the circumcircle. We can
conclude that each boundary is a perpendicular
bisector of the triangle. A perpendicular bisector of a
triangle is a straight line passing through the
midpoint of a side and being perpendicular to it, i.e.
forming a right angle with it. The three
perpendicular bisectors meet at a single point, the
triangle's circumcenter; this point is the center of the
circumcircle.
Single-Layer Feedforward Neural
Network
[Figure: the circumcircle of the triangle P1P2P3; the perpendicular bisectors of the sides meet at the circumcenter.]
Single-Layer Feedforward Neural
Network
Now using
x1 = [10; 2],   x2 = [2; -5],   x3 = [-5; 5]

and

gij(x) = (xi - xj)^t x - (1/2)(||xi||^2 - ||xj||^2)
we obtain
Single-Layer Feedforward Neural
Network
g12(x) = (x1 - x2)^t x - (1/2)(||x1||^2 - ||x2||^2)
       = [8  7][x1; x2] - (1/2)[(100 + 4) - (4 + 25)]
       = 8x1 + 7x2 - 37.5
Single-Layer Feedforward Neural
Network
g13(x) = (x1 - x3)^t x - (1/2)(||x1||^2 - ||x3||^2)
       = [15  -3][x1; x2] - (1/2)[(100 + 4) - (25 + 25)]
       = 15x1 - 3x2 - 27
Single-Layer Feedforward Neural
Network
g23(x) = (x2 - x3)^t x - (1/2)(||x2||^2 - ||x3||^2)
       = [7  -10][x1; x2] - (1/2)[(4 + 25) - (25 + 25)]
       = 7x1 - 10x2 + 10.5
Single-Layer Feedforward Neural
Network
Now using

wi = xi,   wi,n+1 = -(1/2)||xi||^2

we obtain

w1 = [10; 2; -52],   w2 = [2; -5; -14.5],   w3 = [-5; 5; -25]
Single-Layer Feedforward Neural Network

and using

gi(x) = wi^t x + wi,n+1

we obtain

g1(x) = 10x1 + 2x2 - 52
g2(x) = 2x1 - 5x2 - 14.5
g3(x) = -5x1 + 5x2 - 25
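A quick numerical check (a sketch with helper names of my own) confirms that each pairwise boundary is the difference of the two single-center discriminants, e.g. g1(x) - g2(x) = g12(x) = 8x1 + 7x2 - 37.5:

```python
def g(x, c):
    # g_i(x) = c^t x - ||c||^2 / 2 for a center point c
    return c[0] * x[0] + c[1] * x[1] - (c[0] ** 2 + c[1] ** 2) / 2

def g12(x):
    # boundary between classes 1 and 2 computed earlier
    return 8 * x[0] + 7 * x[1] - 37.5

x1c, x2c = (10, 2), (2, -5)
```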
Single-Layer Feedforward Neural
Network
A block diagram producing the three discriminant
functions is shown below:
[Figure: inputs x1, x2 and fixed input -1 feed three linear units with weight rows (10, 2, 52), (2, -5, 14.5) and (-5, 5, 25), producing g1(x) = 10x1 + 2x2 - 52, g2(x) = 2x1 - 5x2 - 14.5 and g3(x) = -5x1 + 5x2 - 25.]
Single-Layer Feedforward Neural
Network
The discriminant values for the three patterns
P1(10,2), P2(2,-5) and P3(-5,5) are shown in the
table below:
Input                          P1(10,2)  P2(2,-5)  P3(-5,5)
sgn(g1(x) = 10x1 + 2x2 - 52)       1        -1        -1
sgn(g2(x) = 2x1 - 5x2 - 14.5)     -1         1        -1
sgn(g3(x) = -5x1 + 5x2 - 25)      -1        -1         1
Single-Layer Feedforward Neural
Network
The diagonal entries = 1; the off-diagonal entries = -1.
However, as the next example will demonstrate
this is not true for any three points P1,P2 ,P3 taken
from the three decision regions H1, H2, H3.
Single-Layer Feedforward Neural
Network
The response of the same network to the patterns Q1(5, 0), Q2(0, 1) and Q3(-4, 0) is shown in the table below:

Input                          Q1(5,0)  Q2(0,1)  Q3(-4,0)
sgn(g1(x) = 10x1 + 2x2 - 52)      -1       -1        -1
sgn(g2(x) = 2x1 - 5x2 - 14.5)     -1       -1        -1
sgn(g3(x) = -5x1 + 5x2 - 25)      -1       -1        -1
Single-Layer Feedforward Neural
Network
It is therefore impossible to use TLUs once the decision lines are calculated using the minimum-distance classification procedure. The only way out is to use a maximum selector operating on the planes

g1 = 10x1 + 2x2 - 52
g2 = 2x1 - 5x2 - 14.5
g3 = -5x1 + 5x2 - 25
Single-Layer Feedforward Neural
Network
These planes are shown on the next slide.
Single-Layer Feedforward Neural
Network
[Figure: decision regions H1, H2, H3 in the x1-x2 plane around P1(10, 2), P2(2, -5), P3(-5, 5), bounded by g12(x) = 0, g13(x) = 0 and g23(x) = 0; in region Hi, gi(x) exceeds the other two discriminants. The three boundaries meet at P123(2.337, 2.686).]
Single-Layer Feedforward Neural
Network
A MATLAB plot of the projections of the intersection lines of the planes gi is shown on the next slide.
Single-Layer Feedforward Neural
Network
Single-Layer Feedforward Neural
Network
The projections of the intersection lines of the
planes gi on the x1-x2 plane are shown to be given
by the following line equations:
g12(x) = 8x1 + 7x2 - 37.5 = 0
g13(x) = 15x1 - 3x2 - 27 = 0
g23(x) = 7x1 - 10x2 + 10.5 = 0
The previous slide shows the segments that can
be seen from the top.
Single-Layer Feedforward Neural
Network
The continuation of the line g12=0 remains
underneath the plane g3.
[Figure: classifier using the maximum selector. Inputs x1, x2 and fixed input -1 feed the three linear units computing g1(x), g2(x), g3(x); a maximum selector outputs the class index i = 1, 2, or 3.]
Single-Layer Feedforward Neural
Network
The decision lines of the network obtained through the perceptron learning algorithm are:

5x1 + 3x2 - 5 = 0
x2 + 2 = 0
9x1 - x2 = 0
which are given on the next slide. The shaded
areas are indecision regions which will become
clear in the following discussion.
Single-Layer Feedforward Neural
Network
[Figure: the lines 5x1 + 3x2 - 5 = 0, x2 + 2 = 0 and 9x1 - x2 = 0 in the pattern plane; for example, for Q1(0, 9): g1(0, 9) = 22 → 1, g2(0, 9) = -11 → -1, g3(0, 9) = 9 → 1.]

Input                  P1(10,2)  P2(2,-5)  P3(-5,5)
g1(x) = 5x1 + 3x2 - 5     51       -10       -15
g2(x) = -x2 - 2           -4         3        -7
g3(x) = -9x1 + x2        -88       -23        50
sgn(g1(x))                 1        -1        -1
sgn(g2(x))                -1         1        -1
sgn(g3(x))                -1        -1         1
Single-Layer Feedforward Neural
Network
The table on the previous slide shows that the new discriminant functions

g1(x) = 5x1 + 3x2 - 5
g2(x) = -x2 - 2
g3(x) = -9x1 + x2

classify the patterns P1(10, 2), P2(2, -5) and P3(-5, 5) in the same way as the discriminant functions

g1(x) = 10x1 + 2x2 - 52
g2(x) = 2x1 - 5x2 - 14.5
g3(x) = -5x1 + 5x2 - 25
Single-Layer Feedforward Neural
Network
Conclusion:
The network obtained through the perceptron learning algorithm and the network obtained using the minimum-distance classification procedure have classified the three points P1(10, 2), P2(2, -5) and P3(-5, 5) in exactly the same way, i.e.,

P1(10, 2) → class 1
P2(2, -5) → class 2
P3(-5, 5) → class 3
Single-Layer Feedforward Neural
Network
Now consider the patterns Q1(0,9), Q2(4,-4)
and Q3(-1,-3) which fall into shaded areas.
The discriminant values for these patterns are
shown in the table on the next slide:
Single-Layer Feedforward Neural
Network
Input                  Q1(0,9)  Q2(4,-4)  Q3(-1,-3)
g1(x) = 5x1 + 3x2 - 5     22        3        -19
g2(x) = -x2 - 2          -11        2          1
g3(x) = -9x1 + x2          9      -40          6
Single-Layer Feedforward Neural
Network
Since

g1(0, 9) > g2(0, 9), g3(0, 9)
g1(4, -4) > g2(4, -4), g3(4, -4)
g3(-1, -3) > g1(-1, -3), g2(-1, -3)

if we use a maximum selector instead of the three TLUs, the network can decide that Q1 → class 1, Q2 → class 1 and Q3 → class 3. With TLUs, however, the sign table is:

Input                        Q1(0,9)  Q2(4,-4)  Q3(-1,-3)
sgn(g1(x) = 5x1 + 3x2 - 5)      1        1         -1
sgn(g2(x) = -x2 - 2)           -1        1          1
sgn(g3(x) = -9x1 + x2)          1       -1          1
Single-Layer Feedforward Neural
Network
In order to make a classification, each column should contain one 1 and two -1s. Therefore, according to the table obtained, none of the three patterns Q1(0, 9), Q2(4, -4) and Q3(-1, -3) can be classified into any class. For the network with TLUs the shaded areas are therefore called indecision regions.
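The contrast between the TLU bank and the maximum selector can be sketched as follows (helper names are mine), using the discriminants g1, g2, g3 above:

```python
def g1(x1, x2): return 5 * x1 + 3 * x2 - 5
def g2(x1, x2): return -x2 - 2
def g3(x1, x2): return -9 * x1 + x2

def sgn(v):
    return 1 if v >= 0 else -1

def tlu_class(x1, x2):
    # class index if exactly one TLU fires, otherwise None (indecision)
    outs = [sgn(gi(x1, x2)) for gi in (g1, g2, g3)]
    return outs.index(1) + 1 if outs.count(1) == 1 else None

def max_selector_class(x1, x2):
    # the maximum selector always picks the largest discriminant
    vals = [gi(x1, x2) for gi in (g1, g2, g3)]
    return vals.index(max(vals)) + 1
```

The maximum selector never leaves a pattern unclassified, which is exactly why it removes the indecision regions.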
Single-Layer Feedforward Neural
Network
Now let us consider the planes defined by
g1 = 5x1 + 3x2 - 5
g2 = -x2 - 2
g3 = -9x1 + x2
which are plotted on the next slide:
Single-Layer Feedforward Neural
Network
Single-Layer Feedforward Neural
Network
The projections of the intersection lines of the
planes gi(x) on the x1-x2 plane are given by
g12: 5x1 + 4x2 - 3 = 0
g23: 9x1 - 2x2 - 2 = 0
g13: 14x1 + 2x2 - 5 = 0
Single-Layer Feedforward Neural
Network
[Figure: single-layer feedforward network; input nodes x_n connect to output nodes y_m through weights w_mn, with v_m the activation of output node m.]
Single-Layer Feedforward Neural
Network
v1 = w11 x1 + w12 x2 + ... + w1j xj + ... + w1J xJ,   y1 = f(v1)
...
vK = wK1 x1 + wK2 x2 + ... + wKJ xJ,   yK = f(vK)

v = Wx,   y = Γ(v)
Single-Layer Feedforward Neural
Network
[y1; y2; ...; yK] = diag(f(·), f(·), ..., f(·)) [v1; v2; ...; vK]

y = Γ[Wx]
Single-Layer Feedforward Neural
Network
Example 1:

x = [x1; x2; -1]

[v1; v2; v3] = [5  3  5;  0  -1  2;  -9  1  0][x1; x2; x3] = [5x1 + 3x2 + 5x3;  -x2 + 2x3;  -9x1 + x2]

y1 = sgn(5x1 + 3x2 + 5x3)
y2 = sgn(-x2 + 2x3)
y3 = sgn(-9x1 + x2)

where x3 = -1.
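The matrix form y = Γ[Wx] for this example can be sketched in plain Python (a minimal sketch; names are mine):

```python
W = [[5.0, 3.0, 5.0],
     [0.0, -1.0, 2.0],
     [-9.0, 1.0, 0.0]]

def sgn(v):
    return 1 if v >= 0 else -1

def network(x1, x2):
    # augmented input with fixed third component x3 = -1
    x = [x1, x2, -1.0]
    v = [sum(w * xi for w, xi in zip(row, x)) for row in W]  # v = W x
    return [sgn(vi) for vi in v]                             # y = Gamma(v)
```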
Two-Layer Feedforward Neural
Network
Example 1: Design a neural network such that the network
maps the shaded region of plane x1, x2 into y = 1, and it
maps its complement into y = -1, where y is the output of the
neural network. In summary, the network will provide the
mapping of the entire x1, x2 plane into one of the two points
±1 on the real number axis.
Two-Layer Feedforward Neural
Network
Solution: The inputs to the neural network will
be x1, x2 and the threshold value -1. Thus the
input vector is given as:
x = [x1; x2; -1]
Two-Layer Feedforward Neural
Network
The boundaries of the shaded region are given
by the equations:
x1 - 1 = 0
-x1 + 2 = 0
x2 = 0
-x2 + 3 = 0
Two-Layer Feedforward Neural
Network
The shaded region satisfies the inequalities:

x1 > 1  ⟹  x1 - 1 > 0
x1 < 2  ⟹  -x1 + 2 > 0
x2 > 0  ⟹  x2 > 0
x2 < 3  ⟹  -x2 + 3 > 0
Two-Layer Feedforward Neural
Network
These inequalities may be implemented using
four neurons:
Two-Layer Feedforward Neural
Network
The equations for the first layer are obtained as:

[v1; v2; v3; v4] = [1  0  1;  -1  0  -2;  0  1  0;  0  -1  -3][x1; x2; -1]

and the output layer computes

y = sgn(y1 + y2 + y3 + y4 - 3.5)

where yi = sgn(vi), so that y = 1 only when all four first-layer outputs equal +1.
Two-Layer Feedforward Neural
Network
The resultant neural network
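The two-layer network of this example can be sketched as follows (a minimal sketch; names are mine): the first layer tests the four half-planes and the output neuron combines them through the -3.5 threshold:

```python
def sgn(v):
    return 1 if v >= 0 else -1

# first layer: v1 = x1 - 1, v2 = -x1 + 2, v3 = x2, v4 = -x2 + 3
W1 = [[1, 0, 1], [-1, 0, -2], [0, 1, 0], [0, -1, -3]]

def two_layer(x1, x2):
    x = [x1, x2, -1]
    y = [sgn(sum(w * xi for w, xi in zip(row, x))) for row in W1]
    # the output neuron fires +1 only when all four first-layer outputs are +1
    return sgn(sum(y) - 3.5)
```

Points inside the rectangle 1 < x1 < 2, 0 < x2 < 3 map to +1 and all others to -1.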
The Perceptron Training Algorithm
w^t x > 0   for every input vector x belonging to class C1
w^t x ≤ 0   for every input vector x belonging to class C2
The Perceptron Training Algorithm
w^t x > 0   for some input vectors x belonging to class C2
w^t x ≤ 0   for some input vectors x belonging to class C1
The Perceptron Training Algorithm
In the former case, w^t x will therefore be reduced until w^t x ≤ 0 is achieved. The gradient of f(w^t x(i)) = w^t x(i) with respect to the weight vector is

∇f(w^t x(i)) = [∂(w^t x(i))/∂w1;  ∂(w^t x(i))/∂w2;  ...;  ∂(w^t x(i))/∂w_{n+1}] = [x1(i); x2(i); ...; x_{n+1}(i)] = x(i)
d = 1, i.e., class 1 is input:
1) y = sgn(w^t x) = -1, i.e., the input is misclassified: r = d - y = 1 - (-1) = 2; the correction is in the direction of steepest ascent and given as Δw(i) = 2c x(i).
2) y = sgn(w^t x) = 1, i.e., the input is correctly classified: r = d - y = 1 - 1 = 0; no correction.

d = -1, i.e., class 2 is input:
1) y = sgn(w^t x) = -1, i.e., the input is correctly classified: r = d - y = -1 - (-1) = 0; no correction.
2) y = sgn(w^t x) = 1, i.e., the input is misclassified: r = d - y = -1 - 1 = -2; the correction is in the direction of steepest descent and given as Δw(i) = -2c x(i).
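The correction rule can be written compactly as w' = w + c·r·x with r = d - sgn(w^t x) (a sketch; the gain c = 0.5 below is an assumed value that matches the worked example that follows):

```python
def sgn(v):
    return 1 if v >= 0 else -1

def update(w, x, d, c=0.5):
    # r = d - sgn(w^t x) is +2, -2, or 0; correction delta_w = c * r * x
    v = sum(wi * xi for wi, xi in zip(w, x))
    r = d - sgn(v)
    return [wi + c * r * xi for wi, xi in zip(w, x)]
```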
The Perceptron Training Algorithm
EXAMPLE:
The trained classifier should provide the following classification of four patterns x with known class membership d:

x(1) = [1; -1],  x(2) = [-0.5; -1],  x(3) = [3; -1],  x(4) = [-2; -1]
d(1) = -1,  d(2) = 1,  d(3) = -1,  d(4) = 1

Below, the error surface is described in the coordinates z, w1, w2, or more succinctly z, x, y.
The Perceptron Training
Algorithm
Now consider the surface
z f ( x, y )
If the level curves are interpreted as contour lines
of the landscape, i.e., of the surface, then along
these curves
z f ( x, y ) constant
The Perceptron Training
Algorithm
consequently, we obtain
dz = df(x, y) = 0

hence

df(x, y) = (∂f(x, y)/∂x) dx + (∂f(x, y)/∂y) dy = 0
where dx and dy are the increments given to
x and y on the level curve.
The Perceptron Training
Algorithm
Now defining

∇f(x, y) = [∂f(x, y)/∂x;  ∂f(x, y)/∂y]   and   dr = [dx; dy]
where
where ∇f and dr are known to be the gradient vector and the tangent vector, respectively,
The Perceptron Training Algorithm
we can write

df(x, y) = ∇f^t dr = 0
z = f(x, y) = (x - 50)^2 + (y - 50)^2 - 32^2

[Figure: surface plot and contour plot of z = f(x, y) = (x - 50)^2 + (y - 50)^2 - 32^2.]
The Perceptron Training Algorithm
The level curves are obtained from

z = f(x, y) = (x - 50)^2 + (y - 50)^2 - 32^2 = Ci

with, for example, C1 = 4096 and C2 = 9216.
The Perceptron Training Algorithm
∇f(x, y) = [∂f(x, y)/∂x;  ∂f(x, y)/∂y] = [2(x - 50);  2(y - 50)]   and   dr = [dx; dy]
The Perceptron Training Algorithm
The fact that the gradient vector is orthogonal to
the tangent vector proves that it is in the direction
of steepest ascent or steepest descent.
The directions found for the example show that
the gradient vector points in the direction of ascent
of the function f(x,y).
Combining the two facts we can conclude that it
points in the direction of steepest ascent.
The Perceptron Training Algorithm
x(1) = [1; -1],  x(2) = [-0.5; -1],  x(3) = [3; -1],  x(4) = [-2; -1]

In the weight space the following straight lines represent the decision lines:

w^t x(1) = w1 - w2 = 0      ⟹  w2 = w1
w^t x(2) = -0.5w1 - w2 = 0  ⟹  w2 = -0.5w1
w^t x(3) = 3w1 - w2 = 0     ⟹  w2 = 3w1
w^t x(4) = -2w1 - w2 = 0    ⟹  w2 = -2w1
The Perceptron Training Algorithm
[Figure: the four decision lines in weight space, together with the initial weight vector.]
The Perceptron Training
Algorithm
The corresponding gradient vectors
are computed as follows:
The Perceptron Training Algorithm
For x(1):  w^t x(1) = w1 - w2,    ∇(w^t x(1)) = [∂(w^t x(1))/∂w1;  ∂(w^t x(1))/∂w2] = [1; -1] = x(1)

For x(2):  w^t x(2) = -0.5w1 - w2,  ∇(w^t x(2)) = [-0.5; -1] = x(2)

For x(3):  w^t x(3) = 3w1 - w2,    ∇(w^t x(3)) = [3; -1] = x(3)

For x(4):  w^t x(4) = -2w1 - w2,   ∇(w^t x(4)) = [-2; -1] = x(4)
The Perceptron Training Algorithm
[Figure: decision lines and gradient vectors x(1), ..., x(4) in weight space; each line w^t x(i) = 0 separates the half-planes w^t x(i) > 0 and w^t x(i) < 0. The initial weight vector is w(1) = [2.5; 1.75].]
The Perceptron Training Algorithm
Now we can concentrate on the particular training
(or learning) algorithm (or rule).
r = di - sgn(w^t x)

Since sgn(w^t x) = 1 for di = 1 and sgn(w^t x) = -1 for di = -1 whenever the input is correctly classified, r = 0 in that case.
The Perceptron Training
Algorithm
In order for the correct classification of the entire training set x(1), x(2), x(3), and x(4), with respective class memberships d(1) = -1, d(2) = 1, d(3) = -1, and d(4) = 1, the following four inequalities must hold:

w^t(N) x(1) < 0
w^t(N) x(2) > 0
w^t(N) x(3) < 0
w^t(N) x(4) > 0

where w(N) is the final weight vector that provides correct classification for the entire training set.
The Perceptron Training
Algorithm
[Figure: the half-planes w^t x(i) > 0 and w^t x(i) < 0 in weight space; the solution region is the intersection of the four correctly-signed half-planes.]
The Perceptron Training
Algorithm
The training has so far been shown in the weight space. This is achieved using the decision lines defined by x(1), x(2), x(3) and x(4). However, the original decision lines determined by the perceptron at each step are defined in the pattern space, as this enables the classification to be easily seen. These decision lines are defined by w(1), w(2), w(3) and w(4). In the following we show the correction steps of the weight vector as well as the corresponding decision surfaces in the pattern space.
The Perceptron Training
Algorithm
In the pattern space

w^t(1) x = 0

determines the decision line defined by the initial weight vector

w(1) = [2.5; 1.75]

as

w^t(1) x = 2.5x1 + 1.75x2 = 0   ⟹   x2 = -1.429x1
The Perceptron Training
Algorithm
The corresponding gradient vector is computed as follows:

w^t(1) x = 2.5x1 + 1.75x2
∇x(w^t(1) x) = [∂(w^t(1) x)/∂x1;  ∂(w^t(1) x)/∂x2] = [2.5; 1.75] = w(1)

which is the initial weight vector. The gradient vector lies on the side of the line w^t(1) x = 2.5x1 + 1.75x2 = 0 where w^t(1) x > 0.
The Perceptron Training
Algorithm
[Figure: weight space and pattern space at the start of training. In weight space, line 1 (w^t x(1) = w1 - w2 = 0, i.e. w2 = w1) is the decision line for pattern x(1), and the initial weight vector is w(1) = [2.5; 1.75]. In pattern space, the initial decision line is w^t(1) x = 2.5x1 + 1.75x2 = 0, i.e. x2 = -1.429x1.]
The Perceptron Training
Algorithm
Step 1 (Update 1): Pattern x(1) is input.

Initial weight vector: w(1) = [2.5; 1.75];  first input vector: x(1) = [1; -1]
Initial decision line: w^t(1) x = 2.5x1 + 1.75x2 = 0  ⟹  x2 = -1.429x1

y(1) = sgn(w^t(1) x(1)) = sgn(2.5 - 1.75) = 1
d(1) - y(1) = -1 - 1 = -2

Updated weight vector: w(2) = w(1) - x(1) = [2.5; 1.75] - [1; -1] = [1.5; 2.75]
Updated decision line: w^t(2) x = 1.5x1 + 2.75x2 = 0  ⟹  x2 = -0.545x1
The Perceptron Training Algorithm
[Figure: after Update 1, w(2) = [1.5; 2.75] with the decision lines w^t x(1) = 0 (weight space) and w^t(2) x = 0 (pattern space).]

Step 2 (Update 2): Pattern x(2) = [-0.5; -1] is input.

y(2) = sgn(w^t(2) x(2)) = sgn(-0.75 - 2.75) = -1
d(2) - y(2) = 1 - (-1) = 2

Updated weight vector: w(3) = w(2) + x(2) = [1.5; 2.75] + [-0.5; -1] = [1; 1.75]
Updated decision line: w^t(3) x = x1 + 1.75x2 = 0  ⟹  x2 = -0.57x1
The Perceptron Training Algorithm
Step 2 (Update 2): Pattern x(2) is input.

[Figure: weight space (line w^t x(2) = -0.5w1 - w2 = 0, i.e. w2 = -0.5w1) and pattern space (decision line w^t(3) x = 0), with w(3) = [1; 1.75].]
The Perceptron Training Algorithm

Step 3 (Update 3): Pattern x(3) is input.

Weight vector to be updated: w(3) = [1; 1.75];  third input vector: x(3) = [3; -1]
Decision line to be updated: w^t(3) x = x1 + 1.75x2 = 0  ⟹  x2 = -0.57x1

y(3) = sgn(w^t(3) x(3)) = sgn(3 - 1.75) = 1
d(3) - y(3) = -1 - 1 = -2

Updated weight vector: w(4) = w(3) - x(3) = [-2; 2.75]

Step 4 (Update 4): Pattern x(4) = [-2; -1] is input.

y(4) = sgn(w^t(4) x(4)) = sgn(4 - 2.75) = 1
d(4) - y(4) = 1 - 1 = 0   ⟹   no update, same decision line

w(5) = w(4) = [-2; 2.75]
Decision line: w^t(4) x = -2x1 + 2.75x2 = 0  ⟹  x2 = 0.73x1
The Perceptron Training Algorithm
Step 4 (Update 4): Pattern x(4) is input.

[Figure: weight space (line w^t x(4) = -2w1 - w2 = 0, i.e. w2 = -2w1) and pattern space; w(5) = w(4), so line 4 remains the decision line.]
The Perceptron Training
Algorithm
Step 5 (Update 5): Pattern x(1) is input.

Weight vector: w(5) = w(4) = [-2; 2.75];  first input vector: x(1) = [1; -1]
Decision line: w^t(4) x = -2x1 + 2.75x2 = 0  ⟹  x2 = 0.73x1

y(5) = sgn(w^t(5) x(1)) = sgn(-2 - 2.75) = -1
d(1) - y(5) = -1 - (-1) = 0   ⟹   no update, same decision line

w(6) = w(5) = w(4)
The Perceptron Training Algorithm
Step 5 (Update 5): Pattern x(1) is input.

[Figure: weight space (line w^t x(1) = w1 - w2 = 0, i.e. w2 = w1) and pattern space; w(6) = w(5) = w(4), so line 4 remains the decision line.]
The Perceptron Training
Algorithm
Step 6 (Update 6): Pattern x(2) is input.

Weight vector to be updated: w(6) = w(5) = w(4) = [-2; 2.75];  second input vector: x(2) = [-0.5; -1]
Decision line to be updated: w^t(4) x = -2x1 + 2.75x2 = 0  ⟹  x2 = 0.73x1

y(6) = sgn(w^t(6) x(2)) = sgn(1 - 2.75) = -1
d(2) - y(6) = 1 - (-1) = 2

Updated weight vector: w(7) = w(6) + x(2) = [-2.5; 1.75]
Updated decision line: w^t(7) x = -2.5x1 + 1.75x2 = 0  ⟹  x2 = 1.43x1
The Perceptron Training Algorithm
Step 6 (Update 6): Pattern x(2) is input.

[Figure: weight space (line w^t x(2) = 0) and pattern space (decision line w^t(7) x = 0), with the updated weight vector w(7) = [-2.5; 1.75].]
The Perceptron Training Algorithm
Step 7 (Update 7): Pattern x(3) is input.

[Figure: weight space (line w^t x(3) = 3w1 - w2 = 0, i.e. w2 = 3w1) and pattern space.]
The Perceptron Training
Algorithm
d(3) - y(7) = -1 - (-1) = 0   ⟹   w(8) = w(7) = [-2.5; 1.75]
The Perceptron Training
Algorithm
Step 8: Pattern x(4) is input.
d(4) - y(8) = 1 - 1 = 0   ⟹   w(9) = w(8) = w(7)
The Perceptron Training
Algorithm
Step 9: Pattern x(1) is input.
d(1) - y(9) = -1 - (-1) = 0   ⟹   w(10) = w(9) = w(8) = w(7)
The Perceptron Training
Algorithm
Step 10: Pattern x(2) is input.
d(2) - y(10) = 1 - (-1) = 2   ⟹   w(11) = w(10) + x(2) = [-3; 0.75]
The Perceptron Training Algorithm
The initial weight vector w(1) and the weight
vectors w(2)- w(11) obtained during the training
algorithm are given below:
w(1) = [2.5; 1.75],  w(2) = [1.5; 2.75],  w(3) = [1; 1.75],  w(4) = [-2; 2.75],
w(5) = w(4),  w(6) = w(5) = w(4),  w(7) = [-2.5; 1.75],
w(8) = w(7),  w(9) = w(8) = w(7),  w(10) = w(9) = w(8) = w(7),  w(11) = [-3; 0.75]
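The whole weight trace can be reproduced with a few lines (a sketch with an assumed gain c = 1/2, cycling through the patterns in order):

```python
def sgn(v):
    return 1 if v >= 0 else -1

X = [(1, -1), (-0.5, -1), (3, -1), (-2, -1)]   # training patterns
D = [-1, 1, -1, 1]                             # desired outputs
w = [2.5, 1.75]                                # initial weight vector w(1)
trace = [list(w)]
for step in range(10):                         # steps 1..10, cycling x(1)..x(4)
    x, d = X[step % 4], D[step % 4]
    r = d - sgn(x[0] * w[0] + x[1] * w[1])     # r = d - sgn(w^t x)
    w = [wi + 0.5 * r * xi for wi, xi in zip(w, x)]
    trace.append(list(w))
```

trace[k] is w(k + 1), so trace[10] is the weight vector w(11) listed above.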
The Perceptron Training Algorithm
Example: The trained classifier is required to provide the classification such that the yellow vertices of the cube have class membership d = 1 and the blue vertices have class membership d = 2.

[Figure: the unit cube in (x1, x2, x3) space with its vertices, separated by the plane x3 = 0.25.]
The Perceptron Training Algorithm
SUMMARY OF CONTINUOUS PERCEPTRON TRAINING ALGORITHM
Given are the P training pairs
{x1, d1, x2, d2, ..., xP, dP}, where xi is (N + 1) × 1 and di is 1 × 1, i = 1, 2, ..., P. In the following, n denotes the training step and p denotes the step counter within the training cycle.

Step 1: c > 0 is chosen.
Step 2: Weights w are initialized at random small values; w is (N + 1) × 1. Counters and error are initialized:

k ← 1,  p ← 1,  E ← 0
The Perceptron Training Algorithm
Step 3: The training cycle begins here. Input is presented and output is computed:

x ← x_p,  d ← d_p,  y ← f(w^t x)

where n is a positive integer representing the training step number, i.e., the step number in the minimisation process, d(n) is the desired output signal, and

w(n + 1) = w(n) - η ∇E(w(n))
Training Rule for a Single-Layer
Continuous Perceptron:The Delta
Training Rule
E( w( n )) E( n )
is the error surface at the n’th training step .
Therefore the error to be minimised is:
1
E( n ) ( d ( n ) f ( wt ( n )x( n ))2
2
The independent variables for minimisation at
each training step are wi, the components of the
weight vector.
Training Rule for a Single-Layer
Continuous Perceptron: The Delta
Training Rule

    ∇E(w)|_{w=w(n)} = ∇ [ (1/2) (d(n) − f(w^t x(n)))² ]|_{w=w(n)}
Training Rule for a Single-Layer
Continuous Perceptron: The Delta
Training Rule

The gradient vector is defined as:

    ∇E(w) = [ ∂E/∂w1, ∂E/∂w2, ..., ∂E/∂w_{p+1} ]^t
Training Rule for a Single-Layer
Continuous Perceptron: The Delta
Training Rule

Using

    ∇E(w)|_{w=w(n)} = ∇ [ (1/2) (d(n) − f(w^t x(n)))² ]|_{w=w(n)}

and defining

    v(w) = w^t x

we obtain

    ∇E(w)|_{w=w(n)} = −(d(n) − f(v(w))) (df(v(w))/dv)
                      [ ∂v(w)/∂w1, ∂v(w)/∂w2, ..., ∂v(w)/∂w_{p+1} ]^t |_{w=w(n)}
Training Rule for a Single-Layer
Continuous Perceptron: The Delta
Training Rule

Since

    ∂v(w)/∂wi = xi

and

    f(v) = y

we can write
Training Rule for a Single-Layer
Continuous Perceptron: The Delta
Training Rule

    ∇E(w)|_{w=w(n)} = −(d(n) − y(n)) (df(v(w))/dv)|_{w=w(n)} x(n)

and

    ∂E(w)/∂wi |_{w=w(n)} = −(d(n) − y(n)) (df(v(w))/dv)|_{w=w(n)} x_i(n)
Training Rule for a Single-Layer
Continuous Perceptron: The Delta
Training Rule

If the bipolar continuous activation function is used,
then we have:

    f(v) = (1 − e^{−v}) / (1 + e^{−v})

and

    df(v)/dv = 2e^{−v} / (1 + e^{−v})²
Training Rule for a Single-Layer
Continuous Perceptron: The Delta
Training Rule

In fact

    df(v)/dv = 2e^{−v} / (1 + e^{−v})²
             = (1/2) [1 − ((1 − e^{−v}) / (1 + e^{−v}))²]
             = (1/2) (1 − f²(v))
Training Rule for a Single-Layer
Continuous Perceptron: The Delta
Training Rule

    ∇E(w)|_{w=w(n)} = −(1/2) (d(n) − y(n)) (1 − y²(n)) x(n)
Training Rule for a Single-Layer
Continuous Perceptron: The Delta
Training Rule

Hence the delta training rule for the bipolar continuous
perceptron is:

    w(n+1) = w(n) + (1/2) η (d(n) − y(n)) (1 − y²(n)) x(n)
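One step of this update rule can be checked numerically. The pattern x = [1 1]^t with d = 1 and the starting weights [−2.5, 1.75] are borrowed from the earlier example (reconstructed values, so an assumption), with η = 1 and gain a = 1:

```python
# One delta-rule step for a bipolar continuous perceptron; the pattern,
# target and starting weights are assumed from the earlier example.
import math

def f(v):                              # bipolar continuous activation, a = 1
    return (1 - math.exp(-v)) / (1 + math.exp(-v))

def error(w, x, d):                    # E = (1/2)(d - f(w^t x))^2
    return 0.5 * (d - f(w[0] * x[0] + w[1] * x[1])) ** 2

w = [-2.5, 1.75]
x, d, eta = [1.0, 1.0], 1.0, 1.0

e0 = error(w, x, d)
y = f(w[0] * x[0] + w[1] * x[1])
delta = 0.5 * eta * (d - y) * (1 - y ** 2)   # (1/2) eta (d - y)(1 - y^2)
w = [w[0] + delta * x[0], w[1] + delta * x[1]]
e1 = error(w, x, d)
```

A single gradient step reduces the error E on the presented pattern (e1 < e0), which is the defining property of the rule.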
Training Rule for a Single-Layer
Continuous Perceptron: The Delta
Training Rule

If the unipolar continuous activation function is used,
then we have:

    f(v) = 1 / (1 + e^{−v})

and

    df(v)/dv = e^{−v} / (1 + e^{−v})²
Training Rule for a Single-Layer
Continuous Perceptron: The Delta
Training Rule

we can write

    df(v)/dv = e^{−v} / (1 + e^{−v})²
             = [1 / (1 + e^{−v})] [e^{−v} / (1 + e^{−v})]
             = [1 / (1 + e^{−v})] [1 − 1 / (1 + e^{−v})]
             = f(v) (1 − f(v))
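Both derivative identities, df/dv = (1/2)(1 − f²) for the bipolar function and df/dv = f(1 − f) for the unipolar one, can be verified against central finite differences (this check is an addition, not part of the slides):

```python
# Finite-difference verification of the two activation-derivative identities.
import math

def bipolar(v):
    return (1 - math.exp(-v)) / (1 + math.exp(-v))

def unipolar(v):
    return 1 / (1 + math.exp(-v))

h = 1e-6
for v in [-2.0, -0.5, 0.0, 0.7, 1.9]:
    num_b = (bipolar(v + h) - bipolar(v - h)) / (2 * h)   # numeric df/dv
    num_u = (unipolar(v + h) - unipolar(v - h)) / (2 * h)
    assert abs(num_b - 0.5 * (1 - bipolar(v) ** 2)) < 1e-8
    assert abs(num_u - unipolar(v) * (1 - unipolar(v))) < 1e-8
```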
Training Rule for a Single-Layer
Continuous Perceptron: The Delta
Training Rule

Example: We will carry out the same
training algorithm as in the previous
example, but this time using a continuous
bipolar perceptron.
The error at step n is given by:

    E(n) = (1/2) (d(n) − f(v(n)))²
         = (1/2) [d(n) − (1 − e^{−v(n)}) / (1 + e^{−v(n)})]²
Training Rule for a Single-Layer
Continuous Perceptron: The Delta
Training Rule

For the first pattern x(1) = [1  1]^t, d(1) = 1, so
v = w1 + w2. The error at step 1 is given by:

    E(1) = (1/2) [d(1) − f(w1 + w2)]²
         = (1/2) [2e^{−(w1+w2)} / (1 + e^{−(w1+w2)})]²
         = 2e^{−2(w1+w2)} / (1 + e^{−(w1+w2)})²
Training Rule for a Single-Layer
Continuous Perceptron: The Delta
Training Rule

For the second pattern x(2) = [−0.5  1]^t, d(2) = −1, so
v = −0.5 w1 + w2. The error at step 2 is given by:

    E(2) = (1/2) [d(2) − f(−0.5 w1 + w2)]²
         = (1/2) [−2 / (1 + e^{0.5 w1 − w2})]²
         = 2 / (1 + e^{0.5 w1 − w2})²
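The two closed-form error surfaces above can be checked against the defining expression E = (1/2)(d − f(v))² at an arbitrary weight pair (the test point is an arbitrary assumption):

```python
# Check the closed forms of E(1) and E(2) against the definition of E.
import math

def f(v):                              # bipolar continuous activation, a = 1
    return (1 - math.exp(-v)) / (1 + math.exp(-v))

w1, w2 = 0.3, -1.2                     # arbitrary test point (assumed)

E1_def = 0.5 * (1 - f(w1 + w2)) ** 2
E1_closed = 2 * math.exp(-2 * (w1 + w2)) / (1 + math.exp(-(w1 + w2))) ** 2

E2_def = 0.5 * (-1 - f(-0.5 * w1 + w2)) ** 2
E2_closed = 2 / (1 + math.exp(0.5 * w1 - w2)) ** 2
```

Both pairs agree to machine precision, confirming the algebra above.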
Training Rule for a Single-Layer
Continuous Perceptron: The Delta
Training Rule

[Figure: the error surfaces E1(w1,w2) and E2(w1,w2) for the first two
patterns, and the error surfaces E3(w1,w2) for x^t(3) = [3  1] and
E4(w1,w2) for x^t(4) = [2  −1], with y = f(w^t x).]
Training Rule for a Single-Layer
Continuous Perceptron: The Delta
Training Rule

The total error is defined by:

    E(w1,w2) = E1(w1,w2) + E2(w1,w2) + E3(w1,w2) + E4(w1,w2)

In each training step the input is presented and the output computed as

    x ← x_p,  d ← d_p,  y = f(w^t x)

[Figure: a single layer of K perceptrons with inputs x1,...,xJ, weights
wkj, linear combiner outputs vk and outputs yk = f(vk).]

In vector form,

    v = Wx,  y = Γ(v)
Delta Training Rule for
Multi-Perceptron Layer

    y = [y1, y2, ..., yK]^t = [f(v1), f(v2), ..., f(vK)]^t

i.e. Γ is the diagonal nonlinear operator applying f(.) to each
component of v, so that

    y = Γ[Wx]
Delta Training Rule for
Multi-Perceptron Layer

The desired and actual output vectors at the n'th
training step are given as:

    d = [d1(n), d2(n), ..., dK(n)]^t,   y = [y1(n), y2(n), ..., yK(n)]^t

and the error at step n is

    E(n) = (1/2) ||d(n) − y(n)||²
Delta Training Rule for
Multi-Perceptron Layer

The weight adjustments are

    Δwkj(n) = −η ∂E/∂wkj |_{wkj = wkj(n)}   for k = 1,2,...,K;  j = 1,2,...,J
Delta Training Rule for
Multi-Perceptron Layer

where

    ∂E/∂wkj = (∂E/∂vk)(∂vk/∂wkj)

Using

    vk = wk1 x1 + wk2 x2 + ... + wkj xj + ... + wkJ xJ

we have

    ∂vk/∂wkj = xj
Delta Training Rule for
Multi-Perceptron Layer

The error signal term produced by the k'th
neuron is defined as:

    δyk = −∂E/∂vk

Using this yields

    ∂E/∂wkj = −δyk xj
Delta Training Rule for
Multi-Perceptron Layer

On the other hand we can write:

    ∂E/∂vk = (∂E/∂yk)(∂yk/∂vk)

Since

    E(n) = (1/2) Σ_{k=1}^{K} (dk(n) − yk(n))² = (1/2) ||d(n) − y(n)||²

we get

    ∂E/∂yk = −(dk − yk)
Delta Training Rule for
Multi-Perceptron Layer

On the other hand, using

    ∂yk/∂vk = ∂f(vk)/∂vk

yields

    δyk = −∂E/∂vk = −(∂E/∂yk)(∂yk/∂vk) = (dk − yk) ∂f(vk)/∂vk
Delta Training Rule for
Multi-Perceptron Layer

which is used to obtain

    Δwkj(n) = η (dk − yk) (∂f(vk)/∂vk) xj

For the bipolar continuous activation function
we already know that

    ∂f(vk)/∂vk = (1/2)(1 − f²(vk)) = (1/2)(1 − yk²)
Delta Training Rule for
Multi-Perceptron Layer

Hence

    Δwkj(n) = (1/2) η (dk(n) − yk(n)) (1 − yk²(n)) xj

and

    wkj(n+1) = wkj(n) + (1/2) η (dk(n) − yk(n)) (1 − yk²(n)) xj(n)
Delta Training Rule for
Multi-Perceptron Layer

where

    δyk(n) = (1/2) (dk(n) − yk(n)) (1 − yk²(n))

we can write

    W(n+1) = W(n) + η δy x^t
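The matrix-form rule W(n+1) = W(n) + η δy x^t can be sketched for K = 2 bipolar neurons and J = 3 inputs; the weights, input and targets here are illustrative assumptions:

```python
# Matrix-form delta rule for one layer of bipolar continuous perceptrons.
import numpy as np

f = lambda v: (1 - np.exp(-v)) / (1 + np.exp(-v))   # bipolar, a = 1

W = np.array([[0.2, -0.4, 0.1],
              [0.5, 0.3, -0.2]])       # K = 2 neurons, J = 3 inputs (assumed)
x = np.array([1.0, -0.5, 0.25])        # input pattern (assumed)
d = np.array([1.0, -1.0])              # desired outputs (assumed)
eta = 0.2

errors = []
for _ in range(20):
    y = f(W @ x)
    errors.append(0.5 * np.sum((d - y) ** 2))
    delta_y = 0.5 * (d - y) * (1 - y ** 2)   # error-signal vector delta_y
    W = W + eta * np.outer(delta_y, x)       # W(n+1) = W(n) + eta delta_y x^t
```

Repeating the step on the same pattern drives the error E = (1/2)||d − y||² down, as gradient descent should.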
Generalised Delta Training Rule for
Multi-Layer Perceptron

[Figure: a two-layer network. The inputs z1,...,zI feed the hidden layer
through the weights tji; the hidden-layer activations are uj = Σ_i tji zi
with outputs xj = f(uj), j = 1,...,J. The hidden outputs feed the output
layer through the weights wkj; the output-layer activations are
vk = Σ_j wkj xj with outputs yk = f(vk), k = 1,...,K.]
Generalised Delta Training Rule for
Multi-Layer Perceptron

The weight adjustment for the hidden layer
according to the gradient descent method will be:

    Δtji(n) = −η ∂E/∂tji |_{tji = tji(n)}   for j = 1,2,...,J;  i = 1,2,...,I

where

    ∂E/∂tji = (∂E/∂uj)(∂uj/∂tji)
Generalised Delta Training Rule for
Multi-Layer Perceptron

Here

    δxj = −∂E/∂uj   for j = 1,2,...,J

is the error signal term of the hidden layer
with output x. This term is produced by the j'th
neuron of the hidden layer, where j = 1,2,...,J.
On the other hand, using

    uj = tj1 z1 + tj2 z2 + ... + tjI zI
Generalised Delta Training Rule for
Multi-Layer Perceptron

we can calculate ∂uj/∂tji as

    ∂uj/∂tji = zi

Therefore

    ∂E/∂tji = (∂E/∂uj)(∂uj/∂tji) = −δxj zi

and

    Δtji = η δxj zi

Since xj = f(uj),

    δxj = −∂E/∂uj = −(∂E/∂xj)(∂xj/∂uj)
Generalised Delta Training Rule for
Multi-Layer Perceptron

    ∂E/∂xj = ∂/∂xj [ (1/2) Σ_{k=1}^{K} (dk − f(vk))² ]

and

    ∂xj/∂uj = ∂f(uj)/∂uj
Generalised Delta Training Rule for
Multi-Layer Perceptron

    ∂E/∂xj = −Σ_{k=1}^{K} (dk − f(vk)) ∂f(vk)/∂xj
           = −Σ_{k=1}^{K} (dk − yk) (∂f(vk)/∂vk)(∂vk/∂xj)

Now using

    ∂vk/∂xj = wkj

and

    δyk = −∂E/∂vk = (dk − yk) ∂f(vk)/∂vk
Generalised Delta Training Rule for
Multi-Layer Perceptron

in

    ∂E/∂xj = −Σ_{k=1}^{K} (dk − yk) (∂f(vk)/∂vk)(∂vk/∂xj)

we obtain

    ∂E/∂xj = −Σ_{k=1}^{K} δyk wkj

Now using this together with

    δxj = −(∂E/∂xj)(∂xj/∂uj)
Generalised Delta Training Rule for
Multi-Layer Perceptron

we obtain

    δxj = (∂f(uj)/∂uj) Σ_{k=1}^{K} δyk wkj

Now using

    Δtji = η δxj zi

we get

    Δtji = η zi (∂f(uj)/∂uj) Σ_{k=1}^{K} δyk wkj
Generalised Delta Training Rule for
Multi-Layer Perceptron

    tji(n+1) = tji(n) + η [ Σ_{k=1}^{K} δyk wkj ] (∂f(uj)/∂uj) zi

    for j = 1,2,...,J;  i = 1,2,...,I
Generalised Delta Training Rule for
Multi-Layer Perceptron

Now denote the j'th column of the matrix

        | w11  w12  ..  w1J |
        | w21  w22  ..  w2J |
    W = |  .    .   ..   .  |
        |  .    .   ..   .  |
        | wK1  wK2  ..  wKJ |

as wj.
Generalised Delta Training Rule for
Multi-Layer Perceptron

Using

    δy = [δy1, ..., δyK]^t

we can write

    Σ_{k=1}^{K} δyk wkj = wj^t δy
Generalised Delta Training Rule for
Multi-Layer Perceptron

In the case of the bipolar activation function we
obtain for the hidden layer

    f'_{xj} ≡ ∂f(uj)/∂uj = (1/2)(1 − xj²)

Now construct a vector whose entries are the
above terms for j = 1,2,...,J, i.e.,
Generalised Delta Training Rule for
Multi-Layer Perceptron

    f'_x = [f'_{x1}, f'_{x2}, ..., f'_{xJ}]^t
         = [(1/2)(1 − x1²), (1/2)(1 − x2²), ..., (1/2)(1 − xJ²)]^t
Generalised Delta Training Rule for
Multi-Layer Perceptron

and define

    z = [z1, z2, ..., zI]^t

We then have

    Δtji = η [Σ_{k=1}^{K} δyk wkj] f'_{xj} zi = η (wj^t δy) f'_{xj} zi
Generalised Delta Training Rule for
Multi-Layer Perceptron

Now defining

    δx = [δx1, ..., δxJ]^t,   δxj = (wj^t δy) f'_{xj}

and

        | t11  t12  ..  t1I |
        | t21  t22  ..  t2I |
    T = |  .    .   ..   .  |
        |  .    .   ..   .  |
        | tJ1  tJ2  ..  tJI |
Generalised Delta Training Rule for
Multi-Layer Perceptron

we finally obtain

    T(n+1) = T(n) + η δx z^t
    W(n+1) = W(n) + η δy x^t
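The error signals derived above should reproduce the exact gradient of E = (1/2)||d − y||², i.e. ∂E/∂T = −δx z^t. A finite-difference check of this (layer sizes, weights and targets are illustrative assumptions):

```python
# Gradient check for the generalised delta rule with bipolar activations.
import numpy as np

rng = np.random.default_rng(0)
I_, J, K = 3, 4, 2                       # assumed layer sizes
T = rng.normal(scale=0.5, size=(J, I_))  # hidden-layer weights t_ji
W = rng.normal(scale=0.5, size=(K, J))   # output-layer weights w_kj
z = rng.normal(size=I_)                  # input vector
d = np.array([1.0, -1.0])                # desired outputs

f = lambda v: (1 - np.exp(-v)) / (1 + np.exp(-v))  # bipolar, a = 1

def E(T, W):
    y = f(W @ f(T @ z))
    return 0.5 * np.sum((d - y) ** 2)

x = f(T @ z)                             # hidden outputs
y = f(W @ x)                             # network outputs
delta_y = 0.5 * (d - y) * (1 - y ** 2)          # output error signals
delta_x = 0.5 * (1 - x ** 2) * (W.T @ delta_y)  # hidden error signals

# central finite differences of E with respect to each t_ji
eps = 1e-6
gT = np.zeros_like(T)
for j in range(J):
    for i in range(I_):
        Tp, Tm = T.copy(), T.copy()
        Tp[j, i] += eps
        Tm[j, i] -= eps
        gT[j, i] = (E(Tp, W) - E(Tm, W)) / (2 * eps)
```

The numeric gradient matches −δx z^t componentwise, so the update T(n+1) = T(n) + η δx z^t is indeed a gradient-descent step.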
Generalised Delta Training Rule for
Multi-Layer Perceptron

Here the main difference is in computing the error
signals δy and δx. In fact, the entries of δy are given
as

    δyk = (dk − yk) ∂f(vk)/∂vk

which only contain terms belonging to the output
layer. However, this is not the case with δx, whose
entries

    δxj = (wj^t δy) f'_{xj}

also depend on the output-layer error signals through δy.
The Hopfield Network

The Bipolar Activation Function and its Inverse

    x = f^{-1}(y) = (1/a) ln[(1 + y)/(1 − y)]

[Figure: the bipolar activation function and its inverse.]

The Derivative of the Inverse of the Bipolar Function

    dx/dy = (2/a) · 1/(1 − y²)
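This derivative can be confirmed with a central finite difference; the gain a = 1.5 is an arbitrary assumption for the check:

```python
# Finite-difference check of dx/dy = (2/a)/(1 - y^2) for x = f^{-1}(y).
import math

a = 1.5                                      # arbitrary gain (assumed)
finv = lambda y: (1 / a) * math.log((1 + y) / (1 - y))

h = 1e-6
for y in [-0.8, -0.3, 0.0, 0.4, 0.9]:
    num = (finv(y + h) - finv(y - h)) / (2 * h)
    assert abs(num - (2 / a) / (1 - y ** 2)) < 1e-6
```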
The Hopfield Network

We can conclude that

    df^{-1}(yi)/dyi > 0   for −1 < yi < 1

Therefore

    dE/dt = −Σ_{i=1}^{N} Ci (df^{-1}(yi)/dyi) (dyi/dt)² ≤ 0
The Hopfield Network

Considering

    dxi/dt = df^{-1}(yi)/dt = (df^{-1}(yi)/dyi)(dyi/dt)

we obtain

    dE/dt = −Σ_{i=1}^{N} Ci (dxi/dt)(dyi/dt)
          = −Σ_{i=1}^{N} Ci (df^{-1}(yi)/dyi) (dyi/dt)²
The Hopfield Network

Now defining

    x = [x1, ..., xN]^t,   y = [y1, ..., yN]^t,   C = diag(Ci)

yields

    dE/dt = −(dy/dt)^t C (dx/dt) = −(dx/dt)^t C (dy/dt)
The Hopfield Network

Since

    dE/dt = ∇E(y)^t (dy/dt)

we can write

    ∇E(y) = −C (dx/dt)

This reveals that the capacitor current vector
is parallel to the negative gradient vector.
The Hopfield Network

[Figure: the i'th neuron of the Hopfield network. The outputs y1,...,yN
feed the input node xi through the conductances wi1,...,wiN, together with
the external current Ii, the leakage conductance gi and the capacitance
Ci; the output is yi = f(xi), i.e. xi = f^{-1}(yi).]

The node equation is

    Ci dxi/dt = Σ_{j=1}^{N} wij (yj − xi) − gi xi + Ii

which can be rewritten as

    Ci dxi/dt = Σ_{j=1}^{N} wij yj − (Σ_{j=1}^{N} wij + gi) xi + Ii
The Hopfield Network

Now define

    Gi = Σ_{j=1}^{N} wij + gi

Consequently

    C dx(t)/dt = Wy − Gx + I

and since

    ∇E(y) = −C (dx/dt)
The Hopfield Network

we obtain

    ∇E(y) = −Wy + Gx − I

In the case of the bipolar activation function we know that

    x = f^{-1}(y) = (1/a) ln[(1 + y)/(1 − y)]
The Hopfield Network

Therefore the state vector is given as:

    x = (1/a) [ ln((1 + y1)/(1 − y1)), ln((1 + y2)/(1 − y2)), ..., ln((1 + yN)/(1 − yN)) ]^t
The Hopfield Network

We already know

    dE/dt = −Σ_{i=1}^{N} Ci (dxi/dt)(dyi/dt)

therefore

    dE/dt = −Σ_{i=1}^{N} [ Σ_{j=1}^{N} wij yj − Gi xi + Ii ] (dyi/dt)

and

    dE/dt = −Σ_{i=1}^{N} Σ_{j=1}^{N} wij yj (dyi/dt)
            + Σ_{i=1}^{N} Gi xi (dyi/dt) − Σ_{i=1}^{N} Ii (dyi/dt)
The Hopfield Network

Now consider:

    d/dt (y^t W y) = (dy/dt)^t W y + y^t W (dy/dt)

If

    W = W^t

then

    (dy/dt)^t W y = [(dy/dt)^t W y]^t = y^t W^t (dy/dt) = y^t W (dy/dt)

Therefore

    d/dt (y^t W y) = 2 y^t W (dy/dt)

and

    y^t W (dy/dt) = (1/2) d/dt (y^t W y)
The Hopfield Network

Now consider the first term of the expression for dE/dt.
We can write:

    Σ_{i=1}^{N} Σ_{j=1}^{N} wij yj (dyi/dt) = y^t W (dy/dt)

Now using the above equality, we have

    Σ_{i=1}^{N} Σ_{j=1}^{N} wij yj (dyi/dt) = (1/2) d/dt (y^t W y)
The Hopfield Network

Now consider the second term in the same equation:

    xi (dyi/dt) = f^{-1}(yi) (dyi/dt)

Since

    f^{-1}(yi) = d/dyi ∫₀^{yi} f^{-1}(y) dy

we can write

    f^{-1}(yi) (dyi/dt) = [ d/dyi ∫₀^{yi} f^{-1}(y) dy ] (dyi/dt)
                        = d/dt ∫₀^{yi} f^{-1}(y) dy
The Hopfield Network

    dE/dt = −d/dt [ (1/2) y^t W y − Σ_{i=1}^{N} Gi ∫₀^{yi} f^{-1}(y) dy + Σ_{i=1}^{N} Ii yi ]

so the energy function is

    E(y) = −(1/2) y^t W y + Σ_{i=1}^{N} Gi ∫₀^{yi} f^{-1}(y) dy − Σ_{i=1}^{N} Ii yi
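The gradient of this energy function should equal −Wy + Gx − I with x = f^{-1}(y). The check below uses the closed form ∫₀^{y} (1/a) ln((1+s)/(1−s)) ds = (1/a)[(1+y)ln(1+y) + (1−y)ln(1−y)]; the sizes and parameter values are illustrative assumptions:

```python
# Finite-difference check that grad E(y) = -Wy + Gx - I.
import numpy as np

a = 1.4
W = np.array([[0.0, 0.8, -0.3],
              [0.8, 0.0, 0.5],
              [-0.3, 0.5, 0.0]])       # symmetric weight matrix (assumed)
G = np.array([1.2, 0.9, 1.5])          # assumed values of G_i
I_ext = np.array([0.3, -0.2, 0.1])     # assumed external currents

def finv(y):
    return (1 / a) * np.log((1 + y) / (1 - y))

def energy(y):
    quad = -0.5 * y @ W @ y
    integ = np.sum(G * (1 / a) * ((1 + y) * np.log(1 + y)
                                  + (1 - y) * np.log(1 - y)))
    return quad + integ - I_ext @ y

y = np.array([0.2, -0.5, 0.6])
grad = -W @ y + G * finv(y) - I_ext     # analytic gradient

eps = 1e-6
num = np.array([(energy(y + eps * e) - energy(y - eps * e)) / (2 * eps)
                for e in np.eye(3)])
```

The numeric and analytic gradients agree, consistent with ∇E(y) = −Wy + Gx − I.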
The Hopfield Network

In order to obtain the state equations in terms
of the outputs yi, consider once again

    Ci dxi/dt = Σ_{j=1}^{N} wij yj − Gi xi + Ii

Using

    dxi/dyi = (2/a) · 1/(1 − yi²)

we obtain

    Ci [2 / (a(1 − yi²))] dyi/dt = Σ_{j=1}^{N} wij yj − Gi xi + Ii

and

    dyi/dt = [a(1 − yi²) / (2Ci)] [ Σ_{j=1}^{N} wij yj − Gi xi + Ii ]
The Hopfield Network

In vector form,

    dy/dt = diag[ a(1 − yi²)/(2Ci) ] [ Wy − GΓ^{-1}(y) + I ]
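A short Euler simulation illustrates that, for symmetric W, the output equation drives the energy downward along trajectories. Parameter values are illustrative assumptions, and the integral of f^{-1} is evaluated with the closed form (1/a)[(1+y)ln(1+y) + (1−y)ln(1−y)]:

```python
# Euler simulation of dy/dt = diag[a(1-y^2)/(2C)] (Wy - G f^{-1}(y) + I);
# the energy should be non-increasing along the trajectory.
import numpy as np

a, C = 1.4, np.array([1.0, 1.0])
W = np.array([[0.0, 1.0],
              [1.0, 0.0]])             # symmetric (assumed)
G = np.array([1.5, 2.0])               # assumed
I_ext = np.array([0.5, -0.3])          # assumed

def finv(y):
    return (1 / a) * np.log((1 + y) / (1 - y))

def energy(y):
    integ = np.sum(G * (1 / a) * ((1 + y) * np.log(1 + y)
                                  + (1 - y) * np.log(1 - y)))
    return -0.5 * y @ W @ y + integ - I_ext @ y

y = np.array([0.1, -0.2])
E_hist = [energy(y)]
dt = 0.005
for _ in range(2000):
    rhs = W @ y - G * finv(y) + I_ext
    y = y + dt * (a * (1 - y ** 2) / (2 * C)) * rhs
    E_hist.append(energy(y))

diffs = np.diff(E_hist)
```

With a small enough step the discrete trajectory inherits the continuous-time property dE/dt ≤ 0.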
The Hopfield Network

[Figure: a two-neuron Hopfield circuit with coupling conductances g11,
g12, g21, g22, leakage conductances g1, g2, capacitances C1, C2, node
voltages x1, x2 and outputs y1, y2.]

The node equations are

    C1 dx1/dt = g11 (y1 − x1) + g12 (y2 − x1) − g1 x1
    C2 dx2/dt = g22 (y2 − x2) + g21 (y1 − x2) − g2 x2

In matrix form, with the external currents taken to be zero,

    C dx/dt = Wy − Gx

where

    C = diag(C1, C2),  W = | g11  g12 |,  G = diag(g1 + g11 + g12, g2 + g21 + g22)
                           | g21  g22 |
which yields the energy function

    E(y1, y2) = −(1/2) y^t W y
                + (1/a) [ G1 ∫₀^{y1} ln((1+y)/(1−y)) dy
                        + G2 ∫₀^{y2} ln((1+y)/(1−y)) dy ]
The Hopfield Network

and the gradient components are

    ∂E/∂y1 = −g11 y1 − g12 y2 + (1/a) G1 ln((1 + y1)/(1 − y1))
    ∂E/∂y2 = −g21 y1 − g22 y2 + (1/a) G2 ln((1 + y2)/(1 − y2))
The Hopfield Network

Expanding the quadratic term,

    E = −(1/2) [g11 y1² + g22 y2² + (g12 + g21) y1 y2]
        + G1 ∫₀^{y1} f^{-1}(y) dy + G2 ∫₀^{y2} f^{-1}(y) dy

      = −(1/2) [g11 y1² + g22 y2² + (g12 + g21) y1 y2]
        + (G1/a) ∫₀^{y1} ln((1+y)/(1−y)) dy + (G2/a) ∫₀^{y2} ln((1+y)/(1−y)) dy

Now consider

    I = ∫₀^{yi} ln((1+y)/(1−y)) dy = ∫₀^{yi} ln(1+y) dy − ∫₀^{yi} ln(1−y) dy
The Hopfield Network

For the first integral

    I1 = ∫₀^{yi} ln(1+y) dy

integrate by parts: let u = ln(1+y) and dv = dy; then du = dy/(1+y) and
v = 1 + y, so

    I1 = [(1+y) ln(1+y)]₀^{yi} − ∫₀^{yi} dy = (1 + yi) ln(1 + yi) − yi
The Hopfield Network

    I1 = (1 + yi) ln(1 + yi) − yi
    I2 = ∫₀^{yi} ln(1−y) dy = −(1 − yi) ln(1 − yi) − yi

so that

    I = I1 − I2 = (1 + yi) ln(1 + yi) + (1 − yi) ln(1 − yi)

and finally

    E = −(1/2) {g11 y1² + g22 y2² + (g12 + g21) y1 y2}
        + (G1/a) [(1 + y1) ln(1 + y1) + (1 − y1) ln(1 − y1)]
        + (G2/a) [(1 + y2) ln(1 + y2) + (1 − y2) ln(1 − y2)]
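The integration-by-parts result can be verified by differentiating the antiderivative numerically: d/dy [(1+y)ln(1+y) + (1−y)ln(1−y)] should equal ln((1+y)/(1−y)). This check is an addition, not part of the slides:

```python
# Verify the antiderivative of ln((1+y)/(1-y)) by finite differences.
import math

F = lambda y: (1 + y) * math.log(1 + y) + (1 - y) * math.log(1 - y)

h = 1e-6
for y in [-0.7, -0.2, 0.3, 0.8]:
    num = (F(y + h) - F(y - h)) / (2 * h)      # numeric F'(y)
    assert abs(num - math.log((1 + y) / (1 - y))) < 1e-8
```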
The Hopfield Network
Using

    dy/dt = diag[ a(1 − yi²)/(2Ci) ] [ Wy − GΓ^{-1}(y) + I ]

the state equations are obtained as

    dy1/dt = (1/(2C1)) (1 − y1²) [ a g11 y1 + a g12 y2 − G1 ln((1 + y1)/(1 − y1)) ]
    dy2/dt = (1/(2C2)) (1 − y2²) [ a g21 y1 + a g22 y2 − G2 ln((1 + y2)/(1 − y2)) ]
Discrete-Time Hopfield Networks

Consider the state equation of the gradient-type
Hopfield network:

    C dx(t)/dt = Wy − Gx + I

We can write

    C dx(t)/dt = Wy − GΓ^{-1}(y) + I

Discrete-Time Hopfield Networks

As the plot of the inverse
bipolar activation function
shows, the second term in
the above equation is zero
for high-gain neurons.
Hence:

    C dx(t)/dt = Wy + I
Discrete-Time Hopfield Networks

At an equilibrium point dx/dt = 0, so

    0 = Wy + I

Now let us solve this equation using Jacobi's
algorithm. To this end define:
Discrete-Time Hopfield Networks
W' W D = L U
where
L,U and D = diag( wii )
-D I I
-1
Discrete-Time Hopfield Networks
y = Wy I
Now replace the vector y on the right-hand side
by an initial vector y(0). If the vector y on the left-
hand side is obtained as y(0), then y(0) is the
solution of the system. If not, then call the vector y
obtained on the left-hand side y(1), i.e.,

Discrete-Time Hopfield Networks

    y(1) = W̄ y(0) + Ī

and in general we can write

    y(k+1) = W̄ y(k) + Ī
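The iteration above can be sketched directly; the diagonally dominant test matrix and current vector are illustrative assumptions:

```python
# Jacobi iteration for 0 = Wy + I, rewritten as y = -D^{-1}(L+U)y - D^{-1}I.
import numpy as np

W = np.array([[4.0, 1.0, -1.0],
              [0.5, 3.0, 1.0],
              [1.0, -1.0, 5.0]])       # strictly diagonally dominant (assumed)
I_ext = np.array([1.0, -2.0, 0.5])     # assumed

D = np.diag(np.diag(W))
Wbar = -np.linalg.solve(D, W - D)      # W-bar = -D^{-1}(L + U)
Ibar = -np.linalg.solve(D, I_ext)      # I-bar = -D^{-1} I

y = np.zeros(3)                        # initial vector y(0)
for _ in range(200):
    y = Wbar @ y + Ibar                # y(k+1) = W-bar y(k) + I-bar
```

At convergence the residual Wy + I vanishes, i.e. the fixed point solves the equilibrium equation.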
Discrete-Time Hopfield Networks
The method will always converge if the matrix
W is strictly or irreducibly diagonally
dominant. Strict row diagonal dominance
means that for each row, the absolute value of
the diagonal term is greater than the sum of
absolute values of other terms:
wii wij
i j
Discrete-Time Hopfield Networks
The Jacobi method sometimes converges even
if this condition is not satisfied. It is necessary,
however, that the diagonal terms in the matrix
are greater (in magnitude) than the other
terms.
Discrete-Time Hopfield Networks

Solution by the Gauss-Seidel Method

In Jacobi's method the updating of the
unknowns is made after all N unknowns have
been moved to the left side of the equation. We
will see in the following that this is not
necessary, i.e., the updating can be made
individually for each unknown, and the updated
value can be used in the next equation. This is
shown in the following equations:

Discrete-Time Hopfield Networks

    x1(n+1) = (1/a11) [ −a12 x2(n) − a13 x3(n) − ... − a1N xN(n) + b1 ]

    x2(n+1) = (1/a22) [ −a21 x1(n+1) − a23 x3(n) − ... − a2N xN(n) + b2 ]

    x3(n+1) = (1/a33) [ −a31 x1(n+1) − a32 x2(n+1) − a34 x4(n) − ... − a3N xN(n) + b3 ]

and

    xN(n+1) = (1/aNN) [ −aN1 x1(n+1) − aN2 x2(n+1) − ... − a_{N,N−1} xN−1(n+1) + bN ]

In general,

    xi(n+1) = (1/aii) [ bi − Σ_{j<i} aij xj(n+1) − Σ_{j>i} aij xj(n) ]
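The general update formula translates into a short solver; the system below is an illustrative, diagonally dominant assumption:

```python
# Gauss-Seidel iteration for Ax = b: each component is updated in place
# and the new value is used immediately in the following equations.
import numpy as np

A = np.array([[4.0, -1.0, 0.0],
              [-1.0, 4.0, -1.0],
              [0.0, -1.0, 4.0]])       # diagonally dominant (assumed)
b = np.array([2.0, 6.0, 2.0])          # assumed

x = np.zeros(3)
for _ in range(50):                    # sweeps over all unknowns
    for i in range(3):
        # x_i(n+1) = (1/a_ii)[b_i - sum_{j<i} a_ij x_j(n+1) - sum_{j>i} a_ij x_j(n)]
        s = b[i] - A[i, :i] @ x[:i] - A[i, i + 1:] @ x[i + 1:]
        x[i] = s / A[i, i]
```

Because updated components are reused immediately, Gauss-Seidel typically converges in fewer sweeps than Jacobi on the same system.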
Discrete-Time Hopfield Networks

The Gauss-Seidel method is defined for matrices with non-zero
diagonals, but convergence is only guaranteed if the matrix
is either:
1. diagonally dominant, or
2. symmetric and positive definite.