Mathematical Preliminaries: Vector Notation
Review of the McCulloch-Pitts/Perceptron Model
[Figure: a single unit with inputs x1 … xn, weights w1 … wn, a summation node Σ producing activation a, and output y]
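The unit in the figure can be sketched in a few lines of Python. The hard threshold at zero and the example numbers are assumptions for illustration, not taken from the slides:

```python
# Sketch of a McCulloch-Pitts/perceptron unit (threshold at 0 is an assumption):
# the unit forms the activation a = sum_i w_i x_i and outputs y = +1 or -1.

def perceptron(x, w):
    a = sum(wi * xi for wi, xi in zip(w, x))  # weighted sum a = sum_i w_i x_i
    return 1 if a >= 0 else -1                # hard-threshold activation

print(perceptron(x=[1.0, -2.0, 0.5], w=[0.5, 0.3, 1.0]))  # a = 0.4, so prints 1
```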
Review of Geometrical Interpretation
[Figure: linear decision boundary wx = 0 in the (x1, x2) plane, separating the region y = 1 from the region y = -1]
Review of Supervised Learning
[Figure: supervised learning setup: a Generator produces inputs x, a Supervisor provides targets ytarget, and the Learning Machine produces outputs y]
Learning by Error Minimization
E(\mathbf{w}) = \frac{1}{2} \sum_p \sum_j \big( y^p_{tar\,j} - y^p_j \big)^2
Intuition:
• Square makes error positive and penalises large errors more
• ½ just makes the maths easier
• Need to change the weights to minimize the error – How?
• Use principle of Gradient Descent
Principle of Gradient Descent
Gradient descent is an optimization algorithm that approaches a local
minimum of a function by taking steps proportional to the negative of
the gradient of the function at the current point.
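The principle can be sketched with a one-dimensional example. The function f(w) = (w - 3)² and the step size are assumptions chosen for illustration, not from the slides:

```python
# Minimal sketch of gradient descent on f(w) = (w - 3)^2, whose gradient is
# f'(w) = 2(w - 3); each step moves against the gradient by a factor eta.

def gradient_descent(w, eta=0.1, steps=100):
    """Take `steps` gradient steps of size eta from starting point w."""
    for _ in range(steps):
        grad = 2.0 * (w - 3.0)   # derivative of (w - 3)^2 at the current point
        w = w - eta * grad       # step proportional to the negative gradient
    return w

print(round(gradient_descent(w=0.0), 4))  # approaches the minimum at w = 3
```

Each step shrinks the distance to the minimum by the factor (1 - 2η), so with η = 0.1 the iterate converges geometrically toward w = 3.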
Error Gradient
So, calculate the derivative (gradient) of the Cost Function with respect
to the weights, and then change each weight by a small increment in
the negative (opposite) direction to the gradient
To do this we need a differentiable activation function, such as the
linear function: f(a) = a
E(w_{ji}) = \frac{1}{2} ( y_{tar\,j} - y_j )^2, \qquad y_j = f(a_j), \qquad a_j = \sum_i w_{ji} x_i

\frac{\partial E}{\partial w_{ji}} = \frac{\partial E}{\partial y_j} \, \frac{\partial y_j}{\partial w_{ji}} = -( y_{tar\,j} - y_j ) \, x_i
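The analytic gradient above can be checked numerically with a finite difference. The toy weights, inputs, and target below are assumptions for illustration:

```python
# Sketch: verify dE/dw_ji = -(y_tar - y_j) * x_i for a single linear unit
# y = f(a) = a = sum_i w_i x_i, by comparing against a finite difference.

def output(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))  # y = a = sum_i w_i x_i (linear f)

def error(w, x, y_tar):
    return 0.5 * (y_tar - output(w, x)) ** 2     # E = 1/2 (y_tar - y)^2

w, x, y_tar = [0.2, -0.5], [1.0, 2.0], 1.0       # assumed toy values
i, eps = 0, 1e-6

analytic = -(y_tar - output(w, x)) * x[i]        # -(y_tar - y) * x_i
w_plus = w[:]; w_plus[i] += eps                  # perturb weight i slightly
numeric = (error(w_plus, x, y_tar) - error(w, x, y_tar)) / eps

print(abs(analytic - numeric) < 1e-4)            # the two gradients agree
```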
Widrow-Hoff Learning Rule
(Delta Rule)
w_{new} = w_{old} - \eta \frac{\partial E}{\partial w} \quad \text{i.e.} \quad w_{new} = w_{old} + \eta \, ( y_{tar} - y ) \, x \quad \text{or} \quad w_{new} = w_{old} + \eta \, \delta \, x
The weight changes Δwji need to be applied repeatedly for each weight wji in
the network and for each training pattern in the training set.
One pass through all the weights for the whole training set is called an epoch
of training.
After many epochs, the network outputs match the targets for all the training
patterns, all the Δwji are zero, and the training process ceases. We then say
that the training process has converged to a solution.
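The epoch-based training loop just described can be sketched in Python. The toy training set (with the bias folded in as a constant input x0 = 1), the learning rate, and the epoch count are assumptions, not from the slides:

```python
# Sketch of delta-rule training with a linear activation: repeat the update
# w_i += eta * (y_tar - y) * x_i for every pattern, over many epochs.

training_set = [  # (inputs with bias x0 = 1, target); targets realizable as y = x1 + x2
    ([1.0, 0.0, 0.0], 0.0),
    ([1.0, 0.0, 1.0], 1.0),
    ([1.0, 1.0, 0.0], 1.0),
    ([1.0, 1.0, 1.0], 2.0),
]

def train(eta=0.1, epochs=200):
    w = [0.0, 0.0, 0.0]
    for _ in range(epochs):                            # one epoch = one pass over all patterns
        for x, y_tar in training_set:
            y = sum(wi * xi for wi, xi in zip(w, x))   # linear output y = sum_i w_i x_i
            for i in range(len(w)):
                w[i] += eta * (y_tar - y) * x[i]       # delta rule update
    return w

w = train()
print([round(wi, 3) for wi in w])  # weights approach w = [0, 1, 1]
```

Because the targets here are exactly realizable by a linear unit, the errors, and hence the Δwji, shrink toward zero as the epochs go by, matching the convergence behaviour described above.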
It has been shown that if a set of weights that solves the problem correctly
exists for a Perceptron, then the Perceptron Learning Rule/Delta Rule
(PLR/DR) will find them in a finite number of iterations.