Lecture 6: Perceptron Learning

Refresher: Perceptron Training Algorithm

Algorithm Perceptron;
  Start with a randomly chosen weight vector w_0;
  Let k = 1;
  while there exist input vectors that are misclassified by w_{k-1}, do
    Let i_j be a misclassified input vector;
    Let x_k = class(i_j)·i_j, implying that w_{k-1}·x_k < 0;
    Update the weight vector to w_k = w_{k-1} + x_k;
    Increment k;
  end-while;
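
As a concrete illustration, here is a minimal NumPy sketch of this algorithm; the function name perceptron_train, the learning-rate parameter eta, and the max_iter safeguard are additions for this sketch, not part of the pseudocode above, and points lying exactly on the dividing line are treated as misclassified.

import numpy as np

def perceptron_train(inputs, classes, w0, eta=1.0, max_iter=1000):
    # inputs:  array of shape (n, d), each row an input vector i_j
    #          (with the constant offset component already included)
    # classes: array of shape (n,), each entry +1 or -1
    # w0:      initial weight vector of shape (d,)
    w = np.asarray(w0, dtype=float)
    for k in range(1, max_iter + 1):
        # collect the input vectors that are misclassified by the current weights
        wrong = [j for j in range(len(inputs))
                 if classes[j] * np.dot(w, inputs[j]) <= 0]
        if not wrong:
            return w                          # every input is classified correctly
        j = wrong[0]                          # pick a misclassified input vector i_j
        x = classes[j] * np.asarray(inputs[j], dtype=float)   # x_k = class(i_j) * i_j
        w = w + eta * x                       # w_k = w_{k-1} + eta * x_k
    return w                                  # give up after max_iter updates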
Another Refresher: Linear Algebra
How can we visualize a straight line defined by an equation such as w0 + w1·i1 + w2·i2 = 0?
One possibility is to determine the points where the line crosses the coordinate axes:
i1 = 0 ⇒ w0 + w2·i2 = 0 ⇒ w2·i2 = -w0 ⇒ i2 = -w0/w2
i2 = 0 ⇒ w0 + w1·i1 = 0 ⇒ w1·i1 = -w0 ⇒ i1 = -w0/w1
Thus, the line crosses the axes at (0, -w0/w2)T and (-w0/w1, 0)T.
If w1 or w2 is 0, it just means that the line is horizontal or vertical, respectively.
If w0 is 0, the line passes through the origin, and its slope i2/i1 is:
w1·i1 + w2·i2 = 0 ⇒ w2·i2 = -w1·i1 ⇒ i2/i1 = -w1/w2
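
As a quick check of these formulas, here is a small helper function (purely illustrative, not part of the lecture) that returns the axis crossings of the line w0 + w1·i1 + w2·i2 = 0:

def line_crossings(w0, w1, w2):
    # crossing with the i2 axis (set i1 = 0) and with the i1 axis (set i2 = 0);
    # None means the line is parallel to that axis
    i2_crossing = -w0 / w2 if w2 != 0 else None   # the point (0, -w0/w2)
    i1_crossing = -w0 / w1 if w1 != 0 else None   # the point (-w0/w1, 0)
    return i2_crossing, i1_crossing

print(line_crossings(2, 1, -2))   # (1.0, -2.0): the crossings (0, 1) and (-2, 0)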
Perceptron Learning Example
We would like our perceptron to correctly classify the
five 2-dimensional data points below.
Let the random initial weight vector w0 = (2, 1, -2)T.
Then the dividing line crosses at (0, 1)T and (-2, 0)T.
[Figure: the five training points (classes -1 and 1) plotted in the (i1, i2) plane together with this dividing line.]
Let us pick the misclassified point (-2, -1)T for learning:
i = (1, -2, -1)T (include offset 1)
x1 = (-1)·(1, -2, -1)T (i is in class -1)
x1 = (-1, 2, 1)T
Perceptron Learning Example
w1 = w0 + x1 (let us set the learning rate η = 1 for simplicity)
w1 = (2, 1, -2)T + (-1, 2, 1)T = (1, 3, -1)T
The new dividing line crosses at (0, 1)T and (-1/3, 0)T.
[Figure: the five training points and the new dividing line in the (i1, i2) plane.]
Let us pick the next misclassified point (0, 2)T for learning:
i = (1, 0, 2)T (include offset 1)
x2 = (1, 0, 2)T (i is in class 1)
Perceptron Learning Example
w2 = w1 + x2
w2 = (1, 3, -1)T + (1, 0, 2)T = (2, 3, 1)T
Now the line crosses at (0, -2)T and (-2/3, 0)T.
[Figure: the five training points and the final dividing line in the (i1, i2) plane.]
With this weight vector, the perceptron achieves perfect classification!
The learning process terminates.
In most cases, many more iterations are necessary than in this example.
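
These two update steps can be verified with a few lines of NumPy; the variable names are illustrative, but all of the numbers come from the example above.

import numpy as np

w0 = np.array([2., 1., -2.])
i_a, c_a = np.array([1., -2., -1.]), -1   # point (-2, -1) of class -1, offset 1 included
i_b, c_b = np.array([1.,  0.,  2.]),  1   # point (0, 2) of class 1, offset 1 included

x1 = c_a * i_a                # (-1, 2, 1); i_a is misclassified: w0 · x1 = -2 < 0
w1 = w0 + x1                  # (1, 3, -1)
x2 = c_b * i_b                # (1, 0, 2);  i_b is misclassified: w1 · x2 = -1 < 0
w2 = w1 + x2                  # (2, 3, 1)

print(w1, w2)                                 # [ 1.  3. -1.] [2. 3. 1.]
print(np.sign(w2 @ i_a), np.sign(w2 @ i_b))   # -1.0 1.0: both points now classified correctly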
Perceptron Learning Results
We proved that the perceptron learning algorithm is guaranteed to find a solution to a classification problem if that problem is linearly separable.
But are those solutions optimal?
One of the reasons why we are interested in neural
networks is that they are able to generalize, i.e., give
plausible output for new (untrained) inputs.
How well does a perceptron deal with new inputs?

Perceptron Learning Results
Perfect classification of training samples, but may not generalize well to new (untrained) samples.

Perceptron Learning Results
This function is likely to perform better classification on new samples.

Adalines
Idea behind adaptive linear elements (Adalines):
Compute a continuous, differentiable error function between the net input and the desired output (before applying the threshold function).
For example, compute the mean squared error (MSE) between the net input for every training vector and its class (1 or -1).
Then find those weights for which the error is minimal.
With a differentiable error function, we can use the gradient descent technique to find this absolute minimum of the error function.
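
A minimal sketch of this idea, assuming a plain batch gradient-descent update of the MSE; the function name adaline_train, the learning rate eta, and the fixed number of steps are illustrative choices, not prescribed by the slide.

import numpy as np

def adaline_train(inputs, classes, w0, eta=0.01, steps=1000):
    # inputs:  (n, d) array of training vectors (offset component included)
    # classes: (n,) array of desired outputs, +1 or -1
    w = np.asarray(w0, dtype=float)
    n = len(inputs)
    for _ in range(steps):
        net = inputs @ w                        # net input for every training vector
        error = net - classes                   # difference to the desired output
        grad = (2.0 / n) * (inputs.T @ error)   # gradient of the MSE with respect to w
        w = w - eta * grad                      # adjust the weights against the gradient
    return w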

Gradient Descent
Gradient descent is a very common technique to find
the absolute minimum of a function.
It is especially useful for high-dimensional functions.
We will use it to iteratively minimize the network’s (or neuron’s) error by computing the gradient of the error surface in weight space and adjusting the weights in the opposite direction.

Gradient Descent
Gradient-descent example: Finding the absolute
minimum of a one-dimensional error function f(x):
[Figure: the curve of f(x), with the slope f’(x0) drawn at the point x0; the next estimate is x1 = x0 - f’(x0).]
Repeat this iteratively until, for some xi, f’(xi) is sufficiently close to 0.
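
A few lines of Python carry out this iteration; the example function, the starting point, the step size eta, and the stopping tolerance are illustrative choices.

def gradient_descent_1d(f_prime, x0, eta=0.1, tol=1e-6, max_iter=10000):
    x = x0
    for _ in range(max_iter):
        slope = f_prime(x)
        if abs(slope) < tol:       # f'(x) is sufficiently close to 0
            break
        x = x - eta * slope        # next estimate: x - eta * f'(x)
    return x

# example: f(x) = (x - 3)^2, so f'(x) = 2*(x - 3); the minimum is at x = 3
print(gradient_descent_1d(lambda x: 2 * (x - 3), x0=0.0))   # approximately 3.0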
Gradient Descent
Gradients of two-dimensional functions:
The two-dimensional function in the left diagram is represented by contour lines in the right diagram, where arrows indicate the gradient of the function at different locations. Obviously, the gradient is always pointing in the direction of the steepest increase of the function. In order to find the function’s minimum, we should always move against the gradient.
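
The same idea in two dimensions, using a simple example function chosen here for illustration (it does not appear on the slide): repeatedly take a small step against the gradient.

import numpy as np

def f(p):                         # example function f(x, y) = x^2 + 2*y^2
    x, y = p
    return x**2 + 2 * y**2

def grad_f(p):                    # its gradient, pointing in the direction of steepest increase
    x, y = p
    return np.array([2 * x, 4 * y])

p = np.array([2.0, -1.5])         # arbitrary starting point
eta = 0.1                         # step size
for _ in range(100):
    p = p - eta * grad_f(p)       # move against the gradient
print(p, f(p))                    # p is now very close to the minimum at (0, 0)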