
The Perceptron Convergence Theorem

Robert Snapp
snapp@cs.uvm.edu
Department of Computer Science
University of Vermont

CS 295 (UVM), Fall 2013

The Perceptron Algorithm


Consider a linear threshold unit (LTU) with n inputs \(x_1, \ldots, x_n \in \mathbb{R}\) and \(n + 1\) weights \(w_0, w_1, \ldots, w_n\).

[Figure: an LTU combining the inputs \(x_1, x_2, x_3, \ldots, x_n\) through the weights \(w_1, w_2, w_3, \ldots, w_n\), plus a bias weight \(w_0\).]

Letting \(\mathbf{x} = (x_1, x_2, \ldots, x_n)^T \in \mathbb{R}^n\) and \(\mathbf{w} = (w_1, w_2, \ldots, w_n)^T \in \mathbb{R}^n\), we have

\[
y = \operatorname{sgn}(\mathbf{w}^T \mathbf{x} + w_0) =
\begin{cases}
+1, & \text{if } \mathbf{w}^T \mathbf{x} + w_0 > 0, \\
-1, & \text{if } \mathbf{w}^T \mathbf{x} + w_0 \le 0.
\end{cases}
\]

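As a minimal illustration (a NumPy sketch; the function name is my own, and the convention sgn(0) = -1 follows the case definition above):

    import numpy as np

    def ltu(w, w0, x):
        """Output of a linear threshold unit: +1 if w.x + w0 > 0, else -1."""
        return 1 if float(np.dot(w, x)) + w0 > 0 else -1

    # Example: a 2-input unit computing sgn(x1 + x2 - 1)
    print(ltu(np.array([1.0, 1.0]), -1.0, np.array([0.3, 0.9])))   # +1
    print(ltu(np.array([1.0, 1.0]), -1.0, np.array([0.2, 0.1])))   # -1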

Perceptron Algorithm (cont.)

Let \(X_m\) denote a dichotomy of m patterns, i.e., a set of m patterns (or feature vectors) such that each pattern is assigned to exactly one of two classes:

\[
X_m = \{ (\mathbf{x}_1, \ell_1), \ldots, (\mathbf{x}_m, \ell_m) \},
\]

where for \(i = 1, \ldots, m\), \(\mathbf{x}_i \in \mathbb{R}^n\) represents the i-th feature vector, and \(\ell_i \in \{-1, +1\}\) the corresponding class label.


Perceptron Algorithm (cont.)


Assume that \(X_m\) represents a linearly separable dichotomy, that is, there exist a weight vector \(\mathbf{w} \in \mathbb{R}^n\) and a \(w_0 \in \mathbb{R}\) such that, for \(i = 1, \ldots, m\),

\[
\operatorname{sgn}(\mathbf{w}^T \mathbf{x}_i + w_0) = \ell_i.
\]

If this is the case, how can we find a \(\mathbf{w}\) and \(w_0\) that satisfy the above?


Perceptron Algorithm (cont.)


The Perceptron algorithm (Rosenblatt):

    float w[n] = <n random floats>;
    float w0 = <a random float>;
    bool errorDetected = true;
    while (errorDetected) {
        errorDetected = false;
        for (int i = 1; i <= m; i++) {
            if (sgn(w^T x_i + w0) != l_i) {
                errorDetected = true;
                w += l_i * x_i;
                w0 += l_i;
            }
        }
    }
    return {w, w0};


Perceptron Algorithm (cont.)


The fixed-increment Perceptron algorithm (with arbitrary increment \(\eta > 0\)):

    float w[n] = <n random floats>;
    float w0 = <a random float>;
    float eta = <a positive number>;
    bool errorDetected = true;
    while (errorDetected) {
        errorDetected = false;
        for (int i = 1; i <= m; i++) {
            if (sgn(w^T x_i + w0) != l_i) {
                errorDetected = true;
                w += eta * l_i * x_i;
                w0 += eta * l_i;
            }
        }
    }
    return {w, w0};

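A runnable transcription of the two variants above (a sketch in NumPy; the random initialization and sgn(0) = -1 follow the pseudocode, while the names and the default seed are my own; eta = 1 recovers Rosenblatt's original version):

    import numpy as np

    def perceptron(X, labels, eta=1.0, seed=0):
        """Fixed-increment perceptron.
        X: (m, n) array of feature vectors; labels: (m,) array of +/-1.
        Returns (w, w0); loops forever if the data are not linearly separable."""
        rng = np.random.default_rng(seed)
        m, n = X.shape
        w = rng.standard_normal(n)      # n random floats
        w0 = rng.standard_normal()      # a random float
        error_detected = True
        while error_detected:
            error_detected = False
            for i in range(m):
                y = 1 if w @ X[i] + w0 > 0 else -1   # sgn, with sgn(0) = -1
                if y != labels[i]:
                    error_detected = True
                    w += eta * labels[i] * X[i]
                    w0 += eta * labels[i]
        return w, w0

For example, perceptron(np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]]), np.array([-1, -1, -1, 1])) separates the AND dichotomy, which is linearly separable.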

Perceptron Convergence Theorem


If \(X_m = \{(\mathbf{x}_1, \ell_1), \ldots, (\mathbf{x}_m, \ell_m)\}\) describes a linearly separable dichotomy, then the fixed-increment perceptron algorithm terminates after a finite number of weight updates.

A geometric picture of the perceptron algorithm emerges after transforming into augmented (or homogeneous) coordinates. Let

\[
\hat{\mathbf{x}} = \begin{pmatrix} 1 \\ \mathbf{x} \end{pmatrix}
= \begin{pmatrix} 1 \\ x_1 \\ \vdots \\ x_n \end{pmatrix} \in \mathbb{R}^{n+1},
\qquad \text{and let} \qquad
\hat{\mathbf{w}} = \begin{pmatrix} w_0 \\ \mathbf{w} \end{pmatrix}
= \begin{pmatrix} w_0 \\ w_1 \\ \vdots \\ w_n \end{pmatrix} \in \mathbb{R}^{n+1}.
\]

In the following, hatted vectors will always represent augmented vectors.

Similarly, let

\[
\hat{X}_m = \left\{ (\hat{\mathbf{x}}_i, \ell_i) = \left( (1, \mathbf{x}_i^T)^T, \ell_i \right) \;\middle|\; i = 1, 2, \ldots, m \right\}.
\]


Perceptron Convergence Theorem (cont.)

Consequently, the expression used by the LTU simplifies to just an inner product, as

\[
\mathbf{w}^T \mathbf{x} + w_0 = \hat{\mathbf{w}}^T \hat{\mathbf{x}} = \|\hat{\mathbf{w}}\| \, \|\hat{\mathbf{x}}\| \cos\theta,
\]

where \(\theta\) represents the angle between \(\hat{\mathbf{w}}\) and \(\hat{\mathbf{x}}\), measured in their common plane. Note that the cosine of \(\theta\) determines the sign of the inner product.

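The identity is easy to confirm numerically (a sketch with arbitrary random vectors; the names x_hat and w_hat are mine):

    import numpy as np

    rng = np.random.default_rng(42)
    x, w = rng.standard_normal(4), rng.standard_normal(4)
    w0 = rng.standard_normal()
    x_hat = np.concatenate(([1.0], x))   # augmented feature vector (1, x)
    w_hat = np.concatenate(([w0], w))    # augmented weight vector (w0, w)
    assert np.isclose(np.dot(w, x) + w0, np.dot(w_hat, x_hat))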

Perceptron Algorithm (homogeneous coordinates)

The fixed-increment Perceptron algorithm (with arbitrary increment \(\eta > 0\)):

    float w_hat[n + 1] = <(n + 1) random floats>;
    float eta = <a positive number>;
    bool errorDetected = true;
    while (errorDetected) {
        errorDetected = false;
        for (int i = 1; i <= m; i++) {
            if (sgn(w_hat^T x_hat_i) != l_i) {
                errorDetected = true;
                w_hat += eta * l_i * x_hat_i;
            }
        }
    }
    return w_hat;


Normalized Coordinates
For \(i = 1, 2, \ldots, m\), let

\[
\hat{\mathbf{x}}'_i = \ell_i \hat{\mathbf{x}}_i.
\]

Then the fixed-increment perceptron algorithm becomes

    float w_hat[n + 1] = <(n + 1) random floats>;
    float eta = <a positive number>;
    bool errorDetected = true;
    while (errorDetected) {
        errorDetected = false;
        for (int i = 1; i <= m; i++) {
            if (w_hat^T x_hat'_i < 0) {
                errorDetected = true;
                w_hat += eta * x_hat'_i;
            }
        }
    }
    return w_hat;

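In code, the normalization trick reduces the inner loop to a single sign test (a NumPy sketch under the same assumptions as before; the names are mine):

    import numpy as np

    def perceptron_normalized(X, labels, eta=1.0, seed=0):
        """Fixed-increment perceptron in normalized homogeneous coordinates.
        Returns the augmented weight vector w_hat = (w0, w1, ..., wn)."""
        rng = np.random.default_rng(seed)
        m, n = X.shape
        X_hat = np.hstack([np.ones((m, 1)), X])   # augment: x_hat_i = (1, x_i)
        X_prime = labels[:, None] * X_hat         # normalize: x'_i = l_i x_hat_i
        w_hat = rng.standard_normal(n + 1)
        error_detected = True
        while error_detected:
            error_detected = False
            for x_p in X_prime:
                if w_hat @ x_p < 0:               # update trigger, as above
                    error_detected = True
                    w_hat += eta * x_p
        return w_hat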

Perceptron Convergence Theorem: A Proof


Given a linearly separable dichotomy (in normalized, homogeneous form),

\[
\hat{X}'_m \stackrel{\text{def}}{=} \left\{ \hat{\mathbf{x}}'_i = \ell_i \hat{\mathbf{x}}_i \;\middle|\; (\hat{\mathbf{x}}_i, \ell_i) \in \hat{X}_m \right\},
\]

let \(\hat{\mathbf{w}}^\star \in \mathbb{R}^{n+1}\) denote a homogeneous weight vector that satisfies the given dichotomy, i.e., \(\hat{\mathbf{w}}^{\star T} \hat{\mathbf{x}}'_i > 0\), for \(i = 1, 2, \ldots, m\).

Let \(\hat{\mathbf{w}}(k) \in \mathbb{R}^{n+1}\) denote the value of the perceptron's homogeneous weight vector after the k-th update.

Let \(\hat{\mathbf{x}}'(k) \in \hat{X}'_m\) denote the normalized, homogeneous feature vector that triggered the k-th update. Thus,

\[
\begin{aligned}
\hat{\mathbf{w}}(1) &= \hat{\mathbf{w}}(0) + \eta\, \hat{\mathbf{x}}'(1) \\
\hat{\mathbf{w}}(2) &= \hat{\mathbf{w}}(1) + \eta\, \hat{\mathbf{x}}'(2) \\
&\;\;\vdots \\
\hat{\mathbf{w}}(k) &= \hat{\mathbf{w}}(k-1) + \eta\, \hat{\mathbf{x}}'(k)
\end{aligned}
\]


Perceptron Convergence Theorem: A Proof (cont.)

[Figure: \(\|\hat{\mathbf{w}}(k) - \hat{\mathbf{w}}(0)\|^2\) plotted against k, trapped between the lower bound \(Ak^2\) and the upper bound \(Bk\).]

For a given dichotomy \(\hat{X}'_m\) and parameter \(\eta\), we will show that there exist constants A and B such that

\[
A k^2 \le \|\hat{\mathbf{w}}(k) - \hat{\mathbf{w}}(0)\|^2 \le B k.
\]

Thus the network must converge after no more than \(k_{\max} = B/A\) updates, since \(Ak^2 \le Bk\) can hold only while \(k \le B/A\).


Perceptron Convergence Theorem: A Proof (cont.)


Lemma (Cauchy-Schwarz inequality): Let \(\mathbf{a}, \mathbf{b} \in \mathbb{R}^n\). Then

\[
\|\mathbf{a}\|^2 \|\mathbf{b}\|^2 \ge |\mathbf{a}^T \mathbf{b}|^2,
\]

with equality if and only if there exists a \(c \in \mathbb{R}\) such that \(c\mathbf{a} = \mathbf{b}\).

Proof: Let

\[
\phi(t) \stackrel{\text{def}}{=} \|t\mathbf{a} + \mathbf{b}\|^2 = \|\mathbf{a}\|^2 t^2 + 2\mathbf{a}^T\mathbf{b}\, t + \|\mathbf{b}\|^2.
\]

Note that \(\phi(t)\) is a non-negative quadratic, of the form \(\phi(t) = At^2 + Bt + C \ge 0\). Thus, the discriminant satisfies

\[
B^2 - 4AC \le 0,
\]

with equality if and only if there exists a value of \(t \in \mathbb{R}\) for which \(\phi(t) = 0\). (Letting \(c = -t\) then implies that \(c\mathbf{a} = \mathbf{b}\).) More generally,

\[
4AC \ge B^2 \;\Rightarrow\; 4\|\mathbf{a}\|^2\|\mathbf{b}\|^2 \ge 4|\mathbf{a}^T\mathbf{b}|^2 \;\Rightarrow\; \|\mathbf{a}\|^2\|\mathbf{b}\|^2 \ge |\mathbf{a}^T\mathbf{b}|^2.
\]


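A brief numerical check of the lemma (a sketch; the vectors and the scalar c are arbitrary choices of mine):

    import numpy as np

    rng = np.random.default_rng(7)
    a, b = rng.standard_normal(6), rng.standard_normal(6)
    # General case: the inequality holds (strictly, almost surely)
    assert np.dot(a, a) * np.dot(b, b) >= np.dot(a, b) ** 2
    # Equality case: b = c a for some scalar c
    b = -2.5 * a
    assert np.isclose(np.dot(a, a) * np.dot(b, b), np.dot(a, b) ** 2)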

Convergence Proof: Lower Bound


Recall the setup: the normalized dichotomy \(\hat{X}'_m\), a solution vector \(\hat{\mathbf{w}}^\star\) with \(\hat{\mathbf{w}}^{\star T} \hat{\mathbf{x}}'_i > 0\) for \(i = 1, 2, \ldots, m\), and the sequence of weight vectors \(\hat{\mathbf{w}}(k)\) generated by the updates

\[
\hat{\mathbf{w}}(j) = \hat{\mathbf{w}}(j-1) + \eta\, \hat{\mathbf{x}}'(j), \qquad j = 1, 2, \ldots, k.
\]


Convergence Proof: Lower Bound (cont.)


\[
\begin{aligned}
\hat{\mathbf{w}}(1) &= \hat{\mathbf{w}}(0) + \eta\, \hat{\mathbf{x}}'(1) \\
\hat{\mathbf{w}}(2) &= \hat{\mathbf{w}}(1) + \eta\, \hat{\mathbf{x}}'(2) \\
&\;\;\vdots \\
\hat{\mathbf{w}}(k) &= \hat{\mathbf{w}}(k-1) + \eta\, \hat{\mathbf{x}}'(k)
\end{aligned}
\]

Adding the above k equations yields

\[
\hat{\mathbf{w}}(k) = \hat{\mathbf{w}}(0) + \eta\left( \hat{\mathbf{x}}'(1) + \hat{\mathbf{x}}'(2) + \cdots + \hat{\mathbf{x}}'(k) \right).
\]

Subtracting \(\hat{\mathbf{w}}(0)\) from both sides, and taking the inner product with respect to the hypothetical solution \(\hat{\mathbf{w}}^\star\), yields

\[
\hat{\mathbf{w}}^{\star T}\left( \hat{\mathbf{w}}(k) - \hat{\mathbf{w}}(0) \right) = \eta\, \hat{\mathbf{w}}^{\star T}\left( \hat{\mathbf{x}}'(1) + \hat{\mathbf{x}}'(2) + \cdots + \hat{\mathbf{x}}'(k) \right).
\]


Convergence Proof: Lower Bound (cont.)

\[
\hat{\mathbf{w}}^{\star T}\left( \hat{\mathbf{w}}(k) - \hat{\mathbf{w}}(0) \right) = \eta\, \hat{\mathbf{w}}^{\star T}\left( \hat{\mathbf{x}}'(1) + \hat{\mathbf{x}}'(2) + \cdots + \hat{\mathbf{x}}'(k) \right).
\]

Now define

\[
a = \min_{\hat{\mathbf{x}}' \in \hat{X}'_m} \hat{\mathbf{w}}^{\star T} \hat{\mathbf{x}}' > 0.
\]

Thus,

\[
\hat{\mathbf{w}}^{\star T}\left( \hat{\mathbf{w}}(k) - \hat{\mathbf{w}}(0) \right) \ge \eta a k > 0.
\]

Squaring both sides and applying the Cauchy-Schwarz inequality yields

\[
\|\hat{\mathbf{w}}^\star\|^2\, \|\hat{\mathbf{w}}(k) - \hat{\mathbf{w}}(0)\|^2 \ge \left| \hat{\mathbf{w}}^{\star T}\left( \hat{\mathbf{w}}(k) - \hat{\mathbf{w}}(0) \right) \right|^2 \ge (\eta a k)^2.
\]

Thus,

\[
\|\hat{\mathbf{w}}(k) - \hat{\mathbf{w}}(0)\|^2 \ge \left( \frac{\eta a}{\|\hat{\mathbf{w}}^\star\|} \right)^2 k^2.
\]


Convergence Proof: Upper Bound


To construct an upper bound on the growth of \(\|\hat{\mathbf{w}}(k) - \hat{\mathbf{w}}(0)\|^2\), we begin with the sequence of weight values generated by the perceptron algorithm:

\[
\begin{aligned}
\hat{\mathbf{w}}(1) &= \hat{\mathbf{w}}(0) + \eta\, \hat{\mathbf{x}}'(1) \\
\hat{\mathbf{w}}(2) &= \hat{\mathbf{w}}(1) + \eta\, \hat{\mathbf{x}}'(2) \\
&\;\;\vdots \\
\hat{\mathbf{w}}(k) &= \hat{\mathbf{w}}(k-1) + \eta\, \hat{\mathbf{x}}'(k)
\end{aligned}
\]

Now subtract \(\hat{\mathbf{w}}(0)\) from both sides:

\[
\begin{aligned}
\hat{\mathbf{w}}(1) - \hat{\mathbf{w}}(0) &= \eta\, \hat{\mathbf{x}}'(1) \\
\hat{\mathbf{w}}(2) - \hat{\mathbf{w}}(0) &= \hat{\mathbf{w}}(1) - \hat{\mathbf{w}}(0) + \eta\, \hat{\mathbf{x}}'(2) \\
&\;\;\vdots \\
\hat{\mathbf{w}}(k) - \hat{\mathbf{w}}(0) &= \hat{\mathbf{w}}(k-1) - \hat{\mathbf{w}}(0) + \eta\, \hat{\mathbf{x}}'(k)
\end{aligned}
\]


Convergence Proof: Upper Bound (cont.)


Squaring both sides yields

\[
\begin{aligned}
\|\hat{\mathbf{w}}(1) - \hat{\mathbf{w}}(0)\|^2 &= \eta^2 \|\hat{\mathbf{x}}'(1)\|^2 \\
\|\hat{\mathbf{w}}(2) - \hat{\mathbf{w}}(0)\|^2 &= \|\hat{\mathbf{w}}(1) - \hat{\mathbf{w}}(0)\|^2 + 2\eta \left( \hat{\mathbf{w}}(1) - \hat{\mathbf{w}}(0) \right)^T \hat{\mathbf{x}}'(2) + \eta^2 \|\hat{\mathbf{x}}'(2)\|^2 \\
&\;\;\vdots \\
\|\hat{\mathbf{w}}(k) - \hat{\mathbf{w}}(0)\|^2 &= \|\hat{\mathbf{w}}(k-1) - \hat{\mathbf{w}}(0)\|^2 + 2\eta \left( \hat{\mathbf{w}}(k-1) - \hat{\mathbf{w}}(0) \right)^T \hat{\mathbf{x}}'(k) + \eta^2 \|\hat{\mathbf{x}}'(k)\|^2
\end{aligned}
\]

Note that since \(\hat{\mathbf{x}}'(1)\) triggers the first weight update, it must have been misclassified by the weight vector \(\hat{\mathbf{w}}(0)\). Thus \(\hat{\mathbf{w}}(0)^T \hat{\mathbf{x}}'(1) < 0\). Similarly,

\[
\hat{\mathbf{w}}(j-1)^T \hat{\mathbf{x}}'(j) < 0 \quad \text{for } j = 1, 2, \ldots, k.
\]


Convergence Proof: Upper Bound (cont.)


Thus,

\[
\begin{aligned}
\|\hat{\mathbf{w}}(1) - \hat{\mathbf{w}}(0)\|^2 &= \eta^2 \|\hat{\mathbf{x}}'(1)\|^2 \\
\|\hat{\mathbf{w}}(2) - \hat{\mathbf{w}}(0)\|^2 &\le \|\hat{\mathbf{w}}(1) - \hat{\mathbf{w}}(0)\|^2 - 2\eta\, \hat{\mathbf{w}}(0)^T \hat{\mathbf{x}}'(2) + \eta^2 \|\hat{\mathbf{x}}'(2)\|^2 \\
&\;\;\vdots \\
\|\hat{\mathbf{w}}(k) - \hat{\mathbf{w}}(0)\|^2 &\le \|\hat{\mathbf{w}}(k-1) - \hat{\mathbf{w}}(0)\|^2 - 2\eta\, \hat{\mathbf{w}}(0)^T \hat{\mathbf{x}}'(k) + \eta^2 \|\hat{\mathbf{x}}'(k)\|^2
\end{aligned}
\]

Summing the k inequalities above yields

\[
\|\hat{\mathbf{w}}(k) - \hat{\mathbf{w}}(0)\|^2 \le \eta^2 \left( \|\hat{\mathbf{x}}'(1)\|^2 + \|\hat{\mathbf{x}}'(2)\|^2 + \cdots + \|\hat{\mathbf{x}}'(k)\|^2 \right) - 2\eta\, \hat{\mathbf{w}}(0)^T \left( \hat{\mathbf{x}}'(2) + \cdots + \hat{\mathbf{x}}'(k) \right).
\]


Convergence Proof: Upper Bound (cont.)


Now define

\[
M = \max_{\hat{\mathbf{x}}' \in \hat{X}'_m} \|\hat{\mathbf{x}}'\|^2,
\qquad \text{and} \qquad
\beta = 2\eta \min_{\hat{\mathbf{x}}' \in \hat{X}'_m} \hat{\mathbf{w}}(0)^T \hat{\mathbf{x}}'.
\]

(Note that \(\beta < 0\), unless \(\hat{\mathbf{w}}(0)\) solves the dichotomy.) Whence,

\[
\|\hat{\mathbf{w}}(k) - \hat{\mathbf{w}}(0)\|^2 \le \eta^2 \left( \|\hat{\mathbf{x}}'(1)\|^2 + \|\hat{\mathbf{x}}'(2)\|^2 + \cdots + \|\hat{\mathbf{x}}'(k)\|^2 \right) - 2\eta\, \hat{\mathbf{w}}(0)^T \left( \hat{\mathbf{x}}'(2) + \cdots + \hat{\mathbf{x}}'(k) \right)
\]

becomes

\[
\|\hat{\mathbf{w}}(k) - \hat{\mathbf{w}}(0)\|^2 \le (\eta^2 M - \beta)\, k.
\]


Convergence Proof: Summary

Thus we have shown

\[
A k^2 \le \|\hat{\mathbf{w}}(k) - \hat{\mathbf{w}}(0)\|^2 \le B k,
\]

with

\[
A = \left( \frac{\eta a}{\|\hat{\mathbf{w}}^\star\|} \right)^2, \qquad \text{and} \qquad B = \eta^2 M - \beta.
\]

Thus,

\[
k_{\max} = \frac{B}{A} = \frac{\eta^2 M - \beta}{(\eta a)^2}\, \|\hat{\mathbf{w}}^\star\|^2.
\]

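The two bounds, and the resulting value of \(k_{\max}\), can be checked empirically. The following is a sketch under stated assumptions: the AND dichotomy as test data, a hand-picked solution \(\hat{\mathbf{w}}^\star = (-1.5, 1, 1)^T\), and helper names of my own invention.

    import numpy as np

    def check_bounds(X, labels, w_star, w0_star, eta=1.0, seed=0):
        """Run the normalized perceptron, asserting after every update that
        A k^2 <= ||w_hat(k) - w_hat(0)||^2 <= B k, with A, B as in the proof."""
        rng = np.random.default_rng(seed)
        m, n = X.shape
        X_prime = labels[:, None] * np.hstack([np.ones((m, 1)), X])  # x'_i = l_i (1, x_i)
        w_hat_star = np.concatenate(([w0_star], w_star))
        a = np.min(X_prime @ w_hat_star)
        assert a > 0, "w_star must solve the dichotomy"
        M = np.max(np.sum(X_prime ** 2, axis=1))          # M = max ||x'||^2
        w_hat = rng.standard_normal(n + 1)
        w_hat0 = w_hat.copy()
        beta = 2 * eta * np.min(X_prime @ w_hat0)
        A = (eta * a / np.linalg.norm(w_hat_star)) ** 2
        B = eta ** 2 * M - beta
        k, error_detected = 0, True
        while error_detected:
            error_detected = False
            for x_p in X_prime:
                if w_hat @ x_p < 0:
                    error_detected = True
                    k += 1
                    w_hat += eta * x_p
                    d2 = float(np.sum((w_hat - w_hat0) ** 2))
                    # small tolerance guards against floating-point edge cases
                    assert A * k ** 2 - 1e-9 <= d2 <= B * k + 1e-9
        print(f"converged after {k} updates; k_max = B/A = {B / A:.1f}")

    X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
    labels = np.array([-1, -1, -1, 1])
    check_bounds(X, labels, w_star=np.array([1.0, 1.0]), w0_star=-1.5)

In practice the observed update count is usually far smaller than \(k_{\max}\); the theorem only guarantees finiteness.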
