
The Perceptron Convergence Theorem

Robert Snapp
snapp@cs.uvm.edu
Department of Computer Science
University of Vermont

CS 295 (UVM), Fall 2013

The Perceptron Algorithm


Consider a linear threshold unit (LTU) with n inputs \(x_1, \ldots, x_n \in \mathbb{R}\) and \(n + 1\) weights \(w_0, w_1, \ldots, w_n\).

[Figure: an LTU combining the inputs \(x_1, x_2, x_3, \ldots, x_n\) through the weights \(w_1, w_2, w_3, \ldots, w_n\), plus a bias weight \(w_0\).]

Letting \(\mathbf{x} = (x_1, x_2, \ldots, x_n)^T \in \mathbb{R}^n\) and \(\mathbf{w} = (w_1, w_2, \ldots, w_n)^T \in \mathbb{R}^n\), we have

\[
y = \operatorname{sgn}(\mathbf{w}^T \mathbf{x} + w_0) =
\begin{cases}
+1, & \text{if } \mathbf{w}^T \mathbf{x} + w_0 > 0, \\
-1, & \text{if } \mathbf{w}^T \mathbf{x} + w_0 \le 0.
\end{cases}
\]

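As a minimal illustration (a NumPy sketch; the function name is my own, and the convention sgn(0) = -1 follows the case definition above):

    import numpy as np

    def ltu(w, w0, x):
        """Output of a linear threshold unit: +1 if w.x + w0 > 0, else -1."""
        return 1 if float(np.dot(w, x)) + w0 > 0 else -1

    # Example: a 2-input unit computing sgn(x1 + x2 - 1)
    print(ltu(np.array([1.0, 1.0]), -1.0, np.array([0.3, 0.9])))   # +1
    print(ltu(np.array([1.0, 1.0]), -1.0, np.array([0.2, 0.1])))   # -1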

Perceptron Algorithm (cont.)

Let \(X_m\) denote a dichotomy of m patterns, i.e., a set of m patterns (or feature vectors) such that each pattern is assigned to exactly one of two classes:

\[
X_m = \{ (\mathbf{x}_1, \ell_1), \ldots, (\mathbf{x}_m, \ell_m) \},
\]

where for \(i = 1, \ldots, m\), \(\mathbf{x}_i \in \mathbb{R}^n\) represents the i-th feature vector, and \(\ell_i \in \{-1, +1\}\) the corresponding class label.


Perceptron Algorithm (cont.)


Assume that \(X_m\) represents a linearly separable dichotomy, that is, there exist a weight vector \(\mathbf{w} \in \mathbb{R}^n\) and a \(w_0 \in \mathbb{R}\) such that, for \(i = 1, \ldots, m\),

\[
\operatorname{sgn}(\mathbf{w}^T \mathbf{x}_i + w_0) = \ell_i.
\]

If this is the case, how can we find a \(\mathbf{w}\) and \(w_0\) that satisfy the above?


Perceptron Algorithm (cont.)


The Perceptron algorithm (Rosenblatt):

    float w[n] = <n random floats>;
    float w0 = <a random float>;
    bool errorDetected = true;
    while (errorDetected) {
        errorDetected = false;
        for (int i = 1; i <= m; i++) {
            if (sgn(w^T x_i + w0) != l_i) {
                errorDetected = true;
                w += l_i * x_i;
                w0 += l_i;
            }
        }
    }
    return {w, w0};


Perceptron Algorithm (cont.)


The fixed-increment Perceptron algorithm (with arbitrary increment \(\eta > 0\)):

    float w[n] = <n random floats>;
    float w0 = <a random float>;
    float eta = <a positive number>;
    bool errorDetected = true;
    while (errorDetected) {
        errorDetected = false;
        for (int i = 1; i <= m; i++) {
            if (sgn(w^T x_i + w0) != l_i) {
                errorDetected = true;
                w += eta * l_i * x_i;
                w0 += eta * l_i;
            }
        }
    }
    return {w, w0};

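A runnable transcription of the two variants above (a sketch in NumPy; the random initialization and sgn(0) = -1 follow the pseudocode, while the names and the default seed are my own; eta = 1 recovers Rosenblatt's original version):

    import numpy as np

    def perceptron(X, labels, eta=1.0, seed=0):
        """Fixed-increment perceptron.
        X: (m, n) array of feature vectors; labels: (m,) array of +/-1.
        Returns (w, w0); loops forever if the data are not linearly separable."""
        rng = np.random.default_rng(seed)
        m, n = X.shape
        w = rng.standard_normal(n)      # n random floats
        w0 = rng.standard_normal()      # a random float
        error_detected = True
        while error_detected:
            error_detected = False
            for i in range(m):
                y = 1 if w @ X[i] + w0 > 0 else -1   # sgn, with sgn(0) = -1
                if y != labels[i]:
                    error_detected = True
                    w += eta * labels[i] * X[i]
                    w0 += eta * labels[i]
        return w, w0

For example, perceptron(np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]]), np.array([-1, -1, -1, 1])) separates the AND dichotomy, which is linearly separable.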

Perceptron Convergence Theorem


If \(X_m = \{(\mathbf{x}_1, \ell_1), \ldots, (\mathbf{x}_m, \ell_m)\}\) describes a linearly separable dichotomy, then the fixed-increment perceptron algorithm terminates after a finite number of weight updates.

A geometric picture of the perceptron algorithm emerges after transforming into augmented (or homogeneous) coordinates. Let

\[
\hat{\mathbf{x}} = \begin{pmatrix} 1 \\ \mathbf{x} \end{pmatrix}
= \begin{pmatrix} 1 \\ x_1 \\ \vdots \\ x_n \end{pmatrix} \in \mathbb{R}^{n+1},
\qquad \text{and let} \qquad
\hat{\mathbf{w}} = \begin{pmatrix} w_0 \\ \mathbf{w} \end{pmatrix}
= \begin{pmatrix} w_0 \\ w_1 \\ \vdots \\ w_n \end{pmatrix} \in \mathbb{R}^{n+1}.
\]

In the following, hatted vectors will always represent augmented vectors.

Similarly, let

\[
\hat{X}_m = \left\{ (\hat{\mathbf{x}}_i, \ell_i) = \left( (1, \mathbf{x}_i^T)^T, \ell_i \right) \;\middle|\; i = 1, 2, \ldots, m \right\}.
\]


Perceptron Convergence Theorem (cont.)

Consequently, the expression used by the LTU simplifies to just an inner product, as

\[
\mathbf{w}^T \mathbf{x} + w_0 = \hat{\mathbf{w}}^T \hat{\mathbf{x}} = \|\hat{\mathbf{w}}\| \, \|\hat{\mathbf{x}}\| \cos\theta,
\]

where \(\theta\) represents the angle between \(\hat{\mathbf{w}}\) and \(\hat{\mathbf{x}}\), measured in their common plane. Note that the cosine of \(\theta\) determines the sign of the inner product.

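The identity is easy to confirm numerically (a sketch with arbitrary random vectors; the names x_hat and w_hat are mine):

    import numpy as np

    rng = np.random.default_rng(42)
    x, w = rng.standard_normal(4), rng.standard_normal(4)
    w0 = rng.standard_normal()
    x_hat = np.concatenate(([1.0], x))   # augmented feature vector (1, x)
    w_hat = np.concatenate(([w0], w))    # augmented weight vector (w0, w)
    assert np.isclose(np.dot(w, x) + w0, np.dot(w_hat, x_hat))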

Perceptron Algorithm (homogeneous coordinates)

The fixed-increment Perceptron algorithm (with arbitrary increment \(\eta > 0\)):

    float w_hat[n + 1] = <(n + 1) random floats>;
    float eta = <a positive number>;
    bool errorDetected = true;
    while (errorDetected) {
        errorDetected = false;
        for (int i = 1; i <= m; i++) {
            if (sgn(w_hat^T x_hat_i) != l_i) {
                errorDetected = true;
                w_hat += eta * l_i * x_hat_i;
            }
        }
    }
    return w_hat;


Normalized Coordinates
For \(i = 1, 2, \ldots, m\), let

\[
\hat{\mathbf{x}}'_i = \ell_i \hat{\mathbf{x}}_i.
\]

Then the fixed-increment perceptron algorithm becomes

    float w_hat[n + 1] = <(n + 1) random floats>;
    float eta = <a positive number>;
    bool errorDetected = true;
    while (errorDetected) {
        errorDetected = false;
        for (int i = 1; i <= m; i++) {
            if (w_hat^T x_hat'_i < 0) {
                errorDetected = true;
                w_hat += eta * x_hat'_i;
            }
        }
    }
    return w_hat;

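In code, the normalization trick reduces the inner loop to a single sign test (a NumPy sketch under the same assumptions as before; the names are mine):

    import numpy as np

    def perceptron_normalized(X, labels, eta=1.0, seed=0):
        """Fixed-increment perceptron in normalized homogeneous coordinates.
        Returns the augmented weight vector w_hat = (w0, w1, ..., wn)."""
        rng = np.random.default_rng(seed)
        m, n = X.shape
        X_hat = np.hstack([np.ones((m, 1)), X])   # augment: x_hat_i = (1, x_i)
        X_prime = labels[:, None] * X_hat         # normalize: x'_i = l_i x_hat_i
        w_hat = rng.standard_normal(n + 1)
        error_detected = True
        while error_detected:
            error_detected = False
            for x_p in X_prime:
                if w_hat @ x_p < 0:               # update trigger, as above
                    error_detected = True
                    w_hat += eta * x_p
        return w_hat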

Perceptron Convergence Theorem: A Proof


Given a linearly separable dichotomy (in normalized, homogeneous form),

\[
\hat{X}'_m \stackrel{\text{def}}{=} \left\{ \hat{\mathbf{x}}'_i = \ell_i \hat{\mathbf{x}}_i \;\middle|\; (\hat{\mathbf{x}}_i, \ell_i) \in \hat{X}_m \right\},
\]

let \(\hat{\mathbf{w}}^\star \in \mathbb{R}^{n+1}\) denote a homogeneous weight vector that satisfies the given dichotomy, i.e., \(\hat{\mathbf{w}}^{\star T} \hat{\mathbf{x}}'_i > 0\), for \(i = 1, 2, \ldots, m\).

Let \(\hat{\mathbf{w}}(k) \in \mathbb{R}^{n+1}\) denote the value of the perceptron's homogeneous weight vector after the k-th update.

Let \(\hat{\mathbf{x}}'(k) \in \hat{X}'_m\) denote the normalized, homogeneous feature vector that triggered the k-th update. Thus,

\[
\begin{aligned}
\hat{\mathbf{w}}(1) &= \hat{\mathbf{w}}(0) + \eta\, \hat{\mathbf{x}}'(1) \\
\hat{\mathbf{w}}(2) &= \hat{\mathbf{w}}(1) + \eta\, \hat{\mathbf{x}}'(2) \\
&\;\;\vdots \\
\hat{\mathbf{w}}(k) &= \hat{\mathbf{w}}(k-1) + \eta\, \hat{\mathbf{x}}'(k)
\end{aligned}
\]


Perceptron Convergence Theorem: A Proof (cont.)

[Figure: \(\|\hat{\mathbf{w}}(k) - \hat{\mathbf{w}}(0)\|^2\) plotted against k, trapped between the lower bound \(Ak^2\) and the upper bound \(Bk\).]

For a given dichotomy \(\hat{X}'_m\) and parameter \(\eta\), we will show that there exist constants A and B such that

\[
A k^2 \le \|\hat{\mathbf{w}}(k) - \hat{\mathbf{w}}(0)\|^2 \le B k.
\]

Thus the network must converge after no more than \(k_{\max} = B/A\) updates, since \(Ak^2 \le Bk\) can hold only while \(k \le B/A\).


Perceptron Convergence Theorem: A Proof (cont.)


Lemma (Cauchy-Schwarz inequality): Let \(\mathbf{a}, \mathbf{b} \in \mathbb{R}^n\). Then

\[
\|\mathbf{a}\|^2 \|\mathbf{b}\|^2 \ge |\mathbf{a}^T \mathbf{b}|^2,
\]

with equality if and only if there exists a \(c \in \mathbb{R}\) such that \(c\mathbf{a} = \mathbf{b}\).

Proof: Let

\[
\phi(t) \stackrel{\text{def}}{=} \|t\mathbf{a} + \mathbf{b}\|^2 = \|\mathbf{a}\|^2 t^2 + 2\mathbf{a}^T\mathbf{b}\, t + \|\mathbf{b}\|^2.
\]

Note that \(\phi(t)\) is a non-negative quadratic, of the form \(\phi(t) = At^2 + Bt + C \ge 0\). Thus, the discriminant satisfies

\[
B^2 - 4AC \le 0,
\]

with equality if and only if there exists a value of \(t \in \mathbb{R}\) for which \(\phi(t) = 0\). (Letting \(c = -t\) then implies that \(c\mathbf{a} = \mathbf{b}\).) More generally,

\[
4AC \ge B^2 \;\Rightarrow\; 4\|\mathbf{a}\|^2\|\mathbf{b}\|^2 \ge 4|\mathbf{a}^T\mathbf{b}|^2 \;\Rightarrow\; \|\mathbf{a}\|^2\|\mathbf{b}\|^2 \ge |\mathbf{a}^T\mathbf{b}|^2.
\]


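A brief numerical check of the lemma (a sketch; the vectors and the scalar c are arbitrary choices of mine):

    import numpy as np

    rng = np.random.default_rng(7)
    a, b = rng.standard_normal(6), rng.standard_normal(6)
    # General case: the inequality holds (strictly, almost surely)
    assert np.dot(a, a) * np.dot(b, b) >= np.dot(a, b) ** 2
    # Equality case: b = c a for some scalar c
    b = -2.5 * a
    assert np.isclose(np.dot(a, a) * np.dot(b, b), np.dot(a, b) ** 2)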

Convergence Proof: Lower Bound


Recall the setup: the normalized dichotomy \(\hat{X}'_m\), a solution vector \(\hat{\mathbf{w}}^\star\) with \(\hat{\mathbf{w}}^{\star T} \hat{\mathbf{x}}'_i > 0\) for \(i = 1, 2, \ldots, m\), and the sequence of weight vectors \(\hat{\mathbf{w}}(k)\) generated by the updates

\[
\hat{\mathbf{w}}(j) = \hat{\mathbf{w}}(j-1) + \eta\, \hat{\mathbf{x}}'(j), \qquad j = 1, 2, \ldots, k.
\]


Convergence Proof: Lower Bound (cont.)


\[
\begin{aligned}
\hat{\mathbf{w}}(1) &= \hat{\mathbf{w}}(0) + \eta\, \hat{\mathbf{x}}'(1) \\
\hat{\mathbf{w}}(2) &= \hat{\mathbf{w}}(1) + \eta\, \hat{\mathbf{x}}'(2) \\
&\;\;\vdots \\
\hat{\mathbf{w}}(k) &= \hat{\mathbf{w}}(k-1) + \eta\, \hat{\mathbf{x}}'(k)
\end{aligned}
\]

Adding the above k equations yields

\[
\hat{\mathbf{w}}(k) = \hat{\mathbf{w}}(0) + \eta\left( \hat{\mathbf{x}}'(1) + \hat{\mathbf{x}}'(2) + \cdots + \hat{\mathbf{x}}'(k) \right).
\]

Subtracting \(\hat{\mathbf{w}}(0)\) from both sides, and taking the inner product with respect to the hypothetical solution \(\hat{\mathbf{w}}^\star\), yields

\[
\hat{\mathbf{w}}^{\star T}\left( \hat{\mathbf{w}}(k) - \hat{\mathbf{w}}(0) \right) = \eta\, \hat{\mathbf{w}}^{\star T}\left( \hat{\mathbf{x}}'(1) + \hat{\mathbf{x}}'(2) + \cdots + \hat{\mathbf{x}}'(k) \right).
\]


Convergence Proof: Lower Bound (cont.)

\[
\hat{\mathbf{w}}^{\star T}\left( \hat{\mathbf{w}}(k) - \hat{\mathbf{w}}(0) \right) = \eta\, \hat{\mathbf{w}}^{\star T}\left( \hat{\mathbf{x}}'(1) + \hat{\mathbf{x}}'(2) + \cdots + \hat{\mathbf{x}}'(k) \right).
\]

Now define

\[
a = \min_{\hat{\mathbf{x}}' \in \hat{X}'_m} \hat{\mathbf{w}}^{\star T} \hat{\mathbf{x}}' > 0.
\]

Thus,

\[
\hat{\mathbf{w}}^{\star T}\left( \hat{\mathbf{w}}(k) - \hat{\mathbf{w}}(0) \right) \ge \eta a k > 0.
\]

Squaring both sides and applying the Cauchy-Schwarz inequality yields

\[
\|\hat{\mathbf{w}}^\star\|^2\, \|\hat{\mathbf{w}}(k) - \hat{\mathbf{w}}(0)\|^2 \ge \left| \hat{\mathbf{w}}^{\star T}\left( \hat{\mathbf{w}}(k) - \hat{\mathbf{w}}(0) \right) \right|^2 \ge (\eta a k)^2.
\]

Thus,

\[
\|\hat{\mathbf{w}}(k) - \hat{\mathbf{w}}(0)\|^2 \ge \left( \frac{\eta a}{\|\hat{\mathbf{w}}^\star\|} \right)^2 k^2.
\]


Convergence Proof: Upper Bound


To construct an upper bound on the growth of \(\|\hat{\mathbf{w}}(k) - \hat{\mathbf{w}}(0)\|^2\), we begin with the sequence of weight values generated by the perceptron algorithm:

\[
\begin{aligned}
\hat{\mathbf{w}}(1) &= \hat{\mathbf{w}}(0) + \eta\, \hat{\mathbf{x}}'(1) \\
\hat{\mathbf{w}}(2) &= \hat{\mathbf{w}}(1) + \eta\, \hat{\mathbf{x}}'(2) \\
&\;\;\vdots \\
\hat{\mathbf{w}}(k) &= \hat{\mathbf{w}}(k-1) + \eta\, \hat{\mathbf{x}}'(k)
\end{aligned}
\]

Now subtract \(\hat{\mathbf{w}}(0)\) from both sides:

\[
\begin{aligned}
\hat{\mathbf{w}}(1) - \hat{\mathbf{w}}(0) &= \eta\, \hat{\mathbf{x}}'(1) \\
\hat{\mathbf{w}}(2) - \hat{\mathbf{w}}(0) &= \hat{\mathbf{w}}(1) - \hat{\mathbf{w}}(0) + \eta\, \hat{\mathbf{x}}'(2) \\
&\;\;\vdots \\
\hat{\mathbf{w}}(k) - \hat{\mathbf{w}}(0) &= \hat{\mathbf{w}}(k-1) - \hat{\mathbf{w}}(0) + \eta\, \hat{\mathbf{x}}'(k)
\end{aligned}
\]


Convergence Proof: Upper Bound (cont.)


Squaring both sides yields

\[
\begin{aligned}
\|\hat{\mathbf{w}}(1) - \hat{\mathbf{w}}(0)\|^2 &= \eta^2 \|\hat{\mathbf{x}}'(1)\|^2 \\
\|\hat{\mathbf{w}}(2) - \hat{\mathbf{w}}(0)\|^2 &= \|\hat{\mathbf{w}}(1) - \hat{\mathbf{w}}(0)\|^2 + 2\eta \left( \hat{\mathbf{w}}(1) - \hat{\mathbf{w}}(0) \right)^T \hat{\mathbf{x}}'(2) + \eta^2 \|\hat{\mathbf{x}}'(2)\|^2 \\
&\;\;\vdots \\
\|\hat{\mathbf{w}}(k) - \hat{\mathbf{w}}(0)\|^2 &= \|\hat{\mathbf{w}}(k-1) - \hat{\mathbf{w}}(0)\|^2 + 2\eta \left( \hat{\mathbf{w}}(k-1) - \hat{\mathbf{w}}(0) \right)^T \hat{\mathbf{x}}'(k) + \eta^2 \|\hat{\mathbf{x}}'(k)\|^2
\end{aligned}
\]

Note that since \(\hat{\mathbf{x}}'(1)\) triggers the first weight update, it must have been misclassified by the weight vector \(\hat{\mathbf{w}}(0)\). Thus \(\hat{\mathbf{w}}(0)^T \hat{\mathbf{x}}'(1) < 0\). Similarly,

\[
\hat{\mathbf{w}}(j-1)^T \hat{\mathbf{x}}'(j) < 0 \quad \text{for } j = 1, 2, \ldots, k.
\]


Convergence Proof: Upper Bound (cont.)


Thus,

\[
\begin{aligned}
\|\hat{\mathbf{w}}(1) - \hat{\mathbf{w}}(0)\|^2 &= \eta^2 \|\hat{\mathbf{x}}'(1)\|^2 \\
\|\hat{\mathbf{w}}(2) - \hat{\mathbf{w}}(0)\|^2 &\le \|\hat{\mathbf{w}}(1) - \hat{\mathbf{w}}(0)\|^2 - 2\eta\, \hat{\mathbf{w}}(0)^T \hat{\mathbf{x}}'(2) + \eta^2 \|\hat{\mathbf{x}}'(2)\|^2 \\
&\;\;\vdots \\
\|\hat{\mathbf{w}}(k) - \hat{\mathbf{w}}(0)\|^2 &\le \|\hat{\mathbf{w}}(k-1) - \hat{\mathbf{w}}(0)\|^2 - 2\eta\, \hat{\mathbf{w}}(0)^T \hat{\mathbf{x}}'(k) + \eta^2 \|\hat{\mathbf{x}}'(k)\|^2
\end{aligned}
\]

Summing the k inequalities above yields

\[
\|\hat{\mathbf{w}}(k) - \hat{\mathbf{w}}(0)\|^2 \le \eta^2 \left( \|\hat{\mathbf{x}}'(1)\|^2 + \|\hat{\mathbf{x}}'(2)\|^2 + \cdots + \|\hat{\mathbf{x}}'(k)\|^2 \right) - 2\eta\, \hat{\mathbf{w}}(0)^T \left( \hat{\mathbf{x}}'(2) + \cdots + \hat{\mathbf{x}}'(k) \right).
\]


Convergence Proof: Upper Bound (cont.)


Now define

\[
M = \max_{\hat{\mathbf{x}}' \in \hat{X}'_m} \|\hat{\mathbf{x}}'\|^2,
\qquad \text{and} \qquad
\beta = 2\eta \min_{\hat{\mathbf{x}}' \in \hat{X}'_m} \hat{\mathbf{w}}(0)^T \hat{\mathbf{x}}'.
\]

(Note that \(\beta < 0\), unless \(\hat{\mathbf{w}}(0)\) solves the dichotomy.) Whence,

\[
\|\hat{\mathbf{w}}(k) - \hat{\mathbf{w}}(0)\|^2 \le \eta^2 \left( \|\hat{\mathbf{x}}'(1)\|^2 + \|\hat{\mathbf{x}}'(2)\|^2 + \cdots + \|\hat{\mathbf{x}}'(k)\|^2 \right) - 2\eta\, \hat{\mathbf{w}}(0)^T \left( \hat{\mathbf{x}}'(2) + \cdots + \hat{\mathbf{x}}'(k) \right)
\]

becomes

\[
\|\hat{\mathbf{w}}(k) - \hat{\mathbf{w}}(0)\|^2 \le (\eta^2 M - \beta)\, k.
\]


Convergence Proof: Summary

Thus we have shown

\[
A k^2 \le \|\hat{\mathbf{w}}(k) - \hat{\mathbf{w}}(0)\|^2 \le B k,
\]

with

\[
A = \left( \frac{\eta a}{\|\hat{\mathbf{w}}^\star\|} \right)^2, \qquad \text{and} \qquad B = \eta^2 M - \beta.
\]

Thus,

\[
k_{\max} = \frac{B}{A} = \frac{\eta^2 M - \beta}{(\eta a)^2}\, \|\hat{\mathbf{w}}^\star\|^2.
\]

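The two bounds, and the resulting value of \(k_{\max}\), can be checked empirically. The following is a sketch under stated assumptions: the AND dichotomy as test data, a hand-picked solution \(\hat{\mathbf{w}}^\star = (-1.5, 1, 1)^T\), and helper names of my own invention.

    import numpy as np

    def check_bounds(X, labels, w_star, w0_star, eta=1.0, seed=0):
        """Run the normalized perceptron, asserting after every update that
        A k^2 <= ||w_hat(k) - w_hat(0)||^2 <= B k, with A, B as in the proof."""
        rng = np.random.default_rng(seed)
        m, n = X.shape
        X_prime = labels[:, None] * np.hstack([np.ones((m, 1)), X])  # x'_i = l_i (1, x_i)
        w_hat_star = np.concatenate(([w0_star], w_star))
        a = np.min(X_prime @ w_hat_star)
        assert a > 0, "w_star must solve the dichotomy"
        M = np.max(np.sum(X_prime ** 2, axis=1))          # M = max ||x'||^2
        w_hat = rng.standard_normal(n + 1)
        w_hat0 = w_hat.copy()
        beta = 2 * eta * np.min(X_prime @ w_hat0)
        A = (eta * a / np.linalg.norm(w_hat_star)) ** 2
        B = eta ** 2 * M - beta
        k, error_detected = 0, True
        while error_detected:
            error_detected = False
            for x_p in X_prime:
                if w_hat @ x_p < 0:
                    error_detected = True
                    k += 1
                    w_hat += eta * x_p
                    d2 = float(np.sum((w_hat - w_hat0) ** 2))
                    # small tolerance guards against floating-point edge cases
                    assert A * k ** 2 - 1e-9 <= d2 <= B * k + 1e-9
        print(f"converged after {k} updates; k_max = B/A = {B / A:.1f}")

    X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
    labels = np.array([-1, -1, -1, 1])
    check_bounds(X, labels, w_star=np.array([1.0, 1.0]), w0_star=-1.5)

In practice the observed update count is usually far smaller than \(k_{\max}\); the theorem only guarantees finiteness.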
