
CSE 473

Pattern Recognition

Instructor:
Dr. Md. Monirul Islam
Linear Classifier

Recall from Lecture 1

[Figure: a line in the (x1, x2) plane defined by
x1w1 + x2w2 + w0 = 0, i.e., w^T x + w0 = 0,
where x = [x1, x2] and w = [w1, w2]]
Linear discriminant functions and
decision surfaces
• Definition

Let a pattern vector x = [x1, x2, x3, …]
and a weight vector w = [w1, w2, w3, …]

A discriminant function:
g(x) = x1w1 + x2w2 + x3w3 + … + w0
OR
g(x) = w^T x + w0        (1)
where w is the weight vector and w0 the bias
Linear discriminant functions and
decision surfaces
• Classify a new pattern x as follows:
Decide class 1 if g(x) > 0
and class 2 if g(x) < 0
If g(x) = 0, x may be assigned to either class
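A minimal sketch of this decision rule in Python (the function names and the example weights are illustrative assumptions, not taken from the slides):

import numpy as np

def g(x, w, w0):
    # Linear discriminant g(x) = w^T x + w0
    return np.dot(w, x) + w0

def classify(x, w, w0):
    # Decide class 1 if g(x) > 0, class 2 if g(x) < 0;
    # g(x) = 0 may go to either class (here it goes to class 1)
    return 1 if g(x, w, w0) >= 0 else 2

# Example: the line 1*x1 + 1*x2 - 0.5 = 0
w, w0 = np.array([1.0, 1.0]), -0.5
print(classify(np.array([0.8, 0.3]), w, w0))   # g = 0.6 > 0  -> class 1
print(classify(np.array([0.1, 0.1]), w, w0))   # g = -0.3 < 0 -> class 2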
Linear discriminant functions and
decision surfaces

– The equation g(x) = 0 is the decision surface that
separates the patterns

– When g(x) is linear, the decision surface is a
hyperplane

A little bit mathematics
• The Problem: Consider a two-class task with ω1, ω2

– the classifier is

  g(x) = w1x1 + w2x2 + … + wl xl + w0 = 0
       = w^T x + w0 = 0

[Figure: the decision line w^T x + w0 = 0 in the (x1, x2) plane]
A little bit mathematics

– Assume x1 and x2 are two points on the decision hyperplane. Then

  w^T x1 + w0 = 0 = w^T x2 + w0

  ⟹ w^T (x1 − x2) = 0  ⟹ w ⊥ (x1 − x2)

– x1 − x2 lies on the decision hyperplane, so w is
perpendicular to the decision hyperplane

[Figure: two points x1, x2 on the hyperplane; the difference vector x1 − x2 lies in the hyperplane and w is normal to it]
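A quick numerical check of this orthogonality, using the illustrative line x1 + x2 − 0.5 = 0 (the specific numbers are assumptions, not from the slides):

import numpy as np

w, w0 = np.array([1.0, 1.0]), -0.5                     # the line x1 + x2 - 0.5 = 0
x1, x2 = np.array([0.5, 0.0]), np.array([0.0, 0.5])    # two points on that line

print(np.dot(w, x1) + w0, np.dot(w, x2) + w0)   # 0.0 0.0: both points lie on the line
print(np.dot(w, x1 - x2))                       # 0.0: w is perpendicular to x1 - x2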
A little bit mathematics

– For a 2-d feature representation

  g(x) = w1x1 + w2x2 + w0 = 0

[Figure: the decision line in the (x1, x2) plane; its distance from the origin is
d = |w0| / sqrt(w1^2 + w2^2), and the distance of a point x from the line is
z = |g(x)| / sqrt(w1^2 + w2^2)]
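The two distances in the figure can be computed directly; a small sketch (the sample values are illustrative):

import numpy as np

def distances(x, w, w0):
    # d = |w0| / ||w||   : distance of the hyperplane from the origin
    # z = |g(x)| / ||w|| : distance of the point x from the hyperplane
    norm_w = np.linalg.norm(w)               # sqrt(w1^2 + w2^2) in 2-d
    d = abs(w0) / norm_w
    z = abs(np.dot(w, x) + w0) / norm_w
    return d, z

w, w0 = np.array([1.0, 1.0]), -0.5
print(distances(np.array([1.0, 1.0]), w, w0))   # (0.3535..., 1.0606...)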
Learn the Classifier

– Assume linearly separable classes, i.e.,

  ∃ w* : w*^T x > 0  ∀ x ∈ ω1
         w*^T x < 0  ∀ x ∈ ω2

[Figure: two linearly separable classes ω1 and ω2 in the (x1, x2) plane, separated by the line w*^T x = 0]
Learn the Classifier

– The case w*^T x + w0* falls under the above formulation, since

  • w' = [w*; w0*],  x' = [x; 1]

  • w*^T x + w0* = w'^T x'
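A small sketch of this augmentation trick (names and numbers are illustrative):

import numpy as np

def augment(w_star, w0, x):
    # Fold the bias into the weight vector: w' = [w*, w0], x' = [x, 1],
    # so that w*^T x + w0 equals w'^T x'
    return np.append(w_star, w0), np.append(x, 1.0)

w_star, w0, x = np.array([1.0, 1.0]), -0.5, np.array([0.8, 0.3])
w_prime, x_prime = augment(w_star, w0, x)
print(np.dot(w_star, x) + w0, np.dot(w_prime, x_prime))   # both 0.6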
– Our goal: Compute a solution, i.e., a hyperplane w,
so that
  w^T x > 0  ∀ x ∈ ω1
  w^T x < 0  ∀ x ∈ ω2

• The steps
– Define a cost function to be minimized
– Choose an algorithm to minimize the cost function
– The minimum corresponds to a solution
– The Cost Function

  J(w) = Σ_{x∈Y} (δx w^T x)

• Where Y is the subset of the training vectors wrongly classified
by w.

• δx = −1 if x ∈ Y and x ∈ ω1
  δx = +1 if x ∈ Y and x ∈ ω2

• When Y = ∅ (the empty set) a solution is achieved and
  J(w) = 0
otherwise
  J(w) > 0
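A sketch of this cost function in Python, assuming augmented sample vectors [x1, …, xl, 1] and labels 1/2 for ω1/ω2 (an illustrative convention, not from the slides):

import numpy as np

def perceptron_cost(w, X, labels):
    # J(w) = sum over the misclassified set Y of delta_x * w^T x,
    # with delta_x = -1 for omega_1 samples and +1 for omega_2 samples.
    # A sample is misclassified exactly when delta_x * w^T x > 0, so every
    # term is positive and J(w) = 0 only when Y is empty.
    J = 0.0
    for x, label in zip(X, labels):          # label 1 -> omega_1, label 2 -> omega_2
        delta = -1.0 if label == 1 else 1.0
        term = delta * np.dot(w, x)
        if term > 0:                         # x belongs to the misclassified set Y
            J += term
    return J

X = np.array([[0.8, 0.3, 1.0], [0.1, 0.1, 1.0]])
labels = [1, 2]
print(perceptron_cost(np.array([1.0, 1.0, -0.5]), X, labels))    # 0.0: nothing misclassified
print(perceptron_cost(np.array([-1.0, -1.0, 0.5]), X, labels))   # 0.9: both samples misclassified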
• J(w) is piecewise linear (WHY?)

– The Algorithm
• The philosophy of gradient descent is adopted.

  w(new) = w(old) + Δw

  Δw = −μ ∂J(w)/∂w |_{w = w(old)}

• Wherever valid,

  ∂J(w)/∂w = ∂/∂w Σ_{x∈Y} (δx w^T x) = Σ_{x∈Y} δx x

• Therefore

  w(t + 1) = w(t) − ρt Σ_{x∈Y} δx x
This is the celebrated Perceptron Algorithm
– An example: for a single misclassified sample x ∈ ω1 (δx = −1),

  w(t + 1) = w(t) − ρt δx x
           = w(t) + ρt x

– The perceptron algorithm converges in a finite
number of iteration steps to a solution if the patterns are
linearly separable
– Example: At some stage t the perceptron algorithm
results in
  w1 = 1, w2 = 1, w0 = −0.5
The corresponding hyperplane is
  x1 + x2 − 0.5 = 0

[Figure: the hyperplane x1 + x2 − 0.5 = 0 with the training points of the two classes; the points (0.4, 0.05) and (−0.2, 0.75) are misclassified; ρ = 0.7]

With ρt = 0.7 the update becomes

  w(t + 1) = [1, 1, −0.5]^T − 0.7(−1)[0.4, 0.05, 1]^T − 0.7(+1)[−0.2, 0.75, 1]^T
           = [1.42, 0.51, −0.5]^T
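The arithmetic of this update can be checked directly; the sketch below reproduces the numbers with NumPy (the δx signs are read off from the update terms above):

import numpy as np

w_t = np.array([1.0, 1.0, -0.5])            # augmented weights [w1, w2, w0] at stage t
rho = 0.7

# The two misclassified augmented samples with their delta_x values
x_a, delta_a = np.array([0.4, 0.05, 1.0]), -1.0    # delta_x = -1 (an omega_1 sample)
x_b, delta_b = np.array([-0.2, 0.75, 1.0]), +1.0   # delta_x = +1 (an omega_2 sample)

# w(t+1) = w(t) - rho * sum over misclassified x of delta_x * x
w_next = w_t - rho * (delta_a * x_a + delta_b * x_b)
print(w_next)                               # [ 1.42  0.51 -0.5 ]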
The Perceptron

 wi's: the synapses or synaptic weights

 w0: the threshold

 This structure is called a perceptron or neuron:

 a learning machine that learns from the training
vectors
The Perceptron Algorithm
– Use the training file to learn w
– Steps
  – Initialize w
  – Classify all training samples using the current w
  – Find the new w using

      w(t + 1) = w(t) − ρt Σ_{x∈Y} δx x

  – Repeat until w converges
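A minimal batch implementation of these steps in Python (the data, the zero initialization, and ρt fixed at 0.7 are illustrative assumptions):

import numpy as np

def train_perceptron(X, labels, rho=0.7, max_iters=1000):
    # Batch perceptron: initialize w, find the misclassified set Y under the
    # current w, apply w(t+1) = w(t) - rho * sum_{x in Y} delta_x * x,
    # and repeat until Y is empty (or max_iters is reached).
    # X holds augmented vectors [x1, ..., xl, 1]; labels are 1 (omega_1) or 2 (omega_2).
    deltas = np.where(np.asarray(labels) == 1, -1.0, 1.0)
    w = np.zeros(X.shape[1])                     # initialize w
    for _ in range(max_iters):
        margins = deltas * (X @ w)
        Y = margins >= 0                         # misclassified set (g(x) = 0 counted as wrong)
        if not Y.any():
            break                                # converged: every sample classified correctly
        w = w - rho * (deltas[Y][:, None] * X[Y]).sum(axis=0)
    return w

# Two small linearly separable classes (illustrative data)
X = np.array([[0.8, 0.3, 1.0], [0.9, 0.8, 1.0], [0.1, 0.1, 1.0], [0.2, 0.3, 1.0]])
print(train_perceptron(X, labels=[1, 1, 2, 2]))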
Variants of Perceptron Algorithm (1)

  w(t + 1) = w(t) + ρ x(t),  if w^T(t) x(t) ≤ 0 and x(t) ∈ ω1    [update]

  w(t + 1) = w(t) − ρ x(t),  if w^T(t) x(t) ≥ 0 and x(t) ∈ ω2    [update]

  w(t + 1) = w(t)            otherwise                           [no update]

– It is a reward and punishment type of algorithm
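A minimal sketch of one step of this reward-and-punishment rule (names and the fixed ρ = 0.7 are illustrative):

import numpy as np

def reward_punishment_step(w, x, label, rho=0.7):
    # x is an augmented sample, label is 1 (omega_1) or 2 (omega_2).
    s = np.dot(w, x)
    if label == 1 and s <= 0:
        return w + rho * x      # update: an omega_1 sample on the wrong side
    if label == 2 and s >= 0:
        return w - rho * x      # update: an omega_2 sample on the wrong side
    return w                    # no update: the sample is classified correctly

w = np.zeros(3)
for x, label in [(np.array([0.8, 0.3, 1.0]), 1), (np.array([0.1, 0.1, 1.0]), 2)]:
    w = reward_punishment_step(w, x, label)
print(w)                        # [0.49 0.14 0.  ]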
Variants of Perceptron Algorithm (2)

 Initialize the weight vector w(0)

 Define the pocket weights ws and their history hs

 Generate the next w(t + 1); if it is better than the currently
stored weights, store w(t + 1) in ws and update hs

– This is the pocket algorithm
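A rough sketch of the pocket idea in Python; the scoring rule (count of correctly classified training samples) and the batch-style update are assumptions layered on top of the outline above:

import numpy as np

def pocket_algorithm(X, labels, rho=0.7, max_iters=100):
    # Run perceptron-style updates, but keep in the pocket (ws, hs) the weight
    # vector that has classified the most training samples correctly so far.
    # Useful when the classes are not perfectly linearly separable.
    deltas = np.where(np.asarray(labels) == 1, -1.0, 1.0)
    w = np.zeros(X.shape[1])                     # current weights w(t)
    ws, hs = w.copy(), -1                        # pocket weights and their score
    for _ in range(max_iters):
        correct = (deltas * (X @ w)) < 0
        if correct.sum() > hs:                   # w(t) beats the pocket: store it
            ws, hs = w.copy(), int(correct.sum())
        if correct.all():
            break
        Y = ~correct                             # misclassified set
        w = w - rho * (deltas[Y][:, None] * X[Y]).sum(axis=0)
    return ws

# Overlapping classes (one label deliberately flipped), illustrative data
X = np.array([[0.8, 0.3, 1.0], [0.9, 0.8, 1.0], [0.1, 0.1, 1.0], [0.2, 0.3, 1.0]])
print(pocket_algorithm(X, labels=[1, 2, 2, 2]))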
Generalization of
Perceptron Algorithm
for M-Class case
