
CSE 473

Pattern Recognition

Instructor:
Dr. Md. Monirul Islam
Linear Classifier

Recall from Lecture 1

[Figure: a line in the (x1, x2) plane defined by
x1w1 + x2w2 + w0 = 0, i.e., w^T x + w0 = 0,
where x = [x1, x2] and w = [w1, w2]]
Linear discriminant functions and
decision surfaces
• Definition

Let a pattern vector x = [x1, x2, x3, …]
and a weight vector w = [w1, w2, w3, …]

A discriminant function:
g(x) = x1w1 + x2w2 + x3w3 + … + w0
OR
g(x) = w^T x + w0        (1)
where w is the weight vector and w0 the bias
Linear discriminant functions and
decision surfaces
• Classify a new pattern x as follows:
Decide class 1 if g(x) > 0
and class 2 if g(x) < 0
If g(x) = 0, x may be assigned to either class
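A minimal sketch of this decision rule in Python (the function names and the example weights are illustrative assumptions, not taken from the slides):

import numpy as np

def g(x, w, w0):
    # Linear discriminant g(x) = w^T x + w0
    return np.dot(w, x) + w0

def classify(x, w, w0):
    # Decide class 1 if g(x) > 0, class 2 if g(x) < 0;
    # g(x) = 0 may go to either class (here it goes to class 1)
    return 1 if g(x, w, w0) >= 0 else 2

# Example: the line 1*x1 + 1*x2 - 0.5 = 0
w, w0 = np.array([1.0, 1.0]), -0.5
print(classify(np.array([0.8, 0.3]), w, w0))   # g = 0.6 > 0  -> class 1
print(classify(np.array([0.1, 0.1]), w, w0))   # g = -0.3 < 0 -> class 2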
Linear discriminant functions and
decision surfaces

– The equation g(x) = 0 is the decision surface that
separates the patterns

– When g(x) is linear, the decision surface is a
hyperplane

A little bit mathematics
• The Problem: Consider a two-class task with ω1, ω2

– the classifier is

  g(x) = w1x1 + w2x2 + … + wl xl + w0 = 0
       = w^T x + w0 = 0

[Figure: the decision line w^T x + w0 = 0 in the (x1, x2) plane]
A little bit mathematics

– Assume x1 and x2 are two points on the decision hyperplane. Then

  w^T x1 + w0 = 0 = w^T x2 + w0

  ⟹ w^T (x1 − x2) = 0  ⟹ w ⊥ (x1 − x2)

– x1 − x2 lies on the decision hyperplane, so w is
perpendicular to the decision hyperplane

[Figure: two points x1, x2 on the hyperplane; the difference vector x1 − x2 lies in the hyperplane and w is normal to it]
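A quick numerical check of this orthogonality, using the illustrative line x1 + x2 − 0.5 = 0 (the specific numbers are assumptions, not from the slides):

import numpy as np

w, w0 = np.array([1.0, 1.0]), -0.5                     # the line x1 + x2 - 0.5 = 0
x1, x2 = np.array([0.5, 0.0]), np.array([0.0, 0.5])    # two points on that line

print(np.dot(w, x1) + w0, np.dot(w, x2) + w0)   # 0.0 0.0: both points lie on the line
print(np.dot(w, x1 - x2))                       # 0.0: w is perpendicular to x1 - x2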
A little bit mathematics

– For a 2-d feature representation

  g(x) = w1x1 + w2x2 + w0 = 0

[Figure: the decision line in the (x1, x2) plane; its distance from the origin is
d = |w0| / sqrt(w1^2 + w2^2), and the distance of a point x from the line is
z = |g(x)| / sqrt(w1^2 + w2^2)]
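The two distances in the figure can be computed directly; a small sketch (the sample values are illustrative):

import numpy as np

def distances(x, w, w0):
    # d = |w0| / ||w||   : distance of the hyperplane from the origin
    # z = |g(x)| / ||w|| : distance of the point x from the hyperplane
    norm_w = np.linalg.norm(w)               # sqrt(w1^2 + w2^2) in 2-d
    d = abs(w0) / norm_w
    z = abs(np.dot(w, x) + w0) / norm_w
    return d, z

w, w0 = np.array([1.0, 1.0]), -0.5
print(distances(np.array([1.0, 1.0]), w, w0))   # (0.3535..., 1.0606...)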
Learn the Classifier

– Assume linearly separable classes, i.e.,

  ∃ w* : w*^T x > 0  ∀ x ∈ ω1
         w*^T x < 0  ∀ x ∈ ω2

[Figure: two linearly separable classes ω1 and ω2 in the (x1, x2) plane, separated by the line w*^T x = 0]
Learn the Classifier

– The case w*^T x + w0* falls under the above formulation, since

  • w' = [w*; w0*],  x' = [x; 1]

  • w*^T x + w0* = w'^T x'
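A small sketch of this augmentation trick (names and numbers are illustrative):

import numpy as np

def augment(w_star, w0, x):
    # Fold the bias into the weight vector: w' = [w*, w0], x' = [x, 1],
    # so that w*^T x + w0 equals w'^T x'
    return np.append(w_star, w0), np.append(x, 1.0)

w_star, w0, x = np.array([1.0, 1.0]), -0.5, np.array([0.8, 0.3])
w_prime, x_prime = augment(w_star, w0, x)
print(np.dot(w_star, x) + w0, np.dot(w_prime, x_prime))   # both 0.6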
– Our goal: Compute a solution, i.e., a hyperplane w,
so that
  w^T x > 0  ∀ x ∈ ω1
  w^T x < 0  ∀ x ∈ ω2

• The steps
– Define a cost function to be minimized
– Choose an algorithm to minimize the cost function
– The minimum corresponds to a solution
– The Cost Function

  J(w) = Σ_{x∈Y} (δx w^T x)

• Where Y is the subset of the training vectors wrongly classified
by w.

• δx = −1 if x ∈ Y and x ∈ ω1
  δx = +1 if x ∈ Y and x ∈ ω2

• When Y = ∅ (the empty set) a solution is achieved and
  J(w) = 0
otherwise
  J(w) > 0
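A sketch of this cost function in Python, assuming augmented sample vectors [x1, …, xl, 1] and labels 1/2 for ω1/ω2 (an illustrative convention, not from the slides):

import numpy as np

def perceptron_cost(w, X, labels):
    # J(w) = sum over the misclassified set Y of delta_x * w^T x,
    # with delta_x = -1 for omega_1 samples and +1 for omega_2 samples.
    # A sample is misclassified exactly when delta_x * w^T x > 0, so every
    # term is positive and J(w) = 0 only when Y is empty.
    J = 0.0
    for x, label in zip(X, labels):          # label 1 -> omega_1, label 2 -> omega_2
        delta = -1.0 if label == 1 else 1.0
        term = delta * np.dot(w, x)
        if term > 0:                         # x belongs to the misclassified set Y
            J += term
    return J

X = np.array([[0.8, 0.3, 1.0], [0.1, 0.1, 1.0]])
labels = [1, 2]
print(perceptron_cost(np.array([1.0, 1.0, -0.5]), X, labels))    # 0.0: nothing misclassified
print(perceptron_cost(np.array([-1.0, -1.0, 0.5]), X, labels))   # 0.9: both samples misclassified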
• J(w) is piecewise linear (WHY?)

– The Algorithm
• The philosophy of gradient descent is adopted.

  w(new) = w(old) + Δw

  Δw = −μ ∂J(w)/∂w |_{w = w(old)}

• Wherever valid,

  ∂J(w)/∂w = ∂/∂w Σ_{x∈Y} (δx w^T x) = Σ_{x∈Y} δx x

• Therefore

  w(t + 1) = w(t) − ρt Σ_{x∈Y} δx x
This is the celebrated Perceptron Algorithm
– An example: for a single misclassified sample x ∈ ω1 (δx = −1),

  w(t + 1) = w(t) − ρt δx x
           = w(t) + ρt x

– The perceptron algorithm converges in a finite
number of iteration steps to a solution if the patterns are
linearly separable
– Example: At some stage t the perceptron algorithm
results in
  w1 = 1, w2 = 1, w0 = −0.5
The corresponding hyperplane is
  x1 + x2 − 0.5 = 0

[Figure: the hyperplane x1 + x2 − 0.5 = 0 with the training points of the two classes; the points (0.4, 0.05) and (−0.2, 0.75) are misclassified; ρ = 0.7]

With ρt = 0.7 the update becomes

  w(t + 1) = [1, 1, −0.5]^T − 0.7(−1)[0.4, 0.05, 1]^T − 0.7(+1)[−0.2, 0.75, 1]^T
           = [1.42, 0.51, −0.5]^T
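The arithmetic of this update can be checked directly; the sketch below reproduces the numbers with NumPy (the δx signs are read off from the update terms above):

import numpy as np

w_t = np.array([1.0, 1.0, -0.5])            # augmented weights [w1, w2, w0] at stage t
rho = 0.7

# The two misclassified augmented samples with their delta_x values
x_a, delta_a = np.array([0.4, 0.05, 1.0]), -1.0    # delta_x = -1 (an omega_1 sample)
x_b, delta_b = np.array([-0.2, 0.75, 1.0]), +1.0   # delta_x = +1 (an omega_2 sample)

# w(t+1) = w(t) - rho * sum over misclassified x of delta_x * x
w_next = w_t - rho * (delta_a * x_a + delta_b * x_b)
print(w_next)                               # [ 1.42  0.51 -0.5 ]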
The Perceptron

 wi's: the synapses or synaptic weights

 w0: the threshold

 This structure is called a perceptron or neuron:

 a learning machine that learns from the training
vectors
The Perceptron Algorithm
– Use the training file to learn w
– Steps
  – Initialize w
  – Classify all training samples using the current w
  – Find the new w using

      w(t + 1) = w(t) − ρt Σ_{x∈Y} δx x

  – Repeat until w converges
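A minimal batch implementation of these steps in Python (the data, the zero initialization, and ρt fixed at 0.7 are illustrative assumptions):

import numpy as np

def train_perceptron(X, labels, rho=0.7, max_iters=1000):
    # Batch perceptron: initialize w, find the misclassified set Y under the
    # current w, apply w(t+1) = w(t) - rho * sum_{x in Y} delta_x * x,
    # and repeat until Y is empty (or max_iters is reached).
    # X holds augmented vectors [x1, ..., xl, 1]; labels are 1 (omega_1) or 2 (omega_2).
    deltas = np.where(np.asarray(labels) == 1, -1.0, 1.0)
    w = np.zeros(X.shape[1])                     # initialize w
    for _ in range(max_iters):
        margins = deltas * (X @ w)
        Y = margins >= 0                         # misclassified set (g(x) = 0 counted as wrong)
        if not Y.any():
            break                                # converged: every sample classified correctly
        w = w - rho * (deltas[Y][:, None] * X[Y]).sum(axis=0)
    return w

# Two small linearly separable classes (illustrative data)
X = np.array([[0.8, 0.3, 1.0], [0.9, 0.8, 1.0], [0.1, 0.1, 1.0], [0.2, 0.3, 1.0]])
print(train_perceptron(X, labels=[1, 1, 2, 2]))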
Variants of Perceptron Algorithm (1)

  w(t + 1) = w(t) + ρ x(t),  if w^T(t) x(t) ≤ 0 and x(t) ∈ ω1    [update]

  w(t + 1) = w(t) − ρ x(t),  if w^T(t) x(t) ≥ 0 and x(t) ∈ ω2    [update]

  w(t + 1) = w(t)            otherwise                           [no update]

– It is a reward and punishment type of algorithm
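A minimal sketch of one step of this reward-and-punishment rule (names and the fixed ρ = 0.7 are illustrative):

import numpy as np

def reward_punishment_step(w, x, label, rho=0.7):
    # x is an augmented sample, label is 1 (omega_1) or 2 (omega_2).
    s = np.dot(w, x)
    if label == 1 and s <= 0:
        return w + rho * x      # update: an omega_1 sample on the wrong side
    if label == 2 and s >= 0:
        return w - rho * x      # update: an omega_2 sample on the wrong side
    return w                    # no update: the sample is classified correctly

w = np.zeros(3)
for x, label in [(np.array([0.8, 0.3, 1.0]), 1), (np.array([0.1, 0.1, 1.0]), 2)]:
    w = reward_punishment_step(w, x, label)
print(w)                        # [0.49 0.14 0.  ]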
Variants of Perceptron Algorithm (2)

 Initialize the weight vector w(0)

 Define the pocket weights ws and their history hs

 Generate the next w(t + 1); if it is better than the currently
stored weights, store w(t + 1) in ws and update hs

– This is the pocket algorithm
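A rough sketch of the pocket idea in Python; the scoring rule (count of correctly classified training samples) and the batch-style update are assumptions layered on top of the outline above:

import numpy as np

def pocket_algorithm(X, labels, rho=0.7, max_iters=100):
    # Run perceptron-style updates, but keep in the pocket (ws, hs) the weight
    # vector that has classified the most training samples correctly so far.
    # Useful when the classes are not perfectly linearly separable.
    deltas = np.where(np.asarray(labels) == 1, -1.0, 1.0)
    w = np.zeros(X.shape[1])                     # current weights w(t)
    ws, hs = w.copy(), -1                        # pocket weights and their score
    for _ in range(max_iters):
        correct = (deltas * (X @ w)) < 0
        if correct.sum() > hs:                   # w(t) beats the pocket: store it
            ws, hs = w.copy(), int(correct.sum())
        if correct.all():
            break
        Y = ~correct                             # misclassified set
        w = w - rho * (deltas[Y][:, None] * X[Y]).sum(axis=0)
    return ws

# Overlapping classes (one label deliberately flipped), illustrative data
X = np.array([[0.8, 0.3, 1.0], [0.9, 0.8, 1.0], [0.1, 0.1, 1.0], [0.2, 0.3, 1.0]])
print(pocket_algorithm(X, labels=[1, 2, 2, 2]))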
Generalization of
Perceptron Algorithm
for M-Class case
