Pattern Recognition
Instructor:
Dr. Md. Monirul Islam
Linear Classifier
Recall from Lecture 1

[Figure: a straight line in the (x1, x2) plane]

x1·w1 + x2·w2 + w0 = 0
equivalently  w·x + w0 = 0,  i.e.  w^T x + w0 = 0
where x = [x1, x2]^T and w = [w1, w2]^T
Linear discriminant functions and decision surfaces

• Definition
  A discriminant function:
    g(x) = x1·w1 + x2·w2 + x3·w3 + …
  or, in vector form,
    g(x) = w^T x + w0        (1)
  where w is the weight vector and w0 is the bias.
• Classify a new pattern x as follows:
  decide class 1 if g(x) > 0,
  and class 2 if g(x) < 0.
  If g(x) = 0, x is assigned to either class.
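The decision rule above can be sketched as a small function (a minimal illustration; the particular weights in the example are the ones used later in the lecture's example, and assigning the boundary case g(x) = 0 to class 1 is an arbitrary choice):

```python
def classify(x, w, w0):
    """Decide class 1 if g(x) > 0, class 2 if g(x) < 0; g(x) = 0 may go either way."""
    g = sum(wi * xi for wi, xi in zip(w, x)) + w0   # g(x) = w^T x + w0
    if g > 0:
        return 1
    if g < 0:
        return 2
    return 1  # boundary case: assign to either class (class 1 chosen here)

# For the line x1 + x2 - 0.5 = 0:
print(classify([1.0, 1.0], [1.0, 1.0], -0.5))  # -> 1
print(classify([0.0, 0.0], [1.0, 1.0], -0.5))  # -> 2
```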
– The equation g(x) = 0 defines the decision surface that
  separates the patterns of the two classes.
A little bit of mathematics

• The Problem: Consider a two-class task with ω1, ω2
  – the classifier is
    g(x) = w1·x1 + w2·x2 + … + wl·xl + w0 = 0

[Figure: the line w^T x + w0 = 0 in the (x1, x2) plane,
 with w^T x + w0 > 0 on one side and w^T x + w0 < 0 on the other]
[Figures: geometric illustrations in the (x1, x2) plane, including the
 difference vector x1 – x2 between two points on the line]
g(x) = w1·x1 + w2·x2 + w0 = 0

[Figure: geometry of the line, with the weight vector w normal to it]
  distance of a point x from the line:   z = |g(x)| / sqrt(w1^2 + w2^2)
  distance of the line from the origin:  d = |w0| / sqrt(w1^2 + w2^2)
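The two distances on the slide can be computed directly (a small sketch; the weights correspond to the line x1 + x2 − 0.5 = 0 used later, and the test point is made up):

```python
import math

def distance_to_line(x, w, w0):
    """z = |g(x)| / sqrt(w1^2 + w2^2): distance of point x from w^T x + w0 = 0."""
    g = sum(wi * xi for wi, xi in zip(w, x)) + w0
    return abs(g) / math.sqrt(sum(wi * wi for wi in w))

def distance_from_origin(w, w0):
    """d = |w0| / sqrt(w1^2 + w2^2): distance of the line from the origin."""
    return abs(w0) / math.sqrt(sum(wi * wi for wi in w))

# For the line x1 + x2 - 0.5 = 0:
print(distance_from_origin([1.0, 1.0], -0.5))           # 0.5/sqrt(2), about 0.354
print(distance_to_line([1.0, 1.0], [1.0, 1.0], -0.5))   # 1.5/sqrt(2), about 1.061
```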
Learn the Classifier

[Figure: two point clouds ω1 and ω2 in the (x1, x2) plane,
 separated by the line w*^T x = 0]

Find w* such that:
  w*^T x > 0  ∀x ∈ ω1
  w*^T x < 0  ∀x ∈ ω2
– The case w*^T x + w*0 falls under the above formulation, since

  • w' ≡ [w*; w0],  x' ≡ [x; 1]

  so that w'^T x' = w*^T x + w0.
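This augmentation trick can be checked numerically (a minimal sketch; the point and weights are made up for illustration):

```python
def augment(x):
    """x' = [x, 1]: absorb the bias into the weight vector."""
    return list(x) + [1.0]

w, w0 = [1.0, 1.0], -0.5
x = [2.0, 3.0]
w_prime = w + [w0]                     # w' = [w, w0]
g_aug = sum(wi * xi for wi, xi in zip(w_prime, augment(x)))
g = sum(wi * xi for wi, xi in zip(w, x)) + w0
print(g_aug, g)                        # both 4.5: w'^T x' = w^T x + w0
```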
– The Cost Function
    J(w) = Σ_{x∈Y} (δx · w^T x)
  where Y is the set of training samples misclassified by w, and
  • δx = −1 if x ∈ Y and x ∈ ω1
    δx = +1 if x ∈ Y and x ∈ ω2
  • J(w) > 0 when Y is nonempty, and J(w) = 0 otherwise.
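The cost function can be written out as a short routine (a sketch; the sample vectors are made-up augmented points of the form [x1, x2, 1], and treating g = 0 as a misclassification on both sides is an assumption):

```python
def perceptron_cost(w, samples):
    """J(w) = sum over misclassified x of delta_x * w^T x, with augmented
    samples and delta_x = -1 for omega_1, +1 for omega_2. Always >= 0;
    zero exactly when no sample is misclassified."""
    J = 0.0
    for x, cls in samples:
        g = sum(wi * xi for wi, xi in zip(w, x))
        if cls == 1 and g <= 0:        # misclassified omega_1 sample: delta_x = -1
            J -= g
        elif cls == 2 and g >= 0:      # misclassified omega_2 sample: delta_x = +1
            J += g
    return J

# Made-up augmented samples [x1, x2, 1]:
data = [([1.0, 1.0, 1.0], 1), ([0.0, 0.0, 1.0], 2)]
print(perceptron_cost([1.0, 1.0, -0.5], data))     # 0.0: both correctly classified
print(perceptron_cost([-1.0, -1.0, 0.5], data))    # 2.0: both misclassified
```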
• J(w) is piecewise linear (WHY?)
– The Algorithm
  • The philosophy of gradient descent is adopted.
    w(new) = w(old) + Δw
    Δw = −ρ · ∂J(w)/∂w |_{w = w(old)}

• Wherever the gradient is valid,
    ∂J(w)/∂w = ∂/∂w Σ_{x∈Y} (δx · w^T x) = Σ_{x∈Y} δx · x

• Hence the update rule:
    w(t+1) = w(t) − ρt · Σ_{x∈Y} δx · x

This is the celebrated Perceptron Algorithm.
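The iteration above can be sketched as a short batch implementation (the toy data, the zero initialization, the fixed ρ, and the iteration cap are all assumptions; samples are augmented vectors [x1, x2, 1]):

```python
def perceptron(samples, rho=0.7, max_iter=100):
    """Batch perceptron: w(t+1) = w(t) - rho * sum over misclassified x
    of delta_x * x, with delta_x = -1 for omega_1 and +1 for omega_2."""
    dim = len(samples[0][0])
    w = [0.0] * dim
    for _ in range(max_iter):
        grad = [0.0] * dim
        for x, cls in samples:
            g = sum(wi * xi for wi, xi in zip(w, x))
            if cls == 1 and g <= 0:        # misclassified omega_1: delta_x = -1
                grad = [gi - xi for gi, xi in zip(grad, x)]
            elif cls == 2 and g >= 0:      # misclassified omega_2: delta_x = +1
                grad = [gi + xi for gi, xi in zip(grad, x)]
        if all(gi == 0.0 for gi in grad):  # Y is empty: converged
            break
        w = [wi - rho * gi for wi, gi in zip(w, grad)]
    return w

# Toy linearly separable data, augmented with a trailing 1:
data = [([2.0, 2.0, 1.0], 1), ([1.0, 2.5, 1.0], 1),
        ([-1.0, -1.0, 1.0], 2), ([-2.0, 0.0, 1.0], 2)]
w = perceptron(data)
for x, cls in data:
    g = sum(wi * xi for wi, xi in zip(w, x))
    assert (g > 0) == (cls == 1)           # every sample ends on the right side
```

Note that by the perceptron convergence theorem this loop terminates for linearly separable data; the `max_iter` cap only guards against non-separable input.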
– An example (a single misclassified sample x):
    w(t+1) = w(t) − ρt · δx · x
           = w(t) + ρt · x    (δx = −1, i.e. x ∈ ω1)
– Example: At some stage t the perceptron algorithm results in
    w1 = 1, w2 = 1, w0 = −0.5
  The corresponding hyperplane is
    x1 + x2 − 0.5 = 0

[Figure: the line x1 + x2 − 0.5 = 0 with '+' samples on either side, ρ = 0.7]

    w(t+1) = w(t) − ρ · Σ_{x∈Y} δx · x
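With the slide's weights w = (1, 1), w0 = −0.5 and ρ = 0.7, one correction step can be checked numerically (the misclassified point below is made up for illustration, since the original figure is lost):

```python
rho = 0.7
w = [1.0, 1.0, -0.5]                 # augmented weights (w1, w2, w0) from the slide
x = [-0.2, -0.2, 1.0]                # hypothetical misclassified omega_1 point
g = sum(wi * xi for wi, xi in zip(w, x))
print(g < 0)                          # True: g(x) = -0.9, wrong side for omega_1
# delta_x = -1 for omega_1, so w(t+1) = w(t) - rho * delta_x * x = w(t) + rho * x
w_new = [wi + rho * xi for wi, xi in zip(w, x)]
print([round(wi, 2) for wi in w_new])  # [0.86, 0.86, 0.2]
```

The updated line is tilted and shifted toward the misclassified point, which now lies closer to (or on) the correct side.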
Variants of Perceptron Algorithm (1)

  w(t+1) = w(t) + ρ·x(t),  if x(t) ∈ ω1 and w^T(t)·x(t) ≤ 0    (update)
  w(t+1) = w(t) − ρ·x(t),  if x(t) ∈ ω2 and w^T(t)·x(t) ≥ 0    (update)
  w(t+1) = w(t)            otherwise                            (no update)
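This online (pattern-by-pattern) variant can be sketched as follows (the epoch loop, zero initialization, and toy data are assumptions; samples are augmented vectors):

```python
def perceptron_online(samples, rho=1.0, epochs=100):
    """Reward-and-punishment perceptron: examine one sample x(t) at a time
    and update w only when x(t) is misclassified; otherwise leave w as is."""
    dim = len(samples[0][0])
    w = [0.0] * dim
    for _ in range(epochs):
        mistakes = 0
        for x, cls in samples:
            g = sum(wi * xi for wi, xi in zip(w, x))
            if cls == 1 and g <= 0:          # omega_1 sample on wrong side: add
                w = [wi + rho * xi for wi, xi in zip(w, x)]
                mistakes += 1
            elif cls == 2 and g >= 0:        # omega_2 sample on wrong side: subtract
                w = [wi - rho * xi for wi, xi in zip(w, x)]
                mistakes += 1
        if mistakes == 0:                    # a full pass with no update: done
            break
    return w

data = [([2.0, 2.0, 1.0], 1), ([1.0, 2.5, 1.0], 1),
        ([-1.0, -1.0, 1.0], 2), ([-2.0, 0.0, 1.0], 2)]
w = perceptron_online(data)
print(all((sum(wi * xi for wi, xi in zip(w, x)) > 0) == (cls == 1)
          for x, cls in data))              # True once a pass makes no mistake
```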
Variants of Perceptron Algorithm (2)

– The pocket algorithm: run the perceptron as above, but keep ("in the
  pocket") the weight vector that has made the fewest training errors so
  far. This is useful when the classes are not linearly separable and the
  plain perceptron would never converge.
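A sketch of the pocket variant, building on the online update above (the error-counting helper, epoch budget, and toy data are assumptions):

```python
def count_errors(w, samples):
    """Number of augmented samples misclassified by w."""
    return sum(1 for x, cls in samples
               if (cls == 1) != (sum(wi * xi for wi, xi in zip(w, x)) > 0))

def pocket(samples, rho=1.0, epochs=100):
    """Pocket algorithm: run the online perceptron, but remember ('pocket')
    the weight vector with the fewest training errors seen so far."""
    dim = len(samples[0][0])
    w = [0.0] * dim
    best_w, best_errs = list(w), count_errors(w, samples)
    for _ in range(epochs):
        for x, cls in samples:
            g = sum(wi * xi for wi, xi in zip(w, x))
            if cls == 1 and g <= 0:
                w = [wi + rho * xi for wi, xi in zip(w, x)]
            elif cls == 2 and g >= 0:
                w = [wi - rho * xi for wi, xi in zip(w, x)]
            errs = count_errors(w, samples)
            if errs < best_errs:             # better than anything seen so far
                best_w, best_errs = list(w), errs
    return best_w

data = [([2.0, 2.0, 1.0], 1), ([-1.0, -1.0, 1.0], 2)]
best = pocket(data, epochs=10)
print(count_errors(best, data))              # 0 on this separable toy set
```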
Generalization of Perceptron Algorithm for the M-Class Case