
Pattern Classification

All materials in these slides were taken from Pattern Classification (2nd ed.) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley & Sons, 2000, with the permission of the authors and the publisher.

Chapter 2 (Part 3)
Bayesian Decision Theory
(Sections 2.6 and 2.9)

• Discriminant Functions for the Normal Density

• Bayes Decision Theory – Discrete Features


Discriminant Functions for the Normal Density

• We saw that minimum-error-rate classification can be achieved using the discriminant function

$$g_i(x) = \ln p(x \mid \omega_i) + \ln P(\omega_i)$$

• Case of multivariate normal:

$$g_i(x) = -\frac{1}{2}(x - \mu_i)^t \Sigma_i^{-1}(x - \mu_i) - \frac{d}{2}\ln 2\pi - \frac{1}{2}\ln|\Sigma_i| + \ln P(\omega_i)$$
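As a quick illustration (not from the textbook), a minimal Python sketch of this discriminant, assuming hypothetical values for the mean `mu_i`, covariance `Sigma_i`, and prior `prior_i`:

```python
import numpy as np

def gaussian_discriminant(x, mu_i, Sigma_i, prior_i):
    """Evaluate g_i(x) for a multivariate normal class-conditional density."""
    d = len(mu_i)
    diff = x - mu_i
    Sigma_inv = np.linalg.inv(Sigma_i)
    return (-0.5 * diff @ Sigma_inv @ diff
            - 0.5 * d * np.log(2 * np.pi)
            - 0.5 * np.log(np.linalg.det(Sigma_i))
            + np.log(prior_i))

# Made-up numbers: the class with the largest g_i(x) is chosen.
x = np.array([0.5, 1.0])
print(gaussian_discriminant(x, np.array([0.0, 0.0]), np.eye(2), 0.5))
```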


• Case $\Sigma_i = \sigma^2 I$ ($I$ stands for the identity matrix)

$$g_i(x) = w_i^t x + w_{i0} \qquad \text{(linear discriminant function)}$$

where:

$$w_i = \frac{\mu_i}{\sigma^2}; \qquad w_{i0} = -\frac{1}{2\sigma^2}\mu_i^t \mu_i + \ln P(\omega_i)$$

($w_{i0}$ is called the threshold for the i-th category!)
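A minimal sketch of these weights in Python, assuming made-up values for the mean, variance, and prior (none appear in the slides):

```python
import numpy as np

def linear_weights(mu_i, sigma2, prior_i):
    """Weight vector w_i and threshold w_i0 for the case Sigma_i = sigma^2 * I."""
    w_i = mu_i / sigma2
    w_i0 = -(mu_i @ mu_i) / (2 * sigma2) + np.log(prior_i)
    return w_i, w_i0

mu = np.array([1.0, 2.0])                  # hypothetical class mean
w, w0 = linear_weights(mu, sigma2=1.0, prior_i=0.5)
x = np.array([0.5, 1.5])
print(w @ x + w0)                          # g_i(x), linear in x
```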


• A classifier that uses linear discriminant functions is called "a linear machine"

• The decision surfaces for a linear machine are pieces of hyperplanes defined by:

$$g_i(x) = g_j(x)$$
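A linear machine can be sketched as follows; the weight vectors and thresholds below are hypothetical placeholders, not values from the text:

```python
import numpy as np

W = np.array([[1.0, 0.0],
              [0.0, 1.0]])     # rows are hypothetical weight vectors w_i
w0 = np.array([-0.5, -1.0])    # hypothetical thresholds w_i0

def linear_machine(x):
    """Assign x to the category whose linear discriminant g_i(x) is largest."""
    return int(np.argmax(W @ x + w0))

print(linear_machine(np.array([0.8, 0.3])))   # -> 0 (first category)
```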


• The hyperplane separating $R_i$ and $R_j$,

$$x_0 = \frac{1}{2}(\mu_i + \mu_j) - \frac{\sigma^2}{\lVert \mu_i - \mu_j \rVert^2}\,\ln\frac{P(\omega_i)}{P(\omega_j)}\,(\mu_i - \mu_j),$$

is always orthogonal to the line linking the means!

$$\text{If } P(\omega_i) = P(\omega_j), \text{ then } x_0 = \frac{1}{2}(\mu_i + \mu_j).$$
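A small numerical check of this formula (illustrative values only) shows that x0 is the midpoint when the priors are equal and shifts away from the more probable class otherwise:

```python
import numpy as np

def boundary_point(mu_i, mu_j, sigma2, prior_i, prior_j):
    """Point x0 on the separating hyperplane for the case Sigma_i = sigma^2 * I."""
    diff = mu_i - mu_j
    shift = sigma2 / (diff @ diff) * np.log(prior_i / prior_j)
    return 0.5 * (mu_i + mu_j) - shift * diff

mu_i, mu_j = np.array([2.0, 0.0]), np.array([0.0, 0.0])
print(boundary_point(mu_i, mu_j, 1.0, 0.5, 0.5))   # equal priors -> midpoint [1, 0]
print(boundary_point(mu_i, mu_j, 1.0, 0.8, 0.2))   # x0 moves toward the less likely class's mean
```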


• Case $\Sigma_i = \Sigma$ (the covariance matrices of all classes are identical but otherwise arbitrary!)

• Hyperplane separating $R_i$ and $R_j$:

$$x_0 = \frac{1}{2}(\mu_i + \mu_j) - \frac{\ln\left[P(\omega_i)/P(\omega_j)\right]}{(\mu_i - \mu_j)^t \Sigma^{-1} (\mu_i - \mu_j)}\,(\mu_i - \mu_j)$$

(the hyperplane separating $R_i$ and $R_j$ is generally not orthogonal to the line between the means!)
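As a sketch under made-up values, the only change from the previous case is that the squared Euclidean distance in the denominator is replaced by the term $(\mu_i - \mu_j)^t \Sigma^{-1} (\mu_i - \mu_j)$:

```python
import numpy as np

def boundary_point_shared_cov(mu_i, mu_j, Sigma, prior_i, prior_j):
    """x0 for the case Sigma_i = Sigma (shared, arbitrary covariance)."""
    diff = mu_i - mu_j
    denom = diff @ np.linalg.inv(Sigma) @ diff
    return 0.5 * (mu_i + mu_j) - (np.log(prior_i / prior_j) / denom) * diff

Sigma = np.array([[2.0, 0.3],
                  [0.3, 1.0]])            # hypothetical shared covariance
print(boundary_point_shared_cov(np.array([1.0, 1.0]), np.array([-1.0, 0.0]),
                                Sigma, 0.6, 0.4))
```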


• Case $\Sigma_i$ = arbitrary

• The covariance matrices are different for each category:

$$g_i(x) = x^t W_i x + w_i^t x + w_{i0}$$

where:

$$W_i = -\frac{1}{2}\Sigma_i^{-1}, \qquad w_i = \Sigma_i^{-1}\mu_i, \qquad w_{i0} = -\frac{1}{2}\mu_i^t \Sigma_i^{-1}\mu_i - \frac{1}{2}\ln|\Sigma_i| + \ln P(\omega_i)$$

(The decision surfaces are hyperquadrics: hyperplanes, pairs of hyperplanes, hyperspheres, hyperellipsoids, hyperparaboloids, or hyperhyperboloids.)
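A minimal Python sketch of this quadratic discriminant, with made-up mean, covariance, and prior values:

```python
import numpy as np

def quadratic_discriminant(x, mu_i, Sigma_i, prior_i):
    """g_i(x) = x^t W_i x + w_i^t x + w_i0 for an arbitrary covariance Sigma_i."""
    Sigma_inv = np.linalg.inv(Sigma_i)
    W_i = -0.5 * Sigma_inv
    w_i = Sigma_inv @ mu_i
    w_i0 = (-0.5 * mu_i @ Sigma_inv @ mu_i
            - 0.5 * np.log(np.linalg.det(Sigma_i))
            + np.log(prior_i))
    return x @ W_i @ x + w_i @ x + w_i0

x = np.array([1.0, -0.5])
print(quadratic_discriminant(x, np.array([0.0, 0.0]),
                             np.array([[1.0, 0.2],
                                       [0.2, 2.0]]), 0.5))
```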


Bayes Decision Theory – Discrete Features

• Components of x are binary or integer valued; x can take only one of m discrete values v_1, v_2, ..., v_m

• Case of independent binary features in the 2-category problem:

Let x = [x_1, x_2, ..., x_d]^t, where each x_i is either 0 or 1, with probabilities:

p_i = P(x_i = 1 | ω_1)
q_i = P(x_i = 1 | ω_2)


• The discriminant function in this case is:

$$g(x) = \sum_{i=1}^{d} w_i x_i + w_0$$

where:

$$w_i = \ln\frac{p_i(1 - q_i)}{q_i(1 - p_i)}, \qquad i = 1, \ldots, d$$

and:

$$w_0 = \sum_{i=1}^{d} \ln\frac{1 - p_i}{1 - q_i} + \ln\frac{P(\omega_1)}{P(\omega_2)}$$

Decide $\omega_1$ if $g(x) > 0$ and $\omega_2$ if $g(x) \le 0$.
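The rule can be sketched in Python as below; the feature probabilities p_i, q_i and the priors are made-up illustration values:

```python
import numpy as np

def binary_feature_discriminant(x, p, q, prior1, prior2):
    """g(x) for d independent binary features; decide omega_1 when g(x) > 0."""
    w = np.log(p * (1 - q) / (q * (1 - p)))                  # per-feature weights w_i
    w0 = np.sum(np.log((1 - p) / (1 - q))) + np.log(prior1 / prior2)
    return x @ w + w0

p = np.array([0.8, 0.6, 0.7])   # hypothetical P(x_i = 1 | omega_1)
q = np.array([0.3, 0.4, 0.2])   # hypothetical P(x_i = 1 | omega_2)
x = np.array([1, 0, 1])
g = binary_feature_discriminant(x, p, q, 0.5, 0.5)
print("omega_1" if g > 0 else "omega_2", g)
```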