
Derivation of three different discriminant functions

for the multivariate Gaussian distribution

Compiled by Karthikeyan, CED16I015


Guided by
Dr Umarani Jayaraman

Department of Computer Science and Engineering


Indian Institute of Information Technology Design and Manufacturing
Kancheepuram

February 22, 2021

Introduction

We start with how this PDF actually influences the structure of the decision surface, because

g_i(X) = \ln P(X|\omega_i) + \ln P(\omega_i)

The purpose is to find the g_i(X) that is maximum among all the discriminant functions. For the multivariate Gaussian,

P(X|\omega_i) = \frac{1}{(2\pi)^{d/2} |\Sigma_i|^{1/2}} \exp\left[ -\frac{1}{2} (X - \mu_i)^t \Sigma_i^{-1} (X - \mu_i) \right]

so that

g_i(X) = -\frac{1}{2} (X - \mu_i)^t \Sigma_i^{-1} (X - \mu_i) - \frac{d}{2} \ln(2\pi) - \frac{1}{2} \ln|\Sigma_i| + \ln P(\omega_i)
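As a minimal sketch of how this discriminant can be evaluated directly (using NumPy, with illustrative numbers that are not from the slides):

```python
import numpy as np

def discriminant(x, mu, sigma, prior):
    """g_i(X) = ln P(X|w_i) + ln P(w_i) for a d-dimensional Gaussian class."""
    d = len(mu)
    diff = x - mu
    inv = np.linalg.inv(sigma)
    quad = -0.5 * diff @ inv @ diff            # -(1/2)(X-mu)^t Sigma^{-1} (X-mu)
    norm = -0.5 * d * np.log(2 * np.pi) - 0.5 * np.log(np.linalg.det(sigma))
    return quad + norm + np.log(prior)

# Assign x to the class with the largest g_i(X) (illustrative parameters)
x = np.array([1.0, 2.0])
mus = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
sigmas = [np.eye(2), np.eye(2)]
priors = [0.5, 0.5]
scores = [discriminant(x, m, s, p) for m, s, p in zip(mus, sigmas, priors)]
print("assigned class:", 1 + int(np.argmax(scores)))
```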

Normal Density and Discriminant function

g_i(X) = -\frac{1}{2} (X - \mu_i)^t \Sigma_i^{-1} (X - \mu_i) - \frac{d}{2} \ln(2\pi) - \frac{1}{2} \ln|\Sigma_i| + \ln P(\omega_i)

This is the discriminant function for the multivariate normal density.

This classifier can handle linearly non-separable classes.
When we draw a decision boundary between two classes \omega_i and \omega_j, the decision surface is, in general, a quadratic surface.
It is not a linear surface; however, for specific cases it reduces to a linear classifier.
Depending upon the covariance matrix \Sigma_i we get different cases of the discriminant function, namely Case 1, Case 2 and Case 3.

Case 1: Normal Density and Discriminant Function

Assumptions:
In every class, the samples form hyperspherical clusters of the same shape and size
The covariance matrix is of the form \sigma^2 I
\Sigma_i is the same for all classes, i = 1, 2, ..., c

Case 1: \Sigma_i = \sigma^2 I [I is the identity matrix]; given this,

Determinant: |\Sigma_i| = \sigma^{2d} (the product of the d diagonal values)

Inverse: \Sigma_i^{-1} = \frac{1}{\sigma^2} I
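Both facts are easy to verify numerically; a quick sketch with illustrative values:

```python
import numpy as np

sigma, d = 0.5, 3
S = sigma**2 * np.eye(d)                                     # Sigma_i = sigma^2 I
print(np.isclose(np.linalg.det(S), sigma**(2 * d)))          # |Sigma_i| = sigma^(2d)
print(np.allclose(np.linalg.inv(S), np.eye(d) / sigma**2))   # Sigma_i^{-1} = (1/sigma^2) I
```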

Case 1: Normal Density and Discriminant function

When the covariance matrix is the same for all classes \forall \omega_i:

g_i(X) = -\frac{1}{2} (X - \mu_i)^t \Sigma_i^{-1} (X - \mu_i) - \frac{d}{2} \ln(2\pi) - \frac{1}{2} \ln|\Sigma_i| + \ln P(\omega_i)

-\frac{d}{2} \ln(2\pi) ⇒ constant, independent of the class

-\frac{1}{2} \ln|\Sigma_i| ⇒ remains the same for all classes, hence ignored

Case 1: Normal Density and Discriminant functions

By substituting \Sigma_i^{-1} = \frac{1}{\sigma^2} I:

g_i(X) = -\frac{1}{2} (X - \mu_i)^t \Sigma_i^{-1} (X - \mu_i) + \ln P(\omega_i)
       = -\frac{1}{2\sigma^2} (X - \mu_i)^t (X - \mu_i) + \ln P(\omega_i)
       = -\frac{1}{2\sigma^2} \|X - \mu_i\|^2 + \ln P(\omega_i)

Case 1: Normal Density and Discriminant functions

g_i(X) = -\frac{1}{2\sigma^2} \|X - \mu_i\|^2 + \ln P(\omega_i)

If P(\omega_i) = P(\omega_j) for all i, j = 1, 2, ..., c (equal priors), then

g_i(X) = -\frac{1}{2\sigma^2} \|X - \mu_i\|^2 ⇒ squared Euclidean distance

Because of the negative sign, g_i(X) is maximum for the class whose mean is nearest to X.

Regardless of whether the prior probabilities are equal or not, it is not actually necessary to compute distances: expanding the quadratic form yields a linear function of X, as the next slide shows.
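A minimal sketch of this Case 1 rule (NumPy, illustrative numbers); with equal priors it reduces to assigning X to the nearest mean:

```python
import numpy as np

def g(x, mu, sigma_sq, prior):
    # g_i(X) = -||X - mu_i||^2 / (2 sigma^2) + ln P(w_i)
    return -np.sum((x - mu) ** 2) / (2 * sigma_sq) + np.log(prior)

x = np.array([1.0, 1.0])
mus = [np.array([0.0, 0.0]), np.array([4.0, 4.0])]
sigma_sq = 1.0
priors = [0.5, 0.5]
scores = [g(x, m, sigma_sq, p) for m, p in zip(mus, priors)]
print("assigned class:", 1 + int(np.argmax(scores)))   # nearest mean wins
```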

Case 1: Normal Density and Discriminant functions

Expansion of the quadratic form yields:

g_i(X) = -\frac{1}{2\sigma^2} (X - \mu_i)^t (X - \mu_i) + \ln P(\omega_i)
       = -\frac{1}{2\sigma^2} [X^t X - X^t \mu_i - \mu_i^t X + \mu_i^t \mu_i] + \ln P(\omega_i)

X^t X is constant and the same for all i, so it can be dropped from g_i(X).
X^t \mu_i = \mu_i^t X (see the next slide for a numerical check).

g_i(X) = -\frac{1}{2\sigma^2} [-\mu_i^t X - \mu_i^t X + \mu_i^t \mu_i] + \ln P(\omega_i)
       = -\frac{1}{2\sigma^2} [-2\mu_i^t X + \mu_i^t \mu_i] + \ln P(\omega_i)

Case 1: Normal Density and Discriminant functions

   
X = \begin{pmatrix} 2 \\ 3 \\ 1 \end{pmatrix} \quad \mu = \begin{pmatrix} 2 \\ 1 \\ 1 \end{pmatrix}

X^t \mu = \begin{pmatrix} 2 & 3 & 1 \end{pmatrix} \begin{pmatrix} 2 \\ 1 \\ 1 \end{pmatrix} = 2 \cdot 2 + 3 \cdot 1 + 1 \cdot 1 = 8

\mu^t X = \begin{pmatrix} 2 & 1 & 1 \end{pmatrix} \begin{pmatrix} 2 \\ 3 \\ 1 \end{pmatrix} = 2 \cdot 2 + 1 \cdot 3 + 1 \cdot 1 = 8

Hence X^t \mu_i = \mu_i^t X.
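The same check takes two lines of NumPy (a scalar equals its own transpose):

```python
import numpy as np

X, mu = np.array([2, 3, 1]), np.array([2, 1, 1])
print(X @ mu, mu @ X)   # both print 8
```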

Case 1: Normal Density and Discriminant functions

g_i(X) = -\frac{1}{2\sigma^2} [-2\mu_i^t X + \mu_i^t \mu_i] + \ln P(\omega_i)
       = \frac{\mu_i^t X}{\sigma^2} - \frac{1}{2\sigma^2} \mu_i^t \mu_i + \ln P(\omega_i)

With W_i = \frac{\mu_i}{\sigma^2}, this gives

g_i(X) = W_i^t X + W_{i0} ⇒ a linear equation, i.e. a linear machine

where

W_i = \frac{1}{\sigma^2} \mu_i

W_{i0} = -\frac{1}{2\sigma^2} \mu_i^t \mu_i + \ln P(\omega_i)
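A small sketch of this linear machine (NumPy, illustrative parameters):

```python
import numpy as np

def case1_weights(mu, sigma_sq, prior):
    """W_i = mu_i / sigma^2;  W_i0 = -mu_i^t mu_i / (2 sigma^2) + ln P(w_i)."""
    W = mu / sigma_sq
    W0 = -(mu @ mu) / (2 * sigma_sq) + np.log(prior)
    return W, W0

mu_i, sigma_sq, prior = np.array([2.0, 1.0]), 1.0, 0.5
W, W0 = case1_weights(mu_i, sigma_sq, prior)
x = np.array([1.0, 1.0])
print("g_i(x) =", W @ x + W0)   # linear machine: g_i(X) = W_i^t X + W_i0
```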

Case 1: Normal Density and Discriminant functions

The discriminant function for an individual class, the i-th class, is given by

g_i(X) = W_i^t X + W_{i0}

If we want to find the decision boundary between two different classes \omega_i and \omega_j, then let us understand:
what will be the nature of the decision boundary that separates the two classes \omega_i and \omega_j?

Case 1: Normal Density and Discriminant functions

g_i(X) = g_j(X) is the decision boundary

g_i(X) = W_i^t X + W_{i0}
g_j(X) = W_j^t X + W_{j0}

g(X) = g_i(X) - g_j(X) = 0 is the equation of the decision boundary:

g(X) = W_i^t X + W_{i0} - W_j^t X - W_{j0} = 0
g(X) = (W_i - W_j)^t X + W_{i0} - W_{j0} = 0

Case 1: Normal Density and Discriminant functions

g(X) = (W_i - W_j)^t X + W_{i0} - W_{j0} = 0

\frac{1}{\sigma^2} (\mu_i - \mu_j)^t X - \frac{\mu_i^t \mu_i}{2\sigma^2} + \ln P(\omega_i) + \frac{\mu_j^t \mu_j}{2\sigma^2} - \ln P(\omega_j) = 0

\frac{1}{\sigma^2} (\mu_i - \mu_j)^t X - \frac{1}{2\sigma^2} (\mu_i^t \mu_i - \mu_j^t \mu_j) + \ln P(\omega_i) - \ln P(\omega_j) = 0

\frac{1}{\sigma^2} (\mu_i - \mu_j)^t X - \frac{1}{2\sigma^2} (\mu_i^t \mu_i - \mu_j^t \mu_j) + \ln \frac{P(\omega_i)}{P(\omega_j)} = 0

Multiplying by \sigma^2 and using \mu_i^t \mu_i - \mu_j^t \mu_j = (\mu_i - \mu_j)^t (\mu_i + \mu_j):

(\mu_i - \mu_j)^t X - \frac{1}{2} (\mu_i - \mu_j)^t (\mu_i + \mu_j) + \sigma^2 \ln \frac{P(\omega_i)}{P(\omega_j)} = 0

Case 1: Normal Density and Discriminant functions

(\mu_i - \mu_j)^t X - \frac{1}{2} (\mu_i - \mu_j)^t (\mu_i + \mu_j) + \sigma^2 \ln \frac{P(\omega_i)}{P(\omega_j)} = 0

Taking (\mu_i - \mu_j)^t out:

(\mu_i - \mu_j)^t \left[ X - \left\{ \frac{1}{2} (\mu_i + \mu_j) - \frac{\sigma^2}{(\mu_i - \mu_j)^t (\mu_i - \mu_j)} \ln \frac{P(\omega_i)}{P(\omega_j)} (\mu_i - \mu_j) \right\} \right] = 0

W^t [X - X_0] = 0

where

W = \mu_i - \mu_j

X_0 = \frac{1}{2} (\mu_i + \mu_j) - \frac{\sigma^2}{\|\mu_i - \mu_j\|^2} \ln \frac{P(\omega_i)}{P(\omega_j)} (\mu_i - \mu_j)
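A sketch of these two formulas in NumPy (illustrative means); note how unequal priors shift X_0 along the line joining the means:

```python
import numpy as np

def case1_boundary(mu_i, mu_j, sigma_sq, p_i, p_j):
    """W = mu_i - mu_j;  X0 = (mu_i+mu_j)/2 - sigma^2/||W||^2 * ln(P_i/P_j) * W."""
    W = mu_i - mu_j
    X0 = 0.5 * (mu_i + mu_j) - sigma_sq / (W @ W) * np.log(p_i / p_j) * W
    return W, X0

mu_i, mu_j = np.array([0.0, 0.0]), np.array([4.0, 0.0])
for p_i in (0.5, 0.9):   # equal priors put X0 at the midpoint; p_i = 0.9 pushes it away from mu_i
    W, X0 = case1_boundary(mu_i, mu_j, 1.0, p_i, 1 - p_i)
    print("P(w_i) =", p_i, "-> X0 =", X0)
```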

Case 1: Normal Density and Discriminant functions

W^t [X - X_0] = 0 ⇒ decision boundary between the i-th and j-th classes

W = \mu_i - \mu_j is the vector along the line joining the mean vectors \mu_i and \mu_j.

Since W^t [X - X_0] = 0, the decision surface is orthogonal to the line joining \mu_i and \mu_j.
Since the decision boundary is linear, the surface that separates the two classes is a hyperplane.
If P(\omega_i) = P(\omega_j), it turns out to be the orthogonal bisector passing through X_0. This is also called the minimum distance classifier.

Case 1: Normal Density and Discriminant functions

Figure: Decision Boundary

If P(\omega_1) = P(\omega_2), X_0 is the midpoint between \mu_1 and \mu_2, and the boundary passes through it.
If P(\omega_1) > P(\omega_2), the decision surface moves away from \mu_1.
If P(\omega_2) > P(\omega_1), the decision surface moves away from \mu_2.
Case 1: Summary

\Sigma_i = \sigma^2 I; i = 1, 2, ..., c; all covariance matrices are of the type \sigma^2 I

\Sigma_i^{-1} = \frac{1}{\sigma^2} I
The clusters are hyperspheres of the same shape and size.
W^t [X - X_0] = 0 is the decision surface, and it is linear.
It is the Euclidean minimum distance classifier.
In 2-d, it turns out to be

\Sigma_1 = \Sigma_2 = \begin{pmatrix} \sigma_1^2 & 0 \\ 0 & \sigma_2^2 \end{pmatrix} = \begin{pmatrix} 0.3 & 0 \\ 0 & 0.3 \end{pmatrix}

x_1 and x_2 are independent.

Case 2: Normal Density and Discriminant functions

Case 2 Assumptions:
1. \Sigma_i = \Sigma, where \Sigma is arbitrary:

\Sigma_1 = \Sigma_2 = \begin{pmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{21} & \sigma_2^2 \end{pmatrix}

2. x_1 and x_2 are not necessarily independent
3. \Sigma_i is the same for all classes
4. The samples form hyperellipsoidal clusters of the same shape and size
5. \sigma_{12} = \sigma_{21}, hence \Sigma is symmetric

Case 2: Normal Density and Discriminant functions

g_i(X) = -\frac{1}{2} (X - \mu_i)^t \Sigma_i^{-1} (X - \mu_i) - \frac{d}{2} \ln(2\pi) - \frac{1}{2} \ln|\Sigma_i| + \ln P(\omega_i)

After ignoring the class-independent terms -\frac{d}{2} \ln(2\pi) and -\frac{1}{2} \ln|\Sigma_i|:

g_i(X) = -\frac{1}{2} (X - \mu_i)^t \Sigma_i^{-1} (X - \mu_i) + \ln P(\omega_i)

If all the classes are equally probable, then we have a minimum distance classifier (sketched below):

Case 1 - squared Euclidean distance
Case 2 - squared Mahalanobis distance
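A minimal sketch of the Case 2 rule with equal priors (NumPy, with an illustrative shared covariance matrix): assign X to the class with the smallest squared Mahalanobis distance.

```python
import numpy as np

def mahalanobis_sq(x, mu, cov_inv):
    # (X - mu)^t Sigma^{-1} (X - mu)
    d = x - mu
    return d @ cov_inv @ d

Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])     # shared, arbitrary covariance (illustrative)
inv = np.linalg.inv(Sigma)
x = np.array([1.0, 0.0])
mus = [np.array([0.0, 0.0]), np.array([2.0, 2.0])]
dists = [mahalanobis_sq(x, m, inv) for m in mus]
print("assigned class:", 1 + int(np.argmin(dists)))
```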

Case 2: Normal Density and Discriminant functions

Expansion of the quadratic form yields:

g_i(X) = -\frac{1}{2} (X - \mu_i)^t \Sigma_i^{-1} (X - \mu_i) + \ln P(\omega_i)
       = -\frac{1}{2} (X^t - \mu_i^t) \Sigma_i^{-1} (X - \mu_i) + \ln P(\omega_i)
       = -\frac{1}{2} (X^t \Sigma_i^{-1} - \mu_i^t \Sigma_i^{-1}) (X - \mu_i) + \ln P(\omega_i)
       = -\frac{1}{2} [X^t \Sigma_i^{-1} X - \mu_i^t \Sigma_i^{-1} X - X^t \Sigma_i^{-1} \mu_i + \mu_i^t \Sigma_i^{-1} \mu_i] + \ln P(\omega_i) ⇒ (1)
2

Case 2: Normal Density and Discriminant functions

X^t \Sigma_i^{-1} X is the same for all classes and hence ignored.

g_i(X) = -\frac{1}{2} [-2\mu_i^t \Sigma_i^{-1} X + \mu_i^t \Sigma_i^{-1} \mu_i] + \ln P(\omega_i)

Using \mu_i^t \Sigma_i^{-1} X = X^t \Sigma_i^{-1} \mu_i:

g_i(X) = \mu_i^t \Sigma_i^{-1} X - \frac{1}{2} \mu_i^t \Sigma_i^{-1} \mu_i + \ln P(\omega_i)

g_i(X) = W_i^t X + W_{i0} ⇒ linear equation / linear machine

where

W_i = \Sigma_i^{-1} \mu_i

W_{i0} = -\frac{1}{2} \mu_i^t \Sigma_i^{-1} \mu_i + \ln P(\omega_i)
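A sketch of the Case 2 weights (NumPy; the covariance and means are illustrative, not from the slides):

```python
import numpy as np

def case2_weights(mu, cov_inv, prior):
    """W_i = Sigma^{-1} mu_i;  W_i0 = -(1/2) mu_i^t Sigma^{-1} mu_i + ln P(w_i)."""
    W = cov_inv @ mu
    W0 = -0.5 * (mu @ cov_inv @ mu) + np.log(prior)
    return W, W0

Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])   # illustrative shared covariance
W, W0 = case2_weights(np.array([1.0, 2.0]), np.linalg.inv(Sigma), 0.5)
x = np.array([0.5, 0.5])
print("g_i(x) =", W @ x + W0)                # g_i(X) = W_i^t X + W_i0
```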

Case 2: Normal Density and Discriminant functions

What will be the nature of the decision boundary that separates the two classes \omega_i and \omega_j?

g_i(X) - g_j(X) = 0

Deriving as before, this reduces to

W^t (X - X_0) = 0

where

W = \Sigma^{-1} (\mu_i - \mu_j)

X_0 = \frac{1}{2} (\mu_i + \mu_j) - \frac{1}{(\mu_i - \mu_j)^t \Sigma^{-1} (\mu_i - \mu_j)} \ln \frac{P(\omega_i)}{P(\omega_j)} (\mu_i - \mu_j)
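The Case 2 boundary terms can be computed the same way as in Case 1 (a sketch with illustrative values):

```python
import numpy as np

def case2_boundary(mu_i, mu_j, cov_inv, p_i, p_j):
    """W = Sigma^{-1}(mu_i - mu_j); X0 as in the formula above."""
    diff = mu_i - mu_j
    W = cov_inv @ diff
    X0 = 0.5 * (mu_i + mu_j) - np.log(p_i / p_j) / (diff @ cov_inv @ diff) * diff
    return W, X0

Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])   # illustrative shared covariance
W, X0 = case2_boundary(np.array([0.0, 0.0]), np.array([2.0, 2.0]),
                       np.linalg.inv(Sigma), 0.6, 0.4)
print("W =", W, " X0 =", X0)
```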

Case 3: Normal Density and Discriminant functions

Case 3: This is the most general case.

1. \Sigma_i is arbitrary; different classes have different covariance matrices: \Sigma_i \ne \Sigma_j
2. The decision surface is hyperquadric in nature
3. The covariance matrix is arbitrary

From (1), we cannot ignore any term here, because \Sigma_i is arbitrary and differs across classes:

g_i(X) = -\frac{1}{2} [X^t \Sigma_i^{-1} X - \mu_i^t \Sigma_i^{-1} X - X^t \Sigma_i^{-1} \mu_i + \mu_i^t \Sigma_i^{-1} \mu_i] + \ln P(\omega_i) - \frac{1}{2} \ln|\Sigma_i|
       = -\frac{1}{2} [X^t \Sigma_i^{-1} X - 2\mu_i^t \Sigma_i^{-1} X + \mu_i^t \Sigma_i^{-1} \mu_i] + \ln P(\omega_i) - \frac{1}{2} \ln|\Sigma_i|
2 2

Case 3: Normal Density and Discriminant functions

g_i(X) = X^t A_i X + B_i^t X + C_{i0}

where

A_i = -\frac{1}{2} \Sigma_i^{-1}

B_i = \Sigma_i^{-1} \mu_i

C_{i0} = -\frac{1}{2} \mu_i^t \Sigma_i^{-1} \mu_i - \frac{1}{2} \ln|\Sigma_i| + \ln P(\omega_i)

The decision surface is a hyperquadric.
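A minimal sketch of this quadratic discriminant (NumPy, with illustrative per-class parameters):

```python
import numpy as np

def case3_params(mu, sigma, prior):
    """A_i = -(1/2) Sigma_i^{-1};  B_i = Sigma_i^{-1} mu_i;
    C_i0 = -(1/2) mu_i^t Sigma_i^{-1} mu_i - (1/2) ln|Sigma_i| + ln P(w_i)."""
    inv = np.linalg.inv(sigma)
    A = -0.5 * inv
    B = inv @ mu
    C = -0.5 * (mu @ inv @ mu) - 0.5 * np.log(np.linalg.det(sigma)) + np.log(prior)
    return A, B, C

def g(x, A, B, C):
    return x @ A @ x + B @ x + C   # g_i(X) = X^t A_i X + B_i^t X + C_i0

# Two classes with different covariance matrices (illustrative numbers)
params = [case3_params(np.array([0.0, 0.0]), np.eye(2), 0.5),
          case3_params(np.array([2.0, 2.0]), np.array([[2.0, 0.3], [0.3, 1.0]]), 0.5)]
x = np.array([1.0, 1.0])
print("assigned class:", 1 + int(np.argmax([g(x, *p) for p in params])))
```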

Summary of all 3 cases

Multivariate case:

Case 1: \Sigma_i = \sigma^2 I; the same for all classes
Case 2: \Sigma_i = \Sigma; the same for all classes
Case 3: \Sigma_i \ne \Sigma_j; different for different classes

Bivariate case:

Case 1: \sigma_1^2 = \sigma_2^2; \Sigma = \begin{pmatrix} \sigma_1^2 & 0 \\ 0 & \sigma_2^2 \end{pmatrix}

Case 2: \sigma_1^2 > \sigma_2^2; \Sigma = \begin{pmatrix} \sigma_1^2 & 0 \\ 0 & \sigma_2^2 \end{pmatrix}

Case 3: \Sigma = \begin{pmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{21} & \sigma_2^2 \end{pmatrix}

