
Derivation of three different discriminant functions

for the multivariate Gaussian distribution

Compiled by Karthikeyan, CED16I015


Guided by
Dr Umarani Jayaraman

Department of Computer Science and Engineering


Indian Institute of Information Technology Design and Manufacturing
Kancheepuram

February 22, 2021

Introduction

We start with how this PDF actually influences the structure of the decision surface, because

g_i(X) = \ln P(X|\omega_i) + \ln P(\omega_i)

The purpose is to find the g_i(X) that is maximum among all the discriminant functions. For the multivariate Gaussian,

P(X|\omega_i) = \frac{1}{(2\pi)^{d/2} |\Sigma_i|^{1/2}} \exp\left[ -\frac{1}{2} (X - \mu_i)^t \Sigma_i^{-1} (X - \mu_i) \right]

so that

g_i(X) = -\frac{1}{2} (X - \mu_i)^t \Sigma_i^{-1} (X - \mu_i) - \frac{d}{2} \ln(2\pi) - \frac{1}{2} \ln|\Sigma_i| + \ln P(\omega_i)
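As a minimal sketch of how this discriminant can be evaluated directly (using NumPy, with illustrative numbers that are not from the slides):

```python
import numpy as np

def discriminant(x, mu, sigma, prior):
    """g_i(X) = ln P(X|w_i) + ln P(w_i) for a d-dimensional Gaussian class."""
    d = len(mu)
    diff = x - mu
    inv = np.linalg.inv(sigma)
    quad = -0.5 * diff @ inv @ diff            # -(1/2)(X-mu)^t Sigma^{-1} (X-mu)
    norm = -0.5 * d * np.log(2 * np.pi) - 0.5 * np.log(np.linalg.det(sigma))
    return quad + norm + np.log(prior)

# Assign x to the class with the largest g_i(X) (illustrative parameters)
x = np.array([1.0, 2.0])
mus = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
sigmas = [np.eye(2), np.eye(2)]
priors = [0.5, 0.5]
scores = [discriminant(x, m, s, p) for m, s, p in zip(mus, sigmas, priors)]
print("assigned class:", 1 + int(np.argmax(scores)))
```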

Normal Density and Discriminant function

g_i(X) = -\frac{1}{2} (X - \mu_i)^t \Sigma_i^{-1} (X - \mu_i) - \frac{d}{2} \ln(2\pi) - \frac{1}{2} \ln|\Sigma_i| + \ln P(\omega_i)

This is the discriminant function for the multivariate normal density.

This classifier can handle linearly non-separable classes.
When we draw a decision boundary between two classes \omega_i and \omega_j, the decision surface is, in general, a quadratic surface.
It is not a linear surface; however, for specific cases it reduces to a linear classifier.
Depending upon the covariance matrix \Sigma_i we get different cases of the discriminant function, namely Case 1, Case 2 and Case 3.

Case 1: Normal Density and Discriminant Function

Assumptions:
In every class, the samples form hyperspherical clusters of the same shape and size
The covariance matrix is of the form \sigma^2 I
\Sigma_i is the same for all classes, i = 1, 2, ..., c

Case 1: \Sigma_i = \sigma^2 I [I is the identity matrix]; given this,

Determinant: |\Sigma_i| = \sigma^{2d} (the product of the d diagonal values)

Inverse: \Sigma_i^{-1} = \frac{1}{\sigma^2} I
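Both facts are easy to verify numerically; a quick sketch with illustrative values:

```python
import numpy as np

sigma, d = 0.5, 3
S = sigma**2 * np.eye(d)                                     # Sigma_i = sigma^2 I
print(np.isclose(np.linalg.det(S), sigma**(2 * d)))          # |Sigma_i| = sigma^(2d)
print(np.allclose(np.linalg.inv(S), np.eye(d) / sigma**2))   # Sigma_i^{-1} = (1/sigma^2) I
```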

Case 1: Normal Density and Discriminant function

When the covariance matrix is the same for all classes \forall \omega_i:

g_i(X) = -\frac{1}{2} (X - \mu_i)^t \Sigma_i^{-1} (X - \mu_i) - \frac{d}{2} \ln(2\pi) - \frac{1}{2} \ln|\Sigma_i| + \ln P(\omega_i)

-\frac{d}{2} \ln(2\pi) ⇒ constant, independent of the class

-\frac{1}{2} \ln|\Sigma_i| ⇒ remains the same for all classes, hence ignored

Case 1: Normal Density and Discriminant functions

By substituting \Sigma_i^{-1} = \frac{1}{\sigma^2} I:

g_i(X) = -\frac{1}{2} (X - \mu_i)^t \Sigma_i^{-1} (X - \mu_i) + \ln P(\omega_i)
       = -\frac{1}{2\sigma^2} (X - \mu_i)^t (X - \mu_i) + \ln P(\omega_i)
       = -\frac{1}{2\sigma^2} \|X - \mu_i\|^2 + \ln P(\omega_i)

Case 1: Normal Density and Discriminant functions

g_i(X) = -\frac{1}{2\sigma^2} \|X - \mu_i\|^2 + \ln P(\omega_i)

If P(\omega_i) = P(\omega_j) for all i, j = 1, 2, ..., c (equal priors), then

g_i(X) = -\frac{1}{2\sigma^2} \|X - \mu_i\|^2 ⇒ squared Euclidean distance

Because of the negative sign, g_i(X) is maximum for the class whose mean is nearest to X.

Regardless of whether the prior probabilities are equal or not, it is not actually necessary to compute distances: expanding the quadratic form yields a linear function of X, as the next slide shows.
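A minimal sketch of this Case 1 rule (NumPy, illustrative numbers); with equal priors it reduces to assigning X to the nearest mean:

```python
import numpy as np

def g(x, mu, sigma_sq, prior):
    # g_i(X) = -||X - mu_i||^2 / (2 sigma^2) + ln P(w_i)
    return -np.sum((x - mu) ** 2) / (2 * sigma_sq) + np.log(prior)

x = np.array([1.0, 1.0])
mus = [np.array([0.0, 0.0]), np.array([4.0, 4.0])]
sigma_sq = 1.0
priors = [0.5, 0.5]
scores = [g(x, m, sigma_sq, p) for m, p in zip(mus, priors)]
print("assigned class:", 1 + int(np.argmax(scores)))   # nearest mean wins
```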

Case 1: Normal Density and Discriminant functions

Expansion of the quadratic form yields:

g_i(X) = -\frac{1}{2\sigma^2} (X - \mu_i)^t (X - \mu_i) + \ln P(\omega_i)
       = -\frac{1}{2\sigma^2} [X^t X - X^t \mu_i - \mu_i^t X + \mu_i^t \mu_i] + \ln P(\omega_i)

X^t X is constant and the same for all i, so it can be dropped from g_i(X).
X^t \mu_i = \mu_i^t X (see the next slide for a numerical check).

g_i(X) = -\frac{1}{2\sigma^2} [-\mu_i^t X - \mu_i^t X + \mu_i^t \mu_i] + \ln P(\omega_i)
       = -\frac{1}{2\sigma^2} [-2\mu_i^t X + \mu_i^t \mu_i] + \ln P(\omega_i)

Case 1: Normal Density and Discriminant functions

   
X = \begin{pmatrix} 2 \\ 3 \\ 1 \end{pmatrix} \quad \mu = \begin{pmatrix} 2 \\ 1 \\ 1 \end{pmatrix}

X^t \mu = \begin{pmatrix} 2 & 3 & 1 \end{pmatrix} \begin{pmatrix} 2 \\ 1 \\ 1 \end{pmatrix} = 2 \cdot 2 + 3 \cdot 1 + 1 \cdot 1 = 8

\mu^t X = \begin{pmatrix} 2 & 1 & 1 \end{pmatrix} \begin{pmatrix} 2 \\ 3 \\ 1 \end{pmatrix} = 2 \cdot 2 + 1 \cdot 3 + 1 \cdot 1 = 8

Hence X^t \mu_i = \mu_i^t X.
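The same check takes two lines of NumPy (a scalar equals its own transpose):

```python
import numpy as np

X, mu = np.array([2, 3, 1]), np.array([2, 1, 1])
print(X @ mu, mu @ X)   # both print 8
```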

Case 1: Normal Density and Discriminant functions

g_i(X) = -\frac{1}{2\sigma^2} [-2\mu_i^t X + \mu_i^t \mu_i] + \ln P(\omega_i)
       = \frac{\mu_i^t X}{\sigma^2} - \frac{1}{2\sigma^2} \mu_i^t \mu_i + \ln P(\omega_i)

With W_i = \frac{\mu_i}{\sigma^2}, this gives

g_i(X) = W_i^t X + W_{i0} ⇒ a linear equation, i.e. a linear machine

where

W_i = \frac{1}{\sigma^2} \mu_i

W_{i0} = -\frac{1}{2\sigma^2} \mu_i^t \mu_i + \ln P(\omega_i)
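A small sketch of this linear machine (NumPy, illustrative parameters):

```python
import numpy as np

def case1_weights(mu, sigma_sq, prior):
    """W_i = mu_i / sigma^2;  W_i0 = -mu_i^t mu_i / (2 sigma^2) + ln P(w_i)."""
    W = mu / sigma_sq
    W0 = -(mu @ mu) / (2 * sigma_sq) + np.log(prior)
    return W, W0

mu_i, sigma_sq, prior = np.array([2.0, 1.0]), 1.0, 0.5
W, W0 = case1_weights(mu_i, sigma_sq, prior)
x = np.array([1.0, 1.0])
print("g_i(x) =", W @ x + W0)   # linear machine: g_i(X) = W_i^t X + W_i0
```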

Case 1: Normal Density and Discriminant functions

The discriminant function for an individual class, the i-th class, is given by

g_i(X) = W_i^t X + W_{i0}

If we want to find the decision boundary between two different classes \omega_i and \omega_j, then let us understand:
what will be the nature of the decision boundary that separates the two classes \omega_i and \omega_j?

Case 1: Normal Density and Discriminant functions

g_i(X) = g_j(X) is the decision boundary

g_i(X) = W_i^t X + W_{i0}
g_j(X) = W_j^t X + W_{j0}

g(X) = g_i(X) - g_j(X) = 0 is the equation of the decision boundary:

g(X) = W_i^t X + W_{i0} - W_j^t X - W_{j0} = 0
g(X) = (W_i - W_j)^t X + W_{i0} - W_{j0} = 0

Case 1: Normal Density and Discriminant functions

g(X) = (W_i - W_j)^t X + W_{i0} - W_{j0} = 0

\frac{1}{\sigma^2} (\mu_i - \mu_j)^t X - \frac{\mu_i^t \mu_i}{2\sigma^2} + \ln P(\omega_i) + \frac{\mu_j^t \mu_j}{2\sigma^2} - \ln P(\omega_j) = 0

\frac{1}{\sigma^2} (\mu_i - \mu_j)^t X - \frac{1}{2\sigma^2} (\mu_i^t \mu_i - \mu_j^t \mu_j) + \ln P(\omega_i) - \ln P(\omega_j) = 0

\frac{1}{\sigma^2} (\mu_i - \mu_j)^t X - \frac{1}{2\sigma^2} (\mu_i^t \mu_i - \mu_j^t \mu_j) + \ln \frac{P(\omega_i)}{P(\omega_j)} = 0

Multiplying by \sigma^2 and using \mu_i^t \mu_i - \mu_j^t \mu_j = (\mu_i - \mu_j)^t (\mu_i + \mu_j):

(\mu_i - \mu_j)^t X - \frac{1}{2} (\mu_i - \mu_j)^t (\mu_i + \mu_j) + \sigma^2 \ln \frac{P(\omega_i)}{P(\omega_j)} = 0

Case 1: Normal Density and Discriminant functions

(\mu_i - \mu_j)^t X - \frac{1}{2} (\mu_i - \mu_j)^t (\mu_i + \mu_j) + \sigma^2 \ln \frac{P(\omega_i)}{P(\omega_j)} = 0

Taking (\mu_i - \mu_j)^t out:

(\mu_i - \mu_j)^t \left[ X - \left\{ \frac{1}{2} (\mu_i + \mu_j) - \frac{\sigma^2}{(\mu_i - \mu_j)^t (\mu_i - \mu_j)} \ln \frac{P(\omega_i)}{P(\omega_j)} (\mu_i - \mu_j) \right\} \right] = 0

W^t [X - X_0] = 0

where

W = \mu_i - \mu_j

X_0 = \frac{1}{2} (\mu_i + \mu_j) - \frac{\sigma^2}{\|\mu_i - \mu_j\|^2} \ln \frac{P(\omega_i)}{P(\omega_j)} (\mu_i - \mu_j)
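A sketch of these two formulas in NumPy (illustrative means); note how unequal priors shift X_0 along the line joining the means:

```python
import numpy as np

def case1_boundary(mu_i, mu_j, sigma_sq, p_i, p_j):
    """W = mu_i - mu_j;  X0 = (mu_i+mu_j)/2 - sigma^2/||W||^2 * ln(P_i/P_j) * W."""
    W = mu_i - mu_j
    X0 = 0.5 * (mu_i + mu_j) - sigma_sq / (W @ W) * np.log(p_i / p_j) * W
    return W, X0

mu_i, mu_j = np.array([0.0, 0.0]), np.array([4.0, 0.0])
for p_i in (0.5, 0.9):   # equal priors put X0 at the midpoint; p_i = 0.9 pushes it away from mu_i
    W, X0 = case1_boundary(mu_i, mu_j, 1.0, p_i, 1 - p_i)
    print("P(w_i) =", p_i, "-> X0 =", X0)
```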

Case 1: Normal Density and Discriminant functions

W^t [X - X_0] = 0 ⇒ decision boundary between the i-th and j-th classes

W = \mu_i - \mu_j is the vector along the line joining the mean vectors \mu_i and \mu_j.

Since W^t [X - X_0] = 0, the decision surface is orthogonal to the line joining \mu_i and \mu_j.
Since the decision boundary is linear, the surface that separates the two classes is a hyperplane.
If P(\omega_i) = P(\omega_j), it turns out to be the orthogonal bisector passing through X_0. This is also called the minimum distance classifier.

Case 1: Normal Density and Discriminant functions

Figure: Decision Boundary

If P(\omega_1) = P(\omega_2), X_0 is the midpoint between \mu_1 and \mu_2, and the boundary passes through it.
If P(\omega_1) > P(\omega_2), the decision surface moves away from \mu_1.
If P(\omega_2) > P(\omega_1), the decision surface moves away from \mu_2.
Case 1: Summary

\Sigma_i = \sigma^2 I; i = 1, 2, ..., c; all covariance matrices are of the type \sigma^2 I

\Sigma_i^{-1} = \frac{1}{\sigma^2} I
The clusters are hyperspheres of the same shape and size.
W^t [X - X_0] = 0 is the decision surface, and it is linear.
It is the Euclidean minimum distance classifier.
In 2-d, it turns out to be

\Sigma_1 = \Sigma_2 = \begin{pmatrix} \sigma_1^2 & 0 \\ 0 & \sigma_2^2 \end{pmatrix} = \begin{pmatrix} 0.3 & 0 \\ 0 & 0.3 \end{pmatrix}

x_1 and x_2 are independent.

Case 2: Normal Density and Discriminant functions

Case 2 Assumptions:
1. \Sigma_i = \Sigma, where \Sigma is arbitrary:

\Sigma_1 = \Sigma_2 = \begin{pmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{21} & \sigma_2^2 \end{pmatrix}

2. x_1 and x_2 are not necessarily independent
3. \Sigma_i is the same for all classes
4. The samples form hyperellipsoidal clusters of the same shape and size
5. \sigma_{12} = \sigma_{21}, hence \Sigma is symmetric

Case 2: Normal Density and Discriminant functions

g_i(X) = -\frac{1}{2} (X - \mu_i)^t \Sigma_i^{-1} (X - \mu_i) - \frac{d}{2} \ln(2\pi) - \frac{1}{2} \ln|\Sigma_i| + \ln P(\omega_i)

After ignoring the class-independent terms -\frac{d}{2} \ln(2\pi) and -\frac{1}{2} \ln|\Sigma_i|:

g_i(X) = -\frac{1}{2} (X - \mu_i)^t \Sigma_i^{-1} (X - \mu_i) + \ln P(\omega_i)

If all the classes are equally probable, then we have a minimum distance classifier (sketched below):

Case 1 - squared Euclidean distance
Case 2 - squared Mahalanobis distance
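A minimal sketch of the Case 2 rule with equal priors (NumPy, with an illustrative shared covariance matrix): assign X to the class with the smallest squared Mahalanobis distance.

```python
import numpy as np

def mahalanobis_sq(x, mu, cov_inv):
    # (X - mu)^t Sigma^{-1} (X - mu)
    d = x - mu
    return d @ cov_inv @ d

Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])     # shared, arbitrary covariance (illustrative)
inv = np.linalg.inv(Sigma)
x = np.array([1.0, 0.0])
mus = [np.array([0.0, 0.0]), np.array([2.0, 2.0])]
dists = [mahalanobis_sq(x, m, inv) for m in mus]
print("assigned class:", 1 + int(np.argmin(dists)))
```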

Case 2: Normal Density and Discriminant functions

Expansion of the quadratic form yields:

g_i(X) = -\frac{1}{2} (X - \mu_i)^t \Sigma_i^{-1} (X - \mu_i) + \ln P(\omega_i)
       = -\frac{1}{2} (X^t - \mu_i^t) \Sigma_i^{-1} (X - \mu_i) + \ln P(\omega_i)
       = -\frac{1}{2} (X^t \Sigma_i^{-1} - \mu_i^t \Sigma_i^{-1}) (X - \mu_i) + \ln P(\omega_i)
       = -\frac{1}{2} [X^t \Sigma_i^{-1} X - \mu_i^t \Sigma_i^{-1} X - X^t \Sigma_i^{-1} \mu_i + \mu_i^t \Sigma_i^{-1} \mu_i] + \ln P(\omega_i) ⇒ (1)
2

Case 2: Normal Density and Discriminant functions

X^t \Sigma_i^{-1} X is the same for all classes and hence ignored.

g_i(X) = -\frac{1}{2} [-2\mu_i^t \Sigma_i^{-1} X + \mu_i^t \Sigma_i^{-1} \mu_i] + \ln P(\omega_i)

Using \mu_i^t \Sigma_i^{-1} X = X^t \Sigma_i^{-1} \mu_i:

g_i(X) = \mu_i^t \Sigma_i^{-1} X - \frac{1}{2} \mu_i^t \Sigma_i^{-1} \mu_i + \ln P(\omega_i)

g_i(X) = W_i^t X + W_{i0} ⇒ linear equation / linear machine

where

W_i = \Sigma_i^{-1} \mu_i

W_{i0} = -\frac{1}{2} \mu_i^t \Sigma_i^{-1} \mu_i + \ln P(\omega_i)
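A sketch of the Case 2 weights (NumPy; the covariance and means are illustrative, not from the slides):

```python
import numpy as np

def case2_weights(mu, cov_inv, prior):
    """W_i = Sigma^{-1} mu_i;  W_i0 = -(1/2) mu_i^t Sigma^{-1} mu_i + ln P(w_i)."""
    W = cov_inv @ mu
    W0 = -0.5 * (mu @ cov_inv @ mu) + np.log(prior)
    return W, W0

Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])   # illustrative shared covariance
W, W0 = case2_weights(np.array([1.0, 2.0]), np.linalg.inv(Sigma), 0.5)
x = np.array([0.5, 0.5])
print("g_i(x) =", W @ x + W0)                # g_i(X) = W_i^t X + W_i0
```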

Case 2: Normal Density and Discriminant functions

What will be the nature of the decision boundary that separates the two classes \omega_i and \omega_j?

g_i(X) - g_j(X) = 0

Deriving as before, this reduces to

W^t (X - X_0) = 0

where

W = \Sigma^{-1} (\mu_i - \mu_j)

X_0 = \frac{1}{2} (\mu_i + \mu_j) - \frac{1}{(\mu_i - \mu_j)^t \Sigma^{-1} (\mu_i - \mu_j)} \ln \frac{P(\omega_i)}{P(\omega_j)} (\mu_i - \mu_j)
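The Case 2 boundary terms can be computed the same way as in Case 1 (a sketch with illustrative values):

```python
import numpy as np

def case2_boundary(mu_i, mu_j, cov_inv, p_i, p_j):
    """W = Sigma^{-1}(mu_i - mu_j); X0 as in the formula above."""
    diff = mu_i - mu_j
    W = cov_inv @ diff
    X0 = 0.5 * (mu_i + mu_j) - np.log(p_i / p_j) / (diff @ cov_inv @ diff) * diff
    return W, X0

Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])   # illustrative shared covariance
W, X0 = case2_boundary(np.array([0.0, 0.0]), np.array([2.0, 2.0]),
                       np.linalg.inv(Sigma), 0.6, 0.4)
print("W =", W, " X0 =", X0)
```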

Case 3: Normal Density and Discriminant functions

Case 3: This is the most general case.

1. \Sigma_i is arbitrary; different classes have different covariance matrices: \Sigma_i \ne \Sigma_j
2. The decision surface is hyperquadric in nature
3. The covariance matrix is arbitrary

From (1), we cannot ignore any term here, because \Sigma_i is arbitrary and differs across classes:

g_i(X) = -\frac{1}{2} [X^t \Sigma_i^{-1} X - \mu_i^t \Sigma_i^{-1} X - X^t \Sigma_i^{-1} \mu_i + \mu_i^t \Sigma_i^{-1} \mu_i] + \ln P(\omega_i) - \frac{1}{2} \ln|\Sigma_i|
       = -\frac{1}{2} [X^t \Sigma_i^{-1} X - 2\mu_i^t \Sigma_i^{-1} X + \mu_i^t \Sigma_i^{-1} \mu_i] + \ln P(\omega_i) - \frac{1}{2} \ln|\Sigma_i|
2 2

Case 3: Normal Density and Discriminant functions

g_i(X) = X^t A_i X + B_i^t X + C_{i0}

where

A_i = -\frac{1}{2} \Sigma_i^{-1}

B_i = \Sigma_i^{-1} \mu_i

C_{i0} = -\frac{1}{2} \mu_i^t \Sigma_i^{-1} \mu_i - \frac{1}{2} \ln|\Sigma_i| + \ln P(\omega_i)

The decision surface is a hyperquadric.
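A minimal sketch of this quadratic discriminant (NumPy, with illustrative per-class parameters):

```python
import numpy as np

def case3_params(mu, sigma, prior):
    """A_i = -(1/2) Sigma_i^{-1};  B_i = Sigma_i^{-1} mu_i;
    C_i0 = -(1/2) mu_i^t Sigma_i^{-1} mu_i - (1/2) ln|Sigma_i| + ln P(w_i)."""
    inv = np.linalg.inv(sigma)
    A = -0.5 * inv
    B = inv @ mu
    C = -0.5 * (mu @ inv @ mu) - 0.5 * np.log(np.linalg.det(sigma)) + np.log(prior)
    return A, B, C

def g(x, A, B, C):
    return x @ A @ x + B @ x + C   # g_i(X) = X^t A_i X + B_i^t X + C_i0

# Two classes with different covariance matrices (illustrative numbers)
params = [case3_params(np.array([0.0, 0.0]), np.eye(2), 0.5),
          case3_params(np.array([2.0, 2.0]), np.array([[2.0, 0.3], [0.3, 1.0]]), 0.5)]
x = np.array([1.0, 1.0])
print("assigned class:", 1 + int(np.argmax([g(x, *p) for p in params])))
```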

Summary of all 3 cases

Multivariate case:

Case 1: \Sigma_i = \sigma^2 I; the same for all classes
Case 2: \Sigma_i = \Sigma; the same for all classes
Case 3: \Sigma_i \ne \Sigma_j; different for different classes

Bivariate case:

Case 1: \sigma_1^2 = \sigma_2^2; \Sigma = \begin{pmatrix} \sigma_1^2 & 0 \\ 0 & \sigma_2^2 \end{pmatrix}

Case 2: \sigma_1^2 > \sigma_2^2; \Sigma = \begin{pmatrix} \sigma_1^2 & 0 \\ 0 & \sigma_2^2 \end{pmatrix}

Case 3: \Sigma = \begin{pmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{21} & \sigma_2^2 \end{pmatrix}

