Introduction
We begin with how this probability density function (PDF) influences the structure of the decision surface. The goal is to find the g_i(X) that is maximum among all the discriminant functions:

g_i(X) = ln P(X|ω_i) + ln P(ω_i)

For a d-dimensional normal density,

P(X|ω_i) = (1 / ((2π)^(d/2) |Σ_i|^(1/2))) exp[ -(1/2) (X - µ_i)^t Σ_i^(-1) (X - µ_i) ]

so the discriminant becomes

g_i(X) = -(1/2) (X - µ_i)^t Σ_i^(-1) (X - µ_i) - (d/2) ln(2π) - (1/2) ln|Σ_i| + ln P(ω_i)
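As a quick sanity check, the discriminant above can be evaluated directly. This is a minimal sketch; the function name and test values are illustrative, not from the slides.

```python
import numpy as np

def g(x, mu, sigma, prior):
    """g_i(X) = ln P(X|w_i) + ln P(w_i) for a d-dimensional normal density."""
    d = len(mu)
    diff = x - mu
    inv = np.linalg.inv(sigma)
    quad = -0.5 * diff @ inv @ diff               # -(1/2)(X-mu)^t Sigma^-1 (X-mu)
    const = -0.5 * d * np.log(2 * np.pi)          # -(d/2) ln(2*pi)
    logdet = -0.5 * np.log(np.linalg.det(sigma))  # -(1/2) ln|Sigma_i|
    return quad + const + logdet + np.log(prior)
```

Classification assigns X to the class whose g is largest.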
Normal Density and Discriminant function
g_i(X) = -(1/2) (X - µ_i)^t Σ_i^(-1) (X - µ_i) - (d/2) ln(2π) - (1/2) ln|Σ_i| + ln P(ω_i)
Case 1: Normal Density and Discriminant Function
Assumptions:
In every class, the samples are clustered in hyperspherical clusters of the same shape and size
The covariance matrix is Σ_i = σ²I
Σ_i is the same for all classes, i = 1, 2, ..., c
Hence the determinant |Σ_i| is also the same for all classes
Case 1: Normal Density and Discriminant function
g_i(X) = -(1/2) (X - µ_i)^t Σ_i^(-1) (X - µ_i) - (d/2) ln(2π) - (1/2) ln|Σ_i| + ln P(ω_i)

-(d/2) ln(2π) ⇒ constant, independent of the class
-(1/2) ln|Σ_i| ⇒ the same for all classes, hence ignored
Case 1: Normal Density and Discriminant functions
Substituting Σ_i^(-1) = (1/σ²) I:

g_i(X) = -(1/2) (X - µ_i)^t Σ_i^(-1) (X - µ_i) + ln P(ω_i)
       = -(1/(2σ²)) (X - µ_i)^t (X - µ_i) + ln P(ω_i)
       = -(1/(2σ²)) ||X - µ_i||² + ln P(ω_i)
Case 1: Normal Density and Discriminant functions
g_i(X) = -(1/(2σ²)) ||X - µ_i||² + ln P(ω_i)

If the priors are equal, P(ω_i) = P(ω_j) ∀ i, j = 1, 2, ..., c, the ln P(ω_i) term can also be dropped:

g_i(X) = -(1/(2σ²)) ||X - µ_i||² ⇒ classification reduces to the squared Euclidean distance to each class mean
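With equal priors, maximizing g_i(X) is the same as minimizing ||X - µ_i||². A minimal sketch of this nearest-mean rule; the class means are made-up values, not from the slides:

```python
import numpy as np

# Case 1 with equal priors: argmax g_i(X) == argmin ||X - mu_i||^2,
# i.e. assign X to the class with the nearest mean (illustrative means).
means = [np.array([0.0, 0.0]), np.array([4.0, 4.0])]

def classify(x):
    dists = [np.sum((x - mu) ** 2) for mu in means]  # squared Euclidean distances
    return int(np.argmin(dists))                     # nearest mean wins

classify(np.array([1.0, 0.5]))  # -> 0 (closer to mean 0)
classify(np.array([3.5, 4.0]))  # -> 1 (closer to mean 1)
```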
Case 1: Normal Density and Discriminant functions
Expanding the quadratic term:

g_i(X) = -(1/(2σ²)) [X^t X - µ_i^t X - X^t µ_i + µ_i^t µ_i] + ln P(ω_i)

X^t X is the same for every class and can be ignored, and the two cross terms are equal scalars, so

g_i(X) = -(1/(2σ²)) [-2 µ_i^t X + µ_i^t µ_i] + ln P(ω_i)
Case 1: Normal Density and Discriminant functions
X = (2, 3, 1)^t,  µ = (2, 1, 1)^t

X^t µ = [2 3 1] (2, 1, 1)^t = 2·2 + 3·1 + 1·1 = 8
µ^t X = [2 1 1] (2, 3, 1)^t = 2·2 + 1·3 + 1·1 = 8

Hence X^t µ_i = µ_i^t X
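The same check in code, using the vectors from the example above:

```python
import numpy as np

X = np.array([2.0, 3.0, 1.0])   # X  = (2, 3, 1)^t
mu = np.array([2.0, 1.0, 1.0])  # mu = (2, 1, 1)^t

xtm = X @ mu   # X^t mu  = 8
mtx = mu @ X   # mu^t X  = 8, same scalar
```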
Case 1: Normal Density and Discriminant functions
g_i(X) = -(1/(2σ²)) [-2 µ_i^t X + µ_i^t µ_i] + ln P(ω_i)
       = (1/σ²) µ_i^t X - (1/(2σ²)) µ_i^t µ_i + ln P(ω_i)

Writing this in linear form with

W_i  = (1/σ²) µ_i
W_i0 = -(1/(2σ²)) µ_i^t µ_i + ln P(ω_i)
Case 1: Normal Density and Discriminant functions
g_i(X) = W_i^t X + W_i0, which is linear in X
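A small sketch (mean, σ² and prior are illustrative values) showing that the linear form W_i^t X + W_i0 differs from the distance form only by the class-independent term X^t X / (2σ²):

```python
import numpy as np

sigma2 = 1.0                   # sigma^2 (illustrative)
prior = 0.5
mu = np.array([2.0, 1.0])      # class mean (illustrative)

W = mu / sigma2                                  # W_i  = mu_i / sigma^2
W0 = -(mu @ mu) / (2 * sigma2) + np.log(prior)   # W_i0

def g_linear(x):
    return W @ x + W0

def g_distance(x):
    return -np.sum((x - mu) ** 2) / (2 * sigma2) + np.log(prior)

# g_linear(x) == g_distance(x) + x^t x / (2 sigma^2) for every x,
# so both rank the classes identically.
```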
Case 1: Normal Density and Discriminant functions
For the boundary between ω_i and ω_j, set g_i(X) - g_j(X) = 0:

g(X) = (W_i - W_j)^t X + W_i0 - W_j0 = 0

(1/σ²)(µ_i - µ_j)^t X - (1/(2σ²)) µ_i^t µ_i + ln P(ω_i) + (1/(2σ²)) µ_j^t µ_j - ln P(ω_j) = 0

(1/σ²)(µ_i - µ_j)^t X - (1/(2σ²))(µ_i^t µ_i - µ_j^t µ_j) + ln P(ω_i) - ln P(ω_j) = 0

(1/σ²)(µ_i - µ_j)^t X - (1/(2σ²))(µ_i^t µ_i - µ_j^t µ_j) + ln [P(ω_i)/P(ω_j)] = 0

Multiplying by σ² and using µ_i^t µ_i - µ_j^t µ_j = (µ_i - µ_j)^t (µ_i + µ_j):

(µ_i - µ_j)^t X - (1/2)(µ_i - µ_j)^t (µ_i + µ_j) + σ² ln [P(ω_i)/P(ω_j)] = 0
Case 1: Normal Density and Discriminant functions
(µ_i - µ_j)^t X - (1/2)(µ_i - µ_j)^t (µ_i + µ_j) + σ² ln [P(ω_i)/P(ω_j)] = 0

Taking (µ_i - µ_j)^t out, and noting (µ_i - µ_j)^t (µ_i - µ_j) = ||µ_i - µ_j||²:

(µ_i - µ_j)^t [ X - { (1/2)(µ_i + µ_j) - (σ² / ||µ_i - µ_j||²) ln [P(ω_i)/P(ω_j)] · (µ_i - µ_j) } ] = 0

i.e. W^t (X - X_0) = 0

where
W   = µ_i - µ_j
X_0 = (1/2)(µ_i + µ_j) - (σ² / ||µ_i - µ_j||²) ln [P(ω_i)/P(ω_j)] · (µ_i - µ_j)
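A sketch of W and X_0 for case 1 (means, σ² and priors are illustrative values); with equal priors the log term vanishes and X_0 is the midpoint of the two means:

```python
import numpy as np

# Illustrative class means, variance and priors (not from the slides).
mu_i = np.array([0.0, 0.0])
mu_j = np.array([4.0, 0.0])
sigma2 = 1.0
p_i, p_j = 0.5, 0.5

W = mu_i - mu_j                                  # normal of the hyperplane
x0 = (0.5 * (mu_i + mu_j)
      - (sigma2 / np.sum(W ** 2)) * np.log(p_i / p_j) * W)
# Equal priors: ln(p_i/p_j) = 0, so x0 = (2, 0), the midpoint of the means.
```

Unequal priors shift X_0 along the line joining the means, away from the more probable class.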
Case 2: Normal Density and Discriminant functions
Case 2 Assumptions:
1. Σ_i = Σ: the covariance matrix is arbitrary but shared, e.g.
   Σ_1 = Σ_2 = [ σ1²  σ12 ; σ21  σ2² ]
2. x1 and x2 are not necessarily independent
3. Σ_i is the same for all classes
4. The samples are clustered in hyperellipsoids of the same shape and size
5. σ12 = σ21, hence Σ is symmetric
Case 2: Normal Density and Discriminant functions
g_i(X) = -(1/2) (X - µ_i)^t Σ_i^(-1) (X - µ_i) - (d/2) ln(2π) - (1/2) ln|Σ_i| + ln P(ω_i)

After ignoring the class-independent terms -(d/2) ln(2π) and -(1/2) ln|Σ_i|:

g_i(X) = -(1/2) (X - µ_i)^t Σ_i^(-1) (X - µ_i) + ln P(ω_i)
Case 2: Normal Density and Discriminant functions
Expanding the quadratic form and dropping X^t Σ^(-1) X, which is the same for all classes when Σ_i = Σ; since Σ^(-1) is symmetric, µ_i^t Σ^(-1) X = X^t Σ^(-1) µ_i and the cross terms combine:

g_i(X) = µ_i^t Σ^(-1) X - (1/2) µ_i^t Σ^(-1) µ_i + ln P(ω_i)
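The case 2 discriminant as code (the shared covariance and the test vectors are illustrative values); the symmetry µ_i^t Σ^(-1) X = X^t Σ^(-1) µ_i used above is easy to verify numerically:

```python
import numpy as np

S = np.array([[2.0, 0.5],      # shared covariance Sigma (illustrative)
              [0.5, 1.0]])
S_inv = np.linalg.inv(S)

def g2(x, mu, prior):
    # g_i(X) = mu_i^t S^-1 X - (1/2) mu_i^t S^-1 mu_i + ln P(w_i)
    return mu @ S_inv @ x - 0.5 * (mu @ S_inv @ mu) + np.log(prior)

x = np.array([1.0, 2.0])
mu = np.array([0.5, -1.0])
# S^-1 is symmetric, so mu^t S^-1 x == x^t S^-1 mu (same scalar).
```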
Case 2: Normal Density and Discriminant functions
What will be the nature of the decision boundary that separates the two
classes ω i and ω j ?
g_i(X) - g_j(X) = 0

Deriving as in case 1, this reduces to

W^t (X - X_0) = 0

where
W   = Σ^(-1) (µ_i - µ_j)
X_0 = (1/2)(µ_i + µ_j) - [1 / ((µ_i - µ_j)^t Σ^(-1) (µ_i - µ_j))] ln [P(ω_i)/P(ω_j)] · (µ_i - µ_j)
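A sketch of the case 2 boundary (covariance, means and priors are illustrative values). Note W = Σ^(-1)(µ_i - µ_j) is generally not parallel to µ_i - µ_j, so the hyperplane need not be orthogonal to the line joining the means:

```python
import numpy as np

# Illustrative shared covariance, means and priors (not from the slides).
S_inv = np.linalg.inv(np.array([[2.0, 0.5],
                                [0.5, 1.0]]))
mu_i, mu_j = np.array([0.0, 0.0]), np.array([4.0, 0.0])
p_i = p_j = 0.5

d = mu_i - mu_j
W = S_inv @ d                                    # hyperplane normal
x0 = (0.5 * (mu_i + mu_j)
      - (np.log(p_i / p_j) / (d @ S_inv @ d)) * d)
# Equal priors: x0 is again the midpoint of the means, here (2, 0).
```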
Case 3: Normal Density and Discriminant functions
Here Σ_i differs per class, so only the constant -(d/2) ln(2π) can be dropped:

g_i(X) = -(1/2) [X^t Σ_i^(-1) X - µ_i^t Σ_i^(-1) X - X^t Σ_i^(-1) µ_i + µ_i^t Σ_i^(-1) µ_i] + ln P(ω_i) - (1/2) ln|Σ_i|
       = -(1/2) [X^t Σ_i^(-1) X - 2 µ_i^t Σ_i^(-1) X + µ_i^t Σ_i^(-1) µ_i] + ln P(ω_i) - (1/2) ln|Σ_i|
Case 3: Normal Density and Discriminant functions
g_i(X) = X^t A_i X + B_i^t X + C_i0

where
A_i  = -(1/2) Σ_i^(-1)
B_i  = Σ_i^(-1) µ_i
C_i0 = -(1/2) µ_i^t Σ_i^(-1) µ_i - (1/2) ln|Σ_i| + ln P(ω_i)

The decision surface is quadratic (a hyperquadric), no longer a hyperplane
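A sketch assembling A_i, B_i and C_i0 (the helper names and the test values are illustrative); X^t A_i X + B_i^t X + C_i0 should match evaluating g_i directly from the quadratic form:

```python
import numpy as np

def quad_params(mu, sigma, prior):
    """Parameters of g_i(X) = X^t A_i X + B_i^t X + C_i0 for case 3."""
    inv = np.linalg.inv(sigma)
    A = -0.5 * inv                        # A_i  = -(1/2) Sigma_i^-1
    B = inv @ mu                          # B_i  = Sigma_i^-1 mu_i
    C = (-0.5 * (mu @ inv @ mu)
         - 0.5 * np.log(np.linalg.det(sigma))
         + np.log(prior))                 # C_i0
    return A, B, C

def g3(x, A, B, C):
    return x @ A @ x + B @ x + C
```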
Summary of all 3 cases
Multivariate case: Case 1: Σ_i = σ²I; Case 2: Σ_i = Σ, the same for all classes; Case 3: Σ_i arbitrary.
Bivariate case:
Case 1: σ1² = σ2²;  Σ = [ σ1²  0 ; 0  σ2² ]
Case 2: σ1² > σ2²;  Σ = [ σ1²  0 ; 0  σ2² ]
Case 3: Σ = [ σ1²  σ12 ; σ21  σ2² ]