P(X, Y) = P(Y) P(X | Y)
where
µ1 = (1, 1)^T,   µ2 = (1, −1)^T,   Σ = [0.9 0.4; 0.4 0.3].
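As a quick illustration, the joint distribution above can be sampled by first drawing Y and then X | Y ~ N(µ_Y, Σ). This is a minimal sketch; the class priors P(Y) are not stated in the text, so equal priors are assumed here, with µ1 mapped to label +1 and µ2 to label −1 as in the derivation later on.

```python
import numpy as np

# Parameters from the text; equal class priors are an assumption.
mu = {1: np.array([1.0, 1.0]), -1: np.array([1.0, -1.0])}
Sigma = np.array([[0.9, 0.4],
                  [0.4, 0.3]])

rng = np.random.default_rng(0)

def sample(n):
    """Draw n pairs (x_i, y_i): first Y ~ P(Y), then X | Y ~ N(mu_Y, Sigma)."""
    ys = rng.choice([1, -1], size=n)
    xs = np.array([rng.multivariate_normal(mu[y], Sigma) for y in ys])
    return xs, ys

X, y = sample(500)
```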
Given a joint distribution (X, Y ), we may ask “what is the best possible
classifier for that joint distribution?”
R∗ := min_{all f} R(f)
Terminology
Theorem 1
The Bayes classifier is given by
f∗(x) = argmax_k P(Y = k | X = x),
the rule that predicts the most probable label given X = x; its risk equals the Bayes risk R∗.
Proof of Theorem 1
The proof starts with the definition of the risk, R(f) = P(f(X) ≠ Y):
Plug-in Classifiers
Linear Classifiers
gk(x) = ϕ(x; µk, Σ)
      = (2π)^{−d/2} |Σ|^{−1/2} exp( −(1/2) (x − µk)^T Σ^{−1} (x − µk) ).
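The density formula above translates directly into code. A minimal sketch (the function name `phi` simply mirrors the ϕ notation):

```python
import numpy as np

def phi(x, mu, Sigma):
    """Multivariate normal density ϕ(x; mu, Sigma), exactly the formula above:
    (2π)^{-d/2} |Σ|^{-1/2} exp(-(1/2)(x-µ)^T Σ^{-1} (x-µ))."""
    d = len(mu)
    diff = x - mu
    quad = diff @ np.linalg.solve(Sigma, diff)  # (x-µ)^T Σ^{-1} (x-µ)
    return (2 * np.pi) ** (-d / 2) * np.linalg.det(Sigma) ** (-0.5) * np.exp(-0.5 * quad)
```

For example, in d = 2 with µ = 0 and Σ = I, the density at the origin is 1/(2π).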
Estimating Parameters
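A sketch of the standard maximum-likelihood estimates used by LDA: class proportions π̂k = n_k/n, per-class sample means µ̂k, and a single pooled covariance Σ̂. The function name `fit_lda_params` and the ±1 label convention are assumptions matching the derivation below.

```python
import numpy as np

def fit_lda_params(X, y):
    """MLE estimates for LDA: class priors, class means, pooled covariance.
    X is (n, d); y contains labels +1 / -1."""
    n = len(y)
    params = {}
    for k in (1, -1):
        Xk = X[y == k]
        params[k] = {"pi": len(Xk) / n, "mu": Xk.mean(axis=0)}
    # Pooled covariance: average outer product of each point's deviation
    # from its own class mean (note the shared Σ̂ across classes).
    centered = np.vstack([X[y == k] - params[k]["mu"] for k in (1, -1)])
    Sigma_hat = centered.T @ centered / n
    return params, Sigma_hat
```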
Or equivalently,
LDA Classifier:
f̂(x) = sign(w^T x + b)
f̂(x) = sign( log π̂1 + log ϕ(x; µ̂1, Σ̂) − log π̂−1 − log ϕ(x; µ̂−1, Σ̂) )

     = sign( log π̂1 − (d/2) log 2π − (1/2) log |Σ̂| − (1/2)(x − µ̂1)^T Σ̂^{−1} (x − µ̂1)
             − log π̂−1 + (d/2) log 2π + (1/2) log |Σ̂| + (1/2)(x − µ̂−1)^T Σ̂^{−1} (x − µ̂−1) )

     = sign( log(π̂1/π̂−1) + (1/2)[ (x − µ̂−1)^T Σ̂^{−1} (x − µ̂−1) − (x − µ̂1)^T Σ̂^{−1} (x − µ̂1) ] )

     = sign( log(π̂1/π̂−1) + x^T Σ̂^{−1} (µ̂1 − µ̂−1) + (1/2)( µ̂−1^T Σ̂^{−1} µ̂−1 − µ̂1^T Σ̂^{−1} µ̂1 ) )

     = sign( w^T x + b ),

where w = Σ̂^{−1}(µ̂1 − µ̂−1) and b = log(π̂1/π̂−1) + (1/2)(µ̂−1^T Σ̂^{−1} µ̂−1 − µ̂1^T Σ̂^{−1} µ̂1).
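The last line of the derivation reads off a weight vector and an intercept. A minimal sketch of the resulting linear classifier (function names `lda_weights` and `predict` are illustrative, not from the text):

```python
import numpy as np

def lda_weights(pi1, pi_neg1, mu1, mu_neg1, Sigma):
    """w = Σ̂^{-1}(µ̂₁ − µ̂₋₁),
    b = log(π̂₁/π̂₋₁) + (1/2)(µ̂₋₁^T Σ̂^{-1} µ̂₋₁ − µ̂₁^T Σ̂^{-1} µ̂₁)."""
    Sigma_inv = np.linalg.inv(Sigma)
    w = Sigma_inv @ (mu1 - mu_neg1)
    b = np.log(pi1 / pi_neg1) + 0.5 * (
        mu_neg1 @ Sigma_inv @ mu_neg1 - mu1 @ Sigma_inv @ mu1
    )
    return w, b

def predict(x, w, b):
    """LDA decision rule: f̂(x) = sign(w^T x + b)."""
    return np.sign(w @ x + b)
```

With the example parameters from the start of the notes and equal priors, each class mean is classified into its own class, as expected.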
Definition
Mahalanobis Distance:
D_A(x, x′) := sqrt( (x − x′)^T A (x − x′) ),
→ if we ignore the log π̂k terms, LDA can be viewed as assigning a test point x to the class whose mean µ̂k is closest to x in the Mahalanobis distance associated with Σ̂^{−1}.
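This nearest-mean view can be sketched in a few lines. The helper names `mahalanobis` and `nearest_mean_class` are illustrative, and the rule below assumes equal class priors (i.e., the log π̂k terms are dropped, as in the remark above):

```python
import numpy as np

def mahalanobis(x, xp, A):
    """D_A(x, x') = sqrt((x - x')^T A (x - x'))."""
    diff = x - xp
    return np.sqrt(diff @ A @ diff)

def nearest_mean_class(x, mus, Sigma_inv):
    """Equal-prior LDA: assign x to the class whose mean is closest to x
    in the Mahalanobis distance associated with Σ̂^{-1}."""
    return min(mus, key=lambda k: mahalanobis(x, mus[k], Sigma_inv))
```

Note that with A = I the Mahalanobis distance reduces to the ordinary Euclidean distance, so equal-prior LDA with Σ̂ = I is just nearest-mean classification.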
Visualization