Pattern Classification: all materials in these slides were taken from Pattern Classification by R. O. Duda, P. E. Hart, and D. G. Stork.
Topics: classification with estimated parameters, the Gaussian cases, and the bias of the ML variance estimate.
A normal class-conditional density $P(x \mid \omega_i) \sim N(\mu_i, \Sigma_i)$ is characterized by 2 parameters: the mean vector and the covariance matrix.
Estimation techniques (maximum-likelihood and Bayesian) give nearly identical results, but the approaches are different.
General principle: assume we have $c$ classes with $P(x \mid \omega_j) \sim N(\mu_j, \Sigma_j)$; the parameters of class $\omega_j$ are collected in the vector
$$\theta_j = (\mu_j, \Sigma_j) = \bigl(\mu_j^{1}, \mu_j^{2}, \ldots, \sigma_j^{11}, \sigma_j^{22}, \operatorname{cov}(x_j^{m}, x_j^{n}), \ldots\bigr)$$
Use the information provided by the training samples to estimate $\theta = (\theta_1, \theta_2, \ldots, \theta_c)$, where each $\theta_i$ ($i = 1, 2, \ldots, c$) is associated with one category.
The ML estimate of $\theta$ is, by definition, the value $\hat{\theta}$ that maximizes $P(D \mid \theta)$:
"It is the value of $\theta$ that best agrees with the actually observed training sample."
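As a minimal illustration (not from the slides), the sketch below evaluates the sample likelihood $P(D \mid \theta) = \prod_k P(x_k \mid \theta)$ on a grid of candidate $\theta$ values for an assumed 1-D Gaussian with unknown mean and unit variance; the synthetic data are purely illustrative.

# Minimal sketch (not from the slides): P(D | theta) = prod_k P(x_k | theta) for a
# 1-D Gaussian with unknown mean theta and known unit variance; the value of theta
# that best agrees with the observed sample is the grid point with highest likelihood.
import numpy as np

rng = np.random.default_rng(0)
D = rng.normal(loc=1.5, scale=1.0, size=100)   # illustrative observed training samples

def likelihood(theta):
    densities = np.exp(-0.5 * (D - theta) ** 2) / np.sqrt(2 * np.pi)
    return densities.prod()

grid = np.linspace(-5.0, 5.0, 1001)
theta_hat = grid[np.argmax([likelihood(t) for t in grid])]
print(theta_hat, D.mean())   # theta_hat lands (close to) the sample mean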
The necessary condition for a maximum of the log-likelihood $l(\theta) = \ln P(D \mid \theta)$ is
$$\nabla_\theta l = \sum_{k=1}^{n} \nabla_\theta \ln P(x_k \mid \theta) = 0$$
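A quick numerical check (not from the slides), under the same assumed 1-D Gaussian setting: the sum of per-sample gradients vanishes at the ML estimate.

# Minimal sketch (not from the slides): check the necessary condition
# sum_k grad_mu ln P(x_k | mu) = 0 at the ML estimate, for a 1-D Gaussian with
# known sigma; each gradient term is (x_k - mu) / sigma**2.
import numpy as np

rng = np.random.default_rng(0)
sigma = 1.0                                   # assumed known standard deviation
x = rng.normal(loc=1.5, scale=sigma, size=100)

def grad_log_likelihood(mu):
    return np.sum((x - mu) / sigma**2)        # sum of per-sample gradients

mu_hat = x.mean()                             # ML estimate (derived below)
print(grad_log_likelihood(mu_hat))            # ~0 up to floating-point error
print(grad_log_likelihood(mu_hat + 1.0))      # clearly nonzero away from the optimum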
Gaussian case: unknown $\mu$.
$P(x_i \mid \mu) \sim N(\mu, \Sigma)$ (samples are drawn from a multivariate normal population).
$$\ln P(x_k \mid \mu) = -\frac{1}{2}\ln\bigl[(2\pi)^{d}\,|\Sigma|\bigr] - \frac{1}{2}(x_k - \mu)^{t}\,\Sigma^{-1}(x_k - \mu)$$
and
$$\nabla_\mu \ln P(x_k \mid \mu) = \Sigma^{-1}(x_k - \mu)$$
Setting $\theta = \mu$, the ML estimate $\hat{\mu}$ must therefore satisfy:
$$\sum_{k=1}^{n} \Sigma^{-1}(x_k - \hat{\mu}) = 0$$
Multiplying by $\Sigma$ and rearranging gives
$$\hat{\mu} = \frac{1}{n}\sum_{k=1}^{n} x_k$$
Just the arithmetic average of the training samples!
Conclusion:
If $P(x_k \mid \omega_j)$ ($j = 1, 2, \ldots, c$) is supposed to be Gaussian in a $d$-dimensional feature space, then we can estimate the vector $\theta = (\theta_1, \theta_2, \ldots, \theta_c)^{t}$ and perform an optimal classification!
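As a sketch of this conclusion (not from the slides), the code below fits one Gaussian per class by ML and classifies a point with the log-likelihood plus log-prior rule; the two-class synthetic data and equal priors are assumptions made only for the example.

# Minimal sketch (not from the slides): per-class ML estimates (mu_j, Sigma_j) and
# classification via the Gaussian log-likelihood plus a log-prior.
import numpy as np

rng = np.random.default_rng(1)
D = {
    0: rng.multivariate_normal([0.0, 0.0], np.eye(2), size=200),
    1: rng.multivariate_normal([3.0, 3.0], np.eye(2), size=200),
}
priors = {0: 0.5, 1: 0.5}

params = {}
for j, Xj in D.items():
    mu_j = Xj.mean(axis=0)
    Sigma_j = (Xj - mu_j).T @ (Xj - mu_j) / len(Xj)   # (1/n) ML estimate
    params[j] = (mu_j, Sigma_j)

def log_gaussian(x, mu, Sigma):
    d = len(mu)
    diff = x - mu
    return (-0.5 * d * np.log(2.0 * np.pi)
            - 0.5 * np.log(np.linalg.det(Sigma))
            - 0.5 * diff @ np.linalg.solve(Sigma, diff))

def classify(x):
    # choose the class maximizing ln P(x | omega_j) + ln P(omega_j)
    return max(params, key=lambda j: log_gaussian(x, *params[j]) + np.log(priors[j]))

print(classify(np.array([2.5, 2.8])))   # expected: 1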
ML Estimation:
Gaussian case: unknown $\mu$ and $\sigma$. In the univariate case, write $\theta = (\theta_1, \theta_2) = (\mu, \sigma^2)$; the conditions $\nabla_\theta l = 0$ become:
$$\sum_{k=1}^{n} \frac{1}{\hat{\theta}_2}\,(x_k - \hat{\theta}_1) = 0 \qquad (1)$$
$$-\sum_{k=1}^{n} \frac{1}{\hat{\theta}_2} + \sum_{k=1}^{n} \frac{(x_k - \hat{\theta}_1)^2}{\hat{\theta}_2^{\,2}} = 0 \qquad (2)$$
Combining (1) and (2) yields
$$\hat{\mu} = \frac{1}{n}\sum_{k=1}^{n} x_k, \qquad \hat{\sigma}^2 = \frac{1}{n}\sum_{k=1}^{n}(x_k - \hat{\mu})^2$$
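A minimal sketch (not from the slides) of these closed-form estimates on assumed synthetic 1-D data:

# Minimal sketch (not from the slides): the closed-form ML estimates obtained from
# equations (1) and (2) for a 1-D Gaussian with unknown mu and sigma^2.
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(loc=5.0, scale=2.0, size=1000)   # illustrative training samples

mu_hat = x.mean()                        # from (1): mu_hat = (1/n) sum_k x_k
sigma2_hat = np.mean((x - mu_hat) ** 2)  # from (2): sigma2_hat = (1/n) sum_k (x_k - mu_hat)^2

print(mu_hat, sigma2_hat)                # close to 5.0 and 4.0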
Bias
The ML estimate of $\sigma^2$ is biased:
$$E\!\left[\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2\right] = \frac{n-1}{n}\,\sigma^2 \neq \sigma^2$$
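A short derivation sketch (not on the slide), assuming the $x_i$ are i.i.d. with mean $\mu$ and variance $\sigma^2$, and using $\frac{1}{n}\sum_i (x_i - \bar{x})^2 = \frac{1}{n}\sum_i x_i^2 - \bar{x}^2$:
$$E\!\left[\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2\right] = E\bigl[x_1^2\bigr] - E\bigl[\bar{x}^{\,2}\bigr] = \bigl(\sigma^2+\mu^2\bigr) - \left(\frac{\sigma^2}{n}+\mu^2\right) = \frac{n-1}{n}\,\sigma^2 .$$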
An elementary unbiased estimator for $\Sigma$ is the sample covariance matrix:
$$C = \frac{1}{n-1}\sum_{k=1}^{n}(x_k - \hat{\mu})(x_k - \hat{\mu})^{t}$$
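A minimal sketch (not from the slides) comparing the biased (1/n) ML covariance with the unbiased (1/(n-1)) sample covariance $C$ on assumed synthetic data; numpy's np.cov is used only as a cross-check via its ddof argument.

# Minimal sketch (not from the slides): biased ML covariance vs. unbiased sample covariance C.
import numpy as np

rng = np.random.default_rng(3)
X = rng.multivariate_normal(mean=[0.0, 0.0],
                            cov=[[2.0, 0.5], [0.5, 1.0]],
                            size=10)                     # small n makes the bias visible

mu_hat = X.mean(axis=0)
Sigma_ml = (X - mu_hat).T @ (X - mu_hat) / len(X)        # biased ML estimate
C        = (X - mu_hat).T @ (X - mu_hat) / (len(X) - 1)  # unbiased sample covariance

assert np.allclose(Sigma_ml, np.cov(X, rowvar=False, ddof=0))
assert np.allclose(C,        np.cov(X, rowvar=False, ddof=1))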
$|D| = n$
[Figure: the training samples $x_1, x_2, \ldots, x_{20}$ shown as points, grouped into the class sample sets $D_1, \ldots, D_k, \ldots, D_c$.]