Chapter 3
Parameter Estimation
Introduction
Maximum-Likelihood Estimation
Bayesian Estimation
$$P(\omega_i \mid \mathbf{x}) = \frac{p(\mathbf{x} \mid \omega_i)\,P(\omega_i)}{p(\mathbf{x})}, \qquad p(\mathbf{x}) = \sum_{j=1}^{c} p(\mathbf{x} \mid \omega_j)\,P(\omega_j)$$
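As a quick numerical illustration of the Bayes rule above, the sketch below computes the posteriors for a hypothetical 3-class problem; the likelihood and prior values are made-up assumptions, not from the slides.

```python
import numpy as np

# Hypothetical values: class-conditional densities p(x | w_i) evaluated
# at a single observation x, and the class priors P(w_i).
likelihoods = np.array([0.6, 0.3, 0.1])   # p(x | w_1..w_3), assumed
priors      = np.array([0.5, 0.3, 0.2])   # P(w_1..w_3), assumed

# Evidence: p(x) = sum_j p(x | w_j) P(w_j)
evidence = np.sum(likelihoods * priors)

# Posterior: P(w_i | x) = p(x | w_i) P(w_i) / p(x)
posteriors = likelihoods * priors / evidence
print(posteriors)  # posteriors sum to 1
```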
The training data are separated by class: $D = \{D_1, D_2, \ldots, D_c\}$. The samples in $D_j$ are drawn independently according to the probability law $p(\mathbf{x} \mid \omega_j)$. That is, the examples in $D_j$ are i.i.d. random variables, i.e., independent and identically distributed.
Assume each class-conditional density has a known parametric form, e.g. $p(\mathbf{x} \mid \omega_j) \sim N(\boldsymbol{\mu}_j, \Sigma_j)$, so that $p(\mathbf{x} \mid \omega_j) = p(\mathbf{x} \mid \boldsymbol{\theta}_j)$ with parameter vector

$$\boldsymbol{\theta}_j = (\theta_1, \theta_2, \ldots, \theta_m)^T$$

Use $D_j$ to estimate the unknown parameter vector $\boldsymbol{\theta}_j$, yielding the estimates $\hat{\boldsymbol{\theta}}_1, \hat{\boldsymbol{\theta}}_2, \ldots, \hat{\boldsymbol{\theta}}_c$.
Maximum-Likelihood Estimation
View parameters as quantities whose values are fixed but unknown. Estimate parameter values by maximizing the likelihood (probability) of observing the actual examples.
Bayesian Estimation
View parameters as random variables with a known prior distribution; the observed examples convert this prior into a posterior density over the parameter values, from which the estimate $\hat{\boldsymbol{\theta}}$ is obtained.
Xin-Shun Xu @ SDU School of Computer Science and Technology, Shandong University 8
Maximum-Likelihood Estimation (Cont.)
Criterion of ML
Given $D = \{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n\}$, by the independence assumption we have

$$p(D \mid \boldsymbol{\theta}) = p(\mathbf{x}_1 \mid \boldsymbol{\theta})\, p(\mathbf{x}_2 \mid \boldsymbol{\theta}) \cdots p(\mathbf{x}_n \mid \boldsymbol{\theta}) = \prod_{k=1}^{n} p(\mathbf{x}_k \mid \boldsymbol{\theta})$$

The ML estimate maximizes the likelihood $L(\boldsymbol{\theta} \mid D) = p(D \mid \boldsymbol{\theta})$, or equivalently the log-likelihood $l(\boldsymbol{\theta} \mid D) = \ln L(\boldsymbol{\theta} \mid D)$:

$$\hat{\boldsymbol{\theta}} = \arg\max_{\boldsymbol{\theta}} L(\boldsymbol{\theta} \mid D) = \arg\max_{\boldsymbol{\theta}} l(\boldsymbol{\theta} \mid D)$$

Why? The logarithm is monotonically increasing, so both objectives have the same maximizer, and the log turns the product into a sum that is easier to differentiate.

Gradient operator:

$$\nabla_{\boldsymbol{\theta}} = \left( \frac{\partial}{\partial \theta_1}, \frac{\partial}{\partial \theta_2}, \ldots, \frac{\partial}{\partial \theta_m} \right)^T$$
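The equivalence of maximizing $L$ and $l = \ln L$ can be checked numerically. A minimal sketch, assuming a univariate Gaussian likelihood with $\sigma = 1$ known and only $\mu$ unknown (the data and grid are illustrative choices, not from the slides):

```python
import numpy as np

# The log is monotonically increasing, so argmax L = argmax ln L; the log
# also turns the product over samples into a numerically stable sum.
rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.0, size=50)   # assumed true mu = 2

grid = np.linspace(0.0, 4.0, 401)  # candidate values for theta = mu
log_lik = np.array([np.sum(-0.5 * (data - m) ** 2 - 0.5 * np.log(2 * np.pi))
                    for m in grid])
lik = np.exp(log_lik - log_lik.max())  # rescaled likelihood (avoids underflow)

mu_from_loglik = grid[np.argmax(log_lik)]
mu_from_lik = grid[np.argmax(lik)]
print(mu_from_loglik, mu_from_lik, data.mean())  # all three nearly coincide
```

Both argmax values land on the grid point nearest the sample mean, previewing the closed-form result derived next.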
With $p(\mathbf{x}_k \mid \boldsymbol{\mu}) = \dfrac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\!\left[ -\dfrac{1}{2} (\mathbf{x}_k - \boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x}_k - \boldsymbol{\mu}) \right]$:

$$L(\boldsymbol{\mu} \mid D) = p(D \mid \boldsymbol{\mu}) = \prod_{k=1}^{n} p(\mathbf{x}_k \mid \boldsymbol{\mu}) = \frac{1}{(2\pi)^{nd/2}\, |\Sigma|^{n/2}} \exp\!\left[ -\frac{1}{2} \sum_{k=1}^{n} (\mathbf{x}_k - \boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x}_k - \boldsymbol{\mu}) \right]$$

$$l(\boldsymbol{\mu} \mid D) = \ln L(\boldsymbol{\mu} \mid D) = -\frac{nd}{2} \ln(2\pi) - \frac{n}{2} \ln|\Sigma| - \frac{1}{2} \sum_{k=1}^{n} (\mathbf{x}_k - \boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x}_k - \boldsymbol{\mu})$$
The Gaussian Case I
$$l(\boldsymbol{\mu} \mid D) = \ln L(\boldsymbol{\mu} \mid D) = -\frac{nd}{2} \ln(2\pi) - \frac{n}{2} \ln|\Sigma| - \frac{1}{2} \sum_{k=1}^{n} (\mathbf{x}_k - \boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x}_k - \boldsymbol{\mu})$$

Setting the gradient with respect to $\boldsymbol{\mu}$ to zero:

$$\nabla_{\boldsymbol{\mu}}\, l(\boldsymbol{\mu} \mid D) = \sum_{k=1}^{n} \Sigma^{-1} (\mathbf{x}_k - \boldsymbol{\mu}) = \mathbf{0}$$

$$\hat{\boldsymbol{\mu}} = \frac{1}{n} \sum_{k=1}^{n} \mathbf{x}_k \qquad \text{Sample Mean!}$$
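The sample mean being the maximizer can be verified directly: with $\Sigma$ known, the log-likelihood is a concave quadratic in $\boldsymbol{\mu}$. A minimal sketch, assuming $\Sigma = I$ and an illustrative true mean:

```python
import numpy as np

# Check that the closed-form MLE mu_hat = (1/n) sum x_k attains a higher
# Gaussian log-likelihood than nearby perturbed means (Sigma = I assumed).
rng = np.random.default_rng(1)
d, n = 3, 200
X = rng.normal(size=(n, d)) + np.array([1.0, -2.0, 0.5])  # assumed true mean

mu_hat = X.mean(axis=0)  # sample mean (the MLE)

def log_lik(mu, Sigma_inv=np.eye(d)):
    # sum_k -(1/2)(x_k - mu)^T Sigma^{-1} (x_k - mu), dropping constants
    diff = X - mu
    return -0.5 * np.sum(diff @ Sigma_inv * diff)

# Any nonzero perturbation strictly decreases the log-likelihood
for delta in rng.normal(scale=0.1, size=(5, d)):
    assert log_lik(mu_hat) > log_lik(mu_hat + delta)
```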
The Gaussian Case II
For the univariate Gaussian with both $\theta_1 = \mu$ and $\theta_2 = \sigma^2$ unknown:

$$l(\boldsymbol{\theta} \mid D) = \ln L(\boldsymbol{\theta} \mid D) = -\frac{n}{2} \ln(2\pi \theta_2) - \frac{1}{2\theta_2} \sum_{k=1}^{n} (x_k - \theta_1)^2$$

Setting the gradient to zero:

$$\nabla_{\boldsymbol{\theta}}\, l(\boldsymbol{\theta} \mid D) = \begin{pmatrix} \dfrac{1}{\theta_2} \displaystyle\sum_{k=1}^{n} (x_k - \theta_1) \\[3mm] -\dfrac{n}{2\theta_2} + \dfrac{1}{2\theta_2^2} \displaystyle\sum_{k=1}^{n} (x_k - \theta_1)^2 \end{pmatrix} = \mathbf{0}$$

Unbiased estimator: $E[\hat{\boldsymbol{\theta}}] = \boldsymbol{\theta}$
Consistent estimator: $\lim_{n \to \infty} E[\hat{\boldsymbol{\theta}}] = \boldsymbol{\theta}$

Solving gives:

$$\hat{\mu} = \hat{\theta}_1 = \frac{1}{n} \sum_{k=1}^{n} x_k \qquad \text{(unbiased)}$$

$$\hat{\sigma}^2 = \hat{\theta}_2 = \frac{1}{n} \sum_{k=1}^{n} (x_k - \hat{\mu})^2 \qquad \text{(biased)}$$

In the multivariate case, analogously:

$$\hat{\boldsymbol{\mu}} = \frac{1}{n} \sum_{k=1}^{n} \mathbf{x}_k \qquad \text{Arithmetic average of } n \text{ vectors}$$

$$\hat{\Sigma} = \frac{1}{n} \sum_{k=1}^{n} (\mathbf{x}_k - \hat{\boldsymbol{\mu}})(\mathbf{x}_k - \hat{\boldsymbol{\mu}})^T \qquad \text{Arithmetic average of } n \text{ matrices (biased)}$$
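The bias of $\hat{\sigma}^2$ can be seen empirically: averaged over many repeated samples, the ML estimate falls short of the true variance by the factor $(n-1)/n$. A minimal simulation sketch (the true variance, sample size, and trial count are illustrative choices):

```python
import numpy as np

# The ML variance (1/n) sum (x_k - mu_hat)^2 is biased: its expectation is
# (n-1)/n * sigma^2. The 1/(n-1) version is unbiased.
rng = np.random.default_rng(42)
true_var, n, trials = 4.0, 10, 20000   # assumed simulation settings

samples = rng.normal(0.0, np.sqrt(true_var), size=(trials, n))
mle_var = samples.var(axis=1, ddof=0)       # ML estimator, divides by n
unbiased_var = samples.var(axis=1, ddof=1)  # divides by n - 1

print(mle_var.mean())       # close to (n-1)/n * 4 = 3.6
print(unbiased_var.mean())  # close to 4.0
```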
MLE for Normal Population
$$\hat{\boldsymbol{\mu}} = \frac{1}{n} \sum_{k=1}^{n} \mathbf{x}_k, \qquad E[\hat{\boldsymbol{\mu}}] = \boldsymbol{\mu} \qquad \text{Sample Mean}$$

$$\hat{\Sigma} = \frac{1}{n} \sum_{k=1}^{n} (\mathbf{x}_k - \hat{\boldsymbol{\mu}})(\mathbf{x}_k - \hat{\boldsymbol{\mu}})^T, \qquad E[\hat{\Sigma}] = \frac{n-1}{n} \Sigma \neq \Sigma \qquad \text{(biased)}$$

$$C = \frac{1}{n-1} \sum_{k=1}^{n} (\mathbf{x}_k - \hat{\boldsymbol{\mu}})(\mathbf{x}_k - \hat{\boldsymbol{\mu}})^T, \qquad E[C] = \Sigma \qquad \text{Sample Covariance Matrix}$$
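The two covariance estimators above differ exactly by the factor $(n-1)/n$, which is what makes $\hat{\Sigma}$ biased and $C$ unbiased. A small sketch checking that identity on illustrative data (NumPy's `np.cov` computes $C$, the $1/(n-1)$ version, by default):

```python
import numpy as np

# The ML covariance (divide by n) and the sample covariance C (divide by
# n-1) satisfy Sigma_hat = (n-1)/n * C exactly, for any data set.
rng = np.random.default_rng(7)
n, d = 50, 3                       # assumed sample size and dimension
X = rng.normal(size=(n, d))
mu_hat = X.mean(axis=0)

Sigma_hat = (X - mu_hat).T @ (X - mu_hat) / n   # ML estimate
C = np.cov(X, rowvar=False)                     # default ddof=1 -> 1/(n-1)

assert np.allclose(Sigma_hat, (n - 1) / n * C)
```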
Bayesian Estimation
Settings
The parametric form of the likelihood function for each category is known. However, the $\boldsymbol{\theta}_j$ are considered random variables instead of fixed (but unknown) values.
$$P(\omega_i \mid \mathbf{x}, D) = \frac{P(\omega_i, \mathbf{x}, D)}{P(\mathbf{x}, D)} = \frac{P(\omega_i, \mathbf{x}, D)}{\sum_{j=1}^{c} P(\omega_j, \mathbf{x}, D)}$$

Assumptions: $P(\omega_i \mid D) = P(\omega_i)$ and $P(\mathbf{x} \mid \omega_i, D) = P(\mathbf{x} \mid \omega_i, D_i)$. Hence

$$P(\omega_i \mid \mathbf{x}, D) = \frac{P(\mathbf{x} \mid \omega_i, D_i)\, P(\omega_i)}{\sum_{j=1}^{c} P(\mathbf{x} \mid \omega_j, D_j)\, P(\omega_j)}$$
The computation proceeds in three phases (within each class, with $p(\mathbf{x} \mid \boldsymbol{\theta}, D) = p(\mathbf{x} \mid \boldsymbol{\theta})$):

Phase I: compute the parameter posterior $p(\boldsymbol{\theta} \mid D)$:

$$p(\boldsymbol{\theta} \mid D) \propto \prod_{k=1}^{n} p(\mathbf{x}_k \mid \boldsymbol{\theta})\, p(\boldsymbol{\theta})$$

Phase II: compute the class-conditional density:

$$p(\mathbf{x} \mid D) = \int p(\mathbf{x} \mid \boldsymbol{\theta})\, p(\boldsymbol{\theta} \mid D)\, d\boldsymbol{\theta}$$

Phase III: classify with

$$P(\omega_i \mid \mathbf{x}, D) = \frac{P(\mathbf{x} \mid \omega_i, D_i)\, P(\omega_i)}{\sum_{j=1}^{c} P(\mathbf{x} \mid \omega_j, D_j)\, P(\omega_j)}$$
Univariate Gaussian case: from the known prior $p(\mu)$ and likelihood $p(x \mid \mu)$, together with $D$, obtain the posterior $p(\mu \mid D)$, where

$$p(x \mid \mu) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left[ -\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2 \right], \qquad p(\mu) = \frac{1}{\sqrt{2\pi}\,\sigma_0} \exp\!\left[ -\frac{1}{2} \left( \frac{\mu - \mu_0}{\sigma_0} \right)^2 \right]$$

Phase I:

$$p(\mu \mid D) \propto \prod_{k=1}^{n} p(x_k \mid \mu)\, p(\mu) = \alpha \exp\!\left\{ -\frac{1}{2} \left[ \sum_{k=1}^{n} \left( \frac{x_k - \mu}{\sigma} \right)^2 + \left( \frac{\mu - \mu_0}{\sigma_0} \right)^2 \right] \right\}$$

$$= \alpha' \exp\!\left\{ -\frac{1}{2} \left[ \left( \frac{n}{\sigma^2} + \frac{1}{\sigma_0^2} \right) \mu^2 - 2 \left( \frac{1}{\sigma^2} \sum_{k=1}^{n} x_k + \frac{\mu_0}{\sigma_0^2} \right) \mu \right] \right\}$$

(where $\alpha$, $\alpha'$ absorb factors independent of $\mu$). This is again a Gaussian:

$$p(\mu \mid D) = \frac{1}{\sqrt{2\pi}\,\sigma_n} \exp\!\left[ -\frac{1}{2} \left( \frac{\mu - \mu_n}{\sigma_n} \right)^2 \right]$$
Comparison
Matching the coefficients of $\mu^2$ and $\mu$ in the two exponents gives

$$\frac{1}{\sigma_n^2} = \frac{n}{\sigma^2} + \frac{1}{\sigma_0^2}, \qquad \frac{\mu_n}{\sigma_n^2} = \frac{n}{\sigma^2}\, \hat{\mu}_n + \frac{\mu_0}{\sigma_0^2}, \qquad \hat{\mu}_n = \frac{1}{n} \sum_{k=1}^{n} x_k$$

Solving:

$$\mu_n = \frac{n \sigma_0^2}{n \sigma_0^2 + \sigma^2}\, \hat{\mu}_n + \frac{\sigma^2}{n \sigma_0^2 + \sigma^2}\, \mu_0, \qquad \sigma_n^2 = \frac{\sigma_0^2\, \sigma^2}{n \sigma_0^2 + \sigma^2}$$
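The closed-form update above can be sketched directly in code. The prior parameters, noise variance, and data below are illustrative assumptions; the point is that $\mu_n$ is a convex combination of the sample mean and the prior mean, and $\sigma_n^2$ shrinks as $n$ grows:

```python
import numpy as np

# Posterior for the univariate Gaussian mean (sigma^2 known):
#   mu_n      = (n s0^2 / (n s0^2 + s^2)) mu_hat_n + (s^2 / (n s0^2 + s^2)) mu_0
#   sigma_n^2 = s0^2 s^2 / (n s0^2 + s^2)
rng = np.random.default_rng(3)
sigma2, mu0, sigma0_2 = 1.0, 0.0, 2.0            # assumed noise var and prior
data = rng.normal(loc=1.5, scale=np.sqrt(sigma2), size=30)  # assumed true mu

n = len(data)
mu_hat_n = data.mean()
denom = n * sigma0_2 + sigma2
mu_n = (n * sigma0_2 / denom) * mu_hat_n + (sigma2 / denom) * mu0
sigma_n2 = sigma0_2 * sigma2 / denom

# mu_n lies between mu0 and the sample mean; the prior washes out as n grows
print(mu_n, sigma_n2)
```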
Phase II: from the known $p(x \mid \mu)$ and the posterior $p(\mu \mid D) \sim N(\mu_n, \sigma_n^2)$, compute $p(x \mid D)$:

$$p(x \mid D) = \int p(x \mid \mu)\, p(\mu \mid D)\, d\mu = \int \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left[ -\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2 \right] \frac{1}{\sqrt{2\pi}\,\sigma_n} \exp\!\left[ -\frac{1}{2} \left( \frac{\mu - \mu_n}{\sigma_n} \right)^2 \right] d\mu$$

$$= \frac{1}{2\pi \sigma \sigma_n} \exp\!\left[ -\frac{1}{2} \frac{(x - \mu_n)^2}{\sigma^2 + \sigma_n^2} \right] \int \exp\!\left[ -\frac{1}{2} \frac{\sigma^2 + \sigma_n^2}{\sigma^2 \sigma_n^2} \left( \mu - \frac{\sigma_n^2 x + \sigma^2 \mu_n}{\sigma^2 + \sigma_n^2} \right)^2 \right] d\mu$$

The remaining integral is a constant independent of $x$, so

$$p(x \mid D) \sim N(\mu_n,\; \sigma^2 + \sigma_n^2)$$
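The Phase II result can be checked by Monte Carlo: drawing $\mu \sim N(\mu_n, \sigma_n^2)$ and then $x \sim N(\mu, \sigma^2)$ should produce samples distributed as $N(\mu_n, \sigma^2 + \sigma_n^2)$. A sketch with illustrative parameter values:

```python
import numpy as np

# Monte Carlo check: mu ~ N(mu_n, sigma_n^2), then x ~ N(mu, sigma^2)
# implies x ~ N(mu_n, sigma^2 + sigma_n^2).
rng = np.random.default_rng(9)
mu_n, sigma_n2, sigma2 = 1.2, 0.25, 1.0   # assumed posterior/noise parameters

mus = rng.normal(mu_n, np.sqrt(sigma_n2), size=200_000)
xs = rng.normal(mus, np.sqrt(sigma2))     # one x per sampled mu

print(xs.mean())  # close to mu_n = 1.2
print(xs.var())   # close to sigma^2 + sigma_n^2 = 1.25
```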
Phase III: classify with

$$P(\omega_i \mid \mathbf{x}, D) = \frac{P(\mathbf{x} \mid \omega_i, D_i)\, P(\omega_i)}{\sum_{j=1}^{c} P(\mathbf{x} \mid \omega_j, D_j)\, P(\omega_j)}$$
Key issue
Estimate the prior and class-conditional pdf from the training set.
Basic assumption on training examples: i.i.d.
Two strategies for the key issue
Assume a parametric form for the class-conditional pdf, then apply:
Maximum likelihood estimation
Bayesian estimation
Any Questions?