Estimation
Prof. Nicholas Zabaras
Email: nzabaras@gmail.com
URL: https://www.zabaras.com/
- MLE for the Poisson distribution, MLE for the multinomial distribution
- Familiarize ourselves with the MLE estimates of mean and variance in the univariate and multivariate Gaussian distributions
The quantity 𝑝(𝒟|𝜃) on the right-hand side of Bayes’ theorem is evaluated for
the observed data set 𝒟 and can be viewed as a function of the parameter
vector 𝜃, in which case it is called the likelihood function.
In the Bayesian approach, there is only one data set 𝒟 (the one actually observed), and the uncertainty in 𝜃 is expressed through an appropriate prior distribution and by computing posterior probabilities over 𝜃.
We observe i.i.d. samples from a parametric model,

$$X \sim \pi(x) = \pi(x\,|\,\theta), \qquad \theta \in \mathbb{R}^k$$

or, briefly,

$$\pi(\mathcal{D}\,|\,\theta) = \prod_{j=1}^{N}\pi(x_j\,|\,\theta), \qquad \text{where } \mathcal{D} = \{x_1, x_2, \ldots, x_N\}$$

The log-likelihood is $L(\mathcal{D}\,|\,\theta) = \log \pi(\mathcal{D}\,|\,\theta)$; for the Gaussian with $\theta = (\theta_1, \theta_2) = (\mu, \sigma^2)$,

$$L(\mathcal{D}\,|\,\theta) = -\frac{1}{2\theta_2}\sum_{j=1}^{N}(x_j - \theta_1)^2 - \frac{N}{2}\log(2\pi\theta_2)$$
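Setting the derivatives of $L(\mathcal{D}\,|\,\theta)$ with respect to $\theta_1$ and $\theta_2$ to zero:

$$\frac{\partial L}{\partial \theta_1} = \frac{1}{\theta_2}\sum_{j=1}^{N}(x_j - \theta_1) = 0, \qquad \frac{\partial L}{\partial \theta_2} = \frac{1}{2\theta_2^2}\sum_{j=1}^{N}(x_j - \theta_1)^2 - \frac{N}{2\theta_2} = 0$$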
This gives:
$$\hat{\theta}_{ML,1} = \frac{1}{N}\sum_{j=1}^{N}x_j, \qquad \hat{\theta}_{ML,2} = \frac{1}{N}\sum_{j=1}^{N}\left(x_j - \hat{\theta}_{ML,1}\right)^2$$
These estimates agree with what we derived in an earlier lecture using the law of large numbers.
MLE for the Univariate Gaussian
So for the Gaussian distribution,
$$\mathcal{N}(x\,|\,\mu, \sigma^2) = \frac{1}{(2\pi\sigma^2)^{1/2}}\exp\left\{-\frac{1}{2\sigma^2}(x - \mu)^2\right\}$$

the likelihood function is

$$p(\mathcal{D}\,|\,\mu, \sigma^2) = \prod_{n=1}^{N}\mathcal{N}(x_n\,|\,\mu, \sigma^2)$$

and the log-likelihood is

$$\ln p(\mathcal{D}\,|\,\mu, \sigma^2) = -\frac{1}{2\sigma^2}\sum_{n=1}^{N}(x_n - \mu)^2 - \frac{N}{2}\ln\sigma^2 - \frac{N}{2}\ln(2\pi)$$
The maximum likelihood solution is
$$\mu_{ML} = \frac{1}{N}\sum_{n=1}^{N}x_n, \qquad \sigma^2_{ML} = \frac{1}{N}\sum_{n=1}^{N}(x_n - \mu_{ML})^2$$
* We often work with the log-likelihood to avoid underflow (from taking products of many small probabilities) and to simplify the algebra.
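As a numerical illustration of these formulas, a minimal sketch in Python/NumPy (the course figures use a Matlab implementation; all names here are ours):

```python
import numpy as np

rng = np.random.default_rng(0)
mu_true, sigma_true = 1.5, 0.8
x = rng.normal(mu_true, sigma_true, size=10_000)   # synthetic data set D

mu_ml = x.mean()                                   # mu_ML = (1/N) sum_n x_n
sigma2_ml = ((x - mu_ml) ** 2).mean()              # sigma^2_ML = (1/N) sum_n (x_n - mu_ML)^2

# Log-likelihood evaluated at the MLE, using the expression above
N = x.size
loglik = (-0.5 / sigma2_ml * ((x - mu_ml) ** 2).sum()
          - 0.5 * N * np.log(sigma2_ml) - 0.5 * N * np.log(2 * np.pi))
print(mu_ml, sigma2_ml, loglik)
```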
[Figure: Histogram of the CMBdata set with the Gaussian density constructed from the MLE-based estimates of μ and σ² (solid line). Normal estimation, Matlab implementation.]
From Bayesian Core, J.-M. Marin and C.P. Robert, Chapter 2 (available online)
The maximum likelihood solutions $\mu_{ML}$, $\sigma^2_{ML}$ are functions of the data set values $x_1, \ldots, x_N$. Consider the expectations of these quantities with respect to the data set values, which themselves come from a Gaussian:

$$\mathbb{E}\left[\hat{\mu}_{ML}\right] = \mathbb{E}\left[\frac{1}{N}\sum_{i=1}^{N}x_i\right] = \mu$$

$$\mathbb{E}\left[\hat{\sigma}^2_{ML}\right] = \mathbb{E}\left[\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \frac{1}{N}\sum_{j=1}^{N}x_j\right)^2\right] = \left(1 - \frac{1}{N}\right)\sigma^2$$

The MLE approach thus underestimates the variance (the estimator is biased); this bias is at the root of the over-fitting problem.
Unbiased Estimate of Variance
If $x_1, x_2, \ldots, x_N \overset{\text{i.i.d.}}{\sim} \mathcal{N}(\mu, \sigma^2)$, then

$$\mathbb{E}\left[\hat{\sigma}^2_{ML}\right] = \mathbb{E}\left[\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \frac{1}{N}\sum_{j=1}^{N}x_j\right)^2\right] = \left(1 - \frac{1}{N}\right)\sigma^2$$

So define

$$\hat{\sigma}^2_{\text{unbiased}} = \frac{N}{N-1}\,\hat{\sigma}^2_{ML} = \frac{1}{N-1}\sum_{i=1}^{N}\left(x_i - \hat{\mu}_{ML}\right)^2, \qquad \mathbb{E}\left[\hat{\sigma}^2_{\text{unbiased}}\right] = \sigma^2$$
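A small sketch exposing the bias numerically; NumPy's ddof argument selects the divisor N (biased) or N-1 (unbiased):

```python
import numpy as np

rng = np.random.default_rng(1)
sigma2_true = 4.0
# Average the two estimators over many small data sets to expose the bias
est_mle, est_unb = [], []
for _ in range(20_000):
    x = rng.normal(0.0, np.sqrt(sigma2_true), size=5)   # N = 5
    est_mle.append(x.var(ddof=0))    # divides by N     -> biased
    est_unb.append(x.var(ddof=1))    # divides by N - 1 -> unbiased
print(np.mean(est_mle))  # ~ (1 - 1/N) sigma^2 = 3.2
print(np.mean(est_unb))  # ~ sigma^2 = 4.0
```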
MLE for the Poisson Distribution

For count data, take

$$P(N = n) = \text{Poisson}(n\,|\,\lambda) = \frac{\lambda^n}{n!}e^{-\lambda}, \qquad \mathbb{E}[N] = \sum_{n=0}^{\infty}n\,\text{Poisson}(n\,|\,\lambda) = \lambda, \qquad \text{Var}[N] = \lambda$$

Since the Poisson mean and variance are both $\lambda$, we can approximate it by a Gaussian with mean and variance $\theta$:

$$\prod_{j=1}^{N}\text{Poisson}(n_j\,|\,\theta) \approx \prod_{j=1}^{N}\frac{1}{\sqrt{2\pi\theta}}\exp\left\{-\frac{(n_j - \theta)^2}{2\theta}\right\} = \frac{1}{(2\pi)^{N/2}}\exp\left\{-\frac{1}{2\theta}\sum_{j=1}^{N}(n_j - \theta)^2 - \frac{N}{2}\log\theta\right\}$$

so the log-likelihood is

$$L(\mathcal{D}\,|\,\theta) = -\frac{1}{2\theta}\sum_{j=1}^{N}(n_j - \theta)^2 - \frac{N}{2}\log\theta + \text{const}$$

Setting $\partial L(\mathcal{D}\,|\,\theta)/\partial\theta = 0$ gives

$$\frac{1}{2\theta^2}\sum_{j=1}^{N}n_j^2 - \frac{N}{2\theta} - \frac{N}{2} = 0 \quad\Longrightarrow\quad \theta^2 + \theta - \frac{1}{N}\sum_{j=1}^{N}n_j^2 = 0$$

whose positive root is

$$\hat{\theta}_{ML} = \left(\frac{1}{4} + \frac{1}{N}\sum_{j=1}^{N}n_j^2\right)^{1/2} - \frac{1}{2},$$

an approximation to $\frac{1}{N}\sum_{j=1}^{N}n_j$, the result from the exact Poisson density.
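A quick numerical comparison of the two estimates on synthetic data (variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(2)
lam_true = 3.0
n = rng.poisson(lam_true, size=5_000)

lam_exact = n.mean()                                # exact MLE: (1/N) sum_j n_j
lam_approx = np.sqrt(0.25 + (n ** 2).mean()) - 0.5  # from the N(theta, theta) approximation
print(lam_exact, lam_approx)                        # both close to 3.0
```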
MLE for the Multinomial Distribution

$$\mathcal{M}(m_1, m_2, \ldots, m_K\,|\,\theta_1, \theta_2, \ldots, \theta_K) = \binom{N}{m_1\,m_2\,\cdots\,m_K}\,\theta_1^{m_1}\theta_2^{m_2}\cdots\theta_K^{m_K}, \qquad \binom{N}{m_1\,m_2\,\cdots\,m_K} = \frac{N!}{\prod_i m_i!}$$
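Maximizing the multinomial log-likelihood subject to $\sum_k \theta_k = 1$ (e.g. with a Lagrange multiplier) gives the standard result $\hat{\theta}_k = m_k/N$; a minimal NumPy sketch:

```python
import numpy as np

rng = np.random.default_rng(3)
theta_true = np.array([0.2, 0.5, 0.3])
N = 10_000
m = rng.multinomial(N, theta_true)   # counts m_1, ..., m_K with sum_k m_k = N

theta_ml = m / N                     # MLE: theta_k = m_k / N
print(theta_ml)                      # close to theta_true
```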
Every time you introduce model reduction, you also introduce model errors.
The probability density of $X$ given $z$ is:

$$\pi(x\,|\,z) = \frac{1}{(2\pi)^{n/2}\det(\Sigma)^{1/2}}\exp\left\{-\frac{1}{2}(x - Az)^T\Sigma^{-1}(x - Az)\right\}$$
The likelihood function is then

$$\prod_{j=1}^{N}\pi(x_j\,|\,z) \sim \exp\left\{-\frac{1}{2}\sum_{j=1}^{N}(x_j - Az)^T\Sigma^{-1}(x_j - Az)\right\}$$
Maximizing over $z$ leads to the normal equations

$$A^T\Sigma^{-1}A\,z = A^T\Sigma^{-1}\bar{x}, \qquad \text{where } \bar{x} = \frac{1}{N}\sum_{j=1}^{N}x_j$$
The existence of a solution of this system depends on the matrix $A \in \mathbb{R}^{n \times k}$: the solution is unique when $A^T\Sigma^{-1}A$ is invertible, i.e. when $A$ has full column rank.
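A sketch of forming and solving this system numerically; the matrix A, the covariance Σ, and the data are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(4)
n, k, N = 6, 3, 200
A = rng.normal(size=(n, k))                      # A in R^{n x k}, full column rank (generic)
Sigma = np.diag(rng.uniform(0.5, 2.0, size=n))   # noise covariance

z_true = rng.normal(size=k)
noise = rng.multivariate_normal(np.zeros(n), Sigma, size=N)
X = A @ z_true + noise                           # rows are x_j = A z + eps_j

x_bar = X.mean(axis=0)                           # sample mean (1/N) sum_j x_j
Sinv = np.linalg.inv(Sigma)
# Normal equations: (A^T Sigma^{-1} A) z = A^T Sigma^{-1} x_bar
z_ml = np.linalg.solve(A.T @ Sinv @ A, A.T @ Sinv @ x_bar)
print(z_ml, z_true)                              # close for large N
```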
MLE for the Multivariate Gaussian

$$\ln p(\mathbf{X}\,|\,\mu, \Sigma) = -\frac{ND}{2}\ln 2\pi - \frac{N}{2}\ln|\Sigma| - \frac{1}{2}\sum_{n=1}^{N}(x_n - \mu)^T\Sigma^{-1}(x_n - \mu)$$
Setting the derivatives wrt 𝝁 and 𝚺 equal to zero gives the following:
$$\mu_{ML} = \frac{1}{N}\sum_{n=1}^{N}x_n, \qquad \Sigma_{ML} = \frac{1}{N}\sum_{n=1}^{N}(x_n - \mu_{ML})(x_n - \mu_{ML})^T$$
Here we used: $\frac{\partial}{\partial A}\ln|A| = \left(A^{-1}\right)^T$, $|A^{-1}| = |A|^{-1}$, and $\mathrm{tr}(AB) = \mathrm{tr}(BA)$.
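The same formulas in a minimal NumPy sketch (synthetic data):

```python
import numpy as np

rng = np.random.default_rng(5)
mu_true = np.array([1.0, -2.0])
Sigma_true = np.array([[2.0, 0.6], [0.6, 1.0]])
X = rng.multivariate_normal(mu_true, Sigma_true, size=50_000)  # N x D

mu_ml = X.mean(axis=0)                         # (1/N) sum_n x_n
centered = X - mu_ml
Sigma_ml = centered.T @ centered / X.shape[0]  # (1/N) sum_n (x_n - mu)(x_n - mu)^T
print(mu_ml, Sigma_ml)
```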
Appendix: Some Useful Matrix Operations
Show that

$$\frac{\partial}{\partial A}\mathrm{Tr}(AB) = B^T \qquad\text{and}\qquad \frac{\partial}{\partial A}\mathrm{Tr}(A^TB) = B$$

Indeed,

$$\frac{\partial}{\partial A_{mn}}\mathrm{Tr}(AB) = \frac{\partial}{\partial A_{mn}}\sum_{i,k}A_{ik}B_{ki} = B_{nm} \quad\Longrightarrow\quad \frac{\partial}{\partial A}\mathrm{Tr}(AB) = B^T$$

Show that

$$\frac{\partial}{\partial A}\ln|A| = \left(A^{-1}\right)^T$$
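Both identities can be spot-checked with finite differences; a small sketch (the helper num_grad is ours):

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.normal(size=(4, 4)) + 4 * np.eye(4)   # well-conditioned, non-symmetric test matrix
B = rng.normal(size=(4, 4))
eps = 1e-6

def num_grad(f, A):
    """Finite-difference gradient of the scalar f with respect to each entry A_mn."""
    G = np.zeros_like(A)
    for m in range(A.shape[0]):
        for n in range(A.shape[1]):
            E = np.zeros_like(A)
            E[m, n] = eps
            G[m, n] = (f(A + E) - f(A - E)) / (2 * eps)
    return G

# d Tr(AB) / dA = B^T
print(np.allclose(num_grad(lambda X: np.trace(X @ B), A), B.T))
# d ln|A| / dA = (A^{-1})^T   (slogdet returns log|det X| for any nonsingular X)
print(np.allclose(num_grad(lambda X: np.linalg.slogdet(X)[1], A),
                  np.linalg.inv(A).T, atol=1e-5))
```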
Sequential Estimation

Separating out the contribution of the $N$-th data point, $x_N$:

$$\mu_{ML}^{(N)} = \frac{1}{N}\sum_{n=1}^{N}x_n = \frac{1}{N}x_N + \frac{N-1}{N}\mu_{ML}^{(N-1)} = \mu_{ML}^{(N-1)} + \frac{1}{N}\left(x_N - \mu_{ML}^{(N-1)}\right)$$
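A minimal sketch of this online update, checked against the batch mean:

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.normal(2.0, 1.0, size=1_000)

mu = 0.0
for N, x_N in enumerate(x, start=1):
    mu += (x_N - mu) / N        # mu^(N) = mu^(N-1) + (1/N)(x_N - mu^(N-1))
print(mu, x.mean())             # identical up to floating-point error
```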
The Robbins-Monro update has the form

$$\theta^{(N)} = \theta^{(N-1)} + a_{N-1}\,z\left(\theta^{(N-1)}\right)$$

with learning rate $a_{N-1}$ and error signal $z\left(\theta^{(N-1)}\right)$, where the coefficients $a_N$ satisfy

$$\lim_{N\to\infty}a_N = 0, \qquad \sum_{n=1}^{\infty}a_n = \infty, \qquad \sum_{n=1}^{\infty}a_n^2 < \infty$$

Substituting $a_N = \sigma^2/N$ gives the exact update discussed earlier.
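A sketch of the recursion for the Gaussian mean with the schedule $a_N = \sigma^2/N$ ($\sigma^2$ assumed known); with this choice each step reduces algebraically to the sequential mean update:

```python
import numpy as np

rng = np.random.default_rng(8)
mu_true, sigma2 = 2.0, 1.5
x = rng.normal(mu_true, np.sqrt(sigma2), size=5_000)

theta = 0.0
for N, x_N in enumerate(x, start=1):
    a = sigma2 / N              # a_N -> 0, sum a_N = inf, sum a_N^2 < inf
    z = (x_N - theta) / sigma2  # error signal: d/dtheta ln p(x_N | theta, sigma^2)
    theta += a * z              # a*z = (x_N - theta)/N: exactly the sequential mean update
print(theta, x.mean())
```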
Robbins-Monro Algorithm
A graphical interpretation of the algorithm is shown here.
[Figure: graphical interpretation of the Robbins-Monro algorithm.]

For the Gaussian with unknown mean, the error signal is

$$z = \frac{\partial}{\partial\mu_{ML}}\ln p\left(x\,|\,\mu_{ML}, \sigma^2\right) = \frac{x - \mu_{ML}}{\sigma^2}$$

so the update takes the form

$$\mu_{ML}^{(N)} = \mu_{ML}^{(N-1)} + a_{N-1}\,\frac{x_N - \mu_{ML}^{(N-1)}}{\sigma^2},$$

and $p(z\,|\,\mu_{ML})$ is a Gaussian.
Similarly, the batch estimate of the variance (with $\mu$ known) can be written sequentially:

$$\sigma_{ML}^{2\,(N)} = \frac{1}{N}\sum_{n=1}^{N}(x_n - \mu)^2 = \frac{N-1}{N}\,\sigma_{ML}^{2\,(N-1)} + \frac{1}{N}(x_N - \mu)^2 = \sigma_{ML}^{2\,(N-1)} + \frac{1}{N}\left[(x_N - \mu)^2 - \sigma_{ML}^{2\,(N-1)}\right]$$
If we substitute the expression for the Gaussian likelihood into the Robbins-
Monro procedure for maximizing likelihood:
$$\sigma^{2\,(N)} = \sigma^{2\,(N-1)} + a_{N-1}\,\frac{\partial}{\partial\sigma^{2\,(N-1)}}\ln p\left(x_N\,|\,\mu, \sigma^{2\,(N-1)}\right) = \sigma^{2\,(N-1)} + a_{N-1}\left[\frac{(x_N - \mu)^2}{2\sigma^{4\,(N-1)}} - \frac{1}{2\sigma^{2\,(N-1)}}\right]$$

Choosing $a_{N-1} = \dfrac{2\,\sigma^{4\,(N-1)}}{N}$ recovers the exact sequential update above.
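A corresponding sketch for the variance ($\mu$ assumed known):

```python
import numpy as np

rng = np.random.default_rng(9)
mu, sigma2_true = 0.0, 2.5
x = rng.normal(mu, np.sqrt(sigma2_true), size=20_000)

s2 = 1.0                                   # initial guess for sigma^2
for N, x_N in enumerate(x, start=1):
    a = 2 * s2 ** 2 / N                    # a_{N-1} = 2 sigma^{4,(N-1)} / N
    z = (x_N - mu) ** 2 / (2 * s2 ** 2) - 1 / (2 * s2)  # d/d sigma^2 of ln p(x_N | mu, sigma^2)
    s2 += a * z                            # reduces to s2 += ((x_N - mu)^2 - s2) / N
print(s2, ((x - mu) ** 2).mean())          # both close to sigma2_true
```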