
Parameter Estimation

2 MLE for Bernoulli distribution


3

For the Bernoulli distribution, p(x | μ) = μ^x (1 − μ)^(1−x), so the log-likelihood of D = {x_1, …, x_n} is

ln p(D | μ) = Σ_{k=1}^{n} [ x_k ln μ + (1 − x_k) ln(1 − μ) ]

Setting its derivative to zero,

d/dμ ln p(D | μ) = 0

gives the MLE μ̂ = (1/n) Σ_{k=1}^{n} x_k, the fraction of heads.


4 Over-fitting of MLE
• With a low number of samples, the MLE suffers from over-fitting, leading to poor generalization to test data.

• For the Bernoulli distribution, suppose a coin is actually unbiased. If, however, we estimate its head probability using MLE from very limited samples, there is a high chance that the coin will appear to be biased! For example, three heads in three tosses give μ̂ = 1 (see the sketch below).
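A minimal simulation of this effect (assuming NumPy; the sample sizes and the 0.3 threshold are illustrative choices, not from the slides): for a fair coin, the Bernoulli MLE μ̂ = (number of heads)/n often lands far from 0.5 when n is tiny, and concentrates near the truth as n grows.

```python
import numpy as np

rng = np.random.default_rng(0)
true_mu = 0.5            # the coin is actually unbiased
n_trials = 10_000        # repeated experiments per sample size

for n in (3, 10, 1000):
    flips = rng.random((n_trials, n)) < true_mu   # n flips per experiment
    mu_hat = flips.mean(axis=1)                   # MLE: fraction of heads
    far_off = np.mean(np.abs(mu_hat - true_mu) >= 0.3)
    print(f"n={n:5d}  P(|mu_hat - 0.5| >= 0.3) = {far_off:.3f}")
```

With n = 3 the only possible estimates are 0, 1/3, 2/3, and 1, so the coin looks heavily biased (μ̂ = 0 or 1) a quarter of the time.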

5 Rank of covariance matrix
• The outer product of two vectors has rank 1.
• The rank of the sample covariance matrix is at most n, since it is a sum of n 'rank-1' outer-product matrices.
• If the dimension of each feature vector is greater than the rank, then the matrix is rank deficient and not invertible (see the check below).
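A quick NumPy check of this claim (the sizes d = 50, n = 10 are arbitrary illustrative values): the sample covariance built from n outer products has rank bounded by n, far below d.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 50, 10                               # dimension d much larger than n
X = rng.standard_normal((n, d))             # n feature vectors of dimension d
mean = X.mean(axis=0)

# Sample covariance as a sum of n rank-1 outer products
S = sum(np.outer(x - mean, x - mean) for x in X) / n

print(np.linalg.matrix_rank(S))             # 9: at most n (n - 1 after centering), << d
```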


6

• Thus, when d >> n, the covariance matrix is not reliably estimated with limited samples.
• One way to ensure good estimation (invertibility of Σ) is to increase the number of training samples, so that n >> d.
• If training samples are limited, we may add a regularization term λI to ensure that it can be inverted.
• Σ̂ + λI is made invertible (see the sketch below).
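A minimal sketch of this fix (assuming NumPy; λ = 0.01 is an arbitrary illustrative value): with d > n the sample covariance Σ̂ is singular, but Σ̂ + λI has full rank and can be inverted.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, lam = 50, 10, 1e-2
X = rng.standard_normal((n, d))

S = np.cov(X, rowvar=False, bias=True)   # d x d sample covariance, rank < d
S_reg = S + lam * np.eye(d)              # shrink toward the identity

print(np.linalg.matrix_rank(S))          # rank-deficient: inverting S would fail
print(np.linalg.matrix_rank(S_reg))      # full rank d
S_inv = np.linalg.inv(S_reg)             # now well-defined
```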


7

• Bayesian Estimation
• Bayesian Parameter Estimation: Gaussian Case
• Bayesian Parameter Estimation: General Estimation


8 Bayesian Estimation (Bayesian learning)

• In Maximum Likelihood estimation, θ was assumed to be fixed.
• In Bayesian Estimation, θ is a random variable.
• Computation of the posterior probabilities P(ωi | x) lies at the heart of Bayesian classification.
• Goal: compute P(ωi | x, D)


9

• Given the sample D, Bayes formula can be written as

P(ωi | x, D) = p(x | ωi, D) P(ωi | D) / Σ_{j=1}^{c} p(x | ωj, D) P(ωj | D)
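A toy numeric instance of this formula (all values made up for illustration; c = 2 classes):

```python
import numpy as np

# Class-conditional densities evaluated at a query point x, and class priors
p_x_given_w = np.array([0.30, 0.05])   # p(x | omega_i, D), hypothetical values
p_w = np.array([0.4, 0.6])             # P(omega_i | D), hypothetical values

# Bayes formula: normalize the joint over the c classes
posterior = p_x_given_w * p_w / np.sum(p_x_given_w * p_w)
print(posterior)                       # P(omega_i | x, D) = [0.8, 0.2]; sums to 1
```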


10

The basic problem is: "Compute the posterior density p(θ | D)", then "derive p(x | D)":

p(x | D) = ∫ p(x | θ) p(θ | D) dθ

Using Bayes formula, we have:

p(θ | D) = p(D | θ) p(θ) / ∫ p(D | θ) p(θ) dθ

And by the independence assumption:

p(D | θ) = ∏_{k=1}^{n} p(x_k | θ)
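As a concrete (hypothetical) instance of these formulas, take a Bernoulli likelihood with a conjugate Beta prior on its parameter μ; then p(μ | D) is again a Beta density, and the predictive integral p(x | D) reduces to the posterior mean. A minimal sketch, with made-up prior counts and data:

```python
import numpy as np

# Prior p(mu) = Beta(a0, b0); data D = coin flips x_k in {0, 1}
a0, b0 = 2.0, 2.0                  # assumed prior pseudo-counts
D = np.array([1, 1, 1, 0, 1])      # 4 heads, 1 tail

# Posterior p(mu | D) = Beta(a0 + heads, b0 + tails): counts add to the prior
a_n = a0 + D.sum()
b_n = b0 + len(D) - D.sum()

# Predictive p(x = 1 | D) = integral of mu * Beta(mu; a_n, b_n) dmu = posterior mean
p_heads = a_n / (a_n + b_n)
print(p_heads)                     # 0.667: the MLE 4/5 = 0.8 shrunk toward the prior
```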
