
EE769 Intro to ML

Density Estimation and Sampling


Amit Sethi
Faculty member, IIT Bombay
Learning objectives

• List applications of density estimation

• Estimate sufficient statistics of some distributions

• Test goodness of fit of distributions

• Write EM algorithm for mixture of Gaussians

• Fit a kernel density estimator on given samples

• List methods to generate samples from a distribution

Why estimate densities

• Perform basic statistical tests

• Identify outliers

• Use Bayesian methods

• Generate new samples

Parametric Density Estimation

• Visualize data

• Shortlist candidate distributions

• Estimate distribution parameters

• Check fit or likelihood

Likelihood maximization and sufficient statistics
• A statistic is a function of the data; a sufficient statistic summarizes everything the sample tells us about a distribution's parameters, so the MLE can be computed from it alone

Fitting a Gaussian distribution
• p(x) = (2π σ²)^(−1/2) exp(−(x−μ)² / 2σ²)
• E[X] = μ
• E[(X−μ)²] = σ²
• Multivariate: p(x) = ((2π)^d |C|)^(−1/2) exp(−(x−μ)^T C⁻¹ (x−μ) / 2)
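For the Gaussian, the sample mean and the (biased) sample covariance are the maximum-likelihood estimates. A minimal sketch on synthetic data; the true parameters and the seed below are arbitrary choices for the example:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=10_000)  # synthetic 1-D data

# Univariate MLE: sample mean and (biased) sample variance
mu_hat = x.mean()
sigma2_hat = ((x - mu_hat) ** 2).mean()

# Multivariate MLE: mean vector and covariance matrix C
X = rng.multivariate_normal(mean=[0.0, 1.0],
                            cov=[[2.0, 0.3], [0.3, 1.0]], size=10_000)
mu_vec = X.mean(axis=0)
C_hat = (X - mu_vec).T @ (X - mu_vec) / len(X)
```

The estimates land close to the generating parameters (μ = 2, σ² = 2.25 in 1-D).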

Fitting an exponential distribution
• p(x) = λ e^(−λx), if x ≥ 0
• E[X] = 1/λ, so the MLE is λ̂ = 1 / (sample mean)
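A one-line fit via the sufficient statistic (the sample mean); the true rate λ = 0.5 and the seed are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(scale=2.0, size=10_000)  # scale = 1/λ, so true λ = 0.5

# MLE: λ̂ = 1 / sample mean, since E[X] = 1/λ
lam_hat = 1.0 / x.mean()
```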

Fitting a uniform distribution
• p(x) = 1/(b−a), if a ≤ x ≤ b
• MLE: â = min xi, b̂ = max xi

Comparing two distributions using a Q-Q plot
• Plot actual data quantiles against theoretical quantiles (of any candidate distribution)
• If the distributions match, the points fall near the line y = x

[Figure: example Q-Q plot. Image: Skbkekas, CC BY-SA 3.0 <https://creativecommons.org/licenses/by-sa/3.0>, via Wikimedia Commons]
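The quantile comparison behind a Q-Q plot can be computed without plotting: sort the data and pair it with theoretical quantiles at the plotting positions (i − 0.5)/N. The standard-normal target and sample size are assumptions for this sketch:

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(2)
x = np.sort(rng.normal(size=1000))  # sorted sample = empirical quantiles

# Theoretical standard-normal quantiles at plotting positions (i - 0.5)/N
probs = (np.arange(1, len(x) + 1) - 0.5) / len(x)
theo_q = np.array([NormalDist().inv_cdf(p) for p in probs])

# For well-matched distributions the paired quantiles are nearly collinear
corr = np.corrcoef(theo_q, x)[0, 1]
```

Plotting `x` against `theo_q` gives the usual Q-Q picture; a correlation near 1 indicates a good fit.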


Mixture of Gaussians for multimodal density
• Assume a mixture of K Gaussians
• Parameters Θ: { αk, μk, Ck }, k = 1…K, with mixing weights αk ≥ 0 and ∑k αk = 1

EM Algorithm for MoG
• Iterate until convergence:
• Expectation: keeping Θ fixed, estimate memberships
• wik = αk pk(xi) / ∑j αj pj(xi)
• Maximization: keeping wik fixed, find the MLE of Θ, i.e.,
• αk = Nk / N, where Nk = ∑i wik
• μk = (1/Nk) ∑i wik xi
• Ck = (1/Nk) ∑i wik (xi − μk)(xi − μk)^T
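The E and M updates above can be sketched in one dimension; the two synthetic clusters, initial guesses, and fixed iteration count are all illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(3)
# Two 1-D Gaussian clusters at -2 and 3
x = np.concatenate([rng.normal(-2, 0.5, 500), rng.normal(3, 1.0, 500)])

K, N = 2, len(x)
alpha = np.full(K, 1.0 / K)    # mixing weights α_k
mu = np.array([-1.0, 1.0])     # initial means
var = np.ones(K)               # initial variances

for _ in range(100):
    # E-step: responsibilities w_ik ∝ α_k N(x_i; μ_k, σ_k²)
    p = alpha * np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
    w = p / p.sum(axis=1, keepdims=True)
    # M-step: weighted MLE updates
    Nk = w.sum(axis=0)
    alpha = Nk / N
    mu = (w * x[:, None]).sum(axis=0) / Nk
    var = (w * (x[:, None] - mu) ** 2).sum(axis=0) / Nk
```

After convergence the means recover the cluster centers and the weights sum to one.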

Non-parametric density estimation
• Kernel density estimation, a.k.a. Parzen window
• p̂(x) = (1/Nh) ∑i K( (x − xi) / h ), where K is a kernel (e.g., Gaussian) and h is the window width

Finding optimal window size
• Minimize the mean integrated squared error (MISE) of the estimated density

• Rules of thumb (Silverman):
• h = 1.06 σ N^(−1/5)
• h = 0.9 min(σ, IQR/1.34) N^(−1/5)
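The second rule of thumb and a Gaussian-kernel estimator can be sketched as follows; the standard-normal data and evaluation point are assumptions for the example:

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(size=1000)  # synthetic standard-normal sample

# Bandwidth rule of thumb: h = 0.9 min(σ, IQR/1.34) N^(-1/5)
iqr = np.subtract(*np.percentile(x, [75, 25]))
h = 0.9 * min(x.std(), iqr / 1.34) * len(x) ** (-1 / 5)

def kde(t, data, h):
    """Gaussian-kernel density estimate evaluated at points t."""
    u = (t[:, None] - data) / h
    return np.exp(-0.5 * u ** 2).sum(axis=1) / (len(data) * h * np.sqrt(2 * np.pi))

dens = kde(np.array([0.0]), x, h)  # estimate near the true peak 1/√(2π) ≈ 0.399
```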

Generating a sample from a distribution
• Why generate samples?
• Approximate expectations (Monte Carlo)
• Augment data
• If the CDF is “simple” (invertible in closed form)
• Generate a pseudo-random uniform number
• Transform it through the inverse CDF
• X = F^(−1)(U), where F is the desired CDF and U is a standard uniform RV

• Factorize multivariate distributions and sample each factor in turn (ancestral sampling)


• Sample from a proposal distribution and filter
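The inverse-CDF recipe above can be sketched for an exponential target, whose CDF F(x) = 1 − e^(−λx) inverts in closed form; λ = 2 and the seed are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(5)
u = rng.uniform(size=10_000)  # standard uniform samples U

# Exponential(λ): F(x) = 1 - e^(-λx), so F⁻¹(u) = -ln(1 - u) / λ
lam = 2.0
x = -np.log(1.0 - u) / lam    # X = F⁻¹(U) has the desired distribution
```

The sample mean comes out near E[X] = 1/λ = 0.5, as expected.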

Rejection sampling
• Sample z from a proposal q(z)
• Accept with probability p(z) / (k q(z)), where k is chosen so that k q(z) ≥ p(z) for all z
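A minimal sketch with a Beta(2,2) target, p(z) = 6z(1−z), and a uniform proposal; since max p(z) = 1.5, k = 1.5 makes k·q(z) envelope p(z):

```python
import numpy as np

rng = np.random.default_rng(6)

def p(z):
    return 6 * z * (1 - z)  # Beta(2,2) density on [0, 1]

k = 1.5                     # k q(z) = 1.5 ≥ p(z) everywhere, with q = Uniform(0,1)
z = rng.uniform(size=50_000)
u = rng.uniform(size=50_000)
samples = z[u < p(z) / k]   # accept with probability p(z) / (k q(z))
```

The accepted fraction is 1/k = 2/3, and the accepted samples have the Beta(2,2) mean of 0.5.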

Importance sampling
• Sample z from a proposal q(z)
• Assign weight w = p(z) / q(z), and approximate expectations under p by weighted averages
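A sketch estimating E_p[Z] for p = N(1, 1) using samples from a wider proposal q = N(0, 2²); both distributions are arbitrary choices for the example, and self-normalization lets us drop the densities' normalizing constants:

```python
import numpy as np

rng = np.random.default_rng(7)

# Draw from the proposal q = N(0, 2)
z = rng.normal(0.0, 2.0, size=100_000)

# log p(z) - log q(z), up to additive constants that cancel below
log_w = (-(z - 1.0) ** 2 / 2) - (-(z ** 2) / 8)
w = np.exp(log_w)

# Self-normalized importance-sampling estimate of E_p[Z]
est = (w * z).sum() / w.sum()
```

The estimate recovers the target mean E_p[Z] = 1 despite never sampling from p.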

Markov-chain Monte Carlo (Metropolis-Hastings)
• Construct a proposal distribution q(z′|z)
• Iterate over t
• Generate a candidate z* ~ q(z | z(t))
• Accept with probability A = min( 1, p(z*) q(z(t)|z*) / [ p(z(t)) q(z*|z(t)) ] )
• If accepted, z(t+1) = z*
• Else, z(t+1) = z(t)
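A sketch with a symmetric random-walk proposal, so the q terms in the acceptance ratio cancel (the Metropolis special case); the standard-normal target, step size, chain length, and burn-in are all illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(8)

def log_p(z):
    return -0.5 * z ** 2           # unnormalized log target: standard normal

z = 0.0
chain = []
for _ in range(20_000):
    z_new = z + rng.normal(0.0, 1.0)            # symmetric proposal, q cancels
    A = min(1.0, np.exp(log_p(z_new) - log_p(z)))
    if rng.uniform() < A:                        # accept with probability A
        z = z_new
    chain.append(z)                              # else keep the current state
chain = np.array(chain[2_000:])                  # discard burn-in
```

The chain's mean and standard deviation approach those of N(0, 1).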

Gibbs sampling

• Require: conditional distributions p(zi | z\i), where z\i denotes all components of z except zi

• Iterate over t
• Iterate over i
• Sample zi(t+1) from p(zi | z1(t+1), …, zi−1(t+1), zi+1(t), …, zd(t)), using the latest values of the other components
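A sketch for a bivariate normal target with unit variances and correlation ρ (an assumed target chosen because its conditionals are normal in closed form: z1 | z2 ~ N(ρ z2, 1 − ρ²)):

```python
import numpy as np

rng = np.random.default_rng(9)
rho = 0.8                  # target correlation (illustrative choice)
z1, z2 = 0.0, 0.0
samples = []
for _ in range(20_000):
    # Sample each coordinate from its conditional, using the latest values
    z1 = rng.normal(rho * z2, np.sqrt(1 - rho ** 2))
    z2 = rng.normal(rho * z1, np.sqrt(1 - rho ** 2))
    samples.append((z1, z2))
samples = np.array(samples[2_000:])   # discard burn-in
```

The empirical correlation of the retained samples converges to ρ.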

