Estimating Rare Events (Carnegie Mellon)

• MLE (relative frequency, RF) is generally unbiased, but has very high variance for rare (low-count) events
• How to estimate P(ZZZ | C(ZZZ)=k), where C(ZZZ) is the observed count of event ZZZ?
    • for small k > 0, RF is biased high!
    • for k=0, it is biased low
    • the Good-Turing formula corrects this
    • can be seen as a leave-one-out method
• Good-Turing can be applied to rare events of any kind (not just unigrams)
• need for discounting is not due to the open vocabulary
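
The slide names the Good-Turing formula without stating it. A minimal sketch, assuming the basic (unsmoothed) form r* = (r+1) N_{r+1} / N_r, where N_r is the number of distinct events seen exactly r times, and N_1 / N as the total mass for unseen events. The function name `good_turing` and the fallback to the raw count when N_{r+1} = 0 are illustrative assumptions, not part of the slides; the adjusted counts are not renormalized here.

```python
from collections import Counter


def good_turing(counts):
    """Basic Good-Turing adjusted counts (hypothetical helper).

    counts: dict mapping event -> observed count (all counts > 0).
    Returns (adjusted_counts, p_unseen).
    """
    n_total = sum(counts.values())
    # n_r[r] = number of distinct events observed exactly r times
    n_r = Counter(counts.values())

    adjusted = {}
    for event, r in counts.items():
        if n_r[r + 1] > 0:
            # r* = (r + 1) * N_{r+1} / N_r
            adjusted[event] = (r + 1) * n_r[r + 1] / n_r[r]
        else:
            # Assumption: keep the raw count when N_{r+1} = 0.
            # Practical systems usually smooth the N_r values instead.
            adjusted[event] = float(r)

    # Total probability mass reserved for events never observed (k = 0)
    p_unseen = n_r[1] / n_total
    return adjusted, p_unseen


if __name__ == "__main__":
    toy = {"w1": 1, "w2": 1, "w3": 1, "w4": 1, "w5": 1, "w6": 1,
           "x1": 2, "x2": 2, "y": 3}
    adj, p0 = good_turing(toy)
    print(adj["w1"], adj["x1"], p0)
```

In the toy data the singletons and doubletons receive adjusted counts below their raw counts, and the freed mass corresponds to the k=0 events, matching the "biased high for small k, biased low for k=0" observation above.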
Smoothing a Multinomial
1. Assign a floor (minimum allowed value)
2. Interpolate with a uniform or other lower-variance estimator
    • a case of “shrinkage” (Stein’s paradox)
    • Bayesian interpretation: parameters were generated from a meta-distribution
    • non-Bayesian interpretation: trade off bias against variance
    • note: with a fixed interpolation weight, the amount of smoothing doesn’t adjust to the amount of data!
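
Methods 1 and 2 can be sketched as follows. This is an illustration under assumptions: the helper names, the renormalization after flooring, and the fixed weight `lam` are not from the slides.

```python
def mle(counts, vocab):
    """Relative-frequency estimate over a closed vocabulary."""
    total = sum(counts.get(w, 0) for w in vocab)
    return {w: counts.get(w, 0) / total for w in vocab}


def apply_floor(p, floor):
    """Method 1: raise every probability to at least `floor`, then renormalize."""
    floored = {w: max(pw, floor) for w, pw in p.items()}
    z = sum(floored.values())
    return {w: pw / z for w, pw in floored.items()}


def interpolate_with_uniform(p, lam):
    """Method 2: (1 - lam) * p + lam * uniform, with a fixed weight lam."""
    v = len(p)
    return {w: (1 - lam) * pw + lam / v for w, pw in p.items()}


if __name__ == "__main__":
    vocab = ["a", "b", "c", "d"]
    small = mle({"a": 3, "b": 1}, vocab)
    large = mle({"a": 300, "b": 100}, vocab)
    # Same relative frequencies, 100x more data, identical smoothed result:
    print(interpolate_with_uniform(small, 0.2))
    print(interpolate_with_uniform(large, 0.2))
```

Both calls print the same distribution, which illustrates the final note: with a fixed `lam`, the smoothing is as strong after 400 observations as after 4.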
Smoothing a Multinomial (cont.)
3. Add some amounts to the counts
    • Bayesian interpretation: using the posterior mean, and a Dirichlet prior
    • Dirichlet is a conjugate prior wrt the multinomial
    • special case of V=2: Beta is a conjugate prior wrt the binomial
    • Note: amount of smoothing adjusts well to amount of data
    • Bayesian interpretation: data will eventually overwhelm any prior
4. “Discount” small counts (if new words are expected)
    • discounted mass goes to the new words
    • Good-Turing discounting
    • Absolute discounting
    • ...
