Sparseness Smoothing
Carnegie Mellon
MLE (Relative Frequency, RF) is generally unbiased,
but has very high variance for rare (low-count) events
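How to estimate P(ZZZ | C(ZZZ) = k), the probability of an event ZZZ that was observed k times?
for small k > 0, RF is biased high!
for k = 0, it is biased low (RF assigns zero probability to unseen events)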
the Good-Turing formula corrects this
can be seen as a leave-one-out method
Good-Turing can be applied to rare events of any kind (not just unigrams)
need for discounting is not due to the open vocabulary
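The Good-Turing correction mentioned above can be sketched in a few lines. This is a minimal illustration, not the lecture's own code: it uses the basic formula r* = (r + 1) N_{r+1} / N_r, where N_r is the number of distinct events seen exactly r times, and reserves N_1 / N of the probability mass for unseen events. The function name `good_turing`, the toy counts, and the fallback to the raw count when N_{r+1} = 0 are all assumptions made for this sketch; practical implementations usually smooth the N_r values first.

```python
from collections import Counter


def good_turing(counts):
    """Basic (unsmoothed) Good-Turing estimate.

    Returns (adjusted, unseen_mass):
      adjusted[r] = r* = (r + 1) * N_{r+1} / N_r for each observed count r
      unseen_mass = N_1 / N, the total probability reserved for unseen events
    """
    total = sum(counts.values())                 # N: number of observed tokens
    count_of_counts = Counter(counts.values())   # N_r: events that occurred r times

    adjusted = {}
    for r, n_r in count_of_counts.items():
        n_r_plus_1 = count_of_counts.get(r + 1, 0)
        if n_r_plus_1 > 0:
            adjusted[r] = (r + 1) * n_r_plus_1 / n_r
        else:
            # Assumption for this sketch: keep the raw count when N_{r+1} = 0.
            adjusted[r] = float(r)

    unseen_mass = count_of_counts.get(1, 0) / total
    return adjusted, unseen_mass


# Hypothetical toy data: six events seen once, two seen twice, one seen three times.
toy_counts = Counter({"a": 3, "b": 2, "c": 2,
                      "d": 1, "e": 1, "f": 1, "g": 1, "h": 1, "i": 1})
adjusted, unseen_mass = good_turing(toy_counts)
```

On this toy data the events seen once are adjusted down from a count of 1 to about 0.67, and 6/13 of the probability mass is reserved for unseen events. Because of the fallback for the largest count, the adjusted counts are not renormalized here.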
Smoothing a Multinomial
1. Assign a floor (minimum allowed value)
2. Interpolate with a uniform or other lower-variance estimator
a case of “shrinkage” (Stein’s paradox)
Bayesian interpretation: the parameters were themselves generated from a meta-distribution (a prior)
non-Bayesian interpretation: trade off bias against variance
note: with a fixed interpolation weight, the amount of smoothing does not adjust to the amount of data!
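The first two methods (a floor, and interpolation with a uniform distribution) can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the weight `lam`, the `floor` value, and the toy counts are hypothetical, and renormalizing after applying the floor is one possible choice among several.

```python
def relative_frequency(counts, vocab):
    """Maximum-likelihood (relative frequency) estimate over a fixed vocabulary."""
    total = sum(counts.values())
    return {w: counts.get(w, 0) / total for w in vocab}


def apply_floor(probs, floor):
    """Method 1: raise every probability to at least `floor`, then renormalize.

    Note: after renormalization a floored entry can end up slightly below `floor`.
    """
    floored = {w: max(p, floor) for w, p in probs.items()}
    z = sum(floored.values())
    return {w: p / z for w, p in floored.items()}


def interpolate_with_uniform(probs, lam):
    """Method 2: lam * RF + (1 - lam) * uniform, with a fixed weight lam in [0, 1]."""
    v = len(probs)
    return {w: lam * p + (1.0 - lam) / v for w, p in probs.items()}


# Hypothetical toy data: four-word vocabulary, two words never observed.
vocab = ["a", "b", "c", "d"]
counts = {"a": 3, "b": 1}

rf = relative_frequency(counts, vocab)
floored = apply_floor(rf, floor=0.01)
interpolated = interpolate_with_uniform(rf, lam=0.8)
```

With `lam=0.8` the unseen words "c" and "d" each receive about 0.05, however many tokens were observed. That is the sense in which a fixed weight does not adjust to the amount of data.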
Smoothing a Multinomial (cont.)
3. Add some amount (a pseudo-count) to each count
Bayesian interpretation: the estimate is the posterior mean under a Dirichlet prior
the Dirichlet is a conjugate prior with respect to the multinomial
special case of V=2: the Beta is a conjugate prior with respect to the binomial
Note: amount of smoothing adjusts well to amount of data
Bayesian interpretation: data will eventually overwhelm any prior
4. “Discount” small counts (if new words are expected)
discounted mass goes to the new words
Good-Turing discounting
Absolute discounting
...
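Methods 3 and 4 can be sketched as follows. This is a minimal illustration, not a definitive implementation: `add_alpha` assumes the same pseudo-count `alpha` for every word (the posterior mean under a symmetric Dirichlet prior), and `absolute_discount` subtracts a fixed `d` from every observed count and hands the freed mass to unseen words as a single lump. The function names, parameter values, and toy counts are all hypothetical.

```python
def add_alpha(counts, vocab, alpha=1.0):
    """Method 3: (c(w) + alpha) / (N + alpha * V).

    Equals the posterior mean under a symmetric Dirichlet(alpha) prior.
    """
    total = sum(counts.values())
    v = len(vocab)
    return {w: (counts.get(w, 0) + alpha) / (total + alpha * v) for w in vocab}


def absolute_discount(counts, d=0.5):
    """Method 4 (absolute discounting): subtract d from every observed count.

    Returns (probs, unseen_mass); unseen_mass is the total probability
    freed for new words. Assumes 0 < d < 1 and integer counts >= 1.
    """
    seen = {w: c for w, c in counts.items() if c > 0}
    total = sum(seen.values())
    probs = {w: (c - d) / total for w, c in seen.items()}
    unseen_mass = d * len(seen) / total
    return probs, unseen_mass


# Hypothetical toy data: the same proportions at two sample sizes.
vocab = ["a", "b", "c"]
small = {"a": 3, "b": 1}
large = {"a": 300, "b": 100}

p_small = add_alpha(small, vocab, alpha=1.0)
p_large = add_alpha(large, vocab, alpha=1.0)
discounted, unseen_mass = absolute_discount(small, d=0.5)
```

With the small sample, add-one smoothing gives "a" a probability of 4/7 (about 0.571); with the large sample it gives 301/403 (about 0.747), close to the relative frequency of 0.75. This illustrates the data overwhelming the prior. Absolute discounting on the small sample leaves 0.625 for "a" and 0.125 for "b", and frees 0.25 for new words.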