Sparseness Smoothing

The document discusses techniques for estimating rare events and smoothing multinomial distributions. It notes that maximum likelihood estimation has high variance for rare events. The Good-Turing formula corrects the bias of relative-frequency estimates for events with small or zero counts. Smoothing techniques for multinomials include assigning a floor value, interpolating with a lower-variance estimator, adding amounts to the counts (the posterior mean under a Dirichlet prior), and discounting small counts to reserve probability mass for new words.

Original Title

sparseness+smoothing (1).ppt

Uploaded by

Khanh Nguyen

Estimating Rare Events
Carnegie Mellon
• MLE (relative frequency, RF) is generally unbiased, but has very high variance for rare (low-count) events
• How to estimate P(ZZZ | C(ZZZ) = k), the probability of an event ZZZ given that it was observed k times?
  - for small k > 0, RF is biased high!
  - for k = 0, it is biased low (RF assigns the event zero probability)
  - the Good-Turing formula corrects this
  - can be seen as a leave-one-out method
• Good-Turing can be applied to rare events of any kind (not just unigrams)
  - need for discounting is not due to the open vocabulary
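
As a rough illustration of the Good-Turing correction mentioned above, the sketch below uses the standard unsmoothed form: an event seen r times gets the adjusted count r* = (r + 1) N_{r+1} / N_r, where N_r is the number of distinct events seen exactly r times, and the total probability reserved for unseen events is N_1 / N. The function name `good_turing` and the fallback to the raw count when N_{r+1} = 0 are assumptions made for this example, not something the slides specify; practical implementations usually smooth the N_r values first.

```python
from collections import Counter


def good_turing(counts):
    """Unsmoothed Good-Turing sketch (hypothetical helper, not from the slides).

    counts: dict mapping each observed event to its count (all counts >= 1).
    Returns (adjusted, p_unseen), where adjusted maps a raw count r to the
    adjusted count r*, and p_unseen is the total mass given to unseen events.
    """
    n = sum(counts.values())                  # N: total number of observations
    freq_of_freq = Counter(counts.values())   # N_r: number of events seen r times

    adjusted = {}
    for r, n_r in freq_of_freq.items():
        n_next = freq_of_freq.get(r + 1, 0)
        if n_next > 0:
            adjusted[r] = (r + 1) * n_next / n_r
        else:
            # Assumption: keep the raw count when N_{r+1} = 0. With this
            # fallback the adjusted counts need not sum to N exactly.
            adjusted[r] = float(r)

    p_unseen = freq_of_freq.get(1, 0) / n     # N_1 / N
    return adjusted, p_unseen


if __name__ == "__main__":
    toy = {"a": 1, "b": 1, "c": 1, "d": 2, "e": 2, "f": 3}
    print(good_turing(toy))
```

In the toy data, three events are seen once out of ten observations, so the sketch reserves 3/10 of the mass for unseen events and lowers the count of each singleton from 1 to 4/3, which matches the slide's point that RF is biased high for small k and biased low for k = 0.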
Smoothing a Multinomial
1. Assign a floor (minimum allowed value)
2. Interpolate with a uniform or other lower-variance estimator
  - a case of “shrinkage” (Stein’s paradox)
  - Bayesian interpretation: parameters were generated from a meta-distribution
  - non-Bayesian interpretation: trade off bias+variance
  - note: amount of smoothing doesn’t adjust to amount of data!
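
Item 2 above can be sketched as a fixed-weight mixture of the relative-frequency estimate and a uniform distribution over the vocabulary. The function name `interpolate_with_uniform`, the weight `lam`, and the toy vocabulary are all assumptions introduced for this example. Because `lam` is a constant, the amount of smoothing is the same whether the counts come from 4 observations or 4 million, which is the limitation the last bullet points out.

```python
def interpolate_with_uniform(counts, vocab, lam=0.9):
    """Mix the relative-frequency estimate with a uniform distribution.

    counts: dict mapping words to counts (words missing from it count as 0).
    vocab:  list of all words the distribution should cover.
    lam:    weight on the relative-frequency estimate (hypothetical default).
    Assumes at least one word in vocab has a non-zero count.
    """
    n = sum(counts.get(w, 0) for w in vocab)
    uniform = 1.0 / len(vocab)
    return {
        w: lam * (counts.get(w, 0) / n) + (1.0 - lam) * uniform
        for w in vocab
    }


if __name__ == "__main__":
    probs = interpolate_with_uniform({"a": 3, "b": 1}, ["a", "b", "c", "d"], lam=0.8)
    print(probs)
```

With `lam=0.8`, the unseen words "c" and "d" each receive 0.2 × 1/4 = 0.05, and the result still sums to 1.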
Smoothing a Multinomial (cont.)
3. Add some amounts to the counts
  - Bayesian interpretation: using the posterior mean under a Dirichlet prior
  - Dirichlet is a conjugate prior wrt the multinomial
  - special case of V=2: Beta is a conjugate prior wrt the binomial
  - Note: amount of smoothing adjusts well to amount of data
  - Bayesian interpretation: data will eventually overwhelm any prior
4. “Discount” small counts (if new words are expected)
  - discounted mass goes to the new words
  - Good-Turing discounting
  - Absolute discounting
  - ...
