

Central Limit Theorem The central limit theorem is an approximation.
This means that our reasoning is no longer exact. That said, for
large enough sample sizes, the approximation is good enough to use for
practical predictions. Assume for the moment that we knew the variance σ²
exactly. In this case we know that X̄_m is approximately normal with mean
µ and variance m⁻¹σ². We are interested in the interval [µ − ε, µ + ε] which
contains 95% of the probability mass of a normal distribution. That is, we
need to solve the integral
$$\frac{1}{\sqrt{2\pi\sigma^2}} \int_{\mu-\epsilon}^{\mu+\epsilon} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) dx = 0.95 \tag{2.20}$$

This can be solved efficiently using the cumulative distribution function of
a normal distribution (see Problem 2.3 for more details). One can check
that (2.20) is solved for ε = 1.96σ. In other words, an interval of ±1.96σ
contains 95% of the probability mass of a normal distribution. The number
of observations is therefore determined by

$$\epsilon = 1.96\,\sigma/\sqrt{m} \quad\text{and hence}\quad m = 3.84\,\frac{\sigma^2}{\epsilon^2} \tag{2.21}$$
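As a quick numerical check (not part of the original text), the critical
value 1.96 and the sample-size formula in (2.21) can be obtained from the
quantile function (inverse CDF) of the standard normal. A minimal sketch,
assuming SciPy is available:

```python
from scipy.stats import norm

# Two-sided 95% interval: 2.5% of the mass lies in each tail,
# so the critical value is the 97.5% quantile of the standard normal.
z = norm.ppf(0.975)
print(z)  # 1.9599... ~ 1.96, and z**2 ~ 3.84

# Sample size needed so that the 95% interval has half-width eps,
# following m = z^2 * sigma^2 / eps^2 as in (2.21).
def sample_size(sigma2: float, eps: float) -> float:
    return z**2 * sigma2 / eps**2
```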

Again, our problem is that we do not know the variance of the distribution.
Using the worst-case bound on the variance, i.e. σ² = 40,000, would lead to
a requirement of at least m = 385 wafers for testing. However, while we do
not know the variance, we may estimate it along with the mean and use the
empirical estimate, possibly plus some small constant to ensure we do not
underestimate the variance, instead of the upper bound.
Assuming that fluctuations turn out to be on the order of 50 processors,
i.e. σ² = 2500, we are able to reduce our requirement to approximately 25
wafers. This is probably an acceptable number for a practical test.
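The wafer numbers above can be reproduced with the same formula. The
accuracy target ε = 20 processors is an assumption here, taken from the
running example earlier in the chapter; a short sanity check under that
assumption:

```python
from math import ceil
from scipy.stats import norm

z = norm.ppf(0.975)   # ~1.96 for a two-sided 95% interval
eps = 20.0            # assumed accuracy target from the running example

for sigma2 in (40_000.0, 2_500.0):
    m = z**2 * sigma2 / eps**2   # cf. (2.21)
    print(f"sigma^2 = {sigma2:8.0f}  ->  at least {ceil(m)} wafers")
# sigma^2 =    40000  ->  at least 385 wafers
# sigma^2 =     2500  ->  at least 25 wafers
```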

Rates and Constants The astute reader will have noticed that all three
confidence bounds had scaling behavior m = O(ε⁻²). That is, in all cases
the number of observations was a fairly ill-behaved function of the
required precision ε. If we were just interested in convergence per se, a
statement like that of the Chebyshev inequality would have been entirely
sufficient. The various laws and bounds can often be used to obtain
considerably better constants for statistical confidence guarantees. For
more complex estimators, such as methods to classify, rank, or annotate
data, reasoning such as the one above can become highly nontrivial. See
e.g. [MYA94, Vap98] for further details.
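To make the point about constants concrete (this comparison is not from
the text): for a 95% guarantee, Chebyshev's inequality
P(|X̄_m − µ| ≥ ε) ≤ σ²/(mε²) requires m ≥ 20 σ²/ε², while the normal
approximation in (2.21) requires only m ≥ 3.84 σ²/ε²: the same O(ε⁻²)
rate, but a constant roughly five times smaller. A minimal sketch, using
the wafer example's assumed values:

```python
from scipy.stats import norm

sigma2, eps, delta = 2_500.0, 20.0, 0.05   # assumed variance, accuracy, risk

# Chebyshev: P(|Xbar_m - mu| >= eps) <= sigma^2 / (m * eps^2) <= delta
m_chebyshev = sigma2 / (delta * eps**2)    # 20 * sigma^2 / eps^2 for delta = 0.05

# Normal approximation: m = z^2 * sigma^2 / eps^2 with z = Phi^{-1}(1 - delta/2)
z = norm.ppf(1 - delta / 2)
m_normal = z**2 * sigma2 / eps**2

print(m_chebyshev, m_normal)   # 125.0 vs ~24.0: same rate, ~5x better constant
```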
