a=0.56. b=0.60
c=(a−µ)/σ=−0.895. d=(b−µ)/σ=1.663.
P(a<X<b) = (erf(d/√2) − erf(c/√2))/2 = 0.766
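The arithmetic above can be checked in a few lines of Python. This is a minimal sketch that assumes µ = 0.574 and σ = sqrt(µ(1−µ)/N) with N = 1000, which reproduces the values of c and d shown; the function name is illustrative only.

```python
import math

def normal_interval_prob(a, b, mu, sigma):
    """P(a < X < b) for X ~ Normal(mu, sigma), computed with the error function."""
    c = (a - mu) / sigma  # standardized lower bound
    d = (b - mu) / sigma  # standardized upper bound
    return (math.erf(d / math.sqrt(2)) - math.erf(c / math.sqrt(2))) / 2

# Assumed parameters: frequency 0.574 and sample size 1000.
mu = 0.574
sigma = math.sqrt(mu * (1 - mu) / 1000)
prob = normal_interval_prob(0.56, 0.60, mu, sigma)
print(prob)  # agrees with the 0.766 above to about three decimal places
```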
What does “confidence interval” mean?
Naïve answer
The one we’ve already seen:
If the true frequency in the population is .574 and you take a random
sample of 1000 elements, then with probability .95, the fraction in the
sample will be between 0.543 and 0.605
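As a rough check of these numbers, the range can be computed as f ± 1.96σ with σ = sqrt(f(1−f)/N). The sketch below assumes the Gaussian approximation and the usual two-sided 95% multiplier 1.96; `naive_interval` is a hypothetical helper name.

```python
import math

def naive_interval(f, n, z=1.96):
    """Range that holds the sample fraction with probability ~0.95,
    given a KNOWN true frequency f and sample size n (Gaussian approximation)."""
    sigma = math.sqrt(f * (1 - f) / n)
    return f - z * sigma, f + z * sigma

lo, hi = naive_interval(0.574, 1000)
print(round(lo, 3), round(hi, 3))  # prints 0.543 0.605
```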
Pollster: That’s not what I want to know. I have the frequency in the
sample, I need to know something about the frequency in the
population.
What does “confidence interval” mean?
Frequentist answer
Frequentist: I’ll give you a procedure for computing a 95% confidence
interval. If you follow this procedure whenever you have taken a
random sample, using the frequency in the sample as p, then 95% of
the time the true probability is in the confidence interval. For practical
purposes it’s the same as the naïve formula, in almost all cases.
Discussion of frequentist answer
Pollster: Oh. That seems rather confusing and indirect. You are talking
about going from frequencies in the sample to frequencies in the
population, which is good. But only in terms of a whole collection of
hypothetical experiments that I am not planning to run. I have a
specific situation to deal with: I polled 1000 people at random; 574
prefer blueberry pie. Can’t I just say that with probability .95, the
fraction in the population is between 0.543 and 0.605?
Frequentist: No. You’re talking nonsense. There’s no such thing as “the
probability that the fraction is between 0.543 and 0.605.” The fraction
is whatever it is. It isn’t drawn from a sample space. It isn’t generated
by a stochastic process. It doesn’t have a probability distribution.
Frequentist procedure for confidence interval
What we want: for sample size N and confidence level h:
Two monotonically increasing functions L_{N,h}(p) and U_{N,h}(p) such that:
If the true frequency of the property in the population is f,
and you take a sample of size N,
and the frequency of the property in the sample S is p,
then with probability ≥ h, L_{N,h}(p) ≤ f ≤ U_{N,h}(p).
Note: This is about the probability distribution of p (which the frequentist
considers legitimate), not of f (which the frequentist considers illegitimate).
Whatever poll is being carried out, and whatever f is, if you take a random
sample, measure p, and compute the interval, then with probability at least h,
the interval contains f.
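The coverage claim can be illustrated by simulation: fix f, run many hypothetical polls, and count how often the computed interval contains f. The sketch below is only an illustration under assumed settings (f = 0.574, N = 1000, 2000 repeated polls, and the Gaussian-approximation interval p ± 1.96·sqrt(p(1−p)/N)); the function names are made up for this example.

```python
import math
import random

def gaussian_interval(p, n, z=1.96):
    """Approximate 95% interval computed from the sample frequency p."""
    half = z * math.sqrt(p * (1 - p) / n)
    return p - half, p + half

def coverage(f, n, trials, seed=0):
    """Fraction of simulated polls whose interval contains the true frequency f."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # One poll: n independent respondents, each has the property with probability f.
        k = sum(rng.random() < f for _ in range(n))
        lo, hi = gaussian_interval(k / n, n)
        hits += lo <= f <= hi
    return hits / trials

rate = coverage(f=0.574, n=1000, trials=2000)
print(rate)  # should land near 0.95
```

Note that f never changes inside the loop; only p, and therefore the interval, varies from poll to poll.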
Finding L and U
f = true frequency. p=frequency in sample. h=confidence level
Find monotonically increasing functions Q(f) and R(f) such that, if the
true frequency is f, then P(Q(f) ≤ p ≤ R(f)) ≥ h. This was what we
did in the naïve answer.
Q(f) ≤ p is the same as f ≤ Q^{-1}(p) (Q inverse).
p ≤ R(f) is the same as R^{-1}(p) ≤ f.
So choose L = R^{-1} and U = Q^{-1}, and then P(L(p) ≤ f ≤ U(p)) ≥ h, which is
what we want.
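The inversion can be carried out numerically. The sketch below assumes the Gaussian-approximation choices Q(f) = f − 1.96·sqrt(f(1−f)/N) and R(f) = f + 1.96·sqrt(f(1−f)/N), and inverts them by bisection for 0 < p < 1; all function names are hypothetical. With these particular Q and R the result coincides with what is usually called the Wilson score interval.

```python
import math

Z = 1.96  # assumed two-sided 95% Gaussian multiplier

def q(f, n):
    """Lower edge of the range that holds p when the true frequency is f."""
    return f - Z * math.sqrt(f * (1 - f) / n)

def r(f, n):
    """Upper edge of the range that holds p when the true frequency is f."""
    return f + Z * math.sqrt(f * (1 - f) / n)

def invert(func, p, n, tol=1e-10):
    """Solve func(f, n) = p for f in [0, 1] by bisection (assumes 0 < p < 1)."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if func(mid, n) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def confidence_interval(p, n):
    """L = R^{-1} and U = Q^{-1}, as in the derivation above."""
    return invert(r, p, n), invert(q, p, n)

lower, upper = confidence_interval(0.574, 1000)
print(round(lower, 3), round(upper, 3))
```

For p = 0.574 and N = 1000 the bounds differ from the naïve interval only in the third or fourth decimal place.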
Finally, if N is substantial and p is not close to 0 or 1, and you use the
Gaussian approximation we used before, you can just use the same
confidence intervals as before.
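Under that approximation the interval is p ± z·sqrt(p(1−p)/N), with the sample frequency p standing in for f. The sketch below is one way to write this for a general confidence level h, assuming z is taken as the standard normal quantile at (1 + h)/2; the function name is illustrative.

```python
import math
from statistics import NormalDist

def gaussian_confidence_interval(successes, n, h=0.95):
    """Approximate confidence interval for the population frequency.
    Reasonable only when n is large and p is not close to 0 or 1."""
    p = successes / n
    z = NormalDist().inv_cdf((1 + h) / 2)  # about 1.96 for h = 0.95
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width

# The pollster's case: 574 of 1000 prefer blueberry pie.
lo, hi = gaussian_confidence_interval(574, 1000)
print(round(lo, 3), round(hi, 3))  # prints 0.543 0.605
```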