6739 06 ConfidenceIntervals 200302
Confidence Intervals
Dave Goldsman
H. Milton Stewart School of Industrial and Systems Engineering
Georgia Institute of Technology
3/2/20
A 100(1 − α)% confidence interval (CI) [L, U] for an unknown parameter θ satisfies

P(L ≤ θ ≤ U) = 1 − α.
Example: We’re 95% sure that President Smith’s popularity is 56% ± 3%.
Sample #     L      U     θ    CI covers θ?
   1       1.86   2.23    2        Yes
   2       1.90   2.31    2        Yes
   3       3.21   3.86    2        No
   4       1.75   2.10    2        Yes
   5       1.72   2.03    2        Yes
   ⋮         ⋮      ⋮     ⋮         ⋮
  10       1.62   1.98    2        No
As the number of samples gets large, the proportion of CIs that cover the
unknown θ approaches 1 − α.
There are lots of different kinds of confidence intervals. We’ll look at CIs for
the mean and variance of normal distributions as well as CIs for the success
probability of Bernoulli distributions. We’ll also extend those results to
compare competing normal distributions as well as competing Bernoulli
distributions.
Outline
Set-up: Sample from a normal distribution with unknown mean µ and known
variance σ 2 .
Details: Suppose X1 , . . . , Xn ∼ Nor(µ, σ²) (iid), where σ² is known.
Use X̄ = ∑ⁿᵢ₌₁ Xi /n as our point estimator. Recall

X̄ ∼ Nor(µ, σ²/n)  ⇒  Z ≡ (X̄ − µ)/√(σ²/n) ∼ Nor(0, 1).
Then

1 − α = P(−zα/2 ≤ Z ≤ zα/2)
      = P(−zα/2 ≤ (X̄ − µ)/√(σ²/n) ≤ zα/2)
      = P(−zα/2 √(σ²/n) ≤ X̄ − µ ≤ zα/2 √(σ²/n))
      = P(X̄ − zα/2 √(σ²/n) ≤ µ ≤ X̄ + zα/2 √(σ²/n))
      = P(L ≤ µ ≤ U).
Remarks:
Notice how we used the pivot to “isolate” µ all by itself to the middle of the
inequalities.
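As a minimal numerical sketch of this interval (hypothetical data; z0.025 = 1.96 hardcoded so no normal table is needed):

```python
import math
import statistics

def z_ci(data, sigma2, z=1.96):
    """Two-sided 95% CI for mu when the variance sigma2 is known."""
    n = len(data)
    xbar = statistics.mean(data)
    half = z * math.sqrt(sigma2 / n)   # half-length H = z_{alpha/2} sqrt(sigma2/n)
    return xbar - half, xbar + half    # (L, U)

# Hypothetical sample, pretending sigma^2 = 4 is known:
lo, hi = z_ci([1.9, 2.1, 2.4, 1.7, 2.0, 2.2, 1.8, 2.3], sigma2=4.0)
```

The interval is centered at X̄ and its width depends only on the known σ² and n, exactly as in the derivation above.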
Sample-Size Calculation
If we had taken more observations, then the CI would have gotten shorter, since the half-length is H = zα/2 √(σ²/n).

In fact, how many observations should be taken to make the half-length (or “error”) ≤ ε?

zα/2 √(σ²/n) ≤ ε  iff  n ≥ (σ zα/2 /ε)².
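A one-function sketch of this sample-size rule (hypothetical σ and ε; the 95% quantile z0.025 = 1.96 hardcoded):

```python
import math

def sample_size(sigma, eps, z=1.96):
    """Smallest n with z * sqrt(sigma**2 / n) <= eps, i.e., n >= (sigma * z / eps)**2."""
    return math.ceil((sigma * z / eps) ** 2)

# e.g., sigma = 2 and a desired half-length of eps = 0.5:
n = sample_size(2.0, 0.5)
```

Note the ceiling: n must be an integer, so we round the bound up.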
Remark: We can similarly obtain one-sided CIs for µ (if we’re just interested
in one bound). . .
100(1 − α)% upper CI: µ ≤ X̄ + zα √(σ²/n).

100(1 − α)% lower CI: µ ≥ X̄ − zα √(σ²/n).
Example: Do Georgia Tech students have higher average IQ’s than Univ. of
Georgia students? Answer: Yes.
We’ll again assume that the variances of the two competitors are somehow
known. We’ll do the more-realistic unknown variance case in the next lesson.
Set-up: Suppose we have samples of sizes n and m from the two competing
populations.
X1 , X2 , . . . , Xn ∼ Nor(µx , σx²) (iid)  (population 1)
Y1 , Y2 , . . . , Ym ∼ Nor(µy , σy²) (iid)  (population 2),
where the means µx and µy are unknown, while σx2 and σy2 are somehow
known.
Obviously,

X̄ − Ȳ ∼ Nor(µx − µy , σx²/n + σy²/m),

so that

Z ≡ [X̄ − Ȳ − (µx − µy)] / √(σx²/n + σy²/m) ∼ Nor(0, 1),

so that

P(−zα/2 ≤ Z ≤ zα/2) = 1 − α.
Similarly to before, the usual algebra on the pivot gives the two-sided 100(1 − α)% CI

µx − µy ∈ X̄ − Ȳ ± zα/2 √(σx²/n + σy²/m).
Example: A traveling professor gives the same test to Georgia Tech (X) and
University of Georgia (Y ) students. She assumes that the test scores are
normally distributed with known standard deviations of σx = 20 points and
σy = 12 points, respectively. She takes random samples of 40 GT scores and
24 UGA tests and observes sample means of X̄ = 95 points and Ȳ = 60,
respectively. Let’s find the 90% two-sided CI for µx − µy .
µx − µy ∈ X̄ − Ȳ ± zα/2 √(σx²/n + σy²/m)
        = 35 ± 1.645 √(400/40 + 144/24)
        = 35 ± 6.58,

implying that 28.42 ≤ µx − µy ≤ 41.58. In other words, we’re 90% sure that
µx − µy lies in this interval. □
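A quick check of the example’s arithmetic:

```python
import math

# Numbers from the example: Xbar = 95, Ybar = 60, sigma_x^2 = 400,
# sigma_y^2 = 144, n = 40, m = 24, and z_{0.05} = 1.645 for a 90% CI.
diff = 95 - 60
half = 1.645 * math.sqrt(400 / 40 + 144 / 24)
lo, hi = diff - half, diff + half   # the 90% CI endpoints
```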
Remark: Let’s assume that the sample sizes are equal, i.e., n = m.
To obtain a half-length ≤ ε, we require

n ≥ z²α/2 (σx² + σy²) / ε².
In the next lesson we’ll explore the more-realistic case in which the variance
of the underlying observations is unknown. Now it’s time for t. . . .
At this point we’ll look at the more-realistic case in which the variance of the
underlying normal random variables is unknown.
This takes a little more work, but has many more applications.
Set-up: X1 , X2 , . . . , Xn ∼ Nor(µ, σ²) (iid), with σ² unknown.
Facts:
(a) (X̄ − µ)/√(σ²/n) ∼ Nor(0, 1).

(b) X̄ and S² are independent.

(c) S² = (1/(n − 1)) ∑ⁿᵢ₌₁ (Xi − X̄)² ∼ σ²χ²(n − 1)/(n − 1).
Sketch of Proof: We proved Fact (a) previously. The proof of Fact (b) uses
what is known as Cochran’s Theorem, which is a bit beyond our scope. In
order to derive Fact (c), we first do some algebra to re-write ∑ⁿᵢ₌₁ (Xi − µ)²:
∑ⁿᵢ₌₁ (Xi − µ)² = ∑ⁿᵢ₌₁ [(Xi − X̄) + (X̄ − µ)]²
              = ∑ⁿᵢ₌₁ (Xi − X̄)² + 2(X̄ − µ) ∑ⁿᵢ₌₁ (Xi − X̄) + n(X̄ − µ)²
              = ∑ⁿᵢ₌₁ (Xi − X̄)² + n(X̄ − µ)²    (since ∑ᵢ (Xi − X̄) = 0).

Dividing through by n − 1 gives

(1/(n − 1)) ∑ⁿᵢ₌₁ (Xi − µ)² = S² + n(X̄ − µ)²/(n − 1).    (1)
Now note that Xi ∼ Nor(µ, σ 2 ) and the fact that [Nor(0, 1)]2 ∼ χ2 (1)
together imply (Xi − µ)2 ∼ σ 2 χ2 (1), for all i; so the additive property of
independent χ2 ’s implies that
(1/(n − 1)) ∑ⁿᵢ₌₁ (Xi − µ)² ∼ σ²χ²(n)/(n − 1).    (2)
Similarly, since X̄ ∼ Nor(µ, σ²/n), we have

[n/(n − 1)] (X̄ − µ)² ∼ σ²χ²(1)/(n − 1).    (3)
Since X̄ and S 2 are independent (by Fact (b)), we can substitute the results
from Equations (2) and (3) into (1) and then invoke χ2 additivity to obtain
σ²χ²(n)/(n − 1) ∼ S² + σ²χ²(1)/(n − 1)  ⇒  S² ∼ σ²χ²(n − 1)/(n − 1). □
Thus,

[(X̄ − µ)/√(σ²/n)] / √(S²/σ²) ∼ Nor(0, 1) / √(χ²(n − 1)/(n − 1)) ∼ t(n − 1).

In other words,

(X̄ − µ)/√(S²/n) ∼ t(n − 1).
Note that this expression doesn’t contain the unknown σ 2 . It’s been replaced
by S 2 , which we can calculate.
100(1 − α)% two-sided CI: µ ∈ X̄ ± tα/2,n−1 √(S²/n).

100(1 − α)% lower CI: µ ≥ X̄ − tα,n−1 √(S²/n).

100(1 − α)% upper CI: µ ≤ X̄ + tα,n−1 √(S²/n).
Example: Here are 20 residual flame times (in seconds) of treated specimens
of children’s nightwear. (Don’t worry — children were not in the nightwear
when the clothing was set on fire.)
R Code:
x <- data.frame(x=c(
9.85, 9.93, 9.75, 9.77, 9.67,
9.87, 9.67, 9.94, 9.85, 9.75,
9.83, 9.92, 9.74, 9.99, 9.88,
9.95, 9.95, 9.93, 9.92, 9.89))
print(confint(lm(x~1, data=x)))

Output:

                2.5 %   97.5 %
(Intercept) 9.807357 9.897643
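As a cross-check on the R output, here is a stdlib-only Python sketch of the same t interval, with t0.025,19 = 2.093 hardcoded from a t table:

```python
import math
import statistics

times = [9.85, 9.93, 9.75, 9.77, 9.67, 9.87, 9.67, 9.94, 9.85, 9.75,
         9.83, 9.92, 9.74, 9.99, 9.88, 9.95, 9.95, 9.93, 9.92, 9.89]
n = len(times)
xbar = statistics.mean(times)
s2 = statistics.variance(times)    # the sample variance S^2
t = 2.093                          # t_{0.025,19}, from a t table
half = t * math.sqrt(s2 / n)
lo, hi = xbar - half, xbar + half  # should match R's confint output
```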
We assume that the means µx and µy are unknown, and the variances σx2 and
σy2 are also unknown.
Big Assumption: Suppose for now that σx2 = σy2 = σ 2 , i.e., that the two
variances are unknown but equal.
Both Sx2 and Sy2 are estimators for σ 2 (the common variance).
A better estimator is the pooled estimator for σ 2 , which uses the info from
both Sx2 and Sy2 .
Sp² ≡ [(n − 1)Sx² + (m − 1)Sy²] / (n + m − 2).
Theorem:
Sp² ∼ σ²χ²(n + m − 2)/(n + m − 2),
Proof: Follows from the fact that independent χ2 RVs add up. 2
It follows that

[X̄ − Ȳ − (µx − µy)] / [Sp √(1/n + 1/m)] ∼ t(n + m − 2).
So, after more of the usual algebra, we get a two-sided 100(1 − α)% CI for
µx − µy :
µx − µy ∈ X̄ − Ȳ ± tα/2,n+m−2 Sp √(1/n + 1/m),
Georgia Tech students: X1 , . . . , X25 ∼ Nor(µx , σ²) (iid).

Univ. of Georgia students: Y1 , . . . , Y36 ∼ Nor(µy , σ²) (iid).
The two sample variances are pretty close, so we’ll go ahead and feel good
about our common σ 2 assumption; and we’ll use the pooled estimator,
Note that the above CI doesn’t contain 0 (not even close). But it’s obvious
anyway that GT students are smarter than UGA students! Duh. □
As before, start by calculating sample means and variances, X̄, Ȳ , Sx2 , Sy2 .
Since the variances are possibly unequal, we can’t use the pooled estimator
Sp2 . Instead, use Welch’s approximation method.
t* ≡ [X̄ − Ȳ − (µx − µy)] / √(Sx²/n + Sy²/m) ≈ t(ν),

where ν is an approximate degrees of freedom estimated from the sample variances.
You can tell from Sx2 and Sy2 that there’s no way that the two variances are
equal. So we’ll have to use the approximation method.
µx − µy ∈ 20 ± 9.56. □
The RVs between different pairs are independent. The two observations
within the same pair may not be independent.
Pair 1: (X1 , Y1 )
Pair 2: (X2 , Y2 )
   ⋮
Pair n: (Xn , Yn )

(The n pairs are independent of each other; the two observations within a pair may not be.)
Example: One twin takes a new drug, the other takes a placebo.
We assume that the means µx and µy are unknown, and the variances σx2 and
σy2 are also unknown and possibly unequal.
Define Di ≡ Xi − Yi , i = 1, 2, . . . , n. Note that

D1 , D2 , . . . , Dn ∼ Nor(µd , σd²) (iid),

where µd ≡ µx − µy (which is what we want the CI for), and where σd² ≡ Var(Xi − Yi) is unknown.
Now the problem reduces to the old Nor(µ, σ 2 ) case with unknown µ and σ 2 .
(D̄ − µd)/√(Sd²/n) ∼ t(n − 1).
After the usual algebra on the pivot (with the t distribution instead of the
normal), we obtain a two-sided 100(1 − α)% CI,
µd = µx − µy ∈ D̄ ± tα/2,n−1 √(Sd²/n).

One-sided lower: µd ≥ D̄ − tα,n−1 √(Sd²/n).

One-sided upper: µd ≤ D̄ + tα,n−1 √(Sd²/n).
Clearly, the people are independent, but the times for the same individual to
park the two cars may not be independent.
Let’s assume that all times are normal. We want a 90% two-sided CI for
µd = µh − µc .
Thus, the CI is

µd ∈ D̄ ± t0.05,4 √(Sd²/n) = −9 ± 2.13 √(42.5/5) = −9 ± 6.21. □
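Checking the example’s arithmetic with a two-line script:

```python
import math

# Numbers from the example: Dbar = -9, S_d^2 = 42.5, n = 5, t_{0.05,4} = 2.13.
half = 2.13 * math.sqrt(42.5 / 5)
lo, hi = -9 - half, -9 + half   # the 90% paired-t CI endpoints
```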
So why didn’t we just use the “usual” CI for the difference of two means?
Namely,

µx − µy ∈ X̄ − Ȳ ± tα/2,ν √(Sx²/n + Sy²/m).
Main reason: This CI requires the X’s to be independent of the Y ’s. (Recall
that the paired-t method allows Xi and Yi to be dependent.)
Good thing about using the “usual” method: The approximate df ν from the
“usual” method would probably be larger than the df n − 1 from the paired-t
method. This would make the CI smaller.
Bad thing: You’d introduce much more noise into the system by using the
“usual” method. This could make the CI much larger.
Example: Back to the car example (now use “usual” approximate method).
Just concentrate on the Xi and Yi columns. (The middle column is from the
last example for comparison purposes.)
The Xi ’s and Yi ’s have the same sample averages as before, but now there’s
more natural variation, since we’re using 10 different people.
Then we have

ν = (Sx²/n + Sy²/m)² / [ (Sx²/n)²/(n − 1) + (Sy²/m)²/(m − 1) ]
  = 4(62.5 + 142.5)² / [ (62.5)² + (142.5)² ]    (since n = m = 5)
  ≐ 6.94 ≈ 7.
This CI is wider than the paired-t version, even though we have more df here.
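The ν computation above, as a quick script (numbers from the example):

```python
# Welch degrees of freedom, with the numbers from the example:
# Sx^2 = 62.5, Sy^2 = 142.5, and n = m = 5 observations per group.
sx2, sy2, n, m = 62.5, 142.5, 5, 5
num = (sx2 / n + sy2 / m) ** 2
den = (sx2 / n) ** 2 / (n - 1) + (sy2 / m) ** 2 / (m - 1)
nu = num / den   # approximate df; compare with the paired-t df of n - 1 = 4
```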
Next Few Lessons: We’ll now look at a potpourri of different types of CIs.
Normal variance.
Ratio of normal variances from two populations.
Bernoulli proportion.
Difference of Bernoulli proportions from two populations.
To address how much variability we can expect from some system, we’ll now
get a CI for the variance σ 2 of a normal distribution (instead of the mean).
Usual set-up:
X1 , X2 , . . . , Xn ∼ Nor(µ, σ²) (iid).
Since
(n − 1)S²/σ² ∼ χ²(n − 1),
we use χ² quantiles and some algebra to get

1 − α = P( χ²1−α/2,n−1 ≤ (n − 1)S²/σ² ≤ χ²α/2,n−1 )
      = P( 1/χ²1−α/2,n−1 ≥ σ²/[(n − 1)S²] ≥ 1/χ²α/2,n−1 )
      = P( (n − 1)S²/χ²α/2,n−1 ≤ σ² ≤ (n − 1)S²/χ²1−α/2,n−1 ).

This gives the two-sided 100(1 − α)% CI

σ² ∈ [ (n − 1)S²/χ²α/2,n−1 , (n − 1)S²/χ²1−α/2,n−1 ].
Example: Suppose 25 people take an IQ test and that their scores are
normally distributed.
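The slide’s data aren’t reproduced here, so this is a hedged sketch with a hypothetical sample variance S² = 100 for the n = 25 scores; the quantiles χ²0.025,24 = 39.364 and χ²0.975,24 = 12.401 are taken from a χ² table:

```python
# 95% CI for sigma^2 with n = 25 (so n - 1 = 24 df).
# Table quantiles: chi2_{0.025,24} = 39.364 (upper), chi2_{0.975,24} = 12.401 (lower).
n, s2 = 25, 100.0   # s2 = 100 is a hypothetical sample variance of the IQ scores
lo = (n - 1) * s2 / 39.364
hi = (n - 1) * s2 / 12.401
```

Note the asymmetry: unlike the CIs for µ, this interval is not centered at the point estimate S².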
Set-up:
X1 , X2 , . . . , Xn ∼ Nor(µx , σx²) (iid).

Y1 , Y2 , . . . , Ym ∼ Nor(µy , σy²) (iid).
All X’s and Y ’s are independent with unknown means and variances.
Sy² = (1/(m − 1)) ∑ᵐᵢ₌₁ (Yi − Ȳ)² ∼ σy²χ²(m − 1)/(m − 1).

Thus,

(Sy²/σy²) / (Sx²/σx²) ∼ [χ²(m − 1)/(m − 1)] / [χ²(n − 1)/(n − 1)] ∼ F(m − 1, n − 1).
1 − α = P( F1−α/2,m−1,n−1 ≤ (Sy²/σy²)/(Sx²/σx²) ≤ Fα/2,m−1,n−1 )
      = P( (Sx²/Sy²) · 1/Fα/2,n−1,m−1 ≤ σx²/σy² ≤ (Sx²/Sy²) · Fα/2,m−1,n−1 ).
Remark: The CI for σx2 /σy2 is proportional to the ratio of the sample
variances, Sx2 /Sy2 , and contains no reference to µx or µy .
One-sided lower: (Sx²/Sy²) · 1/Fα,n−1,m−1 ≤ σx²/σy².

One-sided upper: σx²/σy² ≤ (Sx²/Sy²) · Fα,m−1,n−1.
Remark: If you want CIs for σy2 /σx2 , just flip all of the X’s and Y ’s in the
various CIs discussed above.
Suppose that X1 , X2 , . . . , Xn ∼ Bern(p) (iid). Then

X̄ ∼ Bin(n, p)/n.
Let’s assume that n is “large,” so that we’ll be able to use the Central Limit
Theorem (and don’t worry about the continuity correction).
(If n isn’t large, then we’ll have to use nasty Binomial tables, which I don’t
want to deal with here!)
Note that
E[X̄] = E[Xi ] = p, and
Var(X̄) = Var(Xi )/n = pq/n.
By the CLT,

(X̄ − E[X̄]) / √Var(X̄) = (X̄ − p)/√(pq/n) ≈ Nor(0, 1).
Since p is unknown, we replace pq by the estimator X̄(1 − X̄):

(X̄ − p)/√(X̄(1 − X̄)/n) ≈ Nor(0, 1),

which leads to the approximate two-sided 100(1 − α)% CI p ∈ X̄ ± zα/2 √(X̄(1 − X̄)/n).
Suppose that a random sample of 200 students yields 160 correct answers to
the question.
Using the conservative bound p(1 − p) ≤ 1/4, a 95% CI with half-length (“error”) at most ε = 0.03 requires

n ≥ z²α/2 /(4ε²) = (1.96)²/(4(0.03)²) = 1067.1.
Instead suppose that a pilot study gives an estimate of p̂ = 0.7. How should n
be revised?
n ≥ z²α/2 p̂(1 − p̂)/ε² = (1.96)²(0.7)(0.3)/(0.03)² = 896.4.
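Both sample-size calculations in a few lines:

```python
z = 1.96      # z_{0.025}, for 95% confidence
eps = 0.03    # desired half-length ("error")

# Conservative version, using p(1 - p) <= 1/4:
n_conservative = z**2 / (4 * eps**2)
# With the pilot estimate p_hat = 0.7:
n_pilot = z**2 * 0.7 * 0.3 / eps**2
```

The pilot estimate cuts the required sample size by roughly 170 observations, since 0.7 · 0.3 = 0.21 < 0.25.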
Remark: We can also derive approximate CIs for the difference between
two competing Bernoulli parameters.
iid iid
Suppose that X1 , X2 , . . . , Xn ∼ Bern(px ) and Y1 , Y2 , . . . , Ym ∼ Bern(py ),
where the two samples are independent of each other.
Using the techniques already developed in this lesson, we can easily obtain
the following approximate two-sided 100(1 − α)% CI for the difference of
the success proportions:
px − py ∈ X̄ − Ȳ ± zα/2 √( X̄(1 − X̄)/n + Ȳ(1 − Ȳ)/m ).
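A short sketch of this two-sample interval (hypothetical success counts; z0.025 = 1.96 hardcoded):

```python
import math

def two_prop_ci(x_succ, n, y_succ, m, z=1.96):
    """Approximate two-sided CI for p_x - p_y (valid for large n and m)."""
    xbar, ybar = x_succ / n, y_succ / m
    half = z * math.sqrt(xbar * (1 - xbar) / n + ybar * (1 - ybar) / m)
    return xbar - ybar - half, xbar - ybar + half

# Hypothetical counts: 160/200 successes vs. 120/200 successes:
lo, hi = two_prop_ci(160, 200, 120, 200)
```

If the resulting interval excludes 0, the data suggest the two success probabilities differ.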