6739 06 ConfidenceIntervals 200302
Confidence Intervals
Dave Goldsman
H. Milton Stewart School of Industrial and Systems Engineering
Georgia Institute of Technology
3/2/20
A 100(1 − α)% confidence interval (CI) [L, U] for an unknown parameter θ satisfies

P(L ≤ θ ≤ U) = 1 − α.
Example: We’re 95% sure that President Smith’s popularity is 56% ± 3%.
Sample #     L      U     θ    CI covers θ?
   1       1.86   2.23    2        Yes
   2       1.90   2.31    2        Yes
   3       3.21   3.86    2        No
   4       1.75   2.10    2        Yes
   5       1.72   2.03    2        Yes
   ⋮         ⋮      ⋮     ⋮         ⋮
  10       1.62   1.98    2        No
As the number of samples gets large, the proportion of CIs that cover the
unknown θ approaches 1 − α.
There are lots of different kinds of confidence intervals. We’ll look at CIs for
the mean and variance of normal distributions as well as CIs for the success
probability of Bernoulli distributions. We’ll also extend those results to
compare competing normal distributions as well as competing Bernoulli
distributions.
Outline
Set-up: Sample from a normal distribution with unknown mean µ and known
variance σ 2 .
Details: Suppose X1 , . . . , Xn ∼ Nor(µ, σ²) (iid), where σ² is known.
Use X̄ = ∑ⁿᵢ₌₁ Xi /n as our point estimator. Recall

X̄ ∼ Nor(µ, σ²/n)  ⇒  Z ≡ (X̄ − µ)/√(σ²/n) ∼ Nor(0, 1).
Then

1 − α = P(−zα/2 ≤ Z ≤ zα/2)
      = P(−zα/2 ≤ (X̄ − µ)/√(σ²/n) ≤ zα/2)
      = P(−zα/2 √(σ²/n) ≤ X̄ − µ ≤ zα/2 √(σ²/n))
      = P(X̄ − zα/2 √(σ²/n) ≤ µ ≤ X̄ + zα/2 √(σ²/n))
      = P(L ≤ µ ≤ U).
Remarks:
Notice how we used the pivot to “isolate” µ all by itself to the middle of the
inequalities.
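As a minimal numerical sketch of this interval (hypothetical data; z0.025 = 1.96 hardcoded so no normal table is needed):

```python
import math
import statistics

def z_ci(data, sigma2, z=1.96):
    """Two-sided 95% CI for mu when the variance sigma2 is known."""
    n = len(data)
    xbar = statistics.mean(data)
    half = z * math.sqrt(sigma2 / n)   # half-length H = z_{alpha/2} sqrt(sigma2/n)
    return xbar - half, xbar + half    # (L, U)

# Hypothetical sample, pretending sigma^2 = 4 is known:
lo, hi = z_ci([1.9, 2.1, 2.4, 1.7, 2.0, 2.2, 1.8, 2.3], sigma2=4.0)
```

The interval is centered at X̄ and its width depends only on the known σ² and n, exactly as in the derivation above.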
Sample-Size Calculation
If we had taken more observations, then the CI would have gotten shorter, since the half-length is H = zα/2 √(σ²/n).

In fact, how many observations should be taken to make the half-length (or “error”) ≤ ε?

zα/2 √(σ²/n) ≤ ε  iff  n ≥ (σ zα/2 /ε)².
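A one-function sketch of this sample-size rule (hypothetical σ and ε; the 95% quantile z0.025 = 1.96 hardcoded):

```python
import math

def sample_size(sigma, eps, z=1.96):
    """Smallest n with z * sqrt(sigma**2 / n) <= eps, i.e., n >= (sigma * z / eps)**2."""
    return math.ceil((sigma * z / eps) ** 2)

# e.g., sigma = 2 and a desired half-length of eps = 0.5:
n = sample_size(2.0, 0.5)
```

Note the ceiling: n must be an integer, so we round the bound up.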
Remark: We can similarly obtain one-sided CIs for µ (if we’re just interested
in one bound). . .
100(1 − α)% upper CI: µ ≤ X̄ + zα √(σ²/n).

100(1 − α)% lower CI: µ ≥ X̄ − zα √(σ²/n).
Example: Do Georgia Tech students have higher average IQ’s than Univ. of
Georgia students? Answer: Yes.
We’ll again assume that the variances of the two competitors are somehow
known. We’ll do the more-realistic unknown variance case in the next lesson.
Set-up: Suppose we have samples of sizes n and m from the two competing
populations.
X1 , X2 , . . . , Xn ∼ Nor(µx , σx²) (iid)  (population 1)
Y1 , Y2 , . . . , Ym ∼ Nor(µy , σy²) (iid)  (population 2),
where the means µx and µy are unknown, while σx2 and σy2 are somehow
known.
Obviously,

X̄ − Ȳ ∼ Nor(µx − µy , σx²/n + σy²/m),

so that

Z ≡ [X̄ − Ȳ − (µx − µy)] / √(σx²/n + σy²/m) ∼ Nor(0, 1),

so that

P(−zα/2 ≤ Z ≤ zα/2) = 1 − α.
Similarly to before, the usual algebra on the pivot gives the two-sided 100(1 − α)% CI

µx − µy ∈ X̄ − Ȳ ± zα/2 √(σx²/n + σy²/m).
Example: A traveling professor gives the same test to Georgia Tech (X) and
University of Georgia (Y ) students. She assumes that the test scores are
normally distributed with known standard deviations of σx = 20 points and
σy = 12 points, respectively. She takes random samples of 40 GT scores and
24 UGA tests and observes sample means of X̄ = 95 points and Ȳ = 60,
respectively. Let’s find the 90% two-sided CI for µx − µy .
µx − µy ∈ X̄ − Ȳ ± zα/2 √(σx²/n + σy²/m)
        = 35 ± 1.645 √(400/40 + 144/24)
        = 35 ± 6.58,

implying that 28.42 ≤ µx − µy ≤ 41.58. In other words, we’re 90% sure that
µx − µy lies in this interval. □
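A quick check of the example’s arithmetic:

```python
import math

# Numbers from the example: Xbar = 95, Ybar = 60, sigma_x^2 = 400,
# sigma_y^2 = 144, n = 40, m = 24, and z_{0.05} = 1.645 for a 90% CI.
diff = 95 - 60
half = 1.645 * math.sqrt(400 / 40 + 144 / 24)
lo, hi = diff - half, diff + half   # the 90% CI endpoints
```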
Remark: Let’s assume that the sample sizes are equal, i.e., n = m.
To obtain a half-length ≤ ε, we require

n ≥ z²α/2 (σx² + σy²) / ε².
In the next lesson we’ll explore the more-realistic case in which the variance
of the underlying observations is unknown. Now it’s time for t. . . .
At this point we’ll look at the more-realistic case in which the variance of the
underlying normal random variables is unknown.
This takes a little more work, but has many more applications.
Set-up: X1 , X2 , . . . , Xn ∼ Nor(µ, σ²) (iid), with σ² unknown.
Facts:
(a) (X̄ − µ)/√(σ²/n) ∼ Nor(0, 1).

(b) X̄ and S² are independent.

(c) S² = (1/(n − 1)) ∑ⁿᵢ₌₁ (Xi − X̄)² ∼ σ²χ²(n − 1)/(n − 1).
Sketch of Proof: We proved Fact (a) previously. The proof of Fact (b) uses
what is known as Cochran’s Theorem, which is a bit beyond our scope. In
order to derive Fact (c), we first do some algebra to re-write ∑ⁿᵢ₌₁ (Xi − µ)²:
∑ⁿᵢ₌₁ (Xi − µ)² = ∑ⁿᵢ₌₁ [(Xi − X̄) + (X̄ − µ)]²
              = ∑ⁿᵢ₌₁ (Xi − X̄)² + 2(X̄ − µ) ∑ⁿᵢ₌₁ (Xi − X̄) + n(X̄ − µ)²
              = ∑ⁿᵢ₌₁ (Xi − X̄)² + n(X̄ − µ)²    (since ∑ᵢ (Xi − X̄) = 0).

Dividing through by n − 1 gives

(1/(n − 1)) ∑ⁿᵢ₌₁ (Xi − µ)² = S² + n(X̄ − µ)²/(n − 1).    (1)
Now note that Xi ∼ Nor(µ, σ 2 ) and the fact that [Nor(0, 1)]2 ∼ χ2 (1)
together imply (Xi − µ)2 ∼ σ 2 χ2 (1), for all i; so the additive property of
independent χ2 ’s implies that
(1/(n − 1)) ∑ⁿᵢ₌₁ (Xi − µ)² ∼ σ²χ²(n)/(n − 1).    (2)
Similarly, since X̄ ∼ Nor(µ, σ²/n), we have

[n/(n − 1)] (X̄ − µ)² ∼ σ²χ²(1)/(n − 1).    (3)
Since X̄ and S 2 are independent (by Fact (b)), we can substitute the results
from Equations (2) and (3) into (1) and then invoke χ2 additivity to obtain
σ²χ²(n)/(n − 1) ∼ S² + σ²χ²(1)/(n − 1)  ⇒  S² ∼ σ²χ²(n − 1)/(n − 1). □
Thus,

[(X̄ − µ)/√(σ²/n)] / √(S²/σ²) ∼ Nor(0, 1) / √(χ²(n − 1)/(n − 1)) ∼ t(n − 1).

In other words,

(X̄ − µ)/√(S²/n) ∼ t(n − 1).
Note that this expression doesn’t contain the unknown σ 2 . It’s been replaced
by S 2 , which we can calculate.
100(1 − α)% two-sided CI: µ ∈ X̄ ± tα/2,n−1 √(S²/n).

100(1 − α)% lower CI: µ ≥ X̄ − tα,n−1 √(S²/n).

100(1 − α)% upper CI: µ ≤ X̄ + tα,n−1 √(S²/n).
Example: Here are 20 residual flame times (in seconds) of treated specimens
of children’s nightwear. (Don’t worry — children were not in the nightwear
when the clothing was set on fire.)
R Code:
x <- data.frame(x=c(
9.85, 9.93, 9.75, 9.77, 9.67,
9.87, 9.67, 9.94, 9.85, 9.75,
9.83, 9.92, 9.74, 9.99, 9.88,
9.95, 9.95, 9.93, 9.92, 9.89))
print(confint(lm(x~1, data=x)))

Output:

                2.5 %   97.5 %
(Intercept) 9.807357 9.897643
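As a cross-check on the R output, here is a stdlib-only Python sketch of the same t interval, with t0.025,19 = 2.093 hardcoded from a t table:

```python
import math
import statistics

times = [9.85, 9.93, 9.75, 9.77, 9.67, 9.87, 9.67, 9.94, 9.85, 9.75,
         9.83, 9.92, 9.74, 9.99, 9.88, 9.95, 9.95, 9.93, 9.92, 9.89]
n = len(times)
xbar = statistics.mean(times)
s2 = statistics.variance(times)    # the sample variance S^2
t = 2.093                          # t_{0.025,19}, from a t table
half = t * math.sqrt(s2 / n)
lo, hi = xbar - half, xbar + half  # should match R's confint output
```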
We assume that the means µx and µy are unknown, and the variances σx2 and
σy2 are also unknown.
Big Assumption: Suppose for now that σx2 = σy2 = σ 2 , i.e., that the two
variances are unknown but equal.
Both Sx2 and Sy2 are estimators for σ 2 (the common variance).
A better estimator is the pooled estimator for σ 2 , which uses the info from
both Sx2 and Sy2 .
Sp² ≡ [(n − 1)Sx² + (m − 1)Sy²] / (n + m − 2).
Theorem:
Sp² ∼ σ²χ²(n + m − 2)/(n + m − 2),
Proof: Follows from the fact that independent χ2 RVs add up. 2
It follows that

[X̄ − Ȳ − (µx − µy)] / [Sp √(1/n + 1/m)] ∼ t(n + m − 2).
So, after more of the usual algebra, we get a two-sided 100(1 − α)% CI for
µx − µy :
µx − µy ∈ X̄ − Ȳ ± tα/2,n+m−2 Sp √(1/n + 1/m),
Georgia Tech students: X1 , . . . , X25 ∼ Nor(µx , σ²) (iid).

Univ. of Georgia students: Y1 , . . . , Y36 ∼ Nor(µy , σ²) (iid).
The two sample variances are pretty close, so we’ll go ahead and feel good
about our common σ 2 assumption; and we’ll use the pooled estimator,
Note that the above CI doesn’t contain 0 (not even close). But it’s obvious
anyway that GT students are smarter than UGA students! Duh. □
As before, start by calculating sample means and variances, X̄, Ȳ , Sx2 , Sy2 .
Since the variances are possibly unequal, we can’t use the pooled estimator
Sp2 . Instead, use Welch’s approximation method.
t* ≡ [X̄ − Ȳ − (µx − µy)] / √(Sx²/n + Sy²/m) ≈ t(ν),

where ν is an approximate degrees of freedom estimated from the sample variances.
You can tell from Sx2 and Sy2 that there’s no way that the two variances are
equal. So we’ll have to use the approximation method.
µx − µy ∈ 20 ± 9.56. □
The RVs between different pairs are independent. The two observations
within the same pair may not be independent.
Pair 1: (X1 , Y1 )
Pair 2: (X2 , Y2 )
   ⋮
Pair n: (Xn , Yn )

(The n pairs are independent of each other; the two observations within a pair may not be.)
Example: One twin takes a new drug, the other takes a placebo.
We assume that the means µx and µy are unknown, and the variances σx2 and
σy2 are also unknown and possibly unequal.
Define Di ≡ Xi − Yi , i = 1, 2, . . . , n. Note that

D1 , D2 , . . . , Dn ∼ Nor(µd , σd²) (iid),

where µd ≡ µx − µy (which is what we want the CI for), and where σd² ≡ Var(Xi − Yi) is unknown.
Now the problem reduces to the old Nor(µ, σ 2 ) case with unknown µ and σ 2 .
(D̄ − µd)/√(Sd²/n) ∼ t(n − 1).
After the usual algebra on the pivot (with the t distribution instead of the
normal), we obtain a two-sided 100(1 − α)% CI,
µd = µx − µy ∈ D̄ ± tα/2,n−1 √(Sd²/n).

One-sided lower: µd ≥ D̄ − tα,n−1 √(Sd²/n).

One-sided upper: µd ≤ D̄ + tα,n−1 √(Sd²/n).
Clearly, the people are independent, but the times for the same individual to
park the two cars may not be independent.
Let’s assume that all times are normal. We want a 90% two-sided CI for
µd = µh − µc .
Thus, the CI is

µd ∈ D̄ ± t0.05,4 √(Sd²/n) = −9 ± 2.13 √(42.5/5) = −9 ± 6.21. □
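Checking the example’s arithmetic with a two-line script:

```python
import math

# Numbers from the example: Dbar = -9, S_d^2 = 42.5, n = 5, t_{0.05,4} = 2.13.
half = 2.13 * math.sqrt(42.5 / 5)
lo, hi = -9 - half, -9 + half   # the 90% paired-t CI endpoints
```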
So why didn’t we just use the “usual” CI for the difference of two means?
Namely,

µx − µy ∈ X̄ − Ȳ ± tα/2,ν √(Sx²/n + Sy²/m).
Main reason: This CI requires the X’s to be independent of the Y ’s. (Recall
that the paired-t method allows Xi and Yi to be dependent.)
Good thing about using the “usual” method: The approximate df ν from the
“usual” method would probably be larger than the df n − 1 from the paired-t
method. This would make the CI smaller.
Bad thing: You’d introduce much more noise into the system by using the
“usual” method. This could make the CI much larger.
Example: Back to the car example (now use “usual” approximate method).
Just concentrate on the Xi and Yi columns. (The middle column is from the
last example for comparison purposes.)
The Xi ’s and Yi ’s have the same sample averages as before, but now there’s
more natural variation, since we’re using 10 different people.
Then we have

ν = (Sx²/n + Sy²/m)² / [ (Sx²/n)²/(n − 1) + (Sy²/m)²/(m − 1) ]
  = 4(62.5 + 142.5)² / [ (62.5)² + (142.5)² ]    (since n = m = 5)
  ≐ 6.94 ≈ 7.
This CI is wider than the paired-t version, even though we have more df here.
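The ν computation above, as a quick script (numbers from the example):

```python
# Welch degrees of freedom, with the numbers from the example:
# Sx^2 = 62.5, Sy^2 = 142.5, and n = m = 5 observations per group.
sx2, sy2, n, m = 62.5, 142.5, 5, 5
num = (sx2 / n + sy2 / m) ** 2
den = (sx2 / n) ** 2 / (n - 1) + (sy2 / m) ** 2 / (m - 1)
nu = num / den   # approximate df; compare with the paired-t df of n - 1 = 4
```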
Next Few Lessons: We’ll now look at a potpourri of different types of CIs.
Normal variance.
Ratio of normal variances from two populations.
Bernoulli proportion.
Difference of Bernoulli proportions from two populations.
To address how much variability we can expect from some system, we’ll now
get a CI for the variance σ 2 of a normal distribution (instead of the mean).
Usual set-up:
X1 , X2 , . . . , Xn ∼ Nor(µ, σ²) (iid).
Since
(n − 1)S²/σ² ∼ χ²(n − 1),
we use χ² quantiles and some algebra to get

1 − α = P( χ²1−α/2,n−1 ≤ (n − 1)S²/σ² ≤ χ²α/2,n−1 )
      = P( 1/χ²1−α/2,n−1 ≥ σ²/[(n − 1)S²] ≥ 1/χ²α/2,n−1 )
      = P( (n − 1)S²/χ²α/2,n−1 ≤ σ² ≤ (n − 1)S²/χ²1−α/2,n−1 ).

This gives the two-sided 100(1 − α)% CI

σ² ∈ [ (n − 1)S²/χ²α/2,n−1 , (n − 1)S²/χ²1−α/2,n−1 ].
Example: Suppose 25 people take an IQ test and that their scores are
normally distributed.
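The slide’s data aren’t reproduced here, so this is a hedged sketch with a hypothetical sample variance S² = 100 for the n = 25 scores; the quantiles χ²0.025,24 = 39.364 and χ²0.975,24 = 12.401 are taken from a χ² table:

```python
# 95% CI for sigma^2 with n = 25 (so n - 1 = 24 df).
# Table quantiles: chi2_{0.025,24} = 39.364 (upper), chi2_{0.975,24} = 12.401 (lower).
n, s2 = 25, 100.0   # s2 = 100 is a hypothetical sample variance of the IQ scores
lo = (n - 1) * s2 / 39.364
hi = (n - 1) * s2 / 12.401
```

Note the asymmetry: unlike the CIs for µ, this interval is not centered at the point estimate S².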
Set-up:
X1 , X2 , . . . , Xn ∼ Nor(µx , σx²) (iid).

Y1 , Y2 , . . . , Ym ∼ Nor(µy , σy²) (iid).
All X’s and Y ’s are independent with unknown means and variances.
Sy² = (1/(m − 1)) ∑ᵐᵢ₌₁ (Yi − Ȳ)² ∼ σy²χ²(m − 1)/(m − 1).

Thus,

(Sy²/σy²) / (Sx²/σx²) ∼ [χ²(m − 1)/(m − 1)] / [χ²(n − 1)/(n − 1)] ∼ F(m − 1, n − 1).
1 − α = P( F1−α/2,m−1,n−1 ≤ (Sy²/σy²)/(Sx²/σx²) ≤ Fα/2,m−1,n−1 )
      = P( (Sx²/Sy²) · 1/Fα/2,n−1,m−1 ≤ σx²/σy² ≤ (Sx²/Sy²) · Fα/2,m−1,n−1 ).
Remark: The CI for σx2 /σy2 is proportional to the ratio of the sample
variances, Sx2 /Sy2 , and contains no reference to µx or µy .
One-sided lower: (Sx²/Sy²) · 1/Fα,n−1,m−1 ≤ σx²/σy².

One-sided upper: σx²/σy² ≤ (Sx²/Sy²) · Fα,m−1,n−1.
Remark: If you want CIs for σy2 /σx2 , just flip all of the X’s and Y ’s in the
various CIs discussed above.
Suppose that X1 , X2 , . . . , Xn ∼ Bern(p) (iid). Then

X̄ ∼ Bin(n, p)/n.
Let’s assume that n is “large,” so that we’ll be able to use the Central Limit
Theorem (and don’t worry about the continuity correction).
(If n isn’t large, then we’ll have to use nasty Binomial tables, which I don’t
want to deal with here!)
Note that
E[X̄] = E[Xi ] = p, and
Var(X̄) = Var(Xi )/n = pq/n.
By the CLT,

(X̄ − E[X̄]) / √Var(X̄) = (X̄ − p)/√(pq/n) ≈ Nor(0, 1).
Since p is unknown, we replace pq by the estimator X̄(1 − X̄):

(X̄ − p)/√(X̄(1 − X̄)/n) ≈ Nor(0, 1),

which leads to the approximate two-sided 100(1 − α)% CI p ∈ X̄ ± zα/2 √(X̄(1 − X̄)/n).
Suppose that a random sample of 200 students yields 160 correct answers to
the question.
Using the conservative bound p(1 − p) ≤ 1/4, a 95% CI with half-length (“error”) at most ε = 0.03 requires

n ≥ z²α/2 /(4ε²) = (1.96)²/(4(0.03)²) = 1067.1.
Instead suppose that a pilot study gives an estimate of p̂ = 0.7. How should n
be revised?
n ≥ z²α/2 p̂(1 − p̂)/ε² = (1.96)²(0.7)(0.3)/(0.03)² = 896.4.
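Both sample-size calculations in a few lines:

```python
z = 1.96      # z_{0.025}, for 95% confidence
eps = 0.03    # desired half-length ("error")

# Conservative version, using p(1 - p) <= 1/4:
n_conservative = z**2 / (4 * eps**2)
# With the pilot estimate p_hat = 0.7:
n_pilot = z**2 * 0.7 * 0.3 / eps**2
```

The pilot estimate cuts the required sample size by roughly 170 observations, since 0.7 · 0.3 = 0.21 < 0.25.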
Remark: We can also derive approximate CIs for the difference between
two competing Bernoulli parameters.
iid iid
Suppose that X1 , X2 , . . . , Xn ∼ Bern(px ) and Y1 , Y2 , . . . , Ym ∼ Bern(py ),
where the two samples are independent of each other.
Using the techniques already developed in this lesson, we can easily obtain
the following approximate two-sided 100(1 − α)% CI for the difference of
the success proportions:
px − py ∈ X̄ − Ȳ ± zα/2 √( X̄(1 − X̄)/n + Ȳ(1 − Ȳ)/m ).
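A short sketch of this two-sample interval (hypothetical success counts; z0.025 = 1.96 hardcoded):

```python
import math

def two_prop_ci(x_succ, n, y_succ, m, z=1.96):
    """Approximate two-sided CI for p_x - p_y (valid for large n and m)."""
    xbar, ybar = x_succ / n, y_succ / m
    half = z * math.sqrt(xbar * (1 - xbar) / n + ybar * (1 - ybar) / m)
    return xbar - ybar - half, xbar - ybar + half

# Hypothetical counts: 160/200 successes vs. 120/200 successes:
lo, hi = two_prop_ci(160, 200, 120, 200)
```

If the resulting interval excludes 0, the data suggest the two success probabilities differ.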