
Stat 305, Spring 2014

Name

Chapter 6 part I: Confidence Intervals for $\mu$


Motivating Example
Suppose you are the manufacturer of construction equipment. You make 1/4" wire rope and need to
determine how much weight it can hold before breaking so that you can label it clearly and avoid
lawsuits. You might begin this decision process by finding out what the average breaking weight
strength of the wire rope is.

Where do we start?
A good place to start would be to take a SRS of n lengths of wire rope and record the weight
they can hold until they break. Compute the sample average of breaking weight strength, and let
this be our estimate of the true mean of breaking weight strengths for wire rope made under your
manufacturing process. (This is a point estimate)

What does that tell us?


Are we ready to sell our wire rope with only our point estimate for breaking weight strengths?
While we have found an estimate of the average breaking weight strength of our 1/4" wire rope,
we don't know how reliable the estimate is.
How do we know if our single sample was representative?
How would you answer a buyer if they asked "how confident are you that your estimate is in
fact the true average of the breaking weight strength of your wire rope?"
$P(\bar{X} = \mu) = \int_{\mu}^{\mu} f_{\bar{X}}(x)\,dx = 0$

A better idea is to give an interval estimate for the true population mean.

Definitions
The point estimate of a parameter is the value of the statistic that estimates the value of
the parameter. (i.e. $\bar{x}$ is a point estimate for $\mu$)
An interval estimate of a parameter is an interval of numbers used to estimate the value
of a population parameter.
The level of confidence is the probability that represents the percentage of intervals that
will contain $\mu$ if a large number of repeated samples are obtained.
Denoted $(1 - \alpha)100\%$
If $\alpha = 0.05$ then $(1 - \alpha)100\% = 95\%$
We call $\alpha$ the significance level; note $0 \le \alpha \le 1$, usually small values (0.1, 0.05, 0.01).
A $(1 - \alpha)100\%$ confidence interval for a parameter (or function of one or more parameters)
is a data-based interval of numbers thought likely to contain the parameter and possessing a
stated probability-based confidence or reliability.

Motivating Example (cont.)


Suppose a SRS of size n = 25 lengths of 1/4" wire rope resulted in a sample average of 15 tons.
We would like to say that the interval $\bar{x} \pm \#$, i.e. the interval $(15 - \#, 15 + \#)$, for some #,
has a probability-based confidence level of containing the true breaking strength.
How should we make such an interval and what confidence level should we assign to it?
Note: if we truly have a SRS, then we can model this situation by assuming that we have
$X_1, X_2, \ldots, X_n$ iid following some distribution with unknown mean $\mu$ and variance $\sigma^2$.
We can then use the fact that, by the CLT, $\bar{X}_n \sim N(\mu, \sigma^2/n)$ approximately.
Question: Suppose, as stated above, n = 25, and that we somehow know that $\sigma^2 = 36$.
Also, assume that we don't know what distribution $X_1, X_2, \ldots, X_n$ follow. What is the
(approximate) probability that the sample average of breaking weight strength will be within
1 ton of the true breaking weight of our wire rope?

What is the (approximate) probability that the sample average breaking weight of our rope
will be within 1.645 of its standard deviations of the true average breaking weight?
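
The two questions above can be checked numerically. The following is a minimal sketch, assuming the CLT approximation $\bar{X}_n \sim N(\mu, \sigma^2/n)$ with n = 25 and $\sigma^2 = 36$ as stated; the helper name `prob_within` is made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

n = 25
sigma = sqrt(36)       # known population standard deviation (tons)
se = sigma / sqrt(n)   # standard deviation of the sample mean: 6 / 5 = 1.2

Z = NormalDist()       # standard normal, N(0, 1)

def prob_within(z):
    """P(-z < Z < z) for a standard normal Z (hypothetical helper)."""
    return Z.cdf(z) - Z.cdf(-z)

# "Within 1 ton" means within 1 / se standard deviations of the sample mean.
p_one_ton = prob_within(1.0 / se)

# "Within 1.645 standard deviations" is already on the z scale.
p_1645 = prob_within(1.645)

print(f"P(within 1 ton)      = {p_one_ton:.3f}")
print(f"P(within 1.645 SDs)  = {p_1645:.3f}")
```

The first probability comes out to roughly 0.6 and the second to roughly 0.9, which is why 1.645 shows up later as the multiplier for a 90% interval.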

General Setup
Start by assuming X follows a normal distribution. For $X_i$ iid $N(\mu, \sigma^2)$, we know $\bar{X}_n \sim N(\mu, \sigma^2/n)$
exactly.

Example Interval
We want to find an interval that has a probability of 0.7.
The z-scores -1.04 and 1.04 cut off 0.15 in each tail on the standard normal distribution.
$P(-1.04 < Z < 1.04) = 0.7$, thus $P(\mu - 1.04\,\sigma/\sqrt{n} < \bar{X} < \mu + 1.04\,\sigma/\sqrt{n}) = 0.7$
Draw a picture to represent this:

We can manipulate the inequality in the above probability statement so that it is centered around
$\mu$.

General Formula
How do we develop a generic formula for a Large-n CI for $\mu$ if $\sigma^2$ is known?
Definition: we define the notation $z_{1-\alpha/2}$ as the z-value (from a N(0,1) distribution) that has a
probability of $(1 - \alpha/2)$ less than it. ie: $P[Z < z_{1-\alpha/2}] = 1 - \alpha/2$ and $P[Z < -z_{1-\alpha/2}] = \alpha/2$
We want $(1 - \alpha)100\%$ confidence level.
Set up: $P[-z_{1-\alpha/2} < Z < z_{1-\alpha/2}] = 1 - \alpha$

We can rearrange our interval as:
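
One way to carry out the rearrangement is sketched below. It assumes the standardization $Z = (\bar{X}_n - \mu)/(\sigma/\sqrt{n})$, which follows from $\bar{X}_n \sim N(\mu, \sigma^2/n)$.

```latex
\begin{align*}
1 - \alpha
  &= P\left[-z_{1-\alpha/2} < \frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} < z_{1-\alpha/2}\right] \\
  &= P\left[-z_{1-\alpha/2}\,\frac{\sigma}{\sqrt{n}} < \bar{X}_n - \mu < z_{1-\alpha/2}\,\frac{\sigma}{\sqrt{n}}\right] \\
  &= P\left[\bar{X}_n - z_{1-\alpha/2}\,\frac{\sigma}{\sqrt{n}} < \mu < \bar{X}_n + z_{1-\alpha/2}\,\frac{\sigma}{\sqrt{n}}\right]
\end{align*}
```

The last line has $\mu$ in the middle and random endpoints built from $\bar{X}_n$ on either side.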

Large-n CI for $\mu$ with known variance $\sigma^2$: If $X_1, X_2, \ldots, X_n$ are iid with $E[X_i] = \mu$ and $Var(X_i) = \sigma^2$,
where $\sigma^2$ is known, then for large enough n (ie: $n \ge 25$),

$\bar{X}_n \pm z_{1-\alpha/2} \frac{\sigma}{\sqrt{n}}$

is a $(1 - \alpha)100\%$ confidence interval for $\mu$.
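
The formula above can be turned into a few lines of code. This is a minimal sketch, not a reference implementation: the function name `z_interval` is hypothetical, and the numbers plugged in (sample mean 15 tons, $\sigma^2 = 36$, n = 25) are the ones from the motivating example, with 95% chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, conf=0.95):
    """Large-n CI for the mean with known sigma: xbar +/- z * sigma / sqrt(n).

    Hypothetical helper; conf is the confidence level (1 - alpha).
    """
    alpha = 1 - conf
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{1 - alpha/2}
    half_width = z * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width

# Wire rope example: xbar = 15 tons, sigma = sqrt(36) = 6, n = 25.
lower, upper = z_interval(xbar=15, sigma=6, n=25, conf=0.95)
print(f"95% CI for the mean breaking strength: ({lower:.2f}, {upper:.2f}) tons")
```

The interval is always centered at the sample mean; only the half-width depends on $\sigma$, n, and the confidence level.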
Question: What affects the location and width of a CI for $\mu$?
As n increases, the width of the CI decreases.
As $\sigma$ increases, the width of the CI increases.
As $(1 - \alpha)$ increases, the width of the CI increases.
There are confidence levels for CIs that are more common. Here is a table of those values along
with the associated z-value.
Confidence    $z_{1-\alpha/2}$
80%           1.28
90%           1.645
95%           1.96
99%           2.58
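
The table values can be reproduced from the standard normal quantile function. A small sketch, assuming Python's standard-library `statistics.NormalDist`; the helper name `critical_z` is made up for illustration.

```python
from statistics import NormalDist

def critical_z(conf):
    """Return z_{1 - alpha/2} for confidence level conf = 1 - alpha (hypothetical helper)."""
    alpha = 1 - conf
    return NormalDist().inv_cdf(1 - alpha / 2)

for conf in (0.80, 0.90, 0.95, 0.99):
    print(f"{conf:.0%}  {critical_z(conf):.3f}")
```

The computed values agree with the table after rounding.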

We started by assuming that X follows the normal distribution. What if X follows some other
distribution, or we don't know what distribution it follows?
For large n ($n \ge 25$), by CLT $\bar{X}_n \sim N(\mu, \sigma^2/n)$ approximately.
Question: What does it mean to be $(1 - \alpha)100\%$ confident?
Answer: If we repeat this methodology many, many times, then about $(1 - \alpha)100\%$ of the confidence
intervals will contain the true parameter (or function of parameters).
However, once the confidence interval is constructed, we CAN NOT SAY that there is a
PROBABILITY OR CHANCE that the true value of the population parameter (or function
of parameters) lies inside the confidence interval
$P[a < \mu < b] = 1 - \alpha$; but here, a is a constant, b is a constant and $\mu$ is a constant so this
statement DOESN'T MAKE ANY SENSE!!
Once the confidence interval for a mean has been constructed, the true value for the population
mean either IS in the interval or IS NOT in the interval. So we can only say that we are
CONFIDENT the interval contains the truth.
$(1 - \alpha)100\%$ confidence refers to the method, NOT the particular result.
When we calculate a single confidence interval, the correct interpretation is
"We are $(1 - \alpha)100\%$ confident that the true population mean is between a and b" (of course,
put it into context)
Note the difference in the two statements
"What does it mean to be $(1 - \alpha)100\%$ confident?"
This refers to the methodology.
"How can we interpret a single confidence interval?"
This needs to be present after EVERY confidence interval that you ever compute.

Example 1
Suppose we know that the standard deviation for the weight of a bag of M&Ms (labeled 10 oz.)
is 0.5 ounces. If we randomly sampled 30 bags which resulted in a sample mean weight of 10.4
ounces, what is our 95% confidence interval for the true mean weight of all 10 oz. bags of M&Ms?
How do we interpret our confidence interval?
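
One way to check the arithmetic for this example is sketched below, using the known-variance formula $\bar{X}_n \pm z_{1-\alpha/2}\,\sigma/\sqrt{n}$ with the values given ($\sigma = 0.5$, n = 30, sample mean 10.4). The variable names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

sigma = 0.5    # known standard deviation (ounces)
n = 30         # number of bags sampled
xbar = 10.4    # sample mean weight (ounces)
conf = 0.95    # confidence level, 1 - alpha

z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # about 1.96
half_width = z * sigma / sqrt(n)

lower, upper = xbar - half_width, xbar + half_width
print(f"95% CI: ({lower:.3f}, {upper:.3f}) ounces")
print(f"We are 95% confident that the true mean weight of all 10 oz. bags "
      f"is between {lower:.2f} and {upper:.2f} ounces.")
```

The second printed line follows the interpretation template from the notes, put into the context of the problem.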

Find how many bags we need to sample to create a 95% CI that has a width of 0.1 ounces.

In general, to find the required sample size for a given confidence level:
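
A sketch of the sample size calculation follows. It assumes "width" means the full width of the interval (upper minus lower), so that $w = 2\,z_{1-\alpha/2}\,\sigma/\sqrt{n}$ and therefore $n = (2\,z_{1-\alpha/2}\,\sigma / w)^2$, rounded up to the next whole number. The function name `required_n` is hypothetical.

```python
from math import ceil
from statistics import NormalDist

def required_n(sigma, width, conf=0.95):
    """Smallest n whose known-sigma z interval has full width <= width.

    Hypothetical helper; solves width = 2 * z * sigma / sqrt(n) for n.
    """
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return ceil((2 * z * sigma / width) ** 2)

# M&M example: sigma = 0.5 oz, desired full width 0.1 oz, 95% confidence.
print(required_n(sigma=0.5, width=0.1, conf=0.95))
```

If "width" were instead read as the margin of error (half-width), the factor of 2 would drop out and the required n would be about four times smaller.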

Example 2
Ten simple random samples of 30 observations were drawn from a Normal(0,1) distribution. With
each sample, a 90% confidence interval was constructed. Here are the 10 intervals that were
constructed.
Lower    Upper
-0.26    0.34
-0.26    0.34
-0.12    0.48
-0.39    0.21
-0.42    0.19
-0.23    0.37
-0.33    0.27
-0.13    0.47
-0.27    0.33
-0.23    0.38

How many of these intervals contain 0?


How many would I expect to contain 0?
The above procedure was repeated, this time with 1,000 SRSs of size n = 30. 886 of the resulting
intervals contained 0. What percentage of these intervals contained 0?
Finally, 10,000 SRSs of size n = 30 were drawn, and 8,994 of the resulting intervals contained
0. What percentage of these intervals contained 0?
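
The repeated-sampling experiment in this example can be imitated with a short simulation. This is a sketch under stated assumptions: the intervals are built with the known $\sigma = 1$, the function name `coverage` and the seed are arbitrary, and the exact counts will differ from the 886 and 8,994 reported above because the random draws differ.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def coverage(reps, n=30, conf=0.90, seed=305):
    """Fraction of simulated z intervals (known sigma = 1) that contain 0.

    Hypothetical helper: draws `reps` samples of size n from N(0, 1).
    """
    rng = random.Random(seed)                      # seeded for repeatability
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # z_{1 - alpha/2}
    half_width = z * 1.0 / sqrt(n)                 # sigma = 1 is known
    hits = 0
    for _ in range(reps):
        xbar = fmean(rng.gauss(0, 1) for _ in range(n))
        if xbar - half_width < 0 < xbar + half_width:
            hits += 1
    return hits / reps

for reps in (10, 1_000, 10_000):
    print(f"{reps:>6} intervals: {coverage(reps):.1%} contain 0")
```

As the number of repetitions grows, the observed percentage should settle near the nominal 90%, which is the sense in which the confidence level describes the method.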

Unknown Population Variance


So far we have only defined CIs for when $\sigma^2$ is known and for large n.
Problem: the variance of an operation is rarely known
Question: If we don't know the population variance, what do we do with $\sigma^2$?
Answer: replace the population variance with the sample variance!
Large-n CI for $\mu$ with unknown variance $\sigma^2$:
If $X_1, X_2, \ldots, X_n$ are iid with $E[X_i] = \mu$ and $Var(X_i) = \sigma^2$, where $\sigma^2$ is NOT known, then
for large enough n (ie: $n \ge 25$),

$\bar{X}_n \pm z_{1-\alpha/2} \frac{s}{\sqrt{n}}$

is a $(1 - \alpha)100\%$ confidence interval for $\mu$.

Example 1 (cont)
Suppose the sample variance of the 10 oz. M&M bags is 1.0 square ounces. Find a 90% confidence interval
for the true mean weight of all 10 oz. bags of M&Ms.
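
A sketch of this calculation is given below. It assumes the sample from Example 1 carries over (n = 30 bags, sample mean 10.4 ounces), which the continuation implies but does not restate, and it uses the large-n formula with the sample standard deviation s in place of $\sigma$.

```python
from math import sqrt
from statistics import NormalDist

n = 30          # assumed carried over from Example 1
xbar = 10.4     # assumed carried over from Example 1 (ounces)
s2 = 1.0        # sample variance given in this part
s = sqrt(s2)    # sample standard deviation (ounces)
conf = 0.90     # confidence level, 1 - alpha

z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # about 1.645
half_width = z * s / sqrt(n)

lower, upper = xbar - half_width, xbar + half_width
print(f"90% CI: ({lower:.2f}, {upper:.2f}) ounces")
```

Even at a lower confidence level, this interval is wider than the 95% interval from Example 1 because s = 1.0 is twice the previously assumed $\sigma = 0.5$.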
