
CHAPTER 11

Confidence Intervals: Student’s t-distribution

1. Introduction

Until now, in working with confidence intervals, we have assumed that we know the population standard
deviation, σ: in that case, we could use the normal distribution. But if we do not know σ, a different
approach is needed.

2. Confidence Interval for a Population Mean µ, with σ unknown

If the population standard deviation, σ, is unknown, we can substitute the sample standard deviation,
s, for the population standard deviation, σ. This introduces extra uncertainty, since s
varies from sample to sample. To account for this extra variation, we use the Student¹ t distribution
instead of the normal distribution.
Before calculating a confidence interval for a population mean, we need to make some assumptions:
• the sample is obtained through simple random sampling;
• either
– the population from which the sample is drawn is normally distributed; or
– if the population is not normally distributed, we use a large sample (following the Central
Limit Theorem); and
• the population standard deviation is unknown.
The formula for calculating the confidence interval is
\[ \mathrm{CI} = \bar{x} \pm t_{n-1}\,\frac{s}{\sqrt{n}} \qquad (\ast) \]
where
• x̄ is the calculated sample mean;
• $t_{n-1}$ is the critical value of the t distribution with n − 1 degrees of freedom and an area of α/2
in each tail; we can also write $t_{n-1}$ as $t_{(1-\alpha/2,\,n-1)}$ or $t_{\text{crit}}$ or just $t_c$;
• s is the sample standard deviation; and
• n is the sample size.
The confidence interval of the form [min, max] is now calculated by substituting this critical value together
with n, x̄ and s into the formula (∗) above. We get the CI as
\[ \left[\, \bar{x} - t_{\alpha/2}\,\frac{s}{\sqrt{n}} \;;\; \bar{x} + t_{\alpha/2}\,\frac{s}{\sqrt{n}} \,\right]. \]

Student’s t is actually a family of distributions. The t-value depends on the number of degrees of freedom
(d.f.), which is the number of observations free to vary after the sample mean has been calculated².
The t-distribution is symmetrical around the mean and bell-shaped. Yet it differs from the normal
distribution in the thickness of its tails, which contain more area. The precise shape also varies with
the number of degrees of freedom: as the number of d.f. approaches infinity, the shape of the
t-distribution approaches that of the normal distribution. Since s is a random quantity varying with the
choice of sample, the variability in the calculations is increased, resulting in a larger spread: this is
why we need to use the t-tables. See Figure 11.1.

¹ Student was the pen name of the English statistician William Sealy Gosset. Gosset was employed by the
Guinness Brewery to improve the quality of the brewing process, and his work involved estimating
variability across samples. He developed the t distribution as part of this work. He wanted to publish
this work, but as Guinness did not allow employees to publish (in case they revealed company secrets), he
had to publish under an assumed name; he chose the name Student, hence the distribution is called
“Student’s t distribution”.
² This is also the reason why we use n − 1 in the denominator of the sample variance formula seen earlier
in Descriptive Statistics.

MIS2008L

Figure 11.1. Intervals and level of confidence
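
The heavier tails of the t-distribution can also be checked numerically. The following is a minimal sketch using only the Python standard library. The helper name `t_pdf` is hypothetical, and the evaluation point x = 3 and the degrees of freedom 2, 10 and 100 are arbitrary illustrative choices, not values from the text. It evaluates the t density from its standard closed form and compares it with the standard normal density at the same point.

```python
import math
from statistics import NormalDist


def t_pdf(x: float, df: float) -> float:
    """Density of Student's t distribution with `df` degrees of freedom."""
    # lgamma keeps the normalising constant numerically stable for large df.
    log_const = (
        math.lgamma((df + 1) / 2)
        - math.lgamma(df / 2)
        - 0.5 * math.log(df * math.pi)
    )
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))


# Compare the densities three units from the centre, well into the tail.
x = 3.0
normal_density = NormalDist().pdf(x)
tail_densities = {df: t_pdf(x, df) for df in (2, 10, 100)}

for df, density in tail_densities.items():
    print(f"df = {df:>3}: t density at {x} = {density:.5f}")
print(f"standard normal density at {x} = {normal_density:.5f}")
```

At this point each t density exceeds the normal density, and the gap narrows as the degrees of freedom increase, in line with the description above.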

Example 11.1. A baker takes a sample of 49 muffins. The mean weight of the muffins is 33 grams and
the sample standard deviation is 14 grams. Compute the 99% confidence interval for the population
mean muffin weight.

Solution: Here, n = 49, s = 14g and x̄ = 33g from the given data. Thus n − 1 = 49 − 1 = 48. The
confidence interval is based on formula (∗):
\begin{align*}
\mathrm{CI} &= \bar{x} \pm t_c\,\frac{s}{\sqrt{n}} \\
&= 33 \pm t_{(0.995,\,48)}\,\frac{14}{\sqrt{49}} \\
&= 33 \pm 2.6822 \times \frac{14}{\sqrt{49}} \\
&= [33 - 5.3644,\; 33 + 5.3644] \\
&= [27.636,\; 38.364].
\end{align*}

Thus,
27.636g ≤ µ ≤ 38.364g.
So what do these figures mean? The baker can say, with 99% confidence, that the population mean weight of
the muffins is between 27.636 and 38.364 grams, based on a sample of 49 muffins. Although the true
population mean may or may not be in this particular interval, 99% of intervals formed in this manner
(each from a random sample of size 49) will contain the true population mean µ.
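
The arithmetic of Example 11.1 can be reproduced with a short script. This is only a sketch: the function name `t_confidence_interval` is hypothetical, and the critical value 2.6822 is entered by hand from the worked solution (as read from t-tables), not computed by the code.

```python
import math


def t_confidence_interval(mean, s, n, t_crit):
    """Return (lower, upper) for mean ± t_crit * s / sqrt(n)."""
    margin = t_crit * s / math.sqrt(n)
    return mean - margin, mean + margin


# Example 11.1: n = 49, s = 14 g, sample mean = 33 g.
# 2.6822 is the tabulated critical value t(0.995, 48) quoted in the text;
# it is typed in here as an input, not derived.
lower, upper = t_confidence_interval(mean=33, s=14, n=49, t_crit=2.6822)
print(f"99% CI: [{lower:.3f}, {upper:.3f}]")  # 99% CI: [27.636, 38.364]
```

Keeping the critical value as an explicit argument mirrors the table-lookup step in the worked solution, and the same function can be reused with any other tabulated value.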

To avoid any confusion and decide when to use the Z or the t distributions, please use the flow diagram
in Figure 11.2.

Exercise 11.2. A chocolate factory produces several types of chocolates, including white truffles. Each
box of white truffles should weigh 75g on average.

• The σ of the process is equal to 5g. Given a sample of 35 boxes, what is the 95% confidence
interval for the mean box weight?
• What if the standard deviation of the sample is equal to 5.8g?
Data Analysis for Decision Makers

Figure 11.2. Confidence Intervals Flow Diagram

3. Determining Sample Size

We will now see how to find the sample size required to reach a desired margin of error, e, with a specified
level of confidence 1 − α. Note that this error margin e is not the same as the constant e = 2.71828...
used earlier in the Poisson and Exponential distributions.
The margin of error is also called sampling error. It is the amount of imprecision in the estimate of the
population parameter or the amount added and subtracted to the point estimate to form the confidence
interval.
To determine the required sample size for the mean, you must know:
• the desired level of confidence 1 − α, which determines the critical Z value;
• the acceptable sampling error (or margin of error), e; and
• the standard deviation, σ.
Usually, σ is unknown, but it can be estimated in order to use the sample size formula. One approach is
to use a value for σ that is expected to be at least as large as the true σ. Another approach is to select
a pilot sample and estimate σ by the sample standard deviation, s. Whether we have the true σ or an
estimate, we then choose the required sample size as:
\[ n = \frac{Z^{2}\sigma^{2}}{e^{2}}. \]
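
The sample size formula for the mean comes from setting the margin of error equal to e and solving for n. The short derivation below is a sketch that assumes the Z-based interval x̄ ± Zσ/√n used with the normal distribution:

```latex
\begin{align*}
e &= Z\,\frac{\sigma}{\sqrt{n}} \\
\sqrt{n} &= \frac{Z\,\sigma}{e} \\
n &= \frac{Z^{2}\sigma^{2}}{e^{2}}
\end{align*}
```

Since n must be a whole number, the result is rounded up so that the margin of error does not exceed e.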

To determine the required sample size for the proportion, you must know:
• the desired level of confidence 1 − α, which determines the critical Z value;
• the acceptable sampling error, e (also called the margin of error); and
• the true proportion of “successes”, p.
The sample size for the proportion can be calculated using the following formula:
\[ n = \frac{Z^{2}\,p(1-p)}{e^{2}}. \]
Since p is the very thing we are using the sample to infer a value for, this seems like a contradiction; but
a value for p can be estimated with a pilot sample, if necessary (or, conservatively, we can use p = 0.5).
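
To see why p = 0.5 is the conservative choice, note that the product p(1 − p) is largest at p = 0.5. A brief sketch of the argument, by completing the square:

```latex
\begin{align*}
p(1-p) &= \tfrac{1}{4} - \left(p - \tfrac{1}{2}\right)^{2} \le \tfrac{1}{4}, \\
n = \frac{Z^{2}\,p(1-p)}{e^{2}} &\le \frac{Z^{2}}{4e^{2}}.
\end{align*}
```

So a sample size computed with p = 0.5 is at least as large as the size required for any other value of p.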
Example 11.3. The length of steel rods produced by an engineering company is normally distributed
with a standard deviation of 1.8 mm. The quality control manager measures a random sample of nine
rods and creates a 99% confidence interval:
194.65 ≤ µ ≤ 197.55mm.
However, a customer is asking for a more precise interval. What sample size should be used to ensure
the margin of error is no more than 0.5mm at 99% confidence?
Solution: we are told that e = 0.5, $Z_c$ = 2.5758 and σ = 1.8. Then:
\[ n = \frac{Z^{2}\sigma^{2}}{e^{2}} = \frac{2.5758^{2} \times 1.8^{2}}{0.5^{2}} \approx 85.9863. \]
We round up, to get a sample size of 86.
That is, the quality control manager should use a sample size of at least 86 rods to achieve the desired
margin of error at 99% confidence.
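
The calculation in Example 11.3 can be sketched in Python using only the standard library. The function name `sample_size_for_mean` is hypothetical. The first call uses the rounded table value Z = 2.5758 from the solution; the second derives Z from the confidence level with `statistics.NormalDist`, which is an alternative to a table lookup and not a step taken in the text.

```python
import math
from statistics import NormalDist


def sample_size_for_mean(z, sigma, e):
    """Smallest whole n with z * sigma / sqrt(n) <= e."""
    return math.ceil((z * sigma / e) ** 2)


# Example 11.3 with the rounded table value Z_c = 2.5758.
n_table = sample_size_for_mean(z=2.5758, sigma=1.8, e=0.5)

# Same calculation with Z computed from the 99% confidence level.
confidence = 0.99
z_exact = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
n_exact = sample_size_for_mean(z=z_exact, sigma=1.8, e=0.5)

print(n_table, n_exact)  # 86 86
```

Rounding up with `math.ceil`, not to the nearest integer, is what guarantees that the margin of error stays at or below e.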
Example 11.4. A marketing company is commissioned to conduct a poll at 95% confidence to determine
the proportion of voters in favour of a referendum proposal, with a request that the results be accurate
to within ±6% of the true proportion. What sample size should be used?
Solution: we are told that $Z_c$ = 1.96 and e = 0.06; as no estimate of p is given, we take the
conservative value $p_{\text{est}}$ = 0.5. Then:
\[ n = \frac{Z^{2}\,p(1-p)}{e^{2}} = \frac{1.96^{2} \times 0.5(1-0.5)}{0.06^{2}} \approx 266.778. \]
We round up, to get a sample size of 267.
Thus, the marketing company should use a sample size of at least 267 people to achieve the desired
margin of error at 95% confidence.
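
The same calculation for a proportion can be sketched as follows. The function name `sample_size_for_proportion` and its default argument p = 0.5 (the conservative choice) are assumptions made for illustration.

```python
import math


def sample_size_for_proportion(z, e, p=0.5):
    """Smallest whole n with z * sqrt(p * (1 - p) / n) <= e."""
    return math.ceil(z ** 2 * p * (1 - p) / e ** 2)


# Example 11.4: Z_c = 1.96, e = 0.06, conservative p = 0.5.
n_poll = sample_size_for_proportion(z=1.96, e=0.06)
print(n_poll)  # 267
```

If a pilot estimate of p is available, it can be passed in place of the default; any value other than 0.5 gives a smaller required sample.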
Exercise 11.5. A bakery wishes to ensure that the wording on the packaging of their 800g sliced pans
is precise. If σ = 45g, what sample size is needed to estimate the mean within ±5g with 90% confidence?
Exercise 11.6. In estimating the support for an election candidate, how large a sample would be
necessary to estimate the true proportion supportive in a large population within ±3%, with 95%
confidence? Assume that a pilot sample yields $p_s$ = 0.12.
