
Business Modelling

Confidence Intervals
Prof Baibing Li BE 1.26 E-mail: b.li2@lboro.ac.uk Tel 228841

Reading
D Waters (2008), Quantitative Methods for Business, 4th ed. Prentice Hall, Chapters 16, 17.
C Morris (2008), Quantitative Approaches in Business Studies, 7th ed. Prentice Hall, Chapters 9, 10.
D Anderson, D J Sweeney, T A Williams (2007), Statistics for Business and Economics, West, 8-10.

Aims
1. To understand the concepts of estimation and inference.
2. To learn how to construct estimates for means.
3. To investigate the use of the t-distribution in estimation.

Introduction
In this set of lectures we will investigate an enabling technology, statistical
inference and testing, which will frequently play a part within an application
rather than be the subject of an application. The following two illustrations offer
some motivation as to what is aimed at.

Illustration 1
In a particular week at a supermarket you note the proportion of customers who
pay by credit card. (i) How can you estimate the proportion of customers who will
pay by credit card out of all customers every week? (ii) If you can estimate this
proportion, how accurate will it be?

Terminology
Inferential statistics is the term given to the use of a set of techniques for making
broadly sensible statements about a population based on measurements of
characteristics of a representative sample of that population.

Illustration 2
If we measure the heights of ten people in this room and calculate the mean height
of the set of ten, what does that tell us about the mean height of all the people in
the room?
- Are the means the same?
- Can we estimate one mean from the other?
- Can we improve on the estimate?
- Can we calculate in some way the accuracy of any estimate?
As we might expect, inferential statistics will not enable us to use sample
information to provide exact information regarding a population. Instead,
inferential statistics will allow us to estimate what is likely to be true about the
whole population, based on information from a representative sample from the
population. Thus we will also become interested in the level of the error we
make when we estimate. As a consequence this will allow us to determine how
large a sample we need to investigate if we wish to keep the error within some
limit.

Definitions
Population: The set of all possible individuals or objects of a defined
type. (May be an infinite set.)
Sample: A subset of the population from which some information is
gathered.
Parameter: A numerical summary figure relating to a population, e.g.
population mean. A parameter may be unknown.
Sample statistic (estimate): A numerical summary figure relating to a sample which,
it is hoped, will be close in value to the relevant population
parameter, e.g. sample mean. The value of the sample
statistic will vary from sample to sample but the population
parameter is fixed in value.

In general, samples are used when the population is such that measuring each of its members
is impractical (too many items, or certain items too difficult to measure). It is desirable for a
sample to be representative of a population, that is, to mirror many of its characteristics. For
example, it may be unwise to choose as a sample of 100 students a group comprising 95 men and 5
women.

Sampling Distribution
If we repeatedly take different samples, of a given size, from the same population and use each
to calculate a particular statistic (e.g. the sample mean) then each will provide a (possibly
different) estimate of the corresponding population parameter. The population parameter
remains the same but the sample means are subject to a distribution. The sampling distribution
is thus the frequency distribution of some statistic which is obtained by taking different
samples from the same population.
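
The idea of a sampling distribution can be illustrated with a short simulation. The following is a minimal sketch in Python, not part of the original notes: the population of 10,000 heights, the seeds and the helper name `sample_means` are all hypothetical and chosen only for illustration.

```python
import random
import statistics

def sample_means(population, sample_size, num_samples, seed=0):
    """Draw repeated random samples and return the mean of each one."""
    rng = random.Random(seed)
    return [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(num_samples)
    ]

# Hypothetical population: 10,000 heights (cm), generated for illustration only.
_rng = random.Random(42)
population = [_rng.gauss(170, 10) for _ in range(10_000)]
population_mean = statistics.mean(population)

# 1,000 different samples of size 10 from the same population.
means = sample_means(population, sample_size=10, num_samples=1_000)

print(f"population mean          : {population_mean:.2f}")
print(f"smallest sample mean     : {min(means):.2f}")
print(f"largest sample mean      : {max(means):.2f}")
print(f"mean of the sample means : {statistics.mean(means):.2f}")
```

The population mean is a single fixed number, whereas the 1,000 sample means spread out around it; that spread is the sampling distribution of the mean.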

Notation for mean and standard deviation

                      Sample Statistic    Population Parameter
Mean                  $\bar{x}$           $\mu$
Standard Deviation    $s$                 $\sigma$

Margin of error
When we collect a set of sample data, we can calculate the sample mean $\bar{x}$, and that sample
mean is typically different from the population mean $\mu$. The difference between the sample
mean and the population mean is an error. The margin of error, denoted by $E$, is the maximum
likely (with probability $1 - \alpha$) difference between the observed sample mean $\bar{x}$ and the
true value of the population mean $\mu$. The margin of error is defined to be
$E = z_{\alpha/2}\,\sigma/\sqrt{n}$, where $n$ is the sample size, $\sigma$ is the population standard
deviation (assumed to be known), and $z_{\alpha/2}$, termed the critical value, is the positive value
that is at the vertical boundary separating an area of $\alpha/2$ in the right tail of the standard
normal distribution, i.e. satisfying $\Pr(Z > z_{\alpha/2}) = \alpha/2$.
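
As a rough illustration of the definition, the margin of error can be computed directly from the standard normal distribution. This is a sketch using Python's standard library; the function names `z_critical` and `margin_of_error` and the input values are illustrative assumptions, not taken from the notes.

```python
from math import sqrt
from statistics import NormalDist

def z_critical(alpha):
    """Positive z with right-tail area alpha/2 under the standard normal."""
    return NormalDist().inv_cdf(1 - alpha / 2)

def margin_of_error(sigma, n, alpha=0.05):
    """E = z_{alpha/2} * sigma / sqrt(n), with sigma assumed known."""
    return z_critical(alpha) * sigma / sqrt(n)

# Illustrative inputs (assumed): sigma = 15, n = 35, 95% confidence.
E = margin_of_error(sigma=15, n=35, alpha=0.05)
print(f"z = {z_critical(0.05):.3f}, E = {E:.2f}")
```

Halving the margin of error for the same confidence level requires roughly four times the sample size, because $n$ enters through $\sqrt{n}$.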

Shape of the distribution of the sample mean

When the population is normally distributed, the sample mean also follows a normal distribution.
When the population is non-normal the distribution of the sample mean is not normal, but
becomes closer to normal when the sample size increases. With samples of 30 or more, for
almost any population distribution we can assume that the sample mean has an approximately
normal distribution. This will be very helpful.

The distribution of a sample mean $\bar{x}$ is centred on the population mean $\mu$ and the standard
deviation of the sample mean is given by $\sigma_{\bar{x}} = \sigma/\sqrt{n}$ (the standard error of the
mean), where $n$ = sample size. When $\sigma$ is unknown but sample size is large (greater than 30),
we have to work with sample standard deviation, $s$, and our formula for practical purposes
becomes $s/\sqrt{n}$. Correspondingly the margin of error becomes $E = z_{\alpha/2}\, s/\sqrt{n}$.
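
The standard error formula can be checked empirically. The sketch below assumes a deliberately non-normal population (exponential with rate 1, so that $\mu = 1$ and $\sigma = 1$); the population, the seed and the sample size of 36 are illustrative choices, not part of the notes.

```python
import random
import statistics
from math import sqrt

rng = random.Random(1)
n = 36               # sample size (above the n > 30 rule of thumb)
num_samples = 5_000  # number of repeated samples

# Assumed population: exponential with rate 1, so mu = 1 and sigma = 1.
means = [
    statistics.mean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(num_samples)
]

empirical_se = statistics.stdev(means)  # spread of the sample means
theoretical_se = 1.0 / sqrt(n)          # sigma / sqrt(n)

print(f"mean of sample means : {statistics.mean(means):.3f}")
print(f"empirical std. error : {empirical_se:.3f}")
print(f"sigma / sqrt(n)      : {theoretical_se:.3f}")
```

Even though the individual observations are strongly skewed, the sample means cluster around $\mu$ with a spread close to $\sigma/\sqrt{n}$.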

Note: in all the work in this module we will assume that standard deviation is calculated using
the ‘sample formula’ rather than the ‘population formula’, i.e. the n-1 (or Sn-1) button on the
calculator is used and not the n (or Sn) button.

Confidence Interval
When we take different samples from the same population and use them to estimate some
population parameter we hope that we will obtain an estimate that is close to the population
parameter. A confidence interval (CI) is a range of values that will include the population
parameter with a specified probability and is calculated from a sample of the population. The CI
is analogous to a measurement in science which, because of limitations in the accuracy of the
measuring device, is quoted together with an error, e.g. temperature of a liquid = 60 ± 1.

As the confidence interval depends on the sample drawn, it is random. The population
parameter is not random and therefore either does or does not lie in any one particular
confidence interval. By a 95% CI it is meant that 95% of such CIs will contain the parameter.
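
This interpretation can be made concrete by simulation. The following sketch assumes a hypothetical normal population with $\mu = 50$ and known $\sigma = 10$ (values chosen only for illustration), builds 2,000 intervals of the form $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$, and counts how many contain $\mu$.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

rng = random.Random(7)
mu, sigma = 50.0, 10.0         # hypothetical population parameters
n, num_intervals = 40, 2_000
z = NormalDist().inv_cdf(0.975)  # about 1.96
E = z * sigma / sqrt(n)          # sigma is known, so E is the same each time

hits = 0
for _ in range(num_intervals):
    x_bar = mean([rng.gauss(mu, sigma) for _ in range(n)])
    if x_bar - E <= mu <= x_bar + E:  # does this interval contain mu?
        hits += 1

coverage = hits / num_intervals
print(f"{coverage:.1%} of the intervals contain the population mean")
```

Each individual interval either contains $\mu$ or it does not; the 95% refers to the long-run proportion of intervals that do.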

The practical computation of a CI of a sample statistic is from the formula:

Confidence interval of population parameter = sample statistic ± Margin of error

where Margin of error = k*(standard error) and k = number of desired standard errors for the
estimate.

Recall that from tables of the Normal distribution, with p the right-hand-tail area Pr(Z > z),

when  z = 1.28     p = 0.1
      z = 1.645    p = 0.05
      z = 1.96     p = 0.025
      z = 2.32     p = 0.01
      z = 2.58     p = 0.005    (check for yourself!)

These particular z and p values are frequently used in CIs and hypothesis testing (to be
covered later).
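
One way to check these values without printed tables is to invert the standard normal distribution function. This is a small sketch using `statistics.NormalDist` from the Python standard library; the variable names are illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

# Right-tail areas p from the table; z is the point with Pr(Z > z) = p.
z_values = {}
for p in (0.1, 0.05, 0.025, 0.01, 0.005):
    z_values[p] = standard_normal.inv_cdf(1 - p)
    print(f"p = {p:<5}  z = {z_values[p]:.3f}")
```

The computed values agree with the tabulated ones up to rounding; for p = 0.01 the computed value is about 2.326, which printed tables round to 2.32 or 2.33.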

Specifically, assuming that a population is normally distributed and a large sample (n > 30) is
taken, the margin of error is $E = z_{\alpha/2}\, s/\sqrt{n}$. A $100(1-\alpha)\%$ confidence interval
for the population mean (95% when $\alpha = 0.05$) is given by

$(\bar{x} - E,\ \bar{x} + E)$,

or $\left(\bar{x} - z_{\alpha/2}\,\frac{s}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\,\frac{s}{\sqrt{n}}\right)$,

where $\bar{x}$ is the sample mean and $z_{\alpha/2}$ denotes the value of the standard normal
distribution with right-hand-tail area of $\alpha/2$.

Example 1
From the data on advertised house prices a sample of size 40 was drawn which resulted in a
mean price of 205830 and a standard deviation of 73724.

a) Construct a 95% confidence interval estimate for the true population price.
b) Construct a 99% confidence interval estimate for the true population price.
c) Calculate a 95% confidence interval estimate for the true population price assuming that
the sample size had been 50, but otherwise the data were unchanged.
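
The three parts of Example 1 can be worked through with the large-sample formula. The sketch below is one possible way to do the arithmetic, not the official solution; the helper name `z_interval` is an assumption, and it uses the exact normal quantile rather than the rounded table value, so results may differ slightly from a hand calculation with 1.96 or 2.58.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(x_bar, s, n, confidence):
    """Large-sample CI for a mean: x_bar +/- z_{alpha/2} * s / sqrt(n)."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)
    E = z * s / sqrt(n)
    return x_bar - E, x_bar + E

x_bar, s = 205830, 73724  # sample mean and standard deviation from Example 1

ci_a = z_interval(x_bar, s, n=40, confidence=0.95)
ci_b = z_interval(x_bar, s, n=40, confidence=0.99)
ci_c = z_interval(x_bar, s, n=50, confidence=0.95)

for label, (lo, hi) in (("a", ci_a), ("b", ci_b), ("c", ci_c)):
    print(f"({label}) {lo:,.0f} to {hi:,.0f}")
```

Comparing the three intervals shows that a higher confidence level widens the interval, while a larger sample narrows it.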

Sample size for estimating mean $\mu$

Determining the size of a sample is a very important issue, because samples that are
needlessly large waste time and money, and samples that are too small may lead to poor
results. We now want to address the important question: When we plan to collect a simple
random sample of data that will be used to estimate a population mean $\mu$, how many sample
values must be obtained? In other words, we will find the sample size $n$ that is required to
estimate the value of a population mean.

Now suppose that we have a desired margin of error and a confidence level. In order to
determine a suitable sample size, we also need to have a preliminary estimate of the population
standard deviation $\sigma$. In practice, this can normally be worked out from past experience (e.g.
based on historical data), or by carrying out a (small-scale) pilot study to collect some data.
Once we know $E$, $z_{\alpha/2}$ and $\sigma$, the required sample size can be obtained by solving the
equation $E = z_{\alpha/2}\,\sigma/\sqrt{n}$. This yields $n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2$.

NB. The sample size must be a whole number because it represents the number of sample
values that must be found. When the above formula does not result in a whole number, always
increase the value of n to the next larger whole number.
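
The formula and the rounding rule can be combined in a few lines. This is a minimal sketch; the function name `required_sample_size` and the inputs ($\sigma = 20$, $E = 4$) are hypothetical values chosen for illustration.

```python
from math import ceil
from statistics import NormalDist

def required_sample_size(E, sigma, confidence=0.95):
    """Smallest whole n for which z_{alpha/2} * sigma / sqrt(n) <= E."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * sigma / E) ** 2)  # always round up, never down

# Hypothetical inputs: sigma = 20, desired margin of error E = 4, 95% confidence.
n = required_sample_size(E=4, sigma=20, confidence=0.95)
print(n)
```

Rounding up rather than to the nearest whole number guarantees that the achieved margin of error does not exceed the target.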

Example (extra)
Suppose that we want to estimate the mean weight of airline passengers (important for reasons
of safety) and it is known that the population standard deviation is 15 kg. How many passengers
must be randomly selected and weighed if we want 95% confidence that the sample mean is
within 5 kg of the population mean?

Solution: $E = 5$ kg, $\sigma = 15$ kg, confidence level = 0.95, hence $z_{\alpha/2} = 1.96$.

$n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2 = \left(\frac{1.96 \times 15}{5}\right)^2 = 34.57$, so the required sample size is 35.

Confidence Interval for the Population Mean - Small Sample

The CI formula for a mean we have used can only be used when n > 30. If our sample is small,
but the population is still normally distributed, then using $s$ to estimate $\sigma$ will not be very
accurate, so we use as the sampling distribution of the mean a t-distribution. The t-distribution is
a generalisation of the normal distribution which is specific to the size of the sample.

The margin of error now becomes $E = t_{n-1,\alpha/2}\, s/\sqrt{n}$. The confidence interval for the
population mean is given by $(\bar{x} - E,\ \bar{x} + E)$, i.e.
$\left(\bar{x} - t_{n-1,\alpha/2}\,\frac{s}{\sqrt{n}},\ \bar{x} + t_{n-1,\alpha/2}\,\frac{s}{\sqrt{n}}\right)$,
where $t_{n-1,\alpha/2}$ denotes the value of a t-distribution with $n-1$ degrees of freedom and
right-hand-tail area of $\alpha/2$. This is a symmetric confidence interval, which is what we
usually require.

Notes
Refer to t-distribution tables for explanation.
1. A t-distribution is a symmetric distribution that is characterised by its number of degrees of
freedom.
2. The degrees of freedom (df) is the measure of how many values can be determined
independently of a known parameter. For example, if it is known that 10 exam marks
have a mean of 60, then once 9 of these are known, the 10th mark is determined
because the mean must equal 60. This set of values has degrees of freedom = 9.
3. For the estimation of one population mean, the degrees of freedom = n-1, where n is the
sample size.
4. The mean of the t-distribution = 0.
5. The shape of the t-distribution varies with its degrees of freedom.
6. The standard deviation of the t-distribution varies with its degrees of freedom.
7. As n increases, the t-distribution gets closer to the standard normal distribution. For
values of n>30 the t-distribution is very similar to the normal distribution. You can check
this for yourself by examining the values towards the foot of the table of t-distribution
values. Notice how the familiar values 1.96, 2.32 and 2.58 from the normal distribution
are closely matched by t-distributions with larger numbers of degrees of freedom.
8. In fact we could always work with the t-distribution for CIs even for large samples.
However, familiarity with the normal distribution and its ease of use encourages us to
use it whenever we can.

Example 2 (Examination Question – adapted)
A controversial media star has recently been appointed editor of the Sunday Contentment
newspaper. Before her arrival management had asked a sample of the staff to rate the quality
of the paper they produced on a 0 to 10 scale. They obtained the following ratings from 10
respondents

8 7 9 7 6 7 7 8 8 9

Construct a 95% confidence interval for the mean rating before the arrival of the new editor.

Solution: From the given sample we can calculate the following values: $\bar{x} = 7.6$, $s = 0.966$,
$n = 10$ (so the sample size is small).

Confidence level = 0.95, hence $\alpha = 0.05$.

$t_{n-1,\alpha/2} = t_{10-1,\,0.05/2} = t_{9,\,0.025} = 2.262$ and
$E = \frac{t_{n-1,\alpha/2}\, s}{\sqrt{n}} = \frac{2.262 \times 0.966}{\sqrt{10}} = 0.691$.

So the two endpoints of a 95% confidence interval are $\bar{x} - E = 6.91$ and $\bar{x} + E = 8.29$.
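
The arithmetic in this solution can be reproduced as follows. This is a sketch using only the Python standard library, which has no t-distribution, so the critical value 2.262 is typed in from the t-table exactly as in the solution above.

```python
from math import sqrt
from statistics import mean, stdev

ratings = [8, 7, 9, 7, 6, 7, 7, 8, 8, 9]

n = len(ratings)
x_bar = mean(ratings)
s = stdev(ratings)  # sample formula (divides by n - 1)

t_crit = 2.262      # t_{9, 0.025}, read from the t-table as in the solution
E = t_crit * s / sqrt(n)

print(f"x_bar = {x_bar:.2f}, s = {s:.3f}, E = {E:.3f}")
print(f"95% CI: ({x_bar - E:.2f}, {x_bar + E:.2f})")
```

Using the normal value 1.96 in place of 2.262 would give a narrower interval, which understates the uncertainty for a sample this small.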

Construction of a Confidence Interval for Two Means


If we have two different populations we may wish to compare and contrast their parameters.
This is usually done by considering the difference between the means of different populations.
Using our first example, we might wish to look at the difference between typical house prices in
Loughborough and some other town in another part of England.

How we go about choosing the two samples will affect the way we construct the confidence interval. We
could choose either two matched pair samples (or paired samples) or two independent samples. We
have a paired sample when each observation of one sample is paired with an observation in the other
sample. This means that the items or people in one group have been matched in some way to the other
group. For example, in healthcare similar people are matched to compare services, or we may want to
compare a before and after effect, e.g. the weight loss achieved from following a particular diet by a group
of people and the weight loss achieved by a similar group of people following a fitness regime. In contrast,
we assume independent samples if the items or people involved in each group are in no way related to
those in the other group. Any similarities are coincidental.

If the populations are normally distributed the differences in mean follow a t-distribution. The degrees of
freedom of the t-distribution are also dependent on the sizes of the two samples.

Constructing a confidence interval for paired samples

We find the difference for each pair of data points, and then calculate the sample mean and the sample
standard deviation of the differences using the following formulas:

- Sample mean of the differences: $\bar{d} = \sum d / n$
- Sample standard deviation of the differences: $s = \sqrt{\frac{\sum (d - \bar{d})^2}{n-1}}$

Then we calculate the confidence interval of the differences:

$\bar{d} \pm E = \bar{d} \pm t_{n-1,\alpha/2}\, s/\sqrt{n}$

where $n$ is the number of pairs and the degrees of freedom are $n-1$.

Example 3.
Suppose we are interested in comparing the performance ratings of two sports cars. A random sample of
40 drivers is selected to drive the two models. Assume we have a supply of 40 cars for each model (40
New Aston Martin and 40 New Ferrari). The time of each test drive is recorded for each driver on both
cars he or she selects from the pool. The difference in time (Aston Martin − Ferrari) for each driver is
computed, and from these differences a sample mean of 5 seconds and a sample standard deviation of
2.3 seconds are obtained. Construct a 95% confidence interval for the average time difference in seconds
for the two models over the course driven. Based on these data, which car has a higher performance?
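
A possible way to carry out the calculation for Example 3 is sketched below. The helper name `paired_interval` is illustrative, and the critical value $t_{39,\,0.025} \approx 2.023$ is an assumption taken from a standard t-table rather than from these notes (with n = 40 the normal value 1.96 would give a very similar interval).

```python
from math import sqrt

def paired_interval(d_bar, s_d, n, t_crit):
    """CI for the mean difference: d_bar +/- t_crit * s_d / sqrt(n)."""
    E = t_crit * s_d / sqrt(n)
    return d_bar - E, d_bar + E

# Summary figures from Example 3 (differences are Aston Martin minus Ferrari).
n, d_bar, s_d = 40, 5.0, 2.3

# Assumption: t_{39, 0.025} is taken as 2.023 from a standard t-table.
lower, upper = paired_interval(d_bar, s_d, n, t_crit=2.023)
print(f"95% CI for the mean time difference: ({lower:.2f}, {upper:.2f}) seconds")
```

Whether the interval includes zero is the key point for interpretation: an interval lying entirely on one side of zero suggests a systematic difference in times between the two models.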

Constructing confidence intervals for independent samples:

When constructing a confidence interval for the difference between the means of two independent
samples, we can only use the t-distribution as described here if both populations’ standard deviations
are equal, so that we have only one standard deviation to estimate.

(To determine if two populations have equal standard deviations or not we should perform a statistical
test, but for the purposes of this module we shall just decide by judgement. So if two populations have
sample standard deviations of 8.8 and 9.4 we would regard them as being equal but if the figures were 5.8
and 15.4 then we would have to regard them as being unequal.)

The point estimate of the difference between the two population means ($\bar{d}$) is now calculated by
subtracting one sample mean from the other:

$\bar{d} = \bar{x}_1 - \bar{x}_2$

The margin of error is calculated using the following formula,

$E = t_{n_1+n_2-2,\,\alpha/2}\; s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$

where $s_p$ is an estimate of the population standard deviation $\sigma$ calculated using the following formula:

$s_p = \sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}}$

where $n_1$, $n_2$ are the sizes of the samples from population 1 and population 2 respectively (the
samples need not be of the same size) and $s_1$, $s_2$ are the corresponding sample standard deviations.

The degrees of freedom (df) are calculated using both sample sizes: df = $n_1 + n_2 - 2$.

Hence, the confidence interval is

$\bar{d} \pm E = \bar{d} \pm t_{n_1+n_2-2,\,\alpha/2}\; s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$

Example 4
Recall the situation of Example 2, where a controversial media star has recently been appointed editor
of the Sunday Contentment newspaper. Before her arrival management had asked a sample of the staff to
rate the quality of the paper they produced on a 0 to 10 scale. They obtained the following ratings from
10 respondents

8 7 9 7 6 7 7 8 8 9

The following ratings were recorded by a sample of 12 respondents after the new editor arrived:

7 7 8 7 5 6 7 6 8 7 7 6

Construct a 95% confidence interval for the mean difference in employee ratings between the two
periods, assuming equal standard deviations.
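
The pooled calculation for Example 4 might be sketched as follows. This is not the official solution: the difference is taken as before minus after, and the critical value $t_{20,\,0.025} \approx 2.086$ is an assumption read from a standard t-table.

```python
from math import sqrt
from statistics import mean, variance

before = [8, 7, 9, 7, 6, 7, 7, 8, 8, 9]
after = [7, 7, 8, 7, 5, 6, 7, 6, 8, 7, 7, 6]

n1, n2 = len(before), len(after)
d_bar = mean(before) - mean(after)  # point estimate of the difference

# Pooled standard deviation; variance() uses the n - 1 sample formula.
s_p = sqrt(((n1 - 1) * variance(before) + (n2 - 1) * variance(after))
           / (n1 + n2 - 2))

# Assumption: t_{20, 0.025} is taken as 2.086 from a standard t-table.
t_crit = 2.086
E = t_crit * s_p * sqrt(1 / n1 + 1 / n2)

print(f"d_bar = {d_bar:.2f}, s_p = {s_p:.3f}, E = {E:.3f}")
print(f"95% CI: ({d_bar - E:.3f}, {d_bar + E:.3f})")
```

The two sample standard deviations (about 0.97 and 0.87) are close enough that, by the judgement rule given earlier, treating them as equal seems reasonable here.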

[Table: Percentage Points of the t-distribution (One tail)]

Week 6 Tutorial Questions

1. The mileages recorded for a sample of 50 company vehicles during a given week
yielded a mean of 256.8 miles with a standard deviation of 12.05 miles.

(a) Construct a 99% confidence interval for the mean mileage driven by all
company vehicles. Interpret your confidence interval.

(b) Construct a 95% confidence interval for the mean mileage driven by all company
vehicles. Interpret your confidence interval.

(c) If the sample results had been found from a sample of 20 company vehicles,
what would a 95% confidence interval be?

2. The topic of interest in the business school was whether or not extra mathematics
tutorials would affect the marks of the students. In a study to investigate this, test
scores were compared before and after the mathematics tutorials on a group of 17 students
of the class. The differences (before − after) for each of the 17 students were
calculated. The sum of the differences is −126.77 and the variance of the
differences is 945.44. Provide a 90% confidence interval for the average difference
of scores for the whole class.

3. The average weekly overtime earnings from a sample of workers this year and last year
were recorded as follows:

                          Last year    This year
Number of workers         15           12
Mean overtime earnings    £6.50        £5.00
Standard deviation        £2.95        £3.05

(a) Construct a 95% confidence interval for the decrease in average weekly
overtime earnings (assuming equal standard deviations).

(b) Construct a 95% confidence interval for the decrease in average weekly overtime
earnings, assuming that the above means and standard deviations have been obtained
from a sample of 80 from last year and a sample of 90 from the current year (assuming
equal standard deviations).

(c) Compare the widths of the confidence intervals in parts (a) and (b). What
conclusions can be drawn?

4. Data sets have been obtained of advertised prices published in the current issues of
local papers for two different towns in your area for a sample of houses for sale. The
following summary statistics have been obtained.

Town                           A             B
Mean price (£)                 100,000       105,000
Variance of mean price (£²)    12,400,000    12,500,000
Sample size                    60            60

(a) Calculate a 95% confidence interval for Town A.

(b) Calculate a 95% confidence interval for the difference in advertised prices between
Town B and Town A (assuming equal standard deviations). Interpret your
confidence interval.

5. Past Exam Question (Altered)


(a) A questionnaire, completed by 40 students at a large university, reveals that
the average satisfaction score (out of 10) with their choice of accommodation
is 7.45 with a standard deviation of 1.52. Construct a 95% confidence
interval for the mean satisfaction score of all students at the university
with their choice of accommodation.

(b) If the sample size is 1000, calculate the width of the confidence interval.
By how much does the width of the CI change?

(c) For the case of sample size 𝑛 = 1000, what is the width of the confidence
interval if you use 𝑧0.025 = 1.96 to calculate the margin of error, i.e. 𝐸 =
𝑧𝛼/2 𝑠/√𝑛 ? Comment on the result in comparison with that in part (b).

(d) On the basis of parts (a)-(c), discuss what role the sample size plays in
estimation. Think of as many points as you can.
