2 Sampling Distributions
Time
It may take too much time to contact the whole population. Even if you could contact the
whole population, the results may be meaningless as they would be out of date, e.g. if it
took me 2 years to poll the voting population in the 1-year run-up to an election.
Cost
The cost may be too high and hence prohibitive, e.g. if I were to poll the 90 million people
who vote in a general election, the cost would be astronomical.
It may well be impossible to track the whole population down, e.g. it would be difficult to
find all 90 million voters in the run up to a general election.
The additional accuracy gained from testing a whole population rather than a sample may
not justify the additional time, cost or effort expended in doing so.
Section 1 - Chapter 2 63
Sampling Distributions Chapter 2
Sampling Methods
When sampling we must ensure that we choose a sample which is representative of the
whole population.
The sampling methods that follow are just a few ways in which sampling can be carried
out. Other methods also exist which are not discussed here.
A sample is selected so that each item or person in the population has the same chance of
being selected.
Example
An example of this would be writing this class’ names on pieces of paper and then
putting the names in a box and drawing names from that box.
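The box-of-names idea can be sketched in Python. This is a minimal illustration, assuming a made-up class list; the names and the helper `simple_random_sample` are hypothetical and not part of the notes.

```python
import random

# Hypothetical class list; these names are placeholders, not from the notes.
class_names = ["Ana", "Ben", "Cara", "Dev", "Eli", "Fay", "Gus", "Hana"]

def simple_random_sample(population, n, seed=None):
    """Draw n distinct items so that every item has the same chance of selection."""
    rng = random.Random(seed)  # passing a seed makes the draw repeatable
    return rng.sample(population, n)  # sampling without replacement

sample = simple_random_sample(class_names, 3, seed=1)
print(sample)
```

Drawing without replacement mirrors taking slips of paper out of the box one at a time.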
The starting point would be a random number between 1 and k, where k is the population
size divided by the sample size. Then we would pick every kth item after that.
Example
If we had a population of size 2,000 and we wanted to choose a sample of size 100, then
$k = \frac{2{,}000}{100} = 20$. We would then choose a random number between 1 and 20 as our
starting point.
If we choose 18 as our random starting point, then starting with the 18th observation,
every 20th observation (18, 38, 58, ...) would be chosen. We would end up with a
sample of 100 observations.
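The selection rule in this example can be written as a short Python sketch. The function name `systematic_sample` is hypothetical, and the sketch assumes the population size is an exact multiple of the sample size, as it is here.

```python
import random

def systematic_sample(population_size, sample_size, start=None, seed=None):
    """Return the 1-based positions picked by taking every kth item."""
    k = population_size // sample_size  # sampling interval, e.g. 2000 // 100 = 20
    if start is None:
        # random starting point between 1 and k (inclusive)
        start = random.Random(seed).randint(1, k)
    return [start + i * k for i in range(sample_size)]

positions = systematic_sample(2000, 100, start=18)
print(positions[:3], positions[-1], len(positions))
```

With a start of 18 the positions run 18, 38, 58 and so on up to 1998, giving 100 observations.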
Example
Consider the advertising expenditure for the largest 352 companies in the United States.
Suppose we wanted to study whether firms with high returns on equity spent more of
each sales dollar on advertising than firms with a low return or deficit.
If we use simple random sampling, firms in the 3rd and 4th strata would have a much
higher chance (87%) of being chosen, whereas firms in the 1st and 5th strata would have
little chance of being chosen and may well not be chosen at all.
Cluster Sampling
A population is divided into clusters using naturally occurring geographic or other
boundaries. Ideally each cluster is a representative small scale version of the population
(i.e. heterogeneous). A simple random sample of the clusters is then chosen. All elements
within each sampled (chosen) cluster form the sample.
So here, we will not have all clusters (groups) represented in our sample.
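A minimal Python sketch of cluster sampling follows. The cluster names and members are invented for illustration, and `cluster_sample` is a hypothetical helper, not a library function.

```python
import random

# Hypothetical population already divided into clusters (made-up data).
clusters = {
    "North": ["n1", "n2", "n3"],
    "South": ["s1", "s2"],
    "East": ["e1", "e2", "e3", "e4"],
    "West": ["w1", "w2"],
}

def cluster_sample(clusters, n_clusters, seed=None):
    """Pick whole clusters at random; every element of a chosen cluster is sampled."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # simple random sample of clusters
    members = [item for name in chosen for item in clusters[name]]
    return chosen, members

chosen, members = cluster_sample(clusters, 2, seed=0)
print(chosen, members)
```

Only the two chosen clusters contribute elements, which is why not every cluster is represented in the sample.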
Sampling “Error”
Sampling error is the difference between a sample statistic and its corresponding
population parameter. In the case of the mean, it is
$\bar{X} - \mu$, where $\bar{X}$ is the sample mean and $\mu$ is the population mean.
Samples are used to estimate population characteristics. For example the mean of a
sample is used to estimate the mean of the population. However, since the sample is only
part of the population, it is unlikely that the sample mean will be exactly equal to the
population mean.
Likewise, the sample standard deviation is unlikely to be exactly the same as the
population standard deviation.
We would not be surprised then if the sample statistic is different from the corresponding
population parameter.
Example
Last week the output for each employee was 97, 103, 96, 99 and 105 units.
Suppose we select two employees whose output was 97 and 105. The mean of this
sample is $\frac{97 + 105}{2} = 101$.
Suppose we select another two employees whose output was 103 and 96. The mean of
this sample is $\frac{103 + 96}{2} = 99.5$.
We know however that the mean of the population is $\frac{97 + 103 + 96 + 99 + 105}{5} = 100$.
The sampling error of each sample is found from $\bar{X} - \mu$, where $\bar{X}$ = sample mean and
$\mu$ = population mean: $101 - 100 = 1.0$ and $99.5 - 100 = -0.5$.
Each of these differences, 1.0 and -0.5, is the sampling error made in estimating the
population mean based on the sample mean.
Each of the possible samples of size 2 has an equal chance of selection. Each sample may
have a different sample mean and hence, sampling error. The value of the sampling error
is based on the random selection of the sample. Therefore, sampling errors are random
and occur by chance.
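To see how the sampling error varies from sample to sample, the following Python sketch lists every possible sample of size 2 from the five employee outputs above and computes the sampling error of each. The variable names are illustrative only.

```python
from itertools import combinations

outputs = [97, 103, 96, 99, 105]   # the population from the example
mu = sum(outputs) / len(outputs)   # population mean

# sampling error (sample mean minus population mean) for each sample of size 2
errors = {pair: sum(pair) / 2 - mu for pair in combinations(outputs, 2)}

for pair, err in errors.items():
    print(pair, err)
```

The ten errors differ in size and sign, and they average out to zero across all possible samples.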
Here our random variable will be a mean. Each observation represents the average of a
sample of size n.
Organizing the means of all possible samples of size n, into a probability distribution
would result in us obtaining the sampling distribution of the sample mean.
Example-Constructing Sampling Distribution of Sample Mean
Yerani Industries has seven production employees (the population). The hourly earnings
for each employee are given in the table below.
Joe $7
Sam $7
Sue $8
Bob $8
Jan $7
Art $8
Ted $9
What is the sampling distribution of the sample mean of the samples of size 2?
To arrive at the sampling distribution of the sample mean, all possible samples of size 2
need to be selected without replacement from the population, and their means computed.
There are 21 possible samples ($_7C_2 = \frac{7!}{2!\,5!} = 21$).
Listed below are all the 21 sample means from all samples of size 2.
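[Table: the sample means with their frequencies and probabilities; the totals row gives 21 samples and a total probability of 1.00]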
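The sampling distribution for this example can be rebuilt directly from the earnings table. The sketch below, with illustrative variable names, enumerates all 21 samples of size 2 and tallies their means.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

wages = {"Joe": 7, "Sam": 7, "Sue": 8, "Bob": 8, "Jan": 7, "Art": 8, "Ted": 9}

samples = list(combinations(wages, 2))  # every pair of employees, without replacement
counts = Counter(Fraction(wages[a] + wages[b], 2) for a, b in samples)

for m in sorted(counts):
    # sample mean, number of samples with that mean, probability
    print(float(m), counts[m], round(counts[m] / len(samples), 4))

mean_of_means = sum(m * c for m, c in counts.items()) / len(samples)
population_mean = Fraction(sum(wages.values()), len(wages))
print(float(mean_of_means), float(population_mean))
```

The mean of the 21 sample means equals the population mean of the seven hourly earnings.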
Let us consider rolling a fair die an infinite number of times. We know that the possible
outcomes are 1, 2, 3, 4, 5, 6. The probability distribution of the random variable X is:
X 1 2 3 4 5 6
P(X) 1/6 1/6 1/6 1/6 1/6 1/6
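$$\mu = \sum x P(x) = 3.5$$
$$\sigma^2 = \sum (x - \mu)^2 P(x)$$
$$= (1 - 3.5)^2 \tfrac{1}{6} + (2 - 3.5)^2 \tfrac{1}{6} + (3 - 3.5)^2 \tfrac{1}{6} + (4 - 3.5)^2 \tfrac{1}{6} + (5 - 3.5)^2 \tfrac{1}{6} + (6 - 3.5)^2 \tfrac{1}{6} = 2.92$$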
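The population mean and variance of a single die can be checked with exact fractions. This is a small sketch with illustrative variable names.

```python
from fractions import Fraction

faces = range(1, 7)
p = Fraction(1, 6)  # each face of a fair die is equally likely

mu = sum(x * p for x in faces)               # population mean
var = sum((x - mu) ** 2 * p for x in faces)  # population variance

print(float(mu), round(float(var), 2))
```

The exact variance is 35/12, which rounds to the 2.92 used in the notes.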
[Figure: the distribution of x]
We can create the sampling distribution of the mean of two dice by drawing samples of
size 2 from the population. For each sample of two dice we add their scores and divide by
2, that is, take the average. We have constructed a new random variable $\bar{x}$.
Each sample mean can then be recorded. Using classical probability this will lead to the
following table:
Sample # Sample $\bar{x}$
1 1, 1 1.0
2 1, 2 1.5
3 1, 3 2.0
4 1, 4 2.5
5 1, 5 3.0
6 1, 6 3.5
7 2, 1 1.5
8 2, 2 2.0
9 2, 3 2.5
10 2, 4 3.0
11 2, 5 3.5
12 2, 6 4.0
13 3, 1 2.0
14 3, 2 2.5
15 3, 3 3.0
16 3, 4 3.5
17 3, 5 4.0
18 3, 6 4.5
19 4, 1 2.5
20 4, 2 3.0
21 4, 3 3.5
22 4, 4 4.0
23 4, 5 4.5
24 4, 6 5.0
25 5, 1 3.0
26 5, 2 3.5
27 5, 3 4.0
28 5, 4 4.5
29 5, 5 5.0
30 5, 6 5.5
31 6, 1 3.5
32 6, 2 4.0
33 6, 3 4.5
34 6, 4 5.0
35 6, 5 5.5
36 6, 6 6.0
There are 36 possible samples of size 2. Each sample outcome is equally likely and has a
probability of 1/36 of occurring. $\bar{x}$ can assume only 11 different possible values: 1.0,
1.5, ..., 6.0, with certain values of $\bar{x}$ occurring more frequently than others.
Sampling Distribution of $\bar{x}$
$\bar{x}$ P($\bar{x}$)
1.0 1/36
1.5 2/36
2.0 3/36
2.5 4/36
3.0 5/36
3.5 6/36
4.0 5/36
4.5 4/36
5.0 3/36
5.5 2/36
6.0 1/36
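The value $\bar{x}$ = 1.0 occurs only once, so its probability is 1/36. The value $\bar{x}$ = 3.5 occurs
6 times, so its probability is 6/36.
$$\mu_{\bar{x}} = \sum \bar{x} P(\bar{x}) = 1.0\left(\tfrac{1}{36}\right) + 1.5\left(\tfrac{2}{36}\right) + 2.0\left(\tfrac{3}{36}\right) + \dots + 5.0\left(\tfrac{3}{36}\right) + 5.5\left(\tfrac{2}{36}\right) + 6.0\left(\tfrac{1}{36}\right) = 3.5$$
$$\sigma_{\bar{x}}^2 = \sum (\bar{x} - \mu_{\bar{x}})^2 P(\bar{x}) = (1.0 - 3.5)^2\left(\tfrac{1}{36}\right) + (1.5 - 3.5)^2\left(\tfrac{2}{36}\right) + \dots + (6.0 - 3.5)^2\left(\tfrac{1}{36}\right) = 1.46$$
Note that the mean of 3.5 is the same as the mean of the population of tossing a die.
Further, note that the variance of the sampling distribution of $\bar{x}$, where n = 2, is 1.46, which
is exactly half the variance of the population of the toss of a die (2.92).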
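The table of 36 samples and the resulting mean and variance can be reproduced by enumeration. The following is a minimal sketch using exact fractions; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# all 36 ordered outcomes of two dice and the mean of each pair
outcomes = list(product(range(1, 7), repeat=2))
counts = Counter(Fraction(a + b, 2) for a, b in outcomes)
probs = {m: Fraction(c, len(outcomes)) for m, c in counts.items()}

mean_xbar = sum(m * p for m, p in probs.items())
var_xbar = sum((m - mean_xbar) ** 2 * p for m, p in probs.items())

for m in sorted(probs):
    print(float(m), probs[m])
print(float(mean_xbar), round(float(var_xbar), 2))
```

The exact variance is 35/24, which is half of the single-die variance of 35/12.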
[Figure: the distribution of $\bar{x}$]
Repeating the experiment with larger sample sizes n, the sampling distribution tends to
resemble a normal probability distribution.
For n = 25: mean of $\bar{x}$ = 3.5 and variance of $\bar{x}$ = 2.92/25.
For each value of n, the mean of the sampling distribution is exactly the same as the
population mean.
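That is: $\mu_{\bar{x}} = \mu$
and the standard deviation is $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$. This is known as the standard error of the mean.
Where: $\sigma$ is the population standard deviation and $n$ is the sample size.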
Also as the sample size n, increases, the sample means tend to cluster around the true
population mean.
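The relationship between sample size and standard error can be verified exactly for the die example. The sketch below builds the distribution of the mean of n dice by repeated convolution; the helper names `mean_distribution` and `mean_and_variance` are hypothetical.

```python
from fractions import Fraction
from math import sqrt

def mean_distribution(n):
    """Exact distribution of the mean of n fair dice, built by repeated convolution."""
    sums = {0: Fraction(1)}
    for _ in range(n):
        nxt = {}
        for total, prob in sums.items():
            for face in range(1, 7):
                nxt[total + face] = nxt.get(total + face, 0) + prob * Fraction(1, 6)
        sums = nxt
    return {Fraction(total, n): prob for total, prob in sums.items()}

def mean_and_variance(dist):
    mu = sum(x * p for x, p in dist.items())
    var = sum((x - mu) ** 2 * p for x, p in dist.items())
    return mu, var

results = {n: mean_and_variance(mean_distribution(n)) for n in (1, 2, 5, 25)}
for n, (mu, var) in results.items():
    # sample size, mean of x-bar, variance of x-bar, standard error
    print(n, float(mu), round(float(var), 4), round(sqrt(var), 4))
```

For every n the mean stays at 3.5 while the variance equals the population variance divided by n, so the sample means cluster more tightly as n grows.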
The sampling distribution of the mean of a random sample drawn from any population is
approximately normal for a sufficiently large sample size, typically taken to be at least 30
observations. The larger the sample size the more closely the sampling distribution will
resemble a normal distribution.
This means that as the sample size, n, gets larger, the sample means tend to follow a
normal probability distribution and tend to cluster around the true population mean. This
holds regardless of the distribution of the population from which the sample was drawn.
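A small simulation can illustrate this behaviour for a population that is clearly not normal. The sketch below assumes an exponential population with mean 1 and standard deviation 1; the choice of population, the seed, and the number of trials are all assumptions made for illustration.

```python
import random
from math import sqrt
from statistics import mean, stdev

rng = random.Random(42)  # fixed seed so the run is repeatable

def sample_means(n, trials=5000):
    """Means of `trials` random samples of size n from an exponential population."""
    return [sum(rng.expovariate(1.0) for _ in range(n)) / n for _ in range(trials)]

results = {}
for n in (2, 30):
    means = sample_means(n)
    results[n] = (mean(means), stdev(means))
    print(f"n={n}: mean of sample means={results[n][0]:.3f}, "
          f"SD of sample means={results[n][1]:.3f}, sigma/sqrt(n)={1 / sqrt(n):.3f}")
```

In both cases the sample means centre on the population mean of 1, and their spread is close to $\sigma / \sqrt{n}$, which is much smaller for n = 30 than for n = 2.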
In summary, regardless of the type of distribution from which one draws a random sample,
the sampling distribution of the mean will be approximately normal under certain conditions:
Example
Here we have an underlying normal distribution x, with mean = $\mu$ and
standard deviation = $\sigma$.
[Figure: normal distribution of x, with mean = $\mu$ and standard deviation = $\sigma$]
We will generate the $\bar{x}$ distribution (sample size n), with mean = $\mu$ and standard
deviation = $\frac{\sigma}{\sqrt{n}}$. This will also be a normal distribution, according to the central limit
theorem.
[Figure: normal distribution of $\bar{x}$, with mean = $\mu$ and standard deviation = $\frac{\sigma}{\sqrt{n}}$]
This rule holds true for any underlying distribution of x. So even if the underlying
distribution, x, was not normal, the distribution of $\bar{x}$ would still be approximately normal with mean =
$\mu$ and standard deviation = $\frac{\sigma}{\sqrt{n}}$. This is the result of the central limit theorem and we
must bear in mind that the sample size n must be at least 30.
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$ when the population standard deviation, $\sigma$, is KNOWN
$s_{\bar{x}} = \frac{s}{\sqrt{n}}$ when the population standard deviation, $\sigma$, is UNKNOWN
$\mu_{\bar{x}} = \mu$
$\mu_{\bar{x}}$ = the mean of the sample means
$\mu$ = the mean of the population
Here we take the average of the sample means and use that as an approximation to the
population mean. It is denoted by $\mu_{\bar{x}}$.
Since the sample means tend to follow a normal probability distribution – (we know this
by looking at the Central Limit Theorem) we can use the ideas discussed earlier to
compute the probability that a sample mean will fall within a certain range.
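When the population standard deviation and population mean are both KNOWN.
$$z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$
When the population standard deviation is UNKNOWN and the population mean is
KNOWN.
$$z = \frac{\bar{X} - \mu}{s / \sqrt{n}}$$
When the population standard deviation is KNOWN and the population mean is
UNKNOWN.
$$z = \frac{\bar{X} - \mu_{\bar{x}}}{\sigma / \sqrt{n}}$$
When the population standard deviation and population mean are both UNKNOWN.
$$z = \frac{\bar{X} - \mu_{\bar{x}}}{s / \sqrt{n}}$$
Where:
$\bar{X}$ = the sample mean
$\mu$ = the mean of the population
$\mu_{\bar{x}}$ = the mean of the sample means
$\sigma$ = the population standard deviation
$s$ = the sample standard deviation
$n$ = the sample size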
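All four cases share one shape: the sample mean minus a centre, divided by a spread over the square root of n. The sketch below captures that in a single hypothetical helper, `z_for_sample_mean`; the numbers in the usage lines are invented purely for illustration.

```python
from math import sqrt
from statistics import NormalDist

def z_for_sample_mean(x_bar, centre, spread, n):
    """z = (x_bar - centre) / (spread / sqrt(n)).

    centre: the population mean if known, otherwise the mean of the sample means
    spread: the population SD if known, otherwise the sample SD
    """
    return (x_bar - centre) / (spread / sqrt(n))

# Hypothetical numbers: population mean 50, SD 10, sample of 25, sample mean 52.
z = z_for_sample_mean(52, 50, 10, 25)
p_above = 1 - NormalDist().cdf(z)  # area to the right of z under the standard normal
print(z, round(p_above, 4))
```

Using the standard normal CDF here plays the same role as looking up z in the printed tables.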
Example
The foreman of a bottling plant has observed that the amount of soda in each 32-ounce
bottle is actually a normally distributed random variable, with mean of 32.2 ounces and a
standard deviation of 0.3 ounces.
a) If a customer buys one bottle, what is the probability that the bottle will contain more
than 32 ounces?
b) If a customer buys a carton of four bottles, what is the probability that the mean
amount of the four bottles will be greater than 32 ounces?
Solution a.
(The solution uses table B8. You may also use tables B81 & B82 and Excel to solve. The
methods to achieve this were covered at great length in the previous chapter.)
Let X be the random variable representing the amount of soda in one bottle.
It is normally distributed with mean = 32.2 and SD = 0.3
$$P(X > 32) = P\left(\frac{X - \text{mean}}{\text{SD}} > \frac{32 - 32.2}{0.3}\right) = P(Z > -0.67)$$
[Figure: standard normal distribution (mean = 0, SD = 1); we require the area where z is greater than -0.67]
$$P(Z > -0.67) = P(-0.67 < Z < 0) + 0.5 = 0.2486 + 0.5 = 0.7486$$
Solution b.
(The solution uses table B8. You may also use tables B81 & B82 or Excel to solve. The
methods to achieve this were covered at great length in the previous chapter.)
Let $\bar{X}$ be the random variable representing the average amount of soda in four bottles.
It is normally distributed with mean = 32.2 and SD = $\frac{0.3}{\sqrt{4}} = 0.15$.
$$P(\bar{X} > 32) = P\left(\frac{\bar{X} - \text{mean}}{\text{SD}} > \frac{32 - 32.2}{0.15}\right) = P(Z > -1.33)$$
[Figure: standard normal distribution (mean = 0, SD = 1); we require the area where z is greater than -1.33]
$$P(Z > -1.33) = P(-1.33 < Z < 0) + 0.5 = 0.4082 + 0.5 = 0.9082$$
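Both parts of the bottling example can be checked without tables. The sketch below uses Python's `statistics.NormalDist`; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 32.2, 0.3  # mean and SD of the amount in one bottle

# a) one bottle: X is normal with SD sigma
p_one = 1 - NormalDist(mu, sigma).cdf(32)

# b) mean of four bottles: the SD becomes the standard error sigma / sqrt(4)
p_four = 1 - NormalDist(mu, sigma / sqrt(4)).cdf(32)

print(round(p_one, 4), round(p_four, 4))
```

The results differ from the table answers only in the third decimal place, because the table method rounds z to two decimals before the lookup.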
Example
Scores on a real estate exam are normally distributed with mean 430 and standard deviation 20.
If we randomly select 50 exams, what is the probability that the sample mean of these 50
exams exceeds a score of 458?
(The solution uses table B8. You may also use tables B81/B82 and Excel to solve. The
methods to achieve this were covered at great length in the previous chapter.)
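The standard error is $\frac{20}{\sqrt{50}} = 2.828$, so $z = \frac{458 - 430}{2.828} = 9.899$.
[Figure: standard normal distribution (mean = 0, SD = 1); we require the area where z is greater than 9.899]
A z value this large lies far beyond the range of the tables, so the required area is effectively zero.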
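The z value for this example can be reproduced with a few lines of Python. This is a sketch with illustrative variable names.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 430, 20, 50
se = sigma / sqrt(n)         # standard error of the mean
z = (458 - mu) / se          # standardized value of the sample mean 458
p = 1 - NormalDist().cdf(z)  # area to the right of z

print(round(se, 3), round(z, 3), p)
```

The probability is so small that it is indistinguishable from zero in floating-point arithmetic.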
We may be interested in testing measures other than the sample mean. We may be
interested in measuring the percentage of people in the work force that would opt for
early retirement. Each person has two choices of either agreeing with early retirement or
not. This experiment follows a binomial probability distribution (there are three other
conditions, see end of chapter 3).
As we do not know the proportion of people in the population of the workforce that
would opt for early retirement we can take samples and calculate the approximate
population proportion.
If the samples are large enough we may use the normal distribution as an approximation
to the binomial.
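The conditions that must apply for this to be the case are:
$n \cdot p \geq 5$ and $n \cdot q \geq 5$, where:
$p$ = the proportion of successes in the population
$q$ = the proportion of failures ($q = 1 - p$)
$n$ = the size of the sample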
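The rule of thumb above is easy to check in code. In this sketch `p` stands for the population proportion and `n` for the sample size; the symbol names and the helper `normal_approx_ok` are assumptions for illustration.

```python
def normal_approx_ok(n, p, threshold=5):
    """True when both n*p and n*(1 - p) reach the threshold."""
    return n * p >= threshold and n * (1 - p) >= threshold

print(normal_approx_ok(150, 0.164))  # samples of 150 with a proportion of 0.164
print(normal_approx_ok(20, 0.1))     # hypothetical small sample: n*p is only 2
```

The first case passes because both products are well above 5, and the second fails because too few successes are expected.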
Taking a number of samples of workers and working out the sample proportion, $\bar{p}$, for
each is the first step to finding an approximation for the population proportion. We may
then take all of the sample proportions and calculate the average. This will give us an
estimate of the population proportion.
The standard deviation of such a distribution, known as the standard error of the
proportion, is denoted by $\sigma_{\bar{p}}$, where:
$$\sigma_{\bar{p}} = \sqrt{\frac{p(1 - p)}{n}}$$
Example
Suppose we take 10 sample groups of 150 people in each group, and record the number
of people in each group that agree with early retirement. The following are the results:
Averaging these all out gives an approximation for the population proportion:
$$p \approx \frac{0.173 + 0.12 + 0.14 + 0.2 + 0.16 + 0.14 + 0.107 + 0.187 + 0.233 + 0.18}{10} = 0.164$$
$$\sigma_{\bar{p}} = \sqrt{\frac{p(1 - p)}{n}} = \sqrt{\frac{0.164(1 - 0.164)}{150}} = \sqrt{0.000914} = 0.030$$
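The averaging and the standard error calculation can be reproduced as follows. This is a sketch using the ten sample proportions quoted above; the variable names are illustrative.

```python
from math import sqrt

# the ten sample proportions from the example, each from a group of 150 people
props = [0.173, 0.12, 0.14, 0.2, 0.16, 0.14, 0.107, 0.187, 0.233, 0.18]
n = 150

p_est = sum(props) / len(props)     # estimate of the population proportion
se = sqrt(p_est * (1 - p_est) / n)  # standard error of the proportion

print(round(p_est, 3), round(se, 3))
```

Rounded to three decimals these match the values 0.164 and 0.030 used in the example.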
Now we can answer such questions as, “what is the probability that 20% or less of the
workforce will agree with early retirement?” We already have the mean and standard
error and we know we can use the normal distribution to approximate the binomial.
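[Figure: sampling distribution of the proportion]
Standardizing gives
$$P(\bar{p} < 0.20) = P\left(z < \frac{0.20 - 0.164}{0.030}\right) = P(z < 1.20) = 0.8849$$
[Figure: sampling distribution of the proportion; the area to the left of z = 1.2 is 0.8849]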
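The final probability can be confirmed with the standard normal CDF. The sketch below uses the rounded mean and standard error from the example; the variable names are illustrative.

```python
from statistics import NormalDist

p_mean, se = 0.164, 0.030  # mean and standard error from the example (rounded)

z = (0.20 - p_mean) / se   # standardize the 20% cut-off
prob = NormalDist().cdf(z) # area to the left of z

print(round(z, 2), round(prob, 4))
```

Using the unrounded standard error would change the answer slightly, so the table value should be read as an approximation.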
STATS
Exercises 2 Normal Distribution – Sampling & Central Limit Theorem
1. A normal population has a mean of 60 and a standard deviation of 12. You select a
random sample of 9. Compute the probability the sample mean is:
2. A population of unknown shape has a mean of 75. You select a sample of 40. The
standard deviation of the sample is 5. Compute the probability the sample mean is:
3. The mean rent for a one-bedroom apartment in Southern California is $2,200 per
month. The distribution of the monthly rent does not follow the normal distribution. In
fact, it is positively skewed. What is the probability of selecting a sample of 50 one-
bedroom apartments and finding the mean to be at least $1,950 per month? The standard
deviation of the sample is $250.
4. According to an IRS study, it takes an average of 330 minutes for taxpayers to prepare,
copy, and electronically file a 1040 tax form. A consumer watchdog agency selects a
random sample of 40 taxpayers and finds the standard deviation of the time to prepare,
copy, and electronically file form 1040 is 80 minutes.
a. What assumption or assumptions do you need to make about the shape of the
population?
b. What is the standard error of the mean?
c. What is the likelihood the sample mean is greater than 320 minutes?
d. What is the likelihood the sample mean is between 320 and 350 minutes?
e. What is the likelihood the sample mean is greater than 350 minutes?