Chapter 6 – Sampling distributions
6.1 Definitions
A sampling distribution arises when repeated samples are drawn from a particular
population (distribution) and a statistic (a numerical measure computed from the sample
data) is calculated for each sample. Interest then focuses on the probability distribution
of the statistic, called its sampling distribution.
Sampling distributions arise in the context of statistical inference, i.e. when statements are
made about a population on the basis of random samples drawn from it.
Example
Suppose all possible samples of size 2 are drawn with replacement from the population
S = {2, 4, 6, 8} and the mean is calculated for each sample.
The different samples that can be obtained and their corresponding means are shown in the
table below.
1st value \ 2nd value    2    4    6    8
2                        2    3    4    5
4                        3    4    5    6
6                        4    5    6    7
8                        5    6    7    8
In the above table the row and column entries indicate the two values in the sample (16
possibilities when combining rows and columns). The mean is located in the cell
corresponding to these entries, e.g. 1st value = 4, 2nd value = 6 has a mean entry of
(4 + 6)/2 = 5.
Assuming that random sampling is used, all 16 samples in the above table are equally
likely. Under this assumption the following distribution can be constructed for the mean
values.
x̄              2      3      4      5      6      7      8      sum
count           1      2      3      4      3      2      1      16
P(X̄ = x̄)      1/16   1/8    3/16   1/4    3/16   1/8    1/16   1
The above distribution is referred to as the sampling distribution of the mean for random
samples of size 2 drawn from this distribution. For this sampling distribution the population
size N = 4 and the sample size n = 2.
The mean of the population from which these samples are drawn is µ = 5 and the variance is
σ² = [∑x² − (∑x)²/N] / N = (2² + 4² + 6² + 8² − 20²/4)/4 = 5.
The sampling distribution of the mean has mean μ_X̄ = 5 and variance
σ²_X̄ = ∑ x̄² P(X̄ = x̄) − µ² = 440/16 − 25 = 2.5 (verify this result).
Note that μ_X̄ = 5 = µ and that σ²_X̄ = σ²/2 = 5/2 = 2.5.
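These computations can be reproduced by brute-force enumeration; a minimal Python sketch using only the population {2, 4, 6, 8} from the example:

```python
from itertools import product
from collections import Counter
from fractions import Fraction

population = [2, 4, 6, 8]

# All 16 ordered samples of size 2, drawn with replacement.
samples = list(product(population, repeat=2))
means = [Fraction(a + b, 2) for a, b in samples]

# Sampling distribution of the mean: P(X-bar = x-bar) = count / 16.
counts = Counter(means)
dist = {x: Fraction(c, len(samples)) for x, c in counts.items()}

# Mean and variance of the sampling distribution.
mu_xbar = sum(x * p for x, p in dist.items())
var_xbar = sum(x * x * p for x, p in dist.items()) - mu_xbar ** 2

print(mu_xbar, var_xbar)  # 5 5/2
```

Exact fractions are used so the results match µ = 5 and σ²/2 = 5/2 without rounding error.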
Consider a population with mean µ and variance σ2. It can be shown that the mean and
variance of the sampling distribution of the mean, based on a random sample of size n, are
given by
μ_X̄ = μ and σ²_X̄ = σ²/n.
σ_X̄ = σ/√n is known as the standard error.
In the preceding example n = 2.
Sampling distributions can involve different statistics (e.g. sample mean, sample proportion,
sample variance) calculated from different sample sizes drawn from different distributions.
Some of the important results from statistical theory concerning sampling distributions are
summarized in the sections that follow.
6.2 The Central Limit Theorem
The following result is known as the Central Limit Theorem.
Let X1, X2, . . . , Xn be a random sample of size n drawn from a distribution with mean µ
and variance σ² (σ² should be finite). Then for sufficiently large n the mean
X̄ = (X1 + X2 + . . . + Xn)/n is approximately normally distributed with mean μ_X̄ = μ
and variance σ²_X̄ = σ²/n.
This result can be written as X̄ ~ N(µ, σ²/n) (approximately).
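The theorem can be illustrated by simulation; the sketch below uses an assumed Uniform(0, 1) population and an arbitrary sample size n = 50 (neither is from the text):

```python
import random
import statistics

random.seed(1)  # reproducible illustration

# Assumed population: Uniform(0, 1), so mu = 0.5 and sigma^2 = 1/12.
mu, var = 0.5, 1 / 12
n, reps = 50, 20000

# Draw many samples of size n and record each sample mean.
means = [statistics.fmean(random.random() for _ in range(n)) for _ in range(reps)]

# The sample means cluster around mu with variance close to sigma^2 / n.
print(statistics.fmean(means))     # close to mu = 0.5
print(statistics.variance(means))  # close to (1/12)/50
```

A histogram of `means` would show the familiar bell shape even though the underlying population is flat, which is the content of the theorem.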
Note:
1 The random variable Z = (X̄ − μ)/(σ/√n) ~ N(0, 1).
2 The value of n for which this theorem is valid depends on the distribution from which the
sample is drawn. If the sample is drawn from a normal population, the theorem is valid for all
n. If the distribution from which the sample is drawn is fairly close to being normal, a value
of n > 30 will suffice for the theorem to be valid. If the distribution from which the sample is
drawn is substantially different from a normal distribution e.g. positively or negatively
skewed, a value of n much larger than 30 will be needed for the theorem to be valid.
3 There are various versions of the central limit theorem. The only other central limit
theorem result that will be used here is the following one.
If the population from which the sample is drawn is a Bernoulli distribution (consists of only
values 0 or 1, with probability p of drawing a 1 and probability q = 1 − p of drawing a 0),
then S = X1 + X2 + . . . + Xn follows a binomial distribution with mean µ_S = np and
variance σ²_S = npq.
According to the central limit theorem, P̂ = S/n = (X1 + X2 + . . . + Xn)/n approximately
follows a normal distribution with mean µ(P̂) = µ_S/n = np/n = p and variance
σ²(P̂) = σ²_S/n² = npq/n² = pq/n when n is sufficiently large. P̂ is the proportion of 1's
in the sample and can be seen as an estimate of p, the proportion of 1's in the population
(the distribution from which the sample is drawn).
Using the central limit theorem, it follows that
Z = (P̂ − µ(P̂))/σ(P̂) = (P̂ − p)/√(pq/n) ~ N(0, 1).
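As an illustration of this Z (the fair-coin setup below is an assumed example, not from the text): for p = 0.5 and n = 100, the probability that the sample proportion exceeds 0.6 can be approximated as follows.

```python
import math

def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

p, n = 0.5, 100   # assumed example: fair coin, 100 flips
p_hat = 0.6

# Z = (P-hat - p) / sqrt(pq / n)
z = (p_hat - p) / math.sqrt(p * (1 - p) / n)
print(round(z, 2))           # 2.0
print(round(1 - phi(z), 4))  # P(P-hat > 0.6) ≈ 0.0228
```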
Example:
An electric firm manufactures light bulbs whose lifetime (in hours) follows a normal
distribution with mean 800 and variance 1600. A random sample of 10 light bulbs is drawn
and the lifetime recorded for each light bulb. Calculate the probability that the mean of this
sample
(a) differs from the actual mean lifetime of 800 by not more than 16 hours.
(b) differs from the actual mean lifetime of 800 by more than 16 hours.
(c) is greater than 820 hours.
(d) is less than 785 hours.
(a) P(−16 ≤ X̄ − 800 ≤ 16) = P(|X̄ − 800| ≤ 16) = P(|Z| ≤ 16/√(1600/10)) = P(|Z| ≤ 1.265)
= P(Z ≤ 1.265) − P(Z ≤ −1.265)
= 0.8971 − 0.1029 = 0.7942
(b) P(|X̄ − 800| > 16) = 1 − P(|X̄ − 800| ≤ 16) = 1 − 0.7942 = 0.2058
(c) P(X̄ > 820) = P(Z > (820 − 800)/√(1600/10)) = P(Z > 1.58) = 1 − 0.9429 = 0.0571
(d) P(X̄ < 785) = P(Z < (785 − 800)/√(1600/10)) = P(Z < −1.19) = 0.117
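The four probabilities can be reproduced numerically; a sketch using the standard normal CDF (small differences from the values above come from rounding z to table precision):

```python
import math

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mu, var, n = 800, 1600, 10
se = math.sqrt(var / n)            # standard error ≈ 12.649

a = phi(16 / se) - phi(-16 / se)   # (a) ≈ 0.794
b = 1 - a                          # (b) ≈ 0.206
c = 1 - phi((820 - mu) / se)       # (c) ≈ 0.057
d = phi((785 - mu) / se)           # (d) ≈ 0.118
print(round(a, 3), round(b, 3), round(c, 3), round(d, 3))
```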
6.3 The t-distribution (Student’s t-distribution)
The central limit theorem states that the statistic Z = (X̄ − μ)/(σ/√n) follows a standard
normal distribution. If σ is not known, it would be logical to replace σ (in the formula for Z)
by its sample estimate S. For small values of the sample size n, the statistic
t = (X̄ − μ)/(S/√n) does not follow a normal distribution. If it is assumed that sampling is
done from an approximately normal population, the statistic t follows a t-distribution.
This distribution changes with the degrees of freedom df = n − 1, i.e. for each value of the
degrees of freedom a different distribution is defined.
The t-distribution was first proposed in a 1908 paper by William Gosset, who wrote the
paper under the pseudonym "Student". The t-distribution has the following properties.
1. The Student t-distribution is symmetric and bell-shaped, but for smaller sample sizes it
shows increased variability when compared to the standard normal distribution (its curve
has a flatter appearance than that of the standard normal distribution). In other words, the
distribution is less peaked than a standard normal distribution and with thicker tails. As
the sample size increases, the distribution approaches a standard normal distribution. For
n > 30, the differences are fairly small.
2. The mean is zero (like the standard normal distribution).
3. The distribution is symmetrical about the mean.
4. The variance is greater than one, but approaches one from above as the sample size
increases (σ2=1 for the standard normal distribution).
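Property 4 can be made precise. For ν degrees of freedom the variance of the t-distribution is (a standard result, quoted here for reference):

```latex
\operatorname{Var}(t_\nu) = \frac{\nu}{\nu - 2}, \qquad \nu > 2 .
```

For example, ν = 3 gives variance 3, while ν = 30 gives 30/28 ≈ 1.07, consistent with the variance approaching 1 from above as the sample size increases.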
The graph below shows how the t-distribution changes for different values of ν (the degrees
of freedom).
Tables for the t-distribution
The layout of the t-tables is as follows.
ν = df \ α    0.90     0.95     ...    0.995
1             3.078    6.314           63.66
2             1.886    2.920           9.925
.
.
∞             1.282    1.645           2.576
The row entry is the degrees of freedom (df) and the column entry (α) the area under the t-
curve to the left of the value that appears in the table at the intersection of the row and
column entry.
When a t-value that has an area less than 0.5 to its left is to be looked up, the fact that the t-
distribution is symmetrical around 0 is used i.e.
P(t ≤ tα) = P(t ≤ -t1-α) = P(t ≥ t1-α) for α ≤ 0.5 (Using symmetry).
This means that tα = -t1-α .
Examples
1 For df = 2 and α = 0.995 the entry is 9.925. This means that for the t-distribution with 2
degrees of freedom
P(t ≤ 9.925) = 0.995.
2 For df = ∞ and α = 0.95 the entry is 1.645. This means that for the t-distribution with ∞
degrees of freedom
P(t ≤ 1.645) = 0.95. This is the same as P(Z ≤ 1.645) , where Z ~ N(0,1).
3 For df = ν = 10 and α = 0.10 the value of t0.10 such that P(t ≤ t0.10 ) = 0.10 is found from
t0.10 = -t1-0.10 = -t0.90 = -1.372.
Note that the percentile values in the last row of the t-distribution are identical to the
corresponding percentile entries in the standard normal table. Since the t-distribution for large
samples (degrees of freedom) is the same as the standard normal distribution, their percentiles
should be the same.
6.4 The chi-square (χ2) distribution
The chi-square distribution arises in a number of sampling situations. These include the ones
described below.
1 Drawing repeated samples of size n from an approximate normal distribution with
variance σ2 and calculating the variance (S2) for each sample. It can be shown that the
quantity
χ² = (n − 1)S²/σ²
follows a chi-square distribution with degrees of freedom = n − 1.
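This result can be illustrated by simulation; the normal population parameters and sample size below are arbitrary assumptions, not from the text:

```python
import random
import statistics

random.seed(2)  # reproducible illustration

mu, sigma = 10.0, 3.0   # assumed normal population parameters
n, reps = 5, 20000

chi2_values = []
for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    s2 = statistics.variance(sample)              # sample variance S^2
    chi2_values.append((n - 1) * s2 / sigma**2)   # (n-1)S^2 / sigma^2

# A chi-square(n-1) variable has mean n-1 = 4 and variance 2(n-1) = 8.
print(statistics.fmean(chi2_values))     # close to 4
print(statistics.variance(chi2_values))  # close to 8
```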
2 When comparing sequences of observed and expected frequencies as shown in the table
below. The observed frequencies (referring to the number of times values of some variable of
interest occur) are obtained from an experiment, while the expected ones arise from some
pattern believed to be true.
observed frequency f1 f2 .. fk
expected frequency e1 e2 .. ek
The quantity χ² = ∑ (fi − ei)²/ei (summed over i = 1, . . . , k) can be shown to follow a
chi-square distribution with k − 1 degrees of freedom. The purpose of calculating this χ² is
to make an assessment as to how well the observed and expected frequencies correspond.
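Computing the statistic is straightforward; the die-rolling counts below are made-up illustrative data, not from the text:

```python
# Hypothetical observed counts from 120 rolls of a die, compared with the
# expected count of 20 per face for a fair die.
observed = [18, 25, 16, 21, 24, 16]
expected = [20, 20, 20, 20, 20, 20]

# chi-square = sum of (f_i - e_i)^2 / e_i
chi2 = sum((f - e) ** 2 / e for f, e in zip(observed, expected))
print(round(chi2, 2))  # 3.9
```

A small value of the statistic (relative to the chi-square distribution with k − 1 = 5 degrees of freedom) indicates close correspondence between observed and expected frequencies.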
The chi-square curve is different for each value of degrees of freedom. The graph below
shows how the chi-square distribution changes for different values of ν (the degrees of
freedom).
Unlike the normal and t-distributions, the chi-square distribution is only defined for positive
values and is not a symmetrical distribution. As the degrees of freedom increase, the chi-
square distribution becomes more and more symmetrical. For a sufficiently large value of
the degrees of freedom the chi-square distribution approaches the normal distribution.
Tables for the chi-square distribution
The layout of the chi-square tables is as follows.
ν = df \ α    0.005      0.01       ...    0.99     0.995
1             0.000039   0.000157          6.63     7.88
2             0.010025   0.020101          9.21     10.60
.
30            13.79      14.95             50.89    53.67
The row entry is the degrees of freedom (df) and the column entry (α) the area under the chi-
square curve to the left of the value that appears in the table at the intersection of the row and
column entry.
Examples:
1 For df = 30 and α = 0.01 the entry is 14.95. This means that for the chi-square distribution
with 30 degrees of freedom
P(χ² ≤ 14.95) = 0.01.
2 For df = 30 and α = 0.995 the entry is 53.67. This means that for the chi-square distribution
with 30 degrees of freedom
P(χ² ≤ 53.67) = 0.995.
3 For df = 6 and α = 0.95 the entry is 12.59. This means that for the chi-square distribution
with 6 degrees of freedom
P(χ² ≤ 12.59) = 0.95 or P(χ² > 12.59) = 0.05.
This probability statement is illustrated in the next graph.
6.5 The F-distribution
Random samples of sizes n1 and n2 are drawn from normally distributed populations that are
labeled 1 and 2 respectively. Denote the variances calculated from these samples by S1² and
S2² respectively and their corresponding population variances by σ1² and σ2² respectively.
The ratio
F = (S1²/σ1²) / (S2²/σ2²)
is distributed according to an F-distribution (named after the famous statistician R.A. Fisher)
with degrees of freedom df1 = n1 − 1 (called the numerator degrees of freedom) and
df2 = n2 − 1 (called the denominator degrees of freedom). When σ1² = σ2² the F-ratio is
F = S1²/S2².
The F-distribution is positively skewed, and the F-values can only be positive. The graph
below shows plots for a number of F-distributions (F-curves) with σ1² = σ2². These plots are
referred to by F(df1, df2), e.g. F(33, 10) refers to an F-distribution with 33 degrees of
freedom associated with the numerator and 10 degrees of freedom associated with the
denominator. For each combination of df1 and df2 there is a different F-distribution. Three
other important distributions are closely related to the F-distribution: the square of a
standard normal variable follows an F(1, ∞) distribution, the square of a t(n2) variable
follows an F(1, n2) distribution, and a chi-square(n1) variable divided by n1 follows an
F(n1, ∞) distribution.
Tables for the F-distribution
The layout of the F-distribution tables with σ1² = σ2² is as follows.
df2 \ df1    1        2       ...    ∞
1            161.5    199.5          254.3
2            18.51    19.0           19.5
.
∞            3.85     3.0           1.01
The entry in the table corresponding to a pair of (df1, df2) values has an area of α under the
F(df1, df2) curve to its right.
Examples
1 F(3, 26) = 2.98 has an area (under the F(3, 26) curve) of α = 0.05 to its right (see graph
below).
2 F(4, 32) = 2.67 has an area (under the F(4, 32) curve) of α = 0.05 to its right (see graph
below).
For each different value of α a different F-table is used to read off a value that has an area of
α to its right, i.e. a percentage of 100(1 − α) to its left. The F-tables that are used and their α
and 100(1 − α) values are summarized in the table below.
α        Percentage point = 100(1 − α)
0.05     95%
0.025    97.5%
0.01     99%
The first entry in each row refers to the proportion of the area under the F-curve to the right
of the F-value read off, and the second entry to the percentage of the area under the F-curve
to the left of this F-value.
Examples:
1 For df1 = 7, df2 = 5 the value read from the 95% F-distribution table is 4.88. This means
that for this F-distribution 95% of the area under the F-curve is to the left of 4.88 (a
proportion of 0.05 to the right of 4.88).
P(F ≤ 4.88) = 0.95
P(F > 4.88) = 0.05
2 For df1 = 7, df2 = 5 the value read from the 97.5% F-distribution table is 6.85. This
means that for this F-distribution 97.5% of the area under the F-curve is to the left of 6.85 (a
proportion of 0.025 to the right of 6.85).
P(F ≤ 6.85) = 0.975
P(F > 6.85) = 0.025
3 For df1 = 10, df2 = 17 the value read from the 99% F-distribution table is 3.59. This means
that for this F-distribution 99% of the area under the F-curve is to the left of 3.59 (a
proportion of 0.01 to the right of 3.59).
P(F ≤ 3.59) = 0.99
P(F > 3.59) = 0.01
Lower tail values from the F-distribution
Only upper tail values (areas of 5%, 2.5% and 1% above) can be read off from the F-tables.
Lower tail values can be calculated from the formula
F(df1, df2; α) = 1 / F(df2, df1; 1 − α)
i.e.
F-value with an area α under the F-curve to its left
= 1 / (F-value with an area 1 − α under the F-curve to its left, with numerator and
denominator degrees of freedom interchanged).
Examples
1 Find the value such that 2.5% of the area under the F(7, 5) curve is to the left of it.
In the above formula df1 = 7, df2 = 5 and α = 0.025. Then
F(7, 5; 0.025) = 1/F(5, 7; 0.975) = 1/5.29 = 0.189.
2 Find the value such that 1% of the area under the F(10, 17) curve is to the left of it.
In the above formula df1 = 10, df2 = 17 and α = 0.01. Then
F(10, 17; 0.01) = 1/F(17, 10; 0.99) = 1/4.49 = 0.223.
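The reciprocal rule can be applied mechanically; the sketch below hard-codes the two upper-tail table values quoted in the examples:

```python
# Lower-tail F values via F(df1, df2; alpha) = 1 / F(df2, df1; 1 - alpha).
# The dictionary holds the upper-tail table entries quoted in the examples.
upper = {
    (5, 7, 0.975): 5.29,    # F(5, 7; 0.975) from the 97.5% table
    (17, 10, 0.99): 4.49,   # F(17, 10; 0.99) from the 99% table
}

def lower_tail_f(df1, df2, alpha):
    """F value with area alpha to its left, from the reciprocal rule."""
    return 1 / upper[(df2, df1, round(1 - alpha, 3))]

print(round(lower_tail_f(7, 5, 0.025), 3))   # 0.189
print(round(lower_tail_f(10, 17, 0.01), 3))  # 0.223
```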
6.6 Computer output
In Excel, values from the t, chi-square and F-distributions that have a given area under the
curve to the right of them can be found by using the TINV(area, df), CHIINV(area, df) and
FINV(area, df1, df2) functions respectively. (For TINV the given area is treated as a two-
tailed probability, split equally between the two tails.)
Examples
1 TINV(0.05, 15) = 2.13145. The area under the t(15) curve to the right of 2.13145 is 0.025
and to the left of -2.13145 is 0.025. Thus the total tail area is 0.05.
2 CHIINV(0.01, 14) = 29.14124. The area under the chi-square (14) curve to the right of
29.14124 is 0.01.
3 FINV(0.05,10,8) = 3.347163. The area under the F (10, 8) curve to the right of 3.347163
is 0.05.
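Assuming the SciPy library is available, the same lookups can be reproduced outside Excel with the inverse survival function `isf` (which returns the value with the given area to its right); note that, like TINV, the t case is two-tailed:

```python
from scipy import stats  # assumes SciPy is installed

# Excel TINV(0.05, 15): two-tailed, i.e. 0.025 in each tail.
t_val = stats.t.isf(0.05 / 2, 15)
# Excel CHIINV(0.01, 14): area 0.01 to the right.
chi2_val = stats.chi2.isf(0.01, 14)
# Excel FINV(0.05, 10, 8): area 0.05 to the right.
f_val = stats.f.isf(0.05, 10, 8)

print(round(t_val, 5), round(chi2_val, 5), round(f_val, 6))
```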