
Chi-Kong Ng, ENGG2780B, Dept. of SEEM, CUHK

Chapter 2. Sampling Distributions

2.1. Introduction
2.1.1. Populations and Samples
• A population consists of the totality of the observations with which
we are concerned.
• When the population is too large to study in its entirety, or the
techniques used in the study are destructive in nature, in either
case we must depend on a subset or “sample” of observations from
the population to help us make inferences concerning that same
population.

2.1.2. Random Sampling


• To eliminate any possibility of bias in the sampling procedure, it is
desirable to choose a random sample.
• If X1, X2, . . . , Xn are independent and identically distributed (IID)
random variables, we say that they constitute a random sample
of size n from the infinite population given by their common
probability distribution/density function f (x). Note that this
definition applies also to sampling with replacement from finite
populations.
• We say that the random variables X1, X2, . . . , Xn constitute a
random sample of size n from a finite population of size N if their
values are chosen so that each subset of n of the N elements of the
population has the same probability of being selected.

• Before a random sample of size n is selected, the observations are
modeled as the random variables X1, X2, . . . , Xn.
We also apply the term random sample to the set of observed values
x1, x2, . . . , xn of the random variables.
The lower case distinguishes the realization of a random sample
from the upper case which represents the random variables before
they are observed.

2.1.3. Statistics and Sampling Distributions


• Statistical inferences are usually based on statistics.
• A statistic is a random variable which is a function of a random
sample X1, X2, . . . , Xn.
• Distributions of statistics are referred to as sampling distributions.

2.2. The Sample Mean


• If X1, X2, . . . , Xn constitute a random sample, then the statistic
\[
\bar{X} = \frac{1}{n} \sum_{j=1}^{n} X_j
\]
is called the sample mean or the mean of the random sample.



Exercise 2.1. A random sample of size 9 yields the following
observations on the random variable X, the coal consumption in millions
of tons by electric utilities for a given year:
406, 395, 400, 450, 390, 410, 415, 401, 408.
Find the sample mean of these data.
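
The definition of the sample mean translates directly into code. The sketch below is only an illustration of the formula applied to the nine observations listed above; the names `coal` and `sample_mean` are chosen here and do not come from the slides.

```python
# Observations from Exercise 2.1 (coal consumption, millions of tons).
coal = [406, 395, 400, 450, 390, 410, 415, 401, 408]

def sample_mean(xs):
    """Return (1/n) * (x_1 + ... + x_n) for a non-empty list of numbers."""
    return sum(xs) / len(xs)

x_bar = sample_mean(coal)
print(f"n = {len(coal)}, sum = {sum(coal)}, sample mean = {x_bar:.3f}")
```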

2.2.1. Sampling from an Infinite Population


Theorem 2.1. If X̄ is the mean of a random sample of size n from
an infinite population with the mean µ and the variance σ², then
\[
E(\bar{X}) = \mu_{\bar{X}} = \mu
\quad \text{and} \quad
\operatorname{Var}(\bar{X}) = \sigma_{\bar{X}}^2 = \frac{\sigma^2}{n},
\]
and \(\sigma_{\bar{X}}\) (the positive square root of \(\sigma_{\bar{X}}^2\)) is called the
standard error of the mean.
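
The slides state this theorem without proof. A short derivation, sketched here under the theorem's own assumptions, uses only the linearity of expectation and the fact that the variance of a sum of independent random variables is the sum of their variances:

\[
E(\bar{X}) = \frac{1}{n} \sum_{j=1}^{n} E(X_j) = \frac{1}{n} \cdot n\mu = \mu,
\qquad
\operatorname{Var}(\bar{X}) = \frac{1}{n^2} \sum_{j=1}^{n} \operatorname{Var}(X_j) = \frac{1}{n^2} \cdot n\sigma^2 = \frac{\sigma^2}{n}.
\]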

Theorem 2.2 (Chebyshev’s Theorem for Sampling Distribution). If X̄
is the mean of a random sample of size n from an infinite population
with the mean µ and the variance σ², then for any positive constant c,
\[
P(|\bar{X} - \mu| \ge c) \le \frac{\sigma^2}{nc^2},
\]
or equivalently,
\[
P(|\bar{X} - \mu| < c) \ge 1 - \frac{\sigma^2}{nc^2}.
\]
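
The bound follows in one step from Chebyshev's inequality applied to X̄, together with Var(X̄) = σ²/n from Theorem 2.1. The numbers in the second part (σ = 30, n = 36, c = 10) are illustrative values chosen here, not taken from the slides:

\[
P(|\bar{X} - \mu| \ge c) \le \frac{\operatorname{Var}(\bar{X})}{c^2} = \frac{\sigma^2}{nc^2},
\qquad \text{e.g.} \qquad
\frac{30^2}{36 \cdot 10^2} = 0.25.
\]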

Theorem 2.3 (The Law of Large Numbers). If X̄ is the mean of a
random sample of size n from an infinite population with the mean µ
and the variance σ², then for any positive constant c,
\[
P(|\bar{X} - \mu| \ge c) \to 0 \quad \text{as } n \to \infty.
\]

Theorem 2.4 (Central Limit Theorem). If X̄ is the mean of a random
sample of size n from an infinite population with the mean µ and the
variance σ², then the statistic (called the standardized sample mean)
\[
Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}
\]
is a random variable whose distribution approaches the standard
normal distribution as n → ∞.

Remark: Although the Central Limit Theorem will work well for small
samples in most cases, particularly where the population is continuous,
unimodal, and symmetric, larger samples (depending on the shape of
the population) will be required in other situations. In many cases
of practical interest, if n ≥ 30, the normal approximation will be
satisfactory regardless of the shape of the population.
[Figure: An illustration of the approach toward normality for the
sampling distribution of X̄ as sample size increases.]
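
The approach toward normality can also be checked by simulation. The sketch below is an assumption-laden illustration rather than anything from the slides: it draws samples from an exponential population with µ = σ = 1 (a skewed population chosen here), standardizes each sample mean as in Theorem 2.4, and compares the observed fraction of values with Z ≤ 1 against Φ(1). The function name, seed, and replication count are arbitrary choices.

```python
import random
from statistics import NormalDist, fmean

def standardized_means(n, reps, rng):
    """Draw `reps` samples of size n from an Exp(1) population
    (mu = sigma = 1) and return (x_bar - mu) / (sigma / sqrt(n)) for each."""
    mu, sigma = 1.0, 1.0
    out = []
    for _ in range(reps):
        x_bar = fmean(rng.expovariate(1.0) for _ in range(n))
        out.append((x_bar - mu) / (sigma / n ** 0.5))
    return out

rng = random.Random(2780)          # fixed seed so the run is repeatable
target = NormalDist().cdf(1.0)     # Phi(1) for the standard normal
for n in (2, 10, 30, 100):
    z = standardized_means(n, 5000, rng)
    frac = sum(v <= 1.0 for v in z) / len(z)
    print(f"n = {n:3d}: observed P(Z <= 1) = {frac:.3f}, normal value = {target:.3f}")
```

As n grows, the observed fraction should settle near the normal value, which is the behaviour the remark above describes.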

Example 2.2. If a 1-gallon can of paint covers on the average
513.3 square feet with a standard deviation of 30 square feet, what
is the probability that the sample mean area covered by a sample of 36
of these 1-gallon cans will be anywhere from 510 to 520 square feet?
Solution:
\[
\begin{aligned}
P(510 < \bar{X} < 520)
&= P\left( \frac{510 - 513.3}{30/\sqrt{36}} < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < \frac{520 - 513.3}{30/\sqrt{36}} \right) \\
&= P(-0.66 < Z < 1.34) \\
&\approx \Phi(1.34) - \Phi(-0.66) \\
&\approx 0.9099 - 0.2546 \\
&= 0.6553.
\end{aligned}
\]
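
The same calculation can be reproduced without tables. This is a minimal sketch using Python's standard-library `statistics.NormalDist` in place of the Φ table; the helper name `prob_mean_between` is hypothetical. Because it does not round z to two decimals, the result may differ from the table-based answer in the fourth decimal place.

```python
from math import sqrt
from statistics import NormalDist

def prob_mean_between(lo, hi, mu, sigma, n):
    """Normal (CLT) approximation to P(lo < X_bar < hi)."""
    se = sigma / sqrt(n)           # standard error of the mean
    x_bar = NormalDist(mu, se)     # approximate sampling distribution of X_bar
    return x_bar.cdf(hi) - x_bar.cdf(lo)

p = prob_mean_between(510, 520, mu=513.3, sigma=30, n=36)
print(f"P(510 < X_bar < 520) is approximately {p:.4f}")
```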


Exercise 2.3. The lifetime of a special type of battery is a random
variable with mean 40 hours and standard deviation 20 hours.
A battery is used until it fails, at which point it is replaced by a new
one.
Assuming a stockpile of 36 such batteries the lifetimes of which are
independent, approximate the probability that over 1560 hours of use
can be obtained.

2.2.2. Sampling from a Finite Population


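Theorem 2.5. If X̄ is the mean of a random sample of size n from a
finite population of size N with the mean µ and the variance σ², then
\[
E(\bar{X}) = \mu_{\bar{X}} = \mu
\quad \text{and} \quad
\operatorname{Var}(\bar{X}) = \sigma_{\bar{X}}^2 = \frac{\sigma^2}{n} \cdot \frac{N - n}{N - 1}.
\]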

• It is of interest to note that the formulas we obtained for Var(X̄) of
the random sample of size n from an infinite population and that
from a finite population of size N differ only by the finite population
correction factor
\[
\frac{N - n}{N - 1}.
\]

• Indeed, when N is large compared to n, then
\[
\frac{N - n}{N - 1} \to 1
\]
and thus
\[
\operatorname{Var}(\bar{X}) \to \frac{\sigma^2}{n}.
\]
A general rule of thumb is to use this approximation when
n ≤ N/20.
• In practice, we often deal with random samples from populations
that are finite, but large enough to be treated as if they were infinite.
Thus, most statistical theory and most of the methods we shall
discuss apply to samples from infinite populations.
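
To see how little the correction factor matters at the n ≤ N/20 threshold, the two variance formulas can be compared numerically. The sketch below is illustrative only: the function name `var_of_mean` and the values σ² = 900, n = 50, N = 1000 are assumptions made here, not figures from the slides.

```python
def var_of_mean(sigma2, n, N=None):
    """Var(X_bar): sigma^2 / n for an infinite population; multiplied by
    the correction factor (N - n) / (N - 1) for a finite population of size N."""
    v = sigma2 / n
    if N is not None:
        if N < 2 or not 1 <= n <= N:
            raise ValueError("need N >= 2 and 1 <= n <= N")
        v *= (N - n) / (N - 1)
    return v

sigma2, n, N = 900.0, 50, 1000      # n = N/20, the rule-of-thumb boundary
infinite = var_of_mean(sigma2, n)
finite = var_of_mean(sigma2, n, N)
print(f"infinite: {infinite:.3f}, finite: {finite:.3f}, ratio: {finite / infinite:.4f}")
```

At this boundary the ratio of the two variances is (N − n)/(N − 1) = 950/999, so ignoring the correction overstates the variance by about 5%.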

2.3. The Sample Variance


• Suppose that X1, X2, . . . , Xn constitute a random sample of size n.
Let X̄ be the sample mean. The statistic
\[
S^2 = \frac{1}{n - 1} \sum_{j=1}^{n} (X_j - \bar{X})^2
\]
is called the sample variance (or the variance of the random sample).
• The statistic S (the positive square root of S²) is called the
sample standard deviation (or the standard deviation of the random
sample).

Theorem 2.6. Suppose that X1, X2, . . . , Xn constitute a random
sample of size n from an infinite population which has the mean µ
and the variance σ². If S² is the sample variance, then
\[
E(S^2) = \sigma^2.
\]

Theorem 2.7. Suppose that X1, X2, . . . , Xn constitute a random
sample of size n from an infinite population which has the mean µ
and the variance σ². If S² is the sample variance, then
\[
S^2 = \frac{1}{n - 1} \left( \sum_{j=1}^{n} X_j^2 - n\bar{X}^2 \right).
\]
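
This computing formula is an algebraic identity. A brief sketch of the step the slides leave out: expand the square in the definition of S² and use the fact that the sum of the X_j equals nX̄.

\[
\sum_{j=1}^{n} (X_j - \bar{X})^2
= \sum_{j=1}^{n} X_j^2 - 2\bar{X} \sum_{j=1}^{n} X_j + n\bar{X}^2
= \sum_{j=1}^{n} X_j^2 - 2n\bar{X}^2 + n\bar{X}^2
= \sum_{j=1}^{n} X_j^2 - n\bar{X}^2.
\]

Dividing both sides by n − 1 gives the stated expression for S².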

Exercise 2.4. A random sample of size 9 yields the following
observations on the random variable X, the coal consumption in millions
of tons by electric utilities for a given year:
406, 395, 400, 450, 390, 410, 415, 401, 408.
Using the result of Exercise 2.1, find the sample standard
deviation of these data.
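
Both forms of the sample variance (the definition and the computing formula of Theorem 2.7) can be evaluated on these nine observations and compared. This is a sketch for checking hand calculations; the variable names are chosen here, and `statistics.stdev` from the standard library is used only as an independent cross-check.

```python
from math import sqrt
from statistics import stdev

coal = [406, 395, 400, 450, 390, 410, 415, 401, 408]
n = len(coal)
x_bar = sum(coal) / n

# Definition: sum of squared deviations from the mean, divided by n - 1.
s2_definition = sum((x - x_bar) ** 2 for x in coal) / (n - 1)

# Computing formula (Theorem 2.7): (sum of squares - n * x_bar^2) / (n - 1).
s2_shortcut = (sum(x * x for x in coal) - n * x_bar ** 2) / (n - 1)

s = sqrt(s2_definition)
print(f"S^2 = {s2_definition:.2f} (definition), {s2_shortcut:.2f} (shortcut)")
print(f"S   = {s:.3f}, statistics.stdev gives {stdev(coal):.3f}")
```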

2.4. Sampling Distributions from a Normal Population
2.4.1. Independence
Theorem 2.8. If X̄ and S² are the mean and the variance of a random
sample from a normal population, then X̄ and S² are independent.

2.4.2. The Sampling Distribution of the Mean


Theorem 2.9. If X̄ is the mean of a random sample of size n from
the normal population with the mean µ and the variance σ², then the
standardized sample mean
\[
Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}
\]
has the standard normal distribution, no matter how small the size
of the sample.

Exercise 2.5. An electrical firm manufactures light bulbs that have a
length of life that is approximately normally distributed, with mean
equal to 800 hours and a standard deviation of 40 hours.
Find the probability that a random sample of 16 bulbs will have an
average life of less than 775 hours.

2.4.3. The t Distribution and its Applications in Sampling
The t Distribution
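• A random variable T has the t distribution (also called the Student-t
distribution or the Student’s-t distribution) with ν degrees of
freedom, and it is referred to as a t random variable, if and only if
its probability density function is
\[
f(t) = \frac{1}{\sqrt{\pi\nu}} \cdot \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right)} \cdot \left(1 + \frac{t^2}{\nu}\right)^{-(\nu+1)/2}
\quad \text{for } -\infty < t < \infty,
\]
where ν is a positive integer and
\[
\Gamma(\alpha) = \int_0^\infty y^{\alpha-1} e^{-y} \, dy \quad \text{for } \alpha > 0.
\]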

• Since the t distribution arises in many important applications,
integrals of its probability density function have been extensively
tabulated.
• Let \(t_{\alpha,\nu}\) be such that the area to its right under the curve of the
t distribution with ν degrees of freedom is equal to α. That is, \(t_{\alpha,\nu}\)
is such that \(P(T \ge t_{\alpha,\nu}) = \alpha\).

[Figure: The \(t_{\alpha,\nu}\) notation.]



• Since the probability density function of the t distribution is
symmetric about t = 0,
\[
-t_{\alpha,\nu} = t_{1-\alpha,\nu}
\quad \text{and} \quad
P(T < -t_{\alpha,\nu}) = \alpha.
\]

Example 2.6.
\[
t_{0.05,10} = 1.812 \quad \text{and} \quad t_{0.95,10} = -1.812.
\]
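
Tabulated values such as these can be checked by integrating the t density numerically. The sketch below is one possible approach, not the method used to build the tables: it codes the density from the definition of the t distribution and applies the composite Simpson's rule on [0, t], using symmetry about 0 to get the right-tail area. The function names and the number of subintervals are choices made here.

```python
from math import gamma, pi, sqrt

def t_pdf(t, nu):
    """Density of the t distribution with nu degrees of freedom."""
    coef = gamma((nu + 1) / 2) / (sqrt(pi * nu) * gamma(nu / 2))
    return coef * (1 + t * t / nu) ** (-(nu + 1) / 2)

def simpson(f, a, b, m=2000):
    """Composite Simpson's rule on [a, b] with m (even) subintervals."""
    h = (b - a) / m
    total = f(a) + f(b)
    for i in range(1, m):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def t_upper_tail(t, nu):
    """P(T >= t): by symmetry, 0.5 minus the signed area between 0 and t."""
    return 0.5 - simpson(lambda u: t_pdf(u, nu), 0.0, t)

print(f"P(T >=  1.812), nu = 10: {t_upper_tail(1.812, 10):.4f}")
print(f"P(T >= -1.812), nu = 10: {t_upper_tail(-1.812, 10):.4f}")
```

The two areas should come out close to 0.05 and 0.95, consistent with the symmetry relation stated above.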


Example 2.7. Find \(P(-t_{0.025} < T < t_{0.05})\), for any degrees of
freedom ν.
Solution: For any degrees of freedom ν, we have
\[
\begin{aligned}
P(T > -t_{0.025}) &= P(T > t_{1-0.025}) = 1 - 0.025 = 0.975, \\
P(T > t_{0.05}) &= 0.05.
\end{aligned}
\]
Thus
\[
P(-t_{0.025} < T < t_{0.05}) = P(T > -t_{0.025}) - P(T > t_{0.05}) = 0.975 - 0.05 = 0.925.
\]


Applications of the t Distribution in Sampling


Theorem 2.10. If X̄ and S are the mean and the standard deviation
of a random sample of size n from a normal population with the mean µ
(but unknown variance), then the statistic
\[
T = \frac{\bar{X} - \mu}{S/\sqrt{n}}
\]
has the t distribution with (n − 1) degrees of freedom.

Theorem 2.11. The t distribution with ν degrees of freedom
approaches the standard normal distribution as ν → ∞.

Remark: The standard normal distribution provides a good
approximation to the t distribution for samples of size n ≥ 30.

Example 2.8. In 16 one-hour test runs, the gasoline consumption of an
engine averaged 16.4 liters with a standard deviation of 2.1 liters.
Assume that the distribution of gasoline consumption is approximately
normal. Test the claim that the average gasoline consumption
of this engine is 12.0 liters per hour.

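Solution: Degrees of freedom = n − 1 = 15.
Now, the value of the statistic is
\[
t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{16.4 - 12.0}{2.1/\sqrt{16}} = 8.38.
\]
Since \(t_{0.005,15} = 2.947\), that is,
\[
P(T > 2.947) = 0.005,
\]
and 8.38 is far larger than 2.947, it follows that
\[
P(T > 8.38) \approx 0.
\]
Such a large value of t would be extremely unlikely if the claim were
true. Therefore, it would seem reasonable to conclude that the true average
hourly gasoline consumption of the engine exceeds 12.0 liters. ✷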
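
The arithmetic of this solution is easy to script. The sketch below computes the t statistic of Theorem 2.10 from the summary figures in Example 2.8 and compares it with the table value 2.947 quoted in the solution; the function name `t_statistic` is a label chosen here.

```python
from math import sqrt

def t_statistic(x_bar, mu0, s, n):
    """(x_bar - mu0) / (s / sqrt(n)), referred to t with n - 1 degrees of freedom."""
    return (x_bar - mu0) / (s / sqrt(n))

n = 16
t = t_statistic(x_bar=16.4, mu0=12.0, s=2.1, n=n)
t_crit = 2.947   # t_{0.005,15}, the table value quoted in the solution
print(f"df = {n - 1}, t = {t:.2f}, exceeds t_0.005,15: {t > t_crit}")
```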

Exercise 2.9. A chemical engineer claims that the population mean
yield of a certain batch process is 500 grams per milliliter of raw
material.
To check this claim he samples 25 batches each month.
If the computed t-value falls between −t0.025 and t0.025, he is satisfied
with his claim.
Assuming the distribution of yields to be approximately normal, what
conclusion should he draw from a sample that has a mean 518 grams
per milliliter and a sample standard deviation 40 grams?

2.4.4. The Chi-Square Distribution and its Applications in Sampling
The Chi-Square Distribution
• A random variable χ² has a chi-square distribution, and it is referred
to as a chi-square random variable, if and only if its probability
density function is
\[
f(\chi^2) = \text{chi-square}(\chi^2; \nu) =
\begin{cases}
\dfrac{1}{2^{\nu/2} \cdot \Gamma(\nu/2)} \, \chi^{\nu-2} e^{-\chi^2/2}, & \text{for } \chi^2 > 0, \\[1ex]
0, & \text{elsewhere,}
\end{cases}
\]
where ν is a positive integer.
• The parameter ν is referred to as the number of degrees of freedom,
or simply the degrees of freedom.

• Since the chi-square distribution arises in many important
applications, integrals of its probability density function have been
extensively tabulated.
• Let \(\chi^2_{\alpha,\nu}\) be such that the area to its right under the curve of the
chi-square distribution with ν degrees of freedom is equal to α.
That is, \(\chi^2_{\alpha,\nu}\) is such that \(P(\chi^2 \ge \chi^2_{\alpha,\nu}) = \alpha\).

[Figure: The \(\chi^2_{\alpha,\nu}\) notation.]



Example 2.10.
\[
\chi^2_{0.05,10} = 18.307 \quad \text{and} \quad \chi^2_{0.95,10} = 3.940.
\]
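
These two table values can be checked by integrating the chi-square density numerically. The sketch below is an illustration under stated assumptions: it writes the density in terms of x = χ² (so the power term is x^(ν/2 − 1)), uses the composite Simpson's rule on [0, x], and assumes ν ≥ 3 so that the density is zero at the origin and the rule behaves well. The function names are chosen here.

```python
from math import exp, gamma

def chi2_pdf(x, nu):
    """Chi-square density with nu degrees of freedom, written in x = chi^2."""
    if x <= 0:
        return 0.0
    return x ** (nu / 2 - 1) * exp(-x / 2) / (2 ** (nu / 2) * gamma(nu / 2))

def simpson(f, a, b, m=2000):
    """Composite Simpson's rule on [a, b] with m (even) subintervals."""
    h = (b - a) / m
    total = f(a) + f(b)
    for i in range(1, m):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def chi2_upper_tail(x, nu):
    """P(chi^2 >= x) = 1 - area under the density on [0, x] (assumes nu >= 3)."""
    return 1.0 - simpson(lambda u: chi2_pdf(u, nu), 0.0, x)

print(f"P(chi^2 >= 18.307), nu = 10: {chi2_upper_tail(18.307, 10):.4f}")
print(f"P(chi^2 >=  3.940), nu = 10: {chi2_upper_tail(3.940, 10):.4f}")
```

Unlike the t distribution, the chi-square distribution is not symmetric, which is why the two critical values are not mirror images of each other.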


Applications of the Chi-Square Distribution in Sampling


Theorem 2.12. If S² is the variance of a random sample of size n
from a normal population with the variance σ², then the statistic
\[
\chi^2 = \frac{(n - 1)S^2}{\sigma^2}
\]
has the chi-square distribution with (n − 1) degrees of freedom.

Exercise 2.11. An optical firm purchases glass to be ground into lenses,
and it is known from past experience that the variance of the refractive
index of this kind of glass is 1.26 × 10⁻⁴.
As it is important that the various pieces of glass have nearly the
same index of refraction, the firm rejects such a shipment if the sample
variance of 20 pieces selected at random exceeds 2.00 × 10⁻⁴.
Assuming that the sample values may be looked upon as a random
sample from a normal population, what is the probability that a
shipment will be rejected even though σ² = 1.26 × 10⁻⁴?

2.4.5. The F Distribution and its Applications in Sampling
The F Distribution
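• A random variable X has an F distribution, and it is referred to as
an F random variable, if and only if its probability density function
is
\[
f(x) =
\begin{cases}
\dfrac{\Gamma\left(\frac{\nu_1 + \nu_2}{2}\right) \left(\frac{\nu_1}{\nu_2}\right)^{\nu_1/2} x^{\nu_1/2 - 1}}
      {\Gamma\left(\frac{\nu_1}{2}\right) \Gamma\left(\frac{\nu_2}{2}\right) \left(\frac{\nu_1}{\nu_2} x + 1\right)^{(\nu_1 + \nu_2)/2}}, & \text{for } x > 0, \\[2ex]
0, & \text{elsewhere,}
\end{cases}
\]
where ν1 and ν2 are positive integers.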



• The parameters ν1 and ν2 are referred to as the degrees of freedom for
the numerator and the denominator, respectively.
• In view of its importance, the F distribution has been tabulated
extensively.
• Let \(F_{\alpha,\nu_1,\nu_2}\) be such that the area to its right under the curve of
the F distribution with ν1 and ν2 degrees of freedom is equal to α.
That is, \(F_{\alpha,\nu_1,\nu_2}\) is such that \(P(F \ge F_{\alpha,\nu_1,\nu_2}) = \alpha\).

[Figure: The \(F_{\alpha,\nu_1,\nu_2}\) notation.]



• \(F_{1-\alpha,\nu_1,\nu_2} = 1/F_{\alpha,\nu_2,\nu_1}\)

Example 2.12. Find the value of \(F_{0.95}\) for ν1 = 10 and ν2 = 20 degrees
of freedom.
Solution:
\[
F_{0.95,10,20} = \frac{1}{F_{0.05,20,10}} \approx \frac{1}{2.77} \approx 0.361.
\]
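
The reciprocal relation can be checked numerically by integrating the F density. The sketch below is a rough illustration, not a table-quality routine: it codes the density from the definition of the F distribution, integrates it with the composite Simpson's rule, and assumes ν1 ≥ 3 so that the density vanishes at the origin. The function names are chosen here, and 2.77 is the rounded table value used in the solution.

```python
from math import gamma

def f_pdf(x, nu1, nu2):
    """Density of the F distribution with nu1 and nu2 degrees of freedom."""
    if x <= 0:
        return 0.0
    coef = gamma((nu1 + nu2) / 2) / (gamma(nu1 / 2) * gamma(nu2 / 2))
    coef *= (nu1 / nu2) ** (nu1 / 2)
    return coef * x ** (nu1 / 2 - 1) / (nu1 * x / nu2 + 1) ** ((nu1 + nu2) / 2)

def simpson(f, a, b, m=2000):
    """Composite Simpson's rule on [a, b] with m (even) subintervals."""
    h = (b - a) / m
    total = f(a) + f(b)
    for i in range(1, m):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def f_upper_tail(x, nu1, nu2):
    """P(F >= x) = 1 - area under the density on [0, x] (assumes nu1 >= 3)."""
    return 1.0 - simpson(lambda u: f_pdf(u, nu1, nu2), 0.0, x)

a = f_upper_tail(2.77, 20, 10)        # should be close to 0.05
b = f_upper_tail(1 / 2.77, 10, 20)    # should be close to 0.95
print(f"P(F(20,10) >= 2.77)   = {a:.4f}")
print(f"P(F(10,20) >= 1/2.77) = {b:.4f}")
```

The two tail areas should add up to 1 up to integration error, which is exactly what the reciprocal relation asserts.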


Applications of the F Distribution in Sampling


Theorem 2.13. If \(S_1^2\) and \(S_2^2\) are the variances of independent
random samples of size n1 and n2, respectively, taken from two normal
populations with the variances \(\sigma_1^2\) and \(\sigma_2^2\), respectively, then the statistic
\[
F = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2}
\]
has the F distribution with (n1 − 1) and (n2 − 1) degrees of freedom.
