
Econ6034 - Econometrics and Business Statistics

Topic 5: Sampling Distributions and Central Limit Theorem

(Ref: Keller Chapters 9, 10)

1 / 64
Sampling Distribution of the Sample Mean

I Let a random variable X ∼ (µ, σ²), and let X1, X2, . . . , Xn be n
  randomly selected values from the population distribution of X.

I The set of these n values is called a random sample of size n, and
  the sample mean is defined as X̄ = Σⁿᵢ₌₁ Xᵢ/n.

I An estimator is a formula that is used to estimate an unknown
  constant based on the values in a random sample.

  For example, the sample mean X̄ denotes the specific formula
  Σⁿᵢ₌₁ Xᵢ/n, and it is used to estimate E(X) = µ. So it is an
  estimator of µ.

2 / 64
Sampling Distribution of the Sample Mean

I The value of an estimator is obtained by substituting the


observations included in a specific sample into the formula.
The value obtained (realized) is called an estimate.
For example, if a random sample of size 3 consists of 3, 1, and
5, the sample mean is 3, and the 3 is an estimate of µ.
Because a sample is a subset (rather than the whole) of the
population, the sample mean from one sample will usually differ
from the sample mean from another, even if the samples are drawn
from the same population and the sample sizes are the same.

I Due to such sampling variation, an estimator is a variable that
  has a probability distribution. The probability distribution of an
  estimator is called the sampling distribution of the estimator.

3 / 64
Sampling distribution of the sample mean

If X∼ N (µ, σ 2 ), then the sampling distribution of the sample


mean based on random samples of size n is given by

X̄ ∼ N (µ, σ 2 /n).

I The sample mean can be written as

  X̄ = (1/n)(X1 + X2 + · · · + Xn)

I So, X̄ is a linear combination of the Xi, which are independently
  and identically distributed (i.i.d.) as N(µ, σ²). So, X̄ is also
  normally distributed.

4 / 64
Sampling distribution of the sample mean

E(X̄) = (1/n) E(X1 + X2 + . . . + Xn) = (1/n)nµ = µ

Var(X̄) = (1/n²) Var(X1 + X2 + . . . + Xn) = (1/n²)nσ² = σ²/n

If X̄ ∼ N(µ, σ²/n), then it can be standardized as

  (X̄ − µ)/√(σ²/n) = (X̄ − µ)/(σ/√n) = Z ∼ N(0, 1)

5 / 64
Example

Example:
Assume that annual percentage revenue increases for all Australian
companies are normally distributed with mean 12.2% and standard
deviation 3.6%. A random sample of 9 observations from this
population is taken. What is the probability that the sample mean
will be less than 10%?

Let X be the random variable representing annual percentage


revenue increases of Australian companies.

X ∼ N(12.2, 3.62 ).

The sampling distribution of X̄ is then given by

X̄ ∼ N(12.2, 3.62 /9).

6 / 64
Example cont.

We calculate the required probability using the standardized variable


and the cumulative Standard Normal Distribution:
 
  P(X̄ < 10) = P(Z < (10 − 12.2)/(3.6/√9))
            = P(Z < −1.83)
            = 0.0336

We conclude that the probability of the sample mean being less


than 10% is 0.0336.
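As a cross-check, the probability can be reproduced numerically. The sketch below uses only Python's standard library (statistics.NormalDist); the tiny difference from 0.0336 arises because the slide rounds z to −1.83 before consulting the table.

```python
from statistics import NormalDist

mu, sigma, n = 12.2, 3.6, 9            # population mean, sd, sample size
se = sigma / n ** 0.5                  # standard error of the sample mean
z = (10 - mu) / se                     # standardized value, about -1.83
p = NormalDist().cdf(z)                # P(Xbar < 10)
print(round(z, 2), round(p, 4))        # -1.83 0.0334
```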

7 / 64
Notes

I In the cases so far, the sample means have been normally


distributed because the populations from which the samples
were drawn were normal.

I However, even if the population distribution is not normal or is
  unknown, we can still apply the standardized normal distribution
  in computing probabilities on X̄, provided the sample size is
  large.

8 / 64
Central Limit Theorem (CLT)

Central Limit Theorem (CLT)


If X is a random variable with a mean µ and variance σ 2 ,
then the sample means from the population distribution of X
will follow approximately a normal distribution,
  X̄ ∼ N(µ, σ²/n)

when the sample size (n) is sufficiently large.

9 / 64
Central Limit Theorem (CLT)

I This property is very important!

No matter what distribution we are dealing with, we know that


the sample means will be normally distributed.

We can use the Z-Tables to derive probabilities!

This means that when n is sufficiently large

  (X̄ − µ)/(σ/√n) = Z ∼ N(0, 1)
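To see the CLT at work, the sketch below (plain Python, with an assumed seed and sample size) draws repeated samples from a strongly skewed exponential population with µ = σ = 1 and checks that the sample means are centred on µ with spread close to σ/√n.

```python
import random
import statistics

random.seed(1)
n, reps = 50, 20_000                       # sample size and replications (assumed)
# Exponential with rate 1: mu = 1, sigma = 1, but strongly right-skewed
means = [statistics.fmean(random.expovariate(1) for _ in range(n))
         for _ in range(reps)]

print(round(statistics.fmean(means), 2))   # close to mu = 1
print(round(statistics.stdev(means), 2))   # close to sigma/sqrt(n) = 0.14
```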

10 / 64
How large does n need to be?

It depends on the original distribution of X.

I If X has a normal distribution, then the sample mean has a


normal distribution for all sample sizes.

I If X has a distribution that is close to normal, the
  approximation is good for small sample sizes (e.g. n ≥ 20).

I If X has a distribution that is far from normal, the


approximation requires larger sample sizes (e.g. n ≥ 50).

11 / 64
Example 1

Example
Consider the marks of all students who did an economics test. If
marks are normally distributed, with mean=72 and standard
deviation=9.

Let X be the mark of a randomly selected student

X ∼ N (µ = 72, σ 2 = 92 )

12 / 64
Example 1

a) The probability that any one student will have a mark over 78.

− Want to find P(X > 78).

  P(X > 78) = P((X − µ)/σ > (78 − 72)/9)
            = P(Z > 6/9)
            = P(Z > 0.67)
            = 1 − P(Z ≤ 0.67)
            = 0.2514

13 / 64
Example 1

There is a 25.14% chance of a randomly selected student having a


mark over 78.
14 / 64
Example 1
b) The probability that a sample of 10 students will have an
average mark over 78.

− X is the mark of a randomly selected student.

− X ∼ N(µ = 72, σ² = 9²)

− By CLT,
  X̄ ∼ N(µ = 72, σ²/n = 9²/10)

→ Want P(X̄ > 78)

  P(X̄ > 78) = P((X̄ − µ)/(σ/√n) > (78 − 72)/(9/√10))
            = P(Z > 2.11)
            = 1 − P(Z < 2.11)

15 / 64
Example 1

  1 − P(Z < 2.11) = 1 − 0.9826 = 0.0174

There is a 1.74% chance that a sample of 10 students will have an


average mark over 78.
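Both parts can be checked with statistics.NormalDist; unrounded z values give 0.2525 and 0.0175, matching the slide answers (0.2514, 0.0174) up to the rounding of z.

```python
from statistics import NormalDist

mu, sigma, n = 72, 9, 10
Z = NormalDist()

p_one = 1 - Z.cdf((78 - mu) / sigma)                # a) one student
p_mean = 1 - Z.cdf((78 - mu) / (sigma / n ** 0.5))  # b) mean of 10 students
print(round(p_one, 4), round(p_mean, 4))            # 0.2525 0.0175
```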

16 / 64
Example 2

A firm manufactures light bulbs that have a length of life with a


mean of 800 hours and a standard deviation of 40 hours.

Assuming that the sample is large enough for the sample mean to
follow a normal distribution, find the probability that a random
sample of 16 bulbs will have an average life of less than 775 hours.

17 / 64
Let X be the life of a randomly selected light bulb.

  X ∼ (µ = 800, σ² = 40²)

  X̄ ∼ N(µ = 800, σ²/n = 40²/16)

  P(X̄ < 775) = P((X̄ − µ)/(σ/√n) < (775 − 800)/(40/√16))
             = P(Z < −2.5)
             = 0.0062

There is a 0.62% chance that the sample mean will be less than 775
hours.
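A quick standard-library check of the bulb example:

```python
from statistics import NormalDist

mu, sigma, n = 800, 40, 16
z = (775 - mu) / (sigma / n ** 0.5)    # = -2.5 exactly
p = NormalDist().cdf(z)                # P(Xbar < 775)
print(z, round(p, 4))                  # -2.5 0.0062
```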

18 / 64
Sampling Distribution of the Sample Proportion

I In Topic 4 we discussed the Binomial Distribution


I We found that the mean and variance of a Binomial Variable X
are:

µ = E(X) = np

σ² = var(X) = np(1 − p)

where n is the number of trials and p is the probability of
success in each trial.

19 / 64
Sampling Distribution of the Sample Proportion

I The sample proportion is given by:

  p̂ = Σⁿᵢ₌₁ Xᵢ/n

E.g. if Xi = 1 when the ith vote is "Yes", and 60 people vote
"Yes" in a poll of 100 people, then p̂ = 0.6.

I p̂ is an estimator of the true, unknown proportion p for the


whole population.
I Thus, the CLT applies when n is large. That is, if n is large,

  p̂ approx. ∼ N(p, p(1 − p)/n)

20 / 64
Properties of the sample proportion

I When the sample size is large, the standardized sample proportion
  is approximately standard normal:

  (p̂ − p)/√(p(1 − p)/n) ≈ Z ∼ N(0, 1)

I Even if the true p is unknown, for large samples:

  √(p(1 − p)/n) ≈ √(p̂(1 − p̂)/n)

21 / 64
Example

Suppose that the true proportion of the population of residents who


are aged 65 or over in a particular council area is 44%. What is the
probability that the sample proportion of 200 randomly selected and
surveyed people from the area is more than 50%?
Solution
From the CLT, n is large, so p̂ is approximately normally distributed:

  p̂ approx. ∼ N(p, p(1 − p)/n)

p is given, so

  p̂ ∼ N(0.44, 0.44(1 − 0.44)/200) = N(0.44, 0.001232)

Therefore,

  P(p̂ > 0.5) = P(Z > (0.5 − 0.44)/√0.001232)
             = P(Z > 1.7094)
             = 1 − P(Z < 1.7094) = 1 − 0.9563 = 0.0437

i.e. the probability is 4.37%.
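Numerical check (standard library only):

```python
from statistics import NormalDist

p, n = 0.44, 200
se = (p * (1 - p) / n) ** 0.5               # sqrt(0.001232), about 0.0351
prob = 1 - NormalDist().cdf((0.5 - p) / se)
print(round(prob, 4))                       # about 0.0437
```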

22 / 64
Statistical Inference

The work we have done so far to understand sampling distributions


is important because it is a key building block for some tools which
will allow us to draw some well-informed conclusions about the
value of population parameters, which cannot be measured with
100% certainty.

I In the following sections we will learn about:

Statistical Inference
Point Estimators
Interval Estimators

23 / 64
What is Statistical Inference?
Statistical Inference refers to the problem of determining the
behavior of a large population by studying a small sample from that
population.

We use information from a SAMPLE to answer questions / make
inferences / reach conclusions about a target POPULATION.

24 / 64
What is Statistical Inference?

I We use sample statistics to make inferences about population


parameters.

I Thus, we can acquire information from samples and use it to


draw conclusions about the larger population from which it was
drawn!

I There are two types of inference:

Estimation
Hypothesis testing

25 / 64
Estimation

I The objective of estimation is to determine the approximate


value of a population parameter on the basis of a sample
statistic.

I E.g., the sample mean X̄ is employed to estimate the


population mean µ. When we obtain a single value of X̄ from
a sample, say 12, then we say that 12 is an estimate of µ.

26 / 64
Estimator

I An estimator of a parameter is a random variable that is a


defined function of sample observations, designed to
approximate the parameter.

I There are two types of estimators

Point Estimator
Interval Estimator

27 / 64
Two types of Estimators

I Point estimator: we can compute the value of an estimator, and
  use that value as the estimate of the population parameter.

Example: sample mean = 4 is a point estimate of the


population mean, µ.

I Interval estimator: draws inferences about a population by


estimating a parameter using an interval (range).

Example: we are 95% confident that the population mean lies


between 3.8 and 4.2.

28 / 64
Properties of Estimators 1: Linearity

I If an estimator is a linear function (or a linear combination) of


the sample observations of the random variable, the estimator
is called a linear estimator.

For example, the sample mean is a linear combination of the


sample observations:
  X̄ = (1/n) Σⁿᵢ₌₁ Xᵢ = (1/n)(X1 + X2 + . . . + Xn)
     = (1/n)X1 + (1/n)X2 + . . . + (1/n)Xn

29 / 64
Property 2: Unbiasedness

2. Unbiasedness
I An unbiased estimator of a population parameter is an
estimator whose expected value is equal to that parameter.
That is, estimator θ̂ of parameter θ is said to be unbiased if
E(θ̂) = θ.

Example: the sample mean X̄ is an unbiased estimator of the


population mean µ, since E(X̄) = µ .
Example: the sample variance S 2 is an unbiased estimator of
the population variance σ 2
If this is not true, then the estimator is said to be biased, and
the bias is
Bias (θ̂) = E(θ̂) − θ.

30 / 64
Property 3: Efficiency

I If there are two unbiased estimators of a parameter, the one


whose variance is smaller is said to be relatively efficient.

θ̂ is said to be more efficient than θ̃ if

  Var(θ̂) < Var(θ̃)

Example:
Both the sample median and the sample mean are unbiased estimators
of the population mean; however, the sample median has a greater
variance than the sample mean, so we choose the sample mean since it
is relatively efficient compared to the sample median.

31 / 64
Property 3: Efficiency

I If θ̂ is an unbiased estimator of θ and no other unbiased


estimator has smaller variance, then θ̂ is said to be the (most)
efficient or minimum variance unbiased estimator of θ.
I Example: consider the following two alternative estimators of µ:

X̄ = (X1 + X2 + X3 + X4 )/4
X̃ = (0.5X1 + X2 + X3 + 0.5X4 )/3

"Both estimators are unbiased because"


E(X̄ = µ) and E(X̃) = µ.
However, V ar(X̄) = 0.25σ 2 and V ar(X̃) = 0.278σ 2
So X̄ is more efficient.
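The variance comparison can be confirmed by simulation; a sketch with assumed standard normal draws (µ = 0, σ² = 1) and a fixed seed:

```python
import random
import statistics

random.seed(0)
reps = 100_000
xbar, xtilde = [], []
for _ in range(reps):
    x = [random.gauss(0, 1) for _ in range(4)]
    xbar.append(sum(x) / 4)
    xtilde.append((0.5 * x[0] + x[1] + x[2] + 0.5 * x[3]) / 3)

# Both are centred on mu = 0, but Var(Xbar) ~ 0.25 beats Var(Xtilde) ~ 0.278
print(round(statistics.variance(xbar), 3), round(statistics.variance(xtilde), 3))
```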

32 / 64
Best Linear Unbiased Estimator

I An estimator is called best linear unbiased estimator


(BLUE) if it is linear, unbiased and efficient (best, i.e., it has
the smallest variance among the group of linear unbiased
estimators).

I It can be shown that the sample mean is in fact BLUE for the
population mean.

33 / 64
Property 4: Consistency
I In simple terms, an estimator is consistent if it converges on
the true value when the sample size is large.
I Convergence in probability: If the sampling distribution of
an estimator of θ, θ̂ , collapses onto a constant α as the sample
size increases indefinitely, θ̂ is said to converge in probability to
α. Thus, α is the probability limit of θ̂, and it is written as
  plim θ̂ = α or θ̂ → α (in probability).
I If plim θ̂ = α = θ, then θ̂ is said to be a consistent estimator
of θ.
I It is possible for an estimator to be biased but consistent.
I E.g. the sample variance S² is a consistent estimator of σ²
  which is also unbiased. This implies that the alternative
  estimator for σ² defined by Σ(Xi − X̄)²/n is also
  consistent (since it converges to the same value when n is
  large), although it is biased.
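A simulation sketch of both claims (assumed N(0, σ² = 4) population, fixed seed): the divide-by-n estimator is biased downward in small samples, yet converges to σ² in large ones.

```python
import random
import statistics

random.seed(2)

def var_n(x):
    # divide-by-n variance estimator (biased)
    m = statistics.fmean(x)
    return sum((v - m) ** 2 for v in x) / len(x)

# Bias at n = 5: E[var_n] = 4 * (5 - 1)/5 = 3.2, not 4
ests = [var_n([random.gauss(0, 2) for _ in range(5)]) for _ in range(40_000)]
print(round(statistics.fmean(ests), 1))    # near 3.2

# Consistency: a single large sample lands near sigma^2 = 4
big = [random.gauss(0, 2) for _ in range(100_000)]
print(round(var_n(big), 1))                # near 4.0
```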
34 / 64
Interval Estimation

I A point estimate does not carry any information about its


reliability.
For example, an estimate computed from a sample of 100
observations is generally much more reliable than an estimate
based on only 10 observations, but a point estimate does not
convey such information. Put another way, with only a point
estimate we cannot answer questions such as "How confident
are we that the true parameter will lie between a and b?".

I An interval estimate provides the upper and lower boundaries
  for the unknown value of a population parameter with a certain
  level of confidence. Since there is a certain level of confidence
  associated with any interval estimate, the interval estimate is
  commonly called a confidence interval.

35 / 64
Interval Estimation

Assume that A and B are two random variables (A < B) that satisfy

  P(A < µ < B) = 1 − α

and let a and b be their realizations (observed values).


I The interval between a and b is called a 100(1- α)%
confidence interval.

I The quantity (1- α) is called the level of confidence of the


interval.

CI Interpretation: If the population were sampled repeatedly an
infinite number of times, 100(1 − α)% of the intervals calculated
this way would contain the true parameter µ.

36 / 64
Estimating µ when σ is known

How do we provide an interval estimator for µ when σ is known (and


the population is normally distributed, or the sample size is large)?

I From the last lecture, we know (Central Limit Theorem):


  X̄ ∼ N(µ, σ²/n)  ⇒  Z = (X̄ − µ)/(σ/√n) ∼ N(0, 1)

37 / 64
Estimating µ when σ is known
We know that P(−1.96 < Z < 1.96) = 1 − 2 × P(Z < −1.96) = 0.95,
because P(Z < −1.96) = 0.025.

38 / 64
Estimating µ when σ is known

I Therefore, for a standard normal distribution, 95% of the area
  is contained between −1.96 and +1.96, namely
  P(−1.96 < Z < 1.96) = 0.95

39 / 64
Estimating µ when σ is known

I Thus

  P(−1.96 < Z < 1.96) = 0.95

  P(−1.96 < (X̄ − µ)/(σ/√n) < 1.96) = 0.95

  P(X̄ − 1.96 σ/√n < µ < X̄ + 1.96 σ/√n) = 0.95

40 / 64
Estimating µ when σ is known

  P(X̄ − 1.96 σ/√n < µ < X̄ + 1.96 σ/√n) = 0.95

I [X̄ − 1.96 σ/√n, X̄ + 1.96 σ/√n] is called a 95% confidence
  interval estimator for µ.

I i.e. the interval is the range ±1.96 σ/√n on either side of X̄.

I Interpretation of the Confidence Interval:
  In repeated sampling, 95% of the intervals created in this way
  would contain the true population µ.

41 / 64
Example 1

Example: Suppose we know from experience that a random variable
X ∼ N(µ, 1.66), and for a sample of size 10 from this population
the sample mean is 1.58.

I Construct a 95% confidence interval for µ.

We know that

  X̄ = 1.58, σ² = 1.66 and n = 10

42 / 64
Example 1

I The 95% confidence interval can be calculated as

  P(X̄ − 1.96 σ/√n < µ < X̄ + 1.96 σ/√n) = 0.95

  P(1.58 − 1.96 × √1.66/√10 < µ < 1.58 + 1.96 × √1.66/√10) = 0.95

  P(0.78 < µ < 2.38) = 0.95

I Lower Confidence Limit (LCL): 0.78


I Upper Confidence Limit (UCL): 2.38

I Interpretation:
If the experiment were carried out multiple times, 95% of the
intervals created in this way would contain true population µ.
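The limits can be reproduced in a few lines:

```python
xbar, var, n, z = 1.58, 1.66, 10, 1.96   # sample mean, pop. variance, size, z
half = z * (var / n) ** 0.5              # half-width of the interval
lcl, ucl = xbar - half, xbar + half
print(round(lcl, 2), round(ucl, 2))      # 0.78 2.38
```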

43 / 64
Notation for Confidence Interval

I In general, a 1 − α confidence interval estimator for µ is given by

  P(X̄ − Zα/2 σ/√n < µ < X̄ + Zα/2 σ/√n) = 1 − α

I Confidence interval estimator (CI): X̄ ± Zα/2 σ/√n

  Lower Confidence Limit (LCL): X̄ − Zα/2 σ/√n

  Upper Confidence Limit (UCL): X̄ + Zα/2 σ/√n

44 / 64
Notation for Confidence Interval

I 1 − α is the confidence level, that is, the proportion of the
  values of X̄ for which the interval

  X̄ ± Zα/2 σ/√n

  includes the population mean.

If we want 95% confidence, choose α=0.05 (or 5%).


If we want 90% confidence, choose α=0.10 (or 10%).
If we want 99% confidence, choose α=0.01 (or 1%).

45 / 64
Notation for Confidence Interval
I What does Zα/2 mean?

We want to find the middle 1- α area of the standard normal


curve.
So the area left in each tail will be α/2.
Zα/2 is the point which marks off area of α/2 in the tail
Need to look up normal tables to find this!

I Z0.025 = 1.96 for a 95% CI

I Z0.05 = 1.645 for a 90% CI
I Z0.01 = 2.33 for a 98% CI
I Z0.005 = 2.575 for a 99% CI

46 / 64
Factors affecting CI

Confidence Interval:

  X̄ ± Zα/2 σ/√n

I Confidence level: 90%, 95%, 99%.

I Population standard deviation: larger variation in the random


variable widens the interval.

I Sample size: as n gets bigger, the interval gets narrower.

47 / 64
IMPORTANT!

I Remember that the confidence interval estimator was derived


from the sampling distribution of the sample mean (CLT).
  X̄ ∼ N(µ, σ²/n)  ⇒  Z = (X̄ − µ)/(σ/√n) ∼ N(0, 1)

We used the sampling distribution to make probability


statements about the sample mean.
I It is the interval that changes from sample to sample.
I µ is a fixed (constant) value. It is either within the interval or
not.
I You should interpret a 95% confidence interval as saying “In
repeated sampling, 95% of such intervals created would contain
the true population mean”.

48 / 64
Example 2

Example
The average height of a sample of 25 men is found to be 178cm.
Assume that the standard deviation of male heights is known to be
10cm, and that heights follow a normal distribution. Find
a) A 95% confidence interval for the population mean height.

b) A 90% confidence interval for the population mean height.

49 / 64
Example 2

a) A 95% confidence interval for the population mean height.

  X̄ ± Z0.025 σ/√n = 178 ± 1.96 × 10/√25
                  = [174.08, 181.92]

I LCL: 174.08 and UCL: 181.92

I So, in repeated sampling, we would expect 95% of the intervals
  created this way to contain µ.

50 / 64
Example 2

b) A 90% confidence interval for the population mean height.

  X̄ ± Z0.05 σ/√n = 178 ± 1.645 × 10/√25
                 = [174.71, 181.29]

I LCL: 174.71 and UCL: 181.29

I So, in repeated sampling, we would expect 90% of the intervals
  created this way to contain µ.
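Both intervals in one sketch, using inv_cdf to recover the z cut-off from the confidence level:

```python
from statistics import NormalDist

def ci(xbar, sigma, n, conf):
    # two-sided normal confidence interval for the mean, sigma known
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # z_{alpha/2}
    half = z * sigma / n ** 0.5
    return round(xbar - half, 2), round(xbar + half, 2)

print(ci(178, 10, 25, 0.95))   # (174.08, 181.92)
print(ci(178, 10, 25, 0.90))   # (174.71, 181.29)
```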

51 / 64
Selecting the sample size

I Suppose that before we gather data, we know that we want to


get an average within a certain distance of the true population
value.

We can use the CLT to find the minimum sample size required
to meet this condition, if the standard deviation of the
population is known.

52 / 64
Example 3
Example
I time my morning bus trips to work, and get an average of 35
minutes. Assuming that the times are normally distributed and the
standard deviation of times is known to be 5 minutes, I want to
estimate the true population mean length to within 3 minutes, with
99% certainty. What is the sample size required?

I Define X=time spent on bus trip to work each morning.

X̄ = 35 and σ = 5

I Find n such that

  P(|X̄ − µ| < 3) = 99%

53 / 64
Example 3
I Standardise

  P(|X̄ − µ|/(σ/√n) < 3/(σ/√n)) = 99%

  P(|Z| < 3/(5/√n)) = 99%

I Solve for n

  3/(5/√n) = 2.575

  n = 18.42, rounded up to 19.

I I need at least 19 timings of my bus trip to derive a 99% C.I.
  that is within 3 mins. of the population mean.
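The sample-size calculation as code (z = 2.575 from the normal table; math.ceil rounds up because n must be an integer at least as large as the exact solution):

```python
import math

sigma, margin, z = 5, 3, 2.575          # z_{0.005} from the normal table
n = math.ceil((z * sigma / margin) ** 2)
print(n)                                # 19
```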
54 / 64
Estimating µ when σ is unknown (use Student’s
t-distribution)
I If we use the sample standard deviation s, for σ in the
standardization formula, it has been shown that the
standardized variable does not exactly follow the standardized
normal distribution, but a distribution known as the (Student’s)
t distribution.
I If X̄ ∼ N(µ, σ²/n), then

  t = (X̄ − µ)/(s/√n) ∼ tn−1.

(n-1) is the number of degrees of freedom


s is the sample standard deviation of X, computed as the square
root of

  s² = Σⁿᵢ₌₁ (Xᵢ − X̄)²/(n − 1)

55 / 64
Estimating µ when σ is unknown (use Student's t-distribution)

I Remarks
The t distribution is very similar to the standardized normal
distribution except that its tails are fatter.
It is symmetric about zero.
For large degrees of freedom, the two distributions are virtually
identical.

t∞ = N (0, 1)
I This implies that if the t distribution (table or computer
program) is not available, the standardized normal distribution
can be used as a good approximation for cases of large samples
(say, d.f. > 30).

56 / 64
Estimating µ when σ is unknown (use Student's t-distribution)
I When the population standard deviation is unknown for a
normally distributed variable, the appropriate distribution to get
a confidence interval for the population mean is the t
distribution with (n-1) degrees of freedom.
I That is, the 100(1 − α)% confidence interval for µ is given by

  X̄ − tn−1,α/2 × s/√n < µ < X̄ + tn−1,α/2 × s/√n

I where tn−1,α/2 is the cut-off point from the t distribution table
  satisfying

  P(tn−1 > tn−1,α/2) = α/2

57 / 64
Example 4

A personnel manager for a new plant is attempting to determine a


salary scale for junior-level computer programmers in the area. He
takes a sample of 25 persons in the area employed in equivalent
positions and finds a sample mean of $21,000 and a standard
deviation of $5,000. A 95% confidence interval for the salary mean
is desired. Assume a normal population.

Let X=salary ($). Then, X ∼ N (µ, ?) and X̄ ∼ N (µ, ?).


The standardized variable

  t = (X̄ − µ)/(s/√n) ∼ tn−1.

58 / 64
Example 4

I Thus, the 95% CI for the population mean is

  P(X̄ − t24,0.025 × s/√n < µ < X̄ + t24,0.025 × s/√n) = 0.95

  P(21000 − 2.064 × 5000/√25 < µ < 21000 + 2.064 × 5000/√25) = 0.95

  P(18936 < µ < 23064) = 0.95
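A sketch of the salary CI in dollars (the t cut-off 2.064 is the t-table value for 24 degrees of freedom):

```python
xbar, s, n = 21000, 5000, 25
t = 2.064                                       # t_{24, 0.025} from the t table
half = t * s / n ** 0.5
print(round(xbar - half), round(xbar + half))   # 18936 23064
```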

59 / 64
Example 5

Assume that 9 randomly observed values from a population are as


follows.

63 63 68 71 42 49 59 55 59

Using this information, provide a 95% confidence interval for the


population mean. State clearly any assumption you make in order to
derive the result.
From the above data, we can compute the sample mean and
standard deviation:

  X̄ = 529/9 = 58.78
  S² = (31755 − 529²/9)/(9 − 1) = 82.69
  S = 9.09

60 / 64
I To transform X̄ into a standardized normal when σ is known or
a t distribution variable when σ is unknown, we have to
confirm that X̄ follows a normal distribution first.
I Since we do not have any information on the shape of the
distribution of X, the only justification for us to approximate
the distribution of X̄ by a normal distribution would be the
central limit theorem.
I However, the sample size is only 9, which is not large enough
  for the theorem to apply.
I So, the best we can do is to assume that the distribution of X is
normal, so that the sample mean which is a linear combination
of X follows a normal distribution. (This implies that the
results will be sensitive to the validity of this assumption.)
I Assume that X ∼ N(µ, ?). Therefore, it follows that
  X̄ ∼ N(µ, ?).

61 / 64
I Since the population standard deviation is unknown, we cannot
  standardize X̄ into a standardized normal variable. Instead, we
  transform it into a t variable:

  t = (X̄ − µ)/(s/√n) ∼ tn−1.

I In the current case, n = 9. Hence the above statistic follows the
  t distribution with 8 degrees of freedom.
I To get the 95% confidence interval, we have to find the cut-off
  point such that

  P(−t8,0.025 < t8 < t8,0.025) = 0.95

  P(−t8,0.025 < (X̄ − µ)/(s/√9) < t8,0.025) = 0.95

  P(X̄ − t8,0.025 × s/√9 < µ < X̄ + t8,0.025 × s/√9) = 0.95
62 / 64
Thus, substituting the observed s and the cut-off point from the
table gives the 95% confidence interval for the population mean as

  58.78 − 2.306 × 9.09/√9 < µ < 58.78 + 2.306 × 9.09/√9

  51.79 < µ < 65.77
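The whole Example 5 calculation from the raw data (t cut-off 2.306 from the t table, as above):

```python
import statistics

data = [63, 63, 68, 71, 42, 49, 59, 55, 59]
n = len(data)
xbar = statistics.fmean(data)          # 58.78
s = statistics.stdev(data)             # 9.09 (divides by n - 1)
t = 2.306                              # t_{8, 0.025} from the t table
half = t * s / n ** 0.5
print(round(xbar - half, 2), round(xbar + half, 2))   # 51.79 65.77
```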

63 / 64
Summary

I Sampling distribution of the sample mean


I Sampling distribution of the sample proportion
I Statistical Inference
I Point Estimator and Interval Estimator
I Properties of point estimator
Linearity, unbiasedness, efficiency, consistency
Best linear unbiased estimator (BLUE)
I Interval estimation/Confidence interval
estimating µ when σ is known
estimating µ when σ is unknown

64 / 64
