
Statistical Methods for Business & Economics

Sampling Distributions

Davide Raggi (davide.raggi@unibo.it)

March 26, 2013

Outline

▶ Random sampling
▶ The sampling average
▶ The Central Limit Theorem (CLT)
▶ Confidence Intervals

▶ Suggested readings:
▶ Chapter 4 of Agresti and Finlay (paragraphs 4.4 and 4.5)
▶ Chapter 5

Simple random sampling

▶ Suppose we draw a sample of size n from a population of
size N through simple random sampling. Define Yi as the
characteristic associated with the i-th subject drawn. A priori
Yi is unknown, since we do not know which subject is going to
enter the sample (each subject is picked at random).
▶ Yi is thus a random variable, and Yi , i = 1, . . . , n is a
sequence of independent random variables (since we pick
each subject independently of the others).
▶ In a simple random sampling scheme, to Y1 we associate the
characteristic y1 of the first subject picked at random and
independently from the population; to Y2 we associate y2 ,
and so on. Then yi , i = 1, . . . , n is the sample.

Simple random sampling
Main Consequences:

▶ Suppose we aim at estimating some characteristic of the
population's distribution, for instance its expected value
E[Y ] = µY . This distribution is usually unknown; however,
each subject in the sample is drawn from it.
▶ All subjects are picked independently from the population,
that is, Yi is independent of Yj for all i ≠ j.
▶ All of the Yi come from the same population. Before
observing yi , we may expect that, on average, its
characteristic is close to the expected value of the
population. Formally, this means E[Yi ] = µY .

Simple random sampling
Example:

▶ Suppose we are interested in estimating the expected GPA score of a
population of university students in the US.
▶ The dataset includes the college grade point average (GPA) for a population
of 141 students. Of course, the expected value µGPA of this population
can be computed easily and exactly by averaging over all 141
observations, i.e.,
$$\mu_{GPA} = \frac{1}{141}\sum_{i=1}^{141} GPA_i = 3.056737589.$$
▶ Suppose we do not know the GPA scores of all 141 students but still
aim at estimating the expected value of this population on the basis of
a sample of size n < 141. The first student's answer is described by the
random variable Y1 , the second student's by Y2 , and the i-th student's
by Yi . Once the extraction is made, the sample becomes a sequence of
numbers, namely y1 , y2 , . . . , yn . An estimate of the expected value
of the population is then
$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$$

Simple random sampling (cont.)
Example:


Figure 1 : Histogram of the GPA population

▶ For instance, a possible sample of 10 observations picked at
random is (2.9, 3.5, 3.2, 2.2, 3.2, 2.8, 3.4, 3.0, 2.9, 3.3),
which delivers an average of 3.04.
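▶ A minimal Python sketch of this sampling step (the population values below are hypothetical placeholders, not the actual data in gpa1.xls):

```python
import random

random.seed(7)
# Hypothetical stand-in for the 141 GPA values stored in gpa1.xls
population = [round(random.uniform(2.0, 4.0), 1) for _ in range(141)]

# Simple random sampling without replacement, n = 10
sample = random.sample(population, k=10)
y_bar = sum(sample) / len(sample)  # point estimate of the population mean
print(f"sample  = {sample}")
print(f"average = {y_bar:.2f}")
```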
The sampling average

▶ A main consequence of the random sampling scheme is that
the sampling average is a random variable as well. In fact,
different samples deliver different averages ȳ . For instance,
four different samples picked at random from the GPA
population give
▶ Sample 1:
(3.3, 3.6, 2.7, 3.5, 3.4, 3.2, 3.2, 3.0, 2.7, 2.8) ⇒ ȳ = 3.14
▶ Sample 2:
(3.5, 2.7, 2.6, 2.9, 2.7, 2.9, 2.8, 3.6, 3.3, 2.3) ⇒ ȳ = 2.93
▶ Sample 3:
(3.6, 3.0, 3.0, 3.5, 2.8, 2.8, 3.0, 2.9, 3.2, 3.4) ⇒ ȳ = 3.12
▶ Sample 4:
(3.6, 3.1, 3.5, 2.8, 4.0, 3.5, 3.8, 3.1, 2.7, 3.6) ⇒ ȳ = 3.37

The sampling average (cont.)
▶ The random variable collecting all the possible averages
computed over all the possible samples of a given size n is
denoted with capital letters as
$$\bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i$$

▶ Since Ȳ is a random variable, it is characterized by its own
probability distribution, usually known as the sampling
distribution of the sampling average.
▶ In the GPA example, the number of different samples of size n
is (see the sketch below):
▶ If n = 1, the number of different samples is 141.
▶ If n = 10, the number of different samples is 6.1745 × 10^14.
▶ If n = 50, the number of different samples is 4.61612 × 10^38.
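▶ These counts are binomial coefficients C(141, n); a quick check in Python:

```python
from math import comb

# Number of distinct samples of size n from a population of 141 subjects
for n in (1, 10, 50):
    print(f"n = {n:2d}: C(141, {n}) = {comb(141, n):.5g}")
```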

The sampling average (cont.)
Figure 2 : Sampling distribution of Ȳ obtained from 10,000 different
samples of size n = 50.
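▶ A sketch of how such a simulation could be run in Python (again with a hypothetical placeholder population; with the real data, the 141 GPA values would be read from gpa1.xls instead):

```python
import random
import statistics

random.seed(42)
# Hypothetical placeholder for the 141 GPA values in gpa1.xls
population = [round(random.uniform(2.0, 4.0), 2) for _ in range(141)]

# Draw 10,000 samples of size n = 50 and record each sample average;
# a histogram of `means` reproduces a figure like the one above
n, reps = 50, 10_000
means = [statistics.mean(random.sample(population, n)) for _ in range(reps)]

print(f"mean of the 10,000 averages: {statistics.mean(means):.4f}")
print(f"dispersion of the averages:  {statistics.stdev(means):.4f}")
```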

The sampling average
Why the sampling average?

▶ Is the sampling average a good strategy to estimate the
expected value of a distribution?
▶ The sampling average is a natural and intuitive estimator of the
expected value of a population. Moreover, this choice is
justified by its good properties.
▶ We first wonder whether, on average, the sampling average provides
a good approximation of the true expected value of the
population µY . If we compute the expected value of the
sampling average (with respect to all the possible samples
obtainable) we get
$$E[\bar{Y}] = E\left[\frac{1}{n}\sum_{i} Y_i\right] = \frac{1}{n}\sum_{i} E[Y_i] = \mu_Y$$

The sampling average (cont.)
Why the sampling average?

▶ The variance of the sampling average is
$$\mathrm{Var}[\bar{Y}] = \sigma^2_{\bar{Y}} = \mathrm{Var}\left(\frac{1}{n}\sum_{i=1}^{n} Y_i\right) = \frac{1}{n^2}\,\mathrm{Var}\left(\sum_{i=1}^{n} Y_i\right) = \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n}\mathrm{Cov}(Y_i , Y_j)$$
$$= \frac{1}{n^2}\sum_{i=1}^{n}\mathrm{Var}(Y_i) + \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1,\,j\neq i}^{n}\mathrm{Cov}(Y_i , Y_j) = \frac{1}{n^2}\sum_{i=1}^{n}\sigma^2_Y = \frac{\sigma^2_Y}{n}$$
where the covariance terms vanish because the Yi are independent.

▶ The standard deviation of the sampling distribution, i.e.,
$\sigma_{\bar{Y}} = \sqrt{\sigma^2_Y / n}$, is also known as the standard error.
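▶ A minimal simulation check of this result (the Gaussian population below is a hypothetical choice; draws are made with replacement so that they are i.i.d., matching the independence assumption above):

```python
import random
import statistics

random.seed(0)
# Hypothetical population of 141 values
population = [random.gauss(3.0, 0.4) for _ in range(141)]
sigma_Y = statistics.pstdev(population)  # population standard deviation

# 10,000 i.i.d. samples of size n (drawn with replacement)
n, reps = 50, 10_000
means = [statistics.mean(random.choices(population, k=n)) for _ in range(reps)]

print(f"empirical standard error:  {statistics.stdev(means):.4f}")
print(f"theoretical sigma/sqrt(n): {sigma_Y / n**0.5:.4f}")
```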

The sampling average
Summary of the results

▶ Ȳ is said to be an unbiased estimator of µY , since
E[Ȳ] = µY . This property is not affected by the sample size n.
▶ The sampling error shrinks as n increases. In fact,
Var[Ȳ] = σ²_Ȳ = σ²_Y /n tends to 0 as n → ∞. This means that the
uncertainty about the estimate of E[Y ] becomes negligible
(disappears) for n large enough. This is the well-known Law of
Large Numbers.
▶ As a main consequence, the distribution of Ȳ collapses onto the
true population expected value E[Y ]. In this case the
estimator is said to be consistent.

The sampling average (cont.)
Summary of the results


Figure 3 : GPA example: Sampling distributions of Ȳ obtained from
10,000 different samples of size n = 2, 10, 50 and 100. Note that the true
expected value of Y is 3.0567.

The sampling average (cont.)
Summary of the results

Figure 4 : GPA example: Overlapping sampling distributions of Ȳ
obtained from 10,000 different samples of size n = 2, 10, 50 and 100.
Note that the true expected value of Y is 3.0567 and that the dispersion
(variance) of these distributions is inversely proportional to n.

Central Limit Theorem

▶ Up to now we have derived the mean and the standard
deviation of Ȳ . However, it is possible to describe the whole
distribution of Ȳ . In fact, it can be proved that the sampling
average is well approximated by a Gaussian random variable.
The approximation is accurate when the sample size is large.
Furthermore, the approximation does not depend on the
population's distribution (see Figure 4.14 of the textbook, p. 93).
▶ Central Limit Theorem: for random sampling with a large
sample size n, the distribution of the sample mean Ȳ is
approximately a normal distribution, i.e.,
$$\bar{Y} \sim N\left(\mu_Y ,\ \frac{\sigma^2_Y}{n}\right)$$

Central Limit Theorem (cont.)

▶ In particular, if we consider the standardized distribution,
$$\frac{\bar{Y} - E[\bar{Y}]}{\sigma_{\bar{Y}}} = \frac{\bar{Y} - \mu_Y}{\sigma_Y / \sqrt{n}} = \sqrt{n}\,\frac{\bar{Y} - \mu_Y}{\sigma_Y} \sim N(0, 1).$$

▶ Computer simulations on this topic can be implemented
through the applet available at
http://www.prenhall.com/agresti/applet_files/samplingdist.html
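▶ As an offline alternative to the applet, a minimal sketch of the same experiment (the exponential population is a hypothetical choice, picked to show that the result does not depend on the population's shape):

```python
import random
import statistics

random.seed(1)
mu, sigma = 1.0, 1.0  # an Exponential(1) population has mean 1 and std. dev. 1

def standardized_mean(n: int) -> float:
    """Draw n i.i.d. values and return sqrt(n) * (ybar - mu) / sigma."""
    ybar = statistics.mean(random.expovariate(1.0) for _ in range(n))
    return n**0.5 * (ybar - mu) / sigma

for n in (2, 50):
    z = [standardized_mean(n) for _ in range(10_000)]
    # As n grows, these should approach 0 and 1 (a standard normal)
    print(f"n = {n:2d}: mean = {statistics.mean(z):+.3f}, sd = {statistics.stdev(z):.3f}")
```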

Central Limit Theorem
Example:


Figure 5 : GPA example: Sampling distribution of $\sqrt{n}\,(\bar{Y} - \mu_Y)/\sigma_Y$ obtained
from 10,000 different samples of size n = 2 and n = 50. In red, the
density of a Normal variable with mean 0 and variance 1.
Confidence intervals

▶ To estimate the expected value of a given population, the
sampling average Ȳ is an outstanding tool:
▶ It is unbiased;
▶ It is efficient;
▶ Its sampling distribution is (approximately) Normal.
▶ However, in practical applications a single number, i.e., a
point estimate, does not provide a sufficient amount of
information for inferential purposes. In fact, it is always useful
to define a margin of error that summarizes the degree of
uncertainty associated with the estimate.

Confidence intervals (cont.)
▶ An interval estimate or confidence interval is an interval of
numbers around the point estimate, within which the
parameter value is expected to fall.
▶ The probability that the true parameter is not included in the
interval is called the error probability and is denoted by the
Greek letter α. The length of the interval depends on the
confidence level 1 − α, which measures the probability that the
true parameter is included in the interval. In general 1 − α is
taken to be 0.95 or 0.99.
▶ In a nutshell, we would like an interval that is likely
(with probability 1 − α) to include the true parameter of the
population µY .
▶ In general a confidence interval has the form

Point Estimate ± Margin of Error


Confidence intervals (cont.)
▶ The key point in constructing a confidence interval is, as usual,
the sampling distribution.
▶ As hypothesized before, the center of the interval is the
average ȳ . We need to find the margin of error such that
the confidence level is 1 − α.
▶ Since Ȳ is Gaussian, a symmetric interval centered at ȳ
may be defined as
$$\left[\bar{y} - z_{\alpha/2}\,\sigma_{\bar{Y}} ,\ \bar{y} + z_{\alpha/2}\,\sigma_{\bar{Y}}\right]$$
in which $z_{\alpha/2}$ is such that $P(Z \geq z_{\alpha/2}) = \alpha/2$.
▶ We know from the properties of a Normal random variable
that the probability of an event depends only on its z-score,
i.e., $P(\bar{Y} \geq \mu_Y + z\,\sigma_{\bar{Y}}) = P(Z \geq z)$, in which $Z = \frac{\bar{Y} - \mu_Y}{\sigma_{\bar{Y}}}$.

Confidence intervals (cont.)
▶ We know that $\frac{\bar{Y} - \mu_Y}{\sigma_{\bar{Y}}} \sim N(0, 1)$. This result implies that
$$P\left(-z_{\alpha/2} \leq \frac{\bar{Y} - \mu_Y}{\sigma_{\bar{Y}}} \leq z_{\alpha/2}\right) = 1 - \alpha.$$
Note that $z_{\alpha/2}$ is a number (z-score) such that
$P(Z \geq z_{\alpha/2}) = P(Z \leq -z_{\alpha/2}) = \alpha/2$. In particular, given α, the
quantities $\pm z_{\alpha/2}$ can be obtained from the tables.
▶ Note that simple algebra allows us to write
$$P\left(-z_{\alpha/2} \leq \frac{\bar{Y} - \mu_Y}{\sigma_{\bar{Y}}} \leq z_{\alpha/2}\right) = P\left(\bar{Y} - z_{\alpha/2}\,\sigma_{\bar{Y}} \leq \mu_Y \leq \bar{Y} + z_{\alpha/2}\,\sigma_{\bar{Y}}\right) = 1 - \alpha$$

Confidence intervals (cont.)
▶ In particular, the interval defined as follows
$$\underbrace{\bar{y}}_{\text{point estimate}} \pm \underbrace{z_{\alpha/2}\,\sigma_{\bar{Y}}}_{\text{margin of error}}$$
guarantees that 100(1 − α)% of the time the true parameter
µY is included.
▶ Note that in this case the lower and upper bounds of the
interval are random, since they depend on the sample
mean Ȳ (which is itself a random variable). The
meaning of this result is that, if we had the chance to draw a
large number of samples, and each time we computed the
corresponding interval, i.e., $\bar{y} \pm z_{\alpha/2}\,\sigma_{\bar{Y}}$, then we would expect
µY to belong to 100(1 − α)% of these intervals.

Confidence intervals (cont.)

▶ Issue: in general we do not know the exact value of σY and
hence we cannot compute σȲ . For this reason the unknown σY
is replaced by the estimated standard deviation
$$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2}.$$
Thus, a reasonable estimate of the standard error is $s_{\bar{y}} = s/\sqrt{n}$.
▶ A confidence interval for µY is then given by
$$\text{Confidence interval} = \left[\bar{y} - z_{\alpha/2} \times s_{\bar{y}} ;\ \bar{y} + z_{\alpha/2} \times s_{\bar{y}}\right]$$

▶ Computer simulations can be done at
http://www.prenhall.com/agresti/applet_files/meanci.html
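▶ A sketch of this computation in Python (scipy.stats.norm.ppf replaces the tables for $z_{\alpha/2}$; the sample below is a hypothetical placeholder):

```python
import statistics
from scipy.stats import norm  # norm.ppf returns normal quantiles

def confidence_interval(data, alpha=0.05):
    """(1 - alpha) confidence interval for the population mean."""
    n = len(data)
    y_bar = statistics.mean(data)
    s = statistics.stdev(data)   # sample std. deviation (divides by n - 1)
    se = s / n**0.5              # estimated standard error
    z = norm.ppf(1 - alpha / 2)  # z_{alpha/2}, e.g. 1.96 for alpha = 0.05
    return (y_bar - z * se, y_bar + z * se)

# Hypothetical sample of 10 GPA scores
sample = [2.9, 3.5, 3.2, 2.2, 3.2, 2.8, 3.4, 3.0, 2.9, 3.3]
print(confidence_interval(sample, alpha=0.05))
```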

Confidence Intervals
Graphical illustration

Figure 6 : Confidence intervals: summary of 1,000 intervals computed
from 1,000 different samples (the last 100 plotted), obtained through the
applet linked above. The distribution of the population is assumed to
be Normal with mean 25, s.d. = 10 and sample size = 50.

Confidence Intervals
Example:

▶ Consider the GPA example and suppose we extract a sample
of size n = 50 (see the file gpa1.xls).
▶ The estimated average is ȳ = 3.118.
▶ The standard deviation of this sample is s = 0.3035.
▶ The standard error is $s_{\bar{y}} = s/\sqrt{n} = 0.0429$.
▶ For a 90% confidence interval, $z_{\alpha/2} = 1.64$ and the interval
is estimated as [3.0476, 3.1884].
▶ For a 95% confidence interval, $z_{\alpha/2} = 1.96$ and the interval
is estimated as [3.0338, 3.2021].
▶ For a 99% confidence interval, $z_{\alpha/2} = 2.57$ and the interval
is estimated as [3.0077, 3.2283].
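▶ These intervals can be reproduced from the summary statistics (up to rounding of the standard error):

```python
# Reproducing the GPA confidence intervals from the summary statistics
y_bar, se = 3.118, 0.0429
for level, z in [(90, 1.64), (95, 1.96), (99, 2.57)]:
    print(f"{level}% CI: [{y_bar - z * se:.4f}, {y_bar + z * se:.4f}]")
```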

Example: Estimating a proportion

▶ To summarize categorical data it is useful to record the
proportions or percentages of observations in the categories.
The 2006 FLORIDA POLL conducted by Florida International
University asked: Do you think it is appropriate for state
government to make laws restricting access to abortion?
▶ This poll consists of a sample of 1,200 subjects (see the file
polls.xls). 396 subjects answered yes whereas 804 subjects
answered no. We define the random variable Yi that
describes the answer of the i-th subject, namely
$$Y_i = \begin{cases} 1 & \text{if the answer is no, with probability } 1 - \pi \\ 0 & \text{if the answer is yes, with probability } \pi \end{cases}$$

Example: Estimating a proportion (cont.)
▶ Let π be the proportion of Floridians who would respond yes.
An estimate of π, say π̂, is given by
$$\hat{\pi} = \frac{1}{1200}\sum_{i=1}^{1200}(1 - y_i) = \frac{396}{1200} = 0.33$$

▶ Note that this proportion is an average of the variables
1 − Yi (which are Bernoulli).
▶ Since π̂ is a sampling average, it is an unbiased and efficient
estimator of the true population proportion π.
Furthermore, the Central Limit Theorem holds, so
$$\frac{\hat{\pi} - \pi}{\sigma_{\hat{\pi}}} \sim N(0, 1).$$

▶ For these reasons a 95% confidence interval for π is given by
$$\hat{\pi} \pm 1.96\,\sigma_{\hat{\pi}}$$
Example: Estimating a proportion (cont.)
▶ In order to compute the confidence interval, we need to
compute σπ̂ . Note first that Yi is a Bernoulli random variable
such that P(Yi = 1) = 1 − π and P(Yi = 0) = π.
▶ The standard deviation of Yi is thus $\sigma = \sqrt{\pi(1 - \pi)}$.
▶ A naïve estimate of σ on the basis of this sample is then
$\sqrt{0.33 \times (1 - 0.33)}$, and the estimated standard error
of the sample proportion is
$$s_{\hat{\pi}} = \sqrt{\frac{\hat{\pi}(1 - \hat{\pi})}{n}} = 0.0136.$$
▶ A 95% confidence interval is thus
$$\hat{\pi} \pm 1.96\,s_{\hat{\pi}} = 0.33 \pm 1.96 \times 0.0136 = 0.33 \pm 0.03$$

▶ This result suggests that the true proportion π lies
between 30% and 36% with high probability (i.e., with
95% confidence).
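▶ A sketch of the whole computation:

```python
# 95% confidence interval for the 'yes' proportion in the Florida poll
n_total, n_yes = 1200, 396

pi_hat = n_yes / n_total                       # estimated proportion, 0.33
se = (pi_hat * (1 - pi_hat) / n_total) ** 0.5  # estimated standard error
z = 1.96                                       # z_{alpha/2} for alpha = 0.05

lower, upper = pi_hat - z * se, pi_hat + z * se
print(f"pi_hat = {pi_hat:.2f}, s.e. = {se:.4f}, 95% CI = [{lower:.3f}, {upper:.3f}]")
```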