
Statistical Methods for Business & Economics

Sampling Distributions

Davide Raggi (davide.raggi@unibo.it)

March 26, 2013

Outline

▶ Random sampling
▶ The sampling average
▶ The Central Limit Theorem (CLT)
▶ Confidence Intervals

▶ Suggested readings:
▶ Chapter 4 of Agresti and Finlay (paragraphs 4.4 and 4.5)
▶ Chapter 5

Simple random sampling

▶ Suppose we draw a sample of size n from a population of
size N through simple random sampling. Define Yi as the
characteristic associated with the i-th subject drawn. A priori
Yi is unknown, since we do not know which subject is going to
enter the sample (each subject is picked at random).
▶ Yi is thus a random variable, and Yi , i = 1, . . . , n is a
sequence of independent random variables (since we pick
each subject independently of the others).
▶ In a simple random sampling scheme, to Y1 we associate the
characteristic y1 of the first subject picked at random and
independently from the population; to Y2 we associate y2 ,
and so on. Then yi , i = 1, . . . , n is the sample.

Simple random sampling
Main Consequences:

▶ Suppose we aim at estimating some characteristic of the
population's distribution, for instance its expected value
E[Y ] = µY . This distribution is usually unknown; however,
each subject in the sample is drawn from it.
▶ All subjects are picked independently from the population,
that is, Yi is independent of Yj for all i ≠ j.
▶ All of the Yi come from the same population. Before
observing yi , we may expect that, on average, its
characteristic is close to the expected value of the
population. Formally, this means E[Yi ] = µY .

Simple random sampling
Example:

▶ Suppose we are interested in estimating the expected GPA score of a
population of university students in the US.
▶ The dataset includes the college grade point average (GPA) for a population
of 141 students. Of course, the expected value µGPA of this population
can be computed easily and exactly by averaging over all 141
observations, i.e.,
$$\mu_{GPA} = \frac{1}{141}\sum_{i=1}^{141} GPA_i = 3.056737589.$$
▶ Suppose we do not know the GPA scores of all 141 students but still
aim at estimating the expected value of this population on the basis of
a sample of size n < 141. The first student's answer is described by the
random variable Y1 , the second student's by Y2 , and the i-th student's
by Yi . Once the extraction is made, the sample becomes a sequence of
numbers, namely y1 , y2 , . . . , yn . An estimate of the expected value
of the population is then
$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$$

Simple random sampling (cont.)
Example:


Figure 1 : Histogram of the GPA population

▶ For instance, a possible sample of 10 observations picked at
random is (2.9, 3.5, 3.2, 2.2, 3.2, 2.8, 3.4, 3.0, 2.9, 3.3),
which delivers an average of 3.04.
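▶ A minimal Python sketch of this sampling step (the population values below are hypothetical placeholders, not the actual data in gpa1.xls):

```python
import random

random.seed(7)
# Hypothetical stand-in for the 141 GPA values stored in gpa1.xls
population = [round(random.uniform(2.0, 4.0), 1) for _ in range(141)]

# Simple random sampling without replacement, n = 10
sample = random.sample(population, k=10)
y_bar = sum(sample) / len(sample)  # point estimate of the population mean
print(f"sample  = {sample}")
print(f"average = {y_bar:.2f}")
```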
The sampling average

▶ A main consequence of the random sampling scheme is that
the sampling average is a random variable as well. In fact,
different samples deliver different averages ȳ . For instance,
four different samples picked at random from the GPA
population give
▶ Sample 1:
(3.3, 3.6, 2.7, 3.5, 3.4, 3.2, 3.2, 3.0, 2.7, 2.8) ⇒ ȳ = 3.14
▶ Sample 2:
(3.5, 2.7, 2.6, 2.9, 2.7, 2.9, 2.8, 3.6, 3.3, 2.3) ⇒ ȳ = 2.93
▶ Sample 3:
(3.6, 3.0, 3.0, 3.5, 2.8, 2.8, 3.0, 2.9, 3.2, 3.4) ⇒ ȳ = 3.12
▶ Sample 4:
(3.6, 3.1, 3.5, 2.8, 4.0, 3.5, 3.8, 3.1, 2.7, 3.6) ⇒ ȳ = 3.37

The sampling average (cont.)
▶ The random variable collecting all the possible averages
computed over all the possible samples of a given size n is
denoted with capital letters as
$$\bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i$$

▶ Since Ȳ is a random variable, it is characterized by its own
probability distribution, usually known as the sampling
distribution of the sampling average.
▶ In the GPA example, the number of different samples of size n
is (see the sketch below):
▶ If n = 1, the number of different samples is 141.
▶ If n = 10, the number of different samples is 6.1745 × 10^14.
▶ If n = 50, the number of different samples is 4.61612 × 10^38.
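▶ These counts are binomial coefficients C(141, n); a quick check in Python:

```python
from math import comb

# Number of distinct samples of size n from a population of 141 subjects
for n in (1, 10, 50):
    print(f"n = {n:2d}: C(141, {n}) = {comb(141, n):.5g}")
```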

The sampling average (cont.)
Figure 2 : Sampling distribution of Ȳ obtained from 10,000 different
samples of size n = 50.
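▶ A sketch of how such a simulation could be run in Python (again with a hypothetical placeholder population; with the real data, the 141 GPA values would be read from gpa1.xls instead):

```python
import random
import statistics

random.seed(42)
# Hypothetical placeholder for the 141 GPA values in gpa1.xls
population = [round(random.uniform(2.0, 4.0), 2) for _ in range(141)]

# Draw 10,000 samples of size n = 50 and record each sample average;
# a histogram of `means` reproduces a figure like the one above
n, reps = 50, 10_000
means = [statistics.mean(random.sample(population, n)) for _ in range(reps)]

print(f"mean of the 10,000 averages: {statistics.mean(means):.4f}")
print(f"dispersion of the averages:  {statistics.stdev(means):.4f}")
```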

The sampling average
Why the sampling average?

▶ Is the sampling average a good strategy to estimate the
expected value of a distribution?
▶ The sampling average is a natural and intuitive estimator of the
expected value of a population. Moreover, this choice is
justified by its good properties.
▶ We first wonder whether, on average, the sampling average provides
a good approximation of the true expected value of the
population µY . If we compute the expected value of the
sampling average (with respect to all the possible samples
obtainable) we get
$$E[\bar{Y}] = E\left[\frac{1}{n}\sum_{i} Y_i\right] = \frac{1}{n}\sum_{i} E[Y_i] = \mu_Y$$

The sampling average (cont.)
Why the sampling average?

▶ The variance of the sampling average is
$$\mathrm{Var}[\bar{Y}] = \sigma^2_{\bar{Y}} = \mathrm{Var}\left(\frac{1}{n}\sum_{i=1}^{n} Y_i\right) = \frac{1}{n^2}\,\mathrm{Var}\left(\sum_{i=1}^{n} Y_i\right) = \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n}\mathrm{Cov}(Y_i , Y_j)$$
$$= \frac{1}{n^2}\sum_{i=1}^{n}\mathrm{Var}(Y_i) + \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1,\,j\neq i}^{n}\mathrm{Cov}(Y_i , Y_j) = \frac{1}{n^2}\sum_{i=1}^{n}\sigma^2_Y = \frac{\sigma^2_Y}{n}$$
where the covariance terms vanish because the Yi are independent.

▶ The standard deviation of the sampling distribution, i.e.,
$\sigma_{\bar{Y}} = \sqrt{\sigma^2_Y / n}$, is also known as the standard error.
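▶ A minimal simulation check of this result (the Gaussian population below is a hypothetical choice; draws are made with replacement so that they are i.i.d., matching the independence assumption above):

```python
import random
import statistics

random.seed(0)
# Hypothetical population of 141 values
population = [random.gauss(3.0, 0.4) for _ in range(141)]
sigma_Y = statistics.pstdev(population)  # population standard deviation

# 10,000 i.i.d. samples of size n (drawn with replacement)
n, reps = 50, 10_000
means = [statistics.mean(random.choices(population, k=n)) for _ in range(reps)]

print(f"empirical standard error:  {statistics.stdev(means):.4f}")
print(f"theoretical sigma/sqrt(n): {sigma_Y / n**0.5:.4f}")
```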

The sampling average
Summary of the results

▶ Ȳ is said to be an unbiased estimator of µY , since
E[Ȳ] = µY . This property is not affected by the sample size n.
▶ The sampling error shrinks as n increases. In fact,
Var[Ȳ] = σ²_Ȳ = σ²_Y /n tends to 0 as n → ∞. This means that the
uncertainty about the estimate of E[Y ] becomes negligible
(disappears) for n large enough. This is the well-known Law of
Large Numbers.
▶ As a main consequence, the distribution of Ȳ collapses onto the
true population expected value E[Y ]. In this case the
estimator is said to be consistent.

The sampling average (cont.)
Summary of the results


Figure 3 : GPA example: Sampling distributions of Ȳ obtained from
10,000 different samples of size n = 2, 10, 50 and 100. Note that the true
expected value of Y is 3.0567.

The sampling average (cont.)
Summary of the results

Figure 4 : GPA example: Overlapping sampling distributions of Ȳ
obtained from 10,000 different samples of size n = 2, 10, 50 and 100.
Note that the true expected value of Y is 3.0567 and that the dispersion
(variance) of these distributions is inversely proportional to n.

Central Limit Theorem

▶ Up to now we have derived the mean and the standard
deviation of Ȳ . However, it is possible to describe the whole
distribution of Ȳ . In fact, it can be proved that the sampling
average is well approximated by a Gaussian random variable.
The approximation is accurate when the sample size is large.
Furthermore, the approximation does not depend on the
population's distribution (see Figure 4.14 of the textbook, p. 93).
▶ Central Limit Theorem: for random sampling with a large
sample size n, the distribution of the sample mean Ȳ is
approximately a normal distribution, i.e.,
$$\bar{Y} \sim N\left(\mu_Y ,\ \frac{\sigma^2_Y}{n}\right)$$

Central Limit Theorem (cont.)

▶ In particular, if we consider the standardized distribution,
$$\frac{\bar{Y} - E[\bar{Y}]}{\sigma_{\bar{Y}}} = \frac{\bar{Y} - \mu_Y}{\sigma_Y / \sqrt{n}} = \sqrt{n}\,\frac{\bar{Y} - \mu_Y}{\sigma_Y} \sim N(0, 1).$$

▶ Computer simulations on this topic can be implemented
through the applet available at
http://www.prenhall.com/agresti/applet_files/samplingdist.html
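▶ As an offline alternative to the applet, a minimal sketch of the same experiment (the exponential population is a hypothetical choice, picked to show that the result does not depend on the population's shape):

```python
import random
import statistics

random.seed(1)
mu, sigma = 1.0, 1.0  # an Exponential(1) population has mean 1 and std. dev. 1

def standardized_mean(n: int) -> float:
    """Draw n i.i.d. values and return sqrt(n) * (ybar - mu) / sigma."""
    ybar = statistics.mean(random.expovariate(1.0) for _ in range(n))
    return n**0.5 * (ybar - mu) / sigma

for n in (2, 50):
    z = [standardized_mean(n) for _ in range(10_000)]
    # As n grows, these should approach 0 and 1 (a standard normal)
    print(f"n = {n:2d}: mean = {statistics.mean(z):+.3f}, sd = {statistics.stdev(z):.3f}")
```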

Central Limit Theorem
Example:


Figure 5 : GPA example: Sampling distribution of $\sqrt{n}\,(\bar{Y} - \mu_Y)/\sigma_Y$ obtained
from 10,000 different samples of size n = 2 and n = 50. In red, the
density of a Normal variable with mean 0 and variance 1.
Confidence intervals

▶ To estimate the expected value of a given population, the
sampling average Ȳ is an outstanding tool:
▶ It is unbiased;
▶ It is efficient;
▶ Its sampling distribution is (approximately) Normal.
▶ However, in practical applications a single number, i.e., a
point estimate, does not provide a sufficient amount of
information for inferential purposes. In fact, it is always useful
to define a margin of error that summarizes the degree of
uncertainty associated with the estimate.

Confidence intervals (cont.)
▶ An interval estimate or confidence interval is an interval of
numbers around the point estimate, within which the
parameter value is expected to fall.
▶ The probability that the true parameter is not included in the
interval is called the error probability and is denoted by the
Greek letter α. The length of the interval depends on the
confidence level 1 − α, which measures the probability that the
true parameter is included in the interval. In general 1 − α is
taken to be 0.95 or 0.99.
▶ In a nutshell, we would like an interval that is likely
(with probability 1 − α) to include the true parameter of the
population µY .
▶ In general a confidence interval has the form

Point Estimate ± Margin of Error


Confidence intervals (cont.)
▶ The key point in constructing a confidence interval is, as usual,
the sampling distribution.
▶ As hypothesized before, the center of the interval is the
average ȳ . We need to find the margin of error such that
the confidence level is 1 − α.
▶ Since Ȳ is Gaussian, a symmetric interval centered at ȳ
may be defined as
$$\left[\bar{y} - z_{\alpha/2}\,\sigma_{\bar{Y}} ,\ \bar{y} + z_{\alpha/2}\,\sigma_{\bar{Y}}\right]$$
in which $z_{\alpha/2}$ is such that $P(Z \geq z_{\alpha/2}) = \alpha/2$.
▶ We know from the properties of a Normal random variable
that the probability of an event depends only on its z-score,
i.e., $P(\bar{Y} \geq \mu_Y + z\,\sigma_{\bar{Y}}) = P(Z \geq z)$, in which $Z = \frac{\bar{Y} - \mu_Y}{\sigma_{\bar{Y}}}$.

Confidence intervals (cont.)
▶ We know that $\frac{\bar{Y} - \mu_Y}{\sigma_{\bar{Y}}} \sim N(0, 1)$. This result implies that
$$P\left(-z_{\alpha/2} \leq \frac{\bar{Y} - \mu_Y}{\sigma_{\bar{Y}}} \leq z_{\alpha/2}\right) = 1 - \alpha.$$
Note that $z_{\alpha/2}$ is a number (z-score) such that
$P(Z \geq z_{\alpha/2}) = P(Z \leq -z_{\alpha/2}) = \alpha/2$. In particular, given α, the
quantities $\pm z_{\alpha/2}$ can be obtained from the tables.
▶ Note that simple algebra allows us to write
$$P\left(-z_{\alpha/2} \leq \frac{\bar{Y} - \mu_Y}{\sigma_{\bar{Y}}} \leq z_{\alpha/2}\right) = P\left(\bar{Y} - z_{\alpha/2}\,\sigma_{\bar{Y}} \leq \mu_Y \leq \bar{Y} + z_{\alpha/2}\,\sigma_{\bar{Y}}\right) = 1 - \alpha$$

Confidence intervals (cont.)
▶ In particular, the interval defined as follows
$$\underbrace{\bar{y}}_{\text{point estimate}} \pm \underbrace{z_{\alpha/2}\,\sigma_{\bar{Y}}}_{\text{margin of error}}$$
guarantees that 100(1 − α)% of the time the true parameter
µY is included.
▶ Note that in this case the lower and upper bounds of the
interval are random, since they depend on the sample
mean Ȳ (which is itself a random variable). The
meaning of this result is that, if we had the chance to draw a
large number of samples, and each time we computed the
corresponding interval, i.e., $\bar{y} \pm z_{\alpha/2}\,\sigma_{\bar{Y}}$, then we would expect
µY to belong to 100(1 − α)% of these intervals.

Confidence intervals (cont.)

▶ Issue: in general we do not know the exact value of σY and
hence we cannot compute σȲ . For this reason the unknown σY
is replaced by the estimated standard deviation
$$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2}.$$
Thus, a reasonable estimate of the standard error is $s_{\bar{y}} = s/\sqrt{n}$.
▶ A confidence interval for µY is then given by
$$\text{Confidence interval} = \left[\bar{y} - z_{\alpha/2} \times s_{\bar{y}} ;\ \bar{y} + z_{\alpha/2} \times s_{\bar{y}}\right]$$

▶ Computer simulations can be done at
http://www.prenhall.com/agresti/applet_files/meanci.html
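▶ A sketch of this computation in Python (scipy.stats.norm.ppf replaces the tables for $z_{\alpha/2}$; the sample below is a hypothetical placeholder):

```python
import statistics
from scipy.stats import norm  # norm.ppf returns normal quantiles

def confidence_interval(data, alpha=0.05):
    """(1 - alpha) confidence interval for the population mean."""
    n = len(data)
    y_bar = statistics.mean(data)
    s = statistics.stdev(data)   # sample std. deviation (divides by n - 1)
    se = s / n**0.5              # estimated standard error
    z = norm.ppf(1 - alpha / 2)  # z_{alpha/2}, e.g. 1.96 for alpha = 0.05
    return (y_bar - z * se, y_bar + z * se)

# Hypothetical sample of 10 GPA scores
sample = [2.9, 3.5, 3.2, 2.2, 3.2, 2.8, 3.4, 3.0, 2.9, 3.3]
print(confidence_interval(sample, alpha=0.05))
```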

Confidence Intervals
Graphical illustration

Figure 6 : Confidence intervals: summary of 1,000 intervals computed
from 1,000 different samples (the last 100 plotted), obtained through the
applet linked above. The distribution of the population is assumed to
be Normal with mean 25, s.d. = 10 and sample size = 50.

Confidence Intervals
Example:

▶ Consider the GPA example and suppose we extract a sample
of size n = 50 (see the file gpa1.xls).
▶ The estimated average is ȳ = 3.118.
▶ The standard deviation of this sample is s = 0.3035.
▶ The standard error is $s_{\bar{y}} = s/\sqrt{n} = 0.0429$.
▶ For a 90% confidence interval, $z_{\alpha/2} = 1.64$ and the interval
is estimated as [3.0476, 3.1884].
▶ For a 95% confidence interval, $z_{\alpha/2} = 1.96$ and the interval
is estimated as [3.0338, 3.2021].
▶ For a 99% confidence interval, $z_{\alpha/2} = 2.57$ and the interval
is estimated as [3.0077, 3.2283].
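▶ These intervals can be reproduced from the summary statistics (up to rounding of the standard error):

```python
# Reproducing the GPA confidence intervals from the summary statistics
y_bar, se = 3.118, 0.0429
for level, z in [(90, 1.64), (95, 1.96), (99, 2.57)]:
    print(f"{level}% CI: [{y_bar - z * se:.4f}, {y_bar + z * se:.4f}]")
```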

Example: Estimating a proportion

▶ To summarize categorical data it is useful to record the
proportions or percentages of observations in the categories.
The 2006 FLORIDA POLL conducted by Florida International
University asked: Do you think it is appropriate for state
government to make laws restricting access to abortion?
▶ This poll consists of a sample of 1,200 subjects (see the file
polls.xls). 396 subjects answered yes whereas 804 subjects
answered no. We define the random variable Yi that
describes the answer of the i-th subject, namely
$$Y_i = \begin{cases} 1 & \text{if the answer is no, with probability } 1 - \pi \\ 0 & \text{if the answer is yes, with probability } \pi \end{cases}$$

Example: Estimating a proportion (cont.)
▶ Let π be the proportion of Floridians who would respond yes.
An estimate of π, say π̂, is given by
$$\hat{\pi} = \frac{1}{1200}\sum_{i=1}^{1200}(1 - y_i) = \frac{396}{1200} = 0.33$$

▶ Note that this proportion is an average of the variables
1 − Yi (which are Bernoulli).
▶ Since π̂ is a sampling average, it is an unbiased and efficient
estimator of the true population proportion π.
Furthermore, the Central Limit Theorem holds, so
$$\frac{\hat{\pi} - \pi}{\sigma_{\hat{\pi}}} \sim N(0, 1).$$

▶ For these reasons a 95% confidence interval for π is given by
$$\hat{\pi} \pm 1.96\,\sigma_{\hat{\pi}}$$
Example: Estimating a proportion (cont.)
▶ In order to compute the confidence interval, we need to
compute σπ̂ . Note first that Yi is a Bernoulli random variable
such that P(Yi = 1) = 1 − π and P(Yi = 0) = π.
▶ The standard deviation of Yi is thus $\sigma = \sqrt{\pi(1 - \pi)}$.
▶ A naïve estimate of σ on the basis of this sample is then
$\sqrt{0.33 \times (1 - 0.33)}$, and the estimated standard error
of the sample proportion is
$$s_{\hat{\pi}} = \sqrt{\frac{\hat{\pi}(1 - \hat{\pi})}{n}} = 0.0136.$$
▶ A 95% confidence interval is thus
$$\hat{\pi} \pm 1.96\,s_{\hat{\pi}} = 0.33 \pm 1.96 \times 0.0136 = 0.33 \pm 0.03$$

▶ This result suggests that the true proportion π lies
between 30% and 36% with high probability (i.e., with
95% confidence).
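▶ A sketch of the whole computation:

```python
# 95% confidence interval for the 'yes' proportion in the Florida poll
n_total, n_yes = 1200, 396

pi_hat = n_yes / n_total                       # estimated proportion, 0.33
se = (pi_hat * (1 - pi_hat) / n_total) ** 0.5  # estimated standard error
z = 1.96                                       # z_{alpha/2} for alpha = 0.05

lower, upper = pi_hat - z * se, pi_hat + z * se
print(f"pi_hat = {pi_hat:.2f}, s.e. = {se:.4f}, 95% CI = [{lower:.3f}, {upper:.3f}]")
```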