You are on page 1of 29

Fifth Week

Sampling Distributions

Adapted From :
Scientists, 8th Ed
Probability & Statistics for Engineers & Scientists Ed.
Walpole/Myers/Myers/Ye (c)2007
Introduction to Business Statistics, 5e
Kvanli/Guynes/Pavur (c)2000
South-Western College Publishing
Statistics for Managers
Using Microsoft® Excel 4th Edition

Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. Chap 6-1
Populations & Samples

ƒ A population is the set (possibly infinite) of all possible observations.


ƒ Each observation is a random variable X having some (often unknown)
probability distribution, f (x).
( )
ƒ If the population distribution is known, it might be referred to, for
example, as a normal population, etc.
ƒ A sample is a subset of a population.
ƒ Our goal is to make inferences about the population based on an
y of the sample.
analysis p
ƒ A biased sample, usually obtained by taking convenient, rather than
representative observations, will consistently over- or under-estimate
some characteristic of the population.
ƒ Observations in a random sample are made independently and at
random. Here, random variables X1, X2, …, Xn in the sample all have
same distribution as the population
population, X
X.
Inferential Statistics

Populasi Sampel
Statistik sbg
ringkasan sampel

Parameter sbg
ringkasan
g populasi
p p

P
Penarikan
ik kesimpulan
k i l terhadap
h d populasi
l i melalui
l l i sampell
Parameter & Statistik

“Rata-rata tinggi peserta


pelatihan
pe at a ASDIS 3
tahun terakhir
Parameter

Populasi

“Rata
Rata rata tinggi peserta Sampel
pelatihan ASDI tahun ini
Statistik
Sampling Distributions

ƒ A sampling distribution is a
distribution of all of the possible
values of a statistic for a given size
sample selected from a population

Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. Chap 6-5
Sample Statistics

ƒ Any function of the random variables X1, X2, …, Xn


making up a random sample is called a statistic.
ƒ The most important statistics, as we have seen are the
sample
p mean, sample p variance and sample
p standard
deviation:
n

X ∑
X = i =1 i

n
− X) n∑ X − (∑ X )
n n n

∑(X
2 2 2

S 2
= i =1 i
= i =1 i i =1 i

n −1 n(n − 1)
Sampling Distributions

ƒ Starting with an unknown population distribution, we


can study the sampling distribution, or distribution of a
sample statistic (like Xbar or S) calculated from a sample
of size n from that population.
ƒ The sample consists of independent and identically distributed
observations X1, X2, …, Xn from the population.
ƒ Based on the sampling distributions of Xbar and S for samples of
size n, we will make inferences about the population mean and
variance μ and σ.
ƒ We could approximate the sampling distribution off Xbar by taking
a large number of random samples of size n and plotting the
distribution of the Xbar values.
Developing a
S
Sampling
li DiDistribution
t ib ti

ƒ Assume there is a population …


C D
ƒ Population size N=4 A B

ƒ Random variable,
variable XX,
is age of individuals
ƒ Values of X: 18, 20,
22,, 24 (y
(years))

Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. Chap 6-8
Developing a
S
Sampling
li DiDistribution
t ib ti
(continued)

Summary Measures for the Population Distribution:

μ=
∑ X i P( )
P(x)
N .3
18 + 20 + 22 + 24
= = 21 .2
4 .1
1

∑ i
0
(X
( − μ) 2
18 20 22 24 x
σ= = 2.236
2 236
N A B C D
Uniform Distribution

Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. Chap 6-9
Developing a
S
Sampling
li DiDistribution
t ib ti
(continued)
N
Now consider
id allll possible
ibl samples
l off size
i n=2
2
1st 2nd Observation
16 Sample
Obs 18 20 22 24
Means
18 18,18 18,20 18,22 18,24
1st 2nd Observation
20 20,18 20,20 20,22 20,24 Obs 18 20 22 24
22 22,18
22 18 22,20
22 20 22,22
22 22 22,24
22 24 18 18 19 20 21
24 24,18 24,20 24,22 24,24 20 19 20 21 22
16 possible samples 22 20 21 22 23
(sampling with
replacement)
24 21 22 23 24

Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. Chap 6-10
Developing a
S
Sampling
li DiDistribution
t ib ti
(continued)
Sampling Distribution of All Sample Means

16 S
Sample
l MMeans Sample Means
Distribution
1st 2nd Observation _
Obs 18 20 22 24 P(X)
.3
18 18 19 20 21
.2
20 19 20 21 22
.1
22 20 21 22 23
0 _
24 21 22 23 24 18 19 20 21 22 23 24 X
(no longer uniform)
Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. Chap 6-11
Developing a
S
Sampling
li DiDistribution
t ib ti
(continued)

Summary Measures of this Sampling Distribution:

μX =
∑ X i
=
18 + 19 + 21 + " + 24
= 21
N 16

σX =
∑ i X
(X − μ ) 2

(18 - 21)2 + (19 - 21)2 + " + (24 - 21)2


= = 1.58
16

Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. Chap 6-12
Comparing the Population with its
S
Sampling
li Di
Distribution
ib i
Population Sample Means Distribution
N=4 n=2
μ = 21 σ = 2.236 μX = 21 σ X = 1.58
1 58
_
P(X) P(X)
.3 .3

.2 .2

.1
1 .1
1

0 X 0
18 19 20 21 22 23 24
_
18 20 22 24 X
A B C D
Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. Chap 6-13
Sampling Distribution Summary

ƒ Normal distribution: Sampling distribution of Xbar when σ is known


for any population distribution.
ƒ Also the sampling distribution for the difference of the means of
two different samples.
ƒ t-distribution: Sampling distribution of Xbar when σ is unknown and
S is used. Population must be normal.
ƒ Also the sampling distribution for the difference of the means of
two different samples when σ is unknown.
ƒ Chi-square (χ2) distribution: Sampling distribution of S2. Population
must be normal.
ƒ F-distribution: The distribution of the ratio of two χ2 random
variables. Sampling distribution of the ratio of the variances of two
different samples
samples. Population must be normal
normal.
Standard Error of the Mean

ƒ Different samples of the same size from the same


population will yield different sample means
ƒ A measure of the variability from sample to sample is
given by the Standard Error of the Mean:

σ
σX =
n
ƒ Note that the standard error of the mean decreases as
the sample size increases

Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. Chap 6-15
If the Population is Normal

ƒ If a population is normal with mean μ and


standard deviation σ,σ the sampling distribution
of X is also normally distributed with

σ
μX = μ and σX =
n
(This assumes that sampling is with replacement or
sampling is without replacement from an infinite population)

Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. Chap 6-16
Sampling Distribution Properties

ƒ For sampling with replacement:


As n increases, Larger
σ x decreases sample size

Smaller
sample size

μ x
Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. Chap 6-17
Central Limit Theorem
ƒ The central limit theorem is the most important theorem
in statistics. It states that
ƒ If Xbar is the mean of a random sample of size n from a
population with an arbitrary distribution with mean μ and
variance σ2, then as n→∞ n→∞, the sampling distribution of
Xbar approaches a normal distribution with mean μ and
sstandard
a da d de a o σ/√
deviation σ/√n .
ƒ The central limit theorem holds under the following
conditions:
ƒ For any population distribution if n ≥ 30.
ƒ For n < 30, if the population distribution is generally shaped like
a normall di
distribution.
t ib ti
ƒ For any value of n if the population distribution is normal.
If the Population is not Normal

ƒ We can apply the Central Limit Theorem:


ƒ Even if the population is not normal,
normal
ƒ …sample means from the population will be
approximately normal as long as the sample size is
large enough
ƒ …and the sampling
p g distribution will have

σ
μx = μ and σx =
n
Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. Chap 6-19
Central Limit Theorem

the sampling
As the n↑↑
distribution
sample
becomes
size g
gets
almost normal
large
regardless of
enough…
g
shape of
population

x
Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. Chap 6-20
If the Population is not Normal
(continued)

Population Distribution
Sampling distribution
properties:
Central Tendency

μx = μ
μ x
Variation Sampling Distribution
σ (becomes normal as n increases)
σx = Larger
n Smaller
sample size
sample
size
(Sampling with
replacement)
l t)
μx x
Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. Chap 6-21
How Large is Large Enough?

ƒ For most distributions, n > 30 will give a


sampling
li di
distribution
t ib ti ththatt iis nearly
l normall
ƒ For fairly symmetric distributions,
distributions n > 15
ƒ For normal population distributions, the
sampling distribution of the mean is always
normally distributed

Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. Chap 6-22
Example

ƒ Suppose a population has mean μ = 8 and


standard deviation σ = 2
2. Suppose a random
sample of size n = 25 is selected.

ƒ What is the probability that the sample mean is


between 7.8 and 8.2?

Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. Chap 6-23
Example: μ = 8 σ =2 n = 25
P ( 7.8 < X < 8.2 ) = ?
⎛ 7.8
7 8 − 8 X − μ X 88.2 2−8 ⎞
P ( 7.8 < X < 8.2 ) = P ⎜ < < ⎟
⎝ 2 / 25 σX 2 / 25 ⎠
= P ( −.5 < Z < .5 ) = .3830

Sampling Distribution Standardized


2 Normal Distribution
σX = = .4 σZ =1
25
.1915

7.8 8.2 X −0.5 0.5 Z


μX = 8
Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc.
μZ = 0 Chap 6-24
Inferences About the
Population Mean

ƒ We often want to test hypotheses about the population mean (hypothesis


testing will be formalized later).
ƒ Example:
ƒ Suppose a manufacturing process is designed to produce parts with μ
= 6 cm in diameter, and suppose σ is known to be .15 cm. If a random
sample of 80 parts has xbar = 6.046 cm, what is the probability (P-
value) that a value this far from the mean could occur by chance if μ is
truly 6 cm?
6.046 − 6.00
z= = 2.74
.15 / 80
P [| X − 6.0 | ≥ .046] = P [| Z | ≥ 2.74] = ?

P [| Z | ≥ 2.74] = 2 P [ Z ≥ 2.74] = 2(1 − .9969) = .0062


Difference Between Two Means

ƒ In addition, we can make inferences about the


difference between two population means based on the
difference between two sample means.
ƒ The central limit theorem also holds in this case.
ƒ If independent samples of size n1 and n2 are drawn at
random from two populations, discrete or continuous,
with means μ1 and μ2, and variances σ12 and σ22, then
as n gets large, the sampling distribution of Xbar1 - Xbar2
approaches a normal distribution with
σ 2
σ 2

μ X1−X 2
= μ −μ
1 2
and σ 2

X1−X 2
= 1
+ 2
.
n 1
n 2
Difference of Two Means
Example

ƒ Example:
ƒ Suppose we record the drying time in hours of 20 samples each
of two types of paint, type A and type B. Suppose we know that
the population standard deviations are both equal to 1/2 hour.
Assuming that the population means are equal
equal, what is the
probability that the difference in the sample means is greater
than 1/2 hour? μ = μ −μ = 0 X A− X B A B

σ 2
σ 2
.25 .25
σ 2

X A−X B
= A
+ B
= + = .025
n A
n B
20 20
( x − x ) − ( μ − μ ) .5 − 0
z= = = 3.16
A B
A B

(σ n ) + (σ n )
2

A
.025
02A
2

B B
t-Distribution (when σ is
Unknown)

ƒ The problem with the central limit theorem is that it assumes that σ is
known.
ƒ Generally,
G iff μ is being estimated from
f the sample, σ must be
estimated from the sample as well.
ƒ The t-distribution can be used if σ is unknown, but it requires that the
original population must be normally distributed.
ƒ Let X1, X2, ..., Xn be independent, normally distributed random variables
with mean μ and standard deviation σ. Then the random variable T below
has a
t-distribution with ν = n - 1 degrees of freedom:
X −μ
T =
S/ n
ƒ The t-distribution is like the normal, but with greater spread since both
μ and σ have fluctuations due to sampling.
Using the tt-Distribution
Distribution

ƒ Observations on the t-Distribution:


ƒ The t-statistic is like the normal, but using S rather than σ.
ƒ Table value depends on the sample size (degrees of freedom).
ƒ The t-distribution is symmetric with μ = 0, but σ2 > 1. As would be
expected,
p σ2 is largest
g for small n.
ƒ Approaches the normal distribution (σ2 = 1) as n gets large.
ƒ For a given probability α, table shows the value of tα that has P(α) to
the right of it.
it
ƒ The t-distribution can also be used for hypotheses concerning the
difference of two means where σ1 and σ2 are unknown, as long as the two
populations are normally distributed
distributed.
ƒ Usually if n ≥ 30, S is a good enough estimator of σ, and the normal
distribution is typically used instead.

You might also like