
Sampling Distributions

Sampling Distribution
Introduction
• In real life calculating parameters of
populations is prohibitive because
populations are very large.
• Rather than investigating the whole
population, we take a sample, calculate a
statistic related to the parameter of interest,
and make an inference.
• The sampling distribution of the statistic is
the tool that tells us how close the statistic is
to the parameter.
Sample Statistics as Estimators
of Population Parameters
• A sample statistic is a numerical measure of a
summary characteristic of a sample.
• A population parameter is a numerical measure of
a summary characteristic of a population.

• An estimator of a population parameter is a sample
statistic used to estimate or predict the population
parameter.
• An estimate of a parameter is a particular numerical
value of a sample statistic obtained through
sampling.
• A point estimate is a single value used as an
estimate of a population parameter.
Estimators

• The sample mean, X̄, is the most common
estimator of the population mean, μ.
• The sample variance, s², is the most common
estimator of the population variance, σ².
• The sample standard deviation, s, is the most
common estimator of the population standard
deviation, σ.
• The sample proportion, p̂ , is the most common
estimator of the population proportion, p.
Sampling Distribution of X̄

• The sampling distribution of X̄ is the
probability distribution of all possible values
the random variable X̄ may assume when a
sample of size n is taken from a specified
population.
Sampling Distribution of the Mean
• An example
– A die is thrown infinitely many times. Let X
represent the number of spots showing on
any throw.
– The probability distribution of X is

x      1    2    3    4    5    6
p(x)  1/6  1/6  1/6  1/6  1/6  1/6

\( E(X) = 1(1/6) + 2(1/6) + 3(1/6) + \dots + 6(1/6) = 3.5 \)
\( V(X) = (1-3.5)^2(1/6) + (2-3.5)^2(1/6) + \dots + (6-3.5)^2(1/6) = 2.92 \)
Throwing a die twice – sampling
distribution of sample mean

• Suppose we want to estimate μ
from the mean x̄ of a sample of
size n = 2.
• What is the distribution of x̄?
Throwing a die twice – sample
mean

No. Sample Mean   No. Sample Mean   No. Sample Mean
1 1,1 1 13 3,1 2 25 5,1 3
2 1,2 1.5 14 3,2 2.5 26 5,2 3.5
3 1,3 2 15 3,3 3 27 5,3 4
4 1,4 2.5 16 3,4 3.5 28 5,4 4.5
5 1,5 3 17 3,5 4 29 5,5 5
6 1,6 3.5 18 3,6 4.5 30 5,6 5.5
7 2,1 1.5 19 4,1 2.5 31 6,1 3.5
8 2,2 2 20 4,2 3 32 6,2 4
9 2,3 2.5 21 4,3 3.5 33 6,3 4.5
10 2,4 3 22 4,4 4 34 6,4 5
11 2,5 3.5 23 4,5 4.5 35 6,5 5.5
12 2,6 4 24 4,6 5 36 6,6 6
The distribution of x̄ when n = 2

\( E(\bar{x}) = 1.0(1/36) + 1.5(2/36) + \dots = 3.5 \)
\( V(\bar{x}) = (1.0-3.5)^2(1/36) + (1.5-3.5)^2(2/36) + \dots = 1.46 \)

Note: \( \mu_{\bar{x}} = \mu_x \) and \( \sigma^2_{\bar{x}} = \sigma^2_x / 2 \)

[Figure: histogram of the sampling distribution of x̄ for n = 2; horizontal axis x̄ from 1 to 6 in steps of 0.5, vertical axis probability from 1/36 to 6/36.]
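The n = 2 case can be checked by brute force. The sketch below is an illustration only (the variable names are not from the slides): it enumerates the 36 equally likely outcomes of two throws and tabulates the sample mean, using exact fractions so no rounding enters.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

faces = range(1, 7)  # a fair six-sided die

# Count how often each sample mean occurs among the 36 equally likely samples.
counts = Counter(Fraction(a + b, 2) for a, b in product(faces, repeat=2))

# Expected value and variance of the sample mean (each sample has probability 1/36).
expected = sum(m * Fraction(c, 36) for m, c in counts.items())
variance = sum((m - expected) ** 2 * Fraction(c, 36) for m, c in counts.items())

for m in sorted(counts):
    print(f"x-bar = {float(m):.1f}   probability = {counts[m]}/36")
print("E(x-bar) =", float(expected))
print("V(x-bar) =", round(float(variance), 2))
```

The variance comes out as exactly 35/24, which is the population variance 35/12 divided by the sample size 2.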
Sampling Distribution of the
Mean

n = 5:   \( \mu_{\bar{x}} = 3.5 \),  \( \sigma^2_{\bar{x}} = .5833 \; (= \sigma^2_x / 5) \)
n = 10:  \( \mu_{\bar{x}} = 3.5 \),  \( \sigma^2_{\bar{x}} = .2917 \; (= \sigma^2_x / 10) \)
n = 25:  \( \mu_{\bar{x}} = 3.5 \),  \( \sigma^2_{\bar{x}} = .1167 \; (= \sigma^2_x / 25) \)

Notice that \( \sigma^2_{\bar{x}} \) is smaller than \( \sigma^2_x \).
The larger the sample size, the
smaller \( \sigma^2_{\bar{x}} \). Therefore, x̄ tends
to fall closer to μ as the sample
size increases.
Relationships between Population Parameters and
the Sampling Distribution of the Sample Mean

The expected value of the sample mean is equal to the population mean:

\( E(\bar{X}) = \mu_{\bar{X}} = \mu_X \)

The variance of the sample mean is equal to the population variance divided by
the sample size:

\( V(\bar{X}) = \sigma^2_{\bar{X}} = \frac{\sigma^2_X}{n} \)

The standard deviation of the sample mean, known as the standard error of
the mean, is equal to the population standard deviation divided by the square
root of the sample size:

\( \text{s.e.} = SD(\bar{X}) = \sigma_{\bar{X}} = \frac{\sigma_X}{\sqrt{n}} \)
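As a quick numerical illustration of these relationships, the following sketch applies them to the fair-die population used earlier. The helper names `variance_of_mean` and `standard_error` are made up for this example.

```python
from fractions import Fraction
from math import sqrt

faces = range(1, 7)
mu = Fraction(sum(faces), 6)                     # population mean of a fair die, 3.5
sigma2 = sum((x - mu) ** 2 for x in faces) / 6   # population variance, 35/12


def variance_of_mean(sigma2, n):
    """Variance of the sample mean: population variance divided by n."""
    return sigma2 / n


def standard_error(sigma2, n):
    """Standard error of the mean: square root of the variance of the mean."""
    return sqrt(sigma2 / n)


for n in (5, 10, 25):
    v = float(variance_of_mean(sigma2, n))
    print(f"n = {n:2d}: variance = {v:.4f}, standard error = {standard_error(sigma2, n):.4f}")
```

The printed variances match the values .5833, .2917 and .1167 quoted for n = 5, 10 and 25.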
The Central Limit Theorem
When sampling from a population
with mean μ and finite standard
deviation σ, the sampling
distribution of the sample mean will
tend to be a normal distribution with
mean μ and standard deviation σ/√n as
the sample size becomes large
(n > 30).

For “large enough” n: \( \bar{X} \sim N(\mu, \sigma^2/n) \)

[Figure: sampling distribution of X̄ for n = 5, n = 20, and large n; the shape approaches a normal curve as n grows.]
The Central Limit Theorem Applies to
Sampling Distributions from Any Population
[Figure: population distributions (Normal, Uniform, Skewed, General) and the corresponding sampling distributions of X̄ for n = 2 and n = 30.]
The Central Limit Theorem
(Example)
Mercury makes a 2.4 liter V-6 engine, used in speedboats. The company’s
engineers believe the engine delivers an average horsepower of 220 HP and
that the standard deviation of power delivered is 15 HP. A potential buyer
intends to sample 100 engines. What is the probability that the sample
mean will be less than 217 HP?

\( P(\bar{X} < 217) = P\left( \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < \frac{217 - \mu}{\sigma/\sqrt{n}} \right) \)
\( = P\left( Z < \frac{217 - 220}{15/\sqrt{100}} \right) = P\left( Z < \frac{217 - 220}{15/10} \right) \)
\( = P(Z < -2) = 0.0228 \)
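The same probability can be reproduced without a z table. This is a minimal sketch using Python's standard-library `statistics.NormalDist`; it assumes, as the example does, that the central limit theorem justifies a normal sampling distribution for n = 100.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 220.0, 15.0, 100

standard_error = sigma / sqrt(n)                 # 15 / 10 = 1.5 HP
sampling_dist = NormalDist(mu, standard_error)   # approximate distribution of the sample mean

z = (217.0 - mu) / standard_error
p_below_217 = sampling_dist.cdf(217.0)

print("z =", z)
print("P(sample mean < 217) =", round(p_below_217, 4))
```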
Student’s t Distribution
If the population standard deviation, σ, is unknown, replace σ with
the sample standard deviation, s. If the population is normal, the
resulting statistic

\( t = \frac{\bar{X} - \mu}{s/\sqrt{n}} \)

has a t distribution with (n - 1) degrees of freedom.
• The t is a family of bell-shaped and
symmetric distributions, one for each
number of degrees of freedom.
• The expected value of t is 0.
• The variance of t is greater than 1, but
approaches 1 as the number of degrees of
freedom increases.
• The t distribution approaches a standard
normal as the number of degrees of
freedom increases.
• When the sample size is small (n < 30) and σ is unknown, we use
the t distribution.

[Figure: standard normal curve compared with t curves for df = 10 and df = 20.]
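The statement about the variance can be made concrete. The slides do not give a formula; the sketch below uses the standard result that a t distribution with more than 2 degrees of freedom has variance df / (df - 2), and the function name `t_variance` is made up for this illustration.

```python
def t_variance(df):
    """Variance of Student's t with df degrees of freedom (finite only for df > 2)."""
    if df <= 2:
        raise ValueError("variance of t is finite only for df > 2")
    return df / (df - 2)


# The variance is always above 1 and shrinks toward 1 as df grows.
for df in (10, 20, 30, 100):
    print(f"df = {df:3d}: variance = {t_variance(df):.4f}")
```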
Sampling Distributions

Finite Population Correction Factor

If the sample size is more than 5% of the
population size and the sampling is done
without replacement, then a correction needs
to be made to the standard error of the
mean.

\( \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}} \)
Sampling Distribution of x̄

Standard Deviation of x̄
Finite Population: \( \sigma_{\bar{x}} = \left( \frac{\sigma}{\sqrt{n}} \right) \sqrt{\frac{N - n}{N - 1}} \)
Infinite Population: \( \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \)
• A finite population is treated as being
infinite if n/N < .05.
• \( \sqrt{(N - n)/(N - 1)} \) is the finite correction factor.
• \( \sigma_{\bar{x}} \) is referred to as the standard error of
the mean.
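One possible way to encode the two cases is sketched below. The function name, the optional `N` argument, and the illustrative numbers (σ = 15, n = 100, N = 1000) are assumptions made for this example; the correction is applied whenever n/N is not below .05.

```python
from math import sqrt


def standard_error_of_mean(sigma, n, N=None):
    """Standard error of the mean, with the finite population correction when needed.

    N=None means the population is treated as infinite.
    """
    se = sigma / sqrt(n)
    if N is not None and n / N >= 0.05:
        se *= sqrt((N - n) / (N - 1))   # finite correction factor
    return se


print(standard_error_of_mean(15, 100))           # infinite population
print(standard_error_of_mean(15, 100, N=10000))  # n/N = .01, no correction
print(standard_error_of_mean(15, 100, N=1000))   # n/N = .10, correction applied
```

The correction factor is always below 1 for n > 1, so the corrected standard error is smaller than σ/√n.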
Sampling Distribution of the Sample Mean
• The amount of soda pop in each bottle is normally
distributed with a mean of 32.2 ounces and a
standard deviation of 0.3 ounces.
• Find the probability that a carton of four bottles will
have a mean of more than 32 ounces of soda per
bottle.
• Solution
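– Define the random variable as the mean amount of soda per
bottle.

\( P(\bar{x} > 32) = P\left( \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} > \frac{32 - 32.2}{.3/\sqrt{4}} \right) = P(z > -1.33) = 0.9082 \)

[Figure: sampling distribution of x̄ centered at \( \mu_{\bar{x}} \) = 32.2, with x̄ = 32 marked and the area 0.9082 to its right.]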
Sampling Distribution of the
Sample Mean
• Example
– Dean’s claim: The average weekly income of
M.B.A graduates one year after graduation is
$600.
– Suppose the distribution of weekly income has a
standard deviation of $100. What is the probability
that 25 randomly selected graduates have an
average weekly income of less than $550?
– Solution

\( P(\bar{x} < 550) = P\left( \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} < \frac{550 - 600}{100/\sqrt{25}} \right) = P(z < -2.5) = 0.0062 \)
The Sampling Distribution of the Sample
Proportion, p̂

The sample proportion is the fraction of
successes in n binomial trials. It is the
number of successes, X, divided by the
number of trials, n.

Sample proportion: \( \hat{p} = \frac{X}{n} \)

As the sample size, n, increases, the sampling
distribution of p̂ approaches a normal
distribution with mean p and standard
deviation \( \sqrt{p(1 - p)/n} \).

[Figure: binomial distributions of X for n = 2, n = 10, and n = 15 with p = 0.3; for n = 15 the horizontal axis is also labeled in terms of p̂ = X/15.]
Normal approximation to the
Binomial
– Normal approximation to the binomial works
best when
• the number of experiments (sample size) is
large, and
• the probability of success, p, is close to 0.5.

– For the approximation to provide good results
two conditions should be met:
np ≥ 5; n(1 - p) ≥ 5
• Example
– A state representative received 52% of the
votes in the last election.
– One year later the representative wanted
to study his popularity.
– If his popularity has not changed, what is
the probability that more than half of a
sample of 300 voters would vote for him?
• Example
– Solution
• The number of respondents who prefer the
representative is binomial with n = 300 and p
= .52. Thus, np = 300(.52) = 156 and
n(1-p) = 300(1-.52) = 144 (both greater than 5)

\( P(\hat{p} > .50) = P\left( \frac{\hat{p} - p}{\sqrt{p(1 - p)/n}} > \frac{.50 - .52}{\sqrt{(.52)(1 - .52)/300}} \right) = P(z > -.69) = .7549 \)
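The calculation can be repeated numerically. The sketch below follows the normal approximation used in the example, with no continuity correction; because it does not round z to two decimals, the result differs slightly from the table value .7549.

```python
from math import sqrt
from statistics import NormalDist

n, p = 300, 0.52

# Conditions for the normal approximation to the binomial.
conditions_met = n * p >= 5 and n * (1 - p) >= 5

standard_error = sqrt(p * (1 - p) / n)   # standard deviation of the sample proportion
z = (0.50 - p) / standard_error
p_more_than_half = 1 - NormalDist().cdf(z)

print("conditions met:", conditions_met)
print("z =", round(z, 4))
print("P(p-hat > .50) =", round(p_more_than_half, 4))
```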

SHAPE OF THE SAMPLING
DISTRIBUTION OF x̄

• Sampling from a Normally Distributed
Population
• Sampling from a Population That Is Not
Normally Distributed
Sampling From a Normally
Distributed Population
If the population from which the samples
are drawn is normally distributed with mean
μ and standard deviation σ , then the
sampling distribution of the sample mean,
x̄, will also be normally distributed with the
following mean and standard deviation,
irrespective of the sample size:

\( \mu_{\bar{x}} = \mu \) and \( \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \)
Figure 7.2 Population distribution and sampling
distributions of x̄.

(a) Population distribution: normal distribution.
(b) Sampling distribution of x̄ for n = 5: normal distribution.
(c) Sampling distribution of x̄ for n = 16: normal distribution.
(d) Sampling distribution of x̄ for n = 30: normal distribution.
(e) Sampling distribution of x̄ for n = 100: normal distribution.
Example 7-3

In a recent SAT, the mean score for all
examinees was 1020. Assume that the
distribution of SAT scores of all examinees is
normal with a mean of 1020 and a standard
deviation of 153. Let x̄ be the mean SAT score
of a random sample of certain examinees.
Calculate the mean and standard deviation of x̄
and describe the shape of its sampling
distribution when the sample size is
a) 16 b) 50 c) 1000
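Solution 7-3

a)
\( \mu_{\bar{x}} = \mu = 1020 \)
\( \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{153}{\sqrt{16}} = 38.250 \)

Figure 7.3 Population distribution (σ = 153) and sampling distribution of x̄ for n = 16 (\( \sigma_{\bar{x}} \) = 38.250), both centered at \( \mu_{\bar{x}} \) = μ = 1020 on the SAT scores axis.

b)
\( \mu_{\bar{x}} = \mu = 1020 \)
\( \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{153}{\sqrt{50}} = 21.637 \)

Figure 7.4 Population distribution (σ = 153) and sampling distribution of x̄ for n = 50 (\( \sigma_{\bar{x}} \) = 21.637), both centered at \( \mu_{\bar{x}} \) = μ = 1020 on the SAT scores axis.

c)
\( \mu_{\bar{x}} = \mu = 1020 \)
\( \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{153}{\sqrt{1000}} = 4.838 \)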
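The three standard errors in Solution 7-3 can be recomputed in a few lines. This is only a convenience sketch; the dictionary name `standard_errors` is made up for the illustration.

```python
from math import sqrt

mu, sigma = 1020, 153   # population mean and standard deviation of SAT scores

# Standard error of the sample mean for each sample size in the example.
standard_errors = {n: sigma / sqrt(n) for n in (16, 50, 1000)}

for n, se in standard_errors.items():
    print(f"n = {n:4d}: mean of x-bar = {mu}, standard deviation of x-bar = {se:.3f}")
```

The mean of x̄ stays at 1020 for every sample size, while its standard deviation shrinks in proportion to 1/√n.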

Figure 7.5 Population distribution (σ = 153) and sampling distribution of x̄ for n = 1000 (\( \sigma_{\bar{x}} \) = 4.838), both centered at \( \mu_{\bar{x}} \) = μ = 1020 on the SAT scores axis.
Sampling From a Population That
Is Not Normally Distributed
Central Limit Theorem
According to the central limit theorem, for a large
sample size, the sampling distribution of x̄ is
approximately normal, irrespective of the shape of
the population distribution. The mean and standard
deviation of the sampling distribution of x̄ are

\( \mu_{\bar{x}} = \mu \) and \( \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \)

The sample size is usually considered to be large if
n ≥ 30.
Figure 7.6 Population distribution and sampling
distributions of x̄.

(a) Population distribution.
(b) Sampling distribution of x̄ for n = 4.
(c) Sampling distribution of x̄ for n = 15.
(d) Sampling distribution of x̄ for n = 30: approximately normal distribution.
(e) Sampling distribution of x̄ for n = 80: approximately normal distribution.
Example 7-4

The mean rent paid by all tenants in a large city
is $1550 with a standard deviation of $225.
However, the population distribution of rents for
all tenants in this city is skewed to the right.
Calculate the mean and standard deviation of x̄
and describe the shape of its sampling
distribution when the sample size is
a) 30 b) 100
Solution 7-4

a) Let x̄ be the mean rent paid by a sample of 30
tenants.

\( \mu_{\bar{x}} = \mu = \$1550 \)
\( \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{225}{\sqrt{30}} = \$41.079 \)

Figure 7.7
(a) Population distribution, with μ = $1550 and σ = $225.
(b) Sampling distribution of x̄ for n = 30, with \( \mu_{\bar{x}} \) = $1550 and \( \sigma_{\bar{x}} \) = $41.079.

b) Let x̄ be the mean rent paid by a sample of 100
tenants.

\( \mu_{\bar{x}} = \mu = \$1550 \)
\( \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{225}{\sqrt{100}} = \$22.500 \)

Figure 7.8
(a) Population distribution, with μ = $1550 and σ = $225.
(b) Sampling distribution of x̄ for n = 100, with \( \mu_{\bar{x}} \) = $1550 and \( \sigma_{\bar{x}} \) = $22.500.
APPLICATIONS OF THE
SAMPLING DISTRIBUTION OF x̄

1. If we take all possible samples of the
same (large) size from a population and
calculate the mean for each of these
samples, then about 68.26% of the
sample means will be within one
standard deviation (\( \sigma_{\bar{x}} \)) of the population
mean.

Figure 7.9 \( P(\mu - 1\sigma_{\bar{x}} \le \bar{x} \le \mu + 1\sigma_{\bar{x}}) \): the shaded area is .6826 (.3413 on each side of μ).

2. If we take all possible samples of the
same (large) size from a population and
calculate the mean for each of these
samples, then about 95.44% of the
sample means will be within two standard
deviations (\( 2\sigma_{\bar{x}} \)) of the population mean.

Figure 7.10 \( P(\mu - 2\sigma_{\bar{x}} \le \bar{x} \le \mu + 2\sigma_{\bar{x}}) \): the shaded area is .9544 (.4772 on each side of μ).

3. If we take all possible samples of the
same (large) size from a population and
calculate the mean for each of these
samples, then about 99.74% of the
sample means will be within three
standard deviations (\( 3\sigma_{\bar{x}} \)) of the population
mean.

Figure 7.11 \( P(\mu - 3\sigma_{\bar{x}} \le \bar{x} \le \mu + 3\sigma_{\bar{x}}) \): the shaded area is .9974 (.4987 on each side of μ).
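The three areas can be recomputed from the standard normal distribution. The sketch below uses Python's `statistics.NormalDist`; the exact values differ from the slide figures in the last digit because the slides double four-decimal table entries.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

# Probability that a normal variable falls within k standard deviations of its mean.
coverage = {k: Z.cdf(k) - Z.cdf(-k) for k in (1, 2, 3)}

for k, prob in coverage.items():
    print(f"within {k} standard deviation(s): {prob:.4f}")
```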
Example 7-5

Assume that the weights of all packages of
a certain brand of cookies are normally
distributed with a mean of 32 ounces and a
standard deviation of .3 ounce. Find the
probability that the mean weight, x̄, of a
random sample of 20 packages of this
brand of cookies will be between 31.8 and
31.9 ounces.
Solution 7-5

\( \mu_{\bar{x}} = \mu = 32 \) ounces
\( \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{.3}{\sqrt{20}} = .06708204 \) ounce

z Value for a Value of x̄

The z value for a value of x̄ is calculated
as

\( z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}} \)

• For x̄ = 31.8: \( z = \frac{31.8 - 32}{.06708204} = -2.98 \)
• For x̄ = 31.9: \( z = \frac{31.9 - 32}{.06708204} = -1.49 \)
• P(31.8 < x̄ < 31.9) = P(-2.98 < z < -1.49)
= .4986 - .4319
= .0667

Figure 7.12 The shaded area between x̄ = 31.8 (z = -2.98) and x̄ = 31.9 (z = -1.49), to the left of \( \mu_{\bar{x}} \) = 32 (z = 0), is .0667.
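Solution 7-5 can be checked numerically as follows. This is a sketch that works directly with the normal sampling distribution of x̄ instead of a z table, so the result agrees with .0667 only up to table rounding.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 32.0, 0.3, 20

standard_error = sigma / sqrt(n)
x_bar_dist = NormalDist(mu, standard_error)   # sampling distribution of the sample mean

z_low = (31.8 - mu) / standard_error
z_high = (31.9 - mu) / standard_error
probability = x_bar_dist.cdf(31.9) - x_bar_dist.cdf(31.8)

print("standard error =", round(standard_error, 8))
print("z values:", round(z_low, 2), round(z_high, 2))
print("P(31.8 < x-bar < 31.9) =", round(probability, 4))
```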
Example 7-6

According to the College Board’s report, the
average tuition and fees at four-year private
colleges and universities in the United States was
$18,273 for the academic year 2002-2003 (The
Hartford Courant, October 22, 2002). Suppose that
the probability distribution of the 2002-2003 tuition
and fees at all four-year private colleges in the
United States was unknown, but its mean was
$18,273 and the standard deviation was $2100.
Let x̄ be the mean tuition and fees for 2002-2003
for a random sample of 49 four-year private U.S.
colleges. Assume that n/N ≤ .05.

a) What is the probability that the 2002-2003
mean tuition and fees for this sample was
within $550 of the population mean?
b) What is the probability that the 2002-2003
mean tuition and fees for this sample was
lower than the population mean by $400 or
more?
Solution 7-6

\( \mu_{\bar{x}} = \mu = \$18{,}273 \)
\( \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{2100}{\sqrt{49}} = \$300 \)

a)
• For x̄ = $17,723: \( z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}} = \frac{17{,}723 - 18{,}273}{300} = -1.83 \)
• For x̄ = $18,823: \( z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}} = \frac{18{,}823 - 18{,}273}{300} = 1.83 \)
• P(17,723 ≤ x̄ ≤ 18,823)
= P(-1.83 ≤ z ≤ 1.83) = .4664
+ .4664 = .9328

Therefore, the probability that the 2002-
2003 mean tuition and fees for this
sample of 49 four-year private U.S.
colleges was within $550 of the
population mean is .9328.

Figure 7.13 P($17,723 ≤ x̄ ≤ $18,823): the shaded area between x̄ = $17,723 (z = -1.83) and x̄ = $18,823 (z = 1.83) is .9328 (.4664 on each side of \( \mu_{\bar{x}} \) = $18,273).

b)
• For x̄ = $17,873: \( z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}} = \frac{17{,}873 - 18{,}273}{300} = -1.33 \)
• P(x̄ ≤ $17,873) = P(z ≤ -1.33)
= .5 - .4082
= .0918

Therefore, the probability that the 2002-
2003 mean tuition and fees for this
sample of 49 four-year private U.S.
colleges was lower than the population
mean by $400 or more is .0918.

Figure 7.14 P(x̄ ≤ $17,873): the required probability, the area to the left of x̄ = $17,873 (z = -1.33), is .0918; the area between z = -1.33 and z = 0 is .4082.
POPULATION AND SAMPLE
PROPORTIONS
The population and sample proportions,
denoted by p and p̂, respectively, are
calculated as

\( p = \frac{X}{N} \) and \( \hat{p} = \frac{x}{n} \)

where
– N = total number of elements in the population
– n = total number of elements in the sample
– X = number of elements in the population that
possess a specific characteristic
– x = number of elements in the sample that
possess a specific characteristic
Example 7-7

Suppose a total of 789,654 families live in a
city and 563,282 of them own homes. A
sample of 240 families is selected from this
city, and 158 of them own homes. Find the
proportion of families who own homes in
the population and in the sample. Find the
sampling error.
Solution 7-7

\( p = \frac{X}{N} = \frac{563{,}282}{789{,}654} = .71 \)
\( \hat{p} = \frac{x}{n} = \frac{158}{240} = .66 \)
Sampling error = \( \hat{p} - p = .66 - .71 = -.05 \)
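The arithmetic in Solution 7-7 is easy to reproduce. Note that the sketch below keeps full precision, so the sampling error comes out near -.055, whereas the slide first rounds both proportions to two decimals and obtains -.05.

```python
N, X = 789_654, 563_282   # families in the city, and those that own homes
n, x = 240, 158           # families in the sample, and those that own homes

p = X / N                 # population proportion
p_hat = x / n             # sample proportion
sampling_error = p_hat - p

print("p =", round(p, 2))
print("p-hat =", round(p_hat, 2))
print("sampling error =", round(sampling_error, 3))
```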
MEAN, STANDARD DEVIATION, AND
SHAPE OF THE SAMPLING
DISTRIBUTION OF p̂
• Sampling Distribution of p̂
• Mean and Standard Deviation of p̂
• Shape of the Sampling Distribution of p̂
Sampling Distribution of p̂

Definition
The probability distribution of the sample
proportion, p̂, is called its sampling
distribution. It gives the various values that
p̂ can assume and their probabilities.
Example 7-8

Boe Consultant Associates has five
employees. Table 7.6 gives the names of
these five employees and information
concerning their knowledge of statistics.

Table 7.6 Information on the Five Employees of Boe
Consultant Associates

Name    Knows Statistics
Ally    yes
John    no
Susan   no
Lee     yes
Tom     yes

• If we define the population proportion, p,
as the proportion of employees who know
statistics, then
p = 3 / 5 = .60
• Now, suppose we draw all possible
samples of three employees each and
compute the proportion of employees, for
each sample, who know statistics.

Total number of samples = \( {}_5C_3 = \frac{5!}{3!(5 - 3)!} = \frac{5 \cdot 4 \cdot 3 \cdot 2 \cdot 1}{3 \cdot 2 \cdot 1 \cdot 2 \cdot 1} = 10 \)
Table 7.7 All Possible Samples of Size 3 and the Value
of p̂ for Each Sample

Sample               Proportion Who Know Statistics (p̂)
Ally, John, Susan    1/3 = .33
Ally, John, Lee      2/3 = .67
Ally, John, Tom      2/3 = .67
Ally, Susan, Lee     2/3 = .67
Ally, Susan, Tom     2/3 = .67
Ally, Lee, Tom       3/3 = 1.00
John, Susan, Lee     1/3 = .33
John, Susan, Tom     1/3 = .33
John, Lee, Tom       2/3 = .67
Susan, Lee, Tom      2/3 = .67

Table 7.8 Frequency and Relative Frequency
Distribution of p̂ When the Sample Size Is 3

p̂      f    Relative Frequency
.33    3    3/10 = .30
.67    6    6/10 = .60
1.00   1    1/10 = .10
       Σf = 10    Sum = 1.00

Table 7.9 Sampling Distribution of p̂ When the
Sample Size Is 3

p̂      P(p̂)
.33    .30
.67    .60
1.00   .10
       ΣP(p̂) = 1.00
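Tables 7.7 through 7.9 can be regenerated by enumerating the samples. The sketch below is an illustration with made-up variable names; it uses `itertools.combinations` to list all samples of three employees and exact fractions for the proportions.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

# Table 7.6: whether each employee knows statistics.
knows_statistics = {"Ally": True, "John": False, "Susan": False, "Lee": True, "Tom": True}

# All possible samples of three employees (order does not matter).
samples = list(combinations(knows_statistics, 3))

# Sample proportion for each sample, then its frequency distribution.
p_hats = [Fraction(sum(knows_statistics[name] for name in s), 3) for s in samples]
freq = Counter(p_hats)
sampling_dist = {p_hat: Fraction(f, len(samples)) for p_hat, f in sorted(freq.items())}

mean_p_hat = sum(p_hat * prob for p_hat, prob in sampling_dist.items())

for p_hat, prob in sampling_dist.items():
    print(f"p-hat = {float(p_hat):.2f}   P(p-hat) = {float(prob):.2f}")
print("mean of p-hat =", float(mean_p_hat))
```

The mean of the ten sample proportions equals the population proportion p = .60.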
Mean and Standard Deviation of p̂

Mean of the Sample Proportion
The mean of the sample proportion, p̂, is
denoted by \( \mu_{\hat{p}} \) and is equal to the population
proportion, p. Thus,

\( \mu_{\hat{p}} = p \)

Standard Deviation of the Sample Proportion
The standard deviation of the sample
proportion, p̂, is denoted by \( \sigma_{\hat{p}} \) and is given by
the formula

\( \sigma_{\hat{p}} = \sqrt{\frac{pq}{n}} \)

where p is the population proportion, q = 1 – p,
and n is the sample size. This formula is used
when n/N ≤ .05, where N is the population size.

If the n/N ≤ .05 condition is not satisfied,
we use the following formula to calculate
\( \sigma_{\hat{p}} \):

\( \sigma_{\hat{p}} = \sqrt{\frac{pq}{n}} \sqrt{\frac{N - n}{N - 1}} \)

where the factor \( \sqrt{\frac{N - n}{N - 1}} \) is called the finite
population correction factor.
Shape of the Sampling
Distribution of p̂
Central Limit Theorem for Sample Proportion
According to the central limit theorem, the
sampling distribution of p̂ is approximately
normal for a sufficiently large sample size. In the
case of proportion, the sample size is considered
to be sufficiently large if np and nq are both
greater than 5 – that is, if
np > 5 and nq > 5
Example 7-9

The National Survey of Student Engagement
shows about 87% of freshmen and seniors rate
their college experience as “good” or “excellent”
(USA TODAY, November 12, 2002). Assume this
result is true for the current population of all
freshmen and seniors. Let p̂ be the proportion of
freshmen and seniors in a random sample of 900
who hold this view. Find the mean and standard
deviation of p̂ and describe the shape of its
sampling distribution.
Solution 7-9

\( p = .87 \) and \( q = 1 - p = 1 - .87 = .13 \)
\( \mu_{\hat{p}} = p = .87 \)
\( \sigma_{\hat{p}} = \sqrt{\frac{pq}{n}} = \sqrt{\frac{(.87)(.13)}{900}} = .011 \)
\( np = 900(.87) = 783 \) and \( nq = 900(.13) = 117 \)
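The figures in this solution can be confirmed with a short calculation. The variable names below are chosen for the illustration, and the population is treated as large enough that no finite population correction is needed.

```python
from math import sqrt

p, n = 0.87, 900
q = 1 - p

mean_p_hat = p                   # mean of the sample proportion
sd_p_hat = sqrt(p * q / n)       # standard deviation of the sample proportion

np_value, nq_value = n * p, n * q
approximately_normal = np_value > 5 and nq_value > 5

print("mean of p-hat =", mean_p_hat)
print("standard deviation of p-hat =", round(sd_p_hat, 3))
print("np =", round(np_value), " nq =", round(nq_value))
print("approximately normal:", approximately_normal)
```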
Solution 7-9

• np and nq are both greater than 5.
• Therefore, the sampling distribution of p̂ is
approximately normal with a mean of .87
and a standard deviation of .011, as shown
in Figure 7.15.
Figure 7.15 Sampling distribution of p̂: approximately normal, with \( \mu_{\hat{p}} \) = .87 and \( \sigma_{\hat{p}} \) = .011.
