Statistics II - Chapter 09 PDF

Copyright © 2014 Pearson Education. All rights reserved.
Statistics
For Business & Economics
II
Prof.dr.ir. Wouter VERBEKE
Drs. Jeroen BERREVOETS

Chapter 9
Sampling Distributions and Confidence Intervals for Proportions
Statistics II

Outline
Introductory example
9.1 The Distribution of Sample Proportions
9.2 A Confidence Interval for a Proportion
9.3 Margin of Error: Certainty vs. Precision
9.4 Choosing the Sample Size
What can go wrong
What have we learned


• Classroom experiment
• Media: bullying

• Response rate (RR) of credit card customers
• Sample RR = 21.1%
• Population RR > 20%
• Probability of default (default rate, DR) in a loan portfolio
• Year 2019 DR = 1.02%
• Population DR = ?
• What is the population?

9.1 The Distribution of Sample Proportions

• We usually cannot know the value of the true proportion of an event in a population, i.e., the population proportion.
• E.g., response rate, default rate, etc.
• Why?
• Therefore, we typically take a sample to estimate the proportion, i.e., the sample proportion.
• Notation:
• Population proportion: 𝑝 (parameter)
• Sample proportion: 𝑝̂ (estimate)

• When taking a sample to investigate a population proportion,
• We must realize that our sample proportion comes from only one of the possible samples that we could have taken (e.g., RR) or that could have occurred (e.g., DR).
The sample proportion is a random variable!
• To learn more about the variability of the sample proportion,
• We have to imagine how the sample proportion would vary across all possible samples.

• One way to do that is to simulate lots of samples of the same size using the same population proportion.
• In a simulation,
• We have only two possible outcomes for an event, which are labeled “success” and “failure”,
• We set the true proportion of successes to a known value,
• Draw random samples,
• And then record the sample proportion of successes.
• On the next slide is a histogram:
• Of 10,000 sample proportions,
• Each for a random sample of size 1000,
• Using p = 0.2 as the true proportion.

What does this distribution look like?
[Figure: histogram of 10,000 sample proportions, each for a random sample of size 1000, using p = 0.2 as the true proportion.]
Observations:
• Not every sample has a sample proportion equal to 0.2.
• Sample proportions bigger than 0.24 and smaller than 0.16 are rare.
• Most sample proportions are between 0.18 and 0.22.
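
The simulation described above can be sketched in a few lines of Python. This is only an illustration using the standard library. The function name `simulate_sample_proportions`, the seed, and the reduced number of samples (2,000 instead of the slide's 10,000, to keep the run short) are assumptions made for this example.

```python
import random
import statistics

def simulate_sample_proportions(p, n, n_samples, seed=0):
    """Return n_samples simulated sample proportions, each from n success/failure trials."""
    rng = random.Random(seed)
    proportions = []
    for _ in range(n_samples):
        # Each trial is a "success" with probability p.
        successes = sum(rng.random() < p for _ in range(n))
        proportions.append(successes / n)
    return proportions

# Fewer samples than on the slide (2,000 rather than 10,000) to keep the run short.
props = simulate_sample_proportions(p=0.2, n=1000, n_samples=2000)
print("mean of sample proportions:", round(statistics.mean(props), 4))
print("SD of sample proportions:  ", round(statistics.stdev(props), 4))
print("share between 0.18 and 0.22:", sum(0.18 <= x <= 0.22 for x in props) / len(props))
```

The printed mean should land close to 0.2, and most simulated proportions should fall between 0.18 and 0.22, in line with the observations listed above.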

• The distribution of proportions over many independent samples from the same population is called the sampling distribution of the proportions.
• E.g., the histogram on the previous slide
Can we model the sampling distribution?
• In other words: what is the theoretical sampling distribution?
• the distribution of all the sample proportions
• that would arise from all possible samples
• of the same size
• with a constant probability of a ‘success’, p

• The number of successes can be modeled by a Binomial distribution
• Which, in turn, can be modeled by a Normal distribution
• With mean $np$ and standard deviation $\sqrt{npq}$
• As long as $np$ and $nq$ are large enough (at least 10)
• The sample proportion, 𝑝̂, is the number of successes, 𝑋, divided by the sample size, 𝑛
• If the distribution of 𝑋 is Normal
• Then the distribution of 𝑋 divided by 𝑛, i.e., 𝑝̂, is Normal
The sampling distribution of 𝑝̂ is Normal

• The sampling distribution of 𝑝̂ is Normal
• The center of this Normal distribution is at the true proportion 𝑝:
• 𝑋 has mean $np$
• Then $X/n$ has mean $\frac{np}{n} = p$
• And has a standard deviation equal to $\sqrt{\frac{pq}{n}}$:
• 𝑋 has standard deviation $\sqrt{npq}$
• Then $X/n$ has standard deviation $\frac{\sqrt{npq}}{n} = \sqrt{\frac{npq}{n^2}} = \sqrt{\frac{pq}{n}}$
Rule used: if $X \sim N(\mu, \sigma)$, then $aX \sim N(a\mu, a\sigma)$ for $a > 0$.

Why is it called an error?
• Remember that the difference between sample proportions, referred to as sampling error, is not really an error.
• It’s just the variability you’d expect to see from one sample to another.
• A better term might be sampling variability.
• With the variability: $SD(\hat{p}) = \sqrt{\frac{pq}{n}}$

• The particular Normal model, $N\left(p, \sqrt{\frac{pq}{n}}\right)$,
• is a sampling distribution model for the sample proportion.
• It won’t work for all situations, but it works for most situations that you’ll encounter in practice.
Simulation:
• SD of sample proportions = 0.0126
• Theoretical SD = $\sqrt{\frac{pq}{n}} = \sqrt{\frac{(0.2)(0.8)}{1000}} = 0.0126$
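
As a quick check of the theoretical value, the standard deviation of the sampling distribution can be computed directly. The helper name `sd_sample_proportion` is an assumption made for this illustration.

```python
import math

def sd_sample_proportion(p, n):
    """Standard deviation of the sampling distribution of p-hat: sqrt(p*q/n), with q = 1 - p."""
    q = 1 - p
    return math.sqrt(p * q / n)

# The simulation setting used on the slides: p = 0.2, n = 1000.
print(round(sd_sample_proportion(0.2, 1000), 4))  # 0.0126
```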

It won’t work for all situations …

How Good Is the Normal Model?
• A model?
• Is a simplified or idealized representation of (a part of) reality
• Samples of size 1 or 2 just aren’t going to work very well, but the distributions of proportions of many larger samples have histograms that are remarkably close to a Normal model.
• And the model becomes a better and better representation of the distribution of the sample proportions as the sample size gets bigger.

Assumptions and Conditions
Most models are useful only when specific assumptions are true.
I. Independence Assumption:
The sampled values must be independent of each other.
II. Sample Size Assumption:
The sample size, n, must be large enough.

Assumptions and Conditions
A. Randomization Condition:
• If your data come from an experiment, subjects should have been randomly assigned to treatments.
• If you have a survey, your sample should be a simple random sample of the population.
• If some other sampling design was used, be sure the sampling method was not biased and that the data are representative of the population.

B. 10% Condition:
• If sampling is done without replacement,
• then the sample size, n, must be no larger than 10% of the population.
C. Success/Failure Condition:
• The sample size must be big enough
• so that both the number of “successes,” np, and the number of “failures,” nq, are expected to be at least 10.

Example:
Information on a packet of seeds claims that the germination rate is 92%.
Are conditions met to answer the question, “What is the probability that more than 95% of the 160 seeds in the packet will germinate?”
Quality control!
To germinate = to sprout, to bud
“They germinate in the spring, grow foliage, then produce flowers and finally seeds.”

Example (ctd.):
Independence:
• It is reasonable to assume the seeds will germinate independently from each other.
Randomization:
• The sample of seeds can be considered a random sample from all seeds from this producer.
10% Condition:
• The packet is less than 10% of all seeds manufactured.
Success/Failure Condition:
• np = (160)(0.92) = 147.2 ≥ 10
• nq = (160)(0.08) = 12.8 ≥ 10
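
The numeric conditions can be checked mechanically. The sketch below is an illustration only: the function name `check_conditions` and the population size of 100,000 seeds are hypothetical, and the independence and randomization conditions still have to be judged from how the data were collected.

```python
def check_conditions(n, p, population_size=None):
    """Check the numeric conditions for the Normal model of a sample proportion.

    Independence and randomization cannot be verified from numbers alone.
    """
    expected_successes = n * p
    expected_failures = n * (1 - p)
    result = {
        "expected_successes": expected_successes,
        "expected_failures": expected_failures,
        # Success/Failure Condition: both expected counts at least 10.
        "success_failure_ok": expected_successes >= 10 and expected_failures >= 10,
    }
    if population_size is not None:
        # 10% Condition: the sample is no larger than 10% of the population.
        result["ten_percent_ok"] = n <= 0.10 * population_size
    return result

# Seed packet example; the population size is a made-up figure for illustration.
seeds = check_conditions(n=160, p=0.92, population_size=100_000)
print(seeds)
```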

Example (ctd.):
• Information on a packet of seeds claims that the germination rate is 92%.
• What is the probability that more than 95% of the 160 seeds in the packet will germinate?
$N\left(0.92, \sqrt{\frac{(0.92)(0.08)}{160}}\right) = N(0.92, 0.021)$
$z = \frac{\hat{p} - p}{SD(\hat{p})} = \frac{0.95 - 0.92}{0.021} = 1.429$
$P(z \ge 1.429) = 0.0765$
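
The same calculation can be reproduced with Python's `statistics.NormalDist`. This is a sketch; the function name `prob_proportion_exceeds` and its optional `sd` argument are assumptions made for this example. With the slide's rounded SD of 0.021 the result is close to 0.0765. With the unrounded SD the probability comes out slightly higher, at about 0.081.

```python
import math
from statistics import NormalDist

def prob_proportion_exceeds(threshold, p, n, sd=None):
    """P(p-hat > threshold) under the Normal model N(p, sqrt(p*q/n)).

    Pass sd explicitly to reuse a rounded standard deviation, as the slide does.
    """
    if sd is None:
        sd = math.sqrt(p * (1 - p) / n)
    z = (threshold - p) / sd
    return 1 - NormalDist().cdf(z)

rounded = prob_proportion_exceeds(0.95, p=0.92, n=160, sd=0.021)  # slide's rounded SD
unrounded = prob_proportion_exceeds(0.95, p=0.92, n=160)          # SD computed in full precision
print(round(rounded, 4), round(unrounded, 4))
```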

Business Case
Foreclosures (guided example, p.304):
• Information on a portfolio of 90 mortgages
• Default rate = 13%
• What is the probability that no more than 15 will default?
• Plan:
• Assumptions & conditions
• Sampling distribution model and parameters
• Report conclusion

9.2 A Confidence Interval for a Proportion

Example:
• In April 2013, a Gallup Poll found that 1495 out of 3559 respondents thought economic conditions were getting better
• That is a sample proportion of 𝑝̂ = 1495 / 3559 = 42%
• We’d like to use this sample proportion to say something about what proportion, p, of the entire population thinks that economic conditions are getting better.
https://www.gallup.com/home.aspx

• We know that our sampling distribution model is centered at the true proportion, p
• We know the standard deviation of the sampling distribution is given by the formula below:
$SD(\hat{p}) = \sqrt{\frac{pq}{n}}$, where $q = 1 - p$
• We also know from the Central Limit Theorem that the shape of the sampling distribution is approximately Normal, when the sample is large enough

• We do not know 𝑝
• We do not know $\sqrt{\frac{pq}{n}}$
• We can estimate the standard deviation of the sampling distribution by using $\sqrt{\frac{\hat{p}\hat{q}}{n}}$
• An estimate of the standard deviation of a sampling distribution is called a standard error (SE)
$SE(\hat{p}) = \sqrt{\frac{\hat{p}\hat{q}}{n}} = \sqrt{\frac{(0.42)(1 - 0.42)}{3559}} = 0.008$

• Now, we use these facts to draw our best guess of the sampling distribution for the true proportion who think the economy is getting better:
• Because the distribution is Normal, we expect that about 95% of all samples of 3559 U.S. adults would have had sample proportions within two SEs of p.
• That is, we are 95% sure that 𝑝̂ is within 2×(0.008) of p.

• Given the sampling distribution, we are 95% sure that 𝑝̂ is within 2×(0.008) of p
• But we don’t know 𝑝!
• Key to using sampling distributions: reverse the perspective and look at the distribution from 𝑝̂’s perspective
• If you are 𝑝̂, there’s a 95% chance that you are no more than 2 SEs away from 𝑝
• I.e., there’s a 95% chance that p is no more than 2 SEs away from you
• So, if you reach out 2 SEs on both sides, you are 95% sure that p is in your grasp

• The 95% confidence interval for the true proportion of all US adults who think the economy is improving is given by:
42.0% ± 2 × 0.8%
42.0% ± 1.6%
40.4% to 43.6%
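
The interval above can be recomputed from the raw poll counts. This is a minimal sketch; the function name `two_se_interval` is an assumption made for this example. Because the code keeps the standard error unrounded, while the slide rounds it to 0.8%, the endpoints can differ from the slide's in the last displayed digit.

```python
import math

def two_se_interval(successes, n):
    """Approximate 95% interval: p-hat plus and minus 2 standard errors."""
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - 2 * se, p_hat + 2 * se

# Gallup example: 1495 of 3559 respondents.
low, high = two_se_interval(1495, 3559)
print(f"{low:.2%} to {high:.2%}")
```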

What Can We Say about a Proportion?
Here’s what we would like to be able to say; however, we can’t say most of these things…
1) “42.0% of all U.S. adults thought the economy was improving.”
However… there is no way to be sure that the population proportion is the same as the sample proportion.
2) “It is probably true that 42.0% of all U.S. adults thought the economy was improving.”
However… we can be pretty certain that whatever the true proportion is, it’s probably not exactly 42.0%.

3) “We don’t know the exact proportion of U.S. adults who thought the economy was improving but we know it is between 40.4% and 43.6%.”
However… we can’t know for sure that the true proportion is in this interval.
4) “We don’t know the exact proportion of U.S. adults who thought the economy was improving but the interval from 40.4% to 43.6% probably contains the true proportion.”
This is close to correct, however… what is meant by probably?

An appropriate interpretation of our confidence interval would be,
“We are 95% confident that between 40.4% and 43.6% of U.S. adults thought the economy was improving.”
• Statements like this are called confidence intervals.
• The confidence interval calculated and interpreted here is an example of a one-proportion z-interval.
Be precise in your speech!

Let’s think for a second about what we actually do here:
• We take a sample of 3559 respondents
• A statistical approach allows us to make the statement:
“We are 95% confident that between 40.4% and 43.6% of all U.S. adults thought the economy was improving.”

What Does “95% Confidence” Really Mean?
• What does it mean when we say we have 95% confidence that our interval contains the true proportion?
• Our uncertainty is about whether the particular sample we have at hand
• is one of the successful ones
• or one of the 5% that fail to produce an interval that captures the true value.
• We know the sample proportion varies from sample to sample.
• If other pollsters had collected samples, their confidence intervals would have been centered at the proportions they observed.

Below we see the confidence intervals for 20 simulated samples.
[Figure: confidence intervals for 20 simulated samples, plotted against the true proportion.]
• The purple dots are the simulated proportions of adults who thought the economy was improving.
• The orange segments show each sample’s confidence interval.
• The green line represents the true proportion of the entire population.
Note: Not all confidence intervals capture the true proportion.
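
The idea behind the figure can be explored with a small coverage simulation. This is a sketch under stated assumptions: the true proportion 0.42, the sample size 500, the 1,000 repetitions, the seed, and the function name `coverage_rate` are all choices made for this illustration, not values from the slides. It counts how often an interval of the form p̂ ± 1.96 SE contains the true proportion.

```python
import math
import random

def coverage_rate(p, n, n_samples, z=1.96, seed=0):
    """Fraction of simulated z-intervals that contain the true proportion p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        p_hat = sum(rng.random() < p for _ in range(n)) / n
        se = math.sqrt(p_hat * (1 - p_hat) / n)
        # The interval is a "winner" if it captures the true proportion.
        if p_hat - z * se <= p <= p_hat + z * se:
            hits += 1
    return hits / n_samples

print(coverage_rate(p=0.42, n=500, n_samples=1000))
```

The printed fraction should be close to 0.95, though it varies somewhat with the seed and the number of repetitions.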

• Though we will never be sure that our confidence interval contains the true population proportion, the Normal model assures us that 95% of the theoretically possible intervals are winners (covering the true population value) and only 5%, on average, miss the target.
• This is why we are 95% confident that our interval is a winner, and in writing our interpretations of intervals, we must be careful to say only this much.
• However, we do have to check whether using the Normal model is appropriate!

Be precise in your use of methods!
Independence Assumption:
• Check the Randomization Condition: the data must be sampled at random.
• Check the 10% Condition: if less than 10% of the population was sampled, it is safe to proceed.
Sample Size Assumption:
• Check the Success/Failure Condition: we must have at least 10 successes and 10 failures in our sample.

9.3 Margin of Error: Certainty vs. Precision

• The 95% confidence interval for a proportion is expressed as:
$\hat{p} \pm 2\,SE(\hat{p})$
• The extent of that interval on either side of 𝑝̂ is called the margin of error (ME).
• The general confidence interval can now be expressed in terms of the ME.
estimate ± ME
• Note: the ME depends on the confidence level, e.g., 95%

• The more confident we want to be, the larger the margin of error must be.
• We can be 100% confident that any proportion is between 0% and 100%,
• but we wouldn’t be very confident that it lies between 41.98% and 42.02%.
• Every confidence interval is a balance between certainty and precision.
• Tension between certainty and precision
• Fortunately, we can usually be both sufficiently certain and sufficiently precise to make useful statements.

• The choice of confidence level is somewhat arbitrary, but you must choose that level yourself.
• In practice, the most common confidence levels chosen are 90%, 95%, and 99%.
• Any percentage may be used
• But using something like 92.9% or 97.2% might be viewed with suspicion.
• Depending on the motivation!

Critical Values
• To change the confidence level, we’ll need to change the number of SEs to correspond to the new level.
• For any confidence level the number of SEs we must stretch out on either side of 𝑝̂ is called the critical value.
• Because a critical value is based on the Normal model, we denote it z*.
(z is always used to refer to values of the standard Normal distribution)
• For a 95% confidence level: z* = 1.96 instead of 2


Critical Values
• A 90% confidence interval has a critical value of 1.645.
• That is, 90% of the values are within 1.645 standard deviations from the mean.
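
Critical values for other confidence levels can be looked up in a Normal table or computed. The sketch below uses the inverse CDF of the standard Normal from Python's standard library. The function name `critical_value` is an assumption made for this example.

```python
from statistics import NormalDist

def critical_value(confidence_level):
    """z* such that the central `confidence_level` share of the standard Normal lies within +/- z*."""
    upper_tail = (1 - confidence_level) / 2
    return NormalDist().inv_cdf(1 - upper_tail)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: z* = {critical_value(level):.3f}")
# 90%: z* = 1.645
# 95%: z* = 1.960
# 99%: z* = 2.576
```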

Note:
• the size of the population does not directly affect the finding,
• only the size of the sample does

Guided Example
• In March 2013, workers at the greeting card company Edit66 took their bosses hostage.
• The company chiefs had informed employees who were to be laid off that they would not receive the severance pay they were legally entitled to.
• These incidents have been nicknamed “bossnappings,” and are not uncommon in France.
• A Paris Match poll found 30% of the French “approving” of such action, and 63% were understanding of or sympathetic to the action. Only 7% condemned the action.
• The Paris Match poll was based on a random representative sample of 1010 adults.

Guided Example
Research questions:
• What did other French adults think of this practice?
• Were they sympathetic? Understanding? Approving?
Plan:
• Statistical research question:
• What can we conclude about the proportion of all French adults who sympathize with the practice of bossnapping?
• Setup:
• A one-proportion z-interval allows us to calculate a confidence interval for the true proportion
• We choose a confidence level of 95%
• Check:
• Assumptions & conditions!

Guided Example (continued):
Independence assumption:
• Randomization Condition:
The sample was selected randomly.
• 10% Condition:
The sample is certainly less than 10% of the population.
Sample size assumption:
• Success/Failure Condition:
$n\hat{p} = (1010)(0.63) = 636 \ge 10$
$n\hat{q} = (1010)(0.37) = 374 \ge 10$
Conclusion of check:
• The conditions are satisfied so a one-proportion z-interval using the Normal model is appropriate.

Mechanics:
• Construct the 95% confidence interval.
$n = 1010$, $\hat{p} = 0.63$
$SE(\hat{p}) = \sqrt{\frac{(0.63)(0.37)}{1010}} = 0.015$
For a 95% confidence interval where the sampling model is Normal, z* = 1.96
$ME = z^* \times SE(\hat{p}) = 1.96(0.015) = 0.029$
$0.63 \pm 0.029$ or (0.601, 0.659)
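
The mechanics can be wrapped in a small function that takes the confidence level as a parameter. This is an illustrative sketch; the function name `one_proportion_z_interval` is an assumption. It keeps the standard error unrounded, so the margin of error comes out near 0.030 rather than the slide's 0.029, which is based on the SE rounded to 0.015.

```python
import math
from statistics import NormalDist

def one_proportion_z_interval(p_hat, n, confidence_level=0.95):
    """Return (lower, upper, margin_of_error) for a one-proportion z-interval."""
    z_star = NormalDist().inv_cdf(0.5 + confidence_level / 2)  # critical value
    se = math.sqrt(p_hat * (1 - p_hat) / n)                    # standard error
    margin = z_star * se
    return p_hat - margin, p_hat + margin, margin

# Paris Match example: p-hat = 0.63, n = 1010.
lower, upper, margin = one_proportion_z_interval(0.63, 1010)
print(f"ME = {margin:.4f}, interval = ({lower:.3f}, {upper:.3f})")
```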

Report conclusions:
• The polling agency Paris Match surveyed 1010 French adults and asked whether they approved, were sympathetic to or disapproved of recent bossnapping actions.
• Although we can’t know the true proportion of French adults who were sympathetic (without supporting outright), based on the survey we can be 95% confident that between 60.1% and 65.9% of all French adults were.

9.4 Choosing the Sample Size

• To get a narrower confidence interval without giving up confidence, we must choose a larger sample.
• Suppose a company wants to offer a new service
• They want to estimate, to within 3%, the proportion of customers who are likely to purchase this new service with 95% confidence.
How large a sample do they need?
• To answer this question, we look at the margin of error.
$ME = z^* \sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow 0.03 = 1.96 \sqrt{\frac{\hat{p}\hat{q}}{n}}$, so that $n = \frac{(z^*)^2\,\hat{p}\hat{q}}{ME^2}$
• This question can’t be answered directly because there are two unknown values, 𝑝̂ and n.

• We proceed by guessing the worst case scenario for 𝑝̂.
• We guess 𝑝̂ is 0.50 because this makes the SD (and therefore n) the largest.
• Alternatively, we may use an initial estimate for 𝑝̂
• We may now compute n:
$0.03 = 1.96 \sqrt{\frac{(0.5)(0.5)}{n}} \Rightarrow n = 1067.1$
• To be safe, always round up!
• We can conclude that the company will need at least 1068 respondents to keep the margin of error as small as 3% with confidence level 95%.
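
Solving the margin-of-error equation for n gives $n = (z^*)^2\,\hat{p}\hat{q} / ME^2$, which is easy to automate. The sketch below is an illustration; the function name `required_sample_size` and the default worst-case guess `p_guess=0.5` are assumptions following the reasoning above.

```python
import math
from statistics import NormalDist

def required_sample_size(margin_of_error, confidence_level=0.95, p_guess=0.5):
    """Smallest n for which z* * sqrt(p*q/n) does not exceed the desired margin of error."""
    z_star = NormalDist().inv_cdf(0.5 + confidence_level / 2)
    n = (z_star ** 2) * p_guess * (1 - p_guess) / margin_of_error ** 2
    return math.ceil(n)  # always round up

print(required_sample_size(0.03))  # 1068
```

Passing a smaller initial estimate, for example `p_guess=0.2`, yields a smaller required sample, which is why 0.5 is the safe worst-case choice.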

• Usually a margin of error of 5% or less is acceptable.
• To cut the margin of error in half, you will have to quadruple the sample size.
• The sample size in a survey is the number of respondents, not the number of questionnaires sent or phone numbers dialed, so increasing the sample size can dramatically increase the cost and time needed to collect the data.

What Can Go Wrong?

• Don’t confuse the sampling distribution with the distribution of the sample.
• Distribution for a single outcome in the sample
• Beware of observations that are not independent.
• Watch out for small samples.

• Be sure to use the right language to describe your confidence intervals.
• Don’t suggest that the parameter varies.
• Don’t claim that other samples will agree with yours.
• Don’t be certain about the parameter.
• Don’t forget: It’s about the parameter.
• Don’t claim to know too much.
• Do take responsibility.

Violations of Assumptions
• Watch out for biased sampling. Don’t forget the sources of bias in surveys.
• Think about independence. It is tough to check the assumption that values in a sample are mutually independent, but it pays to think about it.
• Statistical tests are not only about mechanics!
• Using data for decision making requires
• Technical expertise
• Domain expertise
• Be careful of sample size. The validity of the confidence interval for proportions may be affected by sample size.

What Have We Learned?

Model the variation in statistics from sample to sample with a sampling distribution.
• The sampling distribution of the sample proportion is approximately Normal as long as the sample size is large enough.
Understand that, usually, the mean of a sampling distribution is the value of the parameter estimated.
• For the sampling distribution of 𝑝̂, the mean is p.

Interpret the standard deviation of a sampling distribution.
• The standard deviation of a sampling model is the most important information about it.
• The standard deviation of the sampling distribution of a proportion is $\sqrt{\frac{pq}{n}}$, where $q = 1 - p$.

Construct a confidence interval for a proportion, p, as the statistic, 𝑝̂, plus and minus a margin of error.
• The margin of error consists of a critical value based on the sampling model times a standard error based on the sample.
• The critical value is found from the Normal model.
• The standard error of a sample proportion is calculated as $\sqrt{\frac{\hat{p}\hat{q}}{n}}$

Know and check the assumptions and conditions for finding and interpreting confidence intervals.
• Independence Assumption or Randomization Condition
• 10% Condition
• Success/Failure Condition
Be able to invert the calculation of the margin of error to find the sample size required, given a proportion, a confidence level, and a desired margin of error.

Interpret a confidence interval correctly.
• You can claim to have the specified level of confidence that the interval you have computed actually covers the true value.
Understand the importance of the sample size, n, in improving both the certainty (confidence level) and precision (margin of error).
• For the same sample size and proportion, more certainty requires less precision and more precision requires less certainty.

Questions?
