
CHAPTER 8

INTERVAL
ESTIMATION
Review

• Population: all items of interest in a statistical problem


• Sample: subset of the population
• Given sample data, use sample statistics to estimate unknown population
parameters
• Two basic methods emerge from inferential statistics
• Estimation
• Hypothesis testing
Point Estimate

• Point estimator: statistic used to estimate a parameter
• Point estimate: a particular value of the estimator
• $\bar{X}$ is the estimator of $\mu$
• $\bar{P}$ is the estimator of $p$
• The point estimates change from sample to sample
Confidence Interval
• Confidence interval: provides a range of values that, with a certain level of confidence,
contains the population parameter of interest.
• Also referred to as an interval estimate.
• Construct a confidence interval as:
Point estimate ± Margin of error.

• Margin of error accounts for the variability of the estimator and the desired
confidence level of the interval.

LO 8.1
Confidence Interval for the Population Mean
When σ Is Known
• Consider a standard normal random variable Z. Then
$P(-1.96 \le Z \le 1.96) = 0.95$

LO 8.2
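Confidence Interval for the Population Mean
When σ Is Known
• Since $Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ for a normally distributed $\bar{X}$, we get
$P\left(-1.96 \le \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le 1.96\right) = 0.95$

• Rearranging terms, we obtain
$P\left(\mu - 1.96\,\sigma/\sqrt{n} \le \bar{X} \le \mu + 1.96\,\sigma/\sqrt{n}\right) = 0.95$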

LO 8.2
Confidence Interval for the Population Mean
When σ Is Known
$P\left(\mu - 1.96\,\sigma/\sqrt{n} \le \bar{X} \le \mu + 1.96\,\sigma/\sqrt{n}\right) = 0.95$

• This implies that there is a 95% probability that the sample mean $\bar{X}$ falls
within the interval $\mu \pm 1.96\,\sigma/\sqrt{n}$.
• Thus, if samples of size n are drawn repeatedly from a given population,
95% of the computed sample means, $\bar{x}$'s, will fall within the interval and the
remaining 5% will fall outside the interval.

LO 8.2
Confidence Interval for the Population Mean
When σ Is Known
• Example: A sample of 25 cereal boxes yields a mean weight of 1.02
pounds of cereal per box.
• Construct the 95% confidence interval for the mean weight of all cereal
boxes.
• Assume that the weight is normally distributed with a standard deviation of
0.03 pound.
$\bar{x} \pm 1.96\,\sigma/\sqrt{n} = 1.02 \pm 1.96\,(0.03/\sqrt{25}) = 1.02 \pm 0.012$
• With 95% confidence, we can report that the mean weight of all cereal
boxes falls between 1.008 and 1.032 pounds.
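
The cereal-box interval can be checked numerically. The following is a minimal Python sketch, not part of the original slides; the function name `z_confidence_interval` is hypothetical, and the critical value comes from the standard library's `statistics.NormalDist` rather than a z table.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(x_bar, sigma, n, confidence=0.95):
    """Return (lower, upper) for the population mean when sigma is known."""
    alpha = 1 - confidence
    # z_{alpha/2}: the value with area alpha/2 in the upper tail (about 1.96 for 95%)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin

# Cereal example from the slide: n = 25, sample mean 1.02, sigma = 0.03
lower, upper = z_confidence_interval(x_bar=1.02, sigma=0.03, n=25)
print(f"[{lower:.3f}, {upper:.3f}]")
```

Rounded to three decimals, the endpoints agree with the 1.008 and 1.032 pounds reported on the slide.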
Confidence Interval for the Population Mean
When σ Is Known
• Although 95% is common, we can construct an interval at any confidence level
between 0 and 100%.
• Let α denote the allowed probability that the procedure generates an
interval that does not contain μ.
• Confidence coefficient: 1 − α, the probability that the procedure generates an
interval that does contain μ
• Confidence level: 100(1 − α)%
• Examples
• A 95% confidence level has α = 0.05
• A 90% confidence level has α = 0.10
Interpreting a Confidence Interval

• Interpreting a confidence interval requires care.
• Incorrect: The probability that μ falls in the interval is 0.95.
• Correct: If numerous samples of size n are drawn from a
given population, then 95% of the intervals formed by the
formula will contain μ.
• Since there are many possible samples, we will be right
95% of the time, thus giving us 95% confidence.
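
The repeated-sampling interpretation can be illustrated with a small simulation. This is only a sketch under assumed values: the population mean 1.0, standard deviation 0.03, sample size 25, number of trials, and seed are all hypothetical choices, not data from the slides.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def coverage(mu=1.0, sigma=0.03, n=25, trials=2000, confidence=0.95, seed=1):
    """Fraction of simulated intervals x_bar +/- margin that contain mu."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / sqrt(n)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = fmean(sample)
        if x_bar - margin <= mu <= x_bar + margin:
            hits += 1
    return hits / trials

rate = coverage()
print(f"share of intervals containing mu: {rate:.3f}")
```

The printed share should land close to 0.95. The population mean stays fixed throughout; it is the interval that changes from sample to sample.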

LO 8.2
Width of a Confidence Interval

• Margin of error: $E = z_{\alpha/2}\,\sigma/\sqrt{n}$; confidence interval width: $2E$
• The width of the confidence interval is influenced by the:
I. Standard deviation σ.
II. Sample size n.
III. Confidence level 100(1 − α)%.

LO 8.3
Width of a Confidence Interval

I. For a given confidence level 100(1 − α)% and sample size n, the greater the
population standard deviation σ, the wider the confidence interval.
• Example: Let the standard deviation of the population of cereal boxes be
0.05 instead of 0.03. Then, the 95% confidence interval:
$1.02 \pm 1.96\,(0.05/\sqrt{25}) \approx 1.02 \pm 0.020$
• The width has increased from 0.024 to 2(0.020) = 0.040.

LO 8.3
Width of a Confidence Interval

II. For a given confidence level 100(1 − α)% and population standard
deviation σ, the smaller the sample size n, the wider the confidence
interval.
• Example: Instead of 25 observations, let the sample be 16 cereal
boxes of Granola Crunch. Then, the 95% confidence interval:
$1.02 \pm 1.96\,(0.03/\sqrt{16}) \approx 1.02 \pm 0.015$
• The width has increased from 0.024 to 2(0.015) = 0.030.

LO 8.3
Width of a Confidence Interval

III. For a given sample size n and population standard deviation σ, the
greater the confidence level 100(1 − α)%, the wider the confidence
interval.
• Example: Instead of the 95% confidence interval, compute the 99%
confidence interval:
$1.02 \pm 2.576\,(0.03/\sqrt{25}) \approx 1.02 \pm 0.015$
• The width has increased from 0.024 to 2(0.015) = 0.030.

LO 8.3
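
The three effects on the width (larger σ, smaller n, higher confidence level) can be compared side by side. A minimal sketch follows, using the cereal-box values from the slides; the helper name `ci_width` is hypothetical. Because it uses unrounded critical values, the results can differ from the slide figures in the third decimal.

```python
from math import sqrt
from statistics import NormalDist

def ci_width(sigma, n, confidence):
    """Width of the z-based interval: 2 * z_{alpha/2} * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return 2 * z * sigma / sqrt(n)

base = ci_width(0.03, 25, 0.95)          # original cereal example
wider_sigma = ci_width(0.05, 25, 0.95)   # I.   larger standard deviation
smaller_n = ci_width(0.03, 16, 0.95)     # II.  smaller sample
higher_conf = ci_width(0.03, 25, 0.99)   # III. higher confidence level

for label, w in [("base", base), ("sigma = 0.05", wider_sigma),
                 ("n = 16", smaller_n), ("99% level", higher_conf)]:
    print(f"{label:>12}: width = {w:.4f}")
```

Each of the three changes produces a width larger than the base case.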
Confidence Interval for the Population Mean
When σ Is Unknown
• Thus far we have assumed that the population standard deviation σ
is known.
• In reality, σ is rarely known
• The standard deviation is computed from deviations about the mean
• So it is highly unlikely that the standard deviation is known but the mean is
not
• In some instances σ is fairly stable and can be determined from prior
experience; it is then treated as known
The t Distribution

• If repeated samples of size n are taken from a normal population with a
finite variance, then the statistic $T = \dfrac{\bar{X} - \mu}{S/\sqrt{n}}$
follows the t distribution with (n − 1) degrees of freedom, df.

LO 8.4
Confidence Interval for the Population Mean When σ Is
Unknown
• Example: Recall the introductory case.
• The sample provides the sample mean $\bar{x}$ and the sample standard deviation $s$
• Assume that MPG follows a normal distribution
• Construct the 90% confidence interval for the population mean
• For a 90% confidence interval, α = 0.10 and α/2 = 0.05:
$\bar{x} \pm t_{0.05,df}\; s/\sqrt{n} = 96.52 \pm 3.66$
• With 90% confidence, the average MPG of all ultra-green cars is between 92.86 MPG and
100.18 MPG
• The manufacturer's claim that the ultra-green car will average 100 MPG cannot be rejected
since 100 falls within the interval.
Excel Application
• Example: Compute $t_{\alpha,df}$ for α = 0.025 using 2, 5, and 50 degrees of
freedom.
• Excel: T.INV(cumulprob, df)
• cumulprob is a cumulative probability, that is, the area under the $t_{df}$
curve to the left of the returned value; for an upper-tail area α = 0.025, use cumulprob = 1 − α = 0.975
• df is the degrees of freedom.
Confidence Interval

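• Constructing a Confidence Interval for μ When σ Is Unknown
• A 100(1 − α)% confidence interval for the population mean μ
when the population standard deviation σ is not known:
$\bar{x} \pm t_{\alpha/2,df}\, s/\sqrt{n}$ or $\left[\bar{x} - t_{\alpha/2,df}\, s/\sqrt{n},\ \bar{x} + t_{\alpha/2,df}\, s/\sqrt{n}\right]$
where s is the sample standard deviation and df = n − 1.

• This formula is valid only if $\bar{X}$ (approximately) follows a normal distribution.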

LO 8.5
Confidence Interval for the Population
Proportion
• The parameter p represents the proportion of successes in the population.
• Use the sample proportion $\bar{P}$ as the point estimator of the population
proportion p.
• $\bar{P}$ is approximately normally distributed when np ≥ 5 and n(1 − p) ≥ 5.
• A 100(1 − α)% confidence interval for the population proportion p is given by
$\bar{p} \pm z_{\alpha/2}\sqrt{\dfrac{\bar{p}(1 - \bar{p})}{n}}$
Selecting the Required Sample Size
• A confidence interval is built as point estimate ± margin of error.
• With a large margin of error, the interval becomes too wide
• Wide intervals are not always useful
• Precision means a low margin of error.
• Increasing the sample size reduces the margin of error
• A larger sample size improves precision but costs additional time and money
• Before collecting data, first decide on the sample size that is adequate
for what you want to accomplish.
Selecting the Required Sample Size
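• Consider a confidence interval for μ and let E denote the desired
margin of error:
$E = z_{\alpha/2}\,\sigma/\sqrt{n}$
• Rearrange to get $n = \left(\dfrac{z_{\alpha/2}\,\sigma}{E}\right)^2$
• Minimum sample size to estimate a confidence interval for the mean with
a desired margin of error: $n = \left(\dfrac{z_{\alpha/2}\,\hat{\sigma}}{E}\right)^2$
• $\hat{\sigma}$ is a reasonable estimate of σ (previous study; pilot sample standard
deviation)
• Use σ if known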
Selecting the Required Sample Size
• Example: Recall the introductory case.
• Previously constructed a confidence interval for the mean MPG of all ultra-green
cars
• Constrain the margin of error to within 2 MPG
• The lowest MPG in the population is 76 and the highest is 118
• How large a sample is needed to compute the 90% confidence interval of the
population mean?
• Estimate the standard deviation with $\hat{\sigma} = \text{range}/4 = (118 - 76)/4 = 10.5$
$n = \left(\dfrac{1.645 \times 10.5}{2}\right)^2 = 74.58$
• Round up to 75.
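
The sample-size calculation can be sketched in Python. This sketch assumes the standard deviation is estimated by the range/4 rule of thumb, an assumption that reproduces the 75 on the slide; the function name `required_sample_size` is hypothetical.

```python
from math import ceil
from statistics import NormalDist

def required_sample_size(sigma_hat, margin, confidence):
    """Smallest n with z_{alpha/2} * sigma_hat / sqrt(n) <= margin."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * sigma_hat / margin) ** 2)   # always round up

# Assumption: estimate sigma as range / 4 from the lowest and highest MPG
sigma_hat = (118 - 76) / 4
n_required = required_sample_size(sigma_hat, margin=2, confidence=0.90)
print(sigma_hat, n_required)
```

Rounding up rather than to the nearest integer ensures that the margin of error does not exceed the target.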
Hypothesis Testing
Introductory Case: Are today’s college
students studying hard or hardly studying?
• A recent study asserts that, over the past five decades, the number of hours that
the average college student studies each week has been steadily dropping (The
Boston Globe, July 4, 2010).
• In 1961, students invested 24 hours per week in their academic pursuits, whereas
today’s students study an average of 14 hours per week.
• The Dean randomly selects 35 students and asks about their average study time
per week. Using these results, she wants to:
1. Determine if the mean study time of students at her university is below the
1961 national average of 24 hours per week.
2. Determine if the mean study time of students at her university differs from
today’s national average of 14 hours per week.
Hypothesis Testing
• Hypothesis tests resolve conflicts between two
competing opinions (hypotheses).
• In a hypothesis test, we define:
• H0, the null hypothesis, the presumed default state of
nature or status quo.
• HA, the alternative hypothesis, a contradiction of the
default state of nature or status quo.
Hypothesis Testing
• We use sample information to make inferences
regarding the unknown population parameters of
interest.
• So, you are making inferences about the population!
• We conduct hypothesis tests to determine if sample
evidence contradicts H0.
• The null hypothesis is the hypothesis of no difference
Null Hypothesis and alternative hypothesis
• Null hypothesis, H0, states the status quo.
• Alternative hypothesis, HA, states whatever we wish to
establish (i.e., contests the status quo).
• Use the following signs in hypothesis tests:
H0: =, ≥, ≤
HA: ≠, <, >
Hypothesis Testing Result
• Once the test has been carried out, you either “reject H0” or “do not
reject H0”
• “Do not reject H0” does not mean H0 is true. You just don’t
have sufficient evidence to reject H0
• Rejecting H0 suggests HA may be true
Example
• A trade group predicts that back-to-school spending will average
$606.40 per family this year. A different economic model is needed if
the prediction is wrong. Specify the null and the alternative
hypotheses to determine if a different economic model is needed.
Example 2
• It is generally believed that at least 60% of the residents in a small town in
Texas are happy with their lives. A sociologist wonders whether recent
economic woes have adversely affected the happiness level in this town.
Specify the null and the alternative hypotheses to determine if the
sociologist’s concern is valid.
Construct the null and the alternative
hypotheses for the following claims:
• “I am going to get the majority of the votes to win this election.”
• “I suspect that your 10-inch pizzas are, on average, less than 10 inches
in size.”
• “I will have to fine the company since its tablets do not contain an
average of 250 mg of ibuprofen as advertised.”
One-Tailed vs Two-Tailed Hypothesis
Tests
• Two-Tailed Test
• Reject H0 on either side of the hypothesized value of
the population parameter.
• For example:
H0: μ = μ0 versus HA: μ ≠ μ0
H0: p = p0 versus HA: p ≠ p0
• The “≠” symbol in HA indicates that both tail areas of
the distribution will be used to make the decision
regarding the rejection of H0.
One-Tailed vs Two-Tailed Hypothesis
Tests
• One-Tailed Test
• Reject H0 only on one side of the hypothesized value
of the population parameter.
• For example:
H0: μ ≤ μ0 versus HA: μ > μ0 (right-tail test)
H0: μ ≥ μ0 versus HA: μ < μ0 (left-tail test)
• The inequality in HA determines which tail area will be
used to make the decision regarding the rejection of
H0.
Steps to Formulate Hypotheses
• Identify the relevant population parameter of interest (e.g., p).
• Determine whether it is a one- or a two-tailed test.
• Include some form of the equality sign in H0 and use HA to establish a
claim.
Example
• A trade group predicts that back-to-school spending will
average $606.40 per family this year. A different economic
model is needed if the prediction is wrong.
1. Parameter of interest is μ since we are interested in the average
back-to-school spending.
2. Since we want to determine if the population mean differs from
$606.40 (i.e., ≠), it is a two-tail test.
3. H0: μ = 606.40
HA: μ ≠ 606.40
Example
• A television research analyst wishes to test a claim that more
than 50% of the households will tune in for a TV episode.
Specify the null and the alternative hypotheses to test the
claim.
1. Parameter of interest is p since we are interested in the
proportion of households.
2. Since the analyst wants to determine whether p > 0.50, it is a
one-tail test.
3. H0: p ≤ 0.50
HA: p > 0.50
From previous examples, determine the
parameter of interest and whether the test will
be two tail or one tail
• “I am going to get the majority of the votes to win this election.”
• “I suspect that your 10-inch pizzas are, on average, less than 10 inches
in size.”
• “I will have to fine the company since its tablets do not contain an
average of 250 mg of ibuprofen as advertised.”
Type I and Type II Errors
• Type I error: Committed when we reject H0 when H0 is
actually true.
• Type II error: Committed when we do not reject H0 when
H0 is actually false.

Remember, Type I errors require action or rejection while Type II errors
require inaction or failure to reject.
Example
• Consider the following competing hypotheses that
relate to the court of law.
H0: An accused person is innocent
HA: An accused person is guilty
• Consequences of Type I and Type II errors:
• Type I error: Conclude that the accused is guilty when, in
reality, she is innocent.
• Type II error: Conclude that the accused is innocent when,
in reality, she is guilty.
Intermezzo
Remember the story of a boy who cried wolf?

At first, he cried wolf, people believed him, but there was no wolf
Then, he cried wolf, people did not believe him while there was indeed
a wolf

Which type of errors did the village people make in the two cases?
Exercise
• The screening process for detecting a rare disease is not perfect.
Researchers have developed a blood test that is considered fairly
reliable. It gives a positive reaction in 98% of the people who have that
disease. However, it erroneously gives a positive reaction in 3% of the
people who do not have the disease. Consider the null hypothesis “the
individual does not have the disease” to answer the following questions.
• What is the probability of a Type I error?
• What is the probability of a Type II error?
• What are the consequences of Type I and Type II errors?
• What is wrong with the nurse’s analysis, “The blood test result has proved that
the individual is free of disease”?
Hypothesis Test of the Population
Mean When σ Is Known
• Basic principle: First assume that H0 is true and then
determine if sample evidence contradicts this assumption.
• Two approaches to hypothesis testing:
• The p-value approach.
• The critical value approach.
Test Statistic (σ Is Known)
• The test statistic for the hypothesis test of the population mean μ when
the population standard deviation σ is known:
$z = \dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$
where μ0 is the hypothesized mean value.

• Valid only if $\bar{X}$ (approximately) follows a normal distribution:
1. If the underlying population is normally distributed;
2. If the sample size n is sufficiently large, that is, n ≥ 30 (central
limit theorem).
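P-Value Approach (σ Is Known)
• p-value: the likelihood of obtaining a sample mean that is at
least as extreme as the one derived from the sample, under the
assumption that the null hypothesis is true as an equality (μ = μ0).
• The calculation of the p-value depends on the specification of the
alternative hypothesis:
HA: μ > μ0 (right-tailed test): p-value = P(Z ≥ z)
HA: μ < μ0 (left-tailed test): p-value = P(Z ≤ z)
HA: μ ≠ μ0 (two-tailed test): p-value = 2P(Z ≥ z) if z > 0, or 2P(Z ≤ z) if z < 0
• Decision rule: Reject H0 if p-value < α.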

P-Value

• The p-value is determined by the specification of the competing
hypotheses.
• Reject H0 if p-value < α
Four Step Procedure Using The p-value
Approach
• Step 1. Specify the null and the alternative
hypotheses.
• Step 2. Specify the significance level α (i.e., the
allowed probability of making a Type I error).
• Step 3. Calculate the value of the test statistic and the
p-value.
• Step 4. State the conclusion and interpret the results.
Example
• A sample of 30 households showed that the sample mean of
back-to-school spending is $622.85. It is believed that back-to-
school spending is normally distributed with a population
standard deviation of $65. An analyst wishes to test if the
average back-to-school spending differs from the $606.40 per family
predicted by the trade group at the 5% significance level.
• Step 1. State the hypotheses:
H0: μ = 606.40
HA: μ ≠ 606.40
• Step 2. The allowed probability of a Type I error is equivalent to the
significance level of the test, which is given as α = 0.05
• Step 3. The population is normally distributed with a known standard
deviation, σ = 65, so the test statistic is
$z = \dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \dfrac{622.85 - 606.40}{65/\sqrt{30}} = 1.39$
• Compute the p-value:
• Since HA: μ ≠ 606.40, this is a two-tailed test.
• For a two-tailed test, p-value = 2P(Z ≥ 1.39).
• P(Z ≥ 1.39) ≈ 0.0828.
• p-value ≈ 2 × 0.0828 ≈ 0.1657.
• Step 4. Since 0.1657 > 0.05, we do not reject H0. At the 5%
significance level, we cannot conclude that average back-to-school
spending differs from $606.40.
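
The four steps for the back-to-school example can be traced in a short Python sketch. It is an illustration, not the slides' own method of calculation; the function name `z_test_two_tailed` is hypothetical, and the normal CDF from `statistics.NormalDist` replaces the z table.

```python
from math import sqrt
from statistics import NormalDist

def z_test_two_tailed(x_bar, mu0, sigma, n):
    """Return (z, p_value) for H0: mu = mu0 versus HA: mu != mu0, sigma known."""
    z = (x_bar - mu0) / (sigma / sqrt(n))
    # Two-tailed p-value: twice the area beyond |z| in one tail
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

alpha = 0.05
z, p_value = z_test_two_tailed(x_bar=622.85, mu0=606.40, sigma=65, n=30)
reject = p_value < alpha
print(f"z = {z:.2f}, p-value = {p_value:.4f}, reject H0: {reject}")
```

The computed statistic and p-value agree with the slide's 1.39 and 0.1657, so H0 is not rejected at the 5% level.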
Confidence Intervals and Two-Tailed
Hypothesis Tests
• Given the significance level α, we can use the sample
data to construct a 100(1 − α)% confidence interval for
the population mean μ.
• Decision Rule:
• Reject H0 if μ0 does not fall within the confidence interval.
• Do not reject H0 if μ0 falls within the confidence interval.
Confidence Intervals and Two-Tailed
Hypothesis Tests
• The general specification for a 100(1 − α)% confidence
interval of the population mean μ when the population
standard deviation σ is known is computed as
$\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$ or $\left[\bar{x} - z_{\alpha/2}\,\sigma/\sqrt{n},\ \bar{x} + z_{\alpha/2}\,\sigma/\sqrt{n}\right]$
• Decision rule: Reject H0 if $\mu_0 < \bar{x} - z_{\alpha/2}\,\sigma/\sqrt{n}$
or if $\mu_0 > \bar{x} + z_{\alpha/2}\,\sigma/\sqrt{n}$
Recall the hypothesis test about the average
back-to-school spending.

• A trade group predicts that back-to-school spending
will average $606.40 per family this year. A different
economic model is needed if the prediction is wrong.
• At the 95% confidence level, we find zα∕2 = z0.025 = 1.96.
• Using n = 30, $\bar{x}$ = 622.85, and σ = 65, compute the confidence
interval:
$622.85 \pm 1.96\,(65/\sqrt{30}) = 622.85 \pm 23.26$
• The confidence interval is [599.59, 646.11].
• Conclusion: Since the hypothesized value of the population
mean μ0 = 606.40 falls within the 95% confidence interval,
we do not reject H0.
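
The confidence-interval decision rule can be sketched the same way. The variable names here are hypothetical; the inputs are the values given on the slide.

```python
from math import sqrt
from statistics import NormalDist

x_bar, sigma, n = 622.85, 65, 30
mu0, alpha = 606.40, 0.05

z = NormalDist().inv_cdf(1 - alpha / 2)     # z_{alpha/2}, about 1.96
margin = z * sigma / sqrt(n)
lower, upper = x_bar - margin, x_bar + margin

# Two-tailed decision rule: reject H0 only if mu0 lies outside the interval
reject = not (lower <= mu0 <= upper)
print(f"[{lower:.2f}, {upper:.2f}], reject H0: {reject}")
```

Since 606.40 lies inside the interval, the conclusion matches the p-value approach for the same α.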
Exercise
• A researcher wants to determine if the population mean is greater
than 45. A random sample of 36 observations yields a sample mean of
47. Assume that the population standard deviation is 8.
• Specify the competing hypotheses to test the researcher’s claim.
• Calculate the value of the test statistic.
• Find the p-value.
• At the 5% significance level, what is the conclusion?
Exercise
• According to the Centers for Disease Control and Prevention
(February 18, 2016), 1 in 3 American adults don’t get enough sleep. A
researcher wants to determine if Americans are sleeping less than the
recommended 7 hours of sleep on weekdays. He takes a random
sample of 150 Americans and computes the average sleep time of 6.7
hours on weekdays. Assume that the population is normally
distributed with a known standard deviation of 2.1 hours. Test the
researcher’s claim at α = 0.01.
Test Statistic for μ When σ Is Unknown
• When the population standard deviation σ is unknown,
the test statistic for testing the population mean μ is
assumed to follow the $t_{df}$ distribution with (n − 1) degrees
of freedom (df).
• The value of $t_{df}$ is computed as $t_{df} = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$.
• This formula is valid only if $\bar{X}$ (approximately) follows a normal
distribution.
Example
• In the introductory case, the dean wonders if students at the university
study less than the 1961 national average of 24 hours per week. She
randomly selects 35 students and asks their average study time per week
(in hours). From their responses, the sample mean is 16.3714 hours and the
sample standard deviation is 7.2155 hours.
• Step 1. State the hypotheses:
H0: μ ≥ 24
HA: μ < 24
Thus, μ0 = 24.
• Step 2. Set the significance level for this test to be 5%, i.e., α =
0.05.
• Step 3. $\bar{X}$ is approximately normally distributed since n = 35 ≥ 30 (central limit theorem), so
the test statistic is
$t_{34} = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}} = \dfrac{16.3714 - 24}{7.2155/\sqrt{35}} = -6.255$
• Compute the p-value:
• Since HA: μ < 24, this is a left-tailed test.
• For a left-tailed test, p-value = P(T34 ≤ −6.255):
• P(T34 ≤ −6.255) < 0.005.
• p-value < 0.05.
• Step 4. Since p-value < 0.05, we reject H0. At the 5% significance
level, we conclude that the average study time at the university is less
than 24 hours per week.
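
The test statistic in Step 3 can be reproduced with a few lines of Python. This sketch computes only the statistic and its degrees of freedom, because the Python standard library has no t distribution; the p-value would still come from a t table or a spreadsheet function. The function name `t_statistic` is hypothetical.

```python
from math import sqrt

def t_statistic(x_bar, mu0, s, n):
    """Return (t, df) for a test of the mean when sigma is unknown."""
    t = (x_bar - mu0) / (s / sqrt(n))
    return t, n - 1

# Study-time example: n = 35, sample mean 16.3714, sample std dev 7.2155
t_value, df = t_statistic(x_bar=16.3714, mu0=24, s=7.2155, n=35)
print(f"t_{df} = {t_value:.3f}")
```

A statistic this far into the left tail leaves a very small area below it, consistent with the bound p-value < 0.005 on the slide.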
Exercise
• A local brewery wishes to ensure that an average of 12 ounces of beer
is used to fill each bottle. In order to analyze the accuracy of the
bottling process, the bottler takes a random sample of 48 bottles. The
sample mean weight and the sample standard deviation of the bottles
are 11.80 ounces and 0.8 ounce, respectively.
• State the null and the alternative hypotheses to test if the accuracy of the
bottling process is compromised.
• Do you need to make any assumption regarding the population for testing?
• Calculate the value of the test statistic and the p-value.
• At α = 0.05, what is the conclusion to the test? Make a recommendation to
the bottler.
Hypothesis Test for the Population Proportion
• The test statistic for the hypothesis test of the population proportion
p is
$z = \dfrac{\bar{p} - p_0}{\sqrt{p_0(1 - p_0)/n}}$

• where $\bar{p}$ is the sample proportion;
p0 is the hypothesized value of the population proportion.
• This formula is valid only if $\bar{P}$ (approximately) follows a normal
distribution.
• $\bar{P}$ can be approximated by a normal distribution if np ≥ 5 and n(1 − p)
≥ 5.
Example
• A popular weekly magazine asserts that fewer than 40% of
households have changed their lifestyles because of
environmental concerns. A recent survey of 180 households
finds that 67 households have made lifestyle changes due to
environmental concerns.
• Step 1. State the hypotheses:
H0: p ≥ 0.40
HA: p < 0.40
Thus, p0 = 0.40.
• Step 2. Set the significance level for this test to be 5%, i.e., α = 0.05.
• Step 3. Since both np0 and n(1 − p0) exceed 5, the normal
approximation is justified: np0 = 180 × 0.40 = 72 ≥ 5 and
n(1 − p0) = 180 × 0.60 = 108 ≥ 5
• Compute the test statistic using $\bar{p}$ = 67/180 = 0.3722:
$z = \dfrac{0.3722 - 0.40}{\sqrt{0.40(1 - 0.40)/180}} = -0.76$
• Compute the p-value:
• Since HA: p < 0.40, this is a left-tailed test.
• For a left-tailed test, p-value = P(Z ≤ −0.76) = 0.2234.
• Step 4. Since p-value > 0.05, we do not reject H0. At the 5%
significance level, we cannot conclude that fewer than 40% of
households have changed their lifestyles because of environmental
concerns.
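
The proportion test can be sketched in Python as well, including the normal-approximation check with n = 180. The function name `proportion_z_test_left` is hypothetical, and the normal CDF from `statistics.NormalDist` replaces the z table.

```python
from math import sqrt
from statistics import NormalDist

def proportion_z_test_left(successes, n, p0):
    """Left-tailed z test for H0: p >= p0 versus HA: p < p0."""
    # Normal approximation condition, evaluated at the hypothesized p0
    if n * p0 < 5 or n * (1 - p0) < 5:
        raise ValueError("normal approximation not justified")
    p_bar = successes / n
    z = (p_bar - p0) / sqrt(p0 * (1 - p0) / n)
    p_value = NormalDist().cdf(z)    # area to the left of z
    return p_bar, z, p_value

p_bar, z, p_value = proportion_z_test_left(successes=67, n=180, p0=0.40)
reject = p_value < 0.05
print(f"p_bar = {p_bar:.4f}, z = {z:.2f}, p-value = {p_value:.4f}, reject H0: {reject}")
```

The output agrees with the slide: z is about −0.76, the p-value is about 0.2234, and H0 is not rejected.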
