Professional Documents
Culture Documents
STAT609 SP23 LCN Unit3-Presentation2
STAT609 SP23 LCN Unit3-Presentation2
STATISTICAL INFERENCE
Chapter 9
Hypothesis Testing
© 2017 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
9-1 Introduction
© 2017 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Example 9.1: A New Pizza Style at
Pepperoni Pizza Restaurant
The manager of Pepperoni Pizza Restaurant has recently begun
experimenting with a new method of baking pizzas.
He would like to base the decision whether to switch from the
old method to the new method on customer reactions, so he
performs an experiment.
For 100 randomly selected customers who order a pepperoni
pizza for home delivery, he includes both an old-style and a free
new-style pizza.
He asks the customers to rate the difference between the pizzas
on a -10 to +10 scale, where -10 means that they strongly favor
the old style, +10 means they strongly favor the new style, and 0
means they are indifferent between the two styles.
How might he proceed by using hypothesis testing?
© 2017 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
9-2a Null and Alternative Hypotheses
The manager would like to prove that the new method provides
better-tasting pizza, so this becomes the alternative hypothesis.
The opposite, that the old-style pizzas are at least as good as the new-
style pizzas, becomes the null hypothesis.
He judges which of these are true on the basis of the mean rating
over the entire customer population, labeled μ.
If it turns out that μ≤ 0, the null hypothesis is true.
If μ> 0, the alternative hypothesis is true.
Usually, the null hypothesis is labeled H 0, and the alternative
hypothesis is labeled Ha.
In our example, they can be specified as H0:μ≤ 0 and Ha:μ> 0.
The null and alternative hypotheses divide all possibilities into two
nonoverlapping sets, exactly one of which must be true.
© 2017 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
9-2b One-Tailed versus Two-Tailed Tests
© 2017 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
9-2d Significance Level and Rejection
Region
To decide how strong the evidence in favor of the alternative hypothesis must
be to reject the null hypothesis, one approach is to prescribe the probability of
a type I error that you are willing to tolerate.
This type I error probability is usually denoted by α and is most commonly set equal
to 0.05.
The value of α is called the significance level of the test.
The rejection region is the set of sample data that leads to the rejection of the
null hypothesis.
The significance level, α, determines the size of the rejection region.
Sample results in the rejection region are called statistically significant at the α
level.
It is important to understand the effect of varying α:
If α is small, such as 0.01, the probability of a type I error is small, and a lot of
sample evidence in favor of the alternative hypothesis is required before the null
hypothesis can be rejected
When α is larger, such as 0.10, the rejection region is larger, and it is easier to reject
the null hypothesis.
© 2017 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
9-2e Significance from p-values
© 2017 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
9-3 Hypothesis Tests for a Population Mean
P-Value:
The p-value is also called the observed level of significance.
The smallest α level at which can be rejected.
If p-value < α, reject
If p-value α, do not reject
Ha P-value
>
<
≠
© 2017 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Example 9.1: Undergraduate Study Habits
slide (1 of 2)
A recent study asserts that, over the past five decades, the number
of hours that the average college student studies each week has
been steadily dropping (The Boston Globe, July 4, 2010). In 1961,
students invested 24 hours per week in their academic pursuits,
whereas today’s students study an average of 14 hours per week.
The dean randomly selects 35 students and asks their average study
time per week (in hours). From their responses, she calculates a
sample mean of 16.3714 hours and a sample standard deviation of
7.2155 hours. The dean would also like to test if the mean study
time of students at her university differs from today’s national
average of 14 hours per week. At the 5% significance level, what is
the conclusion to this test?
© 2017 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Example 9.1: Undergraduate Study Habits
slide (2 of 2)
Step 1: Specify the null and alternative hypotheses.
hours
hours
Step 2: Specify the significance level; .
Step 3: Calculate the test statistic and the p-value.
© 2017 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Example 9.2: A New Pizza Style at
Pepperoni Pizza Restaurant (slide 1 of 5)
Objective: To use a one-sample t test to see whether
consumers prefer the new-style pizza to the old style.
Use 5% significance level (
The file name: Pizza Ratings
Solution: The ratings for the 40 randomly selected
customers are shown on the next slide.
Calculate the test statistic using the borderline null
hypothesis value , and report how much probability is
beyond it in the right tail of the appropriate t distribution.
The right tail is appropriate because the alternative is one-
tailed of the “greater than” variety.
© 2017 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Example 9.2: A New Pizza Style at Pepperoni Pizza
Restaurant (slide 3 of 5)
SPSS Steps:
From the menus choose:
Analyze Compare Means One-Sample T Test..
Click in the Test Value box and enter the value that you will compare to. In
© 2017 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Example 9.2: A New Pizza Style at
Pepperoni Pizza Restaurant (slide 4 of 5)
t=2.816, P-value = 0.004.
Since , we reject the null hypothesis.
The t-value indicates the sample mean is slightly more
than 2.8 standard errors to the right of the null value,
which provides a lot of evidence in favor of the
alternative.
The manager of Pepperoni Pizza Restaurant can
conclude that the alternative hypothesis is true—and
presumably switch to the new-style pizza.
If the alternative is still one-tailed but of the “less than”
variety, the analysis remains virtually unchanged.
© 2017 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Example 9.2: A New Pizza Style at
Pepperoni Pizza Restaurant (slide 5 of 5)
A two-tailed test would be relevant if the pizza
manager were deciding which of two new-style
pizzas to switch to.
Calculate the same t-statistic, but because of the two-
tailed nature of the test, the previous p-value is
doubled.
The p-value is still less than .05, and the null
hypothesis can be rejected. The manager can conclude
the mean ratings of the two new pizzas are not the
same.
© 2017 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
9-4 Hypothesis Tests for Other Parameters
© 2017 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
9-4a Hypothesis Tests for a Population
Proportion
To test a population proportion p, recall that the sample proportion has a
sampling distribution that is approximately normal when the sample
size is reasonably large.
Specifically, the distribution of the standardized value
© 2017 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
9-4a Hypothesis Tests for a Population
Proportion
P-Value:
If p-value < α, reject
If p-value α, do not reject
Ha P-value
>
<
≠
© 2017 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Example 9.3: Customer Complaints at
Walpole Appliance Company (slide 1 of 2)
Objective: To use a test for a proportion to see
whether a new process of responding to complaint
letters results in an acceptably low proportion of
unsatisfied customers.
Solution: The manager’s goal is to reduce the
proportion of unsatisfied customers after 30 days
from 0.15 to 0.075 or less.
With the new process in place, the manager has
tracked 400 letter writers and found that 23 of them
are “unsatisfied” after 30 days.
© 2017 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Example 9.3: Customer Complaints at
Walpole Appliance Company (slide 2 of 2)
The hypotheses are .
The p-value is .
Because the p-value of 0.092 is less than , we reject the null
hypothesis. Therefore, at the 10% significance level, the
manager has indeed achieved her goal. The proportion of
unsatisfied customers after 30 days is reduced (less than 0.075).
© 2017 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
9-4b Hypothesis Tests for Differences
between Population Means
The comparison problem, where the difference between two
population means is tested, is one of the most important problems
analyzed with statistical methods.
The form of the analysis depends on whether the two samples are
independent or paired.
If the samples are paired, then the test is referred to as the t test for
difference between means from paired samples.
• A common case of dependent sampling is matched pairs.
Samples are paired or matched in some way.
© 2017 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
9-4b Hypothesis Tests for Differences
between Population Means
Example: measuring the weight of clients before and after a diet
plan
A pairing of observations.
Not the same individual who gets sampled twice
© 2017 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Example 9.4: Measuring the Effects of Soft-
Drink Cans (slide 1 of 4)
Objective: To use paired-sample t tests for differences
between means to see whether consumers rate the
attractiveness, and their likelihood to purchase, higher for a
new-style can than for the traditional-style can. The file
name: Soft-Drink Cans
Solution: Randomly selected customers are asked to rate
each of the following on a scale of 1 to 7:
Attractiveness of the traditional-style can (AO)
Attractiveness of the new-style can (AN)
,
P-value < 0.001
P-value=0.00000013232, so reject H0.
We conclude that there is
overwhelming evidence that
consumers, on average, rate the
attractiveness of the new design higher
than the attractiveness of the current
design.
© 2017 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Example 9.4: Measuring the Effects of Soft-
Drink Cans (slide 4 of 4)
SPSS Output:
Test statistic for independent samples test of difference between means when :
The t-value follows a t distribution with degrees of freedom (df) equal to (n1 +
n2 – 2) where the pooled standard deviation sp is given by:
© 2017 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
9-4b Hypothesis Tests for Differences
between Population Means
P-value
Ha P-value
© 2017 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Example 9.5: Designing a promotional
Web page to increase online sales (slide 1 of 4)
A marketing team designed a promotional Web page to increase
online sales. Visitors to www.name-of-this-company.com were
randomly directed to the old page or the new page. During this
A/B test, 300 visitors to the site were randomly assigned. The
131 visitors who were directed to the new page spent = $328 on
average (snew = $161); those directed to the old page spent =
$253 on average (sold = $155). (Assume that these samples are
large enough to satisfy the sample size condition with equal
variances).
Does the new page generate statistically significantly higher
sales than the old page? State the null hypothesis and whether
it’s rejected.
© 2017 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Example 9.5: Designing a promotional
Web page to increase online sales (slide 2 of 4)
The hypotheses are:
H0: new - old = 0 (i.e. H0: new = old )
Ha: new - old > 0 (i.e. Ha: new > old )
The pooled standard deviation, sp:
√
2 2
❑ (131−1)161 + ( 169−1 ) 155
𝑝 𝑠= =157.646
The t-statistic: 131+169−2
328 − 253
𝑡= =4.09
157.646
1
+
1
1 3 1 169
. The p-value is much less than 0.05, then reject
√[ ]
Conclusion:
We conclude that the new page generates statistically significantly higher
sales than the old page.
© 2017 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Example 9.5: Designing a promotional Web
page to increase online sales (slide 3 of 4)
SPSS Steps:
From the menus choose:
Analyze Compare Means Summary Independent-
Samples T Test.
Complete the dialog box as shown.
© 2017 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Example 9.5: Designing a promotional Web
page to increase online sales (slide 4 of 4)
SPSS Output:
© 2017 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Example 9.6: Productivity Due to Exercise
at Informatrix (slide 1 of 3)
Objective: To use a two-sample t test for the difference between
means to see whether regular exercise increases worker
productivity.
Solution: Informatrix Software Company installed exercise
equipment on site a year ago and wants to know if it has had an
effect on productivity.
The company gathered data on a sample of 80 randomly chosen
employees: 23 used the exercise facility regularly, 6 exercised
regularly elsewhere, and 51 admitted to being nonexercisers.
The 51 nonexercisers were compared to the 29 exercisers based
on the employees’ productivity over the year, as rated by their
supervisors on a scale of 1 to 25, 25 being the best.
The file name: Exercise & Productivity
© 2017 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Example 9.6: Productivity Due to Exercise
at Informatrix (slide 2 of 3)
SPSS Steps:
From the menus choose:
Analyze Compare Means Independent- Samples T Test.
Select Rating and move it onto the Test Variable(s) field.
Select Exerciser and move it onto the Grouping Variable field.
Click on the <Define Groups> button. In the window displayed,
enter Yes and No for Groups 1 and 2 respectively.
SPSS Output:
© 2017 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Example 9.6: Productivity Due to Exercise
at Informatrix (slide 3 of 3)
The hypotheses are:
where and are the mean ratings for the exerciser and non-exerciser
populations.
First check the equality of variances by testing
H0: 12 = 22 vs. Ha: 12 ≠ 22.
The p-value = 0.129 > 0.05, we do not reject H0. Therefore, we run the 2
samples t-test assuming equal variances.
The independent samples T test shows that the test statistic is 2.387, and the
p-value for a one-tailed test equals 0.009711 which is slightly less than 0.01.
We reject and conclude that the exercisers perform better, in terms of mean
ratings, than non-exercisers.
A 95% confidence interval for this mean difference is all positive; it extends
from 0.452 to 4.988.
© 2017 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
9-5 Tests for Normality
© 2017 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Example 9.6: Distribution of Metal Strip
Widths in Manufacturing (slide 3 of 5)
SPSS Output:
© 2017 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Example 9.6: Distribution of Metal Strip
Widths in Manufacturing (slide 4 of 5)
A popular, but informal, test of normality is the quantile-quantile
(Q-Q) plot.
It is basically a scatterplot of the standardized values from the
data set versus the values that would be expected if the data were
perfectly normally distributed.
SPSS Steps:
From the menus choose:
Analyze Descriptive Statistics Q-Q Plots.
Select Width and move it onto the Variables field.
Select Normal as a test distribution.
The Q-Q plot for the Width data appears in the next slide.
© 2017 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Example 9.6: Distribution of Metal Strip
Widths in Manufacturing (slide 5 of 5)
Although the points in this Q-Q plot do not all lie exactly on a
45° line, they are about as close to doing so as can be
expected from real data. Therefore, there is no reason to
question the normal hypothesis for these data—the same
conclusion as from the Lilliefors test.
© 2017 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.