STAT 2006 Chapter 3

Hypothesis Testing
Presented by
Simon Cheung
Email: kingchaucheung@cuhk.edu.hk
Department of Statistics, The Chinese University of Hong Kong
STAT 2006 - Jan 2022 1

Introduction
Steps to conduct a hypothesis test:

• Make an assumption about the population parameter(s).
• Collect evidence in the form of data.
• Decide, based on the evidence, whether or not to reject the initial assumption.

Hypothesis testing about proportion
Example. A six-sided die is tossed 1000 times, and 200 fours are observed. Is there evidence to conclude that the die is biased?
Let $p$ be the probability of getting a 4. If the die is unbiased, $p = 1/6$. This is our initial assumption. It is called the null hypothesis, and we write $H_0: p = 1/6$. The next step is to collect evidence (data). We know that the die was tossed $n = 1000$ times, and $y = 200$ fours were observed. Hence, an estimate of $p$ is $\hat{p} = 200/1000 = 0.20$. To make a judgement about whether or not to reject the null hypothesis, we note that, by the central limit theorem, the large sample distribution of $\hat{P}$ is $N\left(p, \frac{p(1-p)}{n}\right)$. Based on our initial assumption (under $H_0$), the large sample distribution of $\hat{P}$ is $N\left(\frac{1}{6}, \frac{5}{36000}\right)$.

Hence, under $H_0$,
$$P\left(\hat{P} \ge 0.20 \mid p = \tfrac{1}{6}\right) = P\left(Z \ge \frac{0.20 - \frac{1}{6}}{\sqrt{5/36000}}\right) = P(Z \ge 2.82843) = 0.00234.$$
With our initial assumption that $p = 1/6$, there is only a 0.23% chance of observing 200 or more fours in 1000 tosses of the die. Is the evidence provided by the data more extreme than we would expect under $H_0$?
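The tail probability in the die example can be checked numerically. The following is a minimal sketch using only the Python standard library; the function name `upper_tail_p_value` is an illustrative choice, not part of the course material.

```python
from math import sqrt
from statistics import NormalDist

def upper_tail_p_value(y, n, p0):
    """Large-sample P(P_hat >= y/n | p = p0), via the normal approximation."""
    p_hat = y / n
    se0 = sqrt(p0 * (1 - p0) / n)      # standard deviation of P_hat under H0
    z = (p_hat - p0) / se0             # standardized observed proportion
    return z, 1 - NormalDist().cdf(z)  # upper-tail probability

z, p_value = upper_tail_p_value(y=200, n=1000, p0=1 / 6)
print(f"z = {z:.5f}, tail probability = {p_value:.5f}")
```

The printed values should agree with 2.82843 and 0.00234 up to rounding.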

Suppose that we decide to reject the null hypothesis $H_0: p = 1/6$ in favor of the alternative hypothesis $H_A: p > 1/6$. It makes sense to reject the null hypothesis when $\hat{p} - 1/6 > C$, for some constant $C > 0$. The subset of the sample space where $\hat{p} - 1/6 > C$ is called the critical region, or the rejection region, of the test, and $C$ is called the critical value.
To determine a proper value of $C$, we consider the possible errors the test can commit.
• If we reject $H_0$ when $H_0$ is true, we commit a type I error. Write $\alpha = P(\text{reject } H_0 \mid H_0)$.
• If we fail to reject $H_0$ when $H_0$ is false, we commit a type II error. Write $\beta = 1 - P(\text{reject } H_0 \mid H_A)$.

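Example. If $C = 0.02$, the probability of committing a type I error is
$$P\left(\hat{P} - \tfrac{1}{6} > 0.02 \mid p = \tfrac{1}{6}\right) = P\left(\hat{P} > \tfrac{112}{600} \mid p = \tfrac{1}{6}\right) = P\left(Z > \frac{\frac{112}{600} - \frac{1}{6}}{\sqrt{5/36000}}\right) = P(Z > 1.6971) = 0.0448.$$
If $p = 0.2$, the probability of committing a type II error is
$$P\left(\hat{P} - \tfrac{1}{6} \le 0.02 \mid p = \tfrac{1}{5}\right) = P\left(\hat{P} \le \tfrac{112}{600} \mid p = \tfrac{1}{5}\right) = P\left(Z \le \frac{\frac{112}{600} - \frac{1}{5}}{\sqrt{4/25000}}\right) = P(Z \le -1.0541) \approx 0.14592.$$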
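Both error probabilities follow from the same normal approximation, evaluated once under $p = 1/6$ and once under $p = 0.2$. A short sketch, assuming the rejection rule $\hat{p} - p_0 > C$ from the example (the helper name `error_probabilities` is hypothetical):

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def error_probabilities(p0, p_alt, n, c):
    """Type I and type II error probabilities for the rule: reject H0 when p_hat - p0 > c."""
    cutoff = p0 + c
    # Type I: reject although p = p0
    alpha = 1 - Z.cdf((cutoff - p0) / sqrt(p0 * (1 - p0) / n))
    # Type II: fail to reject although p = p_alt
    beta = Z.cdf((cutoff - p_alt) / sqrt(p_alt * (1 - p_alt) / n))
    return alpha, beta

alpha, beta = error_probabilities(p0=1 / 6, p_alt=0.2, n=1000, c=0.02)
print(f"alpha = {alpha:.4f}, beta = {beta:.5f}")
```

Increasing `c` lowers `alpha` but raises `beta`, which is why the critical value is usually fixed by choosing `alpha` first.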

Example. We are interested in studying the proportion of children living near a lead smelter who
have colic. The prevalence of colic in the general public is estimated to be about 7%. A random
sample of 124 children who lived near a lead smelter is collected and it is found that 23 of them
had colic. Test whether the prevalence of colic among children who live near a lead smelter is higher than
that of the general public.
The hypothesis is $H_0: p = 0.07$ vs $H_A: p > 0.07$. The observed sample proportion of children who have colic is $\hat{p} = 23/124 \approx 0.19$. It is reasonable to reject $H_0$ if $\hat{p} > 0.07 + C$. To determine $C$, we set the probability of committing a type I error at 5%, that is, $\alpha = 0.05$:
$$0.05 = P(\hat{P} > 0.07 + C \mid p = 0.07) = P\left(Z > \frac{C}{\sqrt{0.07 \times 0.93/124}}\right) \implies \frac{C}{\sqrt{0.07 \times 0.93/124}} = 1.645 \implies C = 0.0377.$$
Since $\hat{p} = 0.19 > 0.07 + 0.0377 = 0.1077$, we reject $H_0$ at the $\alpha = 5\%$ significance level.

The p-value approach.
The sample proportion of children who have colic, $\hat{P}$, is a random variable. The observed value is $\hat{p} = 0.19$. The p-value of the test $H_0: p = 0.07$ vs $H_A: p > 0.07$ is given by
$$P(\hat{P} > 0.19 \mid p = 0.07) = P\left(Z > \frac{0.19 - 0.07}{\sqrt{0.07 \times 0.93/124}}\right) = P(Z > 5.23723) \approx 0.$$
This is the probability, under the null hypothesis, of observing a sample proportion at least as extreme as the one observed. When the p-value is smaller than $\alpha$, the observed sample falls in the critical region and the null hypothesis is rejected at the $\alpha$ level of significance.
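The critical-value approach and the p-value approach lead to the same decision. The sketch below reproduces both for the colic example; it uses the rounded value $\hat{p} = 0.19$ from the text, and all variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

n, p0, alpha = 124, 0.07, 0.05
p_hat = 0.19                        # 23/124, rounded as in the example
se0 = sqrt(p0 * (1 - p0) / n)       # standard deviation of P_hat under H0

# Critical-value approach: reject H0 when p_hat > p0 + C
C = Z.inv_cdf(1 - alpha) * se0
reject_by_critical_value = p_hat > p0 + C

# p-value approach: reject H0 when the p-value is below alpha
z = (p_hat - p0) / se0
p_value = 1 - Z.cdf(z)
reject_by_p_value = p_value < alpha

print(f"C = {C:.4f}, z = {z:.5f}, p-value = {p_value:.2e}")
print(reject_by_critical_value, reject_by_p_value)
```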

Example. A poll released on October 13, 2010 found that 47% of 1000 adults surveyed classified
themselves as "very happy" when given the choices of (A) "very happy", (B) "fairly happy", and (C)
"not too happy". Suppose that a journalist who is a pessimist took advantage of this poll to write a
headline titled "Poll finds that adults who are very happy are in the minority." Is the pessimistic
journalist's headline warranted?
Answer. Let $p$ be the proportion of adults who are "very happy". The hypothesis is $H_0: p = 0.5$ vs $H_A: p < 0.5$. It is reasonable to reject $H_0$ if $\hat{p} - 0.5 < C$, for some constant $C$. To determine $C$, we set the probability of committing a type I error at 5%, that is, $\alpha = 0.05$:
$$0.05 = P(\hat{P} - 0.5 < C \mid p = 0.5) = P\left(Z < \frac{C}{\sqrt{1/4000}}\right) \implies \frac{C}{\sqrt{1/4000}} = -1.645 \implies C = -0.026.$$
Since $\hat{p} = 0.47 < 0.5 - 0.026 = 0.474$, we reject $H_0$ at the $\alpha = 5\%$ significance level.


The sample proportion of adults who are "very happy", $\hat{P}$, is a random variable. The observed value is $\hat{p} = 0.47$. The p-value of the test $H_0: p = 0.5$ vs $H_A: p < 0.5$ is given by
$$P(\hat{P} < 0.47 \mid p = 0.5) = P\left(Z < \frac{0.47 - 0.5}{\sqrt{1/4000}}\right) = P(Z < -1.8974) \approx 0.0289.$$
This is the probability, under the null hypothesis, of observing a sample proportion at least as extreme as the one observed. Since the p-value is less than 0.05, we reject the null hypothesis at the $\alpha = 5\%$ significance level.

Example. A past survey indicated that 20% of adults watch Game of Thrones. Suppose that we are
interested in estimating the proportion of students in CUHK who watch Game of Thrones. A
random sample of 62 students is taken and it turns out that 17 out of 62 students watch Game of
Thrones. Test whether we can conclude that the proportion of CUHK students who watch Game of
Thrones is 20%.
Answer. The hypothesis is $H_0: p = 0.2$ vs $H_A: p \ne 0.2$. It is reasonable to reject $H_0$ when $|\hat{p} - 0.2| > C$, for some constant $C > 0$.

The sample proportion of CUHK students who watch Game of Thrones, $\hat{P}$, is a random variable. The observed value is $\hat{p} = 17/62 = 0.2742$. The p-value of the test $H_0: p = 0.2$ vs $H_A: p \ne 0.2$ is given by
$$P(|\hat{P} - 0.2| > |0.2742 - 0.2| \mid p = 0.2) = P(\hat{P} - 0.2 > 0.0742 \mid p = 0.2) + P(\hat{P} - 0.2 < -0.0742 \mid p = 0.2)$$
$$= P\left(Z > \frac{0.0742}{\sqrt{0.2 \times 0.8/62}}\right) + P\left(Z < -\frac{0.0742}{\sqrt{0.2 \times 0.8/62}}\right) = 2P\left(Z > \frac{0.0742}{\sqrt{0.2 \times 0.8/62}}\right) = 0.1441.$$
This is the probability, under the null hypothesis, of observing a sample proportion at least as far from 0.2 as the one observed. Since the p-value is above 0.05, we do not have enough evidence to reject the null hypothesis at the $\alpha = 5\%$ significance level.
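For a two-sided alternative, the two tail probabilities are equal by symmetry of the normal distribution, so the p-value is twice the upper tail. A minimal sketch (the function name is illustrative):

```python
from math import sqrt
from statistics import NormalDist

def two_sided_proportion_p_value(p_hat, p0, n):
    """Large-sample two-sided p-value for H0: p = p0 vs HA: p != p0."""
    se0 = sqrt(p0 * (1 - p0) / n)          # standard deviation of P_hat under H0
    z = abs(p_hat - p0) / se0              # distance from p0 in standard units
    return z, 2 * (1 - NormalDist().cdf(z))

z, p_value = two_sided_proportion_p_value(p_hat=0.2742, p0=0.2, n=62)
print(f"z = {z:.4f}, p-value = {p_value:.4f}")
```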

Hypothesis testing about proportions
Comparing two proportions.
Suppose that we have two independent random samples with sample proportions $\hat{p}_1$ and $\hat{p}_2$ respectively. The large sample distribution of $\hat{P}_1$ is $N\left(p_1, \frac{p_1(1-p_1)}{n_1}\right)$ and that of $\hat{P}_2$ is $N\left(p_2, \frac{p_2(1-p_2)}{n_2}\right)$. Given that $\hat{P}_1$ and $\hat{P}_2$ are independent, we have
$$\hat{P}_1 - \hat{P}_2 \sim N\left(p_1 - p_2, \frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}\right).$$
If our null hypothesis is $H_0: p_1 = p_2 = p$, we can combine the two independent random samples to estimate $p$ by $\hat{p} = \frac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2}$. In this case, the large sample distribution of $\hat{P}_1 - \hat{P}_2$ is $N\left(0, p(1-p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)\right)$. Thus, the approximate distribution of
$$Z = \frac{\hat{P}_1 - \hat{P}_2}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$
is standard normal.

Example. The following table is the result of a telephone poll of 800 adults. The question posed to the people surveyed was "Should the tax on cigarettes be raised to pay for more health care?".
Non-Smokers          Smokers
$n_1 = 605$          $n_2 = 195$
$\hat{p}_1 = 0.58$   $\hat{p}_2 = 0.21$
Is there sufficient evidence at the 5% significance level to conclude that the two populations (smokers and non-smokers) differ significantly in their opinions?

Answer. The hypothesis is $H_0: p_1 = p_2$ vs $H_1: p_1 \ne p_2$. It is reasonable to reject $H_0$ when $|\hat{p}_1 - \hat{p}_2| > C$, for some $C > 0$. Under $H_0$, $\hat{p} = \frac{605 \times 0.58 + 195 \times 0.21}{605 + 195} = 0.49$. The p-value of the test is
$$P(|\hat{P}_1 - \hat{P}_2| > |0.58 - 0.21| \mid p_1 = p_2) = P(\hat{P}_1 - \hat{P}_2 > 0.37 \mid p_1 = p_2) + P(\hat{P}_1 - \hat{P}_2 < -0.37 \mid p_1 = p_2)$$
$$= 2P\left(Z > \frac{0.37}{\sqrt{0.49 \times 0.51\left(\frac{1}{605} + \frac{1}{195}\right)}}\right) = 2P(Z > 8.9882) \approx 0.$$
Since the p-value is less than 0.05, we reject the null hypothesis at the 5% significance level. We can conclude that the two populations differ with respect to their opinions concerning imposing an additional tax to help pay for more health care.
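The pooled two-proportion statistic can be sketched as follows. The function name `two_proportion_z` is an illustrative choice; the code pools the two samples exactly rather than rounding $\hat{p}$ to 0.49, which changes the result only in later decimals.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z(p1_hat, n1, p2_hat, n2):
    """Pooled z statistic and two-sided p-value for H0: p1 = p2."""
    pooled = (n1 * p1_hat + n2 * p2_hat) / (n1 + n2)    # estimate of the common p
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return pooled, z, p_value

pooled, z, p_value = two_proportion_z(p1_hat=0.58, n1=605, p2_hat=0.21, n2=195)
print(f"pooled = {pooled:.4f}, z = {z:.4f}, p-value = {p_value:.2e}")
```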

Hypothesis testing about mean
When population variance is known.
Suppose that we have a random sample of size $n$ with sample mean $\bar{x}$ taken from a population with an unknown mean $\mu$ and a known variance $\sigma^2$. The large sample distribution of $\bar{X}$ is $N\left(\mu, \frac{\sigma^2}{n}\right)$. If our null hypothesis is $H_0: \mu = \mu_0$, then the approximate distribution of $Z = \frac{\bar{X} - \mu_0}{\sqrt{\sigma^2/n}}$ is standard normal. We have three cases.
• $H_0: \mu = \mu_0$ vs $H_A: \mu > \mu_0$
The p-value of the test is $P(\bar{X} > \bar{x} \mid \mu = \mu_0) = P\left(Z > \sqrt{n}\,\frac{\bar{x} - \mu_0}{\sigma}\right)$.
• $H_0: \mu = \mu_0$ vs $H_A: \mu < \mu_0$
The p-value of the test is $P(\bar{X} < \bar{x} \mid \mu = \mu_0) = P\left(Z < \sqrt{n}\,\frac{\bar{x} - \mu_0}{\sigma}\right)$.
• $H_0: \mu = \mu_0$ vs $H_A: \mu \ne \mu_0$
The p-value of the test is $P(|\bar{X} - \mu_0| > |\bar{x} - \mu_0| \mid \mu = \mu_0) = 2P\left(Z > \sqrt{n}\,\frac{|\bar{x} - \mu_0|}{\sigma}\right)$.

Example. Boys of a certain age are known to have a mean weight of 𝜇𝜇 = 85 pounds. A
complaint is made that the boys living in a municipal children's home are underfed. As one
bit of evidence, 𝑛𝑛 = 25 boys (of the same age) are weighed and found to have a mean
weight of 𝑥𝑥̅ = 80.94 pounds. It is known that the population standard deviation 𝜎𝜎 is 11.6
pounds (the unrealistic part of this example!). Based on the available data, what should be
concluded concerning the complaint?
Answer. The hypothesis is $H_0: \mu = 85$ vs $H_A: \mu < 85$. The p-value of the test is
$$P(\bar{X} < 80.94 \mid \mu = 85) = P\left(Z < \sqrt{25}\,\frac{80.94 - 85}{11.6}\right) = P(Z < -1.75) = 0.0401.$$
Since the p-value is less than 0.05, we reject the null hypothesis at the 5% significance level. We conclude, at the 5% significance level, that the boys living in the municipal children's home are underfed.
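The three known-variance cases differ only in which tail of the standard normal distribution is used. A small sketch, assuming a hypothetical helper `z_test_mean` with an `alternative` argument of `"greater"`, `"less"`, or `"two-sided"`:

```python
from math import sqrt
from statistics import NormalDist

def z_test_mean(x_bar, mu0, sigma, n, alternative):
    """z statistic and p-value for H0: mu = mu0 when sigma is known."""
    z = sqrt(n) * (x_bar - mu0) / sigma
    cdf = NormalDist().cdf
    if alternative == "greater":        # HA: mu > mu0
        p_value = 1 - cdf(z)
    elif alternative == "less":         # HA: mu < mu0
        p_value = cdf(z)
    else:                               # HA: mu != mu0
        p_value = 2 * (1 - cdf(abs(z)))
    return z, p_value

z, p_value = z_test_mean(x_bar=80.94, mu0=85, sigma=11.6, n=25, alternative="less")
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```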

When population variance is unknown.
Suppose that we have a random sample of size $n$ with sample mean $\bar{x}$ taken from a population with an unknown mean $\mu$ and an unknown variance $\sigma^2$. The large sample distribution of $\frac{\bar{X} - \mu}{S/\sqrt{n}}$ is $t_{n-1}$. If our null hypothesis is $H_0: \mu = \mu_0$, then the approximate distribution of $T = \frac{\bar{X} - \mu_0}{S/\sqrt{n}}$ is $t_{n-1}$. We have three cases.
• $H_0: \mu = \mu_0$ vs $H_A: \mu > \mu_0$
The p-value of the test is $P(\bar{X} > \bar{x} \mid \mu = \mu_0) = P\left(T > \sqrt{n}\,\frac{\bar{x} - \mu_0}{s}\right)$.
• $H_0: \mu = \mu_0$ vs $H_A: \mu < \mu_0$
The p-value of the test is $P(\bar{X} < \bar{x} \mid \mu = \mu_0) = P\left(T < \sqrt{n}\,\frac{\bar{x} - \mu_0}{s}\right)$.
• $H_0: \mu = \mu_0$ vs $H_A: \mu \ne \mu_0$
The p-value of the test is $P(|\bar{X} - \mu_0| > |\bar{x} - \mu_0| \mid \mu = \mu_0) = 2P\left(T > \sqrt{n}\,\frac{|\bar{x} - \mu_0|}{s}\right)$.

Example. It is assumed that the mean systolic blood pressure is 𝜇𝜇 = 120 mm Hg. In the
Honolulu Heart Study, a sample of 𝑛𝑛 = 100 people had an average systolic blood pressure
of 130.1 mm Hg with a standard deviation of 21.21 mm Hg. Is the group significantly
different (with respect to systolic blood pressure!) from the regular population?
Answer. The hypothesis is $H_0: \mu = 120$ vs $H_A: \mu \ne 120$. The p-value of the test is
$$P(|\bar{X} - 120| > |130.1 - 120| \mid \mu = 120) = 2P\left(T > \sqrt{100}\,\frac{130.1 - 120}{21.21}\right) = 2P(T > 4.762) \approx 0.$$
Since the p-value is less than 0.05, we reject the null hypothesis at the 5% significance level. We conclude that the mean systolic blood pressure of the people living in Honolulu is different from 120 mm Hg.

Hypothesis testing about means
Paired T-test.
Suppose that we have a random sample of subjects of size $n$. For the $i$th subject in the sample, we take a pair of observations $(x_i, y_i)$. We are interested in the null hypothesis $H_0: \mu_X = \mu_Y$. For $i = 1, 2, \dots, n$, define $D_i = X_i - Y_i$. The null hypothesis becomes $H_0: \mu_D = 0$.
By the central limit theorem, the large sample distribution of $\bar{D}$ is $N\left(\mu_D, \frac{\sigma_D^2}{n}\right)$. Since $\sigma_D^2$ is unknown, the large sample distribution of $T = \frac{\bar{D} - \mu_D}{S_D/\sqrt{n}}$ is $t_{n-1}$. If our null hypothesis is $H_0: \mu_D = 0$, then the approximate distribution of $T = \frac{\bar{D}}{S_D/\sqrt{n}}$ is $t_{n-1}$.

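We have three cases.
• $H_0: \mu_D = 0$ vs $H_A: \mu_D > 0$
The p-value of the test is $P(\bar{D} > \bar{d} \mid \mu_D = 0) = P\left(T > \sqrt{n}\,\frac{\bar{d}}{s_D}\right)$.
• $H_0: \mu_D = 0$ vs $H_A: \mu_D < 0$
The p-value of the test is $P(\bar{D} < \bar{d} \mid \mu_D = 0) = P\left(T < \sqrt{n}\,\frac{\bar{d}}{s_D}\right)$.
• $H_0: \mu_D = 0$ vs $H_A: \mu_D \ne 0$
The p-value of the test is $P(|\bar{D}| > |\bar{d}| \mid \mu_D = 0) = 2P\left(T > \sqrt{n}\,\frac{|\bar{d}|}{s_D}\right)$.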

Example. Blood samples from 𝑛𝑛 = 10 people were sent to each of two laboratories (Lab 1
and Lab 2) for cholesterol determinations. The resulting data are summarized in the
following table.
Subject Lab1 Lab2 Diff
1 254 248 6
2 277 268 9
3 311 292 19
4 261 246 15
5 257 242 15
6 289 275 14
7 246 227 19
8 279 271 8
9 282 269 13
10 247 237 10

Example. Is there a statistically significant difference at the $\alpha = 0.01$ level, say, in the (population) mean cholesterol levels reported by Lab 1 and Lab 2?
Answer. The hypothesis is $H_0: \mu_D = 0$ vs $H_A: \mu_D \ne 0$. Note that $\bar{d} = 12.8$ and $s_D = 4.467$ (the sample standard deviation, with divisor $n - 1$). The p-value of the test is
$$2P\left(T > \sqrt{10}\,\frac{\bar{d}}{s_D}\right) = 2P(T > 9.06) \approx 0,$$
where $T \sim t_9$.
Since the p-value is less than 0.01, we reject the null hypothesis at the $\alpha = 0.01$ significance level. We conclude that the mean cholesterol levels reported by Lab 1 and Lab 2 differ.
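The summary statistics of the paired example can be recomputed from the table. The sketch below uses the standard library only; note that `statistics.stdev` uses the divisor $n - 1$ (about 4.467 here), whereas `statistics.pstdev` uses the divisor $n$ (about 4.238). Obtaining the p-value itself requires a $t_9$ distribution function, which the standard library does not provide, so only the statistic is computed.

```python
from math import sqrt
from statistics import mean, stdev, pstdev

lab1 = [254, 277, 311, 261, 257, 289, 246, 279, 282, 247]
lab2 = [248, 268, 292, 246, 242, 275, 227, 271, 269, 237]

diffs = [x - y for x, y in zip(lab1, lab2)]   # D_i = X_i - Y_i
n = len(diffs)
d_bar = mean(diffs)
s_d = stdev(diffs)            # sample standard deviation, divisor n - 1
s_d_pop = pstdev(diffs)       # divisor n, shown only for comparison

t_stat = sqrt(n) * d_bar / s_d                # compare with t_{n-1}
print(f"d_bar = {d_bar}, s_D = {s_d:.4f}, t = {t_stat:.3f}, df = {n - 1}")
```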

Testing for the equality of two means – equal variances
Suppose we have two normal populations. The mean for population 1 is $\mu_X$ and that for population 2 is $\mu_Y$. Assume that the population variances of the two populations are equal, that is, $\sigma_X^2 = \sigma_Y^2 = \sigma^2$. Let $X_1, X_2, \dots, X_n$ be a random sample of size $n$ from population 1 and $Y_1, Y_2, \dots, Y_m$ be a random sample of size $m$ from population 2. Assume that the two random samples are independent. From Chapter 2, we learn that the distribution of the statistic
$$T = \frac{(\bar{X} - \bar{Y}) - (\mu_X - \mu_Y)}{S_p\sqrt{\frac{1}{n} + \frac{1}{m}}}$$
is $t_{n+m-2}$, where
$$S_p^2 = \frac{(n-1)S_X^2 + (m-1)S_Y^2}{n + m - 2}.$$
If the null hypothesis is $H_0: \mu_X = \mu_Y$, the distribution of $T = \frac{\bar{X} - \bar{Y}}{S_p\sqrt{\frac{1}{n} + \frac{1}{m}}}$ is $t_{n+m-2}$. We have three cases.
• $H_0: \mu_X = \mu_Y$ vs $H_A: \mu_X > \mu_Y$
The p-value of the test is $P\left(T > \frac{\bar{x} - \bar{y}}{s_p\sqrt{\frac{1}{n} + \frac{1}{m}}}\right)$.
• $H_0: \mu_X = \mu_Y$ vs $H_A: \mu_X < \mu_Y$
The p-value of the test is $P\left(T < \frac{\bar{x} - \bar{y}}{s_p\sqrt{\frac{1}{n} + \frac{1}{m}}}\right)$.
• $H_0: \mu_X = \mu_Y$ vs $H_A: \mu_X \ne \mu_Y$
The p-value of the test is $2P\left(T > \frac{|\bar{x} - \bar{y}|}{s_p\sqrt{\frac{1}{n} + \frac{1}{m}}}\right)$.


Example. Is the mean fastest speed driven by men different from the mean fastest speed driven by women? A survey of a random sample of $n = 34$ men and a random sample of $m = 29$ women is conducted.
Males ($X$)         Females ($Y$)
$n = 34$            $m = 29$
$\bar{x} = 105.5$   $\bar{y} = 90.9$
$s_X = 20.1$        $s_Y = 12.2$
Is there sufficient evidence at the $\alpha = 0.05$ significance level to conclude that the mean fastest speed driven by men differs from that driven by women?

Answer. Assume that the variance of men's fastest driving speeds is the same as that of women's. The hypothesis is $H_0: \mu_M = \mu_F$ vs $H_A: \mu_M \ne \mu_F$. The p-value of the test is
$$2P\left(T > \frac{|\bar{x} - \bar{y}|}{s_p\sqrt{\frac{1}{n} + \frac{1}{m}}}\right) = 2P(T > 3.41) = 0.0012,$$
where $T \sim t_{61}$.
Since the p-value is less than 0.05, we reject the null hypothesis at the 5% significance level. We conclude that the mean fastest driving speed of men differs from that of women.
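The pooled statistic in this example can be reproduced from the summary figures alone. A minimal sketch (the function name `pooled_t` is illustrative); the p-value would additionally need a $t$ distribution function, which is outside the standard library.

```python
from math import sqrt

def pooled_t(x_bar, s_x, n, y_bar, s_y, m):
    """Pooled-variance t statistic and degrees of freedom for H0: mu_X = mu_Y."""
    df = n + m - 2
    sp2 = ((n - 1) * s_x**2 + (m - 1) * s_y**2) / df   # pooled variance S_p^2
    t_stat = (x_bar - y_bar) / (sqrt(sp2) * sqrt(1 / n + 1 / m))
    return sqrt(sp2), t_stat, df

s_p, t_stat, df = pooled_t(x_bar=105.5, s_x=20.1, n=34, y_bar=90.9, s_y=12.2, m=29)
print(f"s_p = {s_p:.3f}, t = {t_stat:.2f}, df = {df}")
```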

Testing for the equality of two means – unequal variances

If the population variances are not equal, that is, $\sigma_X^2 \ne \sigma_Y^2$, the statistic
$$T = \frac{(\bar{X} - \bar{Y}) - (\mu_X - \mu_Y)}{\sqrt{\frac{S_X^2}{n} + \frac{S_Y^2}{m}}}$$
has an approximate $t_r$ distribution, where $r$ is the integral part of
$$\frac{\left(\frac{S_X^2}{n} + \frac{S_Y^2}{m}\right)^2}{\frac{(S_X^2/n)^2}{n-1} + \frac{(S_Y^2/m)^2}{m-1}}.$$
If the null hypothesis is $H_0: \mu_X = \mu_Y$, the approximate distribution of $T = \frac{\bar{X} - \bar{Y}}{\sqrt{\frac{S_X^2}{n} + \frac{S_Y^2}{m}}}$ is $t_r$. We have three cases.
• $H_0: \mu_X = \mu_Y$ vs $H_A: \mu_X > \mu_Y$
The p-value of the test is $P\left(T > \frac{\bar{x} - \bar{y}}{\sqrt{\frac{s_X^2}{n} + \frac{s_Y^2}{m}}}\right)$.
• $H_0: \mu_X = \mu_Y$ vs $H_A: \mu_X < \mu_Y$
The p-value of the test is $P\left(T < \frac{\bar{x} - \bar{y}}{\sqrt{\frac{s_X^2}{n} + \frac{s_Y^2}{m}}}\right)$.
• $H_0: \mu_X = \mu_Y$ vs $H_A: \mu_X \ne \mu_Y$
The p-value of the test is $2P\left(T > \frac{|\bar{x} - \bar{y}|}{\sqrt{\frac{s_X^2}{n} + \frac{s_Y^2}{m}}}\right)$.


Example. Is the mean fastest speed driven by men different from the mean fastest speed driven by women? A survey of a random sample of $n = 34$ men and a random sample of $m = 29$ women is conducted.
Males ($X$)         Females ($Y$)
$n = 34$            $m = 29$
$\bar{x} = 105.5$   $\bar{y} = 90.9$
$s_X = 20.1$        $s_Y = 12.2$
Is there sufficient evidence at the $\alpha = 0.05$ significance level to conclude that the mean fastest speed driven by men differs from that driven by women?

Answer. Assume that the variance of men's fastest driving speeds is not the same as that of women's. The hypothesis is $H_0: \mu_M = \mu_F$ vs $H_A: \mu_M \ne \mu_F$. The statistic $T$ has an approximate $t_{55}$ distribution. The p-value of the test is
$$2P\left(T > \frac{|\bar{x} - \bar{y}|}{\sqrt{\frac{s_X^2}{n} + \frac{s_Y^2}{m}}}\right) = 2P(T > 3.54) = 0.00082.$$
Since the p-value is less than 0.05, we reject the null hypothesis at the 5% significance level. We conclude that the mean fastest driving speed of men differs from that of women.
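The degrees of freedom $r = 55$ and the statistic 3.54 can be verified from the summary figures. A sketch under the same assumptions as the text (the function name `welch_t` is illustrative; `int` takes the integral part of the degrees-of-freedom expression):

```python
from math import sqrt

def welch_t(x_bar, s_x, n, y_bar, s_y, m):
    """Unequal-variance t statistic and its approximate degrees of freedom r."""
    vx, vy = s_x**2 / n, s_y**2 / m          # estimated variances of the two sample means
    t_stat = (x_bar - y_bar) / sqrt(vx + vy)
    r_exact = (vx + vy) ** 2 / (vx**2 / (n - 1) + vy**2 / (m - 1))
    return t_stat, int(r_exact)              # r is the integral part

t_stat, r = welch_t(x_bar=105.5, s_x=20.1, n=34, y_bar=90.9, s_y=12.2, m=29)
print(f"t = {t_stat:.2f}, r = {r}")
```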

Testing for the variance
Testing for single variance
Suppose $X_1, \dots, X_n$ is a random sample of size $n$ from a normal population $N(\mu, \sigma^2)$ with unknown mean $\mu$ and variance $\sigma^2$. The distribution of $\chi^2 = \frac{(n-1)S^2}{\sigma^2}$ is $\chi^2_{n-1}$. If our null hypothesis is $H_0: \sigma = \sigma_0$, then the distribution of $\chi^2 = \frac{(n-1)S^2}{\sigma_0^2}$ is $\chi^2_{n-1}$. We have three cases.
• $H_0: \sigma = \sigma_0$ vs $H_A: \sigma > \sigma_0$
The p-value of the test is $P\left(\chi^2 > \frac{(n-1)s^2}{\sigma_0^2}\right)$.
• $H_0: \sigma = \sigma_0$ vs $H_A: \sigma < \sigma_0$
The p-value of the test is $P\left(\chi^2 < \frac{(n-1)s^2}{\sigma_0^2}\right)$.
• $H_0: \sigma = \sigma_0$ vs $H_A: \sigma \ne \sigma_0$
The p-value of the test is $2\min\left\{P\left(\chi^2 > \frac{(n-1)s^2}{\sigma_0^2}\right), P\left(\chi^2 < \frac{(n-1)s^2}{\sigma_0^2}\right)\right\}$.


Example. A manufacturer of hard safety hats for construction workers is concerned about the mean and the variation of the forces its helmets transmit to wearers when subjected to an external force. The manufacturer has designed the helmets so that the mean force transmitted by the helmets to the workers is 800 pounds (or less) with a standard deviation of less than 40 pounds. Tests were run on a random sample of $n = 40$ helmets, and the sample mean and sample standard deviation were found to be 825 pounds and 48.5 pounds, respectively.
Do the data provide sufficient evidence, at the $\alpha = 0.05$ level, to conclude that the population standard deviation exceeds 40 pounds?


Answer. The hypothesis is $H_0: \sigma = 40$ vs $H_A: \sigma > 40$. The p-value of the test is
$$P\left(\chi^2 > \frac{(n-1)s^2}{\sigma_0^2}\right) = P\left(\chi^2 > \frac{(40-1) \times 48.5^2}{40^2}\right) = P(\chi^2 > 57.3361) = 0.02928.$$
Since the p-value is less than 0.05, we reject the null hypothesis at the 5% significance level. We can conclude that the population standard deviation exceeds 40 pounds.
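The chi-squared statistic is simple arithmetic; the tail probability needs the $\chi^2_{39}$ distribution function. The sketch below stays within the standard library by integrating the chi-squared density numerically with Simpson's rule, a crude stand-in for a library distribution function. The helpers `chi2_pdf` and `chi2_sf` are illustrative and assume more than 2 degrees of freedom, so that the density is 0 at the origin.

```python
from math import exp, lgamma, log

def chi2_pdf(x, df):
    """Density of the chi-squared distribution with df > 2 degrees of freedom."""
    if x <= 0:
        return 0.0
    half = df / 2
    return exp((half - 1) * log(x) - x / 2 - half * log(2) - lgamma(half))

def chi2_sf(x, df, steps=2000):
    """P(chi2_df > x), by composite Simpson integration of the density on [0, x]."""
    h = x / steps                      # steps must be even for Simpson's rule
    total = chi2_pdf(0, df) + chi2_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * chi2_pdf(i * h, df)
    return 1 - total * h / 3

n, s, sigma0 = 40, 48.5, 40
chi2_stat = (n - 1) * s**2 / sigma0**2
p_upper = chi2_sf(chi2_stat, n - 1)            # HA: sigma > sigma0
p_two_sided = 2 * min(p_upper, 1 - p_upper)    # HA: sigma != sigma0
print(f"chi2 = {chi2_stat:.4f}, one-sided p = {p_upper:.4f}, two-sided p = {p_two_sided:.4f}")
```

With these data the one-sided p-value falls below 0.05 while the two-sided one does not, so the conclusion depends on which alternative is tested.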


Do the data provide sufficient evidence, at the $\alpha = 0.05$ level, to conclude that the population standard deviation differs from 40 pounds?
Answer. The hypothesis is $H_0: \sigma = 40$ vs $H_A: \sigma \ne 40$. The p-value of the test is
$$2\min\left\{P\left(\chi^2 > \frac{(n-1)s^2}{\sigma_0^2}\right), P\left(\chi^2 < \frac{(n-1)s^2}{\sigma_0^2}\right)\right\} = 2P(\chi^2 > 57.3361) = 0.0586.$$
Since the p-value is larger than 0.05, we do not have enough evidence to reject the null hypothesis.

Testing for the equality of two variances
Suppose $X_1, \dots, X_n$ is a random sample of size $n$ from a normal population $N(\mu_X, \sigma_X^2)$, and suppose, independent of the first sample, $Y_1, \dots, Y_m$ is another random sample of size $m$ from a normal population $N(\mu_Y, \sigma_Y^2)$. The distributions of $\chi_X^2 = \frac{(n-1)S_X^2}{\sigma_X^2}$ and $\chi_Y^2 = \frac{(m-1)S_Y^2}{\sigma_Y^2}$ are $\chi^2_{n-1}$ and $\chi^2_{m-1}$ respectively, and $\chi_X^2$ and $\chi_Y^2$ are independent. Therefore, the statistic
$$F = \frac{\frac{(n-1)S_X^2}{\sigma_X^2}\Big/(n-1)}{\frac{(m-1)S_Y^2}{\sigma_Y^2}\Big/(m-1)} = \frac{S_X^2}{S_Y^2} \cdot \frac{\sigma_Y^2}{\sigma_X^2}$$
has an F-distribution with $n - 1$ and $m - 1$ degrees of freedom, denoted as $F(n-1, m-1)$. If our null hypothesis is $H_0: \sigma_X = \sigma_Y$, the distribution of $F = \frac{S_X^2}{S_Y^2}$ is $F(n-1, m-1)$.


We have three cases.
• $H_0: \sigma_X = \sigma_Y$ vs $H_A: \sigma_X > \sigma_Y$
The p-value of the test is $P\left(F > \frac{s_X^2}{s_Y^2}\right)$.
• $H_0: \sigma_X = \sigma_Y$ vs $H_A: \sigma_X < \sigma_Y$
The p-value of the test is $P\left(F < \frac{s_X^2}{s_Y^2}\right)$.
• $H_0: \sigma_X = \sigma_Y$ vs $H_A: \sigma_X \ne \sigma_Y$
The p-value of the test is $2\min\left\{P\left(F > \frac{s_X^2}{s_Y^2}\right), P\left(F < \frac{s_X^2}{s_Y^2}\right)\right\}$.

Example. For the fastest-speed data above ($n = 34$, $s_X = 20.1$ for men; $m = 29$, $s_Y = 12.2$ for women), is there sufficient evidence at the $\alpha = 0.05$ level to conclude that the variance of the fastest speed driven by men differs from the variance of the fastest speed driven by women?
Answer. The hypothesis is $H_0: \sigma_M = \sigma_F$ vs $H_A: \sigma_M \ne \sigma_F$. The p-value of the test is
$$2\min\left\{P\left(F > \frac{s_X^2}{s_Y^2}\right), P\left(F < \frac{s_X^2}{s_Y^2}\right)\right\} = 2P(F_{n-1,m-1} > 2.7144) = 0.00857.$$
Since the p-value is smaller than 0.05, we reject the null hypothesis at the 5% significance level and conclude that the two population variances are not equal. It is also worthwhile to note that exchanging the roles of the two samples gives the same p-value:
$$2\min\left\{P\left(F > \frac{s_Y^2}{s_X^2}\right), P\left(F < \frac{s_Y^2}{s_X^2}\right)\right\} = 2P(F_{m-1,n-1} < 0.3684) = 0.00857.$$

Testing for the homogeneity of population means
Suppose that there are $k > 2$ populations. A random sample is drawn from each of the populations. Denote these random samples as
$$X_{1,1}, X_{1,2}, \dots, X_{1,n_1} \sim N(\mu_1, \sigma^2)$$
$$X_{2,1}, X_{2,2}, \dots, X_{2,n_2} \sim N(\mu_2, \sigma^2)$$
$$\vdots$$
$$X_{k,1}, X_{k,2}, \dots, X_{k,n_k} \sim N(\mu_k, \sigma^2)$$
Assume that
1. All population variances are equal, that is, $\sigma_1^2 = \sigma_2^2 = \dots = \sigma_k^2$
2. Each of the populations has a normal distribution
3. All random samples are independent

For the $i$th population, $\frac{(n_i - 1)S_i^2}{\sigma^2} \sim \chi^2_{n_i - 1}$, where $S_i^2 = \frac{1}{n_i - 1}\sum_{j=1}^{n_i}(X_{ij} - \bar{X}_i)^2$ and $\bar{X}_i = \frac{1}{n_i}\sum_{j=1}^{n_i} X_{ij}$. Since the sum of independent chi-squared distributed random variables is still a chi-squared distributed random variable, we have
$$\frac{1}{\sigma^2}\sum_{i=1}^{k}\sum_{j=1}^{n_i}(X_{ij} - \bar{X}_i)^2 = \frac{1}{\sigma^2}\sum_{i=1}^{k}(n_i - 1)S_i^2 \sim \chi^2_{n-k},$$
where $n = \sum_{i=1}^{k} n_i$.

The hypothesis is $H_0: \mu_1 = \mu_2 = \dots = \mu_k = \mu$ vs $H_1$: $H_0$ is not true.
Under $H_0$, the $k$ random samples can be pooled together to form a large random sample. In this case, we have
$$\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1},$$
where
$$S^2 = \frac{1}{n-1}\sum_{i=1}^{k}\sum_{j=1}^{n_i}(X_{ij} - \bar{X})^2$$
and
$$\bar{X} = \frac{1}{n}\sum_{i=1}^{k}\sum_{j=1}^{n_i} X_{ij}.$$

The within-group sum of squares SSE
$$SSE = \sum_{i=1}^{k}\sum_{j=1}^{n_i}(X_{ij} - \bar{X}_i)^2 = \sum_{i=1}^{k}\sum_{j=1}^{n_i}X_{ij}^2 - \sum_{i=1}^{k}n_i\bar{X}_i^2$$
The between-group sum of squares SSR
$$SSR = \sum_{i=1}^{k}n_i(\bar{X}_i - \bar{X})^2 = \sum_{i=1}^{k}n_i\bar{X}_i^2 - n\bar{X}^2$$
Note that
• the total sum of squares SST is equal to the sum of SSE and SSR.
• SSE and SSR are independent.

By the orthogonal decomposition of chi-squared distributed random variables, under $H_0$,
$$\frac{1}{\sigma^2}\sum_{i=1}^{k}n_i(\bar{X}_i - \bar{X})^2 \sim \chi^2_{k-1}.$$
By the definition of the F-distribution, under $H_0$, we have
$$F = \frac{\frac{1}{\sigma^2}\sum_{i=1}^{k}n_i(\bar{X}_i - \bar{X})^2\Big/(k-1)}{\frac{1}{\sigma^2}\sum_{i=1}^{k}\sum_{j=1}^{n_i}(X_{ij} - \bar{X}_i)^2\Big/(n-k)} = \frac{SSR/(k-1)}{SSE/(n-k)} \sim F(k-1, n-k).$$
The p-value of the test is
$$P\left(F > \frac{\frac{1}{k-1}\sum_{i=1}^{k}n_i(\bar{x}_i - \bar{x})^2}{\frac{1}{n-k}\sum_{i=1}^{k}\sum_{j=1}^{n_i}(x_{ij} - \bar{x}_i)^2} \,\Bigg|\, H_0\right).$$

Example. A researcher for an automobile safety institute was interested in determining whether or not the distance that it takes to stop a car going 60 miles per hour depends on the brand of the tire. The researcher measured the stopping distance (in feet) of ten randomly selected cars for each of five different brands. So that he and his assistants would remain blinded, the researcher arbitrarily labeled the brands of the tires as Brand1, Brand2, Brand3, Brand4, and Brand5. Here are the data resulting from his experiment:
Brand1  Brand2  Brand3  Brand4  Brand5
194     189     185     183     195
184     204     183     193     197
189     190     186     184     194
189     190     183     186     202
186     189     179     194     200
195     207     191     199     211
186     203     188     196     203
183     193     196     188     206
188     181     189     193     202
188     206     194     196     195

Example. Do the data provide enough evidence to conclude that at least one of the brands is different from the others with respect to stopping distance?
[Figure: horizontal box plots of the stopping distances for Brand1 to Brand5]
Brand  Mean   Standard deviation
1      188.2  3.6824
2      195.2  8.5534
3      187.4  5.004
4      191.2  5.2688
5      200.5  5.1624
It appears that the box plots for Brand1 and Brand5 have very little, if any, overlap. The same can be said for Brand3 and Brand5.

The hypothesis is $H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu_5$ vs $H_1$: at least one of the $\mu_i$ is different.
We can calculate that the SSE is 1661.7 (df = 45) and the SSR is 1174.8 (df = 4). Hence, the p-value of the test is
$$P\left(F > \frac{\frac{1}{k-1}\sum_{i=1}^{k}n_i(\bar{x}_i - \bar{x})^2}{\frac{1}{n-k}\sum_{i=1}^{k}\sum_{j=1}^{n_i}(x_{ij} - \bar{x}_i)^2} \,\Bigg|\, H_0\right) = P\left(F_{4,45} > \frac{1174.8/4}{1661.7/45}\right) = P(F_{4,45} > 7.954) \approx 0.$$
Since the p-value is very small, we reject the null hypothesis and conclude that the mean stopping distance for at least one brand of tire is different from the mean stopping distances of the others.
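The sums of squares and the F statistic for the tire data can be recomputed directly from the observations. The following sketch uses only the standard library; `one_way_anova` is an illustrative helper, and the p-value itself is not computed because it needs an F distribution function.

```python
from statistics import mean

def one_way_anova(groups):
    """Return (SSR, SSE, F) for a list of samples, one list per group."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    group_means = [mean(g) for g in groups]
    # Between-group sum of squares
    ssr = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group sum of squares
    sse = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    f_stat = (ssr / (k - 1)) / (sse / (n - k))
    return ssr, sse, f_stat

brands = [
    [194, 184, 189, 189, 186, 195, 186, 183, 188, 188],  # Brand1
    [189, 204, 190, 190, 189, 207, 203, 193, 181, 206],  # Brand2
    [185, 183, 186, 183, 179, 191, 188, 196, 189, 194],  # Brand3
    [183, 193, 184, 186, 194, 199, 196, 188, 193, 196],  # Brand4
    [195, 197, 194, 202, 200, 211, 203, 206, 202, 195],  # Brand5
]

ssr, sse, f_stat = one_way_anova(brands)
print(f"SSR = {ssr:.1f}, SSE = {sse:.1f}, F = {f_stat:.3f}")
```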

We can organize the results in an ANOVA table as follows:
Source  df  SS      MS     F     p-value
Brand   4   1174.8  293.7  7.95  0.000
Error   45  1661.7  36.9
Total   49  2836.5
In general, an ANOVA table summarizes the results as follows:
Source  df       SS     MS                    F                p-value
Factor  $k - 1$  $SSR$  $MSR = SSR/(k - 1)$   $F = MSR/MSE$    $P(F_{k-1,n-k} > F)$
Error   $n - k$  $SSE$  $MSE = SSE/(n - k)$
Total   $n - 1$  $SST$

Example. Suppose an education researcher is interested in determining whether learning method affects students' exam scores. Specifically, suppose she considers these three methods: Standard, Osmosis and Shock Therapy. Suppose she convinces 15 students to take part in her study, so she randomly assigns 5 students to each method. Then, after waiting eight weeks, she tests the students to get exam scores.
Method    Exam Scores
Shock     71  72  73  74  75
Osmosis   58  62  63  64  68
Standard  40  41  41  44  51
What would you conclude?

Example. The ANOVA table is given as
Source  df  SS      MS       F      p-value
Method  2   2267.2  1133.6   94.99  0.000
Error   12  143.2   11.9333
Total   14  2410.4
Since the p-value of the F-test is small, we reject the null hypothesis and conclude that the mean exam score of at least one method is different.


Example. Suppose instead that the exam scores are as follows.
Method    Exam Scores
Shock     61  63  71  76  81
Osmosis   46  48  56  66  70
Standard  34  42  44  54  64
The ANOVA table is given as
Source  df  SS        MS       F      p-value
Method  2   1083.733  541.867  4.314  0.03875
Error   12  1507.2    125.6
Total   14  2590.933
Since the p-value of the F-test is less than 0.05, we reject the null hypothesis at the 5% significance level and conclude that the mean exam score of at least one method is different.


Example. Suppose instead that the exam scores are as follows.
Method    Exam Scores
Shock     51  54  61  69  75
Osmosis   53  55  63  67  71
Standard  45  49  57  66  68
The ANOVA table is given as
Source  df  SS        MS      F       p-value
Method  2   80.1333   40.067  0.4576  0.6434
Error   12  1050.8    87.567
Total   14  1130.933
Since the p-value of the F-test is large, we do not have enough evidence to reject the null hypothesis.

Remarks:
• The result of the F-test indicates the significance of fitting the group means model to the data, that is, of assuming that $X_{ij} = \mu_i + \varepsilon_{ij}$, for $j = 1, 2, \dots, n_i$ and $i = 1, 2, \dots, k$, where the $\varepsilon_{ij}$'s are independent and normally distributed with mean 0 and variance $\sigma^2$. The MLE of $\mu_i$ is $\bar{X}_i = \frac{1}{n_i}\sum_{j=1}^{n_i} X_{ij}$.
• Since $\frac{1}{\sigma^2}SSE \sim \chi^2_{n-k}$, we have $E\left[\frac{SSE}{n-k}\right] = \sigma^2$. That is, $MSE = \frac{SSE}{n-k}$ is an unbiased estimator of $\sigma^2$. This result is true irrespective of whether the null hypothesis is true or not.
• If the group means model fits the data, $SSR$ is large and $SSE$ is small. In this case, the p-value of the F-test is small and the null hypothesis is rejected. We say that the model is significant and fits the data well.

• The F-test assumes that the random samples are independent and normally distributed, and that the error variances are equal. However, the F-test works quite well even if the underlying measurements are not normally distributed, unless the data are highly skewed or the variances are markedly different.
• If the data are highly skewed, or if there is evidence that the variances differ greatly, we have two analysis options at our disposal. We could attempt to transform the observations (take the natural log of each value, for example) to make the data more symmetric with more similar variances. Alternatively, we could use nonparametric methods.

Example. A large body of evidence shows that soy has health benefits for most people.
Some of these benefits originate largely from isoflavones, plant compounds that have
estrogen-like properties. A consumer group purchased various soy products and ran
laboratory tests to determine the amount of isoflavones in each product. There were
three major sources of soy products: (1) cereals and snacks, (2) energy bars and (3) veggie
burgers. Five different products from each of the three categories were selected and the
amount of isoflavones (in mg) was determined for an adult serving. Our objective is to
determine if the average amount of isoflavones was different for the three sources of soy
products.

Example. The data are given in the following table. Use these data to test the hypothesis of a difference in the mean isoflavones level for the three categories.
Source of Soy  Isoflavones Content (mg)  Sample Size  Sample Mean  Sample Standard Deviation
1              5   17  12  10  4         5            9.60         4.7582
2              19  10  9   7   5         5            10.00        4.8166
3              25  15  12  9   8         5            13.80        6.1123
Total                                    15           11.133

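Example. The hypotheses are $H_0: \mu_1 = \mu_2 = \mu_3$ versus $H_1$: at least one of the three population means is different from the rest.
$$SSR = \sum_{i=1}^{3} n_i(\bar{X}_i - \bar{X})^2 = 5(9.60 - 11.133)^2 + 5(10.00 - 11.133)^2 + 5(13.80 - 11.133)^2 = 53.733$$
$$SSE = \sum_{i=1}^{3}\sum_{j=1}^{n_i}(X_{ij} - \bar{X}_i)^2 = 5(4.7582^2) + 5(4.8166^2) + 5(6.1123^2) = 416.0$$
The tabulated standard deviations are computed with divisor $n_i$ rather than $n_i - 1$, so each within-group sum of squares is $n_i$ times the squared tabulated value. With sample standard deviations $S_i$ (divisor $n_i - 1$), the same total is obtained as $\sum_{i=1}^{3}(n_i - 1)S_i^2$.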

The ANOVA table is given as
Source         df  SS       MS      F      p-value
Source of soy  2   53.733   26.867  0.775  0.4824
Error          12  416      34.667
Total          14  469.733
Since the p-value of the F-test is large, we do not have enough evidence to reject the null hypothesis.
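The within-group sum of squares for the isoflavones data can be obtained either directly from the observations or from standard deviations, provided the multiplier matches the divisor used for the standard deviation. A short check with the standard library (`statistics.pstdev` divides by $n_i$, `statistics.stdev` by $n_i - 1$; variable names are illustrative):

```python
from statistics import mean, pstdev, stdev

sources = [
    [5, 17, 12, 10, 4],    # source 1: cereals and snacks
    [19, 10, 9, 7, 5],     # source 2: energy bars
    [25, 15, 12, 9, 8],    # source 3: veggie burgers
]

k = len(sources)
n = sum(len(g) for g in sources)
grand_mean = sum(sum(g) for g in sources) / n

ssr = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in sources)
sse_direct = sum((x - mean(g)) ** 2 for g in sources for x in g)
sse_from_pstdev = sum(len(g) * pstdev(g) ** 2 for g in sources)        # divisor n_i
sse_from_stdev = sum((len(g) - 1) * stdev(g) ** 2 for g in sources)    # divisor n_i - 1

f_stat = (ssr / (k - 1)) / (sse_direct / (n - k))
print(f"SSR = {ssr:.3f}, SSE = {sse_direct:.1f}, F = {f_stat:.3f}")
```

All three ways of computing SSE agree, which confirms the value 416.0 used in the ANOVA table.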