STAT 2006 Chapter 5
More Hypothesis Testing
Presented by
Simon Cheung
Email: kingchaucheung@cuhk.edu.hk
Department of Statistics, The Chinese University of Hong Kong
STAT 2006 ‐ Jan 2021 1
Testing for the homogeneity of population means
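Suppose that there are $k$ populations. A random sample is drawn from each population. Denote these random samples as
$X_{11}, X_{12}, \ldots, X_{1n_1} \sim N(\mu_1, \sigma_1^2)$,
$X_{21}, X_{22}, \ldots, X_{2n_2} \sim N(\mu_2, \sigma_2^2)$,
⋮
$X_{k1}, X_{k2}, \ldots, X_{kn_k} \sim N(\mu_k, \sigma_k^2)$.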
Assume that
1. All population variances are equal, that is, $\sigma_1^2 = \sigma_2^2 = \cdots = \sigma_k^2 = \sigma^2$.
2. Each population has a normal distribution.
3. All random samples are independent.
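In vector form, for each $i = 1, \ldots, k$,
$$\mathbf{X}_i = \begin{pmatrix} X_{i1} \\ \vdots \\ X_{in_i} \end{pmatrix} \sim N_{n_i}\!\left(\mu_i\begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix}, \sigma^2 I_{n_i}\right).$$
The within-group sum of squares
Let $\mathbf{1}$ denote the $n_i \times 1$ vector of ones and $P_i = \frac{1}{n_i}\mathbf{1}\mathbf{1}^T$. Then $\bar X_i = \frac{1}{n_i}\mathbf{1}^T\mathbf{X}_i$, $\mathbf{X}_i - \bar X_i\mathbf{1} = (I_{n_i} - P_i)\mathbf{X}_i$, and, since $I_{n_i} - P_i$ is symmetric and idempotent,
$$\sum_{j=1}^{n_i}(X_{ij} - \bar X_i)^2 = \left[(I_{n_i} - P_i)\mathbf{X}_i\right]^T\left[(I_{n_i} - P_i)\mathbf{X}_i\right] = \mathbf{X}_i^T(I_{n_i} - P_i)\mathbf{X}_i.$$
For $t \in \mathbb{R}$ and $\mathbf{s} \in \mathbb{R}^{n_i}$, let $\mathbf{a} = \frac{t}{n_i}\mathbf{1} + (I_{n_i} - P_i)\mathbf{s}$, so that $t\bar X_i + \mathbf{s}^T(I_{n_i} - P_i)\mathbf{X}_i = \mathbf{a}^T\mathbf{X}_i$. Then $\mathbf{a}^T\mathbf{1} = t$ and
$$\mathbf{a}^T\mathbf{a} = \frac{t^2}{n_i} + \frac{2t}{n_i}\mathbf{1}^T(I_{n_i} - P_i)\mathbf{s} + \mathbf{s}^T(I_{n_i} - P_i)\mathbf{s} = \frac{t^2}{n_i} + 0 + \mathbf{s}^T(I_{n_i} - P_i)\mathbf{s},$$
because $(I_{n_i} - P_i)\mathbf{1} = \mathbf{0}$. Hence, the joint moment-generating function of $\bar X_i$ and $(I_{n_i} - P_i)\mathbf{X}_i$ is
$$E\left[\exp(\mathbf{a}^T\mathbf{X}_i)\right] = \exp\left(\mu_i\mathbf{a}^T\mathbf{1} + \frac{\sigma^2}{2}\mathbf{a}^T\mathbf{a}\right) = \exp\left(\mu_i t + \frac{\sigma^2 t^2}{2n_i}\right)\exp\left(\frac{\sigma^2}{2}\mathbf{s}^T(I_{n_i} - P_i)\mathbf{s}\right).$$
It follows that $\bar X_i$ and $(I_{n_i} - P_i)\mathbf{X}_i$ are independent.
Since $\sum_{j=1}^{n_i}(X_{ij} - \bar X_i)^2$ is a function of $(I_{n_i} - P_i)\mathbf{X}_i$ alone, $\bar X_i$ and $\sum_{j=1}^{n_i}(X_{ij} - \bar X_i)^2$ are independent, and
$$\frac{1}{\sigma^2}\sum_{j=1}^{n_i}(X_{ij} - \bar X_i)^2 \sim \chi^2_{n_i - 1}.$$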
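Summing over the $k$ independent samples,
$$\frac{1}{\sigma^2}\sum_{i=1}^{k}\sum_{j=1}^{n_i}(X_{ij} - \bar X_i)^2 \sim \chi^2_{n-k},$$
where $n = \sum_{i=1}^k n_i$ and $\bar X_i = \frac{1}{n_i}\sum_{j=1}^{n_i} X_{ij}$.
Under $H_0: \mu_1 = \mu_2 = \cdots = \mu_k$, all $n$ observations form a random sample from the same normal distribution, so
$$\frac{1}{\sigma^2}\sum_{i=1}^{k}\sum_{j=1}^{n_i}(X_{ij} - \bar X)^2 \sim \chi^2_{n-1},$$
where $\bar X = \frac{1}{n}\sum_{i=1}^k\sum_{j=1}^{n_i} X_{ij}$.
Writing $X_{ij} - \bar X = (X_{ij} - \bar X_i) + (\bar X_i - \bar X)$ and noting that $\sum_{j=1}^{n_i}(X_{ij} - \bar X_i) = 0$ for each $i$, the cross term vanishes and
$$SST = SSW + SSB,$$
where $SST = \sum_{i=1}^k\sum_{j=1}^{n_i}(X_{ij} - \bar X)^2$, $SSW = \sum_{i=1}^k\sum_{j=1}^{n_i}(X_{ij} - \bar X_i)^2$ and $SSB = \sum_{i=1}^k n_i(\bar X_i - \bar X)^2$.
Since each $\bar X_i$ is independent of $\sum_{j=1}^{n_i}(X_{ij} - \bar X_i)^2$ and the samples are independent, $SSW$ and $SSB$ (a function of $\bar X_1, \ldots, \bar X_k$ only) are independent. The orthogonal decomposition of chi-squared distributed random variables indicates that, under $H_0$,
$$\frac{1}{\sigma^2}\sum_{i=1}^k n_i(\bar X_i - \bar X)^2 \sim \chi^2_{k-1}.$$
Under $H_0$,
$$F = \frac{\frac{1}{k-1}\sum_{i=1}^k n_i(\bar X_i - \bar X)^2}{\frac{1}{n-k}\sum_{i=1}^k\sum_{j=1}^{n_i}(X_{ij} - \bar X_i)^2} \sim F_{k-1,\,n-k},$$
and $H_0$ is rejected for large values of $F$.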
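The sums of squares and the $F$ statistic above can be sketched in a few lines of Python. This is a minimal illustration, not a library routine: the function name `one_way_anova` and the three toy samples are hypothetical, chosen so that the decomposition $SST = SSW + SSB$ is easy to verify by hand.

```python
def one_way_anova(groups):
    """Return (SSB, SSW, F, df1, df2) for k independent samples.

    Sketch of the one-way ANOVA F statistic:
    F = [SSB / (k - 1)] / [SSW / (n - k)].
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    group_means = [sum(g) / len(g) for g in groups]

    # Between-group sum of squares: sum_i n_i (Xbar_i - Xbar)^2
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group sum of squares: sum_i sum_j (X_ij - Xbar_i)^2
    ssw = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)

    df1, df2 = k - 1, n - k
    f_stat = (ssb / df1) / (ssw / df2)
    return ssb, ssw, f_stat, df1, df2


# Hypothetical toy data: three small samples.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
ssb, ssw, f_stat, df1, df2 = one_way_anova(groups)

# Check the decomposition SST = SSW + SSB.
all_obs = [x for g in groups for x in g]
grand = sum(all_obs) / len(all_obs)
sst = sum((x - grand) ** 2 for x in all_obs)
print(ssb, ssw, sst, f_stat, (df1, df2))
```

For these toy samples the group means are 2, 3 and 6, so $SSB = 26$, $SSW = 6$ and $F = (26/2)/(6/6) = 13$ on $(2, 6)$ degrees of freedom.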
Example.
A large body of evidence shows that soy has health benefits for most people. Some of
these benefits originate largely from isoflavones, plant compounds that have estrogen‐
like properties. A consumer group purchased various soy products and ran laboratory
tests to determine the amount of isoflavones in each product. There were three major
sources of soy products: (1) cereals and snacks, (2) energy bars and (3) veggie burgers.
Five different products from each of the three categories were selected and the amount
of isoflavones (in mg) was determined for an adult serving. Our objective is to determine
if the average amount of isoflavones was different for the three sources of soy products.
The data are given in the following table. Use these data to test the hypothesis of a difference in the mean isoflavone level among the three categories.
The hypotheses are $H_0: \mu_1 = \mu_2 = \mu_3$ versus $H_1$: at least one of the three population means is different from the rest.
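With $k = 3$ categories and $n = 15$ products, $SSB = 60.40$ and $SSW = 437.60$, so
$$F = \frac{\frac{1}{2}(60.40)}{\frac{1}{12}(437.60)} = 0.83,$$
which is compared with the $F_{2,12}$ distribution.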
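As a numerical check, the sketch below recomputes $F$ from the two sums of squares and adds a p-value, which the slide does not report. It assumes $k = 3$ and $n = 15$ as stated in the example, and it uses the closed form $P(F_{2,\nu} > f) = (1 + 2f/\nu)^{-\nu/2}$, which holds only when the numerator has 2 degrees of freedom; the helper names are hypothetical.

```python
def f_test_from_ss(ssb, ssw, k, n):
    """F statistic for one-way ANOVA from its sums of squares."""
    return (ssb / (k - 1)) / (ssw / (n - k))


def f_sf_df1_two(f, df2):
    """Upper-tail probability P(F > f) for an F(2, df2) variable.

    Uses the closed form (1 + 2 f / df2) ** (-df2 / 2); it does NOT apply
    to other numerator degrees of freedom.
    """
    return (1.0 + 2.0 * f / df2) ** (-df2 / 2.0)


# Values from the isoflavone example: k = 3 categories, n = 15 products.
f_stat = f_test_from_ss(ssb=60.40, ssw=437.60, k=3, n=15)
p_value = f_sf_df1_two(f_stat, df2=15 - 3)
print(round(f_stat, 2), round(p_value, 3))
```

This reproduces $F \approx 0.83$; the resulting p-value of roughly 0.46 would not lead to rejecting $H_0$ at conventional significance levels.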
Simple Linear Regression
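When we are not interested in the random variable $X$ except as a means of improving our estimation of $Y$, we can work with the conditional distribution of $Y$ given $X = x$. In this section, we consider the following conditional model for the pairs $(x_i, Y_i)$, $i = 1, \ldots, n$:
$$Y_i = \alpha + \beta(x_i - \bar x) + \varepsilon_i, \qquad \varepsilon_1, \ldots, \varepsilon_n \text{ independent } N(0, \sigma^2).$$
Under this model, $Y_i \mid X_i = x_i \sim N\left(\alpha + \beta(x_i - \bar x), \sigma^2\right)$.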
Example. We select a random sample of individuals and measure the height $x$ and weight $y$ of each one. The following data are obtained.
(68, 151), (72, 163), (69, 146), (72, 180), (70, 157), (73, 170), (70, 164),
(73, 175), (71, 171), (74, 178), (72, 160), (75, 188)
We can plot the relationship between the height and weight of individuals in the sample.
[Figure: scatter plot of weight (vertical axis) against height (horizontal axis), with the fitted line $\hat y = -206.26 + 5.213x$.]
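The likelihood function of observing $y_1, y_2, \ldots, y_n$ given $x_1, x_2, \ldots, x_n$ is
$$L(\alpha, \beta, \sigma^2) \propto (\sigma^2)^{-n/2}\exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^n\left[y_i - \alpha - \beta(x_i - \bar x)\right]^2\right\}.$$
The log-likelihood function is given by
$$\log L(\alpha, \beta, \sigma^2) = -\frac{n}{2}\log\sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^n\left[y_i - \alpha - \beta(x_i - \bar x)\right]^2 + \text{constant}.$$
To maximize the likelihood, we must select $\alpha$ and $\beta$ to minimize
$$H(\alpha, \beta) = \sum_{i=1}^n\left[y_i - \alpha - \beta(x_i - \bar x)\right]^2.$$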
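$$\frac{\partial H(\alpha, \beta)}{\partial\alpha} = -2\sum_{i=1}^n\left[y_i - \alpha - \beta(x_i - \bar x)\right], \qquad \frac{\partial H(\alpha, \beta)}{\partial\beta} = -2\sum_{i=1}^n\left[y_i - \alpha - \beta(x_i - \bar x)\right](x_i - \bar x).$$
Setting $\partial H/\partial\alpha = 0$ and $\partial H/\partial\beta = 0$, and using $\sum_{i=1}^n (x_i - \bar x) = 0$,
$$\hat\alpha = \bar y$$
and
$$\hat\beta = \frac{\sum_{i=1}^n y_i(x_i - \bar x)}{\sum_{i=1}^n (x_i - \bar x)^2} = \frac{\sum_{i=1}^n (x_i - \bar x)(y_i - \bar y)}{\sum_{i=1}^n (x_i - \bar x)^2}.$$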
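The closed-form estimates can be checked on the height and weight data from the example above. The sketch assumes the centred parametrization $\alpha + \beta(x - \bar x)$ used in these notes; `fit_centered_lse` is a hypothetical helper name.

```python
def fit_centered_lse(xs, ys):
    """Least-squares estimates for y = alpha + beta * (x - xbar)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    alpha_hat = y_bar  # alpha-hat = ybar because sum(x_i - xbar) = 0
    beta_hat = sum((x - x_bar) * y for x, y in zip(xs, ys)) / sxx
    return alpha_hat, beta_hat, x_bar


heights = [68, 72, 69, 72, 70, 73, 70, 73, 71, 74, 72, 75]
weights = [151, 163, 146, 180, 157, 170, 164, 175, 171, 178, 160, 188]

alpha_hat, beta_hat, x_bar = fit_centered_lse(heights, weights)
# Convert to the usual intercept-slope form: y = (alpha - beta * xbar) + beta * x
intercept = alpha_hat - beta_hat * x_bar
print(round(alpha_hat, 4), round(beta_hat, 4), round(intercept, 2))
```

Converting to intercept and slope gives approximately $-206.26$ and $5.213$, matching the fitted line reported with the scatter plot.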
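Note that $\hat\alpha = \bar Y$ and $\hat\beta$ are linear functions of $Y_1, Y_2, \ldots, Y_n$ and hence have normal distributions. Their means and variances are as follows. For $\hat\alpha$,
$$E(\hat\alpha) = \frac{1}{n}\sum_{i=1}^n\left[\alpha + \beta(x_i - \bar x)\right] = \alpha, \qquad \operatorname{Var}(\hat\alpha) = \frac{1}{n^2}\sum_{i=1}^n\sigma^2 = \frac{\sigma^2}{n}.$$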
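For $\hat\beta$,
$$E(\hat\beta) = \frac{\sum_{i=1}^n (x_i - \bar x)\left[\alpha + \beta(x_i - \bar x)\right]}{\sum_{i=1}^n (x_i - \bar x)^2} = \frac{\alpha\sum_{i=1}^n (x_i - \bar x) + \beta\sum_{i=1}^n (x_i - \bar x)^2}{\sum_{i=1}^n (x_i - \bar x)^2} = \beta.$$
Note that
$$\operatorname{Var}(\hat\beta) = \frac{\sum_{i=1}^n (x_i - \bar x)^2\sigma^2}{\left[\sum_{i=1}^n (x_i - \bar x)^2\right]^2} = \frac{\sigma^2}{\sum_{i=1}^n (x_i - \bar x)^2}.$$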
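$$\hat\alpha \sim N\left(\alpha, \frac{\sigma^2}{n}\right), \qquad \hat\beta \sim N\left(\beta, \frac{\sigma^2}{\sum_{i=1}^n (x_i - \bar x)^2}\right), \qquad Y_i \sim N\left(\alpha + \beta(x_i - \bar x), \sigma^2\right).$$
Thus,
$$\frac{\sum_{i=1}^n\left[Y_i - \alpha - \beta(x_i - \bar x)\right]^2}{\sigma^2} \sim \chi^2_n.$$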
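To show that $\hat\alpha$ and $\hat\beta$ are independent, note that
$$\operatorname{Cov}(\hat\alpha, \hat\beta) = \operatorname{Cov}\left(\frac{1}{n}\sum_{i=1}^n Y_i,\ \frac{\sum_{i=1}^n (x_i - \bar x)Y_i}{\sum_{i=1}^n (x_i - \bar x)^2}\right) = \frac{\sigma^2\sum_{i=1}^n (x_i - \bar x)}{n\sum_{i=1}^n (x_i - \bar x)^2} = 0.$$
Since $\hat\alpha$ and $\hat\beta$ are linear functions of the same normal random variables, they are jointly normal, and zero covariance implies independence.
Note that, since the $Y_i$ are independent with common variance $\sigma^2$,
$$\operatorname{Cov}(Y_i, \hat\alpha) = \frac{\sigma^2}{n}, \qquad \operatorname{Cov}(Y_i, \hat\beta) = \frac{(x_i - \bar x)\sigma^2}{\sum_{j=1}^n (x_j - \bar x)^2}.$$
Let $e_i = Y_i - \hat\alpha - \hat\beta(x_i - \bar x)$ denote the $i$th residual. Then
$$\operatorname{Cov}(e_i, \hat\beta) = \frac{(x_i - \bar x)\sigma^2}{\sum_{j=1}^n (x_j - \bar x)^2} - 0 - (x_i - \bar x)\cdot\frac{\sigma^2}{\sum_{j=1}^n (x_j - \bar x)^2} = 0,$$
$$\operatorname{Cov}(e_i, \hat\alpha) = \frac{\sigma^2}{n} - \frac{\sigma^2}{n} - (x_i - \bar x)\cdot 0 = 0.$$
Because the residuals $e_1, \ldots, e_n$ and $(\hat\alpha, \hat\beta)$ are jointly normal, the residuals, and hence $\sum_{i=1}^n e_i^2$, are independent of $\hat\alpha$ and $\hat\beta$.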
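Writing $Y_i - \alpha - \beta(x_i - \bar x) = \left[Y_i - \hat\alpha - \hat\beta(x_i - \bar x)\right] + (\hat\alpha - \alpha) + (\hat\beta - \beta)(x_i - \bar x)$, squaring and summing (all cross terms sum to zero), we have
$$\sum_{i=1}^n\left[Y_i - \alpha - \beta(x_i - \bar x)\right]^2 = n(\hat\alpha - \alpha)^2 + (\hat\beta - \beta)^2\sum_{i=1}^n (x_i - \bar x)^2 + \sum_{i=1}^n\left[Y_i - \hat\alpha - \hat\beta(x_i - \bar x)\right]^2.$$
After dividing by $\sigma^2$, the left-hand side is $\chi^2_n$, the first two terms on the right are independent $\chi^2_1$ variables, and the last term is independent of both. Hence
$$\frac{n\hat\sigma^2}{\sigma^2} \sim \chi^2_{n-2}, \qquad \text{where } \hat\sigma^2 = \frac{1}{n}\sum_{i=1}^n\left[Y_i - \hat\alpha - \hat\beta(x_i - \bar x)\right]^2.$$
Thus,
$$T_1 = \frac{\hat\alpha - \alpha}{\sqrt{\hat\sigma^2/(n-2)}} \sim t_{n-2}, \qquad T_2 = \frac{\hat\beta - \beta}{\sqrt{\dfrac{n\hat\sigma^2}{(n-2)\sum_{i=1}^n (x_i - \bar x)^2}}} \sim t_{n-2}.$$
Consequently, $100(1-\gamma)\%$ confidence intervals for $\alpha$, $\beta$ and $\sigma^2$ are
$$\hat\alpha \pm t_{\gamma/2,\,n-2}\sqrt{\frac{\hat\sigma^2}{n-2}}, \qquad \hat\beta \pm t_{\gamma/2,\,n-2}\sqrt{\frac{n\hat\sigma^2}{(n-2)\sum_{i=1}^n (x_i - \bar x)^2}}, \qquad \left[\frac{n\hat\sigma^2}{\chi^2_{\gamma/2,\,n-2}},\ \frac{n\hat\sigma^2}{\chi^2_{1-\gamma/2,\,n-2}}\right],$$
where $t_{\gamma/2,\,n-2}$ and $\chi^2_{\gamma/2,\,n-2}$ denote upper $\gamma/2$ quantiles.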
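Because the sum-of-squares decomposition is an algebraic identity, it can be checked numerically for any trial values of $\alpha$ and $\beta$. The sketch below uses the height and weight data with the arbitrary choice $\alpha = 160$, $\beta = 4$ (hypothetical values, not from the notes), computes the MLE $\hat\sigma^2$ (dividing by $n$), and evaluates the $t$ statistic for the slope at the null value $\beta = 0$.

```python
import math

heights = [68, 72, 69, 72, 70, 73, 70, 73, 71, 74, 72, 75]
weights = [151, 163, 146, 180, 157, 170, 164, 175, 171, 178, 160, 188]

n = len(heights)
x_bar = sum(heights) / n
sxx = sum((x - x_bar) ** 2 for x in heights)
alpha_hat = sum(weights) / n
beta_hat = sum((x - x_bar) * y for x, y in zip(heights, weights)) / sxx

# Residual sum of squares n * sigma2_hat, where sigma2_hat is the MLE of sigma^2.
sse = sum((y - alpha_hat - beta_hat * (x - x_bar)) ** 2 for x, y in zip(heights, weights))
sigma2_hat = sse / n

# Hypothetical trial values; the identity should hold for any alpha, beta.
alpha, beta = 160.0, 4.0
lhs = sum((y - alpha - beta * (x - x_bar)) ** 2 for x, y in zip(heights, weights))
rhs = n * (alpha_hat - alpha) ** 2 + (beta_hat - beta) ** 2 * sxx + sse

# t statistic for the slope evaluated at beta = 0; compare with a t(n - 2) table.
t_beta = (beta_hat - 0.0) / math.sqrt(n * sigma2_hat / ((n - 2) * sxx))
print(round(lhs, 6), round(rhs, 6), round(sigma2_hat, 3), round(t_beta, 3))
```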
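For a given $x$ in the sample, $\hat Y = \hat\alpha + \hat\beta(x - \bar x)$ is a point estimate for $\mu(x) = \alpha + \beta(x - \bar x)$. Since $\hat\alpha$ and $\hat\beta$ are normally and independently distributed, $\hat Y$ has a normal distribution with
$$E(\hat Y) = \alpha + \beta(x - \bar x), \qquad \operatorname{Var}(\hat Y) = \frac{\sigma^2}{n} + \frac{(x - \bar x)^2\sigma^2}{\sum_{i=1}^n (x_i - \bar x)^2} = \sigma^2\left[\frac{1}{n} + \frac{(x - \bar x)^2}{\sum_{i=1}^n (x_i - \bar x)^2}\right].$$
Hence,
$$T = \frac{\hat\alpha + \hat\beta(x - \bar x) - \left[\alpha + \beta(x - \bar x)\right]}{\sqrt{\dfrac{n\hat\sigma^2}{n-2}}\sqrt{\dfrac{1}{n} + \dfrac{(x - \bar x)^2}{\sum_{i=1}^n (x_i - \bar x)^2}}} \sim t_{n-2}.$$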
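Let
$$c = \sqrt{\frac{n\hat\sigma^2}{n-2}}\sqrt{\frac{1}{n} + \frac{(x - \bar x)^2}{\sum_{i=1}^n (x_i - \bar x)^2}}.$$
Then $\left[\hat\alpha + \hat\beta(x - \bar x) - c\,t_{\gamma/2,\,n-2},\ \hat\alpha + \hat\beta(x - \bar x) + c\,t_{\gamma/2,\,n-2}\right]$ is a $100(1-\gamma)\%$ confidence interval for $\mu(x)$, where $t_{\gamma/2,\,n-2}$ is the upper $\gamma/2$ quantile of the $t_{n-2}$ distribution.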
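To predict a new observation $Y_{n+1}$ at $x$, independent of $Y_1, \ldots, Y_n$, consider $W = Y_{n+1} - \hat\alpha - \hat\beta(x - \bar x)$. Then
$$E(W) = \alpha + \beta(x - \bar x) - \alpha - \beta(x - \bar x) = 0,$$
$$\operatorname{Var}(W) = \sigma^2 + \frac{\sigma^2}{n} + \frac{(x - \bar x)^2\sigma^2}{\sum_{i=1}^n (x_i - \bar x)^2} = \sigma^2\left[1 + \frac{1}{n} + \frac{(x - \bar x)^2}{\sum_{i=1}^n (x_i - \bar x)^2}\right].$$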
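Hence,
$$T = \frac{Y_{n+1} - \hat\alpha - \hat\beta(x - \bar x)}{\sqrt{\dfrac{n\hat\sigma^2}{n-2}}\sqrt{1 + \dfrac{1}{n} + \dfrac{(x - \bar x)^2}{\sum_{i=1}^n (x_i - \bar x)^2}}} \sim t_{n-2}.$$
Let
$$d = \sqrt{\frac{n\hat\sigma^2}{n-2}}\sqrt{1 + \frac{1}{n} + \frac{(x - \bar x)^2}{\sum_{i=1}^n (x_i - \bar x)^2}}.$$
Then $\hat\alpha + \hat\beta(x - \bar x) \pm d\,t_{\gamma/2,\,n-2}$ is a $100(1-\gamma)\%$ prediction interval for $Y_{n+1}$.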
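The confidence interval for $\mu(x)$ and the prediction interval for $Y_{n+1}$ can be compared on the height and weight data. In this sketch the height $x = 72$ is an arbitrary illustrative choice, and $t_{0.025,\,10} \approx 2.228$ is hard-coded from a standard $t$ table rather than computed; the function name is hypothetical.

```python
import math


def mean_and_prediction_intervals(xs, ys, x0, t_crit):
    """Return (center, CI for mu(x0), prediction interval for Y at x0).

    Uses the half-width factors c and d, with sigma2_hat the MLE (divide by n).
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    beta_hat = sum((x - x_bar) * y for x, y in zip(xs, ys)) / sxx
    sigma2_hat = sum((y - y_bar - beta_hat * (x - x_bar)) ** 2 for x, y in zip(xs, ys)) / n

    scale = math.sqrt(n * sigma2_hat / (n - 2))
    c = scale * math.sqrt(1 / n + (x0 - x_bar) ** 2 / sxx)
    d = scale * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx)
    center = y_bar + beta_hat * (x0 - x_bar)
    ci = (center - c * t_crit, center + c * t_crit)
    pred = (center - d * t_crit, center + d * t_crit)
    return center, ci, pred


heights = [68, 72, 69, 72, 70, 73, 70, 73, 71, 74, 72, 75]
weights = [151, 163, 146, 180, 157, 170, 164, 175, 171, 178, 160, 188]
T_CRIT = 2.228  # t_{0.025, 10} from a t table (n - 2 = 10 here)

center, ci, pred = mean_and_prediction_intervals(heights, weights, x0=72, t_crit=T_CRIT)
print(round(center, 3), [round(v, 2) for v in ci], [round(v, 2) for v in pred])
```

Both intervals share the centre $\hat\alpha + \hat\beta(x - \bar x) \approx 169.09$; the prediction interval is wider because $d$ contains the extra term $1$ for the variance of the new observation.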
Test on Correlation
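Suppose that $(X, Y)$ has a bivariate normal distribution with joint pdf
$$f(x, y) = \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}}\exp\left\{-\frac{1}{2(1-\rho^2)}\left[\left(\frac{x - \mu_X}{\sigma_X}\right)^2 - 2\rho\left(\frac{x - \mu_X}{\sigma_X}\right)\left(\frac{y - \mu_Y}{\sigma_Y}\right) + \left(\frac{y - \mu_Y}{\sigma_Y}\right)^2\right]\right\},$$
where $-1 < \rho < 1$, $\sigma_X > 0$, $\sigma_Y > 0$, $-\infty < \mu_X < \infty$, $-\infty < \mu_Y < \infty$. When $\rho = 0$, $f(x, y) = f_X(x)f_Y(y)$, so $X$ and $Y$ are independent.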
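Given a random sample $(X_1, Y_1), \ldots, (X_n, Y_n)$, the sample correlation coefficient is
$$R = \frac{\sum_{i=1}^n (X_i - \bar X)(Y_i - \bar Y)}{\sqrt{\sum_{i=1}^n (X_i - \bar X)^2\sum_{i=1}^n (Y_i - \bar Y)^2}}.$$
Note that
$$\hat\beta = R\sqrt{\frac{\sum_{i=1}^n (Y_i - \bar Y)^2}{\sum_{i=1}^n (X_i - \bar X)^2}}.$$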
Under $H_0: \rho = 0$,
$$T = \frac{R\sqrt{n-2}}{\sqrt{1 - R^2}} \sim t_{n-2}.$$
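At significance level $\gamma$, with observed value $t$ of $T$:
Two-tailed: $H_0: \rho = 0$ vs $H_1: \rho \neq 0$; reject $H_0$ if $|t| \geq t_{\gamma/2,\,n-2}$.
Left-tailed: $H_0: \rho = 0$ vs $H_1: \rho < 0$; reject $H_0$ if $t \leq -t_{\gamma,\,n-2}$.
Right-tailed: $H_0: \rho = 0$ vs $H_1: \rho > 0$; reject $H_0$ if $t \geq t_{\gamma,\,n-2}$.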
Example. We select a random sample of individuals and measure the height and weight of each one. The following data are obtained.
(68, 151), (72, 163), (69, 146), (72, 180), (70, 157), (73, 170), (70, 164),
(73, 175), (71, 171), (74, 178), (72, 160), (75, 188)
We shall test $H_0: \rho = 0$ vs $H_1: \rho \neq 0$, where $\rho$ is the correlation coefficient between height and weight.
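A short sketch of the test for this example, assuming the two-sided alternative and a 5% level; the critical value $t_{0.025,\,10} \approx 2.228$ is taken from a standard $t$ table, and the helper name `correlation_t` is hypothetical.

```python
import math

heights = [68, 72, 69, 72, 70, 73, 70, 73, 71, 74, 72, 75]
weights = [151, 163, 146, 180, 157, 170, 164, 175, 171, 178, 160, 188]


def correlation_t(xs, ys):
    """Sample correlation R and T = R * sqrt(n - 2) / sqrt(1 - R^2)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    r = sxy / math.sqrt(sxx * syy)
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
    return r, t


r, t = correlation_t(heights, weights)
T_CRIT = 2.228  # two-sided 5% critical value t_{0.025, 10}, from a t table
reject = abs(t) >= T_CRIT
print(round(r, 4), round(t, 3), reject)
```

Here $R \approx 0.868$ and $T \approx 5.54$, which exceeds 2.228, so under these assumptions $H_0: \rho = 0$ would be rejected. The value of $T$ coincides algebraically with the $t$ statistic for testing $\beta = 0$ in the regression of weight on height.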
Chi‐squared Tests
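Let the likelihood ratio of a test be $\Lambda = \dfrac{\max_{\theta \in \Theta_0} L(\theta)}{\max_{\theta \in \Theta} L(\theta)}$. When $H_0$ is true, $\Lambda$ should be close to 1. Given an observation $\mathbf{x}$, $H_0$ is rejected if $\Lambda(\mathbf{x}) \leq k$, for some $0 < k < 1$.
Asymptotic theory suggests that, under $H_0$, $-2\log\Lambda$ is approximately distributed as $\chi^2_{r - r_0}$,
where $r$ is the dimension of the parameter space $\Theta$ in general and $r_0$ is the dimension of the parameter space $\Theta_0$ under $H_0$. $H_0$ is rejected if $-2\log\Lambda \geq \chi^2_{\alpha,\,r - r_0}$, the upper $\alpha$ quantile of $\chi^2_{r - r_0}$.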
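Example. $(N_1, \ldots, N_c) \sim \text{Multinomial}(n; p_1, \ldots, p_c)$.
Data: $(n_1, \ldots, n_c)$.
$H_0: p_i = p_{i0}$, $i = 1, 2, \ldots, c$, where $\sum_{i=1}^c p_{i0} = 1$.
Under $H_0$,
$$\max_{\Theta_0} L = \frac{n!}{n_1!\cdots n_c!}\prod_{i=1}^c p_{i0}^{n_i} \qquad\text{and}\qquad r_0 = 0.$$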
The Lagrange multiplier optimization function is given by
$$\ell(p_1, \ldots, p_c, \lambda) = \sum_{i=1}^c n_i\log p_i - \lambda\left(\sum_{i=1}^c p_i - 1\right).$$
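Under $\Theta$, setting $\partial\ell/\partial p_i = n_i/p_i - \lambda = 0$ gives $p_i = n_i/\lambda$; the constraint $\sum_{i=1}^c p_i = 1$ then gives $\lambda = n$, so $\hat p_i = n_i/n$ and $r = c - 1$. Hence
$$-2\log\Lambda = 2\sum_{i=1}^c n_i\log\frac{n_i}{np_{i0}},$$
which is approximately $\chi^2_{c-1}$ under $H_0$.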
Cell        1      2      3      4      Total
Observed    12     13     20     25     70
Expected    12.6   12.6   22.4   22.4   70
log 1
Differentiating with respect to and setting it to zero and, after some algebra, yields
, ,
2 2 2
, 3
Since ,
⟹
It follows that
, ,
2 2 2
Since 1, .
Cell 1 2 3 4 5 ⋯ c Total
n n n ⋯
Example. $(N_1, \ldots, N_4) \sim \text{Multinomial}(n; p_1, \ldots, p_4)$.
Data: $(n_1, n_2, n_3, n_4) = (12, 13, 20, 25)$, $n = 70$.
$H_0: p_3 = p_1 + p_2$.
Under $H_0$,
$\hat p_1 = 0.154$, $\hat p_2 = 0.167$, $\hat p_3 = 0.32$, $\hat p_4 = 0.36$.
Cell        1      2      3      4      Total
Observed    12     13     20     25     70
Expected    10.8   11.7   22.5   25     70
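Example
x           0      1      2      3      4      5      6      7
Frequency   22     53     58     39     20     5      2      1
$H_0$: The data were generated by a Poisson distribution.
$$\hat\lambda = \bar x = \frac{0(22) + 1(53) + 2(58) + 3(39) + 4(20) + 5(5) + 6(2) + 7(1)}{22 + 53 + 58 + 39 + 20 + 5 + 2 + 1} = \frac{410}{200} = 2.05$$
Cell        0       1       2       3       4       5      6      ≥7
Observed    22      53      58      39      20      5      2      1
Expected    25.747  52.781  54.101  36.969  18.947  7.768  2.654  1.0332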
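$$-2\log\Lambda = 2\left[22\log\frac{22}{25.747} + 53\log\frac{53}{52.781} + 58\log\frac{58}{54.101} + 39\log\frac{39}{36.969} + 20\log\frac{20}{18.947} + 5\log\frac{5}{7.768} + 2\log\frac{2}{2.654} + 1\log\frac{1}{1.0332}\right] = 2.3236,$$
with $8 - 1 - 1 = 6$ degrees of freedom (one parameter, $\lambda$, is estimated).
Since the p-value of the test is 0.8877, $H_0$ is not rejected.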
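The Poisson goodness-of-fit computation can be reproduced as follows. This sketch assumes, as the expected count 1.0332 for the last cell implies, that the last cell collects all values of 7 or more, and it uses the exact closed form $P(\chi^2_{\nu} > x) = e^{-x/2}\sum_{j=0}^{\nu/2 - 1}(x/2)^j/j!$ for even $\nu$ instead of a statistics library.

```python
import math

# Observed frequencies for x = 0, 1, ..., 6 and the last cell "7 or more".
counts = [22, 53, 58, 39, 20, 5, 2, 1]
n = sum(counts)

# MLE of lambda: the sample mean, treating the single last-cell observation as 7.
lam = sum(x * c for x, c in enumerate(counts)) / n

# Poisson cell probabilities; the last cell takes the remaining tail mass.
probs = [math.exp(-lam) * lam ** x / math.factorial(x) for x in range(7)]
probs.append(1.0 - sum(probs))
expected = [n * p for p in probs]

# Likelihood-ratio (deviance) statistic -2 log Lambda.
g2 = 2 * sum(o * math.log(o / e) for o, e in zip(counts, expected))

# df = (number of cells - 1) - (one estimated parameter) = 6.
df = len(counts) - 1 - 1
# For even df, P(chi2_df > x) = exp(-x/2) * sum_{j < df/2} (x/2)^j / j!
half = g2 / 2
p_value = math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(df // 2))
print(round(lam, 2), [round(e, 3) for e in expected], round(g2, 4), round(p_value, 4))
```

The statistic agrees with 2.3236 and the p-value with 0.8877 up to rounding of the intermediate expected counts.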
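Consider a single observation (a trial with $n = 1$): for $j = 1, \ldots, n$, let $\mathbf{Y}_j = (Y_{j1}, Y_{j2}, \ldots, Y_{jc})$, where $Y_{ji} = 1$ if the $j$th observation falls in cell $i$ and $0$ otherwise. Under $H_0$, write $p_i = p_{i0}$, so that $\sum_{i=1}^c p_i = 1$. Thus,
$$\mathbf{Y}_j \sim \text{Multinomial}(1; p_1, \ldots, p_c).$$
It follows that, under $H_0$, $E(\mathbf{Y}_j) = \mathbf{p}$ and $\operatorname{Cov}(\mathbf{Y}_j) = \Sigma$, where $\Sigma_{ii} = p_i(1 - p_i)$ and
$\Sigma_{il} = -p_ip_l$ for $i \neq l$.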
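Because $Y_{j1} + \cdots + Y_{jc} = 1$, $\Sigma$ is singular, so the last cell is dropped. The covariance matrix of $(Y_{j1}, \ldots, Y_{j,c-1})$ is
$$\Sigma^* = \begin{pmatrix} p_1(1-p_1) & -p_1p_2 & \cdots & -p_1p_{c-1} \\ -p_2p_1 & p_2(1-p_2) & \cdots & -p_2p_{c-1} \\ \vdots & \vdots & \ddots & \vdots \\ -p_{c-1}p_1 & -p_{c-1}p_2 & \cdots & p_{c-1}(1-p_{c-1}) \end{pmatrix}.$$
Let $\mathbf{Y}_j^* = (Y_{j1}, Y_{j2}, \ldots, Y_{j,c-1})^T$. Then $E(\mathbf{Y}_j^*) = (p_1, p_2, \ldots, p_{c-1})^T$ and
$\operatorname{Cov}(\mathbf{Y}_j^*) = \Sigma^*$.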
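The inverse of $\Sigma^*$ is
$$(\Sigma^*)^{-1} = \begin{pmatrix} \frac{1}{p_1} + \frac{1}{p_c} & \frac{1}{p_c} & \cdots & \frac{1}{p_c} \\ \frac{1}{p_c} & \frac{1}{p_2} + \frac{1}{p_c} & \cdots & \frac{1}{p_c} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{1}{p_c} & \frac{1}{p_c} & \cdots & \frac{1}{p_{c-1}} + \frac{1}{p_c} \end{pmatrix}.$$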
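Let $\bar{\mathbf{Y}}^* = \frac{1}{n}\sum_{j=1}^n \mathbf{Y}_j^* = \left(\frac{N_1}{n}, \frac{N_2}{n}, \ldots, \frac{N_{c-1}}{n}\right)^T$ and $\mathbf{p}^* = (p_1, p_2, \ldots, p_{c-1})^T$. By the multivariate central limit theorem, $\sqrt{n}(\bar{\mathbf{Y}}^* - \mathbf{p}^*)$ is approximately $N_{c-1}(\mathbf{0}, \Sigma^*)$; hence
$$n(\bar{\mathbf{Y}}^* - \mathbf{p}^*)^T(\Sigma^*)^{-1}(\bar{\mathbf{Y}}^* - \mathbf{p}^*) = \sum_{i=1}^c \frac{(N_i - np_i)^2}{np_i}$$
is approximately $\chi^2_{c-1}$ under $H_0$.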
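The identity between the quadratic form and Pearson's statistic can be checked numerically. The sketch below builds $\Sigma^*$ and the closed-form inverse (with $1/p_i + 1/p_c$ on the diagonal and $1/p_c$ elsewhere), confirms that their product is the identity, and compares the two statistics. It borrows the m&m's counts (17, 20, 13, 7, 6, 9) and proportions (0.3, 0.3, 0.1, 0.1, 0.1, 0.1) from a later example purely as illustrative input; the helper names are hypothetical.

```python
def sigma_star(p):
    """Covariance matrix of the first c-1 cell indicators of one multinomial trial."""
    m = len(p) - 1
    return [[p[i] * (1 - p[i]) if i == j else -p[i] * p[j] for j in range(m)]
            for i in range(m)]


def sigma_star_inv(p):
    """Closed-form inverse: 1/p_i + 1/p_c on the diagonal, 1/p_c elsewhere."""
    m = len(p) - 1
    pc = p[-1]
    return [[1 / p[i] + 1 / pc if i == j else 1 / pc for j in range(m)]
            for i in range(m)]


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


counts = [17, 20, 13, 7, 6, 9]
p0 = [0.3, 0.3, 0.1, 0.1, 0.1, 0.1]
n = sum(counts)

# Sigma* times its claimed inverse should be the identity matrix.
product = matmul(sigma_star(p0), sigma_star_inv(p0))

# Quadratic form n (Ybar* - p*)^T (Sigma*)^{-1} (Ybar* - p*), first c-1 cells only.
diff = [counts[i] / n - p0[i] for i in range(len(p0) - 1)]
inv = sigma_star_inv(p0)
quad = n * sum(diff[i] * inv[i][j] * diff[j]
               for i in range(len(diff)) for j in range(len(diff)))

# Pearson's statistic over all c cells.
pearson = sum((o - n * p) ** 2 / (n * p) for o, p in zip(counts, p0))
print(round(quad, 6), round(pearson, 6))
```

The two printed values agree, which is the algebraic reason the quadratic form can be replaced by the familiar sum over all $c$ cells.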
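Example. $(N_1, \ldots, N_4) \sim \text{Multinomial}(n; p_1, \ldots, p_4)$.
Data: $(n_1, n_2, n_3, n_4) = (12, 13, 20, 25)$, $n = 70$.
$H_0: p_1 = p_2,\ p_3 = p_4$.
Under $H_0$,
$$\hat p_1 = \hat p_2 = \frac{n_1 + n_2}{2n} \approx 0.18, \qquad \hat p_3 = \hat p_4 = \frac{n_3 + n_4}{2n} \approx 0.32.$$
Cell        1      2      3      4      Total
Observed    12     13     20     25     70
Expected    12.6   12.6   22.4   22.4   70
$$X^2 = \frac{(12 - 12.6)^2}{12.6} + \frac{(13 - 12.6)^2}{12.6} + \frac{(20 - 22.4)^2}{22.4} + \frac{(25 - 22.4)^2}{22.4} = 0.60 \qquad (\text{df} = 4 - 1 - 1 = 2)$$
Since the p-value is 0.741, $H_0$ is not rejected.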
Example. $(N_1, \ldots, N_4) \sim \text{Multinomial}(n; p_1, \ldots, p_4)$.
Data: $(n_1, n_2, n_3, n_4) = (12, 13, 20, 25)$, $n = 70$.
$H_0: p_3 = p_1 + p_2$.
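Under $H_0$,
$\hat p_1 = 0.154$, $\hat p_2 = 0.167$, $\hat p_3 = 0.32$, $\hat p_4 = 0.36$.
Cell        1      2      3      4      Total
Observed    12     13     20     25     70
Expected    10.8   11.7   22.5   25     70
$$X^2 = \frac{(12 - 10.8)^2}{10.8} + \frac{(13 - 11.7)^2}{11.7} + \frac{(20 - 22.5)^2}{22.5} = 0.56 \qquad (\text{df} = 4 - 1 - 2 = 1)$$
(the fourth cell contributes $(25 - 25)^2/25 = 0$).
Since the p-value is 0.454, $H_0$ is not rejected.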
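Both multinomial examples can be recomputed in a few lines. For $H_0: p_1 = p_2, p_3 = p_4$ the sketch uses the exact MLEs $(n_1 + n_2)/(2n)$ and $(n_3 + n_4)/(2n)$ rather than the rounded 0.18 and 0.32, so its statistic differs from 0.60 only in the third decimal. For the second example it assumes the hypothesis $p_3 = p_1 + p_2$, inferred because the closed-form MLEs below (our own Lagrange-multiplier derivation, not taken from the notes) reproduce the tabulated expected counts 10.8, 11.7, 22.5 and 25 exactly.

```python
import math

counts = [12, 13, 20, 25]
n1, n2, n3, n4 = counts
n = sum(counts)  # 70


def pearson(observed, expected):
    """Pearson chi-squared statistic: sum of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# H0: p1 = p2, p3 = p4 (df = 4 - 1 - 1 = 2), using the exact restricted MLEs.
p12 = (n1 + n2) / (2 * n)
p34 = (n3 + n4) / (2 * n)
expected_a = [n * p12, n * p12, n * p34, n * p34]
x2_a = pearson(counts, expected_a)
p_a = math.exp(-x2_a / 2)  # chi-squared upper tail for df = 2

# Assumed H0: p3 = p1 + p2 (df = 4 - 1 - 2 = 1), with assumed closed-form MLEs:
#   p3 = (n1 + n2 + n3) / (2n), p1 = n1 p3 / (n1 + n2), p2 = n2 p3 / (n1 + n2), p4 = n4 / n
p3 = (n1 + n2 + n3) / (2 * n)
p_hat_b = [n1 * p3 / (n1 + n2), n2 * p3 / (n1 + n2), p3, n4 / n]
expected_b = [n * p for p in p_hat_b]
x2_b = pearson(counts, expected_b)
p_b = math.erfc(math.sqrt(x2_b / 2))  # chi-squared upper tail for df = 1

print([round(e, 2) for e in expected_a], round(x2_a, 3), round(p_a, 3))
print([round(e, 2) for e in expected_b], round(x2_b, 3), round(p_b, 3))
```

The df = 2 and df = 1 tails use the exact forms $e^{-x/2}$ and $\operatorname{erfc}(\sqrt{x/2})$, so no statistics library is needed.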
Example. A journal reported that, in a bag of m&m's chocolate peanut candies, there are
30% brown, 30% yellow, 10% blue, 10% red, 10% green and 10% orange candies. Suppose
you purchase a bag of m&m's chocolate peanut candies at a nearby store and find 17
brown, 20 yellow, 13 blue, 7 red, 6 green and 9 orange candies, for a total of 72 candies.
At the 0.1 level of significance, does the bag purchased agree with the distribution
suggested by the journal?
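$H_0: (p_1, p_2, p_3, p_4, p_5, p_6) = (0.3, 0.3, 0.1, 0.1, 0.1, 0.1)$ vs
$H_1: (p_1, p_2, p_3, p_4, p_5, p_6) \neq (0.3, 0.3, 0.1, 0.1, 0.1, 0.1)$,
where $p_1, \ldots, p_6$ are the proportions of brown, yellow, blue, red, green and orange candies.
Data: $(17, 20, 13, 7, 6, 9)$
Cell        Brown   Yellow   Blue   Red   Green   Orange   Total
Observed    17      20       13     7     6       9        72
Expected    21.6    21.6     7.2    7.2   7.2     7.2      72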
21.6 21.6 7.2 7.2 7.2 7.2 72
STAT 2006 ‐ Jan 2021 59
Chi‐squared Tests
Example.
: , , , , , 0.3,0.3,0.1,0.1,0.1,0.1 vs
: 0.3,0.3,0.1,0.1,0.1,0.1
Cell Brown Yellow Blue Red Green Orange Total
17 20 13 7 6 9 72
21.6 21.6 7.2 7.2 7.2 7.2 72
At the 10% level of significance, we cannot reject $H_0$.
$$X^2 = \frac{(17 - 21.6)^2}{21.6} + \frac{(20 - 21.6)^2}{21.6} + \frac{(13 - 7.2)^2}{7.2} + \frac{(7 - 7.2)^2}{7.2} + \frac{(6 - 7.2)^2}{7.2} + \frac{(9 - 7.2)^2}{7.2} = 6.426 \qquad (\text{df} = 6 - 1 = 5)$$
Since the p-value is 0.27, we cannot reject $H_0$ at the 10% level of significance.
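The m&m's computation can be reproduced with a small, dependency-free sketch. The helper `chi2_sf` (a hypothetical name) evaluates the chi-squared upper tail exactly for integer degrees of freedom, using the Poisson-sum form for even df and an `erfc`-based series for odd df. The deviance $-2\log\Lambda = 2\sum n_i\log(n_i/(np_{i0}))$ is also computed for comparison; it is not part of the slide.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(chi2_df > x) for a positive integer df.

    Exact finite-sum forms: even df via the Poisson sum, odd df via erfc plus
    a correction series (no external libraries).
    """
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= (x / 2) / j
            total += term
        return math.exp(-x / 2) * total
    total = math.erfc(math.sqrt(x / 2))
    if df > 1:
        term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
        series = term
        for j in range(2, (df - 1) // 2 + 1):
            term *= x / (2 * j - 1)
            series += term
        total += series
    return total


observed = [17, 20, 13, 7, 6, 9]  # brown, yellow, blue, red, green, orange
p0 = [0.3, 0.3, 0.1, 0.1, 0.1, 0.1]
n = sum(observed)
expected = [n * p for p in p0]
df = len(observed) - 1

x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))        # Pearson
g2 = 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected))  # deviance
print(round(x2, 3), round(chi2_sf(x2, df), 3), round(g2, 3), round(chi2_sf(g2, df), 3))
```

The sketch gives $X^2 \approx 6.426$ with a p-value near 0.27, as on the slide; the deviance is about 5.58 with a p-value of roughly 0.35, so both statistics lead to the same conclusion at the 10% level.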
Chi‐Square Test for independence – 2 way Table
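Suppose that data are collected on a pair of categorical variables $(X, Y)$, where $X$ has $r$ categories and $Y$ has $c$ categories. The data consist of $n$ pairs
$$(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n),$$
which can be summarized in a frequency table.
$$\begin{array}{c|cccc|c}
X \backslash Y & 1 & 2 & \cdots & c & \text{Total} \\ \hline
1 & n_{11} & n_{12} & \cdots & n_{1c} & n_{1\cdot} \\
2 & n_{21} & n_{22} & \cdots & n_{2c} & n_{2\cdot} \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
r & n_{r1} & n_{r2} & \cdots & n_{rc} & n_{r\cdot} \\ \hline
\text{Total} & n_{\cdot 1} & n_{\cdot 2} & \cdots & n_{\cdot c} & n
\end{array}$$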
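Under the independence model, $p_{ij} = p_{i\cdot}p_{\cdot j}$, and the estimated expected counts are $\hat e_{ij} = \dfrac{n_{i\cdot}n_{\cdot j}}{n}$.
Pearson statistic: $$X^2 = \sum_{i=1}^r\sum_{j=1}^c \frac{(n_{ij} - \hat e_{ij})^2}{\hat e_{ij}}$$
Deviance statistic: $$G^2 = -2\log\Lambda = 2\sum_{i=1}^r\sum_{j=1}^c n_{ij}\log\frac{n_{ij}}{\hat e_{ij}}$$
The number of degrees of freedom is $(rc - 1) - (r - 1) - (c - 1) = (r - 1)(c - 1)$.
A large value of $X^2$ or $G^2$ indicates that the independence model is not plausible, and that $X$ and $Y$ are related.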
• The expected counts are sufficiently large (at least 80% of the $\hat e_{ij}$ should be greater than 5).
Example
Study the relationship between gender and socio‐economic status (ses).
Socio‐economic status(ses)
Total 47 95 58 200
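$$X^2 = \frac{\left(15 - \frac{91 \times 47}{200}\right)^2}{\frac{91 \times 47}{200}} + \frac{\left(47 - \frac{91 \times 95}{200}\right)^2}{\frac{91 \times 95}{200}} + \frac{\left(29 - \frac{91 \times 58}{200}\right)^2}{\frac{91 \times 58}{200}} + \frac{\left(32 - \frac{109 \times 47}{200}\right)^2}{\frac{109 \times 47}{200}} + \frac{\left(48 - \frac{109 \times 95}{200}\right)^2}{\frac{109 \times 95}{200}} + \frac{\left(29 - \frac{109 \times 58}{200}\right)^2}{\frac{109 \times 58}{200}} = 4.577$$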
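The Pearson statistic above can be reproduced with a short sketch. The counts are read off the worked formula, which gives row totals 91 and 109 and column totals 47, 95 and 58; the gender and ses category labels are not recoverable from the text, so rows and columns are simply indexed. The function name is hypothetical, and the p-value (about 0.10) and $G^2$ are not on the slide; they are shown only as an illustration.

```python
import math


def independence_test(table):
    """Pearson X^2, deviance G^2, df and expected counts for an r x c table."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    # Expected counts under independence: e_ij = n_i. * n_.j / n
    expected = [[ri * cj / n for cj in col_tot] for ri in row_tot]
    x2 = sum((o - e) ** 2 / e
             for orow, erow in zip(table, expected) for o, e in zip(orow, erow))
    # Cells with n_ij = 0 contribute 0 to G^2 (0 * log 0 = 0).
    g2 = 2 * sum(o * math.log(o / e)
                 for orow, erow in zip(table, expected) for o, e in zip(orow, erow) if o > 0)
    df = (len(table) - 1) * (len(table[0]) - 1)
    return x2, g2, df, expected


# Counts inferred from the worked formula (rows sum to 91 and 109).
table = [[15, 47, 29],
         [32, 48, 29]]
x2, g2, df, expected = independence_test(table)
p_value = math.exp(-x2 / 2)  # chi-squared upper tail, valid here because df = 2
print(round(x2, 3), df, round(g2, 3), round(p_value, 3))
```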
1 0 Total
1 1
0 1 1 1 1
Total 1 1
Note that and .