STAT 2006 Chapter 5
More Hypothesis Testing

Presented by
Simon Cheung
Email: kingchaucheung@cuhk.edu.hk

Department of Statistics, The Chinese University of Hong Kong

STAT 2006 ‐ Jan 2021 1
Testing for the homogeneity of population means
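Suppose that there are $k \ge 2$ populations. A random sample is drawn from each population. Denote these random samples as

$$X_{11}, X_{12}, \ldots, X_{1n_1} \sim N(\mu_1, \sigma_1^2),$$
$$X_{21}, X_{22}, \ldots, X_{2n_2} \sim N(\mu_2, \sigma_2^2),$$
$$\vdots$$
$$X_{k1}, X_{k2}, \ldots, X_{kn_k} \sim N(\mu_k, \sigma_k^2).$$

Assume that
1. All population variances are equal, that is, $\sigma_1^2 = \sigma_2^2 = \cdots = \sigma_k^2 = \sigma^2$.
2. Each population has a normal distribution.
3. All random samples are independent.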


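The hypotheses to be tested are $H_0: \mu_1 = \mu_2 = \cdots = \mu_k$ vs $H_1$: $H_0$ is not true.

Let $n = n_1 + \cdots + n_k$, let $\mathbf{X}_i = (X_{i1}, \ldots, X_{in_i})'$, let $\mathbf{1}_m$ be the $m$-vector of ones, and let $D = \mathrm{diag}\left(\tfrac{1}{n_1}\mathbf{1}_{n_1}\mathbf{1}_{n_1}', \ldots, \tfrac{1}{n_k}\mathbf{1}_{n_k}\mathbf{1}_{n_k}'\right)$ be block diagonal. We have

$$\mathbf{X} = \begin{pmatrix} \mathbf{X}_1 \\ \vdots \\ \mathbf{X}_k \end{pmatrix} \sim N\left( \mathbf{m} = \begin{pmatrix} \mu_1 \mathbf{1}_{n_1} \\ \vdots \\ \mu_k \mathbf{1}_{n_k} \end{pmatrix},\ \sigma^2 I_n \right).$$

The within-group sum of squares is

$$SSW = \sum_{i=1}^{k}\sum_{j=1}^{n_i} (X_{ij} - \bar{X}_i)^2 = \mathbf{X}'(I_n - D)\mathbf{X}.$$

The between-group sum of squares is

$$SSB = \sum_{i=1}^{k} n_i(\bar{X}_i - \bar{X})^2 = \mathbf{X}'\left(D - \tfrac{1}{n}\mathbf{1}_n\mathbf{1}_n'\right)\mathbf{X}.$$

Write $A = I_n - D$ and $B = D - \tfrac{1}{n}\mathbf{1}_n\mathbf{1}_n'$. Since $A\mathbf{X}$ and $B\mathbf{X}$ are linear combinations of $\mathbf{X}$, both are normally distributed, with

$$\mathrm{Cov}(A\mathbf{X}, B\mathbf{X}) = \sigma^2 AB' = \sigma^2 (I_n - D)\left(D - \tfrac{1}{n}\mathbf{1}_n\mathbf{1}_n'\right) = 0,$$

because $D^2 = D$ and $D\mathbf{1}_n = \mathbf{1}_n$. Hence, the joint moment generating function factorizes:

$$E\exp\{\mathbf{s}'A\mathbf{X} + \mathbf{t}'B\mathbf{X}\} = \exp\left\{(A\mathbf{s} + B\mathbf{t})'\mathbf{m} + \tfrac{\sigma^2}{2}(A\mathbf{s} + B\mathbf{t})'(A\mathbf{s} + B\mathbf{t})\right\}$$
$$= \exp\left\{\mathbf{s}'A\mathbf{m} + \tfrac{\sigma^2}{2}\mathbf{s}'A\mathbf{s}\right\}\exp\left\{\mathbf{t}'B\mathbf{m} + \tfrac{\sigma^2}{2}\mathbf{t}'B\mathbf{t}\right\}.$$

It follows that $A\mathbf{X}$ and $B\mathbf{X}$ are independent.
Since $A$ and $B$ are symmetric and idempotent,

$$SSW = (A\mathbf{X})'(A\mathbf{X}) \quad\text{and}\quad SSB = (B\mathbf{X})'(B\mathbf{X})$$

are sums of squares of the components of $A\mathbf{X}$ and $B\mathbf{X}$ respectively. Thus, $SSW$ and $SSB$ are independent.
Note that a general result is that if $\mathbf{X} \sim N(\boldsymbol{\mu}, \Sigma)$, then $A\mathbf{X}$ and $B\mathbf{X}$ are independent if $A\Sigma B' = 0$.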


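For the $i$th population, we have

$$\frac{(n_i - 1)S_i^2}{\sigma^2} \sim \chi^2_{n_i - 1},$$

where $\bar{X}_i = \frac{1}{n_i}\sum_{j=1}^{n_i} X_{ij}$ and $S_i^2 = \frac{1}{n_i - 1}\sum_{j=1}^{n_i}(X_{ij} - \bar{X}_i)^2$.

Since a sum of independent chi-squared distributed random variables is still a chi-squared distributed random variable, we can verify that

$$\frac{\sum_{i=1}^{k}\sum_{j=1}^{n_i}(X_{ij} - \bar{X}_i)^2}{\sigma^2} = \frac{\sum_{i=1}^{k}(n_i - 1)S_i^2}{\sigma^2} \sim \chi^2_{n-k},$$

where $n = \sum_{i=1}^{k} n_i$.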


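Under $H_0$, the $k$ random samples can be pooled together to form one large random sample. In this case, we have

$$\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1},$$

where $\bar{X} = \frac{1}{n}\sum_{i=1}^{k}\sum_{j=1}^{n_i} X_{ij}$ and $S^2 = \frac{1}{n-1}\sum_{i=1}^{k}\sum_{j=1}^{n_i}(X_{ij} - \bar{X})^2$.
Since $\sum_{i}\sum_{j}(X_{ij} - \bar{X}_i)^2$ and $\sum_{i} n_i(\bar{X}_i - \bar{X})^2$ are independent and add up to $(n-1)S^2$, the orthogonal decomposition of chi-squared distributed random variables indicates that

$$\frac{\sum_{i=1}^{k} n_i(\bar{X}_i - \bar{X})^2}{\sigma^2} \sim \chi^2_{k-1}.$$

Under $H_0$,

$$F = \frac{\dfrac{1}{k-1}\sum_{i=1}^{k} n_i(\bar{X}_i - \bar{X})^2}{\dfrac{1}{n-k}\sum_{i=1}^{k}\sum_{j=1}^{n_i}(X_{ij} - \bar{X}_i)^2} \sim F_{k-1,\,n-k}.$$

The null hypothesis of equality of the population means is rejected if the value of $F$ exceeds $F_{\alpha,\,k-1,\,n-k}$.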


Example.
A large body of evidence shows that soy has health benefits for most people. Some of
these benefits originate largely from isoflavones, plant compounds that have estrogen‐
like properties. A consumer group purchased various soy products and ran laboratory
tests to determine the amount of isoflavones in each product. There were three major
sources of soy products: (1) cereals and snacks, (2) energy bars and (3) veggie burgers.
Five different products from each of the three categories were selected and the amount
of isoflavones (in mg) was determined for an adult serving. Our objective is to determine
if the average amount of isoflavones was different for the three sources of soy products.


The data are given in the following table. Use these data to test the hypothesis of a difference in the mean isoflavone level for the three categories.

Source of Soy   Isoflavone Content (mg)   Sample Size   Sample Mean   Sample Variance
1                3  17  12  10   4        5              9.20         33.70
2               19  10   9   7   5        5             10.00         29.00
3               25  15  12   9   8        5             13.80         46.70
Total                                     15            11.00


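The hypotheses to be tested are $H_0: \mu_1 = \mu_2 = \mu_3$ versus $H_1$: at least one of the three population means is different from the rest.

$$SSB = \sum_{i=1}^{3} n_i(\bar{x}_i - \bar{x})^2 = 5(9.20 - 11.00)^2 + 5(10.00 - 11.00)^2 + 5(13.80 - 11.00)^2 = 60.40$$

$$SSW = \sum_{i=1}^{3}(n_i - 1)s_i^2 = (5-1)(33.7) + (5-1)(29.0) + (5-1)(46.7) = 437.60$$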


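$$F = \frac{\dfrac{1}{k-1}\sum_{i} n_i(\bar{x}_i - \bar{x})^2}{\dfrac{1}{n-k}\sum_{i}(n_i - 1)s_i^2} = \frac{60.40/2}{437.60/12} = 0.83$$

Since $F = 0.83 < F_{0.05,\,2,\,12} = 3.89$, $H_0$ is not rejected. Thus, there is no significant evidence that the three sources of soy products provide, on average, different levels of isoflavones.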
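The arithmetic of this example can be checked with a short script. This is only a sketch using the Python standard library; the function name `one_way_anova_f` is made up for illustration, and the first observation of source 1 is taken to be 3, the value consistent with the reported sample mean 9.20 and sample variance 33.70.

```python
from statistics import mean, variance

def one_way_anova_f(groups):
    """Return (SSB, SSW, F) for a list of samples, one list per group."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    # Between-group sum of squares: sum of n_i * (group mean - grand mean)^2
    ssb = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: sum of (n_i - 1) * sample variance
    ssw = sum((len(g) - 1) * variance(g) for g in groups)
    f_stat = (ssb / (k - 1)) / (ssw / (n - k))
    return ssb, ssw, f_stat

# Isoflavone content (mg) for the three sources of soy products
soy = [
    [3, 17, 12, 10, 4],   # assumed first value 3 (matches mean 9.20, variance 33.70)
    [19, 10, 9, 7, 5],
    [25, 15, 12, 9, 8],
]
ssb, ssw, f_stat = one_way_anova_f(soy)
print(round(ssb, 2), round(ssw, 2), round(f_stat, 2))
```

The resulting F value is then compared with the tabulated critical value 3.89 quoted above.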

Simple Linear Regression

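For a pair of random variables $(X, Y)$, we let $f_{X,Y}(x, y)$ be their joint distribution. We can write

$$f_{X,Y}(x, y) = f_{Y|X}(y \mid x)\, f_X(x).$$

When we are not interested in the random variable $X$, except to use $X$ to improve our estimation of $Y$, we can consider the conditional distribution $f_{Y|X}(y \mid x)$. In this section, we consider the following conditional model between $(X, Y)$:

$$Y \mid (X = x) = \alpha_1 + \beta x + \varepsilon, \quad \varepsilon \sim N(0, \sigma^2).$$

Under this model, $Y \mid (X = x) \sim N(\alpha_1 + \beta x, \sigma^2)$.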

Example. We select a random sample of individuals and measure the height and weight of each one. The following (height, weight) data are obtained.
(68, 151), (72, 163), (69, 146), (72, 180), (70, 157), (73, 170), (70, 164),
(73, 175), (71, 171), (74, 178), (72, 160), (75, 188)
We can plot the relationship between the height and weight of individuals in the sample.

[Figure: scatter plot of weight against height with the fitted line weight = -206.26 + 5.213 height]
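The coefficients of the fitted line can be reproduced from the twelve data points. The sketch below uses only plain Python; the helper name `fit_line` is hypothetical and the formulas are the usual least-squares expressions for a straight line.

```python
heights = [68, 72, 69, 72, 70, 73, 70, 73, 71, 74, 72, 75]
weights = [151, 163, 146, 180, 157, 170, 164, 175, 171, 178, 160, 188]

def fit_line(x, y):
    """Least-squares intercept and slope of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope

intercept, slope = fit_line(heights, weights)
print(f"weight = {intercept:.2f} + {slope:.3f} * height")
```

The intercept is negative here because the line is extrapolated back to height 0, far outside the observed range of heights.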


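Given a random sample $(x_1, Y_1), (x_2, Y_2), \ldots, (x_n, Y_n)$, the proposed model relationship between $x$ and $Y$ gives

$$Y_i = \alpha_1 + \beta x_i + \varepsilon_i,$$

where $\varepsilon_i$, $i = 1, 2, \ldots, n$, are independent and $N(0, \sigma^2)$.

By re-centering $x_1, x_2, \ldots, x_n$, we obtain a mean-corrected version of the model

$$Y_i = \alpha + \beta(x_i - \bar{x}) + \varepsilon_i, \quad i = 1, 2, \ldots, n,$$

where $\alpha = \alpha_1 + \beta\bar{x}$.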

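The likelihood function of observing $y_1, y_2, \ldots, y_n$ given $x_1, x_2, \ldots, x_n$ is

$$L(\alpha, \beta, \sigma^2) \propto (\sigma^2)^{-n/2}\exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}[y_i - \alpha - \beta(x_i - \bar{x})]^2\right\}.$$

The log-likelihood function is given, up to an additive constant, by

$$\log L(\alpha, \beta, \sigma^2) = -\frac{n}{2}\log\sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^{n}[y_i - \alpha - \beta(x_i - \bar{x})]^2.$$

We must select $\alpha$ and $\beta$ to minimize

$$H(\alpha, \beta) = \sum_{i=1}^{n}[y_i - \alpha - \beta(x_i - \bar{x})]^2.$$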


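Differentiating $H(\alpha, \beta) = \sum_{i}[y_i - \alpha - \beta(x_i - \bar{x})]^2$ with respect to $\alpha$ and $\beta$, we have

$$\frac{\partial H(\alpha, \beta)}{\partial\alpha} = -2\sum_{i=1}^{n}[y_i - \alpha - \beta(x_i - \bar{x})],$$

$$\frac{\partial H(\alpha, \beta)}{\partial\beta} = -2\sum_{i=1}^{n}[y_i - \alpha - \beta(x_i - \bar{x})](x_i - \bar{x}).$$

Setting $\frac{\partial H(\alpha, \beta)}{\partial\alpha} = 0$ and $\frac{\partial H(\alpha, \beta)}{\partial\beta} = 0$, and using $\sum_i (x_i - \bar{x}) = 0$,

$$\hat{\alpha} = \bar{Y} \quad\text{and}\quad \hat{\beta} = \frac{\sum_{i}(x_i - \bar{x})Y_i}{\sum_{i}(x_i - \bar{x})^2} = \frac{\sum_{i}(x_i - \bar{x})(Y_i - \bar{Y})}{\sum_{i}(x_i - \bar{x})^2}.$$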


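We can also verify that

$$\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}[Y_i - \hat{\alpha} - \hat{\beta}(x_i - \bar{x})]^2.$$

Note that $\hat{\alpha}$ and $\hat{\beta}$ are linear functions of $Y_1, Y_2, \ldots, Y_n$ and hence have normal distributions with respective means and variances

$$E(\hat{\alpha}) = \frac{1}{n}\sum_{i=1}^{n}E(Y_i) = \frac{1}{n}\sum_{i=1}^{n}[\alpha + \beta(x_i - \bar{x})] = \alpha, \qquad \mathrm{Var}(\hat{\alpha}) = \frac{1}{n^2}\sum_{i=1}^{n}\mathrm{Var}(Y_i) = \frac{\sigma^2}{n},$$

$$E(\hat{\beta}) = \frac{\sum_{i}(x_i - \bar{x})E(Y_i)}{\sum_{i}(x_i - \bar{x})^2} = \frac{\sum_{i}(x_i - \bar{x})[\alpha + \beta(x_i - \bar{x})]}{\sum_{i}(x_i - \bar{x})^2} = \beta,$$

$$\mathrm{Var}(\hat{\beta}) = \frac{\sum_{i}(x_i - \bar{x})^2\,\mathrm{Var}(Y_i)}{\left[\sum_{i}(x_i - \bar{x})^2\right]^2} = \frac{\sigma^2}{\sum_{i}(x_i - \bar{x})^2}.$$

Note that

$$\sum_{i=1}^{n}[Y_i - \alpha - \beta(x_i - \bar{x})]^2 = \sum_{i=1}^{n}\left[(\hat{\alpha} - \alpha) + (\hat{\beta} - \beta)(x_i - \bar{x}) + (Y_i - \hat{\alpha} - \hat{\beta}(x_i - \bar{x}))\right]^2$$

$$= n(\hat{\alpha} - \alpha)^2 + (\hat{\beta} - \beta)^2\sum_{i=1}^{n}(x_i - \bar{x})^2 + \sum_{i=1}^{n}[Y_i - \hat{\alpha} - \hat{\beta}(x_i - \bar{x})]^2.$$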


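Since $\hat{\alpha} \sim N\left(\alpha, \frac{\sigma^2}{n}\right)$, $\hat{\beta} \sim N\left(\beta, \frac{\sigma^2}{\sum_i (x_i - \bar{x})^2}\right)$ and $Y_i - \alpha - \beta(x_i - \bar{x}) \sim N(0, \sigma^2)$, we have

$$\frac{\sum_{i}[Y_i - \alpha - \beta(x_i - \bar{x})]^2}{\sigma^2} \sim \chi^2_{n}, \quad \frac{n(\hat{\alpha} - \alpha)^2}{\sigma^2} \sim \chi^2_{1}, \quad \frac{(\hat{\beta} - \beta)^2\sum_{i}(x_i - \bar{x})^2}{\sigma^2} \sim \chi^2_{1}.$$

Thus, provided that $\hat{\alpha}$, $\hat{\beta}$ and the residual sum of squares are independent,

$$\frac{n\hat{\sigma}^2}{\sigma^2} = \frac{\sum_{i}[Y_i - \hat{\alpha} - \hat{\beta}(x_i - \bar{x})]^2}{\sigma^2} \sim \chi^2_{n-2}.$$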

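To show that $\hat{\alpha}$ and $\hat{\beta}$ are independent:

$$\mathrm{Cov}(\hat{\alpha}, \hat{\beta}) = \mathrm{Cov}\left(\frac{1}{n}\sum_{i}Y_i,\ \frac{\sum_{j}(x_j - \bar{x})Y_j}{\sum_{j}(x_j - \bar{x})^2}\right) = \frac{\sigma^2\sum_{i}(x_i - \bar{x})}{n\sum_{j}(x_j - \bar{x})^2} = 0.$$

Note that

$$\hat{\alpha} = \frac{1}{n}\mathbf{1}'\mathbf{Y}, \qquad \hat{\beta} = \frac{\mathbf{u}'\mathbf{Y}}{\mathbf{u}'\mathbf{u}},$$

where $\mathbf{1} = (1, 1, \ldots, 1)'$, $\mathbf{u} = (x_1 - \bar{x}, \ldots, x_n - \bar{x})'$ and $\mathbf{Y} = (Y_1, \ldots, Y_n)'$.

The vector of residuals is

$$\mathbf{e} = \mathbf{Y} - \hat{\alpha}\mathbf{1} - \hat{\beta}\mathbf{u} = \left(I - \frac{1}{n}\mathbf{1}\mathbf{1}' - \frac{\mathbf{u}\mathbf{u}'}{\mathbf{u}'\mathbf{u}}\right)\mathbf{Y}.$$

It follows that $\sum_{i}[Y_i - \hat{\alpha} - \hat{\beta}(x_i - \bar{x})]^2 = \mathbf{e}'\mathbf{e}$ is a sum of squares of terms in $\mathbf{e}$.

$$\mathrm{Cov}(\hat{\beta}, \mathbf{e}) = \frac{\sigma^2}{\mathbf{u}'\mathbf{u}}\,\mathbf{u}'\left(I - \frac{1}{n}\mathbf{1}\mathbf{1}' - \frac{\mathbf{u}\mathbf{u}'}{\mathbf{u}'\mathbf{u}}\right) = \frac{\sigma^2}{\mathbf{u}'\mathbf{u}}\left(\mathbf{u}' - \mathbf{0}' - \mathbf{u}'\right) = \mathbf{0}'.$$

Hence,

$$\mathrm{Cov}\left(\hat{\beta},\ Y_i - \hat{\alpha} - \hat{\beta}(x_i - \bar{x})\right) = 0 \quad\text{for every } i.$$

Since $\hat{\alpha} = \frac{1}{n}\mathbf{1}'\mathbf{Y}$,

$$\mathrm{Cov}(\hat{\alpha}, \mathbf{e}) = \frac{\sigma^2}{n}\,\mathbf{1}'\left(I - \frac{1}{n}\mathbf{1}\mathbf{1}' - \frac{\mathbf{u}\mathbf{u}'}{\mathbf{u}'\mathbf{u}}\right) = \frac{\sigma^2}{n}\left(\mathbf{1}' - \mathbf{1}' - \mathbf{0}'\right) = \mathbf{0}',$$

as $\mathbf{1}'\mathbf{1} = n$ and $\mathbf{1}'\mathbf{u} = 0$.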


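In summary, $\hat{\alpha}$, $\hat{\beta}$ and $\sum_{i}[Y_i - \hat{\alpha} - \hat{\beta}(x_i - \bar{x})]^2$ are independent. Therefore, we have the decomposition into independent terms

$$\frac{\sum_{i}[Y_i - \alpha - \beta(x_i - \bar{x})]^2}{\sigma^2} = \frac{n(\hat{\alpha} - \alpha)^2}{\sigma^2} + \frac{(\hat{\beta} - \beta)^2\sum_{i}(x_i - \bar{x})^2}{\sigma^2} + \frac{\sum_{i}[Y_i - \hat{\alpha} - \hat{\beta}(x_i - \bar{x})]^2}{\sigma^2}.$$

Thus,

$$\frac{n\hat{\sigma}^2}{\sigma^2} = \frac{\sum_{i}[Y_i - \hat{\alpha} - \hat{\beta}(x_i - \bar{x})]^2}{\sigma^2} \sim \chi^2_{n-2}.$$

$$T_\beta = \frac{(\hat{\beta} - \beta)\sqrt{\sum_{i}(x_i - \bar{x})^2}\,/\,\sigma}{\sqrt{\dfrac{n\hat{\sigma}^2}{(n-2)\sigma^2}}} = \frac{\hat{\beta} - \beta}{\sqrt{\dfrac{n\hat{\sigma}^2}{(n-2)\sum_{i}(x_i - \bar{x})^2}}} \sim t_{n-2}.$$

Let $s_{\hat{\beta}} = \sqrt{\dfrac{n\hat{\sigma}^2}{(n-2)\sum_{i}(x_i - \bar{x})^2}}$. The $100(1-\gamma)\%$ confidence interval for $\beta$ is $\left[\hat{\beta} - t_{\gamma/2,\,n-2}\,s_{\hat{\beta}},\ \hat{\beta} + t_{\gamma/2,\,n-2}\,s_{\hat{\beta}}\right]$.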


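$$T_\alpha = \frac{\sqrt{n}(\hat{\alpha} - \alpha)\,/\,\sigma}{\sqrt{\dfrac{n\hat{\sigma}^2}{(n-2)\sigma^2}}} = \frac{\hat{\alpha} - \alpha}{\sqrt{\dfrac{\hat{\sigma}^2}{n-2}}} \sim t_{n-2}.$$

The $100(1-\gamma)\%$ confidence interval for $\alpha$ is $\left[\hat{\alpha} - t_{\gamma/2,\,n-2}\sqrt{\dfrac{\hat{\sigma}^2}{n-2}},\ \hat{\alpha} + t_{\gamma/2,\,n-2}\sqrt{\dfrac{\hat{\sigma}^2}{n-2}}\right]$.

Since $\dfrac{n\hat{\sigma}^2}{\sigma^2} \sim \chi^2_{n-2}$, the $100(1-\gamma)\%$ confidence interval for $\sigma^2$ is

$$\left[\frac{n\hat{\sigma}^2}{\chi^2_{\gamma/2,\,n-2}},\ \frac{n\hat{\sigma}^2}{\chi^2_{1-\gamma/2,\,n-2}}\right],$$

where $\chi^2_{p,\,\nu}$ denotes the upper $p$ point of the $\chi^2_{\nu}$ distribution.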
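As an illustration of these interval formulas, the sketch below computes 95% confidence intervals for the slope and the centred intercept using the height and weight data from the earlier example. The critical value `T_CRIT = 2.228` is an assumption taken from a standard t table (upper 2.5% point with 10 degrees of freedom); it is not computed by the code.

```python
import math

heights = [68, 72, 69, 72, 70, 73, 70, 73, 71, 74, 72, 75]
weights = [151, 163, 146, 180, 157, 170, 164, 175, 171, 178, 160, 188]
T_CRIT = 2.228  # assumed table value of t_{0.025, 10}

n = len(heights)
x_bar = sum(heights) / n
s_xx = sum((x - x_bar) ** 2 for x in heights)

# Maximum likelihood estimates in the mean-corrected model
alpha_hat = sum(weights) / n
beta_hat = sum((x - x_bar) * y for x, y in zip(heights, weights)) / s_xx
sse = sum((y - alpha_hat - beta_hat * (x - x_bar)) ** 2
          for x, y in zip(heights, weights))
sigma2_hat = sse / n  # MLE divides by n, not n - 2

# Standard errors appearing in the t statistics
se_beta = math.sqrt(n * sigma2_hat / ((n - 2) * s_xx))
se_alpha = math.sqrt(sigma2_hat / (n - 2))

ci_beta = (beta_hat - T_CRIT * se_beta, beta_hat + T_CRIT * se_beta)
ci_alpha = (alpha_hat - T_CRIT * se_alpha, alpha_hat + T_CRIT * se_alpha)
print(ci_beta, ci_alpha)
```

Because the interval for the slope excludes 0, it agrees with a two-sided 5% level test rejecting a zero slope.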

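For a given $x$ in the sample, $\hat{Y} = \hat{\alpha} + \hat{\beta}(x - \bar{x})$ is a point estimate for $\mu(x) = \alpha + \beta(x - \bar{x})$.
Since $\hat{\alpha}$ and $\hat{\beta}$ are normally and independently distributed, $\hat{Y}$ has a normal distribution with

$$E(\hat{Y}) = \alpha + \beta(x - \bar{x}), \qquad \mathrm{Var}(\hat{Y}) = \frac{\sigma^2}{n} + \frac{(x - \bar{x})^2\sigma^2}{\sum_{i}(x_i - \bar{x})^2} = \sigma^2\left[\frac{1}{n} + \frac{(x - \bar{x})^2}{\sum_{i}(x_i - \bar{x})^2}\right].$$

Hence,

$$T = \frac{\hat{\alpha} + \hat{\beta}(x - \bar{x}) - [\alpha + \beta(x - \bar{x})]}{\sqrt{\dfrac{n\hat{\sigma}^2}{n-2}\left[\dfrac{1}{n} + \dfrac{(x - \bar{x})^2}{\sum_{i}(x_i - \bar{x})^2}\right]}} \sim t_{n-2}.$$

Let

$$c = \sqrt{\frac{n\hat{\sigma}^2}{n-2}\left[\frac{1}{n} + \frac{(x - \bar{x})^2}{\sum_{i}(x_i - \bar{x})^2}\right]}.$$

The $100(1-\gamma)\%$ confidence interval for $\mu(x)$ is given by

$$\left[\hat{\alpha} + \hat{\beta}(x - \bar{x}) - c\,t_{\gamma/2,\,n-2},\ \hat{\alpha} + \hat{\beta}(x - \bar{x}) + c\,t_{\gamma/2,\,n-2}\right].$$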


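For a new observation $(x_{n+1}, Y_{n+1})$, the predicted value is $\hat{Y}_{n+1} = \hat{\alpha} + \hat{\beta}(x_{n+1} - \bar{x})$. The prediction error $W = Y_{n+1} - \hat{\alpha} - \hat{\beta}(x_{n+1} - \bar{x})$ is normally distributed with

$$E(W) = \alpha + \beta(x_{n+1} - \bar{x}) - \alpha - \beta(x_{n+1} - \bar{x}) = 0,$$

$$\mathrm{Var}(W) = \sigma^2 + \frac{\sigma^2}{n} + \frac{(x_{n+1} - \bar{x})^2\sigma^2}{\sum_{i}(x_i - \bar{x})^2} = \sigma^2\left[1 + \frac{1}{n} + \frac{(x_{n+1} - \bar{x})^2}{\sum_{i}(x_i - \bar{x})^2}\right].$$

Hence,

$$T = \frac{Y_{n+1} - \hat{\alpha} - \hat{\beta}(x_{n+1} - \bar{x})}{\sqrt{\dfrac{n\hat{\sigma}^2}{n-2}\left[1 + \dfrac{1}{n} + \dfrac{(x_{n+1} - \bar{x})^2}{\sum_{i}(x_i - \bar{x})^2}\right]}} \sim t_{n-2}.$$

Let

$$d = \sqrt{\frac{n\hat{\sigma}^2}{n-2}\left[1 + \frac{1}{n} + \frac{(x_{n+1} - \bar{x})^2}{\sum_{i}(x_i - \bar{x})^2}\right]}.$$

The $100(1-\gamma)\%$ prediction interval for $Y_{n+1}$ is given by

$$\left[\hat{\alpha} + \hat{\beta}(x_{n+1} - \bar{x}) - d\,t_{\gamma/2,\,n-2},\ \hat{\alpha} + \hat{\beta}(x_{n+1} - \bar{x}) + d\,t_{\gamma/2,\,n-2}\right].$$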
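To see how the prediction interval compares with the confidence interval for the mean response, the following sketch evaluates both at a new height. The value `x_new = 70`, the helper name `intervals_at` and the table value `t_crit = 2.228` (upper 2.5% point of t with 10 degrees of freedom) are assumptions chosen for illustration; the data are the height and weight pairs from the earlier example.

```python
import math

heights = [68, 72, 69, 72, 70, 73, 70, 73, 71, 74, 72, 75]
weights = [151, 163, 146, 180, 157, 170, 164, 175, 171, 178, 160, 188]

def intervals_at(x_new, x, y, t_crit):
    """Return (fitted value, CI half-width, PI half-width) at x_new."""
    n = len(x)
    x_bar = sum(x) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    alpha_hat = sum(y) / n
    beta_hat = sum((xi - x_bar) * yi for xi, yi in zip(x, y)) / s_xx
    sse = sum((yi - alpha_hat - beta_hat * (xi - x_bar)) ** 2
              for xi, yi in zip(x, y))
    scale = sse / (n - 2)  # equals n * sigma2_hat / (n - 2)
    y_hat = alpha_hat + beta_hat * (x_new - x_bar)
    leverage = 1 / n + (x_new - x_bar) ** 2 / s_xx
    half_ci = t_crit * math.sqrt(scale * leverage)        # mean response
    half_pi = t_crit * math.sqrt(scale * (1 + leverage))  # new observation
    return y_hat, half_ci, half_pi

y_hat, half_ci, half_pi = intervals_at(70, heights, weights, t_crit=2.228)
print(y_hat, half_ci, half_pi)
```

The prediction interval is always the wider of the two, since it also accounts for the variance of the new observation itself.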

Test on Correlation

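Let $(X, Y)$ have a bivariate normal distribution. With $\mathbf{z} = (x, y)'$, $\boldsymbol{\mu} = (\mu_X, \mu_Y)'$ and $\Sigma = \begin{pmatrix}\sigma_X^2 & \rho\sigma_X\sigma_Y \\ \rho\sigma_X\sigma_Y & \sigma_Y^2\end{pmatrix}$, the joint pdf is

$$f_{X,Y}(x, y) = \frac{1}{2\pi|\Sigma|^{1/2}}\exp\left\{-\frac{1}{2}(\mathbf{z} - \boldsymbol{\mu})'\Sigma^{-1}(\mathbf{z} - \boldsymbol{\mu})\right\}$$

$$= \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}}\exp\left\{-\frac{1}{2(1-\rho^2)}\left[\left(\frac{x - \mu_X}{\sigma_X}\right)^2 - 2\rho\left(\frac{x - \mu_X}{\sigma_X}\right)\left(\frac{y - \mu_Y}{\sigma_Y}\right) + \left(\frac{y - \mu_Y}{\sigma_Y}\right)^2\right]\right\},$$

where $|\rho| < 1$, $\sigma_X > 0$, $\sigma_Y > 0$, $-\infty < \mu_X < \infty$, $-\infty < \mu_Y < \infty$. When $\rho = 0$, $f_{X,Y}(x, y) = f_X(x)f_Y(y)$, so $X$ and $Y$ are independent.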


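Given a random sample $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$, we want to test $H_0: \rho = 0$ vs $H_1: \rho \neq 0$.

Define

$$S_{XX} = \sum_{i}(X_i - \bar{X})^2, \quad S_{YY} = \sum_{i}(Y_i - \bar{Y})^2, \quad S_{XY} = \sum_{i}(X_i - \bar{X})(Y_i - \bar{Y}).$$

The sample correlation coefficient is defined as

$$R = \frac{S_{XY}}{\sqrt{S_{XX}S_{YY}}}.$$

Note that

$$R = \hat{\beta}\sqrt{\frac{S_{XX}}{S_{YY}}}, \quad\text{where } \hat{\beta} = \frac{S_{XY}}{S_{XX}}.$$

Under $H_0$, $X$ and $Y$ are independent. Thus, $\beta = 0$. The conditional distribution of $\hat{\beta}$ given $X_1 = x_1, \ldots, X_n = x_n$ is $N\left(0, \dfrac{\sigma_Y^2}{\sum_i (x_i - \bar{x})^2}\right)$.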


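The conditional distribution of

$$\frac{\sum_{i}[Y_i - \bar{Y} - \hat{\beta}(x_i - \bar{x})]^2}{\sigma_Y^2} = \frac{S_{YY}(1 - R^2)}{\sigma_Y^2}$$

given $X_1 = x_1, \ldots, X_n = x_n$ is $\chi^2_{n-2}$ and is independent of $\hat{\beta}$.

Hence, under $H_0$,

$$T = \frac{\hat{\beta}\sqrt{S_{XX}}\,/\,\sigma_Y}{\sqrt{\dfrac{S_{YY}(1 - R^2)}{(n-2)\sigma_Y^2}}} = \frac{R\sqrt{n-2}}{\sqrt{1 - R^2}} \sim t_{n-2}.$$

Under $H_0$, since the conditional distribution of $T$ given $X_1 = x_1, \ldots, X_n = x_n$ does not depend on $x_1, \ldots, x_n$, the unconditional distribution of $T$ must be $t_{n-2}$, and $T$ and $(X_1, \ldots, X_n)$ are independent.

Test           H_0       H_1        Rejection Region            p-value
Two-tailed     ρ = 0     ρ ≠ 0      |t| ≥ t_{α/2, n-2}          P(|T| ≥ |t|)
Left-tailed    ρ = 0     ρ < 0      t ≤ -t_{α, n-2}             P(T ≤ t)
Right-tailed   ρ = 0     ρ > 0      t ≥ t_{α, n-2}              P(T ≥ t)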

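Example. We select a random sample of individuals and measure the height and weight of each one. The following (height, weight) data are obtained.
(68, 151), (72, 163), (69, 146), (72, 180), (70, 157), (73, 170), (70, 164),
(73, 175), (71, 171), (74, 178), (72, 160), (75, 188)
We shall test $H_0: \beta = 0$ vs $H_1: \beta \neq 0$, where $x$ = height and $y$ = weight.

$S_{xy} = 244.5833$, $S_{xx} = 46.917$, $\sum_{i}[y_i - \hat{\alpha} - \hat{\beta}(x_i - \bar{x})]^2 = 415.8686$

$\hat{\alpha} = 166.917$, $\hat{\beta} = 5.2131$, $\hat{\sigma}^2 = 34.656$, $\sqrt{\dfrac{n\hat{\sigma}^2}{(n-2)S_{xx}}} = 0.9415$

For a test at the 1% significance level, the p-value is

$$P\left(|T_{10}| \ge \frac{5.2131}{0.9415}\right) = P(|T_{10}| \ge 5.537) = 0.0002 < 0.01.$$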
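At the 1% significance level, we reject $H_0: \beta = 0$.

We would also like to test $H_0: \rho = 0$ vs $H_1: \rho \neq 0$ at the 5% significance level.

$$R = 0.8684 \quad\text{and}\quad T = \frac{R\sqrt{n-2}}{\sqrt{1 - R^2}} = 5.53714.$$

The p-value of the test is $P(|T_{10}| \ge 5.53714) = 0.0002$. Hence, $H_0$ is rejected.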
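The two test statistics in this example are algebraically the same quantity, which a short computation makes visible. The sketch below uses only the standard library and the twelve (height, weight) pairs; the variable names are illustrative, and the p-value itself is not computed because that would need a t distribution function.

```python
import math

heights = [68, 72, 69, 72, 70, 73, 70, 73, 71, 74, 72, 75]
weights = [151, 163, 146, 180, 157, 170, 164, 175, 171, 178, 160, 188]

n = len(heights)
x_bar = sum(heights) / n
y_bar = sum(weights) / n
s_xx = sum((x - x_bar) ** 2 for x in heights)
s_yy = sum((y - y_bar) ** 2 for y in weights)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(heights, weights))

# Sample correlation coefficient and its t statistic
r = s_xy / math.sqrt(s_xx * s_yy)
t_corr = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Slope estimate divided by its standard error
beta_hat = s_xy / s_xx
sse = s_yy - s_xy ** 2 / s_xx
t_slope = beta_hat / math.sqrt(sse / ((n - 2) * s_xx))

print(round(r, 4), round(t_corr, 3), round(t_slope, 3))
```

Both statistics are referred to the t distribution with n - 2 = 10 degrees of freedom.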

Chi‐squared Tests

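Let the likelihood ratio of a test be $\Lambda(x)$. When $H_0$ is true, $\Lambda$ should be close to 1. Given an observation $x$, $H_0$ is rejected if $\Lambda(x) \le c$, for some $0 < c < 1$.
Asymptotic theory suggests that, under $H_0$,

$$-2\log\Lambda(X) \xrightarrow{d} \chi^2_{r - r_0},$$

where $r$ is the dimension of the parameter space in general and $r_0$ is the dimension of the parameter space under $H_0$. $H_0$ is rejected if $-2\log\Lambda(x) \ge \chi^2_{\alpha,\,r - r_0}$.

For example, if we test

$H_0: \theta_i = \theta_{i,0}$, $i = 1, 2, \ldots, k$ vs $H_1: \theta_i \neq \theta_{i,0}$ for at least one $i = 1, 2, \ldots, k$,
then $r = k$ and $r_0 = 0$.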

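Example. $(Y_1, \ldots, Y_c) \sim \text{Multinomial}(n;\, p_1, \ldots, p_c)$.
Data: $(y_1, \ldots, y_c)$

$H_0: p_i = p_{i0}$, $i = 1, 2, \ldots, c$, where $\sum_{i=1}^{c} p_{i0} = 1$.

Under $H_0$, up to an additive constant,

$$-2\log L(p_{10}, \ldots, p_{c0}) = -2\sum_{i=1}^{c} y_i\log p_{i0}.$$

Under $\Theta$, the log-likelihood is $\log L(p_1, \ldots, p_c) = \text{const} + \sum_{i=1}^{c} y_i\log p_i$, where $\sum_{i=1}^{c} p_i = 1$ and $\sum_{i=1}^{c} y_i = n$.

The Lagrange multiplier optimization function is given by

$$\ell(p_1, \ldots, p_c, \lambda) = \sum_{i=1}^{c} y_i\log p_i - \lambda\left(\sum_{i=1}^{c} p_i - 1\right).$$

Differentiating with respect to $p_i$,

$$\frac{\partial\ell}{\partial p_i} = \frac{y_i}{p_i} - \lambda.$$

Equating it to 0 yields $p_i = \dfrac{y_i}{\lambda}$. Since $\sum_{i=1}^{c} p_i = 1$, $\lambda = n$ and $\hat{p}_i = \dfrac{y_i}{n}$.

Under $\Theta$, up to the same additive constant,

$$-2\log L(\hat{p}_1, \ldots, \hat{p}_c) = -2\sum_{i=1}^{c} y_i\log\frac{y_i}{n}.$$

Therefore,

$$-2\log\Lambda(y_1, \ldots, y_c) = 2\sum_{i=1}^{c} y_i\log\frac{y_i}{np_{i0}} = 2\sum_{i=1}^{c} O_i\log\frac{O_i}{E_i},$$

where $O = (y_1, \ldots, y_c)$ is the vector of observed frequencies and $E = (np_{10}, \ldots, np_{c0})$ is the vector of expected frequencies under $H_0$.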


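Example. $(Y_1, \ldots, Y_4) \sim \text{Multinomial}(n;\, p_1, \ldots, p_4)$.
Data: $(y_1, y_2, y_3, y_4) = (12, 13, 20, 25)$, $n = 70$
$H_0: p_1 = p_2$, $p_3 = p_4$.
Under $H_0$, $\hat{p}_1 = \hat{p}_2 = \dfrac{y_1 + y_2}{2n} = 0.18$, $\hat{p}_3 = \hat{p}_4 = \dfrac{y_3 + y_4}{2n} = 0.32$.

Cell    1      2      3      4      Total
O_i     12     13     20     25     70
E_i     12.6   12.6   22.4   22.4   70

$$-2\log\Lambda = 2\left[12\log\frac{12}{12.6} + 13\log\frac{13}{12.6} + 20\log\frac{20}{22.4} + 25\log\frac{25}{22.4}\right] = 0.5992.$$

The p-value is $P(\chi^2_2 \ge 0.6) \approx 0.74$. Hence, $H_0$ is not rejected.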
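The statistic in this example can be recomputed in a few lines. The sketch below is an illustration only: `g_statistic` is a made-up helper name, the expected counts are evaluated both from the rounded estimates used in the table (0.18 and 0.32) and from the unrounded estimates, and the p-value uses the fact that the upper tail of a chi-squared distribution with 2 degrees of freedom is exp(-x/2).

```python
import math

def g_statistic(observed, expected):
    """Likelihood-ratio statistic 2 * sum O_i * log(O_i / E_i)."""
    return 2.0 * sum(o * math.log(o / e)
                     for o, e in zip(observed, expected) if o > 0)

observed = [12, 13, 20, 25]
n = sum(observed)

# MLEs under H0: p1 = p2 and p3 = p4
p12 = (observed[0] + observed[1]) / (2 * n)
p34 = (observed[2] + observed[3]) / (2 * n)
expected_exact = [n * p12, n * p12, n * p34, n * p34]  # unrounded estimates
expected_rounded = [12.6, 12.6, 22.4, 22.4]            # from 0.18 and 0.32

g_exact = g_statistic(observed, expected_exact)
g_rounded = g_statistic(observed, expected_rounded)

# Upper tail of chi-squared with 2 degrees of freedom
p_value = math.exp(-g_rounded / 2)
print(round(g_rounded, 4), round(g_exact, 4), round(p_value, 3))
```

Rounding the estimated probabilities changes the statistic only in the third decimal place, so the conclusion is unaffected.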


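Example. $(Y_1, \ldots, Y_c) \sim \text{Multinomial}(n;\, p_1, \ldots, p_c)$.
Data: $(y_1, \ldots, y_c)$
$H_0: p_1 + p_2 = p_3$.
Under $H_0$, substituting $p_3 = p_1 + p_2$, the Lagrange function is

$$\ell = y_1\log p_1 + y_2\log p_2 + y_3\log(p_1 + p_2) + \sum_{i>3} y_i\log p_i - \lambda\left(2p_1 + 2p_2 + \sum_{i>3} p_i - 1\right).$$

Differentiating with respect to $p_1$, $p_2$ and $p_i$, $i > 3$,

$$\frac{\partial\ell}{\partial p_1} = \frac{y_1}{p_1} + \frac{y_3}{p_1 + p_2} - 2\lambda, \quad \frac{\partial\ell}{\partial p_2} = \frac{y_2}{p_2} + \frac{y_3}{p_1 + p_2} - 2\lambda, \quad \frac{\partial\ell}{\partial p_i} = \frac{y_i}{p_i} - \lambda, \ i > 3.$$

Setting the derivatives to zero, we have

$$\frac{y_1}{p_1} = \frac{y_2}{p_2}, \quad p_1 + p_2 = \frac{y_1 + y_2 + y_3}{2\lambda}, \quad p_i = \frac{y_i}{\lambda}, \ i > 3.$$

Since $p_1 + p_2 = p_3$, it follows that

$$\hat{p}_1 = \frac{y_1(y_1 + y_2 + y_3)}{2\lambda(y_1 + y_2)}, \quad \hat{p}_2 = \frac{y_2(y_1 + y_2 + y_3)}{2\lambda(y_1 + y_2)}, \quad \hat{p}_3 = \frac{y_1 + y_2 + y_3}{2\lambda}.$$

Since $\sum_{i=1}^{c}\hat{p}_i = 1$, $\lambda = n$.

The $-2$ log-likelihood ratio is given by

$$-2\log\Lambda(y_1, \ldots, y_c) = 2\sum_{i=1}^{c} y_i\log\frac{y_i}{n\hat{p}_i}.$$

Cell    1         2         3         4      5      ⋯   c      Total
O_i     y_1       y_2       y_3       y_4    y_5    ⋯   y_c    n
E_i     n p̂_1     n p̂_2     n p̂_3     y_4    y_5    ⋯   y_c    n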

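Example. $(Y_1, \ldots, Y_4) \sim \text{Multinomial}(n;\, p_1, \ldots, p_4)$.
Data: $(y_1, y_2, y_3, y_4) = (12, 13, 20, 25)$, $n = 70$
$H_0: p_1 + p_2 = p_3$.
Under $H_0$,
$\hat{p}_1 = 0.154$, $\hat{p}_2 = 0.167$, $\hat{p}_3 = 0.32$, $\hat{p}_4 = 0.36$.

Cell    1      2      3      4     Total
O_i     12     13     20     25    70
E_i     10.8   11.7   22.5   25    70

$$-2\log\Lambda = 2\left[12\log\frac{12}{10.8} + 13\log\frac{13}{11.7} + 20\log\frac{20}{22.5} + 25\log\frac{25}{25}\right] = 0.557$$

(p-value is 0.4555). Hence, $H_0$ is not rejected.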
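The expected counts in this table can be reproduced from closed-form constrained estimates. The sketch below assumes the null hypothesis is p1 + p2 = p3, which is the hypothesis that reproduces the expected counts 10.8, 11.7, 22.5 and 25 shown above; the variable names are illustrative. The p-value uses the identity that the upper tail of a chi-squared distribution with 1 degree of freedom at x equals erfc(sqrt(x/2)).

```python
import math

observed = [12, 13, 20, 25]
n = sum(observed)
y1, y2, y3 = observed[:3]
s = y1 + y2 + y3

# Constrained MLEs under the assumed H0: p1 + p2 = p3
p_hat = [
    y1 * s / (2 * n * (y1 + y2)),
    y2 * s / (2 * n * (y1 + y2)),
    s / (2 * n),
] + [y / n for y in observed[3:]]   # unconstrained cells keep y_i / n

expected = [n * p for p in p_hat]
g_stat = 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected))

# Upper tail of chi-squared with 1 degree of freedom
p_value = math.erfc(math.sqrt(g_stat / 2))
print([round(e, 1) for e in expected], round(g_stat, 3), round(p_value, 4))
```

Two parameters are free under this hypothesis in the four-cell case, which is why the statistic is referred to a chi-squared distribution with 4 - 1 - 2 = 1 degree of freedom.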

Example

X       0     1     2     3     4     5    6    7
O_i     22    53    58    39    20    5    2    1

$H_0$: The data were generated by a Poisson distribution.

$$\hat{\lambda} = \frac{0(22) + 1(53) + 2(58) + 3(39) + 4(20) + 5(5) + 6(2) + 7(1)}{22 + 53 + 58 + 39 + 20 + 5 + 2 + 1} = 2.05$$

X       0        1        2        3        4        5        6        ≥7
O_i     22       53       58       39       20       5        2        1
p̂_i     0.1287   0.2639   0.2705   0.1848   0.0947   0.0389   0.0133   0.00517
E_i     25.747   52.781   54.101   36.969   18.947   7.768    2.654    1.0332

$$-2\log\Lambda = 2\left[22\log\frac{22}{25.747} + 53\log\frac{53}{52.781} + 58\log\frac{58}{54.101} + 39\log\frac{39}{36.969} + 20\log\frac{20}{18.947} + 5\log\frac{5}{7.768} + 2\log\frac{2}{2.654} + 1\log\frac{1}{1.0332}\right] = 2.3236.$$

With $8 - 1 - 1 = 6$ degrees of freedom, the p-value of the test is $P(\chi^2_6 \ge 2.3236) = 0.8877$, so $H_0$ is not rejected.
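The Poisson fit can be reproduced step by step. In this sketch the last cell is treated as X ≥ 7 so that the fitted probabilities sum to 1, which matches the tabulated value 0.00517; the variable names are illustrative. The p-value uses the closed form for the chi-squared upper tail with an even number of degrees of freedom, exp(-x/2) times a finite sum.

```python
import math

counts = [22, 53, 58, 39, 20, 5, 2, 1]  # observed frequencies for X = 0, ..., 7
n = sum(counts)

# MLE of the Poisson mean: the sample mean
lam_hat = sum(x * o for x, o in enumerate(counts)) / n

# Fitted cell probabilities; the last cell collects X >= 7
probs = [math.exp(-lam_hat) * lam_hat ** x / math.factorial(x)
         for x in range(len(counts) - 1)]
probs.append(1.0 - sum(probs))
expected = [n * p for p in probs]

g_stat = 2 * sum(o * math.log(o / e) for o, e in zip(counts, expected))

# Degrees of freedom: cells - 1 - one estimated parameter
df = len(counts) - 1 - 1
z = g_stat / 2
# Upper tail of chi-squared for even df: exp(-z) * sum_{j < df/2} z^j / j!
p_value = math.exp(-z) * sum(z ** j / math.factorial(j) for j in range(df // 2))
print(lam_hat, round(g_stat, 4), round(p_value, 4))
```

Several expected counts here are below 5, so in practice one might pool the upper cells before relying on the chi-squared approximation.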


Pearson Chi‐square statistic for goodness of fit


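Suppose that $\mathbf{X} = (X_1, X_2, \ldots, X_c)'$ is a categorical variable with $c$ classes, where $X_i = 1$ if the $i$-th cell is selected and $X_i = 0$ otherwise. Since exactly one class is selected,

$$\sum_{i=1}^{c} X_i = 1.$$

$H_0: \mathbf{p} = (p_1, p_2, \ldots, p_c)'$, $\sum_{i=1}^{c} p_i = 1$. Thus,

$$\mathbf{X} \sim \text{Multinomial}(1, \mathbf{p}).$$

It follows that, under $H_0$, $E(\mathbf{X}) = \mathbf{p}$ and $\mathrm{Var}(\mathbf{X}) = \Sigma$, where $\Sigma_{ii} = p_i(1 - p_i)$ and $\Sigma_{ij} = -p_ip_j$ for $i \neq j$.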



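$$\Sigma = \begin{pmatrix} p_1(1 - p_1) & -p_1p_2 & \cdots & -p_1p_c \\ -p_2p_1 & p_2(1 - p_2) & \cdots & -p_2p_c \\ \vdots & \vdots & \ddots & \vdots \\ -p_cp_1 & -p_cp_2 & \cdots & p_c(1 - p_c) \end{pmatrix}$$

Since $\Sigma$ is singular, we use the upper-left $(c-1) \times (c-1)$ submatrix

$$\Sigma^{*} = \begin{pmatrix} p_1(1 - p_1) & -p_1p_2 & \cdots & -p_1p_{c-1} \\ -p_2p_1 & p_2(1 - p_2) & \cdots & -p_2p_{c-1} \\ \vdots & \vdots & \ddots & \vdots \\ -p_{c-1}p_1 & -p_{c-1}p_2 & \cdots & p_{c-1}(1 - p_{c-1}) \end{pmatrix}$$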



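$$\Sigma^{*-1} = \begin{pmatrix} \dfrac{1}{p_1} + \dfrac{1}{p_c} & \dfrac{1}{p_c} & \cdots & \dfrac{1}{p_c} \\ \dfrac{1}{p_c} & \dfrac{1}{p_2} + \dfrac{1}{p_c} & \cdots & \dfrac{1}{p_c} \\ \vdots & \vdots & \ddots & \vdots \\ \dfrac{1}{p_c} & \dfrac{1}{p_c} & \cdots & \dfrac{1}{p_{c-1}} + \dfrac{1}{p_c} \end{pmatrix}$$

Let $\mathbf{X}^{*} = (X_1, X_2, \ldots, X_{c-1})'$. Then, $E(\mathbf{X}^{*}) = \mathbf{p}^{*} = (p_1, p_2, \ldots, p_{c-1})'$ and $\mathrm{Var}(\mathbf{X}^{*}) = \Sigma^{*}$.

Equivalently,

$$\Sigma^{*-1} = \mathrm{diag}\left(\frac{1}{p_1}, \ldots, \frac{1}{p_{c-1}}\right) + \frac{1}{p_c}\begin{pmatrix} 1 & 1 & \cdots & 1 \\ 1 & 1 & \cdots & 1 \\ \vdots & \vdots & \ddots & \vdots \\ 1 & 1 & \cdots & 1 \end{pmatrix}.$$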



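Under $H_0$, let $\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_n$ be a random sample of $\mathbf{X}$. By the CLT,

$$\sqrt{n}(\bar{\mathbf{X}} - \mathbf{p}) \xrightarrow{d} N(\mathbf{0}, \Sigma).$$

Let $\bar{\mathbf{X}}^{*} = (\bar{X}_1, \bar{X}_2, \ldots, \bar{X}_{c-1})'$ and $\mathbf{p}^{*} = (p_1, p_2, \ldots, p_{c-1})'$. Then

$$Q = n(\bar{\mathbf{X}}^{*} - \mathbf{p}^{*})'\,\Sigma^{*-1}(\bar{\mathbf{X}}^{*} - \mathbf{p}^{*}) \xrightarrow{d} \chi^2_{c-1}.$$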



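By direct verification, we can show that, under $H_0$,

$$Q = n(\bar{\mathbf{X}}^{*} - \mathbf{p}^{*})'\,\Sigma^{*-1}(\bar{\mathbf{X}}^{*} - \mathbf{p}^{*}) = \sum_{i=1}^{c}\frac{(O_i - E_i)^2}{E_i},$$

where $O_i = \sum_{j=1}^{n} X_{ji}$ is the observed frequency of class $i$ and $E_i = np_i$ is the expected frequency. In general, $Q$ is distributed as chi-square with degrees of freedom $c - 1 -$ (number of independent parameters estimated).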
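The "direct verification" can also be checked numerically. The sketch below compares the Pearson sum with the quadratic form built from the first c - 1 cells, using the closed-form inverse diag(1/p_i) + (1/p_c) times the all-ones matrix. The counts (12, 13, 20, 25) and the probabilities (0.18, 0.18, 0.32, 0.32) are borrowed from the earlier example and treated here as fixed null probabilities purely for illustration; the function names are hypothetical.

```python
def pearson_statistic(observed, probs):
    """Sum over cells of (O_i - E_i)^2 / E_i with E_i = n * p_i."""
    n = sum(observed)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, probs))

def quadratic_form(observed, probs):
    """n * d' * inv(Sigma*) * d using the first c - 1 cells only."""
    n = sum(observed)
    c = len(probs)
    # Deviations of the sample proportions from the null probabilities
    d = [observed[i] / n - probs[i] for i in range(c - 1)]
    # Closed-form inverse: diag(1/p_i) + (1/p_c) * (all-ones matrix)
    inv = [[(1 / probs[i] if i == j else 0.0) + 1 / probs[c - 1]
            for j in range(c - 1)] for i in range(c - 1)]
    return n * sum(d[i] * inv[i][j] * d[j]
                   for i in range(c - 1) for j in range(c - 1))

observed = [12, 13, 20, 25]
probs = [0.18, 0.18, 0.32, 0.32]  # assumed fixed null probabilities

q_direct = pearson_statistic(observed, probs)
q_quadratic = quadratic_form(observed, probs)
print(q_direct, q_quadratic)
```

The two numbers agree up to floating-point rounding, reflecting that the deviations over all c cells sum to zero.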

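Example. $(Y_1, \ldots, Y_4) \sim \text{Multinomial}(n;\, p_1, \ldots, p_4)$.
Data: $(y_1, y_2, y_3, y_4) = (12, 13, 20, 25)$, $n = 70$
$H_0: p_1 = p_2$, $p_3 = p_4$.
Under $H_0$,
$\hat{p}_1 = \hat{p}_2 = \dfrac{y_1 + y_2}{2n} = 0.18$, $\hat{p}_3 = \hat{p}_4 = \dfrac{y_3 + y_4}{2n} = 0.32$.

Cell    1      2      3      4      Total
O_i     12     13     20     25     70
E_i     12.6   12.6   22.4   22.4   70

$$Q = \frac{(12 - 12.6)^2}{12.6} + \frac{(13 - 12.6)^2}{12.6} + \frac{(20 - 22.4)^2}{22.4} + \frac{(25 - 22.4)^2}{22.4} = 0.60 \quad (\text{df} = 4 - 1 - 1 = 2)$$

Since the p-value is 0.741, $H_0$ is not rejected.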

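Example. $(Y_1, \ldots, Y_4) \sim \text{Multinomial}(n;\, p_1, \ldots, p_4)$.
Data: $(y_1, y_2, y_3, y_4) = (12, 13, 20, 25)$, $n = 70$
$H_0: p_1 + p_2 = p_3$.
Under $H_0$,
$\hat{p}_1 = 0.154$, $\hat{p}_2 = 0.167$, $\hat{p}_3 = 0.32$, $\hat{p}_4 = 0.36$.

Cell    1      2      3      4     Total
O_i     12     13     20     25    70
E_i     10.8   11.7   22.5   25    70

$$Q = \frac{(12 - 10.8)^2}{10.8} + \frac{(13 - 11.7)^2}{11.7} + \frac{(20 - 22.5)^2}{22.5} + \frac{(25 - 25)^2}{25} = 0.56 \quad (\text{df} = 4 - 1 - 2 = 1)$$

Since the p-value is 0.454, $H_0$ is not rejected.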

Example. A journal reported that, in a bag of m&m's chocolate peanut candies, there are
30% brown, 30% yellow, 10% blue, 10% red, 10% green and 10% orange candies. Suppose
you purchase a bag of m&m's chocolate peanut candies at a nearby store and find 17
brown, 20 yellow, 13 blue, 7 red, 6 green and 9 orange candies, for a total of 72 candies.
At the 0.1 level of significance, does the bag purchased agree with the distribution
suggested by the journal?
$H_0: (p_1, p_2, p_3, p_4, p_5, p_6) = (0.3, 0.3, 0.1, 0.1, 0.1, 0.1)$ vs
$H_1: (p_1, p_2, p_3, p_4, p_5, p_6) \neq (0.3, 0.3, 0.1, 0.1, 0.1, 0.1)$
Data: (17, 20, 13, 7, 6, 9)

Cell    Brown   Yellow   Blue   Red   Green   Orange   Total
O_i     17      20       13     7     6       9        72
E_i     21.6    21.6     7.2    7.2   7.2     7.2      72


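$$-2\log\Lambda = 2\left[17\log\frac{17}{21.6} + 20\log\frac{20}{21.6} + 13\log\frac{13}{7.2} + 7\log\frac{7}{7.2} + 6\log\frac{6}{7.2} + 9\log\frac{9}{7.2}\right] = 5.5761.$$

The p-value is $P(\chi^2_5 \ge 5.5761) = 0.35 > 0.1$.

At the 10% level of significance, we cannot reject $H_0$.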


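$$Q = \frac{(17 - 21.6)^2}{21.6} + \frac{(20 - 21.6)^2}{21.6} + \frac{(13 - 7.2)^2}{7.2} + \frac{(7 - 7.2)^2}{7.2} + \frac{(6 - 7.2)^2}{7.2} + \frac{(9 - 7.2)^2}{7.2} = 6.426 \quad (\text{df} = 6 - 1 = 5)$$

Since the p-value is 0.27, at the 10% level of significance, we cannot reject $H_0$.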
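Both statistics for the candy example, together with their p-values, can be recomputed without a statistics library. In the sketch below, `chi2_sf` is a hand-written helper (not a library function) that evaluates the chi-squared upper tail for integer degrees of freedom through the recurrence Q(a + 1, z) = Q(a, z) + z^a e^(-z) / Γ(a + 1), starting from Q(1, z) = e^(-z) or Q(1/2, z) = erfc(√z).

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(chi2_df >= x) for integer df >= 1, x > 0."""
    z = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)                # Q(1, z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))     # Q(1/2, z)
    # Step a up to df / 2 with Q(a + 1, z) = Q(a, z) + z^a e^-z / Gamma(a + 1)
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q

observed = [17, 20, 13, 7, 6, 9]  # brown, yellow, blue, red, green, orange
probs = [0.3, 0.3, 0.1, 0.1, 0.1, 0.1]
n = sum(observed)
expected = [n * p for p in probs]

g_stat = 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected))
q_stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1  # no parameters are estimated under H0

p_g = chi2_sf(g_stat, df)
p_q = chi2_sf(q_stat, df)
print(round(g_stat, 4), round(q_stat, 3), round(p_g, 2), round(p_q, 2))
```

The likelihood-ratio and Pearson statistics differ somewhat at this sample size, but both lead to the same decision at the 10% level.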

Chi‐Square Test for independence – 2 way Table
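Suppose that data are collected on a pair of categorical variables $(X, Y)$, where $X$ has $r$ categories and $Y$ has $c$ categories. The data consist of $n$ pairs
$(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$,
which can be summarized in a frequency table.

X / Y    1       2       ⋯    c       Total
1        n_11    n_12    ⋯    n_1c    n_1·
2        n_21    n_22    ⋯    n_2c    n_2·
⋮        ⋮       ⋮       ⋱    ⋮       ⋮
r        n_r1    n_r2    ⋯    n_rc    n_r·
Total    n_·1    n_·2    ⋯    n_·c    n

where $n_{ij}$ is the number of pairs in the data such that $x = i$ and $y = j$.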


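When $X$ and $Y$ are independent, their joint distribution is equal to the product of their marginal distributions. That is,

$$P(X = i, Y = j) = P(X = i)\,P(Y = j).$$

This is true for all pairs of $(i, j)$. Testing for the independence between $X$ and $Y$ is equivalent to testing $H_0: p_{ij} = p_{i\cdot}\,p_{\cdot j}$, for $i = 1, 2, \ldots, r$, $j = 1, 2, \ldots, c$.
Hence, the expected count for cell $(i, j)$ is

$$E_{ij} = n\,\hat{p}_{i\cdot}\,\hat{p}_{\cdot j} = \frac{n_{i\cdot}\,n_{\cdot j}}{n}.$$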


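Chi-square test for independence between $X$ and $Y$:

Pearson statistic: $Q = \sum_{i=1}^{r}\sum_{j=1}^{c}\dfrac{(n_{ij} - E_{ij})^2}{E_{ij}}$

Deviance statistic: $G^2 = 2\sum_{i=1}^{r}\sum_{j=1}^{c} n_{ij}\log\dfrac{n_{ij}}{E_{ij}}$

The df is $(rc - 1) - (r - 1) - (c - 1) = (r - 1)(c - 1)$.
A large value of $Q$ or $G^2$ indicates that the independence model is not plausible, and that $X$ and $Y$ are related.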

Assumptions on the validity of the asymptotic distribution

• The n sample units are i.i.d.

• The expected counts are sufficiently large (at least 80% of the $E_{ij}$ should be greater than 5)


Example
Study the relationship between gender and socio-economic status (ses). Expected counts under independence are shown in parentheses.

           Socio-economic status (ses)
Gender     low            medium         high          Total
Male       15 (21.385)    47 (43.225)    29 (26.39)    91
Female     32 (25.615)    48 (51.775)    29 (31.61)    109
Total      47             95             58            200

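$$Q = \frac{\left(15 - \frac{91 \times 47}{200}\right)^2}{\frac{91 \times 47}{200}} + \frac{\left(47 - \frac{91 \times 95}{200}\right)^2}{\frac{91 \times 95}{200}} + \frac{\left(29 - \frac{91 \times 58}{200}\right)^2}{\frac{91 \times 58}{200}} + \frac{\left(32 - \frac{109 \times 47}{200}\right)^2}{\frac{109 \times 47}{200}} + \frac{\left(48 - \frac{109 \times 95}{200}\right)^2}{\frac{109 \times 95}{200}} + \frac{\left(29 - \frac{109 \times 58}{200}\right)^2}{\frac{109 \times 58}{200}} = 4.577$$

$$G^2 = 2\left[15\log\frac{200 \times 15}{91 \times 47} + 47\log\frac{200 \times 47}{91 \times 95} + 29\log\frac{200 \times 29}{91 \times 58} + 32\log\frac{200 \times 32}{109 \times 47} + 48\log\frac{200 \times 48}{109 \times 95} + 29\log\frac{200 \times 29}{109 \times 58}\right] = 4.68$$

The df is 2. The p-value is 0.10. Hence the null hypothesis is not rejected at the 5% significance level.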
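The whole calculation for the gender by socio-economic status table fits in a short function. This is a sketch only; `independence_test` is an illustrative name, and the p-value line relies on the closed form exp(-x/2) for the chi-squared upper tail, which holds because the degrees of freedom equal 2 here.

```python
import math

# Rows: male, female; columns: low, medium, high socio-economic status
table = [[15, 47, 29],
         [32, 48, 29]]

def independence_test(table):
    """Return (Pearson statistic, deviance statistic, df) for a 2-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    pearson = 0.0
    deviance = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / n  # expected count E_ij
            pearson += (obs - exp) ** 2 / exp
            if obs > 0:
                deviance += 2 * obs * math.log(obs / exp)
    df = (len(table) - 1) * (len(table[0]) - 1)
    return pearson, deviance, df

pearson, deviance, df = independence_test(table)
p_value = math.exp(-pearson / 2)  # valid only because df == 2
print(round(pearson, 3), round(deviance, 2), df, round(p_value, 2))
```

All six expected counts exceed 5, so the large-sample condition stated earlier is met for this table.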

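An independence model for a 2 × 2 table

Let $\pi = P(X = 1)$ and $\tau = P(Y = 1)$.

X / Y    1            0                 Total
1        πτ           π(1 - τ)          π
0        (1 - π)τ     (1 - π)(1 - τ)    1 - π
Total    τ            1 - τ             1

Note that $\hat{\pi} = \dfrac{n_{1\cdot}}{n}$ and $\hat{\tau} = \dfrac{n_{\cdot 1}}{n}$.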
