Analysis of Variance


03/18/2014


Biostatistics

Xuezhong Shi M.D. Professor of Epidemiology & Biostatistics Phone:0371-66940840,66911486 E-mail: xzshi@zzu.edu.cn

Comparison of one sample mean (review)

Is σ known?
  Yes: z-test, $z = \dfrac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$
  No: Is the sample size larger than 30?
    Yes: z-test, $z = \dfrac{\bar{X} - \mu_0}{S/\sqrt{n}}$
    No: t-test, $t = \dfrac{\bar{X} - \mu_0}{S/\sqrt{n}}$

Comparison of two samples (review)

Are the two samples dependent?
  Yes: Paired t test (samples must come from normal populations):
       $t = \dfrac{\bar{d} - 0}{s_d/\sqrt{n}}$, with df = n − 1
  No: Do n1 and n2 both exceed 30?
    Yes: z test (normal distribution):
         $z = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$
    No: Are both populations normally distributed?
      No: Data transform, or use nonparametric tests
      Yes: See if $\sigma_1^2 = \sigma_2^2$
        Reject H0 ($\sigma_1^2 \neq \sigma_2^2$): t' test
        Not reject H0 ($\sigma_1^2 = \sigma_2^2$):
             $t = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2(n_1 - 1) + s_2^2(n_2 - 1)}{n_1 + n_2 - 2}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$

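But when there are more than two samples (three or more treatment groups), the t-test or u-test cannot be used directly. With three treatments, three pairwise comparisons are needed, each tested at α = 0.05, so the overall chance of at least one false positive is larger than 0.05.

Treat 1 vs Treat 2   α=0.05
Treat 1 vs Treat 3   α=0.05
Treat 2 vs Treat 3   α=0.05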
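A rough sketch of why repeated pairwise tests are a problem. The function name `familywise_error` is hypothetical, and the calculation assumes the pairwise tests are independent, which is only an approximation here because the comparisons share data.

```python
from math import comb


def familywise_error(k_groups: int, alpha: float = 0.05) -> float:
    """Chance of at least one false rejection over all pairwise tests.

    Assumption (for illustration only): the tests are independent.
    """
    n_tests = comb(k_groups, 2)  # number of pairwise comparisons
    return 1 - (1 - alpha) ** n_tests


# Three treatments give three comparisons (1-2, 1-3, 2-3).
print(round(familywise_error(3), 4))
```

Under this simplifying assumption the overall error rate for three groups is already well above the nominal 0.05, and it grows quickly as more groups are added.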

When there are more than two samples, which method should be used?

Analysis of Variance

(ANOVA)

ANOVA is a technique used to test a hypothesis concerning the means of three or more populations.

Types of ANOVA:
• One-way ANOVA
• Two-way ANOVA (randomized block design ANOVA)
• Repeated measurement ANOVA
• ……

Ⅰ Model assumptions
Ⅱ Basic ideas of ANOVA
Ⅲ Basic steps of ANOVA
Ⅳ Relationship between ANOVA and t-test

Teaching aims
• Master the applicable conditions and basic ideas of ANOVA
• Be familiar with the steps of ANOVA

ANOVA is a hypothesis test for numerical variables. It was developed by R. A. Fisher, a British statistician, so it is also called the F-test.

Ⅰ Model assumptions
1 The k samples are completely independent random samples drawn from k specific populations.
2 Each of the k populations is normal.
3 The k populations have the same variance.

Ⅱ Basic ideas of ANOVA
The total variation (SS, sum of squares) is decomposed into several components. The corresponding degrees of freedom are decomposed in the same way.

Decomposition of total variation

SST = SSB + SSW

SST: total variation; SSB: variation between groups; SSW: variation within groups

Decomposition of total degrees of freedom

$\nu_T = N - 1$ (total)
$\nu_B = k - 1$ (between groups)
$\nu_W = N - k$ (within groups)

$\nu_T = \nu_B + \nu_W$

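SSB and SSW are each divided by their degrees of freedom to give mean squares (MS), and F is the ratio of the two mean squares. When the population means differ, MSB tends to be larger than MSW, so F tends to be larger than 1.

$MS_B = SS_B/\nu_B$

$MS_W = SS_W/\nu_W$

$F = \dfrac{MS_B}{MS_W}$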
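The decomposition can be checked numerically. The following is a minimal sketch, not a definitive implementation: `anova_decompose` is a hypothetical helper name, and the `toy` data are made up purely to illustrate that SST = SSB + SSW and that the degrees of freedom add up.

```python
def anova_decompose(groups):
    """Return (SST, SSB, SSW, df_total, df_between, df_within)."""
    values = [x for g in groups for x in g]
    n_total, k = len(values), len(groups)
    grand_mean = sum(values) / n_total
    # Total variation: every observation against the grand mean.
    sst = sum((x - grand_mean) ** 2 for x in values)
    # Between-group variation: group means against the grand mean.
    ssb = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group variation: observations against their own group mean.
    ssw = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    return sst, ssb, ssw, n_total - 1, k - 1, n_total - k


# Hypothetical toy data, not taken from the lecture.
toy = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [5.0, 6.0, 7.0]]
sst, ssb, ssw, df_t, df_b, df_w = anova_decompose(toy)
f_stat = (ssb / df_b) / (ssw / df_w)  # F = MSB / MSW
print(sst, ssb, ssw, df_t, df_b, df_w, f_stat)
```

Computing SSW directly, instead of as SST − SSB, makes the additivity a genuine check and not something true by construction.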

Ⅲ Basic steps of ANOVA

STEPS

Statisticians have laid down a set of steps for ANOVA that is as fixed as a legal procedure, together with formulas for calculating the test statistic (T.S.). There are many formulas, but the steps are always the same. You only need to remember the steps; the formulas will be given to you when you need them.

1. Set up hypotheses and confirm α
2. Compute the test statistic
3. Find the P value
4. Make a conclusion:
   P ≤ α: reject H0
   P > α: do not reject H0

Example 1

A gerontologist investigating various aspects of the aging process wanted to see whether staying “lean and mean,” that is, being under normal body weight, would lengthen life span. She randomly assigned newborn rats from a highly inbred line to one of three diets (Table 1). She maintained the rats on the three diets throughout their lives and recorded their life spans. Is there evidence that diet affects life span in this study?

Table 1 Life spans of different groups

Unlimited   90% diet   80% diet
2.5         2.7        3.1
3.1         3.1        2.9
2.3         2.9        3.8
1.9         3.7        3.9
2.4         3.5        4.0

STEPS
Set up hypothesis and confirm α

H0: μ1 = μ2 = μ3
H1: at least two of the population means are different
α = 0.05

Here, the null hypothesis is that all population means are equal, and the alternative hypothesis is that at least one mean differs from the others.

STEPS
compute test statistics

(1) $SS_T = \sum_{i=1}^{k}\sum_{j=1}^{n_i}(x_{ij} - \bar{x})^2 = \sum x^2 - \dfrac{(\sum x)^2}{N} = 5.597$

$\nu_T = 3 \times 5 - 1 = 14$

Data layout for k groups

Group (i)   1               2               3               …   k
x_ij        x_11            x_21            x_31            …   x_k1
            x_12            x_22            x_32            …   x_k2
            …               …               …               …   …
            x_1n1           x_2n2           x_3n3           …   x_knk
Total       Σ_j x_1j        Σ_j x_2j        Σ_j x_3j        …   Σ_j x_kj
n_i         n_1             n_2             n_3             …   n_k

Group        Unlimited   90% diet   80% diet
x_ij         2.5         2.7        3.1
             3.1         3.1        2.9
             2.3         2.9        3.8
             1.9         3.7        3.9
             2.4         3.5        4.0
Σ_j x_ij     12.2        15.9       17.7
n_i          5           5          5
Mean         2.44        3.18       3.54

(2) $SS_B = \sum_i n_i(\bar{X}_i - \bar{X})^2 = 5(2.44 - 3.0533)^2 + 5(3.18 - 3.0533)^2 + 5(3.54 - 3.0533)^2 = 3.145$

$\nu_B = k - 1 = 3 - 1 = 2$

$MS_B = \dfrac{SS_B}{\nu_B} = \dfrac{3.145}{2} = 1.573$

(3) $SS_W = SS_T - SS_B = 5.597 - 3.145 = 2.452$

$\nu_W = \nu_T - \nu_B = 14 - 2 = 12$

$MS_W = \dfrac{SS_W}{\nu_W} = \dfrac{2.452}{12} = 0.204$

$F = \dfrac{MS_B}{MS_W} = \dfrac{1.573}{0.204} = 7.697$ (using unrounded mean squares)

Summary table

Source   SS      df   MS      F
SSB      3.145   2    1.573   7.697
SSW      2.452   12   0.204
SST      5.597   14
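The summary table can be reproduced from the raw data in Table 1. This is a sketch in plain Python. The variable names are illustrative, and SSB is computed with the shortcut Σ(T_i²/n_i) − (Σx)²/N, which is algebraically equivalent to Σ n_i(X̄_i − X̄)² but is not the form written on the slides.

```python
diets = {
    "unlimited": [2.5, 3.1, 2.3, 1.9, 2.4],
    "90% diet": [2.7, 3.1, 2.9, 3.7, 3.5],
    "80% diet": [3.1, 2.9, 3.8, 3.9, 4.0],
}

all_x = [x for g in diets.values() for x in g]
N, k = len(all_x), len(diets)

correction = sum(all_x) ** 2 / N                 # (sum of x)^2 / N
ss_t = sum(x * x for x in all_x) - correction    # total variation
ss_b = sum(sum(g) ** 2 / len(g) for g in diets.values()) - correction
ss_w = ss_t - ss_b                               # within-group variation

ms_b = ss_b / (k - 1)    # between-group mean square
ms_w = ss_w / (N - k)    # within-group mean square
f_value = ms_b / ms_w

print(f"SST={ss_t:.3f} SSB={ss_b:.3f} SSW={ss_w:.3f} F={f_value:.3f}")
```

Working from unrounded intermediate values avoids the small discrepancy that appears if F is computed from the rounded mean squares.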

STEPS
Find p value and make conclusion

Look up the F critical values table: $F_{0.05,(2,12)} = 3.89$
$F = 7.697 > F_{0.05,(2,12)}$, so $P < 0.05$
So reject H0: at least two of the population means are different.
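When no F table is at hand, the tail probability can be computed directly in one special case. The sketch below relies on the fact that an F distribution with 2 numerator degrees of freedom has the closed-form upper tail P(F > f) = (1 + 2f/ν2)^(−ν2/2). The function names are hypothetical, and the formula does not apply when the numerator degrees of freedom differ from 2.

```python
def f_sf_df1_2(f_value: float, df2: int) -> float:
    """P(F > f_value) for an F distribution with (2, df2) degrees of freedom.

    Valid only when the numerator degrees of freedom equal 2.
    """
    return (1 + 2 * f_value / df2) ** (-df2 / 2)


def f_crit_df1_2(alpha: float, df2: int) -> float:
    """Upper-tail critical value, found by inverting the formula above."""
    return (df2 / 2) * (alpha ** (-2 / df2) - 1)


p_value = f_sf_df1_2(7.697, 12)     # F from the summary table, df = (2, 12)
critical = f_crit_df1_2(0.05, 12)   # 5% critical value for df = (2, 12)
print(f"critical value = {critical:.3f}, P = {p_value:.4f}")
```

For other degrees of freedom a statistical library or a printed table is needed; this shortcut only covers the three-group case.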

Table 4 F critical values (table body not recovered)

A significant ANOVA result only shows that the population means differ as a whole. It does not show which population means differ. If you want to know which pairs of population means differ, you should do multiple comparisons (also called post hoc tests).

Multiple comparisons

There are many methods of multiple comparison. Among them, the SNK-q test and the LSD-t test are often used.

Input data

Tests of normality

Tests of Normality (lifespans)
             Kolmogorov-Smirnov(a)        Shapiro-Wilk
groups       Statistic   df   Sig.        Statistic   df   Sig.
unlimited    .245        5    .200*       .951        5    .747
90% diet     .180        5    .200*       .952        5    .754
80% diet     .297        5    .170        .844        5    .176
* This is a lower bound of the true significance.
a. Lilliefors Significance Correction

ANOVA

Test of Homogeneity of Variances (lifespans)
Levene Statistic   df1   df2   Sig.
.598               2     12    .566

ANOVA (lifespans)
                 Sum of Squares   df   Mean Square   F       Sig.
Between Groups   3.145            2    1.573         7.697   .007
Within Groups    2.452            12   .204
Total            5.597            14

Multiple Comparisons
Dependent Variable: lifespans
LSD
                                                                   95% Confidence Interval
(I) groups   (J) groups   Mean Difference (I-J)   Std. Error   Sig.   Lower Bound   Upper Bound
unlimited    90% diet     -.7400*                 .2859        .024   -1.363        -.117
unlimited    80% diet     -1.1000*                .2859        .002   -1.723        -.477
90% diet     unlimited    .7400*                  .2859        .024   .117          1.363
90% diet     80% diet     -.3600                  .2859        .232   -.983         .263
80% diet     unlimited    1.1000*                 .2859        .002   .477          1.723
80% diet     90% diet     .3600                   .2859        .232   -.263         .983
* The mean difference is significant at the .05 level.
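The standard error and confidence intervals in the LSD output can be approximated by hand. The sketch below assumes the usual LSD construction (a t interval using the within-group mean square from the ANOVA). The critical value 2.179 for t with 12 degrees of freedom is taken from a t table and is an assumption of this example, as are the variable names.

```python
from itertools import combinations
from math import sqrt

means = {"unlimited": 2.44, "90% diet": 3.18, "80% diet": 3.54}
n = 5                  # observations per group
ms_w = 2.452 / 12      # within-group mean square (SSW / df)
t_crit = 2.179         # assumed: two-sided 5% point of t with 12 df

se = sqrt(ms_w * (1 / n + 1 / n))   # standard error of a difference of means
margin = t_crit * se                # least significant difference

lsd = {}
for a, b in combinations(means, 2):
    diff = means[a] - means[b]
    # (difference, lower bound, upper bound, significant at 0.05?)
    lsd[(a, b)] = (diff, diff - margin, diff + margin, abs(diff) > margin)

for pair, (diff, low, high, significant) in lsd.items():
    print(pair, f"{diff:.2f} [{low:.3f}, {high:.3f}]", "*" if significant else "")
```

A pair is flagged when the absolute mean difference exceeds the margin, which is the same as the confidence interval excluding zero.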

Ⅳ Relationship between ANOVA and t-test
Example 2

Survival days after taking some drug

Experiments: 5 10 14 21 17
Control: 18 21 30 23 22 22

STEPS
Set up hypothesis and confirm α

H0: µ1 = µ2
H1: µ1 ≠ µ2
α = 0.05

STEPS
compute test statistics

Use ANOVA
(1) $SS_T = \sum_{i=1}^{k}\sum_{j=1}^{n_i}(x_{ij} - \bar{x})^2 = \sum x^2 - \dfrac{(\sum x)^2}{N} = 466.727$

$\nu_T = 11 - 1 = 10$

Survival days after taking some drug

         Experiments   Control   Total
x        5             18
         10            21
         14            30
         21            23
         17            22
                       22
n        5             6         11
Mean     13.4          22.7      18.45
Σx       67            136       203
Σx²      1051          3162      4213

(2) $SS_B = \sum_i n_i(\bar{X}_i - \bar{X})^2 = 234.194$

$\nu_B = 2 - 1 = 1$

$MS_B = \dfrac{SS_B}{\nu_B} = 234.194$

(3) $SS_W = SS_T - SS_B = 466.727 - 234.194 = 232.533$

$\nu_W = \nu_T - \nu_B = 10 - 1 = 9$

$MS_W = \dfrac{SS_W}{\nu_W} = \dfrac{232.533}{9} = 25.837$

$F = \dfrac{MS_B}{MS_W} = \dfrac{234.194}{25.837} = 9.064$

Summary table of ANOVA

Source   SS        df   MS        F
SSB      234.194   1    234.194   9.064
SSW      232.533   9    25.837
SST      466.727   10

Use t-test

$t = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2(n_1 - 1) + s_2^2(n_2 - 1)}{n_1 + n_2 - 2}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$

With these data, $|t| = 3.011$.

STEPS
Find p value and make conclusion

$F_{0.05,(1,9)} = 5.12$; $F = 9.064 > F_{0.05,(1,9)}$, so $P < 0.05$

$|t| = 3.011$, so $P < 0.05$

$F = t^2$ ($3.011^2 \approx 9.06$)
So reject H0.

When there are two treatment groups, the F-test and the t-test are equivalent (F = t²). But the t-test is easier to carry out than the F-test, so with two treatment groups the t-test is the better choice. The F-test is needed when there are three or more treatment groups.
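The equivalence can be checked on the Example 2 data. This is a small sketch in plain Python with illustrative variable names. It uses the pooled-variance t statistic, relying on the fact that the pooled variance is the same quantity as MSW when there are two groups.

```python
from math import sqrt

treated = [5, 10, 14, 21, 17]
control = [18, 21, 30, 23, 22, 22]


def mean(xs):
    return sum(xs) / len(xs)


def sum_sq_dev(xs):
    """Sum of squared deviations from the group mean, i.e. s^2 * (n - 1)."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs)


n1, n2 = len(treated), len(control)

# Pooled variance of the two-sample t-test; identical to MSW in the ANOVA.
pooled_var = (sum_sq_dev(treated) + sum_sq_dev(control)) / (n1 + n2 - 2)
t_value = (mean(treated) - mean(control)) / sqrt(pooled_var * (1 / n1 + 1 / n2))

# One-way ANOVA with k = 2 groups, so the between-group df is 1.
grand_mean = mean(treated + control)
ss_b = n1 * (mean(treated) - grand_mean) ** 2 + n2 * (mean(control) - grand_mean) ** 2
f_value = (ss_b / 1) / pooled_var

print(f"t = {t_value:.3f}, t^2 = {t_value ** 2:.3f}, F = {f_value:.3f}")
```

The sign of t depends only on which group is listed first; squaring removes it, which is why F carries no direction.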
