
Biostatistics

Xuezhong Shi M.D.


Professor of Epidemiology & Biostatistics

Phone:0371-66940840,66911486
E-mail: xzshi@zzu.edu.cn
Comparison of one sample mean
review

Is σ known?
  Yes: z-test
    $z = \dfrac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$
  No: Is the sample size larger than 30?
    Yes: z-test
      $z = \dfrac{\bar{X} - \mu_0}{S/\sqrt{n}}$
    No: t-test
      $t = \dfrac{\bar{X} - \mu_0}{S/\sqrt{n}}$
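The review flowchart for one sample mean can be sketched in a few lines of Python. This is only an illustration: the function name `one_sample_statistic` and its parameters are hypothetical, and the numbers in the usage line are made up, not taken from the slides.

```python
import math


def one_sample_statistic(xbar, mu0, sd, n, sigma_known=False):
    """Pick the test from the flowchart and return (test name, statistic).

    sd is the population sigma when sigma_known is True,
    otherwise it is the sample standard deviation S.
    """
    statistic = (xbar - mu0) / (sd / math.sqrt(n))
    if sigma_known or n > 30:
        return "z", statistic
    # Small sample with unknown sigma: use the t distribution with n - 1 df.
    return "t", statistic


# Hypothetical numbers, for illustration only.
test_name, value = one_sample_statistic(xbar=105.0, mu0=100.0, sd=10.0, n=25)
print(test_name, value)
```

The same formula is used in every branch; only the reference distribution (z or t) changes.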
Comparison of two samples
review

Are the two samples dependent?
  Yes: paired t test (samples must come from normal populations)
    $t = \dfrac{\bar{d} - 0}{s_d/\sqrt{n}}$, with df = n − 1
  No: Do n1 and n2 both exceed 30?
    Yes: z test (normal distribution)
      $z = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$
    No: Are both populations normally distributed?
      No: transform the data, or use nonparametric tests
      Yes: test whether $\sigma_1^2 = \sigma_2^2$
        Reject H0 ($\sigma_1^2 \ne \sigma_2^2$): t' test
        Do not reject H0 ($\sigma_1^2 = \sigma_2^2$): t test
          $t = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2(n_1 - 1) + s_2^2(n_2 - 1)}{n_1 + n_2 - 2}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$
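As a rough sketch, the large-sample z statistic and the equal-variance t statistic from the flowchart can be written as two small Python functions. The function names are hypothetical and the input numbers are made up for illustration.

```python
import math


def large_sample_z(mean1, var1, n1, mean2, var2, n2):
    """z statistic for two independent samples when n1 and n2 both exceed 30."""
    return (mean1 - mean2) / math.sqrt(var1 / n1 + var2 / n2)


def pooled_t(mean1, var1, n1, mean2, var2, n2):
    """Equal-variance t statistic and its degrees of freedom (n1 + n2 - 2)."""
    df = n1 + n2 - 2
    pooled_var = (var1 * (n1 - 1) + var2 * (n2 - 1)) / df
    std_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / std_error, df


# Hypothetical summary statistics: means 10 and 8, variances 4, eight subjects each.
t_value, t_df = pooled_t(10.0, 4.0, 8, 8.0, 4.0, 8)
print(t_value, t_df)
```

The arguments `var1` and `var2` are the sample variances $s_1^2$ and $s_2^2$, not the standard deviations.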
But when there are more than two
samples (three or more treatment groups),
the t-test or u-test cannot be used.
Treat 1 vs Treat 2   α=0.05
Treat 1 vs Treat 3   α=0.05
Treat 2 vs Treat 3   α=0.05
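One common explanation for why three pairwise tests at α = 0.05 are a problem is that the chance of at least one false rejection grows with the number of tests. The sketch below assumes, for simplicity, that the three comparisons are independent; that assumption is not stated on the slide and is only approximately true in practice.

```python
from itertools import combinations

alpha = 0.05
treatments = ["Treat 1", "Treat 2", "Treat 3"]

# All pairwise comparisons: 1-2, 1-3, 2-3.
pairs = list(combinations(treatments, 2))

# Assumption: the tests are independent, so the probability that none of them
# rejects a true H0 is (1 - alpha) ** number_of_tests.
familywise_error = 1 - (1 - alpha) ** len(pairs)

print(len(pairs), round(familywise_error, 4))  # 3 0.1426
```

Under this assumption the overall type I error is roughly 0.14 rather than 0.05.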

When there are more than two samples,
which method should be used?
Analysis of Variance

(ANOVA)
ANOVA is a technique used to test

a hypothesis concerning the means

of three or more populations.


ANOVA
  One-way ANOVA
  Two-way ANOVA (randomized block design ANOVA)
  Repeated measurement ANOVA
  ……
Main contents
one-way analysis of variance

Ⅰ Model assumptions
Ⅱ Basic ideas of ANOVA
Ⅲ Basic steps of ANOVA
Ⅳ Relationship between ANOVA and t-test
Teaching aims
• Master the applicable conditions and basic ideas

of ANOVA

• Be familiar with the steps of ANOVA


ANOVA is a hypothesis test for
numerical variables. It was developed
by R. A. Fisher, a British statistician,
so it is also called the F-test.
Ⅰ Model assumptions
1 The k samples represent completely
independent random samples drawn from
k specific populations.
2 Each of the k populations is normal.
3 Each of the k populations has the same
variance.
Ⅱ Basic ideas of ANOVA

The total variation (SS) is decomposed into
several components. The corresponding
degrees of freedom are also decomposed
into several components.


Decomposition of total variation

$SS_T = SS_B + SS_W$
($SS_T$: total; $SS_B$: between groups; $SS_W$: within groups)


Decomposition of total degrees of freedom

$\nu_T = N - 1$ (total)
$\nu_B = k - 1$ (between groups)
$\nu_W = N - k$ (within groups)
$\nu_T = \nu_B + \nu_W$
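$SS_B$ and $SS_W$ have different degrees of freedom, so they are compared as mean squares:

$MS_B = SS_B/\nu_B$

$MS_W = SS_W/\nu_W$

$F = \dfrac{MS_B}{MS_W}$

When H0 is true, F is expected to be close to 1. When the group means differ, $MS_B$ tends to exceed $MS_W$ and F becomes large.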
Ⅲ Basic steps of ANOVA

Statisticians have laid out a set of steps for
ANOVA that is as fixed as a legal procedure,
along with formulas for calculating the test
statistic (T.S.). There are many formulas, but
the steps are always the same. You only need to
remember the steps; the formulas will be given
to you when you need them.
STEPS
1 Set up the hypotheses and confirm α
2 Compute the test statistic
3 Find the P value
4 Make a conclusion
  P ≤ α: reject H0
  P > α: do not reject H0


Example 1
A gerontologist investigating various
aspects of the aging process wanted to
see whether staying “lean and mean,” that
is, being under normal body weight would
lengthen life span. She randomly assigned
newborn rats from a highly inbred line to
one of three diets (table 1). She
maintained the rats on three diets
throughout their lives and recorded their
life spans. Is there evidence that diet
affects life span in this study?
Table 1 Life spans of different groups

Unlimited   90% diet   80% diet
2.5         2.7        3.1
3.1         3.1        2.9
2.3         2.9        3.8
1.9         3.7        3.9
2.4         3.5        4.0
STEPS

Set up hypothesis and confirm α

H0: μ1 = μ2 = μ3
H1: At least two of them are different
α = 0.05
Here, the null hypothesis will be that
all population means are equal, and
the alternative hypothesis is that at
least one mean is different.
STEPS
compute test statistics

(1) $SS_T = \sum_{i=1}^{k}\sum_{j=1}^{n_i}(x_{ij} - \bar{x})^2 = \sum x^2 - \dfrac{(\sum x)^2}{N} = 5.597$
$\nu_T = 3 \times 5 - 1 = 14$
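groups (i)   1                            2                            3                            …   k
$x_{ij}$     $x_{11}$                     $x_{21}$                     $x_{31}$                     …   $x_{k1}$
             $x_{12}$                     $x_{22}$                     $x_{32}$                     …   $x_{k2}$
             …                            …                            …                            …   …
             $x_{1n_1}$                   $x_{2n_2}$                   $x_{3n_3}$                   …   $x_{kn_k}$
total        $\sum_{j=1}^{n_1} x_{1j}$    $\sum_{j=1}^{n_2} x_{2j}$    $\sum_{j=1}^{n_3} x_{3j}$    …   $\sum_{j=1}^{n_k} x_{kj}$
$n_i$        $n_1$                        $n_2$                        $n_3$                        …   $n_k$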
                     Unlimited   90% diet   80% diet
                     2.5         2.7        3.1
                     3.1         3.1        2.9
                     2.3         2.9        3.8
                     1.9         3.7        3.9
                     2.4         3.5        4.0
$\sum_{j} x_{ij}$    12.2        15.9       17.7
$n_i$                5           5          5
$\bar{x}_i$          2.44        3.18       3.54
(2) $SS_B = \sum_i n_i(\bar{X}_i - \bar{X})^2$
$= 5(2.44 - 3.0533)^2 + 5(3.18 - 3.0533)^2 + 5(3.54 - 3.0533)^2$
$= 3.145$
$\nu_B = k - 1 = 3 - 1 = 2$
$MS_B = \dfrac{SS_B}{\nu_B} = \dfrac{3.145}{2} = 1.573$
(3) $SS_W = SS_T - SS_B = 5.597 - 3.145 = 2.452$
$\nu_W = \nu_T - \nu_B = 14 - 2 = 12$
$MS_W = \dfrac{SS_W}{\nu_W} = \dfrac{2.452}{12} = 0.204$
$F = \dfrac{MS_B}{MS_W} = \dfrac{1.573}{0.204} = 7.697$
Summary table
Source                 SS      df   MS      F
Between groups (SSB)   3.145   2    1.573   7.697
Within groups (SSW)    2.452   12   0.204
Total (SST)            5.597   14
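The hand calculation for Example 1 can be checked with a short script. This is a minimal sketch using only the Table 1 data; the function name `one_way_anova` and the dictionary keys are illustrative choices, not part of the original slides.

```python
def one_way_anova(groups):
    """Return the sums of squares, degrees of freedom and F for a one-way ANOVA."""
    all_values = [x for group in groups for x in group]
    n_total = len(all_values)
    k = len(groups)
    grand_mean = sum(all_values) / n_total

    # Shortcut formula: SS_T = sum(x^2) - (sum x)^2 / N
    ss_total = sum(x * x for x in all_values) - sum(all_values) ** 2 / n_total
    # SS_B = sum over groups of n_i * (group mean - grand mean)^2
    ss_between = sum(
        len(group) * (sum(group) / len(group) - grand_mean) ** 2 for group in groups
    )
    ss_within = ss_total - ss_between

    df_between, df_within = k - 1, n_total - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return {
        "ss_total": ss_total, "ss_between": ss_between, "ss_within": ss_within,
        "df_between": df_between, "df_within": df_within, "F": f_value,
    }


# Life spans from Table 1.
unlimited = [2.5, 3.1, 2.3, 1.9, 2.4]
diet_90 = [2.7, 3.1, 2.9, 3.7, 3.5]
diet_80 = [3.1, 2.9, 3.8, 3.9, 4.0]

result = one_way_anova([unlimited, diet_90, diet_80])
print({name: round(value, 3) for name, value in result.items()})
```

Rounded to three decimals, the script gives SS_T = 5.597, SS_B = 3.145, SS_W = 2.452 and F = 7.697, matching the summary table.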
STEPS
Find p value and make conclusion

Look up the F critical values table:

$F_{0.05,(2,12)} = 3.88$
$F = 7.697 > F_{0.05,(2,12)}$, so P < 0.05
So reject H0.
At least two of the population means are different.
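When the numerator degrees of freedom equal 2, as in this example, the upper-tail probability of the F distribution has a simple closed form, $P(F > f) = (1 + 2f/\nu_2)^{-\nu_2/2}$, so the table lookup can be checked without a statistics library. The sketch below relies on that special case only; it does not work for other values of $\nu_1$, and the function names are hypothetical.

```python
def f_upper_tail_df1_2(f_value, df2):
    """P(F > f_value) for an F distribution with 2 and df2 degrees of freedom."""
    return (1 + 2 * f_value / df2) ** (-df2 / 2)


def f_critical_df1_2(alpha, df2):
    """Value of F whose upper-tail probability is alpha (numerator df = 2 only)."""
    return (df2 / 2) * (alpha ** (-2 / df2) - 1)


p_value = f_upper_tail_df1_2(7.697, 12)
critical = f_critical_df1_2(0.05, 12)
print(round(p_value, 4), round(critical, 3))
```

The P value comes out at about 0.007, well below α = 0.05, which agrees with the Sig. value in the SPSS ANOVA output for this example. The computed critical value is about 3.885, in line with the tabulated figure.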
Table 4 F critical values
[table not reproduced; it is indexed by ν1 and ν2]
The outcome of ANOVA only shows that, on
the whole, the population means are not all equal.
It does not show which population means
are different. If you want to know which two
population means are different, you should
do multiple comparisons (also called post
hoc tests).
multiple comparisons

There are many methods for multiple
comparisons. Among them, the SNK-q
test and the LSD-t test are used often.
Input data
Tests of normality

Tests of Normality
                          Kolmogorov-Smirnov(a)       Shapiro-Wilk
             groups       Statistic   df   Sig.       Statistic   df   Sig.
life spans   unlimited    .245        5    .200*      .951        5    .747
             90% diet     .180        5    .200*      .952        5    .754
             80% diet     .297        5    .170       .844        5    .176
*. This is a lower bound of the true significance.
a. Lilliefors Significance Correction
ANOVA
Test of Homogeneity of Variances

life spans
Levene Statistic   df1   df2   Sig.
.598               2     12    .566
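The Levene statistic can be reproduced by running a one-way ANOVA on the absolute deviations of each observation from its own group mean. The sketch below assumes the mean-centred version of the test; the function name is hypothetical, and the Sig. value is not computed here because it needs the F distribution with 2 and 12 degrees of freedom.

```python
def levene_mean_centered(groups):
    """Levene statistic: one-way ANOVA F computed on |x - group mean|."""
    # Absolute deviations of each value from its own group mean.
    deviations = [
        [abs(x - sum(group) / len(group)) for x in group] for group in groups
    ]
    n_total = sum(len(d) for d in deviations)
    k = len(deviations)
    grand_mean = sum(sum(d) for d in deviations) / n_total

    between = sum(len(d) * (sum(d) / len(d) - grand_mean) ** 2 for d in deviations)
    within = sum((z - sum(d) / len(d)) ** 2 for d in deviations for z in d)
    return (between / (k - 1)) / (within / (n_total - k))


# Life spans from Table 1.
unlimited = [2.5, 3.1, 2.3, 1.9, 2.4]
diet_90 = [2.7, 3.1, 2.9, 3.7, 3.5]
diet_80 = [3.1, 2.9, 3.8, 3.9, 4.0]

levene_w = levene_mean_centered([unlimited, diet_90, diet_80])
print(round(levene_w, 3))  # 0.598
```

A small statistic such as this one gives no evidence against equal variances, which is the third model assumption of ANOVA.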

ANOVA

life spans
                 Sum of Squares   df   Mean Square   F       Sig.
Between Groups   3.145            2    1.573         7.697   .007
Within Groups    2.452            12   .204
Total            5.597            14
Multiple Comparisons

Dependent Variable: life spans
LSD
                                                                         95% Confidence Interval
(I) groups   (J) groups   Mean Difference (I-J)   Std. Error   Sig.     Lower Bound   Upper Bound
unlimited    90% diet     -.7400*                 .2859        .024     -1.363        -.117
             80% diet     -1.1000*                .2859        .002     -1.723        -.477
90% diet     unlimited    .7400*                  .2859        .024     .117          1.363
             80% diet     -.3600                  .2859        .232     -.983         .263
80% diet     unlimited    1.1000*                 .2859        .002     .477          1.723
             90% diet     .3600                   .2859        .232     -.263         .983
*. The mean difference is significant at the .05 level.
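The standard error and confidence limits in the LSD table follow from the within-group mean square of the ANOVA. The sketch below recomputes them from the group means and SS_W given earlier. The critical value 2.179 (two-sided, α = 0.05, 12 degrees of freedom) is an assumption taken from a standard t table, not from the slides, and the variable names are illustrative.

```python
import math
from itertools import combinations

group_means = {"unlimited": 2.44, "90% diet": 3.18, "80% diet": 3.54}
n_per_group = 5
ms_within = 2.452 / 12   # SS_W / nu_W from the summary table
t_critical = 2.179       # assumed: two-sided 0.05 critical t for 12 df

# LSD-t uses the pooled within-group variance for every pair.
std_error = math.sqrt(ms_within * (1 / n_per_group + 1 / n_per_group))

lsd_results = {}
for group_i, group_j in combinations(group_means, 2):
    diff = group_means[group_i] - group_means[group_j]
    t_stat = diff / std_error
    half_width = t_critical * std_error
    lsd_results[(group_i, group_j)] = {
        "diff": diff,
        "t": t_stat,
        "ci": (diff - half_width, diff + half_width),
        "significant": abs(t_stat) > t_critical,
    }

for pair, row in lsd_results.items():
    print(pair, round(row["diff"], 2), round(row["t"], 3), row["significant"])
```

With these inputs the unlimited diet differs from both restricted diets, while the 90% and 80% diets do not differ from each other, as in the table.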
Ⅳ Relationship between ANOVA and t-test
Example 2
Survival days after taking some drug

Experimental   Control
5              18
10             21
14             30
21             23
17             22
               22
STEPS

Set up hypothesis and confirm α

H0: μ1 = μ2
H1: μ1 ≠ μ2
α = 0.05
STEPS
compute test statistics

Use ANOVA

(1) $SS_T = \sum_{i=1}^{k}\sum_{j=1}^{n_i}(x_{ij} - \bar{x})^2 = \sum x^2 - \dfrac{(\sum x)^2}{N} = 466.727$
$\nu_T = 11 - 1 = 10$
Survival days after taking some drug
             Experimental   Control   Total
             5              18
             ⋮              ⋮
             17             22
                            22
$n$          5              6         11
$\bar{x}$    13.4           22.7      18.45
$\sum x$     67             136       203
$\sum x^2$   1051           3162      4213
(2) $SS_B = \sum_i n_i(\bar{X}_i - \bar{X})^2 = 234.194$
$\nu_B = 2 - 1 = 1$
$MS_B = \dfrac{SS_B}{\nu_B} = 234.194$
(3) $SS_W = SS_T - SS_B = 466.727 - 234.194 = 232.533$
$\nu_W = \nu_T - \nu_B = 10 - 1 = 9$
$MS_W = \dfrac{SS_W}{\nu_W} = \dfrac{232.533}{9} = 25.837$
$F = \dfrac{MS_B}{MS_W} = \dfrac{234.194}{25.837} = 9.064$
Summary table of ANOVA

Source                 SS        df   MS        F
Between groups (SSB)   234.194   1    234.194   9.064
Within groups (SSW)    232.533   9    25.837
Total (SST)            466.727   10
Use t-test

$t = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2(n_1 - 1) + s_2^2(n_2 - 1)}{n_1 + n_2 - 2}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$

$|t| = 3.011$
STEPS
Find p value and make conclusion

$F_{0.05,(1,9)} = 5.12$
$F = 9.064 > F_{0.05,(1,9)}$, P < 0.05

$|t| = 3.011$, P < 0.05
$F = t^2$
So reject H0
When there are two treatment groups, the F-test
and the t-test are equivalent ($F = t^2$). The t-test
is easier to carry out, so with two groups we had
better choose the t-test. The F-test is needed only
when there are three or more treatment groups.
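The relationship $F = t^2$ can be verified numerically on the Example 2 data. The sketch below computes the pooled-variance t statistic and the one-way ANOVA F from the raw observations; the helper names are illustrative. Because it works from unrounded values, the last digit may differ slightly from figures obtained by hand with rounded intermediate results.

```python
import math

experimental = [5, 10, 14, 21, 17]
control = [18, 21, 30, 23, 22, 22]


def mean(values):
    return sum(values) / len(values)


def sum_sq_dev(values):
    """Sum of squared deviations from the sample's own mean."""
    m = mean(values)
    return sum((x - m) ** 2 for x in values)


n1, n2 = len(experimental), len(control)

# Pooled variance of the t-test; with two groups it is the same quantity as MS_W.
pooled_var = (sum_sq_dev(experimental) + sum_sq_dev(control)) / (n1 + n2 - 2)
t_stat = (mean(experimental) - mean(control)) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))

# One-way ANOVA with k = 2 groups, so nu_B = 1 and MS_B = SS_B.
grand_mean = mean(experimental + control)
ss_between = (n1 * (mean(experimental) - grand_mean) ** 2
              + n2 * (mean(control) - grand_mean) ** 2)
f_stat = ss_between / pooled_var

print(round(t_stat, 3), round(t_stat ** 2, 3), round(f_stat, 3))
```

The script gives |t| of about 3.01 and F of about 9.06, and the squared t statistic equals F up to floating-point error.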
