
Lecture 7: ANOVA
Sandy Eckel
seckel@jhsph.edu
30 April 2008

Statistical methods for comparing multiple groups

- Continuous data: comparing multiple means
  - Analysis of variance (ANOVA)
- Binary data: comparing multiple proportions
  - Chi-square tests for r × 2 tables: Independence, Goodness of Fit, Homogeneity
- Categorical data: comparing multiple sets of categorical responses
  - Similar chi-square tests for r × c tables

ANOVA: Definition

- Statistical technique for comparing means for multiple (usually ≥ 3) independent populations
- To compare the means in 2 groups, just use the methods we learned to conduct a hypothesis test for the equality of two population means (called a t-test)
- Extension of the 2-sample t-test to multiple groups
- Partition the total variation in a response variable into:
  - Variability within groups
  - Variability between groups
- ANOVA = ANalysis Of VAriance

ANOVA: Concepts

- Estimate group means
- Assess the magnitude of variation attributable to specific sources
- Partition the variation according to source
- Population model
- Sample model: sample estimates, standard errors (sd of the sampling distribution)
Types of ANOVA

- One-way ANOVA: one factor, e.g. smoking status (never, former, current)
- Two-way ANOVA: two factors, e.g. gender and smoking status
- Three-way ANOVA: three factors, e.g. gender, smoking, and beer consumption

Emphasis

- One-way ANOVA is an extension of the t-test to 3 or more samples; the analysis focuses on group differences
- Two-way ANOVA (and higher) focuses on the interaction of factors: does the effect due to one factor change as the level of another factor changes?

Recall: sample estimator of variance

Sample variance:

    s² = Σi (Xi − X̄)² / (n − 1)

- The distance from any data point to the mean is the deviation from this point to the mean: (Xi − X̄)
- The sum of squares is the sum of all squared deviations:

    SS = Σi (Xi − X̄)²

ANOVA partitioning of the variance I

Variation in all observations = variation between each observation and its group mean + variation between each group mean and the overall mean

In other words:

    Total sum of squares = Within-group sum of squares + Between-groups sum of squares

In shorthand:

    SST = SSW + SSB
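The shorthand identity SST = SSW + SSB can be checked numerically. A minimal Python sketch, assuming three small made-up groups (the numbers are illustrative only, not data from the lecture):

```python
# Hypothetical data for three groups (made-up illustrative numbers)
groups = {
    "control": [-0.1, 0.0, -0.05, 0.05],
    "diet": [0.0, 0.05, 0.1, -0.05],
    "diet_ex": [0.1, 0.2, 0.15, 0.05],
}

all_obs = [x for xs in groups.values() for x in xs]
grand_mean = sum(all_obs) / len(all_obs)

# SST: squared deviations of every observation from the overall mean
sst = sum((x - grand_mean) ** 2 for x in all_obs)

# SSW: squared deviations of each observation from its own group mean
ssw = sum(
    (x - sum(xs) / len(xs)) ** 2
    for xs in groups.values()
    for x in xs
)

# SSB: group-size-weighted squared deviations of the group means
# from the overall mean
ssb = sum(
    len(xs) * (sum(xs) / len(xs) - grand_mean) ** 2
    for xs in groups.values()
)

print(abs(sst - (ssw + ssb)) < 1e-12)  # prints True: SST = SSW + SSB
```

The same partition holds for any grouping of the data, which is what makes the ANOVA table's bookkeeping possible.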
ANOVA partitioning of the variance II

- SST is the sum of the squared deviations between each observation and the overall mean:

    SST = Σi (Xi − X̄)²

- SSW is the sum of the squared deviations between each observation and the mean of the group to which it belongs:

    SSW = Σi (Xi − X̄group(i))²

ANOVA partitioning of the variance III

- SSB is the sum of the squared deviations between each group mean and the overall mean, weighted by the sample size of each group (ngroup):

    SSB = Σgroup ngroup (X̄group − X̄)²

- If the group means are not very different, the variation between them and the overall mean (SSB) will not be much more than the variation between the observations within a group (SSW)

One-way ANOVA: the picture

[Figure not recovered from the slides: the response distribution in each group, drawn with a common spread but possibly different means]

Within groups mean square: MSW

- We assume the variance σ² is the same for each of the groups' populations
- We can pool (combine) the estimates of σ² across groups and use an overall estimate for the common population variance:

    Variation within a group = σ̂W² = SSW / (N − k) = MSW

- MSW is called the "within groups mean square"
Between groups mean square: MSB

- Suppose there are k groups (e.g. if smoking status has categories current, former, or never, then k = 3)
- We can also look at systematic variation among groups:

    Variation between groups = σ̂B² = SSB / (k − 1) = MSB

An ANOVA table

We calculate a test statistic for a hypothesis test using the sum of squares values as follows:

    Source            SS     df      MS                   F
    Between groups    SSB    k − 1   MSB = SSB/(k − 1)    Fobs = MSB/MSW
    Within groups     SSW    N − k   MSW = SSW/(N − k)
    Total             SST    N − 1
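Filling in the body of such a table is simple arithmetic; a Python sketch using made-up illustrative values for the sums of squares and sample sizes:

```python
# Made-up illustrative values (not the lecture's data)
ssb, ssw = 24.0, 232.0  # between- and within-group sums of squares
k, N = 3, 119           # number of groups, total sample size

msb = ssb / (k - 1)  # between groups mean square
msw = ssw / (N - k)  # within groups mean square
f_obs = msb / msw    # the ANOVA test statistic

print(msb, msw, round(f_obs, 2))  # prints 12.0 2.0 6.0
```

With these numbers, systematic variation between groups is six times the pooled within-group variation.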

Hypothesis testing with ANOVA

- In ANOVA we ask: is there truly a difference in means across groups?
- Formally, we can specify the hypotheses:

    H0: μ1 = μ2 = · · · = μk
    Ha: at least one of the μi's is different

- The null hypothesis specifies a global relationship between the means
- Get your test statistic (more on the next slide...)
- If the result of the test is significant (p-value ≤ α), then perform individual comparisons between pairs of groups

A test statistic

Goal: compare the two sources of variability, MSW and MSB. Our test statistic is

    Fobs = MSB / MSW = σ̂B² / σ̂W² = (variance between the groups) / (variance within the groups)

- If Fobs is small (close to 1), then variability between groups is negligible compared to variation within groups ⇒ the grouping does not explain much variation in the data
- If Fobs is large, then variability between groups is large compared to variation within groups ⇒ the grouping explains a lot of the variation in the data
The F-statistic

- For our observations, we assume X ∼ N(μgp, σ²), where μgp is the (possibly different) mean for each group's population. Note: under H0, we assume all the group means are the same
- We have assumed the same variance σ² for all groups; it is important to check this assumption
- Under these assumptions, we know the null distribution of the statistic F = MSB/MSW
- The distribution is called an F-distribution

The F-distribution

- Remember that a χ² distribution is always specified by its degrees of freedom
- An F-distribution is any distribution obtained by taking the quotient of two χ² random variables, each divided by its respective degrees of freedom
- When we specify an F-distribution, we must state two parameters, which correspond to the degrees of freedom of the two χ² distributions
- If X1 ∼ χ² with df1 degrees of freedom and X2 ∼ χ² with df2 degrees of freedom, we write:

    (X1/df1) / (X2/df2) ∼ F(df1, df2)
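This quotient definition can be verified by simulation. A stdlib-only Python sketch that builds approximate F draws from χ² draws (a χ² with df degrees of freedom is a sum of df squared standard normals), using degrees of freedom 2 and 116 and the 5% cutoff 3.07 from the HDL example later in the lecture:

```python
import random

random.seed(7)

def chi2_sample(df):
    # A chi-square draw with df degrees of freedom:
    # the sum of df squared standard normals
    return sum(random.gauss(0.0, 1.0) ** 2 for _ in range(df))

def f_sample(df1, df2):
    # Quotient of two chi-squares, each divided by its df
    return (chi2_sample(df1) / df1) / (chi2_sample(df2) / df2)

# Estimate the upper-tail area beyond the cutoff 3.07 on F(2, 116);
# the lecture reports 3.07 as the 5% critical value for these df
n = 20000
draws = [f_sample(2, 116) for _ in range(n)]
tail = sum(d > 3.07 for d in draws) / n
print(round(tail, 3))  # close to 0.05
```

In practice one uses the exact R functions qf and pf shown on the following slides rather than simulation; the sketch only illustrates where the distribution comes from.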

Back to the hypothesis test . . .

- Knowing the null distribution of MSB/MSW, we can define a decision rule for the ANOVA test statistic (Fobs):

    Reject H0 if Fobs ≥ Fα;k−1,N−k
    Fail to reject H0 if Fobs < Fα;k−1,N−k

- Fα;k−1,N−k is the value on the Fk−1,N−k distribution that, used as a cutoff, gives an area in the upper tail = α
- We are using a one-sided rejection region

ANOVA: F-tests I

How to find the value of Fα;k−1,N−k:

- Use the R function qf(alpha, df1, df2, lower.tail = F)
ANOVA: F-tests II

How to find the p-value of an ANOVA test statistic:

- Use the R function pf(Fobs, df1, df2, lower.tail = F)

Example: ANOVA for HDL

- Study design: randomized controlled trial
- 132 men randomized to one of:
  - Diet + exercise
  - Diet
  - Control
- Follow-up one year later: 119 men remaining in study
- Outcome: mean change in plasma levels of HDL cholesterol from baseline to one-year follow-up in the three groups

Model for HDL outcomes

We model the means for each group as follows:

    μc = E(HDL|gp = c) = mean change in the control group
    μd = E(HDL|gp = d) = mean change in the diet group
    μde = E(HDL|gp = de) = mean change in the diet and exercise group

We could also write the model as

    E(HDL|gp) = β0 + β1 I(gp = d) + β2 I(gp = de)

Recall that I(gp = d) and I(gp = de) are 0/1 group indicators

HDL ANOVA Table

We obtain the following results from the HDL experiment:

[ANOVA table not recovered from the slides; the resulting test statistic and degrees of freedom are given on the next slide]
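The indicator coding can be illustrated directly. A short Python sketch that plugs in the group mean changes reported in the pairwise-comparison slides (−0.05, 0.02, 0.14); with β0 equal to the control mean and β1, β2 equal to the differences from control, the model reproduces each group mean:

```python
def expected_hdl(gp, b0, b1, b2):
    # E(HDL|gp) = b0 + b1*I(gp = d) + b2*I(gp = de),
    # where (gp == "d") and (gp == "de") are the 0/1 indicators
    return b0 + b1 * (gp == "d") + b2 * (gp == "de")

b0 = -0.05            # control mean change
b1 = 0.02 - (-0.05)   # diet mean minus control mean
b2 = 0.14 - (-0.05)   # diet+exercise mean minus control mean

for gp in ("c", "d", "de"):
    print(gp, round(expected_hdl(gp, b0, b1, b2), 2))
# prints: c -0.05 / d 0.02 / de 0.14
```

This is why testing H0: β1 = β2 = 0 is the same as testing H0: μc = μd = μde.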
HDL ANOVA results & conclusions

F-test:

    H0: μc = μd = μde (or H0: β1 = β2 = 0)
    Ha: at least one mean is different from the others

Test statistic:

    Fobs = 13.0
    df1 = k − 1 = 3 − 1 = 2
    df2 = N − k = 116
    Rejection region: F > F0.05;2,116 = 3.07

Conclusions:

    Since Fobs = 13.0 > 3.07, we reject H0
    We conclude that at least one of the group means is different from the others (p < 0.05)

Which groups are different?

- We might proceed to make individual comparisons
- Conduct two-sample t-tests to test for a difference in means for each pair of groups (assuming equal variance):

    tobs = ((X̄1 − X̄2) − μ0) / sqrt(sp²/n1 + sp²/n2)

- Recall: sp² is the 'pooled' sample estimate of the common variance
- df = n1 + n2 − 2
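Each pairwise t statistic can be reproduced from the quantities in the slides; a short Python check using the pooled variance 0.028 and the group means and sizes reported in the pairwise-comparison slides:

```python
import math

def pooled_t(mean1, mean2, sp2, n1, n2):
    # Two-sample t statistic with pooled variance estimate sp2,
    # testing H0: mu1 - mu2 = 0
    return (mean1 - mean2) / math.sqrt(sp2 / n1 + sp2 / n2)

sp2 = 0.028  # pooled variance estimate from the HDL example
print(round(pooled_t(-0.05, 0.02, sp2, 40, 40), 2))  # -1.87, control vs diet
print(round(pooled_t(-0.05, 0.14, sp2, 40, 39), 2))  # -5.05, control vs diet+exercise
print(round(pooled_t(0.02, 0.14, sp2, 40, 39), 2))   # -3.19, diet vs diet+exercise
```

These match the three t values reported on the pairwise-comparison slides.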

Multiple Comparisons

- Performing individual comparisons requires multiple hypothesis tests
- If α = 0.05 for each comparison, there is a 5% chance that each comparison will falsely be called significant
- Overall, the probability of Type I error is elevated above 5%
- For n independent tests, the probability of making a Type I error at least once is 1 − 0.95ⁿ
- Example: for n = 10 tests, the probability of at least one Type I error is 0.40!
- Question: how can we address this multiple comparisons issue?

Bonferroni adjustment

- A possible correction for multiple comparisons
- Test each hypothesis at level α* = α/3 = 0.0167
- The adjustment ensures the overall Type I error rate does not exceed α = 0.05
- However, this adjustment may be too conservative
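The familywise error rate formula and the Bonferroni level are quick to verify in Python:

```python
# Probability of at least one Type I error across n independent tests,
# each conducted at level alpha = 0.05
alpha = 0.05

for n in (1, 3, 10):
    p_at_least_one = 1 - (1 - alpha) ** n
    print(n, round(p_at_least_one, 2))  # n = 10 gives 0.4

# Bonferroni: run each of the 3 pairwise tests at alpha/3 instead
print(round(alpha / 3, 4))  # prints 0.0167
```

Note the formula assumes independent tests; pairwise comparisons sharing a control group are not independent, which is one reason Bonferroni is only an upper-bound correction.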
Multiple comparisons α

    Hypothesis                          α* = α/3
    H0: μc = μd (or β1 = 0)             0.0167
    H0: μc = μde (or β2 = 0)            0.0167
    H0: μd = μde (or β1 − β2 = 0)       0.0167

    Overall α = 0.05

HDL: Pairwise comparisons I

Control and Diet groups:

    H0: μc = μd (or β1 = 0)
    t = (−0.05 − 0.02) / sqrt(0.028/40 + 0.028/40) = −1.87
    p-value = 0.06

HDL: Pairwise comparisons II

Control and Diet + exercise groups:

    H0: μc = μde (or β2 = 0)
    t = (−0.05 − 0.14) / sqrt(0.028/40 + 0.028/39) = −5.05
    p-value = 4.4 × 10⁻⁷

HDL: Pairwise comparisons III

Diet and Diet + exercise groups:

    H0: μd = μde (or β1 − β2 = 0)
    t = (0.02 − 0.14) / sqrt(0.028/40 + 0.028/39) = −3.19
    p-value = 0.0014
Bonferroni corrected p-values

    Hypothesis         p-value        adjusted α
    H0: μc = μd        0.06           0.0167
    H0: μc = μde       4.4 × 10⁻⁷     0.0167
    H0: μd = μde       0.0014         0.0167

    Overall α = 0.05

- The p-value must be less than the adjusted α to reject H0
- Conclusion: significant difference in HDL change for the DE group compared to the other groups

Summary of Lecture 7

- Sample variance, sums of squares
- ANOVA
  - partition of the sums of squares
  - ANOVA table
  - hypothesis test setup
  - test statistic
  - F distribution
- Multiple hypothesis testing, Bonferroni correction
- ANOVA "dull hypothesis" joke: http://www.phdcomics.com/comics/archive.php?comicid=905