ANOVA: Analysis of Variation
Math 243 Lecture
R. Pruim
[Figure: plots of the three treatment groups, with group means 7.25, 8.875, and 10.11 marked]
Informal Investigation
Graphical investigation:
  - side-by-side box plots
  - multiple histograms

Whether the differences between the groups are significant depends on:
  - the difference in the means
  - the standard deviations of each group
  - the sample sizes
ANOVA determines P-value from the F statistic
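As a sketch (not from the lecture) of how those ingredients combine, the following Python snippet computes the F statistic for the treatment data using only the group sample sizes, means, and standard deviations summarized later in these notes:

```python
import math

# Summary statistics for the three treatment groups (A, B, P),
# taken from the table later in these notes.
ns = [8, 8, 9]
means = [7.250, 8.875, 10.111]
sds = [1.669, 1.458, 1.764]

k = len(ns)            # number of groups
n = sum(ns)            # total sample size
grand = sum(ni * m for ni, m in zip(ns, means)) / n   # overall mean

# Between-group variation: how far the group means sit from the overall mean.
ssg = sum(ni * (m - grand) ** 2 for ni, m in zip(ns, means))
msg = ssg / (k - 1)    # DFG = k - 1

# Within-group variation: pooled from the group standard deviations.
sse = sum((ni - 1) * s ** 2 for ni, s in zip(ns, sds))
mse = sse / (n - k)    # DFE = n - k
pooled_sd = math.sqrt(mse)

f = msg / mse
print(f"Pooled StDev = {pooled_sd:.3f}, F = {f:.2f}")
```

This reproduces the values reported later in the notes: Pooled StDev = 1.641 and F = 6.45.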
Assumptions of ANOVA
  - each group is approximately normal
  - the group standard deviations are roughly equal (they are pooled into a single estimate, the Pooled StDev)
Normality Check
We can check for normality using:
  - assumptions about the population
  - histograms for each group
  - a normal quantile plot for each group

With such small data sets, there isn't a good way to check normality from the data alone, but we make the common assumption that physical measurements of people tend to be normally distributed.
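As a minimal Python sketch of what a normal quantile plot computes (the sample data here are hypothetical, not from the lecture): pair the sorted observations with standard-normal quantiles; if the points fall near a straight line, approximate normality is plausible.

```python
from statistics import NormalDist

def normal_quantile_points(data):
    """Pair sorted data with standard-normal quantiles at plotting
    positions (i + 0.5)/n; plotting these pairs gives a quantile plot."""
    n = len(data)
    xs = sorted(data)
    zs = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(zs, xs))

# Hypothetical small sample of physical measurements.
sample = [5.3, 6.0, 6.7, 5.5, 6.2, 6.4, 5.7]
points = normal_quantile_points(sample)
for z, x in points:
    print(f"{z:6.2f}  {x:4.1f}")
```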
treatment        A        B        P
N                8        8        9
Mean         7.250    8.875   10.111
Median       7.000    9.000   10.000
StDev        1.669    1.458    1.764
The F statistic compares the two sources of variation:

        variation BETWEEN groups     MSG
  F  =  ------------------------  =  ---
        variation WITHIN groups      MSE

where MSG is based on the deviations of the group means from the overall mean (x̄_i - x̄), and MSE is based on the deviations of the observations from their group means (x_ij - x̄_i).
A large F is evidence against H0, since it
indicates that there is more difference
between groups than within groups.
For the treatment data: F = 6.45, P = 0.006.
R ANOVA Output

Source        df        SS        MS          F        P-value     F crit
treatment      2     5.127    2.563667   10.21575    0.008394    4.737416
Residuals      7     1.757    0.250952
Total          9     6.884

(The residuals are the within-group deviations x_ij - x̄_i; the F crit column comes from spreadsheet-style output.)
A small worked example (10 observations in 3 groups; overall mean = 6.44):

group            WITHIN                      BETWEEN
mean     data - group mean         group mean - overall mean
         plain      squared        plain      squared
6.00     -0.70      0.490          -0.44      0.194
6.00      0.00      0.000          -0.44      0.194
6.00      0.70      0.490          -0.44      0.194
5.95     -0.45      0.203          -0.49      0.240
5.95      0.25      0.063          -0.49      0.240
5.95      0.45      0.203          -0.49      0.240
5.95     -0.25      0.063          -0.49      0.240
7.53     -0.03      0.001           1.09      1.188
7.53     -0.33      0.109           1.09      1.188
7.53      0.37      0.137           1.09      1.188
sum                 1.757                     5.106

MSE = 1.757 / 7 = 0.2510
MSG = 5.106 / 2 = 2.553
F = MSG / MSE = 2.553 / 0.2510 ≈ 10.17; with unrounded sums of squares, F = 10.21575, as in the ANOVA output.
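The raw observations can be recovered from the table by adding each within-group deviation back to its group mean (e.g. 6.00 + (-0.70) = 5.30). A Python sketch (not from the lecture) using that reconstructed data reproduces the ANOVA output exactly, including the SSE + SSG = SST decomposition:

```python
# Raw data reconstructed from the table: group mean + within deviation.
groups = [
    [5.30, 6.00, 6.70],          # group mean 6.00
    [5.50, 6.20, 6.40, 5.70],    # group mean 5.95
    [7.50, 7.20, 7.90],          # group mean 7.53 (exactly 22.6/3)
]

n = sum(len(g) for g in groups)            # 10 observations
k = len(groups)                            # 3 groups
grand = sum(sum(g) for g in groups) / n    # overall mean 6.44

means = [sum(g) / len(g) for g in groups]
sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
ssg = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
sst = sum((x - grand) ** 2 for g in groups for x in g)

mse = sse / (n - k)   # 7 degrees of freedom within
msg = ssg / (k - 1)   # 2 degrees of freedom between
f = msg / mse

print(f"SSE = {sse:.3f}, SSG = {ssg:.3f}, SST = {sst:.3f}")
print(f"MSE = {mse:.6f}, MSG = {msg:.6f}, F = {f:.5f}")
```

This prints SSE = 1.757, SSG = 5.127, SST = 6.884, MSE = 0.250952, MSG = 2.563667, and F = 10.21575, matching the ANOVA output.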
F = MSG / MSE

The P-value comes from the F(DFG, DFE) distribution. (P-values for the F statistic are in Table E.)
So How big is F?
Since F is a ratio of variance estimates, compare its pieces to the familiar sample variance:

  MST = SST / DFT = Σ_obs (x_ij - x̄)² / (n - 1)

is just the overall sample variance, while for each group

  SS[Within Group i] = Σ_j (x_ij - x̄_i)²,   df_i = n_i - 1

and

  MSE = SSE / DFE = Σ_groups SS[Within Group i] / Σ_groups df_i = s_p²

so MSE is the pooled estimate of variance.
In Summary

  SST = Σ_obs (x_ij - x̄)²   = s² · DFT
  SSE = Σ_obs (x_ij - x̄_i)² = Σ_groups s_i² · df_i
  SSG = Σ_obs (x̄_i - x̄)²   = Σ_groups n_i · (x̄_i - x̄)²

  SSE + SSG = SST;    MS = SS / DF;    F = MSG / MSE
R² Statistic

R² gives the percent of variance due to between-group variation:

  R² = SS[Between] / SS[Total] = SSG / SST
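As a quick numeric check (a sketch using the unrounded sums of squares from the worked example above):

```python
# Sums of squares from the worked example (unrounded values).
ssg = 5.127   # between-group sum of squares
sst = 6.884   # total sum of squares

r_squared = ssg / sst   # fraction of variance explained by group membership
print(f"R^2 = {r_squared:.3f}")
```

Roughly three quarters of the variance in that small example is between-group variation.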
Software summary output for the treatment data:

Level    N     Mean    StDev
A        8    7.250    1.669
B        8    8.875    1.458
P        9   10.111    1.764

Pooled StDev = 1.641
F = 6.45,  P = 0.006
Multiple Comparisons
Once ANOVA indicates that the groups do not all have the same mean, we can compare them two at a time using the two-sample t test.

We need to adjust our p-value threshold because we are doing multiple tests with the same data. There are several methods for doing this.

If we really just want to test the difference between one pair of treatments, we should set the study up that way.
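Here is a sketch of the adjusted pairwise comparisons for the treatment data, built from the summary statistics in these notes. The critical value t* = 2.51 (the two-sided t quantile for α = 0.0199 with DFE = 22) is taken from a t table, since Python's standard library has no t distribution; treat it as an assumption of the sketch.

```python
import math

# Group summaries for treatments A, B, P (from the table in these notes).
ns = {"A": 8, "B": 8, "P": 9}
means = {"A": 7.250, "B": 8.875, "P": 10.111}
sds = {"A": 1.669, "B": 1.458, "P": 1.764}

# Pooled variance (MSE) and its degrees of freedom.
dfe = sum(ns.values()) - len(ns)                        # 25 - 3 = 22
mse = sum((ns[g] - 1) * sds[g] ** 2 for g in ns) / dfe

t_star = 2.51  # t critical value for alpha = 0.0199, df = 22 (from a t table)

def ci(g1, g2):
    """Confidence interval for mean(g1) - mean(g2) using the pooled variance."""
    diff = means[g1] - means[g2]
    se = math.sqrt(mse * (1 / ns[g1] + 1 / ns[g2]))
    return diff - t_star * se, diff + t_star * se

for g1, g2 in [("A", "B"), ("A", "P"), ("B", "P")]:
    lo, hi = ci(g1, g2)
    verdict = "significant" if lo * hi > 0 else "not significant"
    print(f"{g1}-{g2}: ({lo:7.3f}, {hi:7.3f})  {verdict}")
```

The A-B and A-P intervals come out as (-3.685, 0.435) and (-4.863, -0.859), matching the values shown in these notes, and only A vs P excludes 0.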
For 95% family-wise confidence, use alpha = 0.0199 for each test:

  A - B:  (-3.685,  0.435)
  A - P:  (-4.863, -0.859)

Only P vs A is significant (both endpoints of its interval have the same sign).
Tukey's Method in R

Tukey multiple comparisons of means
    95% family-wise confidence level

       diff       lwr      upr
B-A  1.6250  -0.43650   3.6865
P-A  2.8611   0.85769   4.8645
P-B  1.2361  -0.76731   3.2395

Only the P-A interval excludes 0, so again only P vs A is significant.