Week 11
Parametric technique                 Non-parametric technique
One-way between groups ANOVA         Kruskal-Wallis Test
One-way repeated measures ANOVA      Friedman Test
Two-way between groups ANOVA         None
Mixed between-within group ANOVA     None
Analysis of variance is used when you have two or more groups or time points.
Paired-sample/ repeated measures/ within-group techniques are used when you test the same
people on more than one occasion, or you have matched pairs.
Independent/ between-group techniques are used when the participants in each group are different
people (independent of one another).
One-way ANOVA: one categorical independent variable (e.g., gender) and one continuous
dependent variable (e.g., scores).
Two-way ANOVA: two categorical independent variables (e.g., gender, age group) and one continuous
dependent variable (e.g., scores).
Heart disease is one of the largest causes of premature death and it is now known that chronic, low-level inflammation is a cause of heart disease. Exercise is known to have many benefits, including
protection against heart disease. A researcher wants to know whether this protection against heart
disease might be due to exercise reducing inflammation. The researcher was also curious as to
whether this protection might be gained over a short period of time or whether it took longer. In
order to investigate this idea, the researcher recruited 20 participants who underwent a 6-month
exercise training program. In order to determine whether inflammation had been reduced, the
researcher measured the inflammatory marker called CRP at pre-training, 2 weeks into training and
after 6 months of training.
One-way repeated measures ANOVA/ One-way within-group ANOVA
Assumptions for one-way between-group/within-group ANOVA:
Before running any parametric test, we always need to make sure that the data we want to analyse
can actually be analysed using a one-way ANOVA.
Between-group
Within-group
Assumption #1: The dependent variable should be measured at the interval or ratio
level (i.e., continuous scale rather than discrete scale).
For example: Revision time (measured in hours), intelligence (measured using IQ score), exam
performance (measured from 0 to 100), weight (measured in kg).
Assumption #2: The independent variable should consist of two or more categorical, independent
groups. When you have only two groups (e.g., gender: male and female), an independent-samples
t-test is commonly used, although one-way ANOVA will generate the same results.
For example: Ethnicity (e.g., 3 groups: Caucasian, African American and Hispanic), physical
activity level (e.g., 4 groups: sedentary, low, moderate and high), profession (e.g., 5 groups:
surgeon, doctor, nurse, dentist, therapist).
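The statement that a two-group one-way ANOVA gives the same result as an independent-samples t-test can be checked numerically: with two groups, F equals t squared. The sketch below is a minimal illustration in plain Python, assuming the pooled-variance (equal variances) form of the t-test. The two score lists and the helper names are hypothetical.

```python
from math import sqrt

def mean(xs):
    return sum(xs) / len(xs)

def ss_within(groups):
    # Sum of squared deviations of each score from its own group mean.
    return sum((x - mean(g)) ** 2 for g in groups for x in g)

def pooled_t(g1, g2):
    # Independent-samples t statistic using the pooled variance.
    df = len(g1) + len(g2) - 2
    pooled_var = ss_within([g1, g2]) / df
    se = sqrt(pooled_var * (1 / len(g1) + 1 / len(g2)))
    return (mean(g2) - mean(g1)) / se

def anova_f(groups):
    # One-way between-groups F-ratio: MS between / MS within.
    scores = [x for g in groups for x in g]
    grand = mean(scores)
    ss_b = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    df_b = len(groups) - 1
    df_w = len(scores) - len(groups)
    return (ss_b / df_b) / (ss_within(groups) / df_w)

# Hypothetical scores for two independent groups.
group_1 = [12, 15, 9]
group_2 = [20, 19, 23]

t = pooled_t(group_1, group_2)
f = anova_f([group_1, group_2])
print(round(t ** 2, 3), round(f, 3))  # the two values agree
```

The equality only holds for exactly two groups and for the pooled-variance t-test, which is why the ANOVA is usually reserved for three or more groups.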
Assumption #3: You should have independence of observations, which means that there is no
relationship between the observations in each group or between the groups themselves. For
example, when using between-group techniques, there must be different participants in each
group with no participant being in more than one group. This is more of a study design issue than
something you can test for, but it is an important assumption of the one-way ANOVA. If your
study fails this assumption, you will need to use another statistical test instead of the one-way
ANOVA (e.g., a repeated measures design).
N/A for within-group techniques
Assumption #4: There should be no significant outliers. Outliers are simply single data points
within your data that do not follow the usual pattern (e.g., in a study of 100 students' IQ scores,
where the mean score was 108 with only a small variation between students, one student had a
score of 156, which is very unusual, and may even put her in the top 1% of IQ scores globally). The
problem with outliers is that they can have a negative effect on the one-way ANOVA, reducing the
validity and accuracy of your results.
Assumption #5: Your dependent variable should be approximately normally distributed for each
category of the independent variable. We talk about the one-way ANOVA only
requiring approximately normal data because it is quite "robust" to violations of normality,
meaning that assumption can be a little violated and still provide valid results, provided that the
experiment design is balanced. You can test for normality using the Shapiro-Wilk test of
normality, which is easily tested for using SPSS Statistics.
Assumption #6: There needs to be homogeneity of variances. You can test this assumption in SPSS
Statistics using Levene's test for homogeneity of variances. If your data fails this assumption,
you will need to carry out a Welch ANOVA instead of a one-way ANOVA, which you can do using SPSS
Statistics, and also use a different post-hoc test.
Example:
Research question: Is there a statistically significant difference in undergraduate students' grade
points for a Statistics class based on the type of lecture medium (online conference class, traditional
lecture, and traditional lecture supplemented by online conference class)?
H0: There is no statistically significant difference in undergraduate students' grade points for a
Statistics class based on the type of lecture (online conference class, traditional lecture, and
traditional lecture supplemented by online conference class).
Null hypothesis: No difference between population means ($\mu_1 = \mu_2 = \mu_3$).
Research hypothesis: Population means are different (not all of $\mu_1$, $\mu_2$, $\mu_3$ are equal).
[Figure: One-Way ANOVA. Diagram of control and treatment groups (Treatment 1, Treatment 2), each
measured on the outcome (dependent variable).]
Example #1 (post hoc):
We have three teaching programs (online conference class, traditional lecture, and traditional lecture
supplemented by online conference class) and we are interested in the effectiveness of each
program in increasing undergraduate students' grade points for a Statistics class. Below is the data
for analysis.

                     Online conference   Traditional   Traditional lecture supplemented
                     class               lecture       by online conference class
                     12                  20            40
                     15                  19            35
                     9                   23            42
Mean, $\bar{X}$      12.00               20.67         39.00
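Table Summary:
Source     SS                                      df      MS                F-ratio
Between    $n \sum (\bar{X}_j - \bar{X})^2$        k - 1   SS(B) / (k - 1)   MS(B) / MS(W)
Within     $\sum (X - \bar{X}_j)^2$                N - k   SS(W) / (N - k)
Total      $\sum (X - \bar{X})^2$                  N - 1
where $\bar{X}_j$ is a group mean, $\bar{X}$ is the grand mean, n is the number of scores per group,
k is the number of groups and N is the total number of scores.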
Computation of ANOVA:
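Sum of squares between-groups examines the differences among the group means by calculating
the variation of each mean ($\bar{X}_j$) around the grand mean ($\bar{X}$). This is variation in
scores that is due to the treatment (or independent variable).
$SS_A = n \sum (\bar{X}_j - \bar{X})^2$
$\bar{X} = \frac{12.00 + 20.67 + 39.00}{3} = \frac{12 + 15 + 9 + 20 + 19 + 23 + 40 + 35 + 42}{9} = 23.89$
$SS_A = 3[(12.00 - 23.89)^2 + (20.67 - 23.89)^2 + (39.00 - 23.89)^2] \approx 1140.2$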
Sum of squares within-group examines error variation or variation of individual scores around each
group mean. This is variation in scores that is not due to the treatment (or independent variable) but
due to variation in individuals.
$SS_{S/A} = \sum (X - \bar{X}_j)^2$
$SS_{S/A} = 18.00 + 8.67 + 26.00 = 52.67$
Total sum of squares can be computed by adding $SS_A$ and $SS_{S/A}$, but also by simply subtracting
the grand mean from each score, squaring, and then summing across all cases.
$SS_T = SS_A + SS_{S/A}$
$SS_T = \sum (X - \bar{X})^2$
$SS_T = (12 - 23.89)^2 + (15 - 23.89)^2 + (9 - 23.89)^2 + (20 - 23.89)^2 + (19 - 23.89)^2 +
(23 - 23.89)^2 + (40 - 23.89)^2 + (35 - 23.89)^2 + (42 - 23.89)^2$
= 141.35 + 79.01 + 221.68 + 15.12 + 23.90 + 0.79 + 259.57 + 123.46 + 328.01
= 1192.89
Source     SS        df          MS        F-ratio
Between    1140.21   3 - 1 = 2   570.105   64.94
Within     52.67     9 - 3 = 6   8.779
Total      1192.89   9 - 1 = 8
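The hand computation above can be reproduced with a few lines of plain Python. This is a minimal sketch that uses the nine scores from Example #1; the function and key names are illustrative, not part of any statistics package.

```python
def mean(xs):
    return sum(xs) / len(xs)

def one_way_anova(groups):
    """Return the SS, df, MS and F values of a one-way between-groups ANOVA."""
    scores = [x for g in groups for x in g]
    grand = mean(scores)
    # Between: variation of each group mean around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    # Within: variation of each score around its own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    # Total: variation of each score around the grand mean.
    ss_total = sum((x - grand) ** 2 for x in scores)
    df_between = len(groups) - 1
    df_within = len(scores) - len(groups)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "ss_between": ss_between, "ss_within": ss_within, "ss_total": ss_total,
        "df_between": df_between, "df_within": df_within,
        "ms_between": ms_between, "ms_within": ms_within,
        "f": ms_between / ms_within,
    }

online = [12, 15, 9]
traditional = [20, 19, 23]
combined = [40, 35, 42]

result = one_way_anova([online, traditional, combined])
for name, value in result.items():
    print(name, round(value, 3))
```

Small differences in the last decimal place relative to the table come from rounding the grand mean to 23.89 in the hand calculation.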
Test procedures in SPSS Statistics
(Steps 1 to 6 are not reproduced here.)
Data Outcome:
Levene's test checks the homogeneity of variance (HOV) assumption, i.e., whether the variances of
the groups are the same.
***If Levene's test is significant (i.e., the p-value is less than .05), then the variances are
significantly different and we have violated the assumption of homogeneity. We want the p-value
for Levene's test to be greater than .05, so that the HOV assumption is not violated.
If the HOV assumption has been violated, refer to the table Robust Tests of Equality of Means.
Solution: Adjust the F-test to correct the problem using the Brown-Forsythe (1974) F-ratio or Welch's F.
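To make the idea behind Levene's test concrete, the sketch below computes the Levene statistic W for the Example #1 data in plain Python. It assumes the mean-centred form of the test (absolute deviations from each group mean); the function name is illustrative. The p-value is not computed here, because that requires the F distribution with k - 1 and N - k degrees of freedom, which the standard library does not provide.

```python
def mean(xs):
    return sum(xs) / len(xs)

def levene_w(groups):
    """Levene's W based on absolute deviations from each group mean."""
    # Step 1: replace every score by its absolute deviation from the group mean.
    z = [[abs(x - mean(g)) for x in g] for g in groups]
    all_z = [v for zg in z for v in zg]
    grand_z = mean(all_z)
    k = len(groups)
    n_total = len(all_z)
    # Step 2: run a one-way ANOVA on the deviations.
    between = sum(len(zg) * (mean(zg) - grand_z) ** 2 for zg in z)
    within = sum((v - mean(zg)) ** 2 for zg in z for v in zg)
    return ((n_total - k) / (k - 1)) * between / within

online = [12, 15, 9]
traditional = [20, 19, 23]
combined = [40, 35, 42]

w = levene_w([online, traditional, combined])
print(round(w, 3))
```

A small W means the groups have similar spread, which is the situation in which the HOV assumption is not rejected.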
Effect size:
One can estimate the magnitude of the effect of the independent variable by computing $\eta^2$ or $\omega^2$.
$\eta^2 = \frac{SS_A}{SS_T} = \frac{1140.222}{1192.9} = 0.96$ or 96%
$\omega^2 = \frac{SS_A - (a - 1)(MS_{S/A})}{SS_T + MS_{S/A}} = \frac{1140.222 - (3 - 1)(8.778)}{1192.9 + 8.778} = 0.93$ or 93%
where a is the number of groups.
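The two effect-size formulas can be checked with a short calculation. This is a minimal sketch using the sums of squares from the summary table; the function names are illustrative, and k stands for the number of groups (a in the formula above).

```python
def eta_squared(ss_between, ss_total):
    # Proportion of total variation explained by the independent variable.
    return ss_between / ss_total

def omega_squared(ss_between, ss_total, ms_within, k):
    # Less biased estimate that corrects for the number of groups k.
    return (ss_between - (k - 1) * ms_within) / (ss_total + ms_within)

ss_a = 1140.222    # between-groups sum of squares
ss_t = 1192.889    # total sum of squares
ms_within = 8.778  # within-groups mean square
k = 3              # number of groups

eta2 = eta_squared(ss_a, ss_t)
omega2 = omega_squared(ss_a, ss_t, ms_within, k)
print(round(eta2, 3), round(omega2, 3))
```

Omega squared is always a little smaller than eta squared, because it subtracts the part of the between-groups variation that would be expected from error alone.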
Reporting of results:
There was a statistically significant difference between groups as determined by one-way ANOVA
(F(2, 6) = 64.95, p < .001). A Tukey HSD post-hoc test revealed significant differences among the
three teaching programs: online class (M = 12.00, SD = 3.00), traditional class (M = 20.67,
SD = 2.08) and traditional class with online class (M = 39.00, SD = 3.61). The traditional class
with online class differed significantly from both the online class and the traditional class at
p < .001. The proportion of variance in undergraduates' grades accounted for by the type of
teaching program was approximately 96% ($\eta^2$ = 0.96).
Pairwise comparison
(i) Planned comparison: Used when you have specific hypotheses to test.
(ii) Post-hoc comparison: Conducted if the F-ratio is significant; these comparisons are
exploratory. The common ones are Fisher's LSD, Tukey's and Scheffé tests.
Post hoc comparisons are designed to guard against the possibility of an increased
Type 1 error due to the large number of different comparisons being made. This is
done by setting more stringent criteria for significance, and therefore it is often
harder to achieve significance. With small samples, this can be a problem, as it can
be very hard to find a significant result even when the apparent difference in
scores between the groups is quite large.
** It is not appropriate to try both and see which results you prefer!
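The inflation of the Type 1 error rate that post-hoc procedures guard against can be illustrated with a short calculation. The sketch below assumes, for simplicity, that the comparisons are independent, and it uses the Bonferroni adjustment as one example of a more stringent criterion; the function names are illustrative.

```python
def number_of_pairs(k):
    # Number of pairwise comparisons among k groups.
    return k * (k - 1) // 2

def familywise_error(alpha, m):
    # Chance of at least one false positive in m independent tests.
    return 1 - (1 - alpha) ** m

def bonferroni_alpha(alpha, m):
    # Per-comparison alpha that keeps the familywise rate near alpha.
    return alpha / m

m = number_of_pairs(3)            # three teaching programs
fwer = familywise_error(0.05, m)
adjusted = bonferroni_alpha(0.05, m)
print(m, round(fwer, 4), round(adjusted, 4))
```

With three groups there are three pairwise comparisons, so testing each at .05 raises the overall risk of a false positive well above .05, while the adjusted criterion makes each individual comparison harder to pass.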
Example #1 (Planned comparison):
We have three teaching programs (online conference class, traditional lecture, and traditional lecture
supplemented by online conference class) and we are interested in the effectiveness of each
program in increasing undergraduate students' grade points for a Statistics class. Below is the data
for analysis.

           Online conference   Traditional   Traditional lecture supplemented
           class               lecture       by online conference class
           12                  20            40
           15                  19            35
           9                   23            42
MEAN       12.00               20.67         39.00
Research question 1: Is the combination of traditional lecture and online conference class superior to
the online conference class alone and the traditional lecture alone?
H0: A traditional lecture supplemented by online conference class is NOT superior to online
conference class and traditional lecture alone.
H0: No difference between population means for online class vs. combination of lecture and online
class ($\mu_1 = \mu_3$) and traditional lecture vs. combination of lecture and online class ($\mu_2 = \mu_3$).
Coded as        A     B     C
Coefficients    -1    -1    2

Coded as        A     B     C
Coefficients    -1    0     1
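Contrast coefficients such as the two sets above can be checked with a few lines of code. The sketch below tests that each set of coefficients sums to zero, computes the value of each contrast from the three group means, and checks whether two contrasts are orthogonal. It assumes equal group sizes, and the function names are illustrative.

```python
def is_valid_contrast(coeffs):
    # A contrast compares groups, so its coefficients must sum to zero.
    return abs(sum(coeffs)) < 1e-12

def are_orthogonal(c1, c2):
    # With equal group sizes, two contrasts are orthogonal when the
    # sum of the products of their coefficients is zero.
    return abs(sum(a * b for a, b in zip(c1, c2))) < 1e-12

def contrast_value(means, coeffs):
    # Weighted sum of the group means.
    return sum(c * m for c, m in zip(coeffs, means))

means = [12.00, 20.67, 39.00]   # groups A, B, C
contrast_1 = [-1, -1, 2]        # C against A and B together
contrast_2 = [-1, 0, 1]         # C against A only

print(is_valid_contrast(contrast_1), is_valid_contrast(contrast_2))
print(round(contrast_value(means, contrast_1), 2))
print(round(contrast_value(means, contrast_2), 2))
print(are_orthogonal(contrast_1, contrast_2))
print(are_orthogonal(contrast_1, [1, -1, 0]))
```

Both sets of coefficients are valid contrasts, but they are not orthogonal to each other, so the two comparisons overlap in the information they use. A contrast of A against B, coded 1, -1, 0, would be orthogonal to the first one.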
Example #2 (Planned comparison):
A researcher wants to test the effectiveness of Drug X in preventing seasonal allergy and she
administered the drug to the patients in her research clinic. She randomly assigned them to 3
conditions: placebo (sugar pill), low dose and high dose. The dependent variable is an objective
measure of the effectiveness of the drug.

                   Placebo   Low Dose   High Dose
                   3         5          7
                   2         2          4
                   1         4          5
                   1         2          3
                   4         3          6
Mean, $\bar{X}$    2.20      3.20       5.00
s                  1.30      1.30       1.58
$s^2$              1.70      1.70       2.50
Grand Mean = 3.467     Grand SD = 1.767     Grand Variance = 3.124
One-Way ANOVA
H0: Means for the three groups are the same ($\mu_1 = \mu_2 = \mu_3$).
H1: Means for the three groups are different (not all of $\mu_1$, $\mu_2$, $\mu_3$ are equal).
Planned comparisons
Research question 1: Is Drug X superior to placebo? Is Drug X effective in preventing seasonal
allergy?
Conditions      Placebo   Low dose   High dose
Coded as        A         B          C
Coefficients    -2        1          1

Research question 2: What is the amount of dose that is needed to prevent seasonal allergy?
Conditions      Placebo   Low dose   High dose
Coded as        A         B          C
Coefficients    0         -1         1
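The two planned comparisons for the Drug X data can be worked through numerically. The sketch below is a minimal illustration that assumes the usual formula for a contrast sum of squares, $SS_{contrast} = L^2 / \sum (c_j^2 / n_j)$ with $L = \sum c_j \bar{X}_j$, and tests each contrast against the within-groups mean square with 1 degree of freedom. The function names are illustrative.

```python
def mean(xs):
    return sum(xs) / len(xs)

def ms_within(groups):
    # Within-groups mean square (the ANOVA error term).
    n_total = sum(len(g) for g in groups)
    ss_w = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return ss_w / (n_total - len(groups))

def contrast_test(groups, coeffs):
    """Return (contrast value, contrast SS, F) for one planned comparison."""
    value = sum(c * mean(g) for c, g in zip(coeffs, groups))
    ss_contrast = value ** 2 / sum(c ** 2 / len(g) for c, g in zip(coeffs, groups))
    # Each contrast has 1 degree of freedom, so MS equals SS.
    return value, ss_contrast, ss_contrast / ms_within(groups)

placebo = [3, 2, 1, 1, 4]
low_dose = [5, 2, 4, 2, 3]
high_dose = [7, 4, 5, 3, 6]
groups = [placebo, low_dose, high_dose]

drug_vs_placebo = contrast_test(groups, [-2, 1, 1])
high_vs_low = contrast_test(groups, [0, -1, 1])
print([round(v, 3) for v in drug_vs_placebo])
print([round(v, 3) for v in high_vs_low])
```

Because these two contrasts are orthogonal, their sums of squares add up to the between-groups sum of squares of the overall ANOVA. Each F value would be compared with the F distribution on 1 and 12 degrees of freedom.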
Figure 1: Overview of the general procedure for one-way ANOVA
Explore data (boxplots, histograms, descriptive statistics)
-> Correct outliers/normality problems
-> Levene's test: significant?
-> Follow-up tests
   Specific hypotheses: planned comparisons
   No hypotheses: post-hoc tests
One-way Repeated Measures ANOVA/ Within-subjects ANOVA/ ANOVA for correlated samples
It is the equivalent of the one-way ANOVA, but for related, not independent, groups. You can also
think of it as an extension of the dependent t-test.
There is one categorical (e.g.: nominal or ordinal) independent variable and one continuous (e.g.:
interval or ratio) dependent variable.
We use a repeated measures ANOVA when:
(1) It is a study that investigates changes in mean scores over three or more time points.
For example, you might be investigating the effect of a 6-month exercise training
programme on blood pressure and want to measure blood pressure at 3 separate time
points (pre-, midway and post-exercise intervention), which would allow you to develop a
time-course for any exercise effect.
In repeated measures ANOVA, the independent variable has categories
called levels or related groups. Where measurements are repeated over time, such as when
measuring changes in blood pressure due to an exercise-training programme, the
independent variable is time. Each level (or related group) is a specific time point. Hence,
for the exercise-training study, there would be three time points and each time-point is a
level of the independent variable (a schematic of a time-course repeated measures design is
shown below):
(2) It is a study that investigates differences in mean scores under three or more different
conditions.
For example, you might get the same subjects to eat different types of cake (chocolate,
caramel and lemon) and rate each one for taste, rather than having different people taste
each different cake.
Where measurements are made under different conditions, the conditions are the levels (or
related groups) of the independent variable (e.g., type of cake is the independent variable
with chocolate, caramel, and lemon cake as the levels of the independent variable). A
schematic of a different-conditions repeated measures design is shown below. It should be
noted that often the levels of the independent variable are not referred to as conditions,
but treatments. Which one you want to use is up to you. There is no right or wrong naming
convention. You will also see the independent variable more commonly referred to as
the within-subjects factor.
***It is important to note that for these two studies mentioned above, the same people are being
measured more than once on the same dependent variable. This is also why it is called repeated
measures design.
Hypothesis for Repeated Measures ANOVA
The repeated measures ANOVA tests for whether there are any differences between related
population means. The null hypothesis (H0) states that the means are equal:
$H_0: \mu_1 = \mu_2 = \mu_3 = \dots = \mu_k$
where $\mu$ = population mean and k = number of related groups. The alternative hypothesis (HA) states
that the related population means are not equal (at least one mean is different to another mean).
F-Ratio:
One-way ANOVA
In one-way ANOVA, we partition the variability attributable to the differences between groups
(SSconditions) and variability within groups (SSw).
However, with a repeated measures ANOVA, as we are using the same subjects in each group, we
can remove the variability due to the individual differences between subjects, referred to as
SSsubjects, from the within-groups variability (SSw). Each subject becomes a level of a factor
called subjects. And, with the ability to subtract SSsubjects it will leave us with a smaller
SSerror term.
Now that the between-subjects variability has been removed, our new SSerror only reflects
individual variability to each condition. You might recognise this as the interaction effect of
subject by conditions; that is, how subjects react to the different conditions.
Example #3
You are interested in investigating the effect of a 6-month exercise training programme on blood
pressure and want to measure blood pressure at 3 separate time points (pre-, midway and post-exercise
intervention), which would allow you to develop a time-course for any exercise effect.

                      Exercise intervention
Subjects              Pre      3 months   6 months   Subject means
1                     45       50         55         50
2                     42       42         45         43
3                     36       41         43         40
4                     39       35         40         38
5                     51       55         59         55
6                     44       49         56         49.67
Mean, $\bar{X}$       42.83    45.33      49.67
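Source                  SS                                         df               MS                         F-ratio
Between (Treatments)    $SS_B = n \sum (\bar{X}_j - \bar{X})^2$    k - 1            SS(B) / (k - 1)            MS(B) / MS(e)
Within                  $SS_W = \sum (X - \bar{X}_j)^2$            N - k            SS(W) / (N - k)
  Subjects              $SS_S = k \sum (\bar{X}_i - \bar{X})^2$    n - 1            SS(S) / (n - 1)
  Error                 $SS_e = SS_W - SS_S$                       (k - 1)(n - 1)   SS(e) / ((k - 1)(n - 1))
Total                   $\sum (X - \bar{X})^2$                     N - 1
$SS_W = SS_S + SS_e$
where $\bar{X}_j$ is the mean of a time point (treatment), $\bar{X}_i$ is the mean of a subject,
$\bar{X}$ is the grand mean, n is the number of subjects, k is the number of time points and N = nk.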
Computation of ANOVA:
Sum of squares between-groups examines the differences between related group means by
calculating the variation of each mean ($\bar{X}_j$) around the grand mean ($\bar{X}$). This is
variation in scores that is due to the treatment (or independent variable).
$SS_{Between} = n \sum (\bar{X}_j - \bar{X})^2 = 143.44$
Sum of squares within-group examines error variation or variation of individual scores around each
group mean. This is variation in scores that is not due to the treatment (or independent variable) but
due to variation caused by other factors.
$SS_{within} = \sum (X - \bar{X}_j)^2$
$SS_{pre} = (45 - 42.8)^2 + (42 - 42.8)^2 + (36 - 42.8)^2 + (39 - 42.8)^2 + (51 - 42.8)^2 + (44 - 42.8)^2$
= 134.83
$SS_{3months} = (50 - 45.3)^2 + (42 - 45.3)^2 + (41 - 45.3)^2 + (35 - 45.3)^2 + (55 - 45.3)^2 + (49 - 45.3)^2$
= 265.33
$SS_{6months} = (55 - 49.7)^2 + (45 - 49.7)^2 + (43 - 49.7)^2 + (40 - 49.7)^2 + (59 - 49.7)^2 + (56 - 49.7)^2$
= 315.33
$SS_{within} = 134.83 + 265.33 + 315.33 \approx 715.5$
Sum of squares error is the within-group variation that remains after removing the variation due
to differences between subjects. First compute the sum of squares for subjects, which is the
variation of each subject's mean around the grand mean:
$SS_{Subjects} = 3[(50 - 45.9)^2 + (43 - 45.9)^2 + (40 - 45.9)^2 + (38 - 45.9)^2 + (55 - 45.9)^2 + (49.7 - 45.9)^2]$
= 658.3
$SS_{within} = SS_{Subjects} + SS_{error}$
$SS_{error} = SS_{within} - SS_{Subjects}$
$SS_{error} = 715.5 - 658.3$
= 57.2
Total sum of squares can be computed by adding $SS_{Between}$ and $SS_{within}$, but also by simply
subtracting the grand mean from each score, squaring, and then summing across all cases.
$SS_T = SS_{Between} + SS_{within} = SS_{Between} + SS_{Subjects} + SS_{error}$
$SS_T = \sum (X - \bar{X})^2$
$SS_T = 143.44 + 715.5$
= 858.94
Source        SS       df                      MS       F-ratio
Between       143.44   3 - 1 = 2               71.72    12.53
Within        715.5    18 - 3 = 15
  Subjects    658.3    6 - 1 = 5               131.66
  Error       57.2     (3 - 1)(6 - 1) = 10     5.72
Total         858.94   18 - 1 = 17
There was a statistically significant effect of time on blood pressure, F(2, 10) = 12.53, p = .002.
Partial eta squared: $\eta_p^2 = \frac{SS_{Between}}{SS_{Between} + SS_{error}} = \frac{143.44}{143.44 + 57.2} = .715$
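The partitioning used in this example can be reproduced in plain Python. This is a minimal sketch for the Example #3 data, with one row per subject and one column per time point; the function and key names are illustrative.

```python
def mean(xs):
    return sum(xs) / len(xs)

def repeated_measures_anova(data):
    """One-way repeated measures ANOVA; data has one row per subject."""
    n = len(data)       # number of subjects
    k = len(data[0])    # number of conditions (time points)
    grand = mean([x for row in data for x in row])
    cond_means = [mean([row[j] for row in data]) for j in range(k)]
    subj_means = [mean(row) for row in data]
    # Variation between the condition means (treatment effect).
    ss_conditions = n * sum((m - grand) ** 2 for m in cond_means)
    # Variation of the scores around their own condition mean.
    ss_within = sum((row[j] - cond_means[j]) ** 2 for row in data for j in range(k))
    # Variation between subjects, removed from the within-groups variation.
    ss_subjects = k * sum((m - grand) ** 2 for m in subj_means)
    ss_error = ss_within - ss_subjects
    ms_conditions = ss_conditions / (k - 1)
    ms_error = ss_error / ((k - 1) * (n - 1))
    return {
        "ss_conditions": ss_conditions, "ss_within": ss_within,
        "ss_subjects": ss_subjects, "ss_error": ss_error,
        "f": ms_conditions / ms_error,
        "partial_eta_sq": ss_conditions / (ss_conditions + ss_error),
    }

# Columns: pre, 3 months, 6 months.
data = [
    [45, 50, 55],
    [42, 42, 45],
    [36, 41, 43],
    [39, 35, 40],
    [51, 55, 59],
    [44, 49, 56],
]

rm = repeated_measures_anova(data)
for name, value in rm.items():
    print(name, round(value, 3))
```

The p-value is not computed here because it requires the F distribution on 2 and 10 degrees of freedom, which the standard library does not provide.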
Increased Power in a Repeated Measures ANOVA
The major advantage with running a repeated measures ANOVA over an independent
ANOVA is that the test is generally much more powerful. This particular advantage is achieved by the
reduction in MSerror (the denominator of the F-statistic) that comes from the partitioning of
variability due to differences between subjects (SSsubjects) from the original error term in an
independent ANOVA (SSw): i.e. SSerror = SSw - SSsubjects.
We achieved a result of F(2, 10) = 12.53, p = .002, for our example repeated measures
ANOVA. How does this compare to if we had run an independent ANOVA instead? Well, if we ran
through the calculations, we would have ended up with a result of F(2, 15) = 1.504, p = .254, for the
independent ANOVA. We can clearly see the advantage of using the same subjects in a repeated
measures ANOVA as opposed to different subjects.
For our exercise-training example, the illustration below shows that after taking away
SSsubjects from SSw we are left with an error term (SSerror) that is only 8% as large as the
independent ANOVA error term.
This does not lead to an automatic increase in the F-statistic as there are a greater number
of degrees of freedom for SSw than SSerror. However, it is usual for SSsubjects to account for such a
large percentage of the within-groups variability that the reduction in the error term is large enough
to more than compensate for the loss in the degrees of freedom (as used in selecting an F-distribution).
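The comparison between the two F-ratios can be traced with simple arithmetic. The sketch below starts from the sums of squares reported for the exercise-training example; the variable names are illustrative.

```python
# Sums of squares from the exercise-training example.
ss_time = 143.44      # between time points
ss_w = 715.5          # within groups
ss_subjects = 658.3   # between subjects
n, k = 6, 3           # subjects, time points

ms_time = ss_time / (k - 1)

# Independent ANOVA: all within-groups variation is error, df = N - k.
f_independent = ms_time / (ss_w / (n * k - k))

# Repeated measures ANOVA: subject variation is removed first,
# leaving a smaller error term with df = (k - 1)(n - 1).
ss_error = ss_w - ss_subjects
f_repeated = ms_time / (ss_error / ((k - 1) * (n - 1)))

# Share of the independent error term that remains as error.
error_share = ss_error / ss_w

print(round(f_independent, 3), round(f_repeated, 2), round(error_share, 2))
```

The error degrees of freedom drop from 15 to 10, but the error sum of squares drops far more, which is why the F-ratio rises.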
Underlying Assumptions: Sphericity
ANOVAs with repeated measures (within-subject factors) are particularly susceptible to the
violation of the assumption of sphericity. Sphericity is the condition where the variances of the
differences between all combinations of related groups (levels) are equal. Violation of sphericity is
when the variances of the differences between all combinations of related groups are not equal.
Sphericity can be likened to homogeneity of variances in a between-subjects ANOVA.
The violation of sphericity is serious for the repeated measures ANOVA, with violation
causing the test to become too liberal (i.e., an increase in the Type I error rate). Therefore,
determining whether sphericity has been violated is very important. Luckily, if violations of sphericity
do occur, corrections have been developed to produce a more valid critical F-value (i.e., reduce the
increase in Type I error rate). This is achieved by estimating the degree to which sphericity has been
violated and applying a correction factor to the degrees of freedom of the F-distribution.
Testing for sphericity is an option in SPSS using Mauchly's Test for Sphericity as part of the
GLM Repeated Measures procedure. Mauchly's Test of Sphericity tests the null hypothesis that the
variances of the differences are equal. Thus, if Mauchly's Test of Sphericity is statistically significant
(p < .05), we can reject the null hypothesis and accept the alternative hypothesis that the variances
of the differences are not equal (i.e., sphericity has been violated).
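What sphericity refers to can be seen by computing the variance of the difference scores for every pair of time points. The sketch below does this for the exercise-training data using the standard library; the dictionary layout and function name are illustrative. It only describes the data and is not a substitute for Mauchly's test.

```python
from itertools import combinations
from statistics import variance

# Scores of the same six subjects at each time point.
scores = {
    "pre": [45, 42, 36, 39, 51, 44],
    "3 months": [50, 42, 41, 35, 55, 49],
    "6 months": [55, 45, 43, 40, 59, 56],
}

def difference_variances(scores):
    """Sample variance of the difference scores for each pair of levels."""
    result = {}
    for a, b in combinations(scores, 2):
        diffs = [y - x for x, y in zip(scores[a], scores[b])]
        result[(a, b)] = variance(diffs)
    return result

diff_vars = difference_variances(scores)
for pair, value in diff_vars.items():
    print(pair, round(value, 2))
```

Under perfect sphericity the three variances would be equal. Here they differ, and Mauchly's test is what decides whether the departure is larger than chance would explain in a sample of this size.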
Mauchly's Test of Sphericity
Within Subjects   Mauchly's W   Approx.      df   Sig.   Epsilon(b)
Effect                          Chi-Square               Greenhouse-Geisser   Huynh-Feldt   Lower-bound
time              .434          3.343        2    .188   .638                 .760          .500
Tests the null hypothesis that the error covariance matrix of the orthonormalized transformed dependent
variables is proportional to an identity matrix.
a. Design: Intercept
   Within Subjects Design: time
b. May be used to adjust the degrees of freedom for the averaged tests of significance. Corrected tests are
displayed in the Tests of Within-Subjects Effects table.
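The Greenhouse-Geisser epsilon in the table can be approximated by hand from the covariance matrix of the three time points. The sketch below assumes the standard Box form of the estimate, computed from the sample covariance matrix, and uses the exercise-training data; the function names are illustrative and this is not SPSS's own routine.

```python
def covariance_matrix(columns):
    """Sample covariance matrix of the repeated measures (one list per level)."""
    n = len(columns[0])
    k = len(columns)
    means = [sum(c) / n for c in columns]
    return [
        [
            sum((columns[i][s] - means[i]) * (columns[j][s] - means[j])
                for s in range(n)) / (n - 1)
            for j in range(k)
        ]
        for i in range(k)
    ]

def greenhouse_geisser_epsilon(columns):
    s = covariance_matrix(columns)
    k = len(s)
    mean_diag = sum(s[i][i] for i in range(k)) / k       # mean variance
    grand = sum(sum(row) for row in s) / k ** 2          # mean of all entries
    row_means = [sum(row) / k for row in s]
    numerator = (k * (mean_diag - grand)) ** 2
    denominator = (k - 1) * (
        sum(v ** 2 for row in s for v in row)
        - 2 * k * sum(r ** 2 for r in row_means)
        + k ** 2 * grand ** 2
    )
    return numerator / denominator

pre = [45, 42, 36, 39, 51, 44]
three_months = [50, 42, 41, 35, 55, 49]
six_months = [55, 45, 43, 40, 59, 56]

epsilon = greenhouse_geisser_epsilon([pre, three_months, six_months])
lower_bound = 1 / (3 - 1)   # smallest possible epsilon for 3 levels
print(round(epsilon, 3), lower_bound)
```

An epsilon of 1 corresponds to perfect sphericity. When a correction is applied, both degrees of freedom of the F-test are multiplied by epsilon.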
Small: 0.01
Medium: 0.059
Large: 0.138
So if you end up with $\eta^2$ = 0.45, you can assume the effect size is very large. It also means that 45%
of the change in the DV can be accounted for by the IV.
Results:
Table 97
Descriptive statistics for effect of a 6-month exercise training at 3 time points: pre, mid and
post exercise.
Source       n    Mean     Standard deviation, SD
Pre          6    42.83    5.19
3 months     6    45.33    7.29
6 months     6    49.67    7.94

Table 98
Analysis of variance (ANOVA) summary
Source          SS       df   MS      F       p      Partial $\eta^2$
Time            143.44   2    71.72   12.53   .002   .715
Error (Time)    57.22    10   5.72

Table 99
Bonferroni comparison for time: pre, 3 months, and 6 months.
                                                           95% CI
Comparisons (Time)       Mean Difference   Std. Error   Lower Bound   Upper Bound
Pre vs. 6 months         6.83*             1.70         .82           12.85
3 months vs. 6 months    4.33*             .72          1.81          6.86
* p < 0.05
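The mean differences and standard errors in the comparison table can be recomputed from the raw scores. The sketch below treats each comparison as a paired test on the difference scores; the function name and dictionary layout are illustrative. The confidence intervals are not reproduced, because they need a critical value from the t distribution at the Bonferroni-adjusted level, which the standard library does not provide.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, stdev

scores = {
    "pre": [45, 42, 36, 39, 51, 44],
    "3 months": [50, 42, 41, 35, 55, 49],
    "6 months": [55, 45, 43, 40, 59, 56],
}

def paired_comparisons(scores):
    """Mean difference, standard error and t for every pair of levels."""
    results = {}
    for a, b in combinations(scores, 2):
        diffs = [y - x for x, y in zip(scores[a], scores[b])]
        md = mean(diffs)
        se = stdev(diffs) / sqrt(len(diffs))
        results[(a, b)] = (md, se, md / se)
    return results

comparisons = paired_comparisons(scores)
bonferroni_alpha = 0.05 / len(comparisons)  # per-comparison criterion

for pair, (md, se, t) in comparisons.items():
    print(pair, round(md, 2), round(se, 2), round(t, 2))
print(round(bonferroni_alpha, 4))
```

Each t value has n - 1 = 5 degrees of freedom and would be judged against the adjusted alpha rather than .05.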
Reporting the result:
A repeated measures ANOVA was conducted to investigate the effect of a 6-month exercise training
programme on blood pressure at 3 separate time points (pre-, midway and post-exercise
intervention). The means and standard deviations are presented in Table 97. Mauchly's
Test of Sphericity indicated that the assumption of sphericity had not been violated, $\chi^2$(2) = 3.343, p =
.188.
The repeated measures ANOVA determined that mean blood pressure differed statistically significantly
between time points (F(2, 10) = 12.53, p = .002). Partial eta squared is reported at .715
(large). Post hoc tests using the Bonferroni correction revealed that the mean differences in blood
pressure between pre and 6 months (MD = 6.83, SE = 1.70, 95% CI = .821 to 12.846) and between 3 months
and 6 months (MD = 4.33, SE = .72, 95% CI = 1.81 to 6.86) were statistically significant. However,
there was no significant difference in blood pressure between pre and 3 months (MD = 2.50, SE = 1.52,
95% CI = -2.88 to 7.88).
Therefore, we can conclude that a long-term exercise training programme (6 months) elicits a
significant change in blood pressure, but not after only 3 months of training.
Example # 4
Research conducted by: Pearson et al. (2003)
Case study prepared by: David Lane and Emily Zitek
Overview: This study investigated the cognitive effects of stimulant medication in children with
mental retardation and Attention-Deficit/Hyperactivity Disorder. This case study shows the data for
the Delay of Gratification (DOG) task. Children were given various dosages of a drug,
methylphenidate (MPH) and then completed this task as part of a larger battery of tests. The order
of doses was counterbalanced so that each dose appeared equally often in each position. For
example, six children received the lowest dose first, six received it second, etc. The children were on
each dose one week before testing.
This task, adapted from the preschool delay task of the Gordon Diagnostic System (Gordon, 1983),
measures the ability to suppress or delay impulsive behavioral responses. Children were told that a
star would appear on the computer screen if they waited long enough to press a response key. If a
child responded less than four seconds after their previous response, they did not earn a
star, and the 4-second counter restarted. The DOG differentiates children with and without ADHD of
normal intelligence (e.g., Mayes et al., 2001), and is sensitive to MPH treatment in these children
(Hall & Kataria, 1992).
Questions to Answer
Does higher dosage lead to higher cognitive performance (measured by the number of correct
responses to the DOG task)?
Design Issues
This is a repeated-measures design because each participant performed the task after each dosage.
Descriptions of Variables
Variable
Description
d0
d15
d30
d60
References:
Pearson, D.A., Santos, C.W., Jerger, S.W., Casat, C.D., Roache, J., Loveland, K.A., Lane, D.M., Lachar,
D., Faria, L.P., & Getchell, C. (2003). Treatment effects of methylphenidate on cognitive
functioning in children with mental retardation and ADHD. Journal of the American Academy
of Child and Adolescent Psychiatry, 43, 677-685.