
One Way ANOVA

By
Mohammed Nawaiseh
One-Way ANOVA ("analysis of variance") compares the means of two or more independent groups in order to determine whether there is
statistical evidence that the associated population means are significantly different.
● One-Way ANOVA is a parametric test.
● It tests whether 3 or more population means are all equal.
This test is also known as:
● One-Factor ANOVA /// One-Way Analysis of Variance /// Between Subjects ANOVA
The variables used in this test are known as:
● Dependent variable
● Independent variable (also known as the grouping variable, or factor)
● This variable divides cases into two or more mutually exclusive levels, or groups
The One-Way ANOVA is commonly used to test the following:
● Statistical differences among the means of two or more groups or two or more interventions or two or more change scores
Note: Both the One-Way ANOVA and the Independent Samples t Test can compare the means for two groups. However, only the One-Way
ANOVA can compare the means across three or more groups.
Note: If the grouping variable has only two groups, then the results of a one-way ANOVA and the independent samples t test will be equivalent. In
fact, if you run both an independent samples t test and a one-way ANOVA in this situation, you should be able to confirm that t² = F.
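As a quick check outside SPSS, the t² = F identity can be verified in a few lines of Python with SciPy (a sketch on made-up data, not part of the SPSS workflow in these notes):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
group_a = rng.normal(6.5, 1.0, 30)  # made-up scores, group A
group_b = rng.normal(7.0, 1.0, 30)  # made-up scores, group B

# Independent samples t test (pooled variances, matching ANOVA's assumption)
t, p_t = stats.ttest_ind(group_a, group_b)

# One-way ANOVA on the same two groups
f, p_f = stats.f_oneway(group_a, group_b)

# With exactly two groups, t squared equals F and the p values coincide
print(t**2, f)
```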

Effect Size - (Partial) Eta Squared “η²”


● (partial) eta-squared is the proportion of variance accounted for by a factor.
● ANOVA tells you whether the means of the different groups are equal or not (depending on the P value)
○ but it does not tell you how large the differences between the means are
○ so an effect size is used to quantify the difference
○ (Partial) Eta Squared is used to calculate effect size in ANOVA
Some rules of thumb are that
● η² > 0.01 indicates a small effect;
● η² > 0.06 indicates a medium effect;
● η² > 0.14 indicates a large effect.
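For a one-way ANOVA, eta squared is simply the between-group sum of squares divided by the total sum of squares. A small Python sketch on made-up data (SPSS computes this for you in its output) shows the calculation:

```python
import numpy as np

def eta_squared(*groups):
    """Eta squared: between-group sum of squares / total sum of squares."""
    all_values = np.concatenate(groups)
    grand_mean = all_values.mean()
    ss_between = sum(len(g) * (np.mean(g) - grand_mean) ** 2 for g in groups)
    ss_total = ((all_values - grand_mean) ** 2).sum()
    return ss_between / ss_total

# Three made-up groups with clearly different means
g1 = np.array([5.0, 5.5, 6.0, 5.2, 5.8])
g2 = np.array([6.5, 7.0, 6.8, 7.2, 6.9])
g3 = np.array([8.0, 8.5, 8.2, 7.9, 8.4])
print(eta_squared(g1, g2, g3))  # well above the 0.14 "large effect" threshold
```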
Assumptions
The assumptions that must be met when using one-way or factorial ANOVA are as follows (all types of ANOVA)
● Dependent variable that is continuous (i.e., interval or ratio level)
● Independent variable that is categorical (i.e., two or more groups)
○ Cases that have values on both the dependent and independent variables
● Independent samples/groups (i.e., independence of observations)
○ There is no relationship between the subjects in each sample. This means that:
■ subjects in the first group cannot also be in the second group /// no subject in either group can influence subjects in the other group
■ no group can influence the other group
● Random sample of data from the population
● Normal distribution (approximately) of the dependent variable for each group (i.e., for each level of the factor)
○ Non-normal population distributions, especially those that are thick-tailed or heavily skewed, considerably reduce the power of the test
○ Among moderate or large samples, a violation of normality may yield fairly accurate p values
● Homogeneity of variances (i.e., variances approximately equal across groups)
○ Assumption violated + different sample sizes → Brown-Forsythe or Welch statistics
■ When this assumption is violated and the sample sizes differ among groups, the p value for the overall F test is not trustworthy. These conditions warrant using alternative statistics that do not assume equal variances among populations, such as the Brown-Forsythe or Welch statistics (available via Options in the One-Way ANOVA dialog box).
○ Assumption violated + equal sample sizes → post hoc tests that do not assume equal variances (e.g., Dunnett’s C)
■ When this assumption is violated, regardless of whether the group sample sizes are fairly equal, the results may not be trustworthy for
post hoc tests. When variances are unequal, post hoc tests that do not assume equal variances should be used (e.g., Dunnett’s C).
● No outliers
● All cells have an adequate sample size (a minimum cell size of 30 is preferred)
● The cell size ratio is no larger than 1:4 → (cell with minimum number /cell with maximum number)
● The variances are similar between groups → Variance ratio ⇒ between the lowest and highest values
● The residuals are normally distributed
Note: When the normality, homogeneity of variances, or outliers assumptions for One-Way ANOVA are not met, you may want to run the nonparametric
Kruskal-Wallis test instead.
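If those assumptions fail, the Kruskal-Wallis test compares groups via ranks rather than means, so it does not require normality. A sketch in Python with SciPy, on made-up skewed data (not the tutorial datasets):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Made-up skewed (non-normal) data for three groups
g1 = rng.exponential(1.0, 40)
g2 = rng.exponential(1.5, 40)
g3 = rng.exponential(2.0, 40)

# Kruskal-Wallis: nonparametric alternative to one-way ANOVA
h, p = stats.kruskal(g1, g2, g3)
print(f"H = {h:.2f}, p = {p:.4f}")
```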
Researchers often follow several rules of thumb for one-way ANOVA:
● Each group should have at least 6 subjects (ideally more; inferences for the population will be more tenuous with too few subjects)
● Balanced designs (i.e., same number of subjects in each group) are ideal; extremely unbalanced designs increase the possibility that violating any of
the requirements/assumptions will threaten the validity of the ANOVA F test
Sample size
● When an ANOVA is conducted, the data are divided into cells according to the number of groups in the explanatory variable.
● Small cell sizes, that is cell sizes less than 10, are always problematic because of the lack of precision in calculating the mean value for
the cell.
● The minimum cell size in theory is 10 but in practice 30 is preferred.
● In addition to creating imprecision, low cell counts lead to a loss of statistical power. The assumption of a low cell size ratio is also
important. A cell size imbalance of more than 1:4 across the model would be a concern, for example when one cell has 10 cases and
another cell has 60 cases and the ratio is then 1:6.
● If there is small sample size
○ Recoding
■ It may be difficult to avoid small cell sizes because it is not possible to predict the number of cases in each cell prior to
data collection. Even in experimental studies in which equal numbers can be achieved in some groups, drop-outs and
missing data can lead to unequal cell sizes. If small cells are present, they can be re-coded into larger cells but only if it is
possible to meaningfully interpret the re-coding

Normal distribution and Homogeneity of variances


● As with a t-test, ANOVA is robust to some deviations from normality of distributions and some imbalance of variances
● The assumption that the outcome variable is normally distributed is of most importance when the sample size is small and/or when
univariate outliers increase or decrease mean values between cells by an important amount and therefore influence perceived
differences between groups. The main effects of non-normality and unequal variances, especially if there are outliers, are to bias the P
values
● When variances are not significantly different between cells, the model is said to be homoscedastic. The assumption of equal variances
is of most importance when there are small cells, say cells with less than 30 cases, when the cell size ratio is larger than 1:4 or when
there are large differences in variance between cells, say larger than 1:10. The main effect of unequal variance is to reduce statistical
power and thus lead to type II errors.
One way vs factorial ANOVA vs ANCOVA
● A one-way ANOVA is used when the effect of only one categorical variable (explanatory variable) on a single continuous
variable (outcome) is explored, for example when the effect of socioeconomic status, which has three groups (low, medium
and high), on weight is examined. The concept of ANOVA can be thought of as an extension of a two-sample t-test but the
terminology used is quite different.
● A factorial ANOVA is used when the effects of two or more categorical variables (explanatory variables) on a single continuous
variable (outcome) are explored, for example when the effects of gender and socioeconomic status on weight are examined.
● An analysis of covariance (ANCOVA) is used when the effects of one or more categorical factors (explanatory variables,IV) on
a single continuous variable (outcome) are explored after adjusting for the effects of one or more continuous variables
(covariates). A covariate is any variable that correlates with the outcome variable. For example, ANCOVA would be used to
test for the effects of gender and socioeconomic status on weight after adjusting for height.
ANOVA & variance
The ANOVA test is called an analysis of variance and not an analysis
of means because this test is used to assess whether the mean values
of different groups are far enough apart in terms of their spread
(variance) to be considered significantly different.
ANOVA vs independent two sample t-tests
● If a factor has three groups, it is possible to conduct three independent two sample t-tests, that is to test the mean values of
group 1 vs 2, group 3 vs 2 and group 1 vs 3.
● However, this approach of conducting multiple two-sample t-tests increases the probability of obtaining a significant result
merely by chance (a type I error). The probability of a type I error not occurring for each t-test is 0.95 (i.e. 1 − 0.05). The three
tests are independent, therefore the probability of a type I error not occurring over all three tests is 0.95 × 0.95 × 0.95, or 0.86.
Therefore, the probability of at least one type I error occurring over the three two-sample t-tests is 0.14 (i.e. 1 − 0.86), which is
higher than the P level set at 0.05.
● A one-way ANOVA is therefore used to investigate the differences between several groups within a factor in one model and to
reduce the number of pairwise comparisons that are made.
● If multiple two-sample t-tests have to be used → use a Bonferroni correction → the original P value (e.g., 0.05) is divided by the number of tests performed
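The familywise error rate and the Bonferroni-corrected threshold described above can be reproduced with two lines of arithmetic (plain Python, mirroring the 0.95 × 0.95 × 0.95 calculation):

```python
# Familywise type I error for k independent tests at alpha = 0.05
alpha, k = 0.05, 3
familywise = 1 - (1 - alpha) ** k   # 1 - 0.95^3
print(round(familywise, 2))         # 0.14, as in the text

# Bonferroni correction: evaluate each comparison at alpha / k instead
bonferroni_alpha = alpha / k        # about 0.0167
print(bonferroni_alpha)
```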
Post hoc tests
● ANOVA tells you whether the means of the different groups are equal or not (depending on the P value)
○ but it does not tell you which means are different
● To answer the question → One approach would be running independent samples t-tests on all possible pairs of means
● Another approach (most commonly used, with less risk of drawing a wrong conclusion) is to run a post hoc analysis (“after that”)
○ You could think of it as running all possible t-tests for which the results have been corrected with some sort of Bonferroni
correction but less conservative
● Which post hoc test should I use?
○ One post hoc → There are a great number of different post hoc tests that you can use. However, you should only run one post hoc
test – do not run multiple post hoc tests.
○ For a one-way ANOVA, you will probably find that just two tests need to be considered.
○ If your data met the assumption of homogeneity of variances → use Tukey's honestly significant difference (HSD) post hoc
test.
■ Note that if you use SPSS Statistics, Tukey's HSD test is simply referred to as "Tukey" in the post hoc multiple
comparisons dialogue box).
○ If your data did not meet the homogeneity of variances assumption → Games-Howell post hoc test.
Post Hoc (advanced)
● A post-hoc test may consist of pairwise comparisons, group-wise comparisons or a combination
of both.
● Pairwise comparisons are used to compare the differences between each pair of means.
● Group-wise comparisons are used to identify subsets of means that differ significantly from
each other.
● conservative test → Post-hoc tests also vary from being exceedingly conservative to simply
conducting multiple t-tests with no adjustment for multiple comparisons. A conservative test is
one in which the actual significance is smaller than the stated significance level. Thus,
conservative tests may incorrectly fail to reject the null hypothesis because a larger effect size
between means is required for significance.
○ The advantages of using a conservative post-hoc test have to be balanced against the
probability of type II errors, that is missing real differences
● Equality of the variances → The choice of post-hoc test should be determined by equality of
the variances, equality of group sizes and by the acceptability of the test in a particular research
discipline. For example, Scheffe is often used in psychological medicine, Bonferroni in clinical
applications and Duncan in epidemiological studies.
● The LSD test is the most liberal post-hoc test because it performs all possible tests between means. This test is not normally recommended when more than three groups are being compared or when there are unequal variances or cell sizes. With no adjustments made for multiple tests or comparisons, the results of the LSD test amount to multiple t-testing.
● The Bonferroni post-hoc comparison is a conservative test in which the critical P value of 0.05
is divided by the number of comparisons made. Thus, if five comparisons are made, the critical
value of 0.05 is divided by 5 and the adjusted new critical value is P = 0.01.
○ In SPSS the P levels in the Multiple Comparisons table have already been adjusted for
the number of multiple comparisons. Therefore, each P level obtained from a Bonferroni
test in the Multiple Comparisons table should be evaluated at the critical level of 0.05.
● By using the Bonferroni test, which is a conservative test, the significant differences between
some groups identified by the LSD test are now nonsignificant. The mean values are identical but
the confidence intervals are adjusted so that they are wider.
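SPSS's approach of pre-adjusting the P levels (multiply each raw p value by the number of comparisons, capped at 1, so every adjusted value can be compared against 0.05) can be sketched in Python; the function name and the raw p values below are hypothetical:

```python
def bonferroni_adjust(p_values):
    """SPSS-style Bonferroni adjustment: multiply each raw p value by the
    number of comparisons and cap at 1, so the adjusted values can be
    evaluated against the usual 0.05 cutoff."""
    k = len(p_values)
    return [min(p * k, 1.0) for p in p_values]

raw = [0.004, 0.020, 0.300]   # made-up raw pairwise p values
print(bonferroni_adjust(raw)) # approximately [0.012, 0.06, 0.9]
```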
● The Duncan test shown in the Homogeneous Subsets table is one of the more liberal post-hoc
tests.
○ Under this test, there is a progressive comparison between the largest and smallest
mean values until a difference that is not significant at the P < 0.05 level is found and
the comparisons are stopped. In this way, the number of comparisons is limited.
Reference
● https://www.researchgate.net/publication/46284798_Statnote_6_post-hoc_ANOVA_tests
● https://www.spss-tutorials.com/spss-one-way-anova/
Building an ANOVA model
● Advanced
Tutorial (1)
● In the sample dataset, the variable Sprint is the respondent's time (in seconds) to sprint a given distance,
and Smoking is an indicator about whether or not the respondent smokes (0 = Nonsmoker, 1 = Past
smoker, 2 = Current smoker).
● File → Smoking Vs Sprint
One way ANOVA
ANOVA will be used to test if there is a statistically significant difference in sprint time with respect to smoking status. Sprint time will serve as the dependent
variable, and smoking status will act as the independent variable.
BEFORE THE TEST
● Analyze > Descriptive Statistics > Descriptives
● Analyze > Compare Means > Means
● The sprint times are a continuous measure of time to sprint a given distance in seconds. From the Descriptives procedure (Analyze > Descriptive
Statistics > Descriptives), we see that the times exhibit a range of 4.5 to 9.6 seconds, with a mean of 6.6 seconds (based on n=374 valid cases).
From the Compare Means procedure (Analyze > Compare Means > Means), we see these statistics with respect to the groups of interest:
● Notice that, according to the Compare Means procedure, the valid sample size is actually n=353. This is because Compare Means (and additionally,
the one-way ANOVA procedure itself) requires there to be nonmissing values for both the sprint time and the smoking indicator.
● Comparative boxplot
○ No outliers → From the boxplots, we see that there are no outliers; that the distributions are roughly symmetric; and that the center of the
distributions don't appear to be hugely different. The median sprint time for the nonsmokers is slightly faster than the median sprint time of
the past and current smokers.
Tests for normal distribution
● Analyze > Descriptive Statistics > Explore
○ Check off histograms and normality plots with tests, and remove the check
from Stem and leaf
● Skewness and kurtosis
○ Within the normal range (−1 to +1) for all levels
● Tests of Normality
○ Shapiro-Wilk
■ Significant for non smoker → not normally distributed
○ Histogram
■ non smoker → not normally distributed
○ The sample size is large → so ANOVA will be robust to the unmet assumption, but this should be reported in the limitations section
RUNNING THE PROCEDURE
● Click Analyze > Compare Means > One-Way ANOVA.
● Add the variable Sprint to the Dependent List box, and add the variable Smoking to the Factor
box.
● Click Options. Check the box for Means plot, then click Continue.
● Click OK when finished.

Notes
● (B) Factor: The independent variable. The categories (or groups) of the independent variable will
define which samples will be compared. The independent variable must have at least two
categories (groups), but usually has three or more groups when used in a One-Way ANOVA.
● (C) Contrasts: (Optional) Specify contrasts, or planned comparisons, to be conducted after the
overall ANOVA test.
a. When the initial F test indicates that significant differences exist between group means,
contrasts are useful for determining which specific means are significantly different when
you have specific hypotheses that you wish to test. Contrasts are decided before
analyzing the data (i.e., a priori).
● (E) Options: Clicking Options will produce a window where you can specify which Statistics to
include in the output (Descriptive, Fixed and random effects, Homogeneity of variance test,
Brown-Forsythe, Welch), whether to include a Means plot, and how the analysis will address
Missing Values (i.e., Exclude cases analysis by analysis or Exclude cases listwise). Click
Continue when you are finished making specifications.
Post Hoc
● Equal Variances Assumed → Tukey (preferred) or LSD or
bonferroni
● Equal Variances Not Assumed → Games-Howell

Notes
● (D) Post Hoc: (Optional) Request post hoc (also known as multiple comparisons) tests.
Specific post hoc tests can be selected by checking the associated boxes.
○ (1) Equal Variances Assumed: Multiple comparisons options that assume
homogeneity of variance (each group has equal variance).
○ (2) Test: By default, a 2-sided hypothesis test is selected. Alternatively, a
directional, one-sided hypothesis test can be specified if you choose to use a
Dunnett post hoc test.
■ Click the box next to Dunnett and then specify whether the Control
Category is the Last or First group, numerically, of your grouping
variable. In the Test area, click either “< Control” or “> Control”.
■ The one-tailed options require that you specify whether you predict that
the mean for the specified control group will be less than (> Control) or
greater than (< Control) another group.
○ (3) Equal Variances Not Assumed: Multiple comparisons options that do not
assume equal variances.
○ (4) Significance level: The desired cutoff for statistical significance. By default,
significance is set to 0.05.
● When the initial F test indicates that significant differences exist between group means,
post hoc tests are useful for determining which specific means are significantly different
when you do not have specific hypotheses that you wish to test. Post hoc tests compare
each pair of means (like t-tests), but unlike t-tests, they correct the significance estimate to
account for the multiple comparisons.
OUTPUT
● The Means plot is a visual representation of what we saw in the
Compare Means output. The points on the chart are the average of
each group. It's much easier to see from this graph that the current
smokers had the slowest mean sprint time, while the nonsmokers
had the fastest mean sprint time.
● Means plot is displayed.
○ Graph → chart builder → line → simple lines

DISCUSSION AND CONCLUSIONS


● We conclude that the mean sprint time is significantly different for at
least one of the smoking groups F (2, 350) = 9.209, p < 0.001.
● Note that the ANOVA alone does not tell us specifically which means
were different from one another. To determine that, we would need to
follow up with multiple comparisons (or post-hoc) tests.
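The reported p value can be double-checked from F and its degrees of freedom using the F distribution's survival function (a Python/SciPy sketch, independent of SPSS):

```python
from scipy import stats

# Reported result: F(2, 350) = 9.209
f_value, df_between, df_within = 9.209, 2, 350
p = stats.f.sf(f_value, df_between, df_within)
print(f"p = {p:.6f}")  # comes out below 0.001, matching "p < 0.001"
```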
Post hoc test
● Test of Homogeneity of Variances → Levene Statistic should be done to
determine which post hoc to do (from the options box)
○ Sig → .091 which is more than .05 (not significant)
■ Equal variances assumed
● Tukey test will be used
○ If levene test was significant (p < .05) → Equal variances Not
assumed
■ Games-Howell will be used
● Post hoc test
○ Tukey HSD
■ Non smoker vs Past smoker
● Not significant, p= 0.13
■ Non smoker vs current smoker
● Significant, p < .001
■ Past smoker vs current smoker
● Not significant, p = 0.52
Report
● Analysis of variance showed a main effect of Smoking on the mean sprint time, F(2, 350) = 9.2, p < .001.
● Post Hoc analyses using Tukey’s HSD indicated that mean sprint time was lower for non smokers than for
current smokers (p < .001). However, mean sprint time did not differ significantly between Past smokers
and non smokers (p= 0.13). Also, there was no statistical difference between Past smokers and current
smokers (p = 0.52)
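Outside SPSS, Tukey HSD pairwise comparisons can also be sketched with SciPy (requires SciPy ≥ 1.10; the group data below are made up, not the tutorial's dataset):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
# Made-up sprint times (seconds) for three smoking groups
nonsmokers = rng.normal(6.4, 1.0, 100)
past       = rng.normal(6.7, 1.0, 80)
current    = rng.normal(7.2, 1.0, 80)

# Tukey's HSD: all pairwise comparisons with familywise error control
result = stats.tukey_hsd(nonsmokers, past, current)
print(result)  # table of mean differences, p values and CIs for each pair
```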
Tutorial (2)
● A hospital wants to know how a homeopathic medicine (“alternative medicine”) for depression performs in comparison to alternatives. They administered 4 treatments to 100 patients for 2 weeks and then measured their depression levels.
● File → depression
One way ANOVA will be done
● Assumptions (normality and Homogeneity) will be tested
● Post hoc test will be used (tukey)
● Effect size will be determined (eta square)
Data Inspection - Split Histogram
● Before running any statistical test, always make sure
your data make sense in the first place. In this case, a
split histogram basically tells the whole story in a single
chart.
● These simple charts give a lot of insight into our data.
The important points are:
○ Normally distributed, no missing values, no
outliers → All distributions look plausible. We
don't see very low or high BDI scores that should
be set as user missing values and the BDI scores
even look reasonably normally distributed.
○ The medicine “None” results in the highest BDI
scores, indicating the worst depressive
symptoms. “Pharmaceutical” results in the lowest
levels of depressive illness and the other two
treatments are in between.
○ The four histograms are roughly equally wide,
suggesting BDI scores have roughly equal
variances over our four medicines.
○ BDI → Beck’s Depression Inventory
Means Table
● Analyze → Compare Means → Means

ANOVA Assumptions
● Independent observations often holds if each case (row of cells in SPSS) represents a unique person or other statistical unit. That is, we usually don't
want more than one row of data for one person, which holds for our data;
● Normally distributed variables in the population seems reasonable if we look at the histograms we inspected earlier. Besides, violation of the normality
assumption is no real issue for larger sample sizes due to the central limit theorem.
○ Skewness and kurtosis (the values should be between −1 and +1 for the distribution to be considered approximately normal)
○ Our data meet this criterion
● Homogeneity means that the population variances of BDI in each medicine group are all equal, reflected in roughly equal sample variances. Again, our
split histogram suggests this is the case but we'll try and confirm this by including Levene's test when running our ANOVA.
Running ANOVA
● There are many ways to run the exact same ANOVA in SPSS. Today, we'll go for General Linear Model because it creates nicely detailed output.
● The post hoc test we'll run is Tukey’s HSD (Honestly Significant
Difference), denoted as “Tukey”.
Options
● “Estimates of effect size” refers to partial eta squared. “Homogeneity tests” includes
Levene’s test for equal variances in our output.
SPSS ANOVA Output - Levene’s Test
● Levene’s Test checks if the population variances of BDI for the four medicine groups are all equal, which is
a requirement for ANOVA. As a rule of thumb,
● we reject the null hypothesis if p (or “Sig.”) < 0.05.
● In our case, p = 0.949 so we do not reject the null hypothesis of equal variances (or homogeneity). We
assume the population variances are all equal so this ANOVA assumption is met by our data.
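Levene's test is also available outside SPSS; here is a sketch with SciPy on made-up data for four groups (center='median' gives the robust Brown-Forsythe variant):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
# Made-up BDI scores for four treatment groups with similar spread
groups = [rng.normal(mu, 4.0, 25) for mu in (20, 16, 15, 12)]

# Levene's test of equal variances across groups
w, p = stats.levene(*groups, center='median')
print(f"W = {w:.3f}, p = {p:.3f}")
# p >= 0.05 → do not reject equal variances (Tukey is fine);
# p < 0.05  → variances unequal, prefer Games-Howell
```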
SPSS ANOVA Output - Between Subjects Effects
1. Our null hypothesis is that the population means are equal for all medicines administered. P (“Sig.”) < 0.001, way less than 0.05, so we reject this hypothesis: the population means are not all equal. Some medicines result in lower mean BDI scores than other medicines.
2. The different medicines administered account for some 39% of the variance in the BDI scores. This is the effect size as
indicated by partial eta squared.
3. Partial Eta Squared is the Sums of Squares for medicine divided by the corrected total sums of squares (5): 2780 / 7071 = 0.39.
4. Sums of Squares Error represents the variance in BDI scores not accounted for by medicine. Note that (3) + (4) = (5).
SPSS ANOVA Output - Multiple Comparisons
● Tukey’s HSD
● Now, comparing 4 means results in (4 − 1) × 4 × 0.5 = 6 distinct comparisons, each of which is listed twice in this table. There are three ways of telling which means are likely to be different:
● (1) Statistically significant mean differences are flagged with an asterisk (*). For instance, the very first line tells us that “None” has a mean BDI score 6.7 points higher than the placebo, which is quite a lot actually since BDI scores can range from 0 through 63.
● (2) As a rule of thumb, “Sig.” < 0.05 indicates a statistically
significant difference between two means.
● (3) A confidence interval not including zero means that a
zero difference between these means in the population is
unlikely.
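The count of distinct comparisons is just the number of unordered pairs of k means, k(k − 1)/2, which Python's math.comb confirms:

```python
from math import comb

# Comparing k = 4 means pairwise gives k * (k - 1) / 2 distinct comparisons
k = 4
print(k * (k - 1) // 2)  # 6
print(comb(k, 2))        # 6, the same count as "4 choose 2"
```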

Results
● Only one pair of means does not differ:
○ Homeopathic vs. placebo
● That means that the homeopathic treatment acts like a placebo, because both decreased the mean depression score similarly.
Tutorial (3)
● File → weights.sav
● Medical statistics book → P 113 to 1
● The spreadsheet weights.sav contains the data from a population sample of 550 term babies who had their weight recorded
at 1 month of age. The babies also had their parity recorded, that is their birth order in their family
● Q → Are the weights of babies related to their parity?
● Variables
○ Outcome (DV) variable = weight (continuous)
○ Explanatory (IV) variable = parity (categorical, four groups)
Analysis
● Descriptive statistics
● Assumptions
○ Normality
● Running the one-way ANOVA
○ homogeneity of variances
● Post-hoc tests
● Means plot
● Reporting the results
Descriptive statistics
● Analyze →Descriptive Statistics →Frequencies
● Sample size assumption
○ The Frequency table shows that the sample size of each group is large in that all cells have more than 30 participants. The cell
size ratio (cell with the minimum number / cell with the maximum number) is 62:192, or about 1:3, and does not violate the ANOVA assumptions
○ ANOVA will be robust to some degrees of non-normality, outliers and unequal variances
Normality Assumption
● Analyze→Descriptive Statistics→ Explore
● The Descriptives table shows that means and medians for weight in each group are approximately equal and the values for skewness and
kurtosis are all between +1 and −1 suggesting that the data are close to normally distributed. The variances in each group are 0.384,
0.351, 0.366 and 0.287 respectively. The variance ratio between the lowest and highest values is 0.287:0.384 which is 1:1.3.
● Shapiro–Wilk → only One sibling group does not conform to normality ( P < 0.05)
● Histograms
○ confirm the tests of normality and show that the distribution for babies with one sibling has slightly spread tails
● Normal Q–Q plots
○ have small deviations at the extremities
○ normal Q–Q plot for babies with one sibling deviates slightly from normality at both extremities. Although the histogram for babies
with three or more siblings is not classically bell shaped, the normal Q–Q plot suggests that this distribution conforms to
normality

Outliers Assumption
● there are two outlying values, one in the group of babies with one sibling and one in the group of babies with two siblings.
● It is unlikely that these outlying values, which are also univariate outliers, will have a large influence on the summary statistics and ANOVA result
because the sample size of each group is large. However, the outliers should be confirmed as correct values and not data entry or data recording
errors.
● Once they are verified as correctly recorded data points, the decision to include or omit outliers from the analyses is the same as for any other statistical
tests. In a study with a large sample size, it is expected that there will be a few outliers.
● In this data set, the outliers will be retained in the analyses and the extreme residuals will be examined to ensure that these values do not have undue
influence on the results
Running the one-way ANOVA
Test of Homogeneity of Variances table
● the P value of 0.590 in the significance column, which is larger
than the critical value of 0.05, indicates that the variance of each
group is not significantly different from one another.
○ ⇒ Assumption has been met
ANOVA table
● The degrees of freedom for the between-group sum of squares is the number of groups minus 1, that is 4 − 1 = 3, and for the
within-group sum of squares is the number of cases in the total sample minus the number of groups, that is 550 − 4 = 546.
● In this model, the F value, which is the between-group mean square divided by the within-group mean square, is large at 3.239
and is significant at P = 0.022. This indicates that there is a significant difference in the mean values of the four parity groups.
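The degrees of freedom and the p value can be reproduced from the numbers in the text (a Python/SciPy sketch, independent of SPSS):

```python
from scipy import stats

n_total, n_groups = 550, 4
df_between = n_groups - 1        # 4 - 1 = 3
df_within = n_total - n_groups   # 550 - 4 = 546

# p value for the reported F = 3.239
p = stats.f.sf(3.239, df_between, df_within)
print(df_between, df_within, round(p, 3))  # p comes out near the reported 0.022
```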

Effect size
● First method
○ The amount of variation in weight that is explained by parity can be calculated as the between-group sum of squares
divided by the total sum of squares to provide a statistic that is called eta squared as follows:
○ Eta2 = Between-group sum of squares/Total sum of squares
■ = 3.477/198.842
■ = 0.017
○ This statistic indicates that only 1.7% of the variation in weight is explained by parity.
● Second method
○ Alternatively, eta2 can be obtained using the commands Analyze→ Compare Means→Means, clicking on Options and requesting
ANOVA table and eta. This will produce the same ANOVA table as above and include eta2 but does not include a test of homogeneity
or allow for post-hoc testing.
Post-hoc tests
● Although the ANOVA statistics show that there is a significant difference in mean weights between parity groups, they do not indicate
which groups are significantly different from one another
○ To know that, post hoc tests are used
● planned and post-hoc tests should only be requested after the main ANOVA has shown that there is a statistically significant difference
between groups.
● When the F test is not significant, it is unwise to explore whether there are any between-group differences
● The choice of post-hoc test should be determined by equality of the variances, equality of group sizes and by the acceptability of the test in a particular
research discipline. For example, Scheffe is often used in psychological medicine, Bonferroni in clinical applications and Duncan in epidemiological
studies. The advantages of using a conservative post-hoc test have to be balanced against the probability of type II errors, that is missing real
differences
● The LSD test is the most liberal post-hoc test because it performs all possible tests between means. This test is not normally recommended when more than three groups are being compared or when there are unequal variances or cell sizes. With no adjustments made for multiple tests or comparisons, the results of the LSD test amount to multiple t-testing.
● The Bonferroni post-hoc comparison is a conservative test in which the critical P value of 0.05 is divided by the number of comparisons made. Thus, if
five comparisons are made, the critical value of 0.05 is divided by 5 and the adjusted new critical value is P = 0.01. In SPSS the P levels in the Multiple
Comparisons table have already been adjusted for the number of multiple comparisons. Therefore, each P level obtained from a Bonferroni test in the
Multiple Comparisons table should be evaluated at the critical level of 0.05.
● By using the Bonferroni test, which is a conservative test, the significant differences between some groups identified by the LSD test are now
nonsignificant. The mean values are identical but the confidence intervals are adjusted so that they are wider
● LSD → just one significant difference
● Bonferroni → just one significant difference
● The Duncan test shown in the Homogeneous Subsets table is one of the more liberal
post-hoc tests.
○ Under this test, there is a progressive comparison between the largest and smallest
mean values until a difference that is not significant at the P < 0.05 level is found
and the comparisons are stopped. In this way, the number of comparisons is limited.
○ The output from this test is presented as subsets of groups that are not significantly
different from one another.
○ The between-group P value (0.05) is shown in the top row of the Homogeneous Subsets table and the within-group P values at the foot of the columns.
○ Thus in the table, the mean values for groups of singletons and babies with one
sibling are not significantly different from one another with a P value of 0.104.
○ Similarly, the mean values of groups with one sibling, two siblings, or three or more
siblings are not significantly different from one another with a P value of 0.403.
○ Singletons do not appear in the same subset as babies with two siblings or with
three or more siblings which indicates that the mean weight of singletons is
significantly different from these two groups at the P < 0.05 level.
Means plot
● there is a trend for weight to increase with increasing parity and helps in the interpretation of the post-hoc tests.
● It also shows why the group with one sibling is not significantly different from singletons or babies with two siblings or with three or
more siblings, and why singletons are significantly different from the groups with two siblings or with three or more siblings
Trend test
● Polynomial option → The increase in weight with increasing parity suggests that it is appropriate to test whether there is a significant
linear trend for weight to increase across the groups within this factor. A trend test can be performed by re-running the one-way ANOVA
and ticking the Polynomial option in the Contrasts box with the Degree: Linear (default) option used.
● If each of the parity cells had the same number of cases then the unweighted linear term would be used to assess the significance of the
trend. However, the cell sizes are unequal and therefore the weighted linear term is used. The table shows that the weighted linear term
sum of squares is significant at the P = 0.006 level indicating that there is a significant trend for mean weight to increase as parity or the
number of siblings increases
Reporting the results
● Weight was approximately normally distributed in each group, and the group sizes were all large (minimum 62) with a cell size ratio of 1:3 and a variance ratio of 1:1.3. The significant difference in weight at 1 month between children with different parities can be described as F = 3.24, df = 3, 546, P = 0.022 with a significant linear trend for weight to increase with increasing parity (P = 0.006). The degrees of freedom are conventionally shown as the between-group and within-group degrees of freedom separated with a comma.
● Post hoc
○ If the Bonferroni post-hoc test had been conducted, it could be reported that the only significant difference in mean weights was between
singletons and babies with two siblings (P = 0.029) with no significant differences between any other groups.
○ If Duncan’s post-hoc test had been conducted, it could be reported that babies with two siblings and babies with three or more siblings were
significantly heavier than singletons (P < 0.05).
■ However, babies with one sibling did not have a mean weight that was significantly different from either singletons (P = 0.104) or from
babies with two siblings, or with three or more siblings (P = 0.403)
References
● https://libguides.library.kent.edu/SPSS/OneWayANOVA
● https://www.spss-tutorials.com/spss-one-way-anova-with-post-hoc-tests-example/
● Medical statistics book
