Specifically, when you obtain a significant F-ratio (reject H0), it simply indicates that somewhere among
the entire set of mean differences there is at least one that is statistically significant. In other words, the
overall F-ratio only tells you that a significant difference exists; it does not tell you exactly which means are
significantly different and which are not.
Consider, for example, a research study that uses three samples to compare three treatment conditions.
Suppose that the three sample means are M1 = 3, M2 = 5, and M3 = 10. In this hypothetical study there are
three mean differences: M2 − M1 = 2 points, M3 − M2 = 5 points, and M3 − M1 = 7 points.
If an ANOVA were used to evaluate these data, a significant F-ratio would indicate that at least one of the
sample mean differences is large enough to satisfy the criterion of statistical significance. In this example,
the 7-point difference is the biggest of the three and, therefore, it must indicate a significant difference
between the first treatment and the third treatment (𝜇1 ≠ 𝜇3). But what about the 5-point difference? Is
it also large enough to be significant? And what about the 2-point difference between M1 and M2? Is it
also significant? The purpose of post hoc tests is to answer these questions.
Post hoc tests (or posttests) are additional hypothesis tests that are done after an ANOVA to
determine exactly which mean differences are significant and which are not.
As the name implies, post hoc tests are done after an ANOVA. More specifically, these tests are
done after ANOVA when
1. You reject H0 and
2. There are three or more treatments (k ≥ 3)
Answer the following questions:
1. Imagine you are trapped in a dark room with no light for about
one week. What would your reaction be?
2. What is the greatest lesson you learned from your parents?
3. If you were blinded by the challenges in life, how would you surpass
this?
Rejecting H0 indicates that at least one difference exists among the treatments. If there are only two
treatments, then there is no question about which means are different and, therefore, no need for
posttests. However, with three or more treatments (k≥ 3), the problem is to determine exactly which
means are significantly different.
In general, a post hoc test enables you to go back through the data and compare the individual
treatments two at a time. In statistical terms, this is called making pairwise comparisons. For
example, with k = 3, we would compare 𝜇1 versus 𝜇2, then 𝜇2 versus 𝜇3, and then 𝜇1 versus
𝜇3. In each case, we are looking for a significant mean difference. The process of conducting
pairwise comparisons involves performing a series of separate hypothesis tests, and each of these
tests includes the risk of a Type I error. As you do more and more separate tests, the risk of a
Type I error accumulates and is called the experimentwise alpha level.
We have seen, for example, that a research study with three treatment conditions produces three
separate mean differences, each of which could be evaluated using a post hoc test. If each test
uses 𝑎 = .05, then there is a 5% risk of a Type I error for the first posttest, another 5% risk for
the second test, and one more 5% risk for the third test. Although the probability of error is not
simply the sum across the three tests, it should be clear that increasing the number of separate
tests definitely increases the total, experimentwise probability of a Type I error. Whenever you
are conducting posttests, you must be concerned about the experimentwise alpha level.
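As a rough sketch, the way this risk accumulates can be computed directly. The formula below assumes the separate tests are independent (posttests on the same data are not exactly independent, so this is an approximation):

```python
# Experimentwise alpha: the accumulated risk of at least one Type I
# error across several hypothesis tests, each run at the same alpha.
def experimentwise_alpha(alpha, num_tests):
    # If each test independently avoids a Type I error with probability
    # (1 - alpha), the chance of at least one false positive across all
    # tests is 1 - (1 - alpha)^num_tests.
    return 1 - (1 - alpha) ** num_tests

# Three pairwise posttests at alpha = .05, as in the k = 3 example:
print(round(experimentwise_alpha(0.05, 3), 4))  # → 0.1426
```

With three tests at 𝑎 = .05, the experimentwise risk is already about 14%, nearly three times the nominal level.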
Statisticians have worked with this problem and have developed several methods for trying to
control Type I errors in the context of post hoc tests. We consider several alternatives below.
DIFFERENT KINDS OF POST HOC ANALYSIS
- The first post hoc test we consider is Tukey’s HSD test. We selected Tukey’s HSD test because
it is a commonly used test in psychological research.
- Tukey’s test allows you to compute a single value that determines the minimum difference
between treatment means that is necessary for significance.
- This value, called the honestly significant difference, or HSD, is then used to compare any two
treatment conditions.
- If the mean difference exceeds Tukey’s HSD, then you conclude that there is a significant
difference between the treatments. Otherwise, you cannot conclude that the treatments are
significantly different.
The formula for Tukey's HSD is

HSD = q √(MSwithin / n)

where the value of q is found in Table B.5 (Appendix B, p. 708), MSwithin is the within-
treatments variance from the ANOVA, and n is the number of scores in each treatment. Tukey's
test requires that the sample size, n, be the same for all treatments. To locate the appropriate
value of q, you must know the number of treatments in the overall experiment (k), the degrees
of freedom for MSwithin (the error term in the F-ratio), and you must select an alpha level
(generally the same 𝑎 used for the ANOVA).
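As a minimal sketch, the HSD computation can be written out directly; the values of q, MSwithin, and n below are hypothetical stand-ins (in practice, q comes from the Studentized range table, Table B.5):

```python
import math

def tukey_hsd(q, ms_within, n):
    # Tukey's honestly significant difference:
    # HSD = q * sqrt(MS_within / n), where n is the (equal) number of
    # scores in each treatment and q is the Studentized range statistic
    # for the chosen alpha, number of treatments k, and df for MS_within.
    return q * math.sqrt(ms_within / n)

# Hypothetical values for illustration: q = 3.67, MS_within = 4.0, n = 10.
hsd = tukey_hsd(3.67, 4.0, 10)
```

Any pair of treatment means that differs by more than this HSD value would be declared significantly different.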
Example:
Thus, the mean difference between any two samples must be at least 2.36 to be significant. Using
this value, we can make the following conclusions:
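Assuming this example uses the three sample means from earlier (M1 = 3, M2 = 5, M3 = 10), the pairwise decisions against HSD = 2.36 can be sketched as:

```python
from itertools import combinations

# Sample means from the earlier three-treatment example.
means = {"M1": 3, "M2": 5, "M3": 10}
HSD = 2.36  # Tukey's HSD computed in the example

# A pairwise mean difference is significant only if it exceeds the HSD.
for a, b in combinations(means, 2):
    diff = abs(means[a] - means[b])
    verdict = "significant" if diff > HSD else "not significant"
    print(f"{a} vs {b}: difference = {diff}, {verdict}")
```

Only the 2-point difference (M1 vs M2) fails to exceed 2.36; the 5-point and 7-point differences are both significant.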
Scheffé Test
1. Although you are comparing only two treatments, the Scheffé test uses
the value of k from the original experiment to compute df between
treatments. Thus, df for the numerator of the F-ratio is k - 1.
2. The critical value for the Scheffé F-ratio is the same as was used to
evaluate the F-ratio from the overall ANOVA. Thus, Scheffé requires
that every posttest satisfy the same criterion that was used for the
complete ANOVA.
Bonferroni Procedure (Bonferroni Correction)
- This multiple-comparison post hoc correction is used when you are performing many
independent or dependent statistical tests at the same time.
- The problem with running many simultaneous tests is that the probability of
a significant result increases with each test run.
- This post hoc test sets the significance cutoff at α/n.
For example, if you are running 20 simultaneous tests at α = 0.05, the corrected cutoff would be
0.05/20 = 0.0025. The Bonferroni correction does suffer from a loss of power, in part because
Type II error rates become high for each individual test. In other words, it overcorrects for
Type I errors.
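The correction itself is a one-line computation; the p-values below are hypothetical, for illustration only:

```python
# Bonferroni correction: divide the overall alpha level by the number
# of simultaneous tests to get the per-test significance cutoff.
def bonferroni_cutoff(alpha, num_tests):
    return alpha / num_tests

# 20 simultaneous tests at alpha = 0.05, as in the example:
cutoff = bonferroni_cutoff(0.05, 20)   # 0.05 / 20 = 0.0025

# A p-value counts as significant only if it falls below the corrected
# cutoff (hypothetical p-values for illustration):
p_values = [0.001, 0.004, 0.03]
significant = [p for p in p_values if p < cutoff]
```

Note how 0.004, which would pass an uncorrected α = .05 test, fails the corrected cutoff; this is the power cost mentioned above.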
Duncan's Multiple Range Test
- When you run an analysis of variance (ANOVA), the results will tell you if there is a difference in
means. However, it won't pinpoint the pairs of means that are different. Duncan's Multiple
Range Test (MRT) will identify the pairs of means (from at least three) that differ. The MRT is similar to
the LSD test, but instead of a t-value, a Q value is used.
Newman-Keuls
- Like Tukey’s, this post hoc test identifies sample means that are different from each other.
Newman-Keuls uses different critical values for comparing pairs of means. Therefore, it is
more likely to find significant differences.
Rodger's Method
- Considered by some to be the most powerful post hoc test for detecting differences among
groups. This test protects against loss of statistical power as the degrees of freedom increase.
Dunnett’s correction
- Like Tukey's, this post hoc test is used to compare means. Unlike Tukey's, it compares every
mean to a control mean.
1. With k=2 treatments, are post hoc tests necessary when the null hypothesis
is rejected? Explain why or why not.
Listen to the song of James Ingram, “There’s No Easy Way,” and answer
the following questions:
1. Which among the following lines hits you the most?
2. What does the song convey?
3. TRUE or FALSE: There’s no easy way to break somebody’s heart
Example:
You are researching which type of fertilizer and planting density produces the greatest
crop yield in a field experiment. You assign different plots in a field to a combination of fertilizer
type (1, 2, or 3) and planting density (1=low density, 2=high density), and measure the final
crop yield in bushels per acre at harvest time.
You can use a two-way ANOVA to find out if fertilizer type and planting density have an
effect on average crop yield.
- If the variance within groups is smaller than the variance between groups, the
F-test will find a higher F-value, and therefore a higher likelihood that the
difference observed is real and not due to chance.
A two-way ANOVA with interaction tests three null hypotheses at the same time:
1. There is no difference in group means at any level of the first independent variable.
2. There is no difference in group means at any level of the second independent variable.
3. The effect of one independent variable does not depend on the effect of the other
independent variable (a.k.a. no interaction effect).
A two-way ANOVA without interaction (a.k.a. an additive two-way ANOVA) only tests the first
two of these hypotheses.
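The three hypotheses above can be illustrated with a minimal sketch of the sums-of-squares computations for a balanced design. The yield values below are hypothetical, not from an actual field experiment, and only two fertilizer levels are shown for brevity:

```python
# Balanced two-way ANOVA with interaction: partition the total sum of
# squares into factor A, factor B, interaction, and within-cell parts.
data = {  # (fertilizer, density) -> crop yields per plot (hypothetical)
    (1, 1): [88.0, 90.0], (1, 2): [92.0, 95.0],
    (2, 1): [85.0, 84.0], (2, 2): [89.0, 91.0],
}
a_levels = sorted({f for f, d in data})
b_levels = sorted({d for f, d in data})
n = len(next(iter(data.values())))                 # replicates per cell
all_vals = [y for ys in data.values() for y in ys]

def mean(xs):
    return sum(xs) / len(xs)

grand = mean(all_vals)
a_mean = {f: mean([y for (ff, d), ys in data.items() if ff == f for y in ys])
          for f in a_levels}
b_mean = {d: mean([y for (f, dd), ys in data.items() if dd == d for y in ys])
          for d in b_levels}
cell_mean = {k: mean(v) for k, v in data.items()}

# One sum of squares per null hypothesis, plus the error term:
ss_a = len(b_levels) * n * sum((a_mean[f] - grand) ** 2 for f in a_levels)
ss_b = len(a_levels) * n * sum((b_mean[d] - grand) ** 2 for d in b_levels)
ss_cells = n * sum((cell_mean[k] - grand) ** 2 for k in data)
ss_ab = ss_cells - ss_a - ss_b                     # interaction
ss_within = sum((y - cell_mean[k]) ** 2 for k, ys in data.items() for y in ys)
ss_total = sum((y - grand) ** 2 for y in all_vals)
```

Each of SSa, SSb, and SSab feeds one F-ratio (divided by its df, then by MSwithin), matching the three hypotheses listed above.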
To use a two-way ANOVA your data should meet certain assumptions. Two-way ANOVA makes
all of the normal assumptions of a parametric test of difference:
1. Homogeneity of variance (a.k.a. homoscedasticity)
The variation around the mean for each group being compared should be similar among
all groups. If your data don’t meet this assumption, you may be able to use a non-
parametric alternative, like the Kruskal-Wallis test.
2. Independence of observations
Your independent variables should not be dependent on one another (i.e. one should
not cause the other). This is impossible to test with categorical variables – it can only
be ensured by good experimental design.
In addition, your dependent variable should represent unique observations – that is,
your observations should not be grouped within locations or individuals.
If your data don’t meet this assumption (i.e. if you set up experimental treatments
within blocks), you can include a blocking variable and/or use a repeated-measures
ANOVA.
3. Normally-distributed dependent variable
The values of the dependent variable should follow a bell curve. If your data don’t meet
this assumption, you can try a data transformation.
What are the differences between the data for a one-way ANOVA and a two-way ANOVA?
How to perform the two-way ANOVA?
Statisticians use a measure called the correlation coefficient to determine the strength of
the linear relationship between two variables. There are several types of correlation coefficients.
The population correlation coefficient, denoted by the Greek letter 𝜌 (rho), is the correlation
computed by using all possible pairs of data values (x, y) taken from a population.
The linear correlation coefficient, computed from the sample data, measures the strength and
direction of a linear relationship between two quantitative variables. The symbol for the
sample correlation coefficient is r.
The linear correlation coefficient explained in this section is called the Pearson product moment
correlation coefficient (PPMC), named after statistician Karl Pearson, who pioneered the research in this
area.
The range of the linear correlation coefficient is from −1 to +1. If there is a strong positive linear
relationship between the variables, the value of r will be close to +1. If there is a strong negative linear
relationship between the variables, the value of r will be close to −1.
When there is no linear relationship between the variables or only a weak relationship, the value
of r will be close to 0.
When the value of r is 0 or close to zero, it implies only that there is no linear relationship between
the variables. The data may be related in some other nonlinear way.
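The sample correlation coefficient r can be computed directly from its definition; this is a minimal sketch with made-up data illustrating the −1 to +1 range:

```python
import math

def pearson_r(x, y):
    # Pearson product moment correlation coefficient:
    # r = sum((xi - x̄)(yi - ȳ)) / sqrt(sum((xi - x̄)²) * sum((yi - ȳ)²))
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    num = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    den = math.sqrt(sum((xi - mx) ** 2 for xi in x) *
                    sum((yi - my) ** 2 for yi in y))
    return num / den

# Hypothetical data: a perfect positive linear relationship gives r = +1,
# and a perfect negative linear relationship gives r = -1.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # → 1.0
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # → -1.0
```

Real data falls between these extremes, and, as noted above, r near 0 rules out only a linear relationship, not a nonlinear one.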
Levene’s Test for Equality of Variances
- It is used to check that variances are equal for all samples when your data come
from a non-normal distribution. You can use Levene’s test to check the assumption of
equal variances before running a test like a one-way ANOVA.
The null hypothesis for Levene’s is that the variances are equal across all samples. In more formal terms,
that’s written as:
H0: σ1² = σ2² = … = σk².
The alternative hypothesis (the one you’re testing) is that the variances are not equal for at least one pair:
H1: σi² ≠ σj² for at least one pair (i, j).
The test statistic is a little ugly and involves a few summations:

W = [(N − k) / (k − 1)] × [Σ ni(Z̄i − Z̄)²] / [Σi Σj (Zij − Z̄i)²]

where N is the total number of observations, k is the number of groups, ni is the number of
observations in group i, Z̄i is the mean of the Zij in group i, and Z̄ is the overall mean of the Zij.
Zij can take on three meanings, depending on whether you use the mean, median, or trimmed mean of each
subgroup: Zij = |Yij − Ȳi|, where Ȳi is the mean, median, or trimmed mean of subgroup i. The three
choices actually determine the robustness and power of the test.
Robustness is a measure of how well the test does not falsely report unequal variances (when the
variances are actually equal).
Power is a measure of how well the test correctly reports unequal variances.
Trimmed means work best with heavy-tailed distributions like the Cauchy distribution.
For skewed distributions, or if you aren’t sure about the underlying shape of the distribution, the median may
be your best choice.
For symmetric and moderately tailed distributions, use the mean.
Levene’s test is built into most statistical software. For example, the Independent Samples T Test in
SPSS generates a “Levene’s Test for Equality of Variances” column as part of the output. The result from
the test is reported as a p-value, which you can compare to your alpha level for the test. If the p-value is
larger than the alpha level, then you can say that the null hypothesis stands — that the variances are
equal; if the p-value is smaller than the alpha level, then the implication is that the variances are unequal.
Example:
Boys Girls
26 36
15 32
21 42
20 33
38 29
19 29
28 46
49 33
18 25
16 35
First step: Get the mean of each group
Mean: 25 Mean: 34
Second step: Get the absolute differences |x − 𝜇| for each group
Boys Girls |xb − 25| |xg − 34|
26 36 1 2
15 32 10 2
21 42 4 8
20 33 5 1
38 29 13 5
19 29 6 5
28 46 3 12
49 33 24 1
18 25 7 9
16 35 9 1
Mean: 25 Mean: 34
Third step: Get the mean of the absolute differences for each group
Boys Girls |xb − 25| |xg − 34|
26 36 1 2
15 32 10 2
21 42 4 8
20 33 5 1
38 29 13 5
19 29 6 5
28 46 3 12
49 33 24 1
18 25 7 9
16 35 9 1
Mean: 25 Mean: 34 8.2 4.6
Fourth step: Get the overall mean of the absolute differences of boys and girls combined
Boys Girls |xb − 25| |xg − 34|
26 36 1 2
15 32 10 2
21 42 4 8
20 33 5 1
38 29 13 5
19 29 6 5
28 46 3 12
49 33 24 1
18 25 7 9
16 35 9 1
Mean: 25 Mean: 34 8.2 4.6
Mean: 6.4
Fifth step: Get the deviation of each absolute difference from the overall mean of 6.4
Boys Girls |xb − 25| |xg − 34| Zb − 6.4 Zg − 6.4
26 36 1 2 -5.4 -4.4
15 32 10 2 3.6 -4.4
21 42 4 8 -2.4 1.6
20 33 5 1 -1.4 -5.4
38 29 13 5 6.6 -1.4
19 29 6 5 -0.4 -1.4
28 46 3 12 -3.4 5.6
49 33 24 1 17.6 -5.4
18 25 7 9 0.6 2.6
16 35 9 1 2.6 -5.4
Mean: 25 Mean: 34 8.2 4.6
Mean: 6.4
Sixth step: Square each deviation, (Zb − 6.4)² and (Zg − 6.4)²
(Zb − 6.4)² (Zg − 6.4)²
29.16 19.36
12.96 19.36
5.76 2.56
1.96 29.16
43.56 1.96
0.16 1.96
11.56 31.36
309.76 29.16
0.36 6.76
6.76 29.16
Seventh step: Get the sum of the squared deviations for boys (422.0), for girls (170.8), and the
overall sum (422.0 + 170.8 = 592.8)
Eighth step: Set up the ANOVA table and enter the overall sum as the total sum of squares
SS Df MS F
Between
Within
Total 592.8
Ninth step: Get the degrees of freedom between, within and total
SS Df MS F
Between 1
Within 18
Total 592.8 19
Tenth step: Get the sum of squares between groups: SSb = 10 × (8.2 − 6.4)² + 10 × (4.6 − 6.4)² = 64.8
SS Df MS F
Between 64.8 1
Within 18
Total 592.8 19
Eleventh step: Get the sum of squares within by subtracting the sum of squares between from the
sum of squares total
SSw= SSt – SSb
SS Df MS F
Between 64.8 1
Within 528 18
Total 592.8 19
Twelfth step: Get the mean sum of squares by dividing each sum of squares by its degrees of
freedom
SS Df MS F
Between 64.8 1 64.8
Within 528 18 29.33
Total 592.8 19
Thirteenth step: Get the F-value by dividing the mean sum of squares between by the mean sum
of squares within: F = 64.8 / 29.33 ≈ 2.2091
SS Df MS F
Between 64.8 1 64.8 2.2091
Within 528 18 29.33
Total 592.8 19
Because F = 2.2091 is smaller than the critical value F(1, 18) = 4.41 at 𝑎 = .05, we fail to reject
H0 and conclude that the two groups have equal variances.
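The steps above can be sketched in Python. This mean-centered version of Levene's test reproduces the numbers worked out above (group means 25 and 34, SS between = 64.8, SS total = 592.8, F ≈ 2.21):

```python
# Levene's test (mean-centered version) on the boys/girls data,
# following the steps worked out above.
boys  = [26, 15, 21, 20, 38, 19, 28, 49, 18, 16]
girls = [36, 32, 42, 33, 29, 29, 46, 33, 25, 35]
groups = [boys, girls]

def mean(xs):
    return sum(xs) / len(xs)

# Steps 1-2: group means, then absolute deviations Z = |x - group mean|.
z_groups = [[abs(x - mean(g)) for x in g] for g in groups]

# Steps 3-4: the mean of each Z group and the overall Z mean.
z_all = [z for zg in z_groups for z in zg]
z_grand = mean(z_all)

# Steps 5-13: a one-way ANOVA carried out on the Z values.
ss_total = sum((z - z_grand) ** 2 for z in z_all)
ss_between = sum(len(zg) * (mean(zg) - z_grand) ** 2 for zg in z_groups)
ss_within = ss_total - ss_between
df_between = len(groups) - 1             # 1
df_within = len(z_all) - len(groups)     # 18
f_value = (ss_between / df_between) / (ss_within / df_within)
```

Since this F-value falls below the critical value for (1, 18) degrees of freedom at 𝑎 = .05, the null hypothesis of equal variances stands.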