LEARNING OBJECTIVES:
• Conduct a formal hypothesis test of a claim about three or more population means (μ) using the F-test
• Conduct a Post Hoc Test using Tukey HSD Test
One-Way ANOVA is a test of the hypothesis that three or more population means are all equal, as in the
null hypothesis H0: μ1 = μ2 = μ3 = . . . = μk. The calculations can be intimidating and challenging. The
term one-way is used because the sample data are separated into groups according to one
characteristic. The term analysis of variance does not refer to the main objective of testing for equal
means; it refers to the method we use, which is based on an analysis of sample variances.
F Distribution
The analysis of variance (ANOVA) methods require the F distribution, which has the following properties:
1. The F distribution is not symmetric.
2. Values of the F distribution cannot be negative.
3. The exact shape of the F distribution depends on the two different degrees of freedom (numerator and denominator).
RATIONALE
The method of ANOVA is based on the following concept: With the assumption that the populations all have the
same variance, we estimate the common value of the variance using two different approaches. The two approaches
for estimating the common value of the variance are as follows:
(1) The variance between samples (also called variation due to treatment) is an estimate of the common
population variance that is based on the variation among sample means.
(2) The variance within samples (also called variation due to error) is an estimate of the common
population variance that is based on the sample variances.
A Self-regulated Learning Module 1
Test Statistic for One-Way ANOVA
$$F = \frac{\text{variance between samples}}{\text{variance within samples}}$$
The numerator of the test statistic F measures variation between sample means. The estimate of variance in the
denominator depends only on the sample variances and is not affected by differences among sample means.
Preliminary Computations
Overall Total: N = n1 + n2 + n3 + … + nk
SS(between), also referred to as SS(treatment) or SS(factor), is a measure of the variation between the sample
means.
SS(within), also referred to as SS(error), is a sum of squares representing the variation that is assumed to be
common to all the populations being considered.
Formula: $SS(\text{within}) = (n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + (n_3 - 1)s_3^2 + \cdots + (n_k - 1)s_k^2$
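The SS(within) formula can be checked with a few lines of Python. This is a minimal sketch, not part of the original module: the function name `ss_within` is made up for illustration, and the inputs are the sample sizes and standard deviations from the blood lead level example later in this module.

```python
def ss_within(sizes, sds):
    """SS(within) = sum over groups of (n_i - 1) * s_i^2."""
    return sum((n - 1) * s ** 2 for n, s in zip(sizes, sds))

# Blood lead level example: n = 78, 22, 21 and s = 16.8, 15.5, 11.4
lead_ssw = ss_within([78, 22, 21], [16.8, 15.5, 11.4])
print(round(lead_ssw, 2))  # 29376.93
```

The result agrees with the SSW value used in that example's MS(within) computation.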
Given the expressions for SS(total), SS(between), and SS(within), the following relationship always
holds: SS(total) = SS(between) + SS(within). SS(between) and SS(within) are both sums of squares; dividing each
by its corresponding number of degrees of freedom gives the mean squares.
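The relationship SS(total) = SS(between) + SS(within) can be verified numerically. The sketch below assumes the usual definition of SS(total) as the sum of squared deviations of every observation from the overall mean, and it uses made-up raw scores (the `groups` list is hypothetical, not data from this module).

```python
def sums_of_squares(groups):
    """Return (SS total, SS between, SS within) for a list of raw-score groups."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]

    # Deviations of every observation from the overall mean
    ss_total = sum((x - grand_mean) ** 2 for x in all_values)
    # Deviations of each group mean from the overall mean, weighted by group size
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Deviations of every observation from its own group mean
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    return ss_total, ss_between, ss_within

# Hypothetical raw scores with unequal group sizes
groups = [[8, 9, 7, 6], [3, 2, 4], [5, 4, 4, 5, 3]]
ss_total, ss_between, ss_within = sums_of_squares(groups)
print(abs(ss_total - (ss_between + ss_within)) < 1e-9)  # True
```

The identity holds for any grouping of the data, which is why only two of the three sums of squares ever need to be computed directly.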
MS(between) is the mean square for between, obtained as follows:
Formula: $MS(\text{between}) = \frac{SS(\text{between})}{k - 1}$
MS(within) is the mean square for within, obtained as follows:
Formula: $MS(\text{within}) = \frac{SS(\text{within})}{N - k}$
F-Ratio:
$$F = \frac{MS(\text{between})}{MS(\text{within})}$$
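The step from sums of squares to the F-ratio is mechanical, so it is easy to sketch in code. The helper name `f_ratio` is an assumption made for illustration; the inputs are the SS values, k = 3, and N = 18 from the voltage example in this module.

```python
def f_ratio(ss_between, ss_within, k, n_total):
    """Return (MS between, MS within, F) from the two sums of squares."""
    ms_between = ss_between / (k - 1)       # df between = k - 1
    ms_within = ss_within / (n_total - k)   # df within = N - k
    return ms_between, ms_within, ms_between / ms_within

ms_b, ms_w, f_value = f_ratio(6.9456, 1.3585, k=3, n_total=18)
print(round(ms_b, 4), round(ms_w, 4))  # 3.4728 0.0906
print(f_value)  # about 38.3
```

The module rounds MS(within) to 0.0906 before dividing, so its reported F of 38.3311 differs slightly in the second decimal place from the unrounded value computed here.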
Statisticians disagree about which procedure to use for determining the nature of the relationship after the null
hypothesis has been rejected. Among the most common multiple comparison procedures are the Scheffe
test, the Newman-Keuls test, Duncan’s multiple range test, Tukey’s honest significant difference (HSD) test,
the Bonferroni t-test, and Fisher’s least significant difference (LSD) test. The most general technique is the one
proposed by Scheffe, but it tends to produce a high incidence of Type II error.
Tukey's test is used only after a significant F ratio has been obtained. With the Tukey method, we compare the
difference between any two group means against the HSD. A mean difference is statistically significant only if it exceeds the HSD.
$$HSD = q\sqrt{\frac{MS_{WITHIN}}{N_{GROUP}}}$$
where
q = table value at a given level of significance for the total number of group means being compared,
written q(α, k, dfERROR)
When the group sizes n1, n2, n3, . . . , nk are not all equal, NGROUP is replaced by the harmonic mean of the group sizes:
$$HSD = q\sqrt{MS_{WITHIN} \cdot \frac{1}{k}\left(\frac{1}{n_1} + \frac{1}{n_2} + \frac{1}{n_3} + \cdots + \frac{1}{n_k}\right)}$$
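Both HSD formulas can be handled by one function, because the harmonic mean of equal group sizes is just the common group size. The sketch below is illustrative only: `harmonic_mean` and `tukey_hsd` are hypothetical helper names, and q must still be read from a studentized range table and passed in.

```python
import math

def harmonic_mean(sizes):
    """k divided by the sum of reciprocals of the group sizes."""
    return len(sizes) / sum(1 / n for n in sizes)

def tukey_hsd(q, ms_within, sizes):
    """HSD = q * sqrt(MS_within / harmonic mean of the group sizes)."""
    return q * math.sqrt(ms_within / harmonic_mean(sizes))

# Voltage example from this module: q = 3.67, MS_within = 0.0906, n = 6 per group
hsd_equal = tukey_hsd(3.67, 0.0906, [6, 6, 6])
print(round(hsd_equal, 3))  # 0.451

# With unequal sizes the harmonic mean is pulled toward the smaller groups
print(harmonic_mean([78, 22, 21]) < sum([78, 22, 21]) / 3)  # True
```

Dividing MS(within) by the harmonic mean is algebraically the same as multiplying it by (1/k)(1/n1 + ... + 1/nk), which is the form used in the formula above.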
Steps:
1.) Construct a table of difference between ordered means
2.) Find q (Tabulated Value)
3.) Find HSD
4.) Compare HSD against the table of difference between means
To be regarded as statistically significant, any obtained difference between means must exceed the HSD.
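The four steps above can be sketched as a short routine. This is a hedged illustration: `tukey_pairs` is a hypothetical helper, and the means and HSD values are those reported in the voltage and happiness examples of this module.

```python
from itertools import combinations

def tukey_pairs(means, hsd):
    """Compare every pair of ordered group means against the HSD.

    Returns tuples of (higher group, lower group, difference, is_significant).
    """
    # Step 1: order the means from highest to lowest
    ordered = sorted(means.items(), key=lambda item: item[1], reverse=True)
    results = []
    # Step 4: a difference is significant only if it exceeds the HSD
    for (name_a, mean_a), (name_b, mean_b) in combinations(ordered, 2):
        diff = mean_a - mean_b
        results.append((name_a, name_b, round(diff, 4), diff > hsd))
    return results

voltage = tukey_pairs({"Sunny": 13.57, "Cloudy": 12.75, "Rainy": 12.05}, hsd=0.451043)
happiness = tukey_pairs({"Dorm": 7.6, "Apartment": 2.8, "Home": 4.2}, hsd=1.9938)

for row in voltage + happiness:
    print(row)
```

For the voltage data every pairwise difference exceeds the HSD, while for the happiness data the Home versus Apartment difference of 1.4 does not.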
k (Number of Groups)
α = 5% = 0.05
α = 1% = 0.01
H0: μ1 = μ2 = μ3 (mean voltage reading is the same under the three different types of day)
H1: At least one of μ1, μ2, μ3 differs from the others (mean voltage reading is not the same under the three different types of day)
Overall Total: N = n1 + n2 + n3 = 6 + 6 + 6 = 18
$SS_{BETWEEN} = \sum n_i(\bar{x}_i - \bar{\bar{x}})^2 = n_1(\bar{x}_1 - \bar{\bar{x}})^2 + n_2(\bar{x}_2 - \bar{\bar{x}})^2 + n_3(\bar{x}_3 - \bar{\bar{x}})^2$
$SS_{BETWEEN} = 6(13.57 - 12.79)^2 + 6(12.75 - 12.79)^2 + 6(12.05 - 12.79)^2 = 6.9456$
F-Ratio
$$F = \frac{MS_{BETWEEN}}{MS_{WITHIN}} = \frac{3.4728}{0.0906} = 38.33112583 \approx 38.3311$$
Step 5: If the value of the test statistic falls in the rejection region, reject H0; otherwise, do not reject H0.
Decision:
Reject H0, since F = 38.3311 is greater than the critical value FCRITICAL = 3.68 and therefore falls within the critical region.
Conclusion: At the 5% level, the mean voltage reading is not the same under the three different types of day.
Summary Table
| Source of Variation | Sum of Squares (SS) | Degrees of Freedom (df) | Mean Square (MS) | F Test Statistic |
| Treatments (between) | 6.9456 | 2 | 3.4728 | 38.3311 |
| Error (within) | 1.3585 | 15 | 0.0906 | |
CV(0.05, 2/15) = 3.68    Decision: Reject H0
Since H0 was rejected, proceed with the Post Hoc Test using Tukey’s HSD for an equal number of entries per
group.
$$HSD = q\sqrt{\frac{MS_{WITHIN}}{N_{GROUP}}}$$
Using Table I: Studentized Range (q) for the 0.05 and 0.01 levels
MSWITHIN = 0.0906
NGROUP = n1 = n2 = n3 = 6 (since each group has the same number of cases/observations)
$$HSD = q\sqrt{\frac{MS_{WITHIN}}{N_{GROUP}}} = 3.67\sqrt{\frac{0.0906}{6}} = (3.67)(0.1229) = 0.451043$$
Compare HSD (0.451043) against the table of difference between means. To be regarded as statistically
significant, any obtained difference between means must exceed the HSD.
Interpretation:
A one-way analysis of variance compared the mean voltage reading under the three different types of day. The
alpha level was 0.05 and the test was found to be statistically significant, F(df = 2/15, Fcrit = 3.68) = 38.3311. A
Tukey HSD test indicated that the mean voltage reading for Sunny Days (M = 13.57, SD = 0.40) is significantly
greater than the mean voltage readings for Cloudy Days (M = 12.75, SD = 0.21) and Rainy Days (M = 12.05, SD
= 0.26). Likewise, the mean voltage reading for Cloudy Days is significantly greater than the mean voltage reading
for Rainy Days.
Example: A researcher is interested in the effect that type of residence has on the personal happiness of college
students. She selected samples of students who live in campus dorms, in off-campus apartments, and at home, and
asked the 15 respondents to rate their happiness on a scale of 1 (not happy) to 10 (happy). Test the null hypothesis
that happiness does not differ by type of residence.
Overall Total: N = n1 + n2 + n3 = 5 + 5 + 5 = 15
Overall Mean:
$SS_{BETWEEN} = \sum n_i(\bar{x}_i - \bar{\bar{x}})^2 = n_1(\bar{x}_1 - \bar{\bar{x}})^2 + n_2(\bar{x}_2 - \bar{\bar{x}})^2 + n_3(\bar{x}_3 - \bar{\bar{x}})^2$
dfBETWEEN = K – 1 = (3 – 1) = 2
dfWITHIN = N – k = 15 – 3 = 12
F-Ratio
$$F = \frac{MS_{BETWEEN}}{MS_{WITHIN}} = \frac{30.46675}{1.39853} = 21.7848$$
Summary Table
| Source of Variation | Sum of Squares (SS) | Degrees of Freedom (df) | Mean Square (MS) | F Test Statistic |
| Treatments (between) | 60.9335 | 2 | 30.46675 | 21.7848 |
| Error (within) | 16.7824 | 12 | 1.39853 | |
CV(0.05, 2/12) = 3.89    Decision: Reject H0
Since H0 was rejected, proceed with the Post Hoc Test using Tukey’s HSD for an equal number of entries
per group (n1 = n2 = n3 = 5).
$$HSD = q\sqrt{\frac{MS_{WITHIN}}{N_{GROUP}}} = 3.77\sqrt{\frac{1.39853}{5}} = (3.77)(0.5289) = 1.9938$$
Compare HSD (1.994) against the table of difference between means. To be regarded as statistically significant,
any obtained difference between means must exceed the HSD.
Interpretation:
A one-way analysis of variance compared the mean happiness ratings of college students who live in campus
dorms, in off-campus apartments, and at home. The alpha level was 0.05 and the test was found to be statistically
significant, F(df = 2/12, Fcrit = 3.89) = 21.7848. A Tukey HSD test indicated that the mean happiness rating of students
living in dorms (M = 7.6, SD = 1.14) is significantly greater than that of students living in
apartments (M = 2.8, SD = 1.48) and at home (M = 4.2, SD = 0.84). However, the mean happiness rating of students
living in apartments does not differ significantly from that of students living at home.
Does exposure to lead affect IQ scores of children? The table below summarizes the mean IQ scores of children
with Low, Medium, and High Blood Lead Levels. At the 5% level, test the hypothesis that exposure to lead does not
affect IQ scores.
| | Low Blood Lead Level | Medium Blood Lead Level | High Blood Lead Level |
| Sample size | 78 | 22 | 21 |
| Sample Mean IQ Score | 102.7 | 94.1 | 94.2 |
| Standard Deviation | 16.8 | 15.5 | 11.4 |
Solution
H0: Exposure to lead does not affect IQ scores
H1: Exposure to lead does affect IQ scores
Overall total: N = n1 + n2 + n3
N = 78 + 22 + 21 = 121
Overall Mean:
$$\bar{\bar{x}} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2 + n_3\bar{x}_3}{N} = \frac{(78)(102.7) + (22)(94.1) + (21)(94.2)}{121} = \frac{12059}{121} = 99.66$$
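When only summary statistics are available, the overall mean is a size-weighted average of the group means, and SS(between) follows from it. The sketch below uses the blood lead level figures from this example; the function names `grand_mean` and `ss_between` are illustrative assumptions.

```python
def grand_mean(sizes, means):
    """Overall mean as the size-weighted average of the group means."""
    return sum(n * m for n, m in zip(sizes, means)) / sum(sizes)

def ss_between(sizes, means):
    """SS(between) = sum of n_i * (group mean - overall mean)^2."""
    gm = grand_mean(sizes, means)
    return sum(n * (m - gm) ** 2 for n, m in zip(sizes, means))

sizes = [78, 22, 21]
means = [102.7, 94.1, 94.2]

overall = grand_mean(sizes, means)
ms_between = ss_between(sizes, means) / (len(sizes) - 1)  # df between = k - 1

print(round(overall, 2))  # 99.66
print(ms_between)         # close to the MSB of 1013.4938 used in this example
```

Carrying the unrounded overall mean through the calculation, rather than 99.66, is what reproduces the MSB value that appears in the F-ratio for this example.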
Degrees of Freedom
dfbetween = dfB = (number of groups) – 1
dfB = 3 – 1 = 2
$$MS_{within} = MS_W = \frac{SS_W}{df_W} = \frac{29376.93}{118} = 248.957$$
F-ratio:
$$F = \frac{MS_B}{MS_W} = \frac{1013.4938}{248.957} = 4.071$$
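[Figure: F distribution. The non-critical region (do not reject H0) lies to the left of CV = 3.07 and the critical region (reject H0) lies to the right; the test statistic F = 4.071 falls in the critical region.]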
ANOVA TABLE
Source of Variation Sum of Squares df Mean Squares F-ratio
Interpretation:
The mean IQ score of children with low blood lead levels (M = 102.7, SD = 16.8) is significantly higher than the mean IQ
score of children with medium blood lead levels (M = 94.1, SD = 15.5). Because the mean IQ score under the
medium level is practically the same as the mean IQ score under the high level, we can also say that the mean IQ score of
children with low blood lead levels (M = 102.7, SD = 16.8) is significantly higher than the mean IQ score of children with
high blood lead levels (M = 94.2, SD = 11.4).
| | Low Blood Lead Level | Medium Blood Lead Level | High Blood Lead Level |
| Sample size | 78 | 22 | 21 |
| Sample Mean IQ Score | 102.7 | 94.1* | 94.2* |
| Standard Deviation | 16.8 | 15.5 | 11.4 |
*practically the same
Example 2
Arsenic in Rice. Listed below are the mean amounts of arsenic in samples of brown rice from three different
farm islands. The amounts are in micrograms of arsenic and all samples have the same serving size. Use a 0.05
significance level to test the claim that the three samples are from populations with the same mean. Do the
amounts of arsenic appear to be different in the different farms? Given that the amounts of arsenic in the samples
from Mindanao have the highest mean, can we conclude that brown rice from Mindanao poses the greatest health
problem?
| | Luzon | Visayas | Mindanao |
| Sample size | 12 | 12 | 12 |
| Sample mean | 5.48 | 4.71 | 6.97 |
| Sample standard deviation | 0.42 | 1.19 | 0.69 |
Solution:
H0: Amounts of arsenic are the same in the different farm islands
H1: Amounts of arsenic are different in the different farm islands
Overall total: N = n1 + n2 + n3
N = 12 + 12 + 12 = 36
Overall Mean:
$$\bar{\bar{x}} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2 + n_3\bar{x}_3}{N} = \frac{(12)(5.48) + (12)(4.71) + (12)(6.97)}{36} = \frac{205.92}{36} = 5.72$$
$$MS_{within} = MS_W = \frac{SS_W}{df_W} = \frac{22.7546}{33} = 0.6895$$
F-ratio:
$$F = \frac{MS_B}{MS_W} = \frac{15.8402}{0.6895} = 22.973$$
Critical Value at 5%
dfB = v1 = 2
dfW = v2 = 33
Because v2 = 33 is between 30 and 40 and closer to 30, we choose 30, thus the critical value CV = 3.32
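As a cross-check on the table lookup, the critical value can be computed directly in the special case where the numerator degrees of freedom equal 2, which is the case for every three-group example in this module. The sketch relies on the known closed form P(F > f) = (1 + 2f/v2)^(-v2/2) for an F distribution with v1 = 2; it does not apply to other values of v1, and the function names are illustrative assumptions.

```python
def f_critical_df1_2(alpha, df_within):
    """Upper-tail critical value of F(2, df_within); valid only for v1 = 2."""
    return (df_within / 2) * (alpha ** (-2 / df_within) - 1)

def f_p_value_df1_2(f_value, df_within):
    """Upper-tail probability P(F > f_value) for F(2, df_within)."""
    return (1 + 2 * f_value / df_within) ** (-df_within / 2)

cv_30 = f_critical_df1_2(0.05, 30)   # tabulated row used in the text
cv_33 = f_critical_df1_2(0.05, 33)   # exact denominator df for this example

print(round(cv_30, 2))  # 3.32
print(cv_33 < cv_30)    # True
```

Using the v2 = 30 row gives a slightly larger critical value than the exact v2 = 33, so the table-based choice is the more conservative of the two and does not change the decision here.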
[Figure: F distribution with critical value CV = 3.32; the test statistic F = 22.973 falls in the critical region.]
ANOVA TABLE
| Source of Variation | Sum of Squares | df | Mean Squares | F-ratio |
| Between Groups | 31.6804 | 2 | 15.8402 | 22.973 |
| Within Groups | 22.7546 | 33 | 0.6895 | |
CV = 3.32    Decision: Reject H0
Since the sample sizes of the three groups are the same (n = 12), we use the formula
$$HSD = q\sqrt{\frac{MS_{WITHIN}}{N_{GROUP}}}$$
where MSwithin = 0.6895, NGROUP = 12, and q (k = 3, dfw = 33, α = 5%) = 3.49.
$$HSD = (3.49)\sqrt{\frac{0.6895}{12}} = 0.837$$
Compare HSD (0.837) against the table of difference between means. To be regarded as statistically significant,
any obtained difference between means must exceed the HSD.
Conclusion:
At 5% level, amounts of arsenic are different in the different farm islands.
Interpretation:
The arsenic amount of brown rice in the Mindanao farm (M = 6.97, SD = 0.69) is significantly greater than the arsenic
amount of brown rice in the Visayas farm (M = 4.71, SD = 1.19). Likewise, the arsenic amount of brown rice in
Mindanao is also significantly greater than in Luzon (M = 5.48, SD = 0.42). However, no significant difference
in the arsenic amount of brown rice is found between the Luzon and Visayas islands. Further, we conclude that
brown rice from Mindanao poses the greatest health problem.
Problem 1. Starting Salaries. The National Association of Colleges and Employers (NACE) conducts surveys
on salary offers to college graduates by field and degree. The following table provides summary statistics for
starting annual salaries, in thousands of dollars, to samples of bachelor’s-degree graduates in four fields.
At the 1% significance level, do the data provide sufficient evidence to conclude that a difference exists in mean
starting salaries among bachelor’s-degree candidates in the four fields? If there are significant differences,
conduct a multiple comparison of means by Tukey’s method to determine where the significant differences occur.
Problem 2. A medical researcher wants to determine whether there is a difference in the mean lengths of time it
takes three types of pain relievers to provide relief from headache pain. Several headache sufferers are randomly
selected and given one of the three medications. Each headache sufferer records the time (in minutes) it takes the
medication to begin working. The results are shown in the table. At 1% level, can you conclude that at least one
mean time is different from the others? If there are significant differences, conduct a multiple comparison of
means by Tukey’s method to determine where the significant differences occur. Assume that each population of
relief times is normally distributed and that the population variances are equal.
Problem 3. A sales analyst wants to determine whether there is a difference in the mean monthly sales of a
company’s four sales regions (cities). Several salespersons from each region are randomly selected and they
provide their sales amounts (in thousands of pesos) for the previous month. The results are shown in the table. At
5% level, can the analyst conclude that there is a difference in the mean monthly sales among the sales regions?
If there are significant differences, conduct a multiple comparison of means by Tukey’s method to determine
where the significant differences occur. Assume that each population of sales is normally distributed and that the
population variances are equal.