
One-Way Analysis of Variance

As discussed in the previous module, we need to use Analysis of Variance when
comparing three or more groups. But what is ANOVA?
Analysis of Variance, or ANOVA, is a group of techniques for the statistical analysis of data to
determine whether there is a significant difference between two or more levels of a treatment variable.
Recall that with two samples, we are interested in knowing whether the two groups are equal or not. If we have
three groups – A, B, and C – what happens? Well, we can check if A and B are equal or not, then
B and C, and finally A and C. Analysis of Variance will compare
the three simultaneously and will ONLY tell us whether there is at least one pair that is significantly
different. It will not tell us which group is statistically different.
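This simultaneous comparison is exactly what a one-way ANOVA routine performs. A minimal sketch in Python using `scipy.stats.f_oneway` (the three groups and their values here are made up purely for illustration):

```python
# One-way ANOVA tests all groups in a single simultaneous comparison,
# instead of running separate pairwise tests for A-B, B-C, and A-C.
from scipy.stats import f_oneway

# Hypothetical measurements for three groups A, B, and C
group_a = [1, 2, 3]
group_b = [2, 3, 4]
group_c = [4, 5, 6]

result = f_oneway(group_a, group_b, group_c)
print(result.statistic)  # the F statistic
print(result.pvalue)     # small p-value: at least one pair differs
```

Note that a significant result here only says that at least one group differs; it does not identify which one.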
So, how does ANOVA do this? It asks the question: why does the variation exist? If we look back
at the example in the previous module, we have the following.

If we observe the 15% cotton weight group, we have 5 replicates. As I asked previously, why is
it that each result is different despite all of them being 15%? These are errors that we are
unable to explain and that just occur as part of experimentation. Hence, within the 15% cotton
weight group, we can say that there is “internal variation” – factors that cause differences
within the group.
Now, observe the red rectangle around the averages. This shows how each of the five groups differs.
For 15%, the average is 9.8 psi. For 20%, it is 15.4 psi. And so on. The question is, why is the tensile
strength different between the groups? This can be thought of as “external variation” – factors
that cause differences between the groups. This kind of variation is explained by our
independent variables. Since we are changing our cotton weight percentage, the tensile
strength also changes.
Now, we are only going to look at one factor, or one independent variable. However, this one
independent variable may have three or more groups, levels, or classifications. Applying ANOVA
to one-factor experiments leads to One-Way ANOVA.
A generalization of the previous example gives us the following data table.

If this table intimidates you, just recall that it is exactly the same as the other table above. The first
column, under Treatment, shows the different groups, levels, or classifications. Under
Observations, we can see the results for each replicate per treatment level.
Averages indicates the mean for each treatment level.
The total variation in the results is explained by the Sum of Squares Total, or SST. This quantifies the
errors and variations across every result in the experiment. To explain what causes this
variation, we have the internal and external variations described above. Since we are interested
in seeing whether our treatments affect our results, the external variation is good. This external variation
between groups is explained by the Sum of Squares Treatments. This kind of variation can be
explained clearly by our independent variable. The other kind of variation is the kind that we
cannot explain. This is the internal variation within groups, explained by the Sum of Squares Error.
Mathematically, we can relate the three in the relationship below.
$$SS_{Total} = SS_{Treatments} + SS_{Error}$$
Individually, each sum of squares can be calculated using the equations below.
$$SS_{Treatments} = n \sum_{i=1}^{a} (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2$$

For the Sum of Squares Treatments, we take the difference between each treatment average
in the red rectangle and the grand average in the blue rectangle.
$$SS_{Error} = \sum_{i=1}^{a} \sum_{j=1}^{n} (y_{ij} - \bar{y}_{i\cdot})^2$$

For the Sum of Squares Error, we compare each individual result per group, in orange, with the
average of that group, in green. This has to be done per treatment level.

$$SS_{Total} = \sum_{i=1}^{a} \sum_{j=1}^{n} (y_{ij} - \bar{y}_{\cdot\cdot})^2$$

For the Sum of Squares Total, we quantify all of the variation in our results. This
compares every individual result per treatment and replicate, in violet, with the grand average
in black.
Practically speaking, when calculating manually, we determine the Sum of
Squares Treatments and the Sum of Squares Total first since they are relatively easier to compute. The
Sum of Squares Error is then determined by difference.
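The three formulas above can be computed directly. A short sketch in Python with made-up data (a = 3 treatment levels, n = 3 replicates each), which also confirms that the two component sums add up to the total:

```python
# Manual computation of the three sums of squares for one-way ANOVA.
# The data below are hypothetical, chosen only to keep the arithmetic small.
data = {
    "A": [1, 2, 3],
    "B": [2, 3, 4],
    "C": [4, 5, 6],
}

n = 3  # replicates per treatment level
all_values = [y for ys in data.values() for y in ys]
grand_mean = sum(all_values) / len(all_values)
group_means = {g: sum(ys) / n for g, ys in data.items()}

# SS_Treatments: each group average vs the grand average
ss_treatments = n * sum((m - grand_mean) ** 2 for m in group_means.values())

# SS_Error: each observation vs its own group average
ss_error = sum((y - group_means[g]) ** 2 for g, ys in data.items() for y in ys)

# SS_Total: each observation vs the grand average
ss_total = sum((y - grand_mean) ** 2 for y in all_values)

print(ss_treatments, ss_error, ss_total)
```

For this data the values come out to 14, 6, and 20 respectively, and indeed 14 + 6 = 20, matching the relationship above.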
If we closely observe the equations for the three Sums of Squares, you should notice that they
are very similar to the equation for variance. As a result, we can think of the Sums of Squares as
variances. However, recall that for variance, the denominator is usually n − 1 for sample
variances. For ANOVA, this denominator is defined by the degrees of freedom. For the Sum of
Squares Treatments, the degrees of freedom is a − 1. This is because we have a total of a
groups. For the Sum of Squares Total, it is easy: since we have a total of N measurements,
the degrees of freedom is just N − 1. For the Sum of Squares Error, the degrees of freedom can be
determined by difference.
$$df_{Total} = df_{Treatments} + df_{Error}$$
In essence, ANOVA compares the variation between groups with the variation within groups.
Dividing each Sum of Squares by its degrees of freedom gives a Mean Square, and the ratio of
the Mean Square Treatments to the Mean Square Error is the test statistic. Hence, we are
comparing “two variances”. Recall that this requires the F-test and the F distribution.
If the variation between groups is larger, it implies that the application of the treatments
caused a more significant amount of the variation in the results. This tells us that our treatments
are working.
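Continuing the earlier sketch with the same hypothetical sums of squares, the F statistic and its p-value can be obtained like this (the p-value uses the right tail of the F distribution via `scipy.stats.f`):

```python
from scipy.stats import f  # F distribution, used for the p-value

# Hypothetical setup: a = 3 groups, n = 3 replicates, with the sums of
# squares computed earlier for the made-up data
a, n = 3, 3
N = a * n
ss_treatments, ss_error = 14.0, 6.0

df_treatments = a - 1          # between-groups degrees of freedom
df_error = N - a               # within-groups degrees of freedom

ms_treatments = ss_treatments / df_treatments  # "variance" between groups
ms_error = ss_error / df_error                 # "variance" within groups

F = ms_treatments / ms_error   # large F: treatments explain more variation
p_value = f.sf(F, df_treatments, df_error)     # right-tail probability
print(F, p_value)
```

A large F (equivalently, a small p-value) means the between-groups variation dominates the within-groups variation.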
Okay, hopefully we now understand the mechanism behind ANOVA. We will not be
calculating this manually. Instead, we will use MS Excel to calculate the ANOVA easily.
For me, understanding the analysis is more important than simply calculating the results.

Example

This is the same experiment as in the previous module. Take note of the hypotheses. The null
hypothesis assumes that all groups are equal. If all groups are equal, it implies that our
treatments do not affect our dependent variable. If at least one is different, then we can infer that
our treatments do affect our dependent variable.
To use Excel, we simply have to input our data similar to how it is presented in the given
table.
Afterwards, we go to Data, then Data Analysis.

Data Analysis opens a new window. Select ANOVA: Single Factor.


Selecting ANOVA: Single Factor will open another window.

For Input Range, select the cells which include all of the data points. Note that the heading of
each row is included since we are checking “Labels in first column”, shown in the blue rectangle. We can
also select the level of significance, shown as Alpha. Normally, we can just leave it at 0.05. Also,
note that the data is grouped by rows in this example. In other cases, it can be grouped by columns.
After all of this, select OK.

A new worksheet will open showing two tables. The first table contains the descriptive statistics of the
data provided. As you can see, each group is labeled according to the percent cotton weight.
Also, the sum, average, and variance per group are indicated. Comparing this to the given data,
we can see that we obtained the correct results.
The important part here is the second table labeled ANOVA.

As you can see, we have the sources of variation split into two – Between Groups and Within Groups.
Recall that the variation Between Groups refers to the Sum of Squares Treatments; the variation Within
Groups refers to the Sum of Squares Error; and Total refers to the Sum of Squares Total. You
can also see that the degrees of freedom are indicated correctly. There are 5 groups – 15%, 20%,
25%, 30%, and 35%. As a result, there are 4 degrees of freedom between groups. Since there
are 5 groups with 5 replicates each, there are a total of 25 data points. Hence, the total degrees
of freedom is 24. By difference, we can see that the degrees of freedom within groups is 24 −
4 = 20. These can all be determined automatically using Excel. Hence, we will not be stressing
ourselves with manual calculations.
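The degrees-of-freedom bookkeeping above is easy to check by hand; a quick sketch for this example:

```python
# Degrees of freedom for the cotton-weight example:
# 5 treatment levels (15%, 20%, 25%, 30%, 35%), 5 replicates each
a = 5                    # number of groups
n = 5                    # replicates per group
N = a * n                # 25 data points in total

df_between = a - 1                  # degrees of freedom for treatments
df_total = N - 1                    # degrees of freedom for the total
df_within = df_total - df_between   # degrees of freedom for error, by difference

print(df_between, df_within, df_total)  # 4 20 24
```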
Finally, focus on the P-value. This will be our decision rule for the ANOVA.
Reject the null hypothesis if P-value < α
Fail to reject the null hypothesis if P-value ≥ α
In this case, our level of significance, α, is 0.05. Meanwhile, our P-value from the table is
approximately 0. Since
0 < 0.05
Reject the null hypothesis
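The decision rule can be expressed as a small helper function (the function name here is ours, not part of the module):

```python
def anova_decision(p_value: float, alpha: float = 0.05) -> str:
    """Apply the ANOVA decision rule: reject the null if P-value < alpha."""
    if p_value < alpha:
        return "Reject null hypothesis"
    return "Fail to reject null hypothesis"

# For this example, the P-value is approximately 0 and alpha is 0.05
print(anova_decision(0.0, 0.05))  # Reject null hypothesis
```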
Whenever we have ANOVA and we reject the null hypothesis, our conclusion is that the
independent variable significantly affects our dependent variable. For this example, cotton
weight percent significantly affects tensile strength of the synthetic fiber.
However, take note that ANOVA does not tell us which percentage is the best to use. It does
not tell us which cotton weight percent significantly differs from the rest. ANOVA only tells us
that at least one group significantly differs. The easiest way to reach this conclusion is to look
back at the first table shown and compare the averages.
