
Test Statistic for ANOVA

The test statistic for testing H0: μ1 = μ2 = ... = μk is:

F = [∑nj(x̄j − x̄)² / (k−1)] / [∑∑(x − x̄j)² / (N−k)]

and the critical value is found in a table of probability values for the F distribution with (degrees of
freedom) df1 = k−1, df2 = N−k.

In the test statistic, nj = the sample size in the jth group (e.g., j = 1, 2, 3, and 4 when there are 4
comparison groups), x̄j is the sample mean in the jth group, and x̄ is the overall mean. k represents
the number of independent groups (in this example, k=4), and N
represents the total number of observations in the analysis. Note that N does not refer to a
population size, but instead to the total sample size in the analysis (the sum of the sample sizes in
the comparison groups, e.g., N=n1+n2+n3+n4). The test statistic is complicated because it
incorporates all of the sample data. While it is not easy to see the extension, the F statistic shown
above is a generalization of the test statistic used for testing the equality of exactly two means.  
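As a concrete sketch, statistical software computes this F statistic directly. Here is a minimal example using scipy's `f_oneway`; the group values below are made up purely for illustration.

```python
from scipy import stats

# Three hypothetical comparison groups (k = 3, n_j = 3 each) -- any data
# of this shape works; these numbers are invented for illustration only.
group1 = [1, 2, 3]
group2 = [2, 3, 4]
group3 = [3, 4, 5]

# f_oneway computes the one-way ANOVA F statistic and its p-value.
F, p = stats.f_oneway(group1, group2, group3)
print(F, p)   # F = 3.0 for these made-up data
```

With more groups you simply pass more samples to `f_oneway`; it handles any k ≥ 2.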

NOTE: The test statistic F assumes equal variability in the k populations (i.e., the population
variances are equal, or σ1² = σ2² = ... = σk²). This means that the outcome is equally variable in each of
the comparison populations. This assumption is the same as that assumed for appropriate use of
the test statistic to test equality of two independent means. It is possible to assess the likelihood
that the assumption of equal variances is true and the test can be conducted in most statistical
computing packages. If the variability in the k comparison groups is not similar, then alternative
techniques must be used.

The F statistic is computed by taking the ratio of what is called the "between treatment" variability to
the "residual or error" variability. This is where the name of the procedure originates. In analysis of
variance we are testing for a difference in means (H0: means are all equal versus H1: means are not
all equal) by evaluating variability in the data. The numerator captures between treatment variability
(i.e., differences among the sample means) and the denominator contains an estimate of the
variability in the outcome. The test statistic is a measure that allows us to assess whether the
differences among the sample means (numerator) are more than would be expected by chance if
the null hypothesis is true. Recall in the two independent sample test, the test statistic was
computed by taking the ratio of the difference in sample means (numerator) to the variability in the
outcome (estimated by Sp).  

The decision rule for the F test in ANOVA is set up in a similar way to decision rules we established
for t tests. The decision rule again depends on the level of significance and the degrees of freedom.
The F statistic has two degrees of freedom. These are denoted df1 and df2, and called the numerator
and denominator degrees of freedom, respectively. The degrees of freedom are defined as follows:

df1 = k-1 and df2=N-k,

where k is the number of comparison groups and N is the total number of observations in the
analysis.   If the null hypothesis is true, the between treatment variation (numerator) will not exceed
the residual or error variation (denominator) and the F statistic will be small. If the null hypothesis is
false, then the F statistic will be large. The rejection region for the F test is always in the upper (right-
hand) tail of the distribution as shown below.

Rejection Region for F Test with α = 0.05, df1=3 and df2=36 (k=4, N=40)

For the scenario depicted here, the decision rule is: Reject H0 if F > 2.87.
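Instead of a printed table, the critical value can be obtained from software. A quick sketch using scipy reproduces the 2.87 in the decision rule above:

```python
from scipy import stats

# Critical value for the scenario in the text: alpha = 0.05,
# df1 = 3 and df2 = 36 (k = 4 groups, N = 40 observations).
alpha = 0.05
crit = stats.f.ppf(1 - alpha, 3, 36)   # upper-tail critical value
print(round(crit, 2))
```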

The ANOVA Procedure


We will next illustrate the ANOVA procedure using the five step approach. Because the computation
of the test statistic is involved, the computations are often organized in an ANOVA table. The
ANOVA table breaks down the components of variation in the data into variation between treatments
and error or residual variation. Statistical computing packages also produce ANOVA tables as part
of their standard output for ANOVA, and the ANOVA table is set up as follows:

Source of Variation    Sums of Squares (SS)   Degrees of Freedom (df)   Mean Squares (MS)   F
Between Treatments     SSB                    k−1                       MSB = SSB/(k−1)     F = MSB/MSE
Error (Residual)       SSE                    N−k                       MSE = SSE/(N−k)
Total                  SST                    N−1
How ANOVA Works

How does the ANOVA procedure compute a p-value? This section shows you the formulas
and carries through the computations for the example with fat for frying donuts.

Remember, long ago in a galaxy called Descriptive Statistics, how the variance was defined:
find the mean, then for each data point take the square of its difference from the mean. Add
up all those squares, and you have SS(x), the sum of squared deviations in x. The variance
was SS(x) divided by the degrees of freedom n−1, so it was a kind of average or mean
squared deviation. You probably learned the shortcut computational formulas:
SS(x) = ∑x² − (∑x)²/n or SS(x) = ∑x² − nx̄²
and then
s² = MS(x) = SS(x)/df where df = n−1
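A quick numerical check of these formulas, with a small made-up sample:

```python
# Check the two shortcut forms of SS(x) against the definitional sum of
# squared deviations, using a small invented sample.
x = [3, 7, 8, 12, 15]
n = len(x)
mean = sum(x) / n

ss_def = sum((xi - mean) ** 2 for xi in x)            # definition
ss_short = sum(xi**2 for xi in x) - sum(x)**2 / n     # shortcut 1
ss_short2 = sum(xi**2 for xi in x) - n * mean**2      # shortcut 2
s2 = ss_def / (n - 1)                                 # MS(x), the sample variance
print(ss_def, ss_short, ss_short2, s2)
```

All three SS expressions agree, as the algebra promises.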

In 1-way ANOVA, we extend those concepts a bit. First you partition SS(x) into between-
treatments and within-treatments parts, SSB and SSW. Then you compute the mean square
deviations:
 MSB is called the between-treatments mean square, between-groups variance, or factor MS. It
measures the variability associated with the different treatment levels or different values of the factor.

 MSW is called the within-treatments mean square, within-group variance, pooled variance, or error
MS. It measures the variability that is not associated with the different treatments.

Finally you divide the two to obtain your test statistic, F = MSB/MSW, and you look up the p-
value in a table of the F distribution.
(The F distribution is named after “the celebrated R.A. Fisher” (Kuzma & Bohnenblust
2005, 176). You may have already seen the F distribution in computing a different ratio of
variances, as part of testing the variances of two populations for equality.)

There are several ways to compute the variability, but they all come up with the same
answers and this method in Spiegel and Stephens 1999 pages 367–368 is as easy as any:

                 SS                     df            MS              F

Between groups   SSB = ∑njx̄j² − Nx̄²    dfB = r−1     MSB = SSB/dfB   F = MSB/MSW
(or "Factor")

Within groups    SSW = SStot − SSB      dfW = N−r     MSW = SSW/dfW
(or "Error")*

Total*           SStot = ∑x² − Nx̄²     dftot = N−1

* or, if you know the standard deviations of the samples,

SSW = ∑(nj−1)sj²
SStot = SSB + SSW

where
 r is the number of treatments.
 nj, x̄j, sj for each treatment are the sample size, sample mean, and sample standard
deviation.
 N is the total sample size and x̄ = ∑x/N is the overall sample mean or "grand mean". x̄ can
also be computed from the sample means by
x̄ = ∑njx̄j/N

You begin with the treatment means x̄j = {72, 85, 76, 62} and the overall mean x̄ = 73.75, then
compute
SSB = (6×72²+6×85²+6×76²+6×62²) − 24×73.75² = 1636.5
MSB = 1636.5 / 3 = 545.5

The next step depends on whether you know the standard deviations sj of the samples. If
you don’t, then you jump to the third row of the table to compute the overall sum of
squares:
∑x² = 64² + 72² + 68² + ... + 70² + 68² = 134192
SStot = ∑x² − Nx̅ ² = 134192 − 24×73.75² = 3654.5

Then you find SSW by subtracting the “between” sum of squares SSB from the overall sum of
squares SStot:
SSW = SStot−SSB = 3654.5−1636.5 = 2018.0
MSW = 2018.0 / 20 = 100.9

Now you’re almost there. You want to know whether the variability between treatments,
MSB, is greater than the variability within treatments, MSW. If it’s enough greater, then you
conclude that there is a real difference between at least some of the treatment means and
therefore that the factor has a real effect. To determine this, divide
F = MSB/MSW = 5.41
This is the F statistic. The F test is one-tailed, and the F distribution depends on both
degrees of freedom, dfB and dfW.
At long last, you look up F=5.41 with 3 and 20 degrees of freedom, and you find a p-
value of 0.0069. The interpretation is the usual one: there’s only a 0.0069 chance of getting
an F statistic greater than 5.41 (or higher variability between treatments relative to the
variability within treatments) if there is actually no difference between treatments. Since
the p-value is less than α, you conclude that there is a difference.
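The whole computation can be reproduced from the summary statistics given above (a sketch using scipy; the raw batch data are not needed):

```python
from scipy import stats

# Summary statistics from the text: four fats, n_j = 6 batches each.
n = [6, 6, 6, 6]
means = [72, 85, 76, 62]
sum_x2 = 134192          # overall sum of squared observations, from the text
N = sum(n)               # 24
r = len(means)           # 4 treatments
grand_mean = sum(nj * m for nj, m in zip(n, means)) / N   # 73.75

SSB = sum(nj * m**2 for nj, m in zip(n, means)) - N * grand_mean**2
SStot = sum_x2 - N * grand_mean**2
SSW = SStot - SSB
MSB = SSB / (r - 1)
MSW = SSW / (N - r)
F = MSB / MSW
p = stats.f.sf(F, r - 1, N - r)   # upper-tail area of the F distribution
print(SSB, SSW, round(F, 2), round(p, 4))
```

This reproduces SSB = 1636.5, SSW = 2018.0, F ≈ 5.41, and a p-value below 0.01.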
Estimating Individual Treatment Means

Usually you’re interested in the contrast between two treatments, but you can also
estimate the population mean for an individual treatment. You use a t interval, as you
would when you have only one sample, but the standard error and degrees of freedom are
different (NIST 2012 section 7.4.3.6).

To compute a confidence interval on an individual mean for the jth treatment, use


df = dfW
standard error = √(MSW/nj)
Therefore the margin of error, which is the half-width of the confidence interval, is
E = t(α/2,dfW) · √(MSW/nj)

Example: Refer back to the fats for frying donuts. Estimate the population mean for Fat 2
with 95% confidence. In other words, if you fried a great many batches of donuts in Fat 2,
how much fat per batch would be absorbed, on average?

Solution: First, marshal your data:


sample mean for Fat 2: x̅ 2 = 85
sample size: n2 = 6
degrees of freedom: dfW = 20 (from the ANOVA table)
MSW = 100.9 (also from the table)
1−α = 0.95

TI-83 or TI-84 users, please see an easy procedure below.

Computation by Hand

Begin by finding the critical t. Since 1−α = 0.95, α/2 = 0.025. You therefore need
t(0.025,20). You can find this from a table:
t(0.025,20) = 2.0860

Next, find the standard error. This is


standard error = √(MSW/nj) = √(100.9/6) = 4.1008

Now you’re ready to finish the confidence interval. The margin of error is
E = t(α/2,df) · √(MSW/nj) = 2.0860×4.1008 = 8.5541
Therefore the confidence interval is
μ2 = 85 ± 8.6 g (95% confidence)
or
76.4 g ≤ μ2 ≤ 93.6 g (95% confidence)

Conclusion: You’re 95% confident that the true mean amount of fat absorbed by a batch of
donuts fried in Fat 2 is between 76.4 g and 93.6 g.
TI-83/84 Procedure

Your TI calculator is set up to do the necessary calculations, but there’s one glitch because
the degrees of freedom is not based on the size of the individual sample, as it is in a regular
t interval. So you have to “spoof” the calculator as follows.
Press [STAT] [◄] [8] to bring up the TInterval screen. First I’ll tell you what to enter;
then I’ll explain why.

 x̅ : mean of the treatment sample, here 85


 Sx: √(MSW*(dfW+1)/nj), here √(100.9*21/6)
 n: dfW+1, here 21
 C-Level: as specified in the problem, here .95
Now, what’s up with n and Sx? Well, the calculator uses n to compute degrees of freedom
for critical t as n−1. You want degrees of freedom to be dfW, so you lie to the calculator and
enter the value of n as dfW+1 (20+1 = 21).
But that creates a new problem. The calculator also divides s by √n to come up with
the standard error. But you want it to use nj (6) and not your fake n (21). So you have to
multiply MSW by dfW+1 and divide by nj to trick the calculator into using the value you
actually want.
By the way, why is MSW inside the square root sign? Because the calculator wants a
standard deviation, but MSW is a variance. As you know, standard deviation is the square
root of variance.

All this fakery achieves the desired result: the confidence interval matches the one that you
would have if you computed it by hand.
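The cancellation behind the spoof can be verified numerically (a quick sketch): the calculator divides Sx by √n, so with Sx = √(MSW·(dfW+1)/nj) and n = dfW+1, the (dfW+1) factors cancel and the standard error you actually want comes out.

```python
import math

# Verify that the "spoofed" TInterval inputs reproduce the hand-computed
# standard error.
MSW, dfW, nj = 100.9, 20, 6
Sx = math.sqrt(MSW * (dfW + 1) / nj)   # what you type into the calculator
n_fake = dfW + 1                       # fake n so that df = n - 1 = dfW
se_calc = Sx / math.sqrt(n_fake)       # what the calculator computes
se_hand = math.sqrt(MSW / nj)          # the standard error you actually want
print(round(se_calc, 4), round(se_hand, 4))
```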

Conclusion: You’re 95% confident that the true mean amount of fat absorbed by a batch of
donuts fried in Fat 2 is between 76.4 g and 93.6 g.

η²: Strength of Association

Lowry 1988 chapter 14 part 2 mentions a measure that is usually neglected in ANOVA: η².
(η is the Greek letter eta, which rhymes with beta.)
η² = SSB/SStot, the ratio of sum of squares between groups to total sum of squares. For
the donut-frying example,
η² = SSB/SStot = 1636.5 / 3654.5 = 0.45

What does this tell you? η² measures how much of the total variability in the dependent
variable is associated with the variation in treatments. For the donut example, η² = 0.45
tells you that 45% of the variability in fat absorption among the batches is associated with
the choice of fat.
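The ratio is a quick check of the arithmetic above:

```python
# Eta-squared for the donut example, from the sums of squares in the text.
SSB, SStot = 1636.5, 3654.5
eta_sq = SSB / SStot
print(round(eta_sq, 2))
```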

References

Abdi, Hervé, and Lynne J. Williams. 2010. "Newman-Keuls Test and Tukey Test". In
Encyclopedia of Research Design. Sage. Retrieved 15 May 2016 from
https://www.utdallas.edu/~herve/abdi-NewmanKeuls2010-pretty.pdf

Lowry, Richard. 1988. Concepts & Applications of Inferential Statistics. Retrieved
15 May 2016 from http://vassarstats.net/textbook/

Lowry, Richard. 2001a. "Critical Values of Q", part of Calculators for Statistical Table
Entries. Retrieved 15 May 2016 from http://www.vassarstats.net/tabs.html#q

Lowry, Richard. 2001b. One-Way Analysis of Variance for Independent or Correlated
Samples (online calculator). Retrieved 15 May 2016 from http://vassarstats.net/anova1u.html

Miller, Rupert G., Jr. 1986. Beyond ANOVA: Basics of Applied Statistics. Wiley.

NIST, National Institute of Standards and Technology. 2012. NIST/SEMATECH e-Handbook
of Statistical Methods. Retrieved 15 May 2016 from http://www.itl.nist.gov/div898/handbook/

Snedecor, George W., and William G. Cochran. 1989. Statistical Methods. 8th ed. Iowa State.

Spiegel, Murray R., and Larry J. Stephens. 1999. Theory and Problems of Statistics. 3d ed.
McGraw-Hill.

F-tests for Different Purposes


There are different types of t-tests for different purposes. Some of the more common types are
outlined below.

1. F-test for testing equality of variance is used to test the hypothesis of the equality of
two population variances. The height example above requires the use of this test.
2. F-test for testing equality of several means. The test for equality of several means is carried
out by the technique called ANOVA.
For example, suppose that an experimenter wishes to test the efficacy of a drug at three levels:
100 mg, 250 mg and 500 mg. A test is conducted among fifteen human subjects taken at random,
with five subjects being administered each level of the drug.

To test if there are significant differences among the three levels of the drug in terms of efficacy,
the ANOVA technique has to be applied. The test used for this purpose is the F-test.
3. F-test for testing significance of regression is used to test the significance of the regression
model. The appropriateness of the multiple regression model as a whole can be tested by this
test. A significant F value indicates a linear relationship between Y and at least one of the Xs.
Assumptions
Irrespective of the type of F-test used, one assumption has to be met: the populations from which
the samples are drawn have to be normal. In the case of the F-test for equality of variance, a second
requirement has to be satisfied in that the larger of the sample variances has to be placed in the
numerator of the test statistic.
Like the t-test, the F-test is a small-sample test and may be considered for use if the sample size is < 30.

Deciding
In attempting to reach decisions, we always begin by specifying the null hypothesis against a
complementary hypothesis called the alternative hypothesis. The calculated value of the F statistic with
its associated p-value is used to infer whether one should reject or fail to reject the null hypothesis.
All statistics software packages provide these p-values. If the associated p-value is small (i.e., < 0.05),
we say that the test is significant at 5% and we may reject the null hypothesis in favor of the
alternative.

On the other hand, if the associated p-value of the test is > 0.05, we fail to reject the null
hypothesis. Evidence against the null hypothesis is considered very strong if the p-value is less
than 0.01. In that case, we say that the test is significant at 1%.


F Distribution, F Statistic, F Test


Posted by Ramana PV
What is the F Distribution?

The F-distribution, also known as the Fisher-Snedecor distribution, is extensively used to
test for equality of variances from two normal populations. The F-distribution got its name
from R.A. Fisher, who initially developed the concept in the 1920s. It is the probability
distribution of an F-statistic.

The F-distribution is generally a skewed distribution and is related to the chi-squared
distribution. An F random variable is the ratio of two chi-square random variables, X1 with
degrees of freedom ϑ1 and X2 with degrees of freedom ϑ2, where each chi-square variable
has been divided by its degrees of freedom.

The shape of the distribution depends on the degrees of freedom of the numerator ϑ1 and
the denominator ϑ2.

What are the properties of an F Distribution?

 The F distribution curve is positively skewed (skewed to the right), with range 0 to ∞.
 The value of F is always positive or zero; there are no negative values.
 The shape of the distribution depends on the degrees of freedom of the numerator ϑ1
and the denominator ϑ2.
 The degree of skewness decreases as the degrees of freedom of the numerator and
denominator increase.
 The F distribution curve is never symmetrical; as the degrees of freedom increase, it
becomes closer to symmetrical.
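The claim that skewness shrinks as the degrees of freedom grow can be checked numerically (a sketch using scipy's F distribution; the two df pairs chosen here are arbitrary):

```python
from scipy import stats

# Skewness of the F distribution for small vs. large degrees of freedom.
# (The skewness is defined when the denominator df exceeds 6.)
skew_small = float(stats.f.stats(5, 10, moments='s'))
skew_large = float(stats.f.stats(50, 100, moments='s'))
print(skew_small, skew_large)   # skewness shrinks as the df grow
```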
When would you use the F Distribution?
The F-test, which uses the F distribution, compares the levels of an independent variable
across multiple groups. It is generally used in ANOVA calculations. Always use the
F distribution for an F-test comparing more than two groups.
Example: In a manufacturing unit, torque values are key parameters in terminal squeeze
welding. To check for a significant effect of various torque (Nm) values on the squeeze
welding, an operator sets up trials at 5 Nm, 8 Nm, 10 Nm, and 12 Nm on terminals in four
randomly selected batches of 30. ANOVA can determine whether the means of these 4 trials
are different. ANOVA uses F-tests to statistically test the equality of means.
Assumptions of the F distribution
 Both populations are assumed to be normally distributed.
 The two populations are independent of each other.
 The larger sample variance always goes in the numerator, which makes the test
right-tailed; right-tailed tests are easier to calculate.
What is an F Test?

An F test finds out whether two independent estimates of population variance differ
significantly, or whether two samples drawn from a normal population have the same
variance. In both cases the F ratio is

F = S1² / S2²

where σ1² > σ2² and S1² > S2²; in other words, the larger estimate of variance always goes
in the numerator and the smaller estimate of variance in the denominator.

Degrees of freedom (ϑ)

 df of the larger variance (i.e., numerator) = n1 − 1
 df of the smaller variance (i.e., denominator) = n2 − 1
What is an F Statistic?
The F statistic, also known as the F value, is used in ANOVA and regression analysis to
identify whether the means of two or more populations are significantly different. In other
words, an F statistic is a ratio of two variances (variance is a measure of dispersion: it tells
how far the data are dispersed from the mean). The F statistic accounts for the
corresponding degrees of freedom when estimating the population variance.

The F statistic is similar to the t statistic. A t-test tells whether a single variable is
statistically significant, whereas an F-test tells whether a group of variables is jointly
significant.

F statistics are based on the ratio of mean squares: the F statistic is the ratio of the mean
square for treatment (between groups) to the mean square for error (within groups).

F = MS Between / MS Within

If the calculated F value is greater than the appropriate F critical value (found in a table or
provided in software), then the null hypothesis can be rejected. (This is how the statistic is
used in ANOVA.)

The calculated F statistic for a known source of variation is found by dividing the mean
square of the known source of variation by the mean square of the unknown source of
variation.

When would you use an F Test?

Different types of F-tests exist for different purposes.

 In statistics, an F-test of equality of variances is a test for the null hypothesis that two
normal populations have the same variance.
 An F-test can also test the equality of several means; this is the test that ANOVA uses.
 An F-test for a linear regression model tests whether any of the independent variables in
a multiple linear regression are significant. A significant F also indicates a linear
relationship between the dependent variable and at least one of the independent
variables.
Steps to conduct an F test
 Choose the test: note the independent variables and the dependent variable, and check
that the samples are normally distributed.
 Set up the F statistic: put the larger variance in the numerator and the smaller variance
in the denominator, each with degrees of freedom (n−1).
 Determine the statistical hypotheses.
 State the level of significance.
 Look up the critical F value in an F table (use α/2 for a two-tailed test).
 Calculate the test statistic.
 Finally, draw the statistical conclusion: if Fcalc > Fcritical, reject the null hypothesis; if
Fcalc < Fcritical, fail to reject the null hypothesis.
What is an Example of an F Test in DMAIC?

In the Measure and Analyze phases of DMAIC, an F test is used to find out whether two
independent estimates of population variance differ significantly, or whether two samples
drawn from a normal population have the same variance.

Example: A botanical research team wants to study the growth of plants with the use of
urea. The team conducted 8 tests with a variance of 600 during the initial state, and after
6 months 6 tests were conducted with a variance of 400. The purpose of the experiment is
to know whether there is any improvement in plant growth after 6 months, at the 95%
confidence level.

 Degrees of freedom ϑ1 = 8 − 1 = 7 (highest variance in the numerator)
 ϑ2 = 6 − 1 = 5
 Statistical hypotheses:
o Null hypothesis H0: σ1² ≤ σ2²
o Alternative hypothesis H1: σ1² > σ2²
 Since the team wants to see an improvement, it is a one-tailed (right) test
 Level of significance α = 0.05

 Critical F from the table = 4.88
 Reject the null hypothesis if the calculated F value is greater than or equal to 4.88
 Calculate the F value: F = S1²/S2² = 600/400 = 1.5
 Fcalc < Fcritical, hence fail to reject the null hypothesis
p-value

From the F table we can find critical values that give a certain area to the right. The area to
the right of 4.88 is 0.05 and the area to the right of 3.37 is 0.100, so the area to the right
of 1.5 must be more than 0.100. We can find the exact p-value easily using any statistical
tool or Excel.

 Statistical conclusion: the calculated value does not lie in the critical region; hence we
fail to reject the null hypothesis at the 95% confidence level.
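The same example can be checked with scipy (a sketch; `f.ppf` gives the critical value from the table and `f.sf` the exact upper-tail p-value):

```python
from scipy import stats

# F-test for equality of variances, following the botanical example:
# variances 600 (8 tests) and 400 (6 tests), larger variance on top.
s1_sq, n1 = 600, 8
s2_sq, n2 = 400, 6
F = s1_sq / s2_sq                    # 1.5
df1, df2 = n1 - 1, n2 - 1            # 7 and 5
crit = stats.f.ppf(0.95, df1, df2)   # right-tail critical value, alpha = 0.05
p = stats.f.sf(F, df1, df2)          # exact p-value (area to the right of F)
print(F, round(crit, 2), round(p, 3))
```

As the table lookup suggested, the critical value is about 4.88 and the p-value exceeds 0.100.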
F Test Sample Questions

In a manufacturing facility, two Six Sigma Green Belts are monitoring a part that runs on
two different stamping presses. Each press runs the same progressive die. Student A says
that he is 90% confident that the stamping presses have the same variance, while student B
says that at the 90% confidence level the variances are different. Which student is right?
Press 1: s = 0.035, n = 16; Press 2: s = 0.057, n = 10.
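One way to work this question (a sketch, assuming the usual two-tailed variance-ratio F-test, with scipy supplying the critical value):

```python
from scipy import stats

# Two-tailed F-test at 90% confidence (alpha = 0.10), larger sample
# variance in the numerator.
s1, n1 = 0.057, 10   # Press 2 has the larger s, so it goes on top
s2, n2 = 0.035, 16
F = s1**2 / s2**2
df1, df2 = n1 - 1, n2 - 1                    # 9 and 15
crit = stats.f.ppf(1 - 0.10 / 2, df1, df2)   # upper alpha/2 critical value
print(round(F, 2), round(crit, 2), F > crit)
```

For these numbers F ≈ 2.65 exceeds the upper critical value, which supports student B's claim that the variances differ at the 90% level.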
Important links
 http://people.richland.edu/james/lecture/m170/ch13-f.html
Psychological Statistics
1. Preview of ANOVA

Definition of Analysis of Variance


Analysis of Variance (ANOVA) is a hypothesis testing procedure that is used to evaluate
differences between the means of two or more treatments or groups (populations). ANOVA
uses sample data to make inferences about populations.

Goals of ANOVA

Conceptually, the goal of ANOVA is to determine the amount of variability in groups of data,
and to see if the variability is greater between groups than within groups.

ANOVA & T-Tests:

ANOVA is a more general version of the t-test in two ways:


1. Both tests use sample data to test hypotheses about population means. ANOVA,
however, can test hypotheses about two or more population means. The T-Test can
only test hypotheses about two population means.
2. The T-Test can only be used with one independent (classification) variable, whereas
ANOVA can be used with any number of independent (classification) variables.

Like the T-Test, ANOVA can be used with either independent or dependent measures


designs. That is, the several measures can come from several different samples
(independent measures design), or they can come from repeated measures taken on the
same sample of subjects (repeated --- dependent --- measures design).

Coverage

1. Chapter 13 covers only the very simplest type of ANOVA: One-Way Independent


Measures ANOVA (called "single-factor Independent Measures" designs in the book).
2. Chapter 14 covers one-way (single-factor) repeated measures designs. We don't
cover this.
3. Chapter 15 covers two-way independent measures designs. I'll talk about this briefly.
One-Way, Independent-Measures design

Example:
This is hypothetical data from an experiment examining learning performance under three
temperature conditions. There are three separate samples, with n=5 in each sample. These
samples are from three different populations of learning under the three different
temperatures. The dependent variable is the number of problems solved correctly.

Independent Variable:
Temperature (Fahrenheit)

Treatment 1   Treatment 2   Treatment 3
   50°F          70°F          90°F

    0             4             1
    1             3             2
    3             6             2
    1             3             0
    0             4             0

 Mean=1        Mean=4        Mean=1

This is a one-way, independent-measures design. It is called "one-way" ("single-factor")
because "Temperature" is the only independent (classification) variable. It is called
"independent-measures" because the measures that form the data (the observed values on
the number of problems solved correctly) are all independent of each other --- they are
obtained from separate subjects.

Hypotheses:

In ANOVA we wish to determine whether the classification (independent) variable affects


what we observe on the response (dependent) variable. In the example, we wish to
determine whether Temperature affects Learning.

In statistical terms, we want to decide between two hypotheses: the null hypothesis (Ho),
which says there is no effect, and the alternative hypothesis (H1) which says that there is an
effect.

In symbols:

H0: μ1 = μ2 = μ3
H1: At least one population mean is different from the others.

Note that this is a non-directional test. There is no equivalent to the directional (one-tailed)


T-Test.

The t test statistic for two-groups:

Recall the generic formula for the T-Test:

t = (sample statistic − population parameter) / (estimated standard error)

For two groups the sample statistic is the difference between the two sample means, and in
the two-tail test the population parameter is zero. So, the generic formula for the two-group,
two-tailed t-test can be stated as:

t = (x̄1 − x̄2) / (estimated standard error)

(We usually refer to the estimated standard error as, simply, the standard error.)

The F test statistic for ANOVA:

The F test statistic is used for ANOVA. It is very similar to the two-group, two-tailed T-test.
The F-ratio has the following structure:

F = (variance between sample means) / (variance expected by chance, i.e., error)

Note that the F-ratio is based on variance rather than difference.

But variance is difference: it is the average of the squared differences of a set of values from
their mean.

The F-ratio uses variance because ANOVA can have many samples of data, not just two as in
T-Tests. Using the variance lets us look at the differences that exist between all of the many
samples.

1. The numerator: The numerator (top) of the F-ratio uses the variance between the
sample means. If the sample means are all clustered close to each other (small
differences), then their variance will be small. If they are spread out over a wider
range (bigger differences) their variance will be larger. So the variance of the sample
means measures the differences between the sample means.
2. The denominator: The denominator (bottom) of the F-ratio uses the error variance,
which is the estimate of the variance expected by chance. The error variance is just
the square of the standard error. Thus, rather than using the standard deviation of
the error, we use the variance of the error. We do this so that the denominator is in
the same units as the numerator.
The logic of ANOVA

We demonstrate the logic of ANOVA by using the set of data we introduced above. These are
the data concerning learning under three different temperature conditions. Once again, here
are the data:
Independent Variable:
Temperature (Fahrenheit)

Treatment 1   Treatment 2   Treatment 3
   50°F          70°F          90°F

    0             4             1
    1             3             2
    3             6             2
    1             3             0
    0             4             0

 Mean=1        Mean=4        Mean=1


The most obvious thing about the data is that they are not all the same: The scores are
different; they are variable.

The goals of ANOVA:

1. To measure the amount of variability;


2. To explain where it comes from.

Total Variability:
Our measure of the amount of variability is simply the variability of all of the data: We
combine all of the data in the experiment together and calculate its variability.

Once we have defined our measure of the total amount of variability, we wish to explain
where it comes from: Does it come from the experimental treatment, or is it just random
variation? We answer this question by analyzing the sources of variability:

Between-Treatments Variability:

Looking at the data above, we can clearly see that much of the variability is due to the
experimental treatments: The scores in the 70-F condition tend to be much higher than those
in the other conditions: The mean for 70-F is higher than for 50-F and 90-F. Thus, we can
calculate the variability of the means to measure the variability between treatments.

Mean Square Between: The between-treatments variability is measured by the variance of
the means. In ANOVA it is called the mean square between. For these data the three
treatment means are 1, 4, and 1, so

MSB = n × (variance of the means) = 5 × 3 = 15

Within-Treatment Variability:

In addition to the between-treatments variability, there is variability within each treatment.


The within treatments variability will provide a measure of the variability inside each
treatment condition.

Mean Square Within: The within-treatment variability measure is a variance measure that
summarizes the three within-treatment variances. It is called the mean square within. For
these data the within-treatment sums of squares are 6, 6, and 4, each with 4 degrees of
freedom, so

MSW = (6 + 6 + 4) / (4 + 4 + 4) = 16/12 ≈ 1.33

The heart of ANOVA is analyzing the total variability into these two components, the mean
square between and mean square within. Once we have analyzed the total variability into its
two basic components we simply compare them. The comparison is made by computing the
F-ratio. For independent-measures ANOVA the F-ratio has the following structure:

F = (variance between treatments) / (variance within treatments)

or, using the vocabulary of ANOVA,

F = MS between / MS within

For the data above:

F = 15 / 1.3333 = 11.25

(Note: The book says 11.28, but this is a rounding error. The correct value is 11.25.)
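These computations can be verified with scipy's `f_oneway` on the temperature data above:

```python
from scipy import stats

# The temperature/learning data from the text: three treatments, n = 5 each.
t50 = [0, 1, 3, 1, 0]
t70 = [4, 3, 6, 3, 4]
t90 = [1, 2, 2, 0, 0]

F, p = stats.f_oneway(t50, t70, t90)
crit = stats.f.ppf(0.95, 2, 12)   # critical value at alpha = 0.05
print(round(F, 2), round(crit, 2), round(p, 4))
```

This confirms F = 11.25, well beyond the critical value, so the null hypothesis is rejected.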

2. The Distribution of F-ratios

The F-ratio is constructed so that

1. The numerator and denominator of the ratio measure exactly the same variance
when the null hypothesis is true. Thus: when Ho is true, F is about 1.00.
2. F-ratios are always positive, because the F-ratio is a ratio of two variances, and
variances are always positive.

Given these two factors, we can sketch the distribution of F-ratios. The distribution piles up
around 1.00, cuts off at zero, and tapers off to the right.

Degrees of Freedom: Note that the exact shape depends on the degrees of freedom of the
two variances. We have two separate degrees of freedom, one for the numerator (sum of
squares between) and the other for the denominator (sum of squares within). They depend
on the number of groups and the total number of observations. The exact number of degrees
of freedom follows these two formulas (k is the number of groups, N is the total number of
observations):

df(between) = k - 1
df(within) = N - k
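A small sketch of these formulas, with the critical value looked up via scipy's F distribution rather than a printed table (the alpha level and sample sizes are illustrative):

```python
from scipy.stats import f

def anova_df(k, N):
    """Degrees of freedom for a one-way ANOVA with k groups and N observations."""
    return k - 1, N - k

# e.g. 3 groups, 15 total observations:
df_between, df_within = anova_df(3, 15)

# Critical value at alpha = .05 (upper tail), replacing the F table lookup.
f_crit = f.ppf(0.95, df_between, df_within)
```

For df = (2, 12) this returns the familiar tabled critical value of about 3.89.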

Here are two examples of F distributions. They differ in the degrees of freedom:

1. For the data about learning under different temperature conditions (discussed
above), the df(between)=3-1=2, and the df(within)=15-3=12. We can look up the
critical value of F (.05) and find that it is 3.88. The observed F=11.25, so we reject the
null hypothesis. The F-ratio distribution is:

2. For data where df=5,30 (6 groups, 36 observations), the F-ratio distribution is:

3. A Conceptual View of ANOVA

Conceptually, the goal of ANOVA is to determine the amount of variability in groups of data,
to determine where it comes from, and to see if the variability is greater between groups than
within groups.

We can demonstrate how this works visually. Here are three possible sets of data. In each
set of data there are 3 groups sampled from 3 populations. We happen to know that each set
of data comes from populations whose means are 15, 30 and 45.

We have colored the data to show the groups. We use

1. Red for the group with mean=15


2. Green for the group with mean=30
3. Blue for the group with mean=45

With each visualization we present the corresponding F-Test value and its p value.

1. For the first example the populations each have a variance of 4.

F=854.24, p<.0001.
2. For the second example the outer two populations still have a variance of 4, but the
middle one has a variance of 64, so it overlaps the outer two (though they are still
fairly well separated).

F=11.66, p<.0001.
3. For the third example the three populations have a variance of 64, so they all overlap
a lot.

F=1.42, p=.2440.

Note that in these examples, the means of the three groups haven't varied, but the variances
have. We see that when the groups are well separated, the F value is very significant. On the
other hand, when they overlap a lot, the F is much less significant.

4. Post Hoc Tests


You will recall that in ANOVA the null and alternative hypotheses are:

Ho: the means are all equal
H1: the means are not all equal

When the null hypothesis is rejected you conclude that the means are not all the same. But
we are left with the question of which means are different:

Post Hoc tests help give us an answer to the question of which means are different.

Post Hoc tests are done "after the fact": i.e., after the ANOVA is done and has shown us that
there are indeed differences amongst the means. Specifically, Post Hoc tests are done when:

1. you reject Ho, and


2. there are 3 or more treatments (groups).

A Post Hoc test enables you to go back through the data and compare the individual
treatments two at a time, and to do this in a way which provides the appropriate alpha level.

T-Tests can't be used: We can't do this in the obvious way (using T-Tests on the various
pairs of groups) because we would get too "rosy" a picture of the significance (for reasons I
don't go into). The Post Hoc tests guarantee we don't get too "rosy" a picture (actually, they
provide a picture that is too "glum"!).
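One line of arithmetic shows why the pairwise T-Test picture is too "rosy": with 3 groups there are 3 pairwise comparisons, and if each were run at alpha = .05 (and treated as independent), the chance of at least one false rejection grows well past .05:

```python
# Familywise Type I error rate for c independent comparisons at alpha each.
alpha = 0.05
comparisons = 3   # 3 groups -> 3 pairwise T-Tests
familywise = 1 - (1 - alpha) ** comparisons
# familywise is about 0.143, nearly triple the nominal .05
```

Post Hoc procedures are designed to keep this familywise rate at the stated alpha level.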

Two Post Hoc tests are commonly used (although ViSta doesn't offer any Post Hoc tests):

1. Tukey's HSD Test (that's HSD for Honestly Significant Difference). This test can be
used only when the groups are all the same size. It determines a single value that is
the minimum difference between a pair of groups that is needed for the difference to
be significant at a specific alpha level.
2. Scheffe's Test is very conservative. It involves computing an F-Ratio that has a
numerator that is a mean-square that is based on only the two groups being
compared (the denominator is the regular error variance term).
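As an illustration, Tukey's HSD can be sketched as follows. This assumes equal group sizes and uses scipy's studentized-range distribution for the q value; the MS-within and group size plugged in are hypothetical:

```python
import math
from scipy.stats import studentized_range

def tukey_hsd(ms_within, n_per_group, k, df_within, alpha=0.05):
    """Minimum pairwise mean difference needed for significance (equal-n groups)."""
    q = studentized_range.ppf(1 - alpha, k, df_within)   # critical q value
    return q * math.sqrt(ms_within / n_per_group)

# e.g. 3 groups of 5 observations each, MS-within = 2.5 (hypothetical values):
hsd = tukey_hsd(ms_within=2.5, n_per_group=5, k=3, df_within=12)
```

Any pair of group means differing by more than `hsd` would be declared significantly different at the chosen alpha level.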
5. Example

We look at hypothetical data about the effect of drug treatment on the amount of time (in
seconds) a stimulus is endured. We do an ANOVA following the formal hypothesis testing
steps. Note that the book's steps are augmented here to reflect current thinking about using
visualizations to investigate the assumptions underlying the analysis.
1. State the Hypotheses:
The hypotheses, for ANOVA, are:

Ho: the means are all equal
H1: the means are not all equal

2. Set the Decision Criterion


We arbitrarily set alpha = .05

3. Gather the Data:


The data are obtained from 60 subjects, 20 in each of 3 different experimental
conditions. The conditions are a Placebo condition, and two different drug
conditions. The independent (classification) variable is the experimental condition
(Placebo, DrugA, DrugB). The dependent variable is the time the stimulus is endured.

Here are the data as shown in ViSta's data report:

The data may be obtained from the ViSta Data Applet. Then you can do the analysis
shown below yourself.

4. Visualize the Data


We visualize the data and the model in order to see if the assumptions underlying the
independent-measures F-test are met. The assumptions are:
1. The observations within each sample must be independent (this assumption
is satisfied by the nature of the experimental design).
2. The populations from which the samples are selected must be normal (the
data and model visualizations can inform us about this).
3. The populations from which the samples are selected must have equal
variance (the data and model visualizations can inform us about this also).
This is called homogeneity of variance.

The data visualization is shown below. The boxplot shows that there is somewhat
more variance in the "DrugA" group, and that there is an outlier in the "DrugB" group.
The Q plots (only the "DrugB" Q-Plot is shown here) and the Q-Q plot show that the
data are normal, except for the outlier in the "DrugB" group.
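Alongside these visual checks, the homogeneity-of-variance assumption can also be probed numerically; Levene's test in scipy is one common option (the scores below are hypothetical stand-ins, not ViSta's data):

```python
from scipy.stats import levene

# Hypothetical endurance times (seconds) for three conditions.
placebo = [12.0, 14, 11, 13, 15, 12, 14]
drug_a  = [16.0, 22, 13, 25, 11, 20, 17]   # visibly more spread
drug_b  = [15.0, 16, 14, 17, 15, 16, 30]   # contains an outlier

# Levene's test: a small p value casts doubt on equal variances.
stat, p = levene(placebo, drug_a, drug_b)
```

A nonsignificant result here would be consistent with what the boxplots suggest visually.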

5. Evaluate the Null Hypothesis


We use ViSta to calculate the observed F-ratio, and the observed probability level.
The report produced by ViSta is shown below. The information we want is near the
bottom:
We note that F=4.37 and p=.01721. Since the observed p < .05, we reject the null
hypothesis and conclude that it is not the case that all group means are the same.
That is, at least one group mean is different than the others.

Here is the F distribution for df=2,57 (3 groups, 60 observations). I have added the
observed F=4.37:
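The reported p value can be reproduced directly from the F distribution (scipy assumed):

```python
from scipy.stats import f

# Observed F and degrees of freedom from the ViSta report (3 groups, 60 observations).
F_observed, df_between, df_within = 4.37, 2, 57

# Upper-tail probability of the F distribution: P(F >= 4.37).
p = f.sf(F_observed, df_between, df_within)
# p comes out near the .0172 reported by ViSta
```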

6. Visualize the Model


Finally, we also visualize the ANOVA model to see if the assumptions underlying the
independent-measures F-test are met. The boxplots are the same as those for the
data. The partial regression plot shows that the model is significant at the .05 level of
significance, since the curved lines cross the horizontal line. The residual plot shows
the outlier in the "DrugB" group, and shows that the "DrugA" group is not as well fit by
the ANOVA model as the other groups. Here is the model visualization:
