The test statistic for testing H0: μ1 = μ2 = ... = μk is:
F = [ ∑nj(x̅j − x̅)² / (k−1) ] / [ ∑∑(x − x̅j)² / (N−k) ]
and the critical value is found in a table of probability values for the F distribution with (degrees of
freedom) df1 = k−1 and df2 = N−k.
In the test statistic, nj = the sample size in the jth group (e.g., j = 1, 2, 3, and 4 when there are 4
comparison groups), x̅j is the sample mean in the jth group, and
x̅ is the overall mean. k represents the number of independent groups (in this example, k=4), and N
represents the total number of observations in the analysis. Note that N does not refer to a
population size, but instead to the total sample size in the analysis (the sum of the sample sizes in
the comparison groups, e.g., N=n1+n2+n3+n4). The test statistic is complicated because it
incorporates all of the sample data. While it is not easy to see the extension, the F statistic shown
above is a generalization of the test statistic used for testing the equality of exactly two means.
NOTE: The test statistic F assumes equal variability in the k populations (i.e., the population
variances are equal, or σ1² = σ2² = ... = σk²). This means that the outcome is equally variable in each of
the comparison populations. This assumption is the same as that assumed for appropriate use of
the test statistic to test equality of two independent means. It is possible to assess the likelihood
that the assumption of equal variances is true, and such a test can be conducted in most statistical
computing packages. If the variability in the k comparison groups is not similar, then alternative
techniques must be used.
The F statistic is computed by taking the ratio of what is called the "between treatment" variability to
the "residual or error" variability. This is where the name of the procedure originates. In analysis of
variance we are testing for a difference in means (H0: means are all equal versus H1: means are not
all equal) by evaluating variability in the data. The numerator captures between treatment variability
(i.e., differences among the sample means) and the denominator contains an estimate of the
variability in the outcome. The test statistic is a measure that allows us to assess whether the
differences among the sample means (numerator) are more than would be expected by chance if
the null hypothesis is true. Recall in the two independent sample test, the test statistic was
computed by taking the ratio of the difference in sample means (numerator) to the variability in the
outcome (estimated by Sp).
The decision rule for the F test in ANOVA is set up in a similar way to decision rules we established
for t tests. The decision rule again depends on the level of significance and the degrees of freedom.
The F statistic has two degrees of freedom. These are denoted df1 and df2, and called the numerator
and denominator degrees of freedom, respectively. The degrees of freedom are defined as follows:
df1 = k − 1 and df2 = N − k,
where k is the number of comparison groups and N is the total number of observations in the
analysis. If the null hypothesis is true, the between treatment variation (numerator) will not exceed
the residual or error variation (denominator) and the F statistic will be small. If the null hypothesis is
false, then the F statistic will be large. The rejection region for the F test is always in the upper
(right-hand) tail of the distribution as shown below.
Rejection Region for F Test with α = 0.05, df1=3 and df2=36 (k=4, N=40)
For the scenario depicted here, the decision rule is: Reject H0 if F > 2.87.
How does the ANOVA procedure compute a p-value? This section shows you the formulas
and carries through the computations for the example with fat for frying donuts.
Remember, long ago in a galaxy called Descriptive Statistics, how the variance was defined:
find the mean, then for each data point take the square of its difference from the mean. Add
up all those squares, and you have SS(x), the sum of squared deviations in x. The variance
was SS(x) divided by the degrees of freedom n−1, so it was a kind of average or mean
squared deviation. You probably learned the shortcut computational formulas:
SS(x) = ∑x² − (∑x)²/n or SS(x) = ∑x² − nx̅²
and then
s² = MS(x) = SS(x)/df where df = n−1
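As a quick check, a short Python sketch (with made-up data) confirms that the definitional and shortcut forms of SS(x) agree:

```python
# Checking the SS(x) shortcut formulas on a small hypothetical sample.
data = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(data)
mean = sum(data) / n  # 5.0

# Definitional form: sum of squared deviations from the mean
ss_dev = sum((x - mean) ** 2 for x in data)

# Shortcut forms from the text
ss_short1 = sum(x * x for x in data) - sum(data) ** 2 / n
ss_short2 = sum(x * x for x in data) - n * mean ** 2

# s^2 = MS(x) = SS(x)/df with df = n - 1
variance = ss_dev / (n - 1)

print(ss_dev, ss_short1, ss_short2, variance)
```

All three SS values come out identical (32.0 here), so the variance is 32/7 whichever form you use.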
In 1-way ANOVA, we extend those concepts a bit. First you partition SS(x) into between-
treatments and within-treatments parts, SSB and SSW. Then you compute the mean square
deviations, MSB = SSB/dfB and MSW = SSW/dfW:
MSB is called the between-treatments mean square, between-groups variance, or factor MS. It
measures the variability associated with the different treatment levels or different values of the factor.
MSW is called the within-treatments mean square, within-group variance, pooled variance, or error
MS. It measures the variability that is not associated with the different treatments.
Finally you divide the two to obtain your test statistic, F = MSB/MSW, and you look up the p-
value in a table of the F distribution.
(The F distribution is named after “the celebrated R.A. Fisher” (Kuzma & Bohnenblust
2005, 176). You may have already seen the F distribution in computing a different ratio of
variances, as part of testing the variances of two populations for equality.)
There are several ways to compute the variability, but they all come up with the same
answers, and the method in Spiegel and Stephens 1999, pages 367–368, is as easy as any:
Source    SS                       df    MS             F
Between   SSB = ∑njx̅j² − Nx̅²      r−1   MSB = SSB/dfB  F = MSB/MSW
Within    SSW = ∑(nj−1)sj²         N−r   MSW = SSW/dfW
Total     SStot = SSB + SSW        N−1
where
r is the number of treatments.
nj, x̅ j, sj for each treatment are the sample size, sample mean, and sample standard
deviation.
N is the total sample size and x̅ = ∑x/N is the overall sample mean or “grand mean”. x̅ can
also be computed from the sample means by
x̅ = ∑njx̅j/N
You begin with the treatment means x̅j = {72, 85, 76, 62} and the overall mean x̅ = 73.75, then
compute
SSB = (6×72²+6×85²+6×76²+6×62²) − 24×73.75² = 1636.5
MSB = 1636.5 / 3 = 545.5
The next step depends on whether you know the standard deviations sj of the samples. If
you don’t, then you jump to the third row of the table to compute the overall sum of
squares:
∑x² = 64² + 72² + 68² + ... + 70² + 68² = 134192
SStot = ∑x² − Nx̅² = 134192 − 24×73.75² = 3654.5
Then you find SSW by subtracting the “between” sum of squares SSB from the overall sum of
squares SStot:
SSW = SStot−SSB = 3654.5−1636.5 = 2018.0
MSW = 2018.0 / 20 = 100.9
Now you’re almost there. You want to know whether the variability between treatments,
MSB, is greater than the variability within treatments, MSW. If it’s enough greater, then you
conclude that there is a real difference between at least some of the treatment means and
therefore that the factor has a real effect. To determine this, divide:
F = MSB/MSW = 545.5/100.9 = 5.41
This is the F statistic. The F distribution is a one-tailed distribution that depends on both
degrees of freedom, dfB and dfW.
At long last, you look up F=5.41 with 3 and 20 degrees of freedom, and you find a p-
value of 0.0069. The interpretation is the usual one: there’s only a 0.0069 chance of getting
an F statistic greater than 5.41 (or higher variability between treatments relative to the
variability within treatments) if there is actually no difference between treatments. Since
the p-value is less than α, you conclude that there is a difference.
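The whole computation can be carried through in a short Python sketch. The group means and SStot are taken from the example above (SStot needs the raw data, so it is entered directly); the p-value is obtained here by numerically integrating the F density, standing in for the table lookup:

```python
import math

# Summary statistics from the donut-frying example (4 fats, 6 batches each)
means = [72, 85, 76, 62]
n_j = 6
r = len(means)           # number of treatments
N = r * n_j              # total observations (24)
grand_mean = sum(n_j * m for m in means) / N   # 73.75

# Between-treatments sum of squares: SSB = sum nj*xbar_j^2 - N*xbar^2
ssb = sum(n_j * m * m for m in means) - N * grand_mean ** 2   # 1636.5

# Total SS, taken from the text (SStot = sum x^2 - N*xbar^2 over the raw data)
ss_tot = 3654.5
ssw = ss_tot - ssb       # 2018.0

df_b, df_w = r - 1, N - r          # 3 and 20
msb, msw = ssb / df_b, ssw / df_w  # 545.5 and 100.9
f_stat = msb / msw                 # about 5.41

def f_pdf(x, d1, d2):
    """Density of the F distribution, via log-gamma for the beta function."""
    log_beta = (math.lgamma(d1 / 2) + math.lgamma(d2 / 2)
                - math.lgamma((d1 + d2) / 2))
    return math.exp((d1 / 2) * math.log(d1 / d2)
                    + (d1 / 2 - 1) * math.log(x)
                    - ((d1 + d2) / 2) * math.log1p(d1 * x / d2)
                    - log_beta)

def f_sf(x0, d1, d2, upper=500.0, steps=100_000):
    """P(F > x0) by trapezoidal integration of the density (tail beyond
    `upper` is negligible for these degrees of freedom)."""
    h = (upper - x0) / steps
    total = 0.5 * (f_pdf(x0, d1, d2) + f_pdf(upper, d1, d2))
    for i in range(1, steps):
        total += f_pdf(x0 + i * h, d1, d2)
    return total * h

p_value = f_sf(f_stat, df_b, df_w)
print(round(f_stat, 2), round(p_value, 4))  # F ≈ 5.41, p ≈ 0.0069
```

In practice you would read the p-value from a table or let software compute it; the integration above just makes the table lookup explicit.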
Estimating Individual Treatment Means
Usually you’re interested in the contrast between two treatments, but you can also
estimate the population mean for an individual treatment. You use a t interval, as you
would when you have only one sample, but the standard error and degrees of freedom are
different (NIST 2012 section 7.4.3.6).
Example: Refer back to the fats for frying donuts. Estimate the population mean for Fat 2
with 95% confidence. In other words, if you fried a great many batches of donuts in Fat 2,
how much fat per batch would be absorbed, on average?
Computation by Hand
Begin by finding the critical t. Since 1−α = 0.95, α/2 = 0.025. You therefore need
t(0.025,20). You can find this from a table:
t(0.025,20) = 2.0860
Now you’re ready to finish the confidence interval. The margin of error is
E = t(α/2,df) · √(MSW/nj) = 2.0860 × √(100.9/6) = 2.0860 × 4.1008 = 8.5543
Therefore the confidence interval is
μ2 = 85 ± 8.6 g (95% confidence)
or
76.4 g ≤ μ2 ≤ 93.6 g (95% confidence)
Conclusion: You’re 95% confident that the true mean amount of fat absorbed by a batch of
donuts fried in Fat 2 is between 76.4 g and 93.6 g.
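A minimal sketch of the hand computation, with the example's numbers (the critical t is taken from the table):

```python
import math

# Values from the example: mean for Fat 2, MSW and its df, group size
xbar2 = 85.0
msw, df_w = 100.9, 20
n_j = 6
t_crit = 2.0860          # t(0.025, 20) from a t table

margin = t_crit * math.sqrt(msw / n_j)       # about 8.55
low, high = xbar2 - margin, xbar2 + margin
print(round(margin, 4), round(low, 1), round(high, 1))
```

Note that the standard error uses MSW (the pooled within-treatment variance) rather than the standard deviation of the single Fat 2 sample, and the degrees of freedom are the error df (20), not nj − 1 = 5.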
TI-83/84 Procedure
Your TI calculator is set up to do the necessary calculations, but there’s one glitch: the
degrees of freedom are not based on the size of the individual sample, as they are in a regular
t interval. So you have to “spoof” the calculator as follows.
Press [STAT] [◄] [8] to bring up the TInterval screen. First I’ll tell you what to enter;
then I’ll explain why.
All this fakery achieves the desired result: the confidence interval matches the one that you
would have if you computed it by hand.
Conclusion: You’re 95% confident that the true mean amount of fat absorbed by a batch of
donuts fried in Fat 2 is between 76.4 g and 93.6 g.
Lowry 1988 chapter 14 part 2 mentions a measure that is usually neglected in ANOVA: η².
(η is the Greek letter eta, which rhymes with beta.)
η² = SSB/SStot, the ratio of sum of squares between groups to total sum of squares. For
the donut-frying example,
η² = SSB/SStot = 1636.5 / 3654.5 = 0.45
What does this tell you? η² measures how much of the total variability in the dependent
variable is associated with the variation in treatments. For the donut example, η² = 0.45
tells you that 45% of the variability in fat absorption among the batches is associated with
the choice of fat.
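For the donut example this is a one-line computation:

```python
# Effect size for the donut example: eta squared = SSB / SStot
ssb, ss_tot = 1636.5, 3654.5
eta_sq = ssb / ss_tot
print(round(eta_sq, 2))  # 0.45
```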
Uses of the F-Test
1. F-test for testing equality of variance is used to test the hypothesis of the equality of
two population variances. The height example above requires the use of this test.
2. F-test for testing equality of several means. The test for equality of several means is carried
out by the technique called ANOVA.
For example, suppose that an experimenter wishes to test the efficacy of a drug at three levels:
100 mg, 250 mg and 500 mg. A test is conducted among fifteen human subjects taken at random,
with five subjects being administered each level of the drug.
To test if there are significant differences among the three levels of the drug in terms of efficacy,
the ANOVA technique has to be applied. The test used for this purpose is the F-test.
3. F-test for testing significance of regression is used to test the significance of the regression
model. The appropriateness of the multiple regression model as a whole can be tested by this
test. A significant F value indicates a linear relationship between Y and at least one of the Xs.
Assumptions
Irrespective of the type of F-test used, one assumption has to be met: the populations from which
the samples are drawn have to be normal. In the case of the F-test for equality of variance, a second
requirement has to be satisfied in that the larger of the sample variances has to be placed in the
numerator of the test statistic.
Like the t-test, the F-test is a small-sample test and may be considered for use when the sample size is less than 30.
Deciding
In attempting to reach decisions, we always begin by specifying the null hypothesis against a
complementary hypothesis called the alternative hypothesis. The calculated value of the F statistic,
with its associated p-value, is used to decide whether to reject the null hypothesis.
All statistical software packages provide these p-values. If the associated p-value is small (i.e., < 0.05),
we say that the test is significant at the 5% level, and we may reject the null hypothesis in favor of the
alternative.
On the other hand, if the associated p-value of the test is > 0.05, we fail to reject the null
hypothesis. Evidence against the null hypothesis is considered very strong if the p-value is less
than 0.01; in that case, we say that the test is significant at the 1% level.
To find out whether two samples drawn from a normal population have the same variance, the
F ratio is the ratio of the two sample variances. In both cases, with σ1² > σ2² and s1² > s2², the
larger estimate of variance always goes in the numerator and the smaller estimate of variance in
the denominator, so the ratio is at least 1.
For comparing several means (as in ANOVA), the ratio is
F = MS Between / MS Within
If the calculated F value is greater than the appropriate F critical value (found in a table or
provided in software), then the null hypothesis can be rejected.
From an F table we can find the critical values that cut off a certain area to the right. In the table
above, the area to the right of 4.88 is 0.05 and the area to the right of 3.37 is 0.100, so the area to
the right of 1.5 in the graph must be more than 0.100. An exact p-value can easily be found with
any statistical tool or Excel.
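A minimal Python sketch of the two-variance F ratio, using made-up samples:

```python
# F test for equality of two variances: larger sample variance in the numerator.
# The data here are hypothetical.
def sample_variance(xs):
    n = len(xs)
    m = sum(xs) / n
    return sum((x - m) ** 2 for x in xs) / (n - 1)

a = [21, 24, 25, 26, 29, 31]
b = [20, 22, 23, 23, 24, 26]

va, vb = sample_variance(a), sample_variance(b)
f_ratio = max(va, vb) / min(va, vb)   # larger estimate always on top, so F >= 1
print(round(f_ratio, 2))
```

The resulting F is then compared against the critical value for (n1 − 1, n2 − 1) degrees of freedom.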
Psychological Statistics
1. Preview of ANOVA
Goals of ANOVA
Conceptually, the goal of ANOVA is to determine the amount of variability in groups of data,
and to see if the variability is greater between groups than within groups.
Example:
This is hypothetical data from an experiment examining learning performance under three
temperature conditions. There are three separate samples, with n=5 in each sample. These
samples are from three different populations of learning under the three different
temperatures. The dependent variable is the number of problems solved correctly.
Independent Variable: Temperature (Fahrenheit)

50°F   70°F   90°F
 0      4      1
 1      3      2
 3      6      2
 1      3      0
 0      4      0
Hypotheses:
In statistical terms, we want to decide between two hypotheses: the null hypothesis (Ho),
which says there is no effect, and the alternative hypothesis (H1), which says that there is an
effect.
In symbols: Ho: μ1 = μ2 = μ3, versus H1: at least one population mean is different.
For two groups the sample statistic is the difference between the two sample means, and in
the two-tail test the population parameter is zero. So the generic formula for the two-group,
two-tailed t-test can be stated as:
t = (sample statistic − population parameter) / estimated standard error
(We usually refer to the estimated standard error as, simply, the standard error.)
But variance measures difference: it is the average squared difference of a set of values from
their mean.
The F-ratio uses variance because ANOVA can have many samples of data, not just two as in
T-Tests. Using the variance lets us look at the differences that exist between all of the many
samples.
1. The numerator: The numerator (top) of the F-ratio uses the variance between the
sample means. If the sample means are all clustered close to each other (small
differences), then their variance will be small. If they are spread out over a wider
range (bigger differences) their variance will be larger. So the variance of the sample
means measures the differences between the sample means.
2. The denominator: The denominator (bottom) of the F-ratio uses the error variance,
which is the estimate of the variance expected by chance. The error variance is just
the square of the standard error. Thus, rather than using the standard deviation of
the error, we use the variance of the error. We do this so that the denominator is in
the same units as the numerator.
The Logic of ANOVA
We demonstrate the logic of ANOVA by using the set of data we introduced above. These are
the data concerning learning under three different temperature conditions. Once again, here
are the data:
Independent Variable: Temperature (Fahrenheit)

50°F   70°F   90°F
 0      4      1
 1      3      2
 3      6      2
 1      3      0
 0      4      0
Total Variability:
Our measure of the amount of variability is simply the variability of all of the data: We
combine all of the data in the experiment together and calculate its variability.
Once we have defined our measure of the total amount of variability, we wish to explain
where it comes from: Does it come from the experimental treatment, or is it just random
variation? We answer this question by analyzing the sources of variability:
Between-Treatments Variability:
Looking at the data above, we can clearly see that much of the variability is due to the
experimental treatments: The scores in the 70-F condition tend to be much higher than those
in the other conditions: The mean for 70-F is higher than for 50-F and 90-F. Thus, we can
calculate the variability of the means to measure the variability between treatments.
Within-Treatment Variability:
Mean Square Within: The within-treatment variability measure is a variance measure that
summarizes the three within-treatment variances. It is called the mean square within. For
these data, MSW = SSW/df(within) = 16/12 = 1.33.
The heart of ANOVA is analyzing the total variability into these two components, the mean
square between and mean square within. Once we have analyzed the total variability into its
two basic components we simply compare them. The comparison is made by computing the
F-ratio. For independent-measures ANOVA the F-ratio has the following structure:
F = variance between treatments / variance within treatments
or, using the vocabulary of ANOVA,
F = MS between / MS within = 15 / 1.33 = 11.25
(Note: The book says 11.28, but this is a rounding error. The correct value is 11.25.)
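The analysis can be carried through in a short Python sketch (assuming the three columns of the data table are the 50°F, 70°F, and 90°F groups):

```python
# One-way ANOVA for the temperature example (three groups, n = 5 each).
groups = {
    "50F": [0, 1, 3, 1, 0],
    "70F": [4, 3, 6, 3, 4],
    "90F": [1, 2, 2, 0, 0],
}

all_data = [x for g in groups.values() for x in g]
N = len(all_data)                 # 15
k = len(groups)                   # 3
grand_mean = sum(all_data) / N    # 2.0

# Between-treatments: variability of the group means around the grand mean
ssb = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
          for g in groups.values())

# Within-treatments: variability of scores around their own group mean
ssw = sum(sum((x - sum(g) / len(g)) ** 2 for x in g)
          for g in groups.values())

df_b, df_w = k - 1, N - k          # 2 and 12
msb, msw = ssb / df_b, ssw / df_w  # 15.0 and 1.333...
f_ratio = msb / msw
print(round(f_ratio, 2))           # 11.25
```

This reproduces the corrected value F = 11.25 rather than the book's rounded 11.28.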
1. The numerator and denominator of the ratio measure exactly the same variance
when the null hypothesis is true. Thus: when Ho is true, F is about 1.00.
2. F-ratios are always positive, because the F-ratio is a ratio of two variances, and
variances are always positive.
Given these two factors, we can sketch the distribution of F-ratios. The distribution piles up
around 1.00, cuts off at zero, and tapers off to the right.
Degrees of Freedom: Note that the exact shape depends on the degrees of freedom of the
two variances. We have two separate degrees of freedom, one for the numerator (sum of
squares between) and the other for the denominator (sum of squares within). They depend
on the number of groups and the total number of observations. The exact degrees
of freedom follow these two formulas (k is the number of groups, N is the total number of
observations):
df(between) = k − 1 and df(within) = N − k
Here are two examples of F distributions. They differ in the degrees of freedom:
3. For the data about learning under different temperature conditions (discussed
above), the df(between)=3-1=2, and the df(within)=15-3=12. We can look up the
critical value of F (.05) and find that it is 3.88. The observed F=11.28, so we reject the
null hypothesis. The F-ratio distribution is:
4. For data where df=5,30 (6 groups, 36 observations), the F-ratio distribution is:
Conceptually, the goal of ANOVA is to determine the amount of variability in groups of data,
to determine where it comes from, and to see if the variability is greater between groups than
within groups.
We can demonstrate how this works visually. Here are three possible sets of data. In each
set of data there are 3 groups sampled from 3 populations. We happen to know that each set
of data comes from populations whose means are 15, 30 and 45.
With each visualization we present the corresponding F-Test value and its p value.
For the first example each population has a variance of 4, so the three groups are well
separated. F=854.24, p<.0001.
5. For the second example the outer two populations still have a variance of 4, but the
middle one has a variance of 64, so it overlaps the outer two (though they are still
fairly well separated).
F=11.66, p<.0001.
6. For the third example the three populations have a variance of 64, so they all overlap
a lot.
F=1.42, p=.2440.
Note that in these examples, the means of the three groups haven't varied, but the variances
have. We see that when the groups are well separated, the F value is very significant. On the
other hand, when they overlap a lot, the F is much less significant.
When the null hypothesis is rejected you conclude that the means are not all the same. But
we are left with the question of which means are different:
Post Hoc tests help give us an answer to the question of which means are different.
Post Hoc tests are done "after the fact": i.e., after the ANOVA is done and has shown us that
there are indeed differences amongst the means. Specifically, Post Hoc tests are done when
you have rejected Ho and there are three or more treatments, so you still need to find out
which pairs of means differ.
A Post Hoc test enables you to go back through the data and compare the individual
treatments two at a time, and to do this in a way which provides the appropriate alpha level.
T-Tests can't be used: We can't do this in the obvious way (using T-Tests on the various
pairs of groups) because we would get too "rosy" a picture of the significance (for reasons I
don't go into). The Post Hoc tests guarantee we don't get too "rosy" a picture (actually, they
provide a picture that is too "glum"!).
1. Tukey's HSD Test (that's HSD for Honestly Significant Difference). This test can be
used only when the groups are all the same size. It determines a single value that is
the minimum difference between a pair of groups that is needed for the difference to
be significant at a specific alpha level.
2. Scheffe's Test is very conservative. It involves computing an F-Ratio that has a
numerator that is a mean-square that is based on only the two groups being
compared (the denominator is the regular error variance term).
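As a sketch of how Tukey's HSD works, the following uses the temperature example's numbers; the studentized-range value q = 3.77 for k = 3 groups and 12 error df is taken from a table and is an assumption here:

```python
import math

# Tukey HSD sketch for the temperature example, assuming equal group sizes.
msw = 16 / 12        # mean square within from the example
n = 5                # per-group sample size
q = 3.77             # studentized range q(0.05; k=3, df=12) -- table value,
                     # look this up for your own k and df

# HSD: the minimum difference between two means needed for significance
hsd = q * math.sqrt(msw / n)

means = {"50F": 1.0, "70F": 4.0, "90F": 1.0}
names = list(means)
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        a, b = names[i], names[j]
        diff = abs(means[a] - means[b])
        verdict = "significant" if diff > hsd else "not significant"
        print(f"{a} vs {b}: |diff| = {diff:.2f}, HSD = {hsd:.2f} -> {verdict}")
```

With these numbers the HSD is about 1.95, so the 70°F group differs significantly from each of the other two, which do not differ from each other.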
Example
We look at hypothetical data about the effect of drug treatment on the amount of time (in
seconds) a stimulus is endured. We do an ANOVA following the formal hypothesis-testing
steps. Note that the book's steps are augmented here to reflect current thinking about using
visualizations to investigate the assumptions underlying the analysis.
1. State the Hypotheses:
The hypotheses, for ANOVA, are Ho: μ1 = μ2 = μ3 versus H1: at least one population mean is different.
The data may be gotten from the ViSta Data Applet. Then, you can do the analysis
that is shown below yourself.
The data visualization is shown below. The boxplot shows that there is somewhat
more variance in the "DrugA" group, and that there is an outlier in the "DrugB" group.
The Q plots (only the "DrugB" Q-Plot is shown here) and the Q-Q plot show that the
data are normal, except for the outlier in the "DrugB" group.
Here is the F distribution for df=2,57 (3 groups, 60 observations). I have added the
observed F=4.37: