
ANALYSIS OF VARIANCE

The F Distribution
 The probability distribution used here is the F distribution.
It was named to honour Sir Ronald Fisher, one of the
founders of modern-day statistics.
 This probability distribution is used as the distribution
of the test statistic for several situations.
 It is used:
 to test whether two samples are from populations
having equal variances
 when we want to compare several population
means simultaneously
 The simultaneous comparison of
several population means is called
analysis of variance (ANOVA).
 What are the characteristics of the F
distribution?

 There is a "family" of F distributions. A
particular member of the family is determined
by two parameters: the degrees of freedom in
the numerator and the degrees of freedom in
the denominator.
 There is one F distribution for the
combination of 29 degrees of freedom in
the numerator and 28 degrees of freedom
in the denominator. There is another F
distribution for 19 degrees of freedom in
the numerator and 6 degrees of freedom in
the denominator. The shape of the curves
changes as the degrees of freedom change.
 The F distribution is continuous. This means
that it can assume an infinite number of values
between 0 and plus infinity.
 The F distribution cannot be negative. The
smallest value F can assume is 0.
 It is positively skewed. The long tail of the
distribution is to the right-hand side. As the
number of degrees of freedom increases in both the
numerator and denominator, the distribution
approaches a normal distribution.
 It is asymptotic. As the value of X increases,
the F curve approaches the X-axis but never
touches it. This is similar to the behaviour of
the normal distribution.
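The characteristics listed above can be checked informally by simulation. The sketch below relies on a standard fact that the slides do not state: an F variable can be generated as the ratio of two independent chi-square variables, each divided by its degrees of freedom. The helper name `simulate_f`, the seed, and the sample size are illustrative assumptions; the two degrees-of-freedom pairs are the ones mentioned in the text.

```python
import random
import statistics


def simulate_f(df_num, df_den, size, rng):
    """Draw `size` simulated F values with (df_num, df_den) degrees of freedom."""
    draws = []
    for _ in range(size):
        # A chi-square variable with k degrees of freedom is Gamma(shape=k/2, scale=2).
        chi2_num = rng.gammavariate(df_num / 2, 2)
        chi2_den = rng.gammavariate(df_den / 2, 2)
        draws.append((chi2_num / df_num) / (chi2_den / df_den))
    return draws


rng = random.Random(0)  # fixed seed (arbitrary) so the run is repeatable
results = {}
for df_num, df_den in [(29, 28), (19, 6)]:
    sample = simulate_f(df_num, df_den, 20_000, rng)
    results[(df_num, df_den)] = (
        min(sample),
        statistics.median(sample),
        statistics.mean(sample),
    )

for dfs, (smallest, median, mean) in results.items():
    # Positive skew shows up as a mean that sits above the median.
    print(dfs, f"min={smallest:.3f} median={median:.3f} mean={mean:.3f}")
```

In both cases no simulated value is negative, and the mean lies to the right of the median, which is what a long right-hand tail would produce.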
Comparing two Population variances
 The F distribution is also used to test the
hypothesis that the variance of one normal
population equals the variance of another
normal population.
 The null hypothesis is that the variance of one
normal population, $\sigma_1^2$, equals the variance of
the other normal population, $\sigma_2^2$. The
alternate hypothesis could be that the
variances differ.
 In this case the null hypothesis and the alternate
hypothesis are:
$$H_0: \sigma_1^2 = \sigma_2^2$$
$$H_1: \sigma_1^2 \neq \sigma_2^2$$
 To conduct the test, a random sample is selected
of n1 observations from one population, and a
sample of n2 observations from the second
population.
 TEST STATISTIC FOR COMPARING TWO VARIANCES:
$$F = \frac{s_1^2}{s_2^2}$$
 If the null hypothesis is true, the test statistic follows the
F distribution with n1 - 1 and n2 - 1 degrees of freedom.
 In order to reduce the size of the table of
critical values, the larger sample variance is
placed in the numerator; hence, the tabled F
ratio is always larger than 1.00.
 Thus, the right-tail critical value is the
only one required. The critical value of
F for a two-tailed test is found by
dividing the significance level in half
($\alpha/2$) and then referring to the
appropriate degrees of freedom in
Appendix G.
Example:
 The BRTC is considering two routes from Gulistan
to the Dhaka International Airport.
They want to study the time it takes to drive to
the airport using each route and then compare
the results. They collected the following sample
data, which are reported in minutes. Using the .10
significance level, is there a difference in the
variation in the driving times using the two
routes?
Route 1   Route 2
52        59
67        60
56        61
45        51
70        56
54        63
64        57
          65
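The procedure described above can be sketched for the two-route data as follows. Two things here are assumptions rather than statements from the slides: the unpaired final value 65 is taken to belong to Route 2, and the critical value 3.87 is the value a standard F table gives for $\alpha/2 = .05$ with 6 and 7 degrees of freedom (the slides point to Appendix G without quoting it).

```python
import statistics

route_1 = [52, 67, 56, 45, 70, 54, 64]
route_2 = [59, 60, 61, 51, 56, 63, 57, 65]  # assumption: the lone 65 is a Route 2 time

var_1 = statistics.variance(route_1)  # sample variance, divisor n - 1
var_2 = statistics.variance(route_2)

# Place the larger sample variance in the numerator so that F >= 1.
if var_1 >= var_2:
    f_stat = var_1 / var_2
    df_num, df_den = len(route_1) - 1, len(route_2) - 1
else:
    f_stat = var_2 / var_1
    df_num, df_den = len(route_2) - 1, len(route_1) - 1

# Assumed table value: two-tailed test at .10, so alpha/2 = .05, df = (6, 7).
F_CRITICAL = 3.87
reject_h0 = f_stat > F_CRITICAL

print(f"s1^2 = {var_1:.2f}, s2^2 = {var_2:.2f}")
print(f"F = {f_stat:.2f} with ({df_num}, {df_den}) degrees of freedom")
print("Reject H0" if reject_h0 else "Do not reject H0")
```

Under these assumptions the computed F exceeds the critical value, so the sample suggests that the variation in driving times differs between the two routes.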
ANOVA Page# 386

 Another use of the F distribution is the
analysis of variance (ANOVA) technique in
which we compare three or more population
means to determine whether they could be
equal.
Assumptions
 The populations are normally distributed.
 The populations have equal standard deviations.
 The populations are independent, and the samples
are selected independently.
When these conditions are met, F is used as
the distribution of the test statistic.
The ANOVA Test
Some terms are to be understood:
 TOTAL VARIATION: The sum of the squared
differences between each observation and the
overall mean.
 TREATMENT: The term treatment is used to
identify the different populations being
examined. A treatment is a source of variation.
 Total variation is divided into: Treatment
variation and random variation.
 TREATMENT VARIATION: The sum of
the squared differences between each
treatment mean and the overall mean.
 RANDOM VARIATION: The sum of the
squared differences between each
observation and its treatment mean.
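In symbols, the three quantities defined above can be written as follows. The notation is an assumption introduced for illustration: $X_{ij}$ is observation $i$ in treatment $j$, $\bar{X}_j$ is the mean of treatment $j$, $\bar{X}_G$ is the overall mean, $n_j$ is the number of observations in treatment $j$, and $k$ is the number of treatments. In the conventional calculation each squared treatment difference is weighted by $n_j$, so that the two parts add up to the total.

```latex
\begin{align*}
\text{SS total} &= \sum_{j=1}^{k}\sum_{i=1}^{n_j} \left(X_{ij} - \bar{X}_G\right)^2 \\
\text{SST} &= \sum_{j=1}^{k} n_j \left(\bar{X}_j - \bar{X}_G\right)^2 \\
\text{SSE} &= \sum_{j=1}^{k}\sum_{i=1}^{n_j} \left(X_{ij} - \bar{X}_j\right)^2 \\
\text{SS total} &= \text{SST} + \text{SSE}
\end{align*}
```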
SKETCH OF ANOVA TABLE
 Go to slide 25
EXAMPLE. Page #456
 Clean All is a new all-purpose cleaner
being test marketed by placing displays
in three different locations within
various supermarkets. The number of
12-ounce bottles sold from each
location within the supermarket is
reported below.
Dhanmondi   Gulshan   Banani
18          12        26
14          18        28
19          10        30
17          16        32
At the .05 significance level, is there a
difference in the mean number of
bottles sold at the three locations?
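A minimal sketch of this one-way ANOVA, computed directly from the definitions of treatment variation and random variation, is given below. The critical value 4.26 is an assumption taken from a standard F table for the .05 level with 2 and 9 degrees of freedom; it is not quoted in the slides. Each squared treatment difference is weighted by the treatment sample size, as is conventional.

```python
import statistics

sales = {
    "Dhanmondi": [18, 14, 19, 17],
    "Gulshan": [12, 18, 10, 16],
    "Banani": [26, 28, 30, 32],
}

all_obs = [x for group in sales.values() for x in group]
n = len(all_obs)  # total number of observations
k = len(sales)    # number of treatments
grand_mean = statistics.mean(all_obs)

# Treatment variation: treatment mean vs. overall mean, weighted by group size.
sst = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in sales.values())
# Random variation: each observation vs. its own treatment mean.
sse = sum((x - statistics.mean(g)) ** 2 for g in sales.values() for x in g)
# Total variation: each observation vs. the overall mean.
ss_total = sum((x - grand_mean) ** 2 for x in all_obs)

mst = sst / (k - 1)
mse = sse / (n - k)
f_stat = mst / mse

F_CRITICAL = 4.26  # assumed table value: alpha = .05, df = (2, 9)
reject_h0 = f_stat > F_CRITICAL

print(f"SST = {sst}, SSE = {sse}, SS total = {ss_total}")
print(f"MST = {mst:.2f}, MSE = {mse:.2f}, F = {f_stat:.2f}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```

With these data the computed F is far above the assumed critical value, which points to a difference in the mean number of bottles sold across the three locations.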
INFERENCES ABOUT PAIRS OF
TREATMENT MEANS

In our previous example we may want to know:
between which groups do the treatment means
differ?
We will use confidence intervals to answer this
question. Is there enough disparity to justify the
conclusion that there is a significant difference in
the mean number of bottles sold at two
locations?
The t distribution is used as the basis for this test.
One of the assumptions of ANOVA is that the
population variances are the same for all
treatments. This common population variance is
estimated by the mean square error, or MSE, which
is determined by SSE/(n-k).
 A confidence interval for
the difference between
two population means is
found by:
CONFIDENCE INTERVAL FOR THE DIFFERENCE IN TREATMENT MEANS:
$$(\bar{X}_1 - \bar{X}_2) \pm t\sqrt{MSE\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
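As an illustration, the interval can be sketched for the Clean All data from the previous example. The 95% confidence level and the t value 2.262 (standard t table, n - k = 9 degrees of freedom) are assumptions, since the slides do not state them, and the helper name `mean_difference_interval` is hypothetical. An interval that does not contain zero is usually read as evidence that the two treatment means differ.

```python
import math
import statistics

dhanmondi = [18, 14, 19, 17]
gulshan = [12, 18, 10, 16]
banani = [26, 28, 30, 32]
groups = [dhanmondi, gulshan, banani]

n = sum(len(g) for g in groups)  # total observations
k = len(groups)                  # number of treatments

# MSE = SSE / (n - k), where SSE sums squared deviations from each treatment mean.
sse = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
mse = sse / (n - k)

T_VALUE = 2.262  # assumed: 95% confidence, df = n - k = 9, from a standard t table


def mean_difference_interval(a, b):
    """Return (lower, upper) for the difference mean(a) - mean(b)."""
    diff = statistics.mean(a) - statistics.mean(b)
    margin = T_VALUE * math.sqrt(mse * (1 / len(a) + 1 / len(b)))
    return diff - margin, diff + margin


banani_vs_dhanmondi = mean_difference_interval(banani, dhanmondi)
dhanmondi_vs_gulshan = mean_difference_interval(dhanmondi, gulshan)

print("Banani - Dhanmondi: (%.2f, %.2f)" % banani_vs_dhanmondi)
print("Dhanmondi - Gulshan: (%.2f, %.2f)" % dhanmondi_vs_gulshan)
```

Under these assumptions the Banani versus Dhanmondi interval lies entirely above zero, while the Dhanmondi versus Gulshan interval includes zero.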
EXAMPLE
 Professor X had students in his marketing class
rate his performance as Excellent, Good, Fair,
or Poor. A graduate student collected the
ratings. The rating (i.e., the treatment) a
student gave the professor was matched with
his or her course grade, which could range
from 0 to 100. The sample information is given
below. Is there a difference in the mean score
of the students in each of the four rating
categories? Use the .01 significance level.
Graduation Grades
Excellent   Good   Fair   Poor
94          75     70     68
90          68     73     70
85          77     76     72
80          83     78     65
            88     80     74
                   68     65
                   65
Solution:
 H0: $\mu_1 = \mu_2 = \mu_3 = \mu_4$
 H1: Not all the treatment means are the same.
[The mean scores are not all equal.]
 $\alpha = .01$
 The test statistic follows the F distribution.
 Degrees of freedom in the numerator = k - 1 = 4 - 1 = 3
 Degrees of freedom in the denominator = n - k = 22 - 4 = 18
 The critical value is 5.09. So the decision rule is to reject
H0 if the computed value of F exceeds or equals 5.09.
 It is convenient to summarize the
calculations of the F statistic in an
ANOVA Table. The format of an ANOVA
table is as follows:
ANOVA Table
Source of Variation   Sum of Squares   Degrees of Freedom   Mean Squares        F
Treatments            SST              k - 1                SST/(k-1) = MST     MST/MSE
Error                 SSE              n - k                SSE/(n-k) = MSE
Total                 SS total         n - 1
 Key: SST= Variation due to the treatments;
SSE= Variation within the treatments
 We start the process by finding SS total.
 SUM OF SQUARES TOTAL, SS total:
$$SS\ total = \sum X^2 - \frac{\left(\sum X\right)^2}{n}$$
 SUM OF SQUARES TREATMENT, SST:
$$SST = \sum\left(\frac{T_c^2}{n_c}\right) - \frac{\left(\sum X\right)^2}{n}$$
where
$T_c$ is the column total for each treatment
$n_c$ is the number of observations (sample size) for
each treatment
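The two computational formulas can be applied to the professor-rating example as sketched below. Two points are assumptions: the ragged table is read so that the categories hold 4, 5, 7 and 6 grades respectively (which agrees with n = 22 in the solution), and SSE is obtained as SS total minus SST, a standard identity that the slides do not spell out. The critical value 5.09 is the one given in the solution.

```python
ratings = {
    # Assumed column assignment of the flattened table (4 + 5 + 7 + 6 = 22 grades).
    "Excellent": [94, 90, 85, 80],
    "Good": [75, 68, 77, 83, 88],
    "Fair": [70, 73, 76, 78, 80, 68, 65],
    "Poor": [68, 70, 72, 65, 74, 65],
}

all_grades = [x for column in ratings.values() for x in column]
n = len(all_grades)  # total number of observations
k = len(ratings)     # number of treatments

# Shared term (sum of X)^2 / n that appears in both formulas.
correction = sum(all_grades) ** 2 / n

ss_total = sum(x ** 2 for x in all_grades) - correction
sst = sum(sum(col) ** 2 / len(col) for col in ratings.values()) - correction
sse = ss_total - sst  # assumed identity: SS total = SST + SSE

mst = sst / (k - 1)
mse = sse / (n - k)
f_stat = mst / mse

F_CRITICAL = 5.09  # from the decision rule stated in the solution
reject_h0 = f_stat >= F_CRITICAL

print(f"SS total = {ss_total:.2f}, SST = {sst:.2f}, SSE = {sse:.2f}")
print(f"MST = {mst:.2f}, MSE = {mse:.2f}, F = {f_stat:.2f}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```

Under this reading of the table the computed F is above 5.09, so the decision rule leads to rejecting H0.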
TWO-WAY ANALYSIS OF VARIANCE
Example:
 BRTC is expanding bus service from Motijheel
to the Dhaka International Airport. There are
four routes. BRTC conducted several tests to
determine whether there was a difference in
the mean travel times along the four routes.
Because there will be many different drivers,
the test was set up so each driver drove
along each of the four routes. Below is the
travel time, in minutes, for each driver-route
combination.
Source of Variation   Sum of Squares   df            Mean Square               F
Treatment             SST              k-1           SST/(k-1) = MST           MST/MSE
Block                 SSB              b-1           SSB/(b-1) = MSB           MSB/MSE
Error                 SSE              (k-1)(b-1)    SSE/[(k-1)(b-1)] = MSE
Total                 SS total         n-1
SUM OF SQUARES BLOCKS

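$$SSB = \sum\left(\frac{B_r^2}{n_r}\right) - \frac{\left(\sum X\right)^2}{n}$$
where
$B_r$ is the row total for each block
$n_r$ is the number of observations in each block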
EXAMPLE

Travel Time from Motijheel to Airport (Minutes)
Driver   Route-1   Route-2   Route-3   Route-4
Abul     18        20        20        22
Babul    21        22        24        24
Ajit     20        23        25        23
Peter    25        21        28        25
Shantu   26        24        28        25
At the .05 significance level, is there a difference in the mean travel time along the four
routes and by drivers?
ANOVA TABLE

Source of Variation   Sum of Squares   df   Mean Square   F
Treatments            32.4             3    10.80         10.80/2.383 = 4.53
Blocks                78.2             4    19.550        19.550/2.383 = 8.20
Error                 28.6             12   2.383
Total                 139.2
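The entries in this table can be reproduced from the travel-time data with the sketch below. It treats routes as treatments and drivers as blocks, and it assumes the identity SSE = SS total - SST - SSB, which the slides do not state explicitly. The critical values 3.49 and 3.26 are assumptions taken from a standard F table at the .05 level for (3, 12) and (4, 12) degrees of freedom.

```python
times = {
    # driver: [Route-1, Route-2, Route-3, Route-4]
    "Abul": [18, 20, 20, 22],
    "Babul": [21, 22, 24, 24],
    "Ajit": [20, 23, 25, 23],
    "Peter": [25, 21, 28, 25],
    "Shantu": [26, 24, 28, 25],
}

rows = list(times.values())
b = len(rows)     # number of blocks (drivers)
k = len(rows[0])  # number of treatments (routes)
n = b * k

all_obs = [x for row in rows for x in row]
correction = sum(all_obs) ** 2 / n  # (sum of X)^2 / n

ss_total = sum(x ** 2 for x in all_obs) - correction
# Treatment totals are column sums; each column holds b observations.
sst = sum(sum(row[c] for row in rows) ** 2 / b for c in range(k)) - correction
# Block totals are row sums; each row holds k observations.
ssb = sum(sum(row) ** 2 / k for row in rows) - correction
sse = ss_total - sst - ssb  # assumed identity

mst = sst / (k - 1)
msb = ssb / (b - 1)
mse = sse / ((k - 1) * (b - 1))
f_treatment = mst / mse
f_block = msb / mse

F_CRIT_TREATMENT = 3.49  # assumed table value: alpha = .05, df = (3, 12)
F_CRIT_BLOCK = 3.26      # assumed table value: alpha = .05, df = (4, 12)

print(f"SST = {sst:.1f}, SSB = {ssb:.1f}, SSE = {sse:.1f}, SS total = {ss_total:.1f}")
print(f"F (routes) = {f_treatment:.2f}, F (drivers) = {f_block:.2f}")
print("Routes differ:", f_treatment > F_CRIT_TREATMENT)
print("Drivers differ:", f_block > F_CRIT_BLOCK)
```

Both computed F values exceed the assumed critical values, which suggests differences in mean travel time both among routes and among drivers.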
