
Lecture 2:

ANOVA (Analysis of Variance)

We have already considered comparing one population mean with a specific value and comparing two population means.
Both comparisons use the t-test.

Now assume we have to compare more than two population means (k groups from one factor). If we use
the t-test for each pair of means, we have to conduct C(k,2)=k(k-1)/2 tests. The overall type 1 error
is then no longer kept at the value α; it will be larger.
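
A rough sketch of how quickly the overall error grows. It assumes, for illustration only, that the pairwise tests are independent (they are not exactly, because they share data), so that the chance of at least one false rejection among m tests at level α is 1-(1-α)^m. The function names are hypothetical.

```python
from math import comb


def number_of_pairwise_tests(k: int) -> int:
    """Number of two-mean comparisons among k groups: C(k,2) = k(k-1)/2."""
    return comb(k, 2)


def overall_type1_error(k: int, alpha: float = 0.05) -> float:
    """Approximate chance of at least one false rejection.

    Assumption: the C(k,2) t-tests are treated as independent,
    which is a simplification used only to show the trend.
    """
    m = number_of_pairwise_tests(k)
    return 1 - (1 - alpha) ** m


for k in (3, 5, 10):
    print(k, number_of_pairwise_tests(k), round(overall_type1_error(k), 3))
```

With k=3 groups and α=0.05 there are 3 pairwise tests, and under this simplification the overall type 1 error is already about 0.14 rather than 0.05.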

In this case, we do not use the t-test to identify whether all of the group means are the same. Instead,
we use the F-test from the analysis of variance (ANOVA).

Example: We want to test for the effect of the factor educational level (with three groups: below
high school level, below university level, university level and above) on income. In other words,
we want to look at the effect of a qualitative variable (factor) on a quantitative variable. The method is
to compare the means of the three groups: Ho: µ1 = µ2 = µ3

If the three means are equal, the factor has no effect. Otherwise (at least one mean differs from the others), the factor has an effect.

Look at the data set “Stress” in folder “ch 13 ANOVA”

Real Estate Agent   Architect   Stockbroker
81                  43          65
48                  63          48
68                  60          57
69                  52          91
54                  54          70

In this data set, we want to consider the effect of occupation on the level of stress. A survey was taken
on a sample of 15 observations for each occupation, so the total number of observations is

n=n1+n2+n3=15+15+15=45. Let µ1, µ2 and µ3 be the mean stress levels of the three occupations.

Let $x_{ij}$ be observation j of group i. In general, assume we have a factor with k groups.

Let $\bar{x}$ be the overall mean (grand mean), that is, the mean of all observations.

Let $\bar{x}_i$ be the mean of group i; we have k group means.

$x_{ij} - \bar{x} = (x_{ij} - \bar{x}_i) + (\bar{x}_i - \bar{x})$
The deviation of each observation from the grand mean is divided into two parts: the first
is the deviation of the observation from its own group mean (within-group variation). The second is the deviation
of the group mean from the grand mean (between-group variation).

We can now compare the k group means by comparing the between-group variation with the within-group
variation. The larger the between-group variance is compared to the within-group variance, the more likely
it is that the means of the k groups are different.

To measure the total variation over all observations, we square each deviation and take the sum.
This gives the following relationship:

SST=SSW+SSG

In which:

SST: Total Sum of Squares (Total variation)

SSW: Within Group Sum of Squares

SSG: Between group Sum of Squares
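
Written out, with $\bar{x}$ the grand mean, $\bar{x}_i$ the mean of group i and $n_i$ the number of observations in group i, the three sums of squares are usually defined as below. This is the standard one-way ANOVA form, offered as a sketch of the step the notes skip rather than as the lecture's own derivation.

```latex
\begin{align*}
SST &= \sum_{i=1}^{k}\sum_{j=1}^{n_i} (x_{ij}-\bar{x})^2, \\
SSW &= \sum_{i=1}^{k}\sum_{j=1}^{n_i} (x_{ij}-\bar{x}_i)^2, \\
SSG &= \sum_{i=1}^{k} n_i\,(\bar{x}_i-\bar{x})^2 .
\end{align*}
% Squaring x_{ij}-\bar{x} = (x_{ij}-\bar{x}_i) + (\bar{x}_i-\bar{x}) and summing
% leaves a cross term that vanishes, because the deviations inside each group
% sum to zero:
\[
2\sum_{i=1}^{k}(\bar{x}_i-\bar{x})\sum_{j=1}^{n_i}(x_{ij}-\bar{x}_i) = 0
\quad\Longrightarrow\quad SST = SSW + SSG .
\]
```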

Now we can compare SSG and SSW by taking their ratio. Before that, we average each sum of squares by dividing it by its degrees of freedom.

We have MSW=SSW/(n-k); MSG=SSG/(k-1): they are the mean squares of the two SS.

We calculate the F statistic: F=MSG/MSW ~ Fisher(k-1, n-k) when Ho is true.

Conclusion: Reject the hypothesis Ho (means are equal) if F > Fα(k-1, n-k)

⇒ There is an effect of the factor on the dependent variable.
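
The whole procedure can be sketched in a few lines. The example uses only the five rows printed in the table, not the full "Stress" data set with 15 observations per occupation, so its numbers are illustrative and are not the lecture's result. The function names are hypothetical. The critical-value formula is a closed form that holds only when the numerator degrees of freedom are k-1=2; for other k, an F table or a statistics library would be needed.

```python
# Only the five rows shown in the table; the full data set has 15 per group.
groups = {
    "Real Estate Agent": [81, 48, 68, 69, 54],
    "Architect": [43, 63, 60, 52, 54],
    "Stockbroker": [65, 48, 57, 91, 70],
}


def one_way_anova(samples):
    """Return (SSG, SSW, SST, F) for a list of groups of observations."""
    k = len(samples)
    n = sum(len(g) for g in samples)
    grand_mean = sum(sum(g) for g in samples) / n
    means = [sum(g) / len(g) for g in samples]
    # Between-group: group means around the grand mean, weighted by group size.
    ssg = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(samples, means))
    # Within-group: observations around their own group mean.
    ssw = sum((x - m) ** 2 for g, m in zip(samples, means) for x in g)
    # Total: observations around the grand mean.
    sst = sum((x - grand_mean) ** 2 for g in samples for x in g)
    msg = ssg / (k - 1)
    msw = ssw / (n - k)
    return ssg, ssw, sst, msg / msw


def critical_value_df1_2(alpha, df2):
    """F_alpha(2, df2); closed form valid only for numerator df = 2 (k = 3)."""
    return df2 / 2 * (alpha ** (-2 / df2) - 1)


samples = list(groups.values())
n = sum(len(g) for g in samples)
ssg, ssw, sst, f_stat = one_way_anova(samples)
f_crit = critical_value_df1_2(0.05, df2=n - len(samples))

print(f"SSG={ssg:.2f} SSW={ssw:.2f} SST={sst:.2f}")
print(f"F={f_stat:.3f}, critical value={f_crit:.3f}")
print("Reject Ho" if f_stat > f_crit else "Do not reject Ho")
```

On these five rows F is about 1.20, below the 5% critical value of about 3.89 for (2, 12) degrees of freedom, so Ho would not be rejected for this excerpt. The full data set may lead to a different conclusion.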
