
Chapter Five

Analysis of Variance
Analysis of Variance (ANOVA)
• ANOVA is a procedure for testing the hypothesis that several populations from which samples are
drawn have the same mean; that is, it tests the equality of several means.

• It makes inferences about whether the samples were drawn from populations having
the same mean.

• ANOVA is a technique that makes it possible to test the significance of the differences among
more than two sample means.

In order to use ANOVA, we assume the following:
1. All the samples were randomly selected and are independent of one another.
2. The populations are normally distributed. If, however, the sample sizes are large
enough, the assumption of normality is not needed.
3. All the population variances are equal.

❖ ANOVA is based on a comparison of two different estimates of the population variance, σ².
❖ It compares these two estimates by computing their ratio, the F ratio, which follows an F distribution.

The two different estimates of the population variance, σ², are:
1. The variance obtained by calculating the variation within the samples themselves: Mean
Square Within (MSW).
2. The variance obtained by calculating the variation among the sample means: Mean Square
Between (MSB).

• Since both are estimates of σ², they should be approximately equal in value when Ho is
true. If Ho is not true, these two estimates will differ considerably.

❖ The three steps in ANOVA are:
1. Determine one estimate of the population variance from the variation among the sample means.
2. Determine a second estimate of the population variance from the variation within the samples.
3. Compare these two estimates. If they are approximately equal in value, do not reject Ho.
Calculating the Variance among the Sample Means – MSB
• The variance among the sample means is called the Between Column Variance or Mean
Square Between (MSB).
• The sample variance is
$$S^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$
• Because we are working with sample means and the grand mean, substitute $\bar{X}$ for $X$, $\bar{\bar{X}}$
for $\bar{X}$, and K (the number of samples) for n. Then
$$\text{Variance among sample means} = S_{\bar{X}}^2 = \frac{\sum (\bar{X} - \bar{\bar{X}})^2}{K - 1}$$
• In the sampling distribution of the mean, the standard error of the mean is calculated as
$\sigma_{\bar{X}} = \sigma / \sqrt{n}$. Cross multiplying the terms gives $\sigma = \sigma_{\bar{X}} \sqrt{n}$, and squaring both sides gives
$$\sigma^2 = \sigma_{\bar{X}}^2 \cdot n$$
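• In ANOVA, we do not have all the information needed to use the above
equation to find σ². Specifically, we do not know $\sigma_{\bar{X}}^2$. However, we can calculate
the variance among the sample means, $S_{\bar{X}}^2$, using
$$S_{\bar{X}}^2 = \frac{\sum (\bar{X} - \bar{\bar{X}})^2}{K - 1}$$
• So, substituting $S_{\bar{X}}^2$ for $\sigma_{\bar{X}}^2$, an estimate of the population variance is:
$$\text{MSB} = \hat{\sigma}^2 = S_{\bar{X}}^2 \cdot n = \frac{\sum n (\bar{X} - \bar{\bar{X}})^2}{K - 1} = \frac{n \sum (\bar{X} - \bar{\bar{X}})^2}{K - 1}, \quad \text{if } n_1, n_2, \ldots, n_k \text{ are equal.}$$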

If n1, n2, …, nk are not equal, which sample size should we use?
• n represents the sample size, but which sample size should we use when different
samples have different sizes?
• We solve this problem by multiplying each $(\bar{X}_j - \bar{\bar{X}})^2$ by its own sample size $n_j$, and $\hat{\sigma}^2$
becomes:
$$\text{MSB} = \hat{\sigma}^2 = \frac{\sum n_j (\bar{X}_j - \bar{\bar{X}})^2}{K - 1}$$
• Where:
$\hat{\sigma}^2$ = 1st estimate of σ² based on the variation among sample means (the
Between Column Variance – MSB)
$n_j$ = the size of the jth sample, j = 1, 2, 3, …, K
$\bar{X}_j$ = the sample mean of the jth sample
$\bar{\bar{X}}$ = the grand mean
K = the number of samples
K − 1 = the degrees of freedom associated with SSB (ν1)
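The weighted MSB formula above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `mean_square_between` is hypothetical, and the grand mean is computed as the size-weighted average of the sample means, which equals the simple average when all sample sizes are equal. The inputs are the rounded sample sizes and means from Example 1.

```python
def mean_square_between(sizes, means):
    """Between-column variance: sum of n_j * (mean_j - grand_mean)^2 over K - 1."""
    k = len(sizes)
    n_total = sum(sizes)
    # Assumption: grand mean taken as the size-weighted average of the sample means.
    grand_mean = sum(n * m for n, m in zip(sizes, means)) / n_total
    ssb = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means))
    return ssb / (k - 1)  # K - 1 degrees of freedom


# Rounded sizes and means of the three training methods in Example 1.
msb = mean_square_between([6, 6, 6], [45.17, 48.00, 41.67])
print(round(msb, 2))
```

Each squared deviation is weighted by its own sample size, so the same function covers both the equal and the unequal sample size cases.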
Calculating the Variance within the Samples (MSW)
➢ It is based on the variation of the sample observations within each sample.
➢ It is called the Within Column Variance or Mean Square Within (MSW). The sample
variance for each sample is calculated as
$$S^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$

• Since we have assumed that the variances of the populations from which the samples have been
drawn are equal, we could use any one of the sample variances as the second
estimate of the population variance. Statistically, however, we get a better estimate of the
population variance by using a weighted average of all the sample variances.

• The general formula for this second estimate of σ² is:
$$\text{MSW} = \hat{\sigma}^2 = \frac{\sum_{j=1}^{k} (n_j - 1) S_j^2}{n_T - k}$$
If n1, n2, …, nk are equal (each equal to n), this reduces to
$$\text{MSW} = \hat{\sigma}^2 = \frac{(n - 1) \sum_{j=1}^{k} S_j^2}{k (n - 1)} = \frac{\sum_{j=1}^{k} S_j^2}{k}$$
• Where:
$\hat{\sigma}^2$ = 2nd estimate of σ² based on the variation within the samples (the Within
Column Variance – MSW)
$n_j$ = the size of the jth sample
$n_j - 1$ = the degrees of freedom in each sample
$n_T - k$ = the degrees of freedom associated with SSW
$S_j^2$ = the sample variance of the jth sample
k = the number of samples
$n_T$ = Σnj = the total sample size = n1 + n2 + … + nk
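As a rough illustration of the pooled formula, the sketch below computes MSW from sample sizes and sample variances. The function name `mean_square_within` is hypothetical, and the inputs are the rounded variances reported in Example 1. It also checks that, with equal sample sizes, the weighted average collapses to the simple average of the sample variances.

```python
def mean_square_within(sizes, variances):
    """Within-column variance: sum of (n_j - 1) * S_j^2 over n_T - k."""
    k = len(sizes)
    n_total = sum(sizes)
    ssw = sum((n - 1) * s2 for n, s2 in zip(sizes, variances))
    return ssw / (n_total - k)  # n_T - k degrees of freedom


# Rounded sample variances of the three training methods in Example 1.
variances = [30.17, 47.60, 31.07]
msw = mean_square_within([6, 6, 6], variances)

# With equal sample sizes, MSW is just the average of the sample variances.
equal_n_shortcut = sum(variances) / len(variances)
print(round(msw, 2), round(equal_n_shortcut, 2))
```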
• The estimate of the population variance based on the variation between sample means
(MSB) is somewhat suspect because it rests on the notion that all the populations have the
same mean. That is, MSB is a good estimate of σ² only if Ho is true and all
the population means are equal: μ1 = μ2 = μ3 = … = μk.
• If the unknown population means are not equal, and perhaps are radically different from one
another, then the sample means ($\bar{X}_j$) will most likely be radically different from each other
too. This difference will have a marked effect on MSB: the $\bar{X}_j$ values will vary
a great deal and the $(\bar{X}_j - \bar{\bar{X}})^2$ terms will be large. Thus, if the population means are not all
equal, the MSB estimate will be large relative to the MSW estimate. Conversely, if the MSB
is large relative to the MSW, the hypothesis that all the population means are equal is
not likely to be true.
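The effect described above can be seen in a small sketch. The data are hypothetical toy numbers, not taken from the examples, and `mean_squares` is an illustrative helper name. Shifting every observation in one sample by a constant leaves the within-sample variation (MSW) unchanged but inflates MSB, so the ratio MSB/MSW grows.

```python
from statistics import mean, variance


def mean_squares(samples):
    """Return (MSB, MSW) for a list of samples (each a list of observations)."""
    k = len(samples)
    n_total = sum(len(s) for s in samples)
    grand_mean = sum(sum(s) for s in samples) / n_total
    msb = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in samples) / (k - 1)
    msw = sum((len(s) - 1) * variance(s) for s in samples) / (n_total - k)
    return msb, msw


# Hypothetical toy data: three samples with similar means.
base = [[10, 12, 11, 13], [11, 13, 12, 14], [10, 11, 13, 12]]
# Same data, but the third sample is shifted up by 10 to mimic a population
# whose mean differs from the other two.
shifted = [base[0], base[1], [x + 10 for x in base[2]]]

msb_base, msw_base = mean_squares(base)
msb_shift, msw_shift = mean_squares(shifted)
print(msb_base / msw_base, msb_shift / msw_shift)
```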
❖ ANOVA compares the two estimates of the variance, σ², by computing their ratio,
called the F ratio.
• The ratio of the MSB to the MSW is an F value that follows an F probability
distribution.
$$F = \frac{\text{MSB}}{\text{MSW}}$$
❖ As a result:
✓ If Ho is true, the numerator and the denominator are approximately equal, the F ratio is
close to 1, and we are inclined not to reject Ho.
✓ As the F ratio becomes larger, we are more inclined to reject Ho.
THE F-DISTRIBUTION
• Characteristics of F-distribution
1. It is a continuous probability distribution
2. It is unimodal
3. It has two parameters: a pair of degrees of freedom, ν1 and ν2
ν1 = the number of degrees of freedom in the numerator of F-ratio; ν1 = k – 1
ν2 = the number of degrees of freedom in the denominator of F-ratio; ν2 = nT - k
4. It is a positively skewed distribution, and tends to get more symmetrical as the
degrees of freedom in the numerator and denominator increase.
5. The mean of an F-distribution is $\frac{\nu_2}{\nu_2 - 2}$, for ν2 > 2; and
the standard deviation is $\sqrt{\frac{2\nu_2^2 (\nu_1 + \nu_2 - 2)}{\nu_1 (\nu_2 - 2)^2 (\nu_2 - 4)}}$, for ν2 > 4.
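The mean and standard deviation in characteristic 5 can be evaluated with a short sketch. The function names `f_mean` and `f_std` are hypothetical; the standard deviation is taken as the square root of the variance 2ν2²(ν1 + ν2 − 2) / [ν1(ν2 − 2)²(ν2 − 4)]. The degrees of freedom used here (ν1 = 2, ν2 = 15) are those of Example 1.

```python
import math


def f_mean(v2):
    """Mean of an F-distribution; defined only for v2 > 2."""
    if v2 <= 2:
        raise ValueError("the mean requires v2 > 2")
    return v2 / (v2 - 2)


def f_std(v1, v2):
    """Standard deviation of an F-distribution; defined only for v2 > 4."""
    if v2 <= 4:
        raise ValueError("the standard deviation requires v2 > 4")
    var = 2 * v2 ** 2 * (v1 + v2 - 2) / (v1 * (v2 - 2) ** 2 * (v2 - 4))
    return math.sqrt(var)


# Degrees of freedom from Example 1: v1 = K - 1 = 2, v2 = nT - K = 15.
print(f_mean(15), f_std(2, 15))
```

The mean depends only on the denominator degrees of freedom and approaches 1 as ν2 grows, which fits the idea that F is close to 1 when Ho is true.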
Example 1

The training director of a company is trying to evaluate three different methods of
training new employees. The first method assigns each to an experienced employee
for individual help in the factory. The second method puts all new employees in a
training room separate from the factory, and the third method uses training films and
programmed learning materials. The training director chooses 18 new employees
assigned at random to the three training methods and records their daily production
after they complete the programs. Below are productivity measures for individuals
trained by each method.

Daily production, by training method:

                  Method 1    Method 2    Method 3
                     45          59          41
                     40          43          37
                     50          47          43
                     39          51          40
                     53          39          52
                     44          49          37
Total (ΣXi)         271         288         250
Mean (ΣXi / n)     45.17       48.00       41.67
Variance (S²)      30.17       47.60       31.07

n1 = n2 = n3 = 6

Grand mean: $\bar{\bar{X}} = \frac{\sum \bar{X}_j}{k} = 44.94$

Sample variance of the jth method: $S_j^2 = \frac{\sum_i (X_{ji} - \bar{X}_j)^2}{n_j - 1}$

At the 0.05 level of significance, do the three training methods lead to different levels
of productivity?
Solution
1. Ho: μ1 = μ2 = μ3
Ha: μ1, μ2, and μ3 are not all equal
2. α = 0.05
ν1 = K – 1= 3 - 1 = 2 and ν2 = nT - k= 18 – 3 = 15
F0.05, 2,15 = 3.68, Reject Ho if sample F > 3.68
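3. Sample F
$$\text{MSB} = \frac{\sum n_j (\bar{X}_j - \bar{\bar{X}})^2}{K - 1} = \frac{6[(45.17 - 44.94)^2 + (48.00 - 44.94)^2 + (41.67 - 44.94)^2]}{3 - 1} = \frac{120.66}{2} = 60.33$$
$$\text{MSW} = \frac{\sum (n_j - 1) S_j^2}{n_T - K} = \frac{5(30.17 + 47.60 + 31.07)}{15} = \frac{108.84}{3} = 36.28$$
$$F = \frac{\text{MSB}}{\text{MSW}} = \frac{60.33}{36.28} = 1.663$$
4. Do not reject Ho, since 1.663 < 3.68. There is not sufficient evidence that the three training
programs (methods) differ in their effects on employee productivity.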
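The calculation in Example 1 can be cross-checked from the raw data with a short sketch using Python's standard `statistics` module. Because the three samples have the same size, it uses the equal-size forms: MSB as n times the variance of the sample means, and MSW as the average of the sample variances. The variable names are illustrative, and the critical value 3.68 is the tabulated figure quoted in the solution. Working from unrounded means gives an MSB of about 60.39 rather than 60.33, so F differs slightly in the third decimal place.

```python
from statistics import mean, variance

# Daily production of the 18 new employees, by training method.
production = {
    "Method 1": [45, 40, 50, 39, 53, 44],
    "Method 2": [59, 43, 47, 51, 39, 49],
    "Method 3": [41, 37, 43, 40, 52, 37],
}

n = 6  # common sample size
sample_means = [mean(obs) for obs in production.values()]
sample_vars = [variance(obs) for obs in production.values()]

msb = n * variance(sample_means)  # n times the variance among the sample means
msw = mean(sample_vars)           # simple average of the sample variances
f_ratio = msb / msw

critical_f = 3.68  # tabulated F(0.05; 2, 15) quoted in the solution
reject_h0 = f_ratio > critical_f
print(round(msb, 2), round(msw, 2), round(f_ratio, 3), reject_h0)
```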
Example 2
A department store chain is considering building a new store at one of the four
different sites. One of the important factors in the decision is the annual household
income of the residents of the four areas. Suppose that, in a preliminary study, various
residents in each area are asked what their annual household incomes are. The results
are shown in the accompanying table below. Is there sufficient evidence to conclude
that differences exist in the average annual household incomes among the four
communities? Use α = 0.01.

Annual household income, by area:

                  Area 1    Area 2    Area 3    Area 4
                    25        32        27        18
                    27        35        32        23
                    31        30        48        29
                    17        46        25        26
                    29        32        20        42
                    30        22        12
                              19        18
                              51
                              27
Sample size (nj)     6         9         7         5
Total              159       294       182       138
Mean              26.50     32.67     26.00     27.60
Variance (S²)     26.30    107.50    136.33     81.30

Grand mean: $\bar{\bar{X}} = 28.63$

Solution
1. Ho: μ1 = μ2 = μ3 = μ4
Ha: μ1, μ2, μ3 and μ4 are not all equal
2. α = 0.01
ν1 = K - 1= 4 - 1 = 3 and ν2 = nT - k= 27 – 4 = 23
F0.01, 3,23 = 4.76, Reject Ho if sample F > 4.76
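3. Sample F
$$\text{MSB} = \frac{\sum n_j (\bar{X}_j - \bar{\bar{X}})^2}{K - 1} = \frac{6(26.50 - 28.63)^2 + 9(32.67 - 28.63)^2 + 7(26.00 - 28.63)^2 + 5(27.60 - 28.63)^2}{4 - 1} = \frac{227.84}{3} = 75.95$$
$$\text{MSW} = \frac{\sum (n_j - 1) S_j^2}{n_T - K} = \frac{5(26.30) + 8(107.50) + 6(136.33) + 4(81.30)}{27 - 4} = \frac{2134.68}{23} = 92.81$$
$$F = \frac{\text{MSB}}{\text{MSW}} = \frac{75.95}{92.81} = 0.82$$
4. Do not reject Ho, since 0.82 < 4.76. There is not sufficient evidence that the average annual
household incomes differ among the four communities.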
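With unequal sample sizes, the same test can be sketched from the raw data as a small ANOVA table. The helper name `one_way_anova` and the dictionary keys are hypothetical. One assumption is made about the data: the third Area 1 observation is taken as 31, the value consistent with the reported Area 1 total (159), mean (26.50), and variance (26.30). Computed from unrounded means, F again comes out at about 0.82.

```python
from statistics import mean, variance

# Annual household incomes by area.
# Assumption: the third Area 1 value is 31, which matches the reported
# total of 159, mean of 26.50 and variance of 26.30 for that area.
incomes = {
    "Area 1": [25, 27, 31, 17, 29, 30],
    "Area 2": [32, 35, 30, 46, 32, 22, 19, 51, 27],
    "Area 3": [27, 32, 48, 25, 20, 12, 18],
    "Area 4": [18, 23, 29, 26, 42],
}


def one_way_anova(samples):
    """Return sums of squares, degrees of freedom, mean squares and F."""
    k = len(samples)
    n_total = sum(len(s) for s in samples)
    grand_mean = sum(sum(s) for s in samples) / n_total
    ssb = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in samples)
    ssw = sum((len(s) - 1) * variance(s) for s in samples)
    df_between, df_within = k - 1, n_total - k
    msb, msw = ssb / df_between, ssw / df_within
    return {"SSB": ssb, "SSW": ssw, "df_between": df_between,
            "df_within": df_within, "MSB": msb, "MSW": msw, "F": msb / msw}


result = one_way_anova(list(incomes.values()))
critical_f = 4.76  # tabulated F(0.01; 3, 23) quoted in the solution
print(round(result["F"], 2), result["F"] > critical_f)
```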
