# Analysis of Variance (ANOVA)
• The analysis of variance, frequently referred to as ANOVA, is a statistical technique specially designed to test whether the means of more than two quantitative populations are equal. The analysis is capable of fruitful application to a diversity of practical problems.

• Basically, it consists of classifying and cross-classifying statistical results and testing whether the means of a specified classification differ significantly. In this way it is determined whether the given classification is important in affecting the results.

• For example, the output of a given process might be cross-classified by machines and operators (each operator having worked on each machine). From this classification it could be determined whether the mean qualities of outputs of the various operators differed significantly. Also, it could independently be determined whether the mean qualities of outputs of the various machines differed significantly.

• Such a study would help us in determining whether uniformity in quality of outputs could be increased by standardizing the procedures of the operators (say, through special training) and likewise whether it could be increased by standardizing the machines.

• Analysis of variance thus enables us to analyse the total variance of our data into components which may be attributed to various 'sources' or 'causes' of variation.

• The analysis of variance originated in agrarian research and its language is thus loaded with agricultural terms like 'blocks' (referring to land) and 'treatments' (referring to populations or samples).


## Assumptions in ANOVA
• ANOVA is based on the following assumptions:

(i) Normality: The universe from which the sample is drawn is normally distributed.

(ii) Homogeneity: The variances of the populations from which the samples have been taken do not significantly differ from one another. In other words, the null hypothesis is $H_0: \sigma_1^2 = \sigma_2^2 = \dots = \sigma_k^2$.

(iii) Independence of error: The samples drawn from the universe are random and independent of each other.

• In the problems faced in actual life, these assumptions may or may not hold good. However, minor departures from the assumptions do not affect the validity of the F test, unless the universes are highly skewed.

## Techniques of Analyzing Variance
• For the sake of clarity, the technique of analysis of variance has been classified as (i) one-way classification and (ii) two-way classification.

## One-Way Classification
• In a one-way classification, the data are classified according to one criterion.
• The null hypothesis is $H_0: \mu_1 = \mu_2 = \mu_3 = \dots = \mu_k$, and the alternative hypothesis $H_1$ is that the means are not all equal (often written loosely as $\mu_1 \neq \mu_2 \neq \mu_3 \neq \dots \neq \mu_k$).
• The null hypothesis means that the arithmetic means of the populations from which the $k$ samples were randomly drawn are equal to one another.

## Steps in Carrying out the Analysis
(I) Calculate the Variance Between the Samples

(i) The variance between samples (groups) measures the difference between the sample mean of each group and the overall mean, weighted by the number of observations in each group.

(ii) The variance between samples takes into account the random variations from observation to observation. It also measures the difference from one group to another.

(iii) The sum of the squares between samples is denoted by SSC.

(iv) For calculating the variance between the samples, we take the total of the squares of the deviations of the means of the various samples from the grand average and divide this total by the degrees of freedom.

Thus, the steps in calculating the variance between samples are:

(a) Calculate the mean of each sample, i.e. $\bar{X}_1$, $\bar{X}_2$, etc.

(b) Calculate the grand average $\bar{\bar{X}}$, pronounced 'X double bar'. Its value is obtained as follows, where $\sum X_i$ is the sum of the items in sample $i$ and $N_i$ is the number of items in that sample:

$$\bar{\bar{X}} = \frac{\sum X_1 + \sum X_2 + \sum X_3 + \dots}{N_1 + N_2 + N_3 + \dots}$$

(c) Take the difference between the means of the various samples and the grand average.

(d) Square these deviations, weight each by the number of observations in its sample, and obtain the total, which gives the sum of the squares between the samples; and

(e) Divide the total obtained in step (d) by the degrees of freedom. The degrees of freedom will be one less than the number of samples, i.e. if there are 4 samples, the degrees of freedom will be 4 − 1 = 3, or $\nu = k - 1$, where $k$ is the number of samples.

## Calculation of Variance Within Samples
• The variance (or sum of squares) within samples measures those differences within the samples that are due to chance only. It is denoted by SSE.
• The variance within samples (groups) measures the variability around the mean of each group.
• The steps taken in calculating the variance within the samples are as follows:

(i) Calculate the mean value of each sample, i.e. $\bar{X}_1$, $\bar{X}_2$, etc.

(ii) Take the deviations of the various items in a sample from the mean value of the respective sample.

(iii) Square these deviations and obtain the total, which gives the sum of the squares within the samples; and

(iv) Divide the total obtained in step (iii) by the degrees of freedom. The degrees of freedom are obtained by deducting the number of samples from the total number of items, i.e. $\nu = n - k$, where $k$ refers to the number of samples and $n$ to the total number of observations.

## Calculation of the F Ratio

$$F = \frac{\text{Between-column variance}}{\text{Within-column variance}}$$

Symbolically,

$$F = \frac{S_1^2}{S_2^2}$$

• Compare the calculated value of F with the table value of F for the given degrees of freedom at a certain critical level (generally taken to be the 5 percent level of significance). If the calculated value of F is greater than the table value, the difference in the sample means is taken to be significant. On the other hand, if the calculated value of F is less than the table value, the difference is taken as not significant and may have arisen due to fluctuations of sampling.
• It is customary to summarise the calculations for the sums of squares, together with the numbers of degrees of freedom and the mean squares, in a table called the 'analysis of variance table'. Its layout for the one-way classification model is:

Analysis of Variance (ANOVA) Table: One-Way Classification Model

| Source of Variation | SS (Sum of Squares) | ν (Degrees of Freedom) | MS (Mean Square) | Variance Ratio F |
|---|---|---|---|---|
| Between samples | SSC | $\nu_1 = c - 1$ | MSC = SSC/(c − 1) | MSC/MSE |
| Within samples | SSE | $\nu_2 = n - c$ | MSE = SSE/(n − c) | |
| Total | SST | $n - 1$ | | |

Here $c$ is the number of samples (columns) and $n$ is the total number of observations. SST is the total sum of squares of variations and SSC is the sum of squares between samples (columns).

SSE is the sum of squares within samples (rows), MSC is the mean sum of squares between samples, and MSE is the mean sum of squares within samples.

## Example
• To assess the significance of possible variation in performance in a certain test between the grammar schools of a city, a common test was given to a number of students taken at random from the senior fifth class of each of the four schools concerned. The results are given below. Make an analysis of variance of the data.

Schools

| A | B | C | D |
|---|---|---|---|
| 8 | 12 | 18 | 13 |
| 10 | 11 | 12 | 9 |
| 12 | 9 | 16 | 12 |
| 8 | 14 | 6 | 16 |
| 7 | 4 | 8 | 15 |

Solution

| | Sample 1 ($X_1$) | Sample 2 ($X_2$) | Sample 3 ($X_3$) | Sample 4 ($X_4$) |
|---|---|---|---|---|
| | 8 | 12 | 18 | 13 |
| | 10 | 11 | 12 | 9 |
| | 12 | 9 | 16 | 12 |
| | 8 | 14 | 6 | 16 |
| | 7 | 4 | 8 | 15 |
| Total | 45 | 50 | 60 | 65 |
| Mean $\bar{X}$ | 9 | 10 | 12 | 13 |

∴ Grand mean: since all four samples have the same size, the grand mean equals the average of the sample means,

$$\bar{\bar{X}} = \frac{\bar{X}_1 + \bar{X}_2 + \bar{X}_3 + \bar{X}_4}{4} = \frac{9 + 10 + 12 + 13}{4} = 11$$

which agrees with the total of all items divided by their number, 220/20 = 11.

## Variance Between Samples
• To obtain the variation between samples, calculate the squares of the deviations of the means of the various samples from the grand average. The mean of sample 1 is 9 but the grand mean is 11, so this difference is taken and squared.

• Likewise for sample 2, the mean is 10 and the grand mean is 11, so the difference between 10 and 11 is taken and squared. Repeating the procedure for the remaining samples, we get the following table, in which each squared deviation is entered once for every item in the sample.

| | Sample 1: $(\bar{X}_1 - \bar{\bar{X}})^2$ | Sample 2: $(\bar{X}_2 - \bar{\bar{X}})^2$ | Sample 3: $(\bar{X}_3 - \bar{\bar{X}})^2$ | Sample 4: $(\bar{X}_4 - \bar{\bar{X}})^2$ |
|---|---|---|---|---|
| Sample mean | (9) | (10) | (12) | (13) |
| | 4 | 1 | 1 | 4 |
| | 4 | 1 | 1 | 4 |
| | 4 | 1 | 1 | 4 |
| | 4 | 1 | 1 | 4 |
| | 4 | 1 | 1 | 4 |
| Total | 20 | 5 | 5 | 20 |

∴ Sum of squares between samples = 20 + 5 + 5 + 20 = 50

∴ Mean sum of squares between samples $= \frac{50}{4 - 1} = \frac{50}{3} = 16.67$ (because the degrees of freedom here are 3)

## Variance Within Samples
• Here we find the sum of the squares of the deviations of the various items in a sample from the mean value of the respective sample. Thus, for the first sample the mean is 9, so we take the deviations of the items of that sample from 9, and so on. The squared deviations are given in the following table.

| | $X_1$ | $(X_1 - \bar{X}_1)^2$ | $X_2$ | $(X_2 - \bar{X}_2)^2$ | $X_3$ | $(X_3 - \bar{X}_3)^2$ | $X_4$ | $(X_4 - \bar{X}_4)^2$ |
|---|---|---|---|---|---|---|---|---|
| | 8 | 1 | 12 | 4 | 18 | 36 | 13 | 0 |
| | 10 | 1 | 11 | 1 | 12 | 0 | 9 | 16 |
| | 12 | 9 | 9 | 1 | 16 | 16 | 12 | 1 |
| | 8 | 1 | 14 | 16 | 6 | 36 | 16 | 9 |
| | 7 | 4 | 4 | 36 | 8 | 16 | 15 | 4 |
| Total | | 16 | | 58 | | 104 | | 30 |

∴ Total sum of squares within the samples = 16 + 58 + 104 + 30 = 208
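As a cross-check on the hand calculation, the following is a minimal sketch that recomputes the sums of squares for the four samples of this example, together with the mean squares and the F ratio that follow from them. The helper name `one_way_sums_of_squares` is hypothetical (it is not from the original text), and the sketch uses plain Python rather than any statistics library.

```python
# Test results of the four samples (schools A to D) from the example.
samples = [
    [8, 10, 12, 8, 7],     # Sample 1
    [12, 11, 9, 14, 4],    # Sample 2
    [18, 12, 16, 6, 8],    # Sample 3
    [13, 9, 12, 16, 15],   # Sample 4
]


def one_way_sums_of_squares(groups):
    """Return (SSC, SSE, SST) for a list of samples (hypothetical helper)."""
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Between samples: squared deviation of each sample mean from the
    # grand mean, weighted by the number of items in the sample.
    ssc = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within samples: squared deviations of items from their own sample mean.
    sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    # Total: squared deviations of every item from the grand mean.
    sst = sum((x - grand_mean) ** 2 for g in groups for x in g)
    return ssc, sse, sst


ssc, sse, sst = one_way_sums_of_squares(samples)
k = len(samples)                    # number of samples
n = sum(len(g) for g in samples)    # total number of observations
msc = ssc / (k - 1)                 # mean square between samples
mse = sse / (n - k)                 # mean square within samples
f_ratio = msc / mse

print("SSC, SSE, SST:", ssc, sse, sst)
print("MSC, MSE, F:", round(msc, 2), mse, round(f_ratio, 2))
```

The identity SSC + SSE = SST gives a quick arithmetic check on any one-way table: if the three figures do not add up, one of the sums of squares has been miscalculated.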

∴ Mean sum of squares within the samples $= \frac{208}{20 - 4} = \frac{208}{16} = 13$

• It is advisable to check the calculations by finding the total variation. The total variation is calculated by taking the squares of the deviations of each item from the grand average.

| | $X_1$ | $(X_1 - \bar{\bar{X}})^2$ | $X_2$ | $(X_2 - \bar{\bar{X}})^2$ | $X_3$ | $(X_3 - \bar{\bar{X}})^2$ | $X_4$ | $(X_4 - \bar{\bar{X}})^2$ |
|---|---|---|---|---|---|---|---|---|
| | 8 | 9 | 12 | 1 | 18 | 49 | 13 | 4 |
| | 10 | 1 | 11 | 0 | 12 | 1 | 9 | 4 |
| | 12 | 1 | 9 | 4 | 16 | 25 | 12 | 1 |
| | 8 | 9 | 14 | 9 | 6 | 25 | 16 | 25 |
| | 7 | 16 | 4 | 49 | 8 | 9 | 15 | 16 |
| Total | 45 | 36 | 50 | 63 | 60 | 109 | 65 | 50 |
| Mean | (9) | (11) | (10) | (11) | (12) | (11) | (13) | (11) |

∴ Total sum of squares = 36 + 63 + 109 + 50 = 258

• The degrees of freedom = 20 − 1 = 19.
• Thus, when we add the sum of squares between samples and the sum of squares within samples, we get the same total: 50 + 208 = 258. Thus, our calculation is correct.
• Now all the above results are tabulated as follows.

| Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square |
|---|---|---|---|
| Between samples | 50 | 3 | 16.67 |
| Within samples | 208 | 16 | 13.0 |
| Total | 258 | 19 | |

$$\therefore F = \frac{\text{Variance between samples}}{\text{Variance within samples}} = \frac{16.67}{13} = 1.28$$

• The table value of F for $\nu_1 = 3$ and $\nu_2 = 16$ at the 5 percent level of significance is 3.24.
• The calculated value of F is less than the table value, hence the difference in the mean values of the samples is not significant. Thus, we can say that the samples could have come from the same universe.

## Analysis of Variance: Two-Way Classification Model

• In a one-factor analysis of variance, the treatment constitutes different levels of a single factor which is controlled in the experiment. There could be many situations in which the response variable of interest may be affected by more than one factor. For example, petrol mileage may be affected by the car driven, the way it is driven, road conditions and other factors in addition to the brand of petrol used. Similarly, the sale of cosmetics, in addition to being affected by the point of display, might also be affected by the price charged, the size and/or location of the store, or the number of competitive products sold by the store.
• When it is believed that two independent factors might have an effect on the response variable of interest, it is possible to design the test so that an analysis of variance can be used to test the effects of the two factors simultaneously. Such a test is called a two-factor analysis of variance. With two-factor analysis of variance, we can test two sets of hypotheses with the same data at the same time.
• In a two-way classification, the data are classified according to two different criteria or factors.
• In a two-way classification, the analysis of variance table takes the following form.

| Source of Variation | Sum of Squares | Degrees of Freedom | Mean Sum of Squares | Ratio of F |
|---|---|---|---|---|
| Between samples (columns) | SSC | (c − 1) | MSC = SSC/(c − 1) | MSC/MSE |
| Between rows | SSR | (r − 1) | MSR = SSR/(r − 1) | MSR/MSE |
| Residual or error | SSE | (c − 1)(r − 1) | MSE = SSE/((r − 1)(c − 1)) | |
| Total | SST | n − 1 | | |

where

SSC = sum of squares between columns
SSR = sum of squares between rows
SSE = sum of squares due to error
SST = total sum of squares

• The F value for the column factor is calculated as

$$F(\nu_1, \nu_2) = \frac{MSC}{MSE}$$

where $\nu_1 = (c - 1)$ and $\nu_2 = (c - 1)(r - 1)$.

• The F value for the row factor is calculated as

$$F(\nu_1, \nu_2) = \frac{MSR}{MSE}$$

where $\nu_1 = (r - 1)$ and $\nu_2 = (c - 1)(r - 1)$.

• It should be carefully noted that $\nu_1$ may not be the same in both cases: in one case $\nu_1 = (c - 1)$ and in the other $\nu_1 = (r - 1)$.
• The calculated values of F are compared with the table values. If the calculated value of F is greater than the table value at a pre-assigned level of significance, the null hypothesis is rejected; otherwise it is accepted.

## Example
• A tea company appoints four salesmen A, B, C and D and observes their sales in three seasons: summer, winter and monsoon. The figures (in lakhs) are given in the following table.

| Seasons | Salesman A | Salesman B | Salesman C | Salesman D | Season's Total |
|---|---|---|---|---|---|
| Summer | 36 | 36 | 21 | 35 | 128 |
| Winter | 28 | 29 | 31 | 32 | 120 |
| Monsoon | 26 | 28 | 29 | 29 | 112 |
| Total | 90 | 93 | 81 | 96 | 360 |

(i) Do the salesmen significantly differ in performance?

(ii) Is there a significant difference between the seasons?

Solution
• The above data are classified according to two criteria, salesmen and seasons. Further, in order to simplify the calculations, we code the data by subtracting 30 from each figure. The data in coded form are summarized as follows.

| Seasons | Salesman A | Salesman B | Salesman C | Salesman D | Season's Total |
|---|---|---|---|---|---|
| Summer | +6 | +6 | −9 | +5 | +8 |
| Winter | −2 | −1 | +1 | +2 | 0 |
| Monsoon | −4 | −2 | −1 | −1 | −8 |
| Total | 0 | +3 | −9 | +6 | 0 |
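Coding the data by subtracting a constant is safe because it shifts every figure and the overall mean by the same amount, so the deviations, and hence every sum of squares, are unchanged. The following minimal sketch illustrates this with the sales figures of the example. The helper `total_sum_of_squares` and the constant name `CODE_ORIGIN` are hypothetical names chosen for illustration.

```python
# Sales (in lakhs): one row per season, one column per salesman (A, B, C, D).
sales = [
    [36, 36, 21, 35],   # Summer
    [28, 29, 31, 32],   # Winter
    [26, 28, 29, 29],   # Monsoon
]
CODE_ORIGIN = 30  # constant subtracted from every figure

coded = [[x - CODE_ORIGIN for x in row] for row in sales]


def total_sum_of_squares(table):
    """Sum of squared deviations of all items from their overall mean."""
    items = [x for row in table for x in row]
    mean = sum(items) / len(items)
    return sum((x - mean) ** 2 for x in items)


season_totals = [sum(row) for row in coded]            # row totals
salesman_totals = [sum(col) for col in zip(*coded)]    # column totals

print("coded data:", coded)
print("season totals:", season_totals)
print("salesman totals:", salesman_totals)
# The total sum of squares is the same before and after coding.
print(total_sum_of_squares(sales), total_sum_of_squares(coded))
```

Choosing 30 as the origin is convenient here because it is the overall mean of the twelve figures (360/12), which makes the grand total of the coded data zero.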

∴ Correction factor $= \frac{T^2}{N} = \frac{0^2}{12} = 0$ (the number of items, N, is 12)

## Sum of Squares Between Salesmen
• This is obtained by squaring the salesmen's totals, dividing each squared total by the number of items included in it, adding these figures and subtracting the correction factor from them.

∴ Thus, sum of squares between salesmen

$$= \frac{(0)^2}{3} + \frac{(3)^2}{3} + \frac{(-9)^2}{3} + \frac{(6)^2}{3} - \frac{(0)^2}{12} = 0 + 3 + 27 + 12 - 0 = 42$$

$$\therefore \nu = (c - 1) = (4 - 1) = 3$$

## Sum of Squares Between Seasons
• This is obtained by dividing the squares of the seasons' totals by the number of items that make up each total, adding all such figures and subtracting the correction factor from them.

∴ Thus, the sum of squares between seasons

$$= \frac{(8)^2}{4} + \frac{(0)^2}{4} + \frac{(-8)^2}{4} - \frac{T^2}{N} = 16 + 0 + 16 - 0 = 32$$

$$\therefore \nu = (r - 1) = (3 - 1) = 2$$

## Total Sum of Squares
• This is obtained by adding the squares of all the items in the table and subtracting the correction factor from the total.

∴ Thus, total sum of squares

$$= (6)^2 + (-2)^2 + (-4)^2 + (6)^2 + (-1)^2 + (-2)^2 + (-9)^2 + (1)^2 + (-1)^2 + (5)^2 + (2)^2 + (-1)^2 - \frac{T^2}{N} = 210 - 0 = 210$$

$$\therefore \nu = (n - 1) = (12 - 1) = 11$$
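The three sums of squares above can be reproduced with a short script. This is a minimal sketch under the same correction-factor method, using the coded data of the example. The variable names are illustrative, and the residual is obtained by subtraction, which assumes one observation per salesman-season cell as in this example.

```python
# Coded sales (original figure minus 30): rows are seasons, columns salesmen.
coded = [
    [6, 6, -9, 5],      # Summer
    [-2, -1, 1, 2],     # Winter
    [-4, -2, -1, -1],   # Monsoon
]
r = len(coded)         # number of rows (seasons)
c = len(coded[0])      # number of columns (salesmen)
n = r * c              # total number of items

grand_total = sum(sum(row) for row in coded)
correction = grand_total ** 2 / n          # T^2 / N

col_totals = [sum(col) for col in zip(*coded)]   # salesmen's totals
row_totals = [sum(row) for row in coded]         # seasons' totals

# Each squared total is divided by the number of items it contains.
ss_columns = sum(t ** 2 / r for t in col_totals) - correction
ss_rows = sum(t ** 2 / c for t in row_totals) - correction
ss_total = sum(x ** 2 for row in coded for x in row) - correction
# Residual (error) sum of squares: what remains of the total.
ss_error = ss_total - ss_columns - ss_rows

print("between salesmen:", ss_columns)
print("between seasons:", ss_rows)
print("total:", ss_total)
print("residual:", ss_error)
```

Note that each salesman's total contains r = 3 items (one per season) and each season's total contains c = 4 items (one per salesman), which is why the divisors differ between the two sums.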

• Thus, the above information is presented in the following analysis of variance table.

| Source of Variation | Sum of Squares | df (ν) | Mean Squares |
|---|---|---|---|
| Between samples (salesmen) | 42 | 3 | 14 |
| Between rows (seasons) | 32 | 2 | 16 |
| Residual | 136 | 6 | 22.67 |
| Total | 210 | 11 | |

• Now let us take the hypothesis that there is no difference between the sales of the salesmen and of the seasons, or in other words, that the three independent estimates of variance are estimates of the variance of a common population.
• First, compare the salesmen variance estimate with the residual variance estimate. Thus,

$$F = \frac{MSC}{MSE} = \frac{14}{22.67} = 0.618$$

• The table value of F for $\nu_1 = 3$ and $\nu_2 = 6$ at the 5% level of significance is 4.76. The calculated value is less than the table value, and we conclude that the sales of the different salesmen do not differ significantly.
• Now, let us compare the season variance estimate with the residual estimate. Thus,

$$F = \frac{MSR}{MSE} = \frac{16}{22.67} = 0.706$$

• The critical value of F for $\nu_1 = 2$ and $\nu_2 = 6$ at the 5% level of significance is 5.14. The calculated value is less than the table value, hence we conclude that the difference is not significant. Hence, we say that the sales in the different seasons do not differ significantly.
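The final comparison can be sketched in a few lines. This sketch assumes the ratio is formed as in the two-way formulas F = MSC/MSE and F = MSR/MSE, with each factor's mean square over the residual mean square, and it takes the 5% table values 4.76 and 5.14 as quoted in the text rather than computing them. The dictionary names are illustrative only.

```python
# Sums of squares and degrees of freedom from the two-way ANOVA table.
ss = {"salesmen": 42, "seasons": 32, "residual": 136}
df = {"salesmen": 3, "seasons": 2, "residual": 6}
# 5% table values of F quoted in the text for (3, 6) and (2, 6) d.f.
table_value = {"salesmen": 4.76, "seasons": 5.14}

# Mean square = sum of squares / degrees of freedom.
ms = {name: ss[name] / df[name] for name in ss}

results = {}
for factor in ("salesmen", "seasons"):
    f_ratio = ms[factor] / ms["residual"]
    # The null hypothesis is rejected only if F exceeds the table value.
    results[factor] = (round(f_ratio, 3), f_ratio > table_value[factor])

for factor, (f_ratio, significant) in results.items():
    print(factor, "F =", f_ratio, "significant:", significant)
```

If the larger variance estimate is placed in the numerator instead, as some texts do, the ratios become 22.67/14 = 1.619 and 22.67/16 = 1.417. Both are still below the table values, so the conclusion is the same either way.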