16
ANALYSIS OF VARIANCE (ANOVA)
AND REGRESSION ANALYSIS

16.1 INTRODUCTION

While explaining the simple linear regression model and the multiple regression model, we used the Z-test for large samples and the t-test for small samples to test the statistical significance of a single regression coefficient ($\alpha$ or $\beta$) separately. But when we are interested in a joint or overall test of significance of the estimated regression coefficients (i.e. all slope coefficients simultaneously), we have to explain goodness of fit, or the coefficient of determination ($R^2$). Goodness of fit is a measure that helps us to know how well the regression coefficients or parameters (not the intercept) have explained the variation in the dependent variable Y. The total variation in the observed Y values about their mean value can be split into the variation due to the explanatory variable or regressor and the residual or unexplained variation of Y.

So TSS = ESS + RSS

A study of these components of TSS is known as the Analysis of Variance (ANOVA) from the regression point of view.

On the basis of the t-test or a confidence interval, it is possible to accept the hypothesis that an individual slope coefficient ($\beta$) is zero and yet reject the joint hypothesis that all slope coefficients are simultaneously zero. Hence we cannot use the t-test for the overall significance of the estimated simple and multiple regression models, and we go to ANOVA for this.
16.2 THE ANALYSIS OF VARIANCE (ANOVA) AS A STATISTICAL METHOD
Now, we will explain the statistical tool of ANOVA and then use it for our regression analysis. One of the elegant, powerful and useful techniques in modern statistical method is the Analysis of Variance (ANOVA), as developed by R.A. Fisher. It is also known as the F-test, as developed by G.W. Snedecor. To test the equality of two population means, we have the Z-test (large sample) and the t-test (small sample). But to test the equality of means among three or more populations, we cannot use the Z or t test.

294 Introductory Econometrics

We can use ANOVA to test for the equality of k (k ≥ 3) population means, using data observed from an observational or experimental design and also from a completely randomized study. ANOVA is used to determine whether there is any statistically significant difference between the means of three or more independent (unrelated) groups. In case of three or more groups, we cannot use the t-test to compare means.
Problems of t-test in case of many groups
1. There is a need to apply many t-tests to compare the groups, i.e. a comparison between the 1st group and 2nd group, the 2nd group and 3rd group, and the 3rd group and 1st group. For k groups, we have to apply $k(k-1)/2$ t-tests for comparison.
2. Every time a t-test is conducted there is a chance of making a Type I error. For one t-test the chance of a Type I error is 5%; for two t-tests the chance of at least one such error rises to roughly 10%, and it continues to rise to roughly 15% for three uses of the t-test. To avoid this problem, we use ANOVA, which gives a statistically significant result at a 5% Type I error.
3. A t-test compares only one pair of group means at a time, whereas ANOVA evaluates both the BETWEEN-group and WITHIN-group variances and calculates their ratio.
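The growth in the number of t-tests and in the Type I error can be illustrated with a small sketch. It assumes the individual tests are independent, which is a simplification introduced here and not stated in the text; the function names are illustrative. Under that assumption the chance of at least one Type I error in m tests is $1 - (1 - \alpha)^m$, which is slightly below the rounded 10% and 15% figures quoted above.

```python
from math import comb


def pairwise_tests(k: int) -> int:
    """Number of two-group comparisons among k groups, i.e. k(k-1)/2."""
    return comb(k, 2)


def familywise_error(m: int, alpha: float = 0.05) -> float:
    """Chance of at least one Type I error in m tests at level alpha.

    Assumes the m tests are independent (a simplifying assumption).
    """
    return 1 - (1 - alpha) ** m


# Three groups need three pairwise t-tests.
print(pairwise_tests(3))

# Error rate after one, two and three t-tests at the 5% level.
for m in (1, 2, 3):
    print(m, round(familywise_error(m), 4))
```

With five groups the count already reaches `pairwise_tests(5) == 10` comparisons, which is why a single F-test is preferred.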
Because of these problems, ANOVA is the most important statistical procedure to compare more than two populations, or a population having more than two subgroups (samples). In other words, ANOVA compares the means between the groups and determines whether any of those means are statistically significantly different from each other. To be very frank, we may say that ANOVA determines whether all groups are taken from a common (same) population or not.

ANOVA is the ratio between the "mean sum of squares between groups" (MSSB) and the "mean sum of squares within groups" (MSSW). So

$$F = \frac{MSSB}{MSSW} = \frac{\text{Between variance}}{\text{Within variance}} = \frac{\text{Variability between means}}{\text{Variability within the distribution}}$$
Variability between groups: It means the totality of variations from one group to another, i.e. variation due to groups is called variability between groups.
Variability within groups: The variation among the observations of each specific group is called its internal variation, and the totality of the internal variation is called variability within groups. It can be easily explained by the following table and diagram.
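Table 16.1

Group A    Group B    Group C
19         25
32         30         33
58         51         50
59         66         52
94         70

[Arrows in the table: vertical arrows beside each column mark the variation within a group; horizontal arrows between columns mark the variation between groups.]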
[Figure: a population from which Sample A, Sample B and Sample C are drawn; arrows mark the variation within each sample and the variation between samples.]
There are three groups (samples) drawn from a population, and each sample has its own mean and variance (standard deviation). The vertical arrow lines near the numbers in the table show the variance within Group A, Group B and Group C. But the horizontal arrow lines between Group A, Group B and Group C show the variance between groups.

In the figure, $\mu$ is the population mean and $\bar{X}_1$, $\bar{X}_2$ and $\bar{X}_3$ show the sample means. The difference between the means $\bar{X}_1$, $\bar{X}_2$, $\bar{X}_3$ and $\mu$ is the variance between groups, but the arrow inside each group (sample) is the variance within that group.
Assumptions of ANOVA

Random selection: Samples are selected randomly.

Normal distribution: The observations in each group should be independent and normally distributed.

Homogeneity of variances: All sub-populations (groups / samples) have the same variance (homoscedastic): $\sigma_1^2 = \sigma_2^2 = \sigma_3^2 = \dots = \sigma_k^2$

Additivity of variances: Total variance should be equal to the sum of the between variance and the within variance.

Formulating Hypothesis
It tests the null hypothesis $H_0: \mu_1 = \mu_2 = \mu_3 = \dots = \mu_k$, i.e. there is no significant difference between the means of all groups (all groups have the same mean), where $\mu$ = group mean and k = number of groups. The alternative hypothesis is $H_1$: not all of $\mu_1, \mu_2, \dots, \mu_k$ are equal, i.e. there are at least two group means that are statistically significantly different from each other.
Classification of ANOVA
Observations in sample data in ANOVA are classified according to one factor or two factors. If the data are classified according to one factor, it is called one-way ANOVA, and if they are classified according to two factors, it is called two-way ANOVA. We here take only one-way ANOVA for our regression analysis. The ANOVA summary table is given below for our guidance to test the overall significance of parameters.
Table 16.2 Analysis of variance summary table (for one-way classification)

Sources of variation   | Sum of squares (SS) | Degrees of freedom (dof) | Mean sum of squares (MSS) | F-calculated
Between groups         | SSB                 | k-1                      | MSSB = SSB/(k-1)          | F = MSSB/MSSW
Within groups (error)  | SSW                 | n-k                      | MSSW = SSW/(n-k)          |
Total                  | TSS                 | n-1                      |                           |

F-tabulated: read the F table for (k-1) and (n-k) degrees of freedom at the 5% or 1% level of significance. If F-calculated > F-table, we reject H0.
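The entries of the one-way summary table can be computed mechanically. The following is a minimal sketch, not a reference implementation: the function name `one_way_anova`, the dictionary keys and the small data set are all illustrative assumptions chosen so that the arithmetic is easy to follow by hand.

```python
def one_way_anova(groups):
    """Return the entries of a one-way ANOVA summary table.

    groups: list of lists, one inner list of observations per group.
    """
    k = len(groups)                      # number of groups
    n = sum(len(g) for g in groups)      # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]

    # Between-group sum of squares: group size times squared deviation
    # of each group mean from the grand mean.
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: squared deviations from own group mean.
    ssw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    # Total sum of squares, computed directly to check additivity.
    tss = sum((x - grand_mean) ** 2 for g in groups for x in g)

    mssb = ssb / (k - 1)
    mssw = ssw / (n - k)
    return {
        "SSB": ssb, "SSW": ssw, "TSS": tss,
        "dof_between": k - 1, "dof_within": n - k, "dof_total": n - 1,
        "MSSB": mssb, "MSSW": mssw, "F": mssb / mssw,
    }


# Illustrative (made-up) data: three groups of three observations.
table = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
for name, value in table.items():
    print(name, round(value, 4))
```

The calculated F is then compared with the tabulated F for (k-1) and (n-k) degrees of freedom, which this sketch does not look up.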
SOLVED PROBLEM ON ANOVA
Q.1. To assess the significance of possible variation in performance in a certain test between the convent schools of a city, a common test was given to a number of students taken at random from the 4th class of the 3 schools concerned. The results are given below.

School A    School B    School C
8           8           17
10          6           10
7           11          12
14          8           12
11          8           15
16          13          12

Make the analysis of variance for the given data.
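Ans. Given the data, we test the null hypothesis $H_0: \mu_A = \mu_B = \mu_C$ against the alternative hypothesis $H_1$: the means are not all equal.

School A    School B    School C
8           8           17
10          6           10
7           11          12
14          8           12
11          8           15
16          13          12
ΣX_A = 66   ΣX_B = 54   ΣX_C = 78

$$\bar{X}_A = \frac{\sum X_A}{n} = \frac{66}{6} = 11, \quad \bar{X}_B = \frac{54}{6} = 9, \quad \bar{X}_C = \frac{78}{6} = 13$$

Grand mean $$\bar{X} = \frac{\bar{X}_A + \bar{X}_B + \bar{X}_C}{3} = \frac{11 + 9 + 13}{3} = \frac{33}{3} = 11$$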
Calculation of sum of squares between groups (samples) (SSB)
Since there are 6 observations in each sample, we have to take 6 times the squared deviation of each sample mean from the grand mean ($\bar{X}$).

Sample | Sample mean - grand mean | Squared deviation | Sum over 6 observations
A      | 11 - 11 = 0              | 0                 | 0
B      | 9 - 11 = -2              | 4                 | 24
C      | 13 - 11 = 2              | 4                 | 24

$$SSB = \sum(\bar{X}_A - \bar{X})^2 + \sum(\bar{X}_B - \bar{X})^2 + \sum(\bar{X}_C - \bar{X})^2 = 0 + 24 + 24 = 48$$

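Calculation of sum of squares within samples (SSW or SSE)

A - X̄_A       (A - X̄_A)²  | B - X̄_B      (B - X̄_B)²  | C - X̄_C       (C - X̄_C)²
8 - 11 = -3   9           | 8 - 9 = -1   1           | 17 - 13 = 4   16
10 - 11 = -1  1           | 6 - 9 = -3   9           | 10 - 13 = -3  9
7 - 11 = -4   16          | 11 - 9 = 2   4           | 12 - 13 = -1  1
14 - 11 = 3   9           | 8 - 9 = -1   1           | 12 - 13 = -1  1
11 - 11 = 0   0           | 8 - 9 = -1   1           | 15 - 13 = 2   4
16 - 11 = 5   25          | 13 - 9 = 4   16          | 12 - 13 = -1  1
Σ(A - X̄_A)² = 60          | Σ(B - X̄_B)² = 32         | Σ(C - X̄_C)² = 32

$$SSW \text{ (or SSE)} = \sum(A - \bar{X}_A)^2 + \sum(B - \bar{X}_B)^2 + \sum(C - \bar{X}_C)^2 = 60 + 32 + 32 = 124$$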

ANOVA Summary Table (one-way classification)

Sources of variation    | Sum of squares (SS) | Degrees of freedom (dof) | Mean sum of squares (MSS)       | F-calculated
Between samples         | SSB = 48            | v1 = k-1 = 3-1 = 2       | MSSB = SSB/(k-1) = 48/2 = 24    | F = MSSB/MSE = 24/8.27 = 2.90
Within samples (error)  | SSW (SSE) = 124     | v2 = n-k = 18-3 = 15     | MSE = SSE/(n-k) = 124/15 = 8.27 |
Total                   | TSS = 172           | n-1 = 18-1 = 17          |                                 |

F-table at 5% level: F(v1, v2) = F(2, 15) at the 5% level of significance = 3.68
k = number of samples = 3 (A, B, C); n = number of observations = 18

Since F-calculated (2.90) < F-table (3.68), we accept $H_0$, i.e. there is no significant variation between the schools and $\mu_A = \mu_B = \mu_C$.
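As a cross-check on the arithmetic of Q.1, the sketch below recomputes the sums of squares from the raw scores. The School B scores are the ones used in the within-sample deviations of the solution; the variable names are illustrative, and the critical value 3.68 is simply copied from the F table cited in the solution, not computed.

```python
from statistics import mean

# Scores of the three schools in Q.1.
schools = {
    "A": [8, 10, 7, 14, 11, 16],
    "B": [8, 6, 11, 8, 8, 13],
    "C": [17, 10, 12, 12, 15, 12],
}

k = len(schools)                               # number of samples
n = sum(len(v) for v in schools.values())      # number of observations
grand_mean = mean(x for v in schools.values() for x in v)

# Sum of squares between samples and within samples.
ssb = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in schools.values())
ssw = sum((x - mean(v)) ** 2 for v in schools.values() for x in v)

mssb = ssb / (k - 1)
mse = ssw / (n - k)
f_calc = mssb / mse

F_TABLE = 3.68  # F(2, 15) at the 5% level, taken from the solution above

print("SSB =", ssb, "SSW =", ssw, "TSS =", ssb + ssw)
print("MSSB =", round(mssb, 2), "MSE =", round(mse, 2), "F =", round(f_calc, 2))
print("Reject H0" if f_calc > F_TABLE else "Accept H0")
```

Carrying the mean square error unrounded (124/15) gives 8.27 to two decimals and leaves the F ratio at 2.90.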
Q.2. An experiment is conducted to study the effectiveness of three methods of teaching: lecture, question-answer and library method. In each group four students are chosen randomly. The obtained scores are given in the following table. Is there any significant difference among the three methods of teaching?

Lecture        Question-Answer    Library
Method (A)     Method (B)         Method (C)
4              9                  2
5              10                 4
1              9                  2
2              6                  2

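Ans. We can solve this problem by a different method, the correction factor method. For this we have to calculate TSS and SSB and get SSW (SSE) by deducting SSB from TSS (SSW = TSS - SSB).
First step is to calculate the correction factor (C): $C = \frac{(\sum X)^2}{N}$, where $\sum X$ is the grand total of all observations and N is the total number of observations.

Given the data, $\sum X = 56$ and $N = 12$.

So $C = \frac{(56)^2}{12} = \frac{3136}{12} = 261.33$
∴ Correction factor (C) = 261.33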

Calculation of total sum of squares (TSS)

A     B     C     Total
4     9     2     15
5     10    4     19
1     9     2     12
2     6     2     10
12    34    10    56

$$TSS = \sum X^2 - \text{Correction Factor} = \sum X_A^2 + \sum X_B^2 + \sum X_C^2 - \text{Correction Factor}$$
= 46 + 298 + 28 - 261.33
= 372 - 261.33
TSS = 110.67

Calculation of sum of squares between samples (SSB)

X_A    X_A²   | X_B    X_B²   | X_C    X_C²
4      16     | 9      81     | 2      4
5      25     | 10     100    | 4      16
1      1      | 9      81     | 2      4
2      4      | 6      36     | 2      4
ΣX_A = 12, ΣX_A² = 46 | ΣX_B = 34, ΣX_B² = 298 | ΣX_C = 10, ΣX_C² = 28

$$SSB = \frac{(\sum X_A)^2}{n} + \frac{(\sum X_B)^2}{n} + \frac{(\sum X_C)^2}{n} - \text{Correction Factor}$$

Here n = number of observations in each sample = 4.

$$SSB = \frac{(12)^2 + (34)^2 + (10)^2}{4} - 261.33 = \frac{144 + 1156 + 100}{4} - 261.33 = \frac{1400}{4} - 261.33 = 350 - 261.33 = 88.67$$

So SSB = 88.67
Calculation of sum of squares within samples (SSW) or (SSE)

SSW = TSS - SSB = 110.67 - 88.67 = 22.00

Formulating the null hypothesis $H_0: \mu_A = \mu_B = \mu_C$ against the alternative hypothesis $H_1$: the means are not all equal.
Preparation of ANOVA Table

Sources of variation    | Sum of squares (SS) | Degrees of freedom (dof) | Mean sum of squares (MSS)        | F-calculated
Between samples         | SSB = 88.67         | v1 = k-1 = 3-1 = 2       | MSSB = SSB/(k-1) = 88.67/2 = 44.33 | F = MSSB/MSSW = 18.14
Within samples (error)  | SSW = 22.00         | v2 = n-k = 12-3 = 9      | MSSW = SSW/(n-k) = 22.00/9 = 2.44  |
Total                   | TSS = 110.67        | n-1 = 12-1 = 11          |                                  |

F-tabulated at 5% level: F(v1, v2) = F(2, 9) at the 5% level of significance = 4.26
k = number of samples = 3 (A, B, C); n = number of observations = 12

Since F-calculated > F-table, the null hypothesis $H_0$ is rejected, so there is a significant difference among the three methods of teaching.
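The correction factor method used for Q.2 can be retraced in a few lines. This is only a sketch of the arithmetic: the scores are the ones tabulated in the solution, the variable names are illustrative, and the critical value F(2, 9) at the 5% level is not computed here.

```python
# Scores for the three teaching methods in Q.2.
methods = {
    "A": [4, 5, 1, 2],     # lecture
    "B": [9, 10, 9, 6],    # question-answer
    "C": [2, 4, 2, 2],     # library
}

all_scores = [x for v in methods.values() for x in v]
n = len(all_scores)        # total number of observations
k = len(methods)           # number of samples

# Correction factor: square of the grand total divided by n.
cf = sum(all_scores) ** 2 / n

# Total and between-sample sums of squares via the correction factor.
tss = sum(x * x for x in all_scores) - cf
ssb = sum(sum(v) ** 2 / len(v) for v in methods.values()) - cf

# Within-sample sum of squares is obtained by subtraction.
ssw = tss - ssb

mssb = ssb / (k - 1)
mssw = ssw / (n - k)
f_calc = mssb / mssw

print("C =", round(cf, 2))
print("TSS =", round(tss, 2), "SSB =", round(ssb, 2), "SSW =", round(ssw, 2))
print("MSSB =", round(mssb, 2), "MSSW =", round(mssw, 2), "F =", round(f_calc, 2))
```

Because SSW is found by subtraction, any rounding slip in TSS or SSB carries straight into the F ratio, so it is safer to round only at the end.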
16.3 REGRESSION ANALYSIS AND ANALYSIS OF VARIANCE (ANOVA)

Having first-hand knowledge of ANOVA, we will see how ANOVA is useful in regression analysis.
16.3.1: In case of Simple Linear Regression Model (Two-variable case)

In the simple linear regression model, we analyse the goodness of fit of the estimated equation by using the $R^2$ statistic, as it gives the proportion of total variation in the dependent variable (Y) which is explained by the independent or explanatory variable (X) of the model. Thus in regression analysis the total variation is split into explained (by the explanatory variable) and unexplained (error term) variation.

$$\sum (Y - \bar{Y})^2 = \sum (\hat{Y} - \bar{Y})^2 + \sum (Y - \hat{Y})^2$$

i.e. TSS = ESS + RSS
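The split of total variation into explained and unexplained parts can be checked numerically for a two-variable regression. The sketch below is illustrative only: the data are made up, the function name `regression_anova` is hypothetical, and the F ratio uses the standard form for the two-variable case, F = (ESS/1) / (RSS/(n-2)), which is assumed here since it is not spelled out in the surviving text.

```python
def regression_anova(x, y):
    """Fit Y = a + bX by least squares and split TSS into ESS and RSS."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n

    # Ordinary least squares slope and intercept.
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
        (xi - x_bar) ** 2 for xi in x
    )
    a = y_bar - b * x_bar
    y_hat = [a + b * xi for xi in x]

    tss = sum((yi - y_bar) ** 2 for yi in y)                 # total variation
    ess = sum((yh - y_bar) ** 2 for yh in y_hat)             # explained
    rss = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))    # unexplained

    r_squared = ess / tss
    f_stat = (ess / 1) / (rss / (n - 2))   # dof: 1 and n-2 (assumed form)
    return {"TSS": tss, "ESS": ess, "RSS": rss, "R2": r_squared, "F": f_stat}


# Made-up data for illustration.
result = regression_anova([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
for name, value in result.items():
    print(name, round(value, 4))
```

In this example ESS and RSS add up to TSS, and $R^2$ is simply ESS/TSS, which is the link between goodness of fit and the ANOVA table.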
To conclude, the F-test (ANOVA) is versatile in the sense that it can test a variety of hypotheses, such as whether (a) an individual regression coefficient is statistically significant, (b) all slope coefficients are zero, (c) two or more coefficients are statistically equal, (d) the coefficients satisfy some linear restrictions and (e) there is structural stability of the regression model coefficients.

EXERCISE
1. Choose the correct alternative for each question.
(i) The analysis of variance is a
(a) Mathematical method (b) Statistical method
(c) Econometric method (d) none of these
(ii) The analysis of variance (ANOVA) method was developed by
(a) Ragnar Frisch (b) Chow
(c) Oscar Lange (d) R.A. Fisher
(iii) To estimate the statistical significance of the difference between two variances, the ______ test is used.
(a) $\chi^2$ (chi-square) test (b) Chow test
(c) t-test (d) F-test
(iv) To estimate the overall significance of the parameters, which test is used?
(a) t-test (b) Z-test
(c) $\chi^2$ (chi-square) test (d) ANOVA
(v) In testing the overall significance of the model $Y_i = \alpha + \beta_1 X_1 + \beta_2 X_2 + U_i$ using F-statistics or ANOVA, we test the null hypothesis
(a) $\alpha = \beta_1 = \beta_2 = 0$ (b) $\alpha = \beta_1 = 0$
(c) $\beta_1 + \beta_2 = 0$ (d) $\beta_1 = \beta_2 = 0$
(vi) In ANOVA, we calculate F-statistics and compare this value to the table F(m, n-k). Here m refers to
(a) number of parameters (k) (b) k-1
(c) number of samples (n) (d) n-1

Answers
(i) (b) (ii) (d) (iii) (d) (iv) (d) (v) (d) (vi) (b)
2. Join Group A and Group B correctly

Group A
(i) Testing a hypothesis about an individual regression coefficient (or) testing the equality of two population means.
(ii) Testing the overall significance of the estimated regression model (or) testing that all estimated parameters are simultaneously equal to zero.
(iii) Testing that two or more coefficients are equal to one another.
(iv) Testing that the partial regression coefficients satisfy certain restrictions.
(v) Testing the stability of regression coefficients with increasing size of samples.

Group B
(a) Chow test (ANOVA)
(b) Tinter test (ANOVA)
(c) ANOVA
(d) t-test
(e) Chow test (a bit different form of ANOVA)

Answers
(i) > (d), (ii) > (c), (iii) > (a), (iv) > (b), (v) > (e)
3. Fill in the blanks
Given the following demand function for potatoes in India:

$$\log Y_t = \alpha + \beta_1 \log X_1 + \beta_2 \log X_2 + \beta_3 \log X_3 + \beta_4 \log X_4 + U_t$$

where Y = demand for potatoes, $X_1$ = income per capita, $X_2$ = price of potatoes, $X_3$ = price of cabbage, $X_4$ = price of brinjal.
Answer the following questions.
(a) If we want to test the hypothesis that the own price elasticity of demand is negative as predicted by economic theory, the null hypothesis would be ______
Ans. $\beta_2 = 0$
(b) If we want to test the hypothesis that potatoes and brinjal are complementary to each other, the null hypothesis would be ______
Ans. $\beta_4 = 0$
(c) To test the hypothesis about the income elasticity of demand for potatoes, the null hypothesis would be ______
Ans. $\beta_1 = 0$
(d) To test the hypothesis that potatoes, cabbage and brinjal are unrelated products, the null hypothesis would be ______
Ans. $\beta_2 = \beta_3 = \beta_4 = 0$
4. Answer the questions.
(a) ANOVA
(b) t-test and ANOVA comparison
(c) Assumptions of ANOVA
(d) Mean sum of squares between groups
(e) Mean sum of squares within groups
(f) Difference between regression analysis and ANOVA
(g) Similarities between regression analysis and ANOVA
(h) Prove that $t^2 = F$
5. The three samples below have been obtained from normal populations with equal variance. Test the hypothesis that the sample means are equal.
B

6
6. The following data represent the number of units of tablet production (in thousands) per day by five different technicians using four types of machines.

Workers    A     B     C     D

P          54    48    57    46
Q          56    50    62    53
R          44    46    54    42
S          53    48    56    44
T          48    52    59    48

Test whether the mean productivity of the different machines is the same.
7. The Phillips curve implies that there is a negative relationship between the percentage change in wages and the unemployment rate. A Phillips curve relation was fitted to certain data and the following ANOVA table was obtained.

Source of variation        | Sum of squares | Degrees of freedom | Mean sum of squares
Due to regression          | 2.153          | 1                  | 2.153
Due to residual or error   | 1.144          | 11                 | 0.104
Total                      | 3.297          | 12                 |

Do the data support the existence of the Phillips curve relationship?
