
ANOVA & ANCOVA

ANOVA

Analysis of variance (ANOVA) is a collection of statistical models and their associated estimation procedures (such as the "variation" among and between groups) used to analyze the differences among group means in a sample. ANOVA was developed by the statistician and evolutionary biologist Ronald Fisher.
ANOVA is based on the law of total variance, where the observed variance in a particular variable is partitioned into components attributable to different sources of variation. In its simplest form, ANOVA provides a statistical test of whether two or more population means are equal, and therefore generalizes the t-test beyond two means.

While the analysis of variance reached fruition in the 20th century, antecedents extend centuries into the past, according to Stigler. These include hypothesis testing, the partitioning of sums of squares, experimental techniques and the additive model. Laplace was performing hypothesis testing in the 1770s.
The analysis of variance has been studied from several
approaches, the most common of which uses a linear model that
relates the response to the treatments and blocks. Note that the
model is linear in parameters but may be nonlinear across factor
levels. Interpretation is easy when data is balanced across factors
but much deeper understanding is needed for unbalanced data.

The analysis of variance can be presented in terms of a linear model, which makes the following assumptions about the probability distribution of the responses:
• Independence of observations – this is an assumption of the model that simplifies the statistical analysis.
• Normality – the distributions of the residuals are normal.
• Equality (or "homogeneity") of variances, called homoscedasticity – the variance of data in groups should be the same.
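
As a minimal sketch of this linear-model view (assuming a Python environment with SciPy; the group data below are made up purely for illustration and are not from this document), a one-way ANOVA test of equal means can be run as follows:

```python
# Minimal one-way ANOVA sketch (illustrative data only).
# scipy.stats.f_oneway performs the one-way ANOVA F-test.
from scipy import stats

group_a = [20, 23, 30, 25, 34]   # hypothetical responses under treatment A
group_b = [19, 26, 33, 35, 30]   # hypothetical responses under treatment B
group_c = [22, 28, 27, 31, 29]   # hypothetical responses under treatment C

# H0: all population means are equal. The F statistic compares
# between-group variance to within-group variance.
f_stat, p_value = stats.f_oneway(group_a, group_b, group_c)
print(f"F = {f_stat:.3f}, p = {p_value:.4f}")
```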
In a randomized controlled experiment, the treatments are randomly assigned to experimental units, following the experimental protocol. This randomization is objective and declared before the experiment is carried out. The objective random assignment is used to test the significance of the null hypothesis, following the ideas of C. S. Peirce and Ronald Fisher. This design-based analysis was discussed and developed by Francis J. Anscombe at Rothamsted Experimental Station and by Oscar Kempthorne at Iowa State University.
ANCOVA
Analysis of covariance (ANCOVA) is a general linear model which
blends ANOVA and regression. ANCOVA evaluates whether the
means of a dependent variable (DV) are equal across levels of a
categorical independent variable (IV) often called a treatment, while
statistically controlling for the effects of other continuous variables
that are not of primary interest, known as covariates (CV) or nuisance
variables. Mathematically, ANCOVA decomposes the variance in the
DV into variance explained by the CV(s), variance explained by the
categorical IV, and residual variance. Intuitively, ANCOVA can be
thought of as 'adjusting' the DV by the group means of the CV(s).

The ANCOVA model assumes a linear relationship between the response (DV) and covariate (CV):

y_ij = μ + τ_i + B(x_ij − x̄) + ε_ij

where y_ij is the jth observation under the ith level of the categorical IV, μ is the grand mean, τ_i is the effect of the ith group, x_ij is the covariate value (with grand mean x̄), B is the slope of the regression of the DV on the CV, and ε_ij is the error term.
Analysis of covariance (ANCOVA) is used to examine differences in the mean values of the dependent variable that are related to the effect of the controlled independent variables, while taking into account the influence of the uncontrolled independent variables.

Analysis of covariance (ANCOVA) is used in the field of business. This document will detail the usability of ANCOVA in market research.

ANCOVA can be used to determine the variation in consumers' intention to buy a particular brand with respect to different levels of price and the consumers' attitude towards that brand.

ANCOVA can be used to determine how a change in the price level of a particular commodity will affect consumption of that commodity by consumers.
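
The consumer-brand example above can be sketched in code (hypothetical data and column names, assuming Python with pandas and statsmodels; this is an illustration, not the document's own analysis): price level enters as the categorical IV and attitude towards the brand as the covariate.

```python
# ANCOVA sketch for the consumer example (hypothetical data and column names).
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "intention":   [6.1, 5.4, 7.0, 4.2, 3.9, 5.1, 6.8, 4.5, 5.9, 3.7, 6.3, 4.8],
    "price_level": ["low", "low", "low", "high", "high", "high",
                    "low", "high", "low", "high", "low", "high"],
    "attitude":    [7.2, 6.0, 7.9, 6.5, 5.1, 6.6, 7.5, 5.8, 6.9, 4.9, 7.1, 6.2],
})

# DV ~ categorical IV + continuous covariate
model = smf.ols("intention ~ C(price_level) + attitude", data=df).fit()
# ANCOVA table: effect of price level adjusted for attitude
print(sm.stats.anova_lm(model, typ=2))
```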
Example 1. Comparative Effects of Two Methods of
Hypnotic Induction

X = the score on the index of primary suggestibility

Y = the score on the index of hypnotic induction


              Method A                    Method B
Subject      Xa      Ya      Subject     Xb      Yb
a1            5      20      b1           7      19
a2           10      23      b2          12      26
a3           12      30      b3          27      33
a4            9      25      b4          24      35
a5           23      34      b5          18      30
a6           21      40      b6          22      31
a7           14      27      b7          26      34
a8           18      38      b8          21      28
a9            6      24      b9          14      23
a10          13      31      b10          9      22
Means      13.1    29.2                18.0    28.1
A basic one-way analysis of covariance requires four
sets of calculations. In the first set you will clearly
recognize the analysis-of-variance aspect of ANCOVA.
The two middle sets are aimed at the covariance aspect,
and the final set ties the two aspects together. As in
earlier chapters, SS refers to the sum of squared
deviates. The designation SC in the third set refers to
the sum of co-deviates, the raw measure of covariance
introduced in Chapter 3. In both cases, the subscripts
"T," "wg," and "bg" refer to "Total," "within-groups,"
and "between-groups," respectively.
• SS values for Y, the dependent variable in which one is chiefly interested.
  Items to be calculated:
  o SST(Y)
  o SSwg(Y)
  o SSbg(Y)
• SS values for X, the covariate whose effects upon Y one wishes to bring under statistical control.
  Items to be calculated:
  o SST(X)
  o SSwg(X)
• SC measures for the covariance of X and Y.
  Items to be calculated:
  o SCT
  o SCwg
• And then a final set of calculations, which begins by removing from the Y variable the portion of its variability that is attributable to its covariance with X.
• The calculations for the first two of these sets are
exactly like those for a one-way independent-samples
ANOVA, as described in Chapter 14. I will therefore
show only the results of the calculations, along with the
summary values on which they are based, and leave it
to you to work out the computational details. If there is
any step in these first two sets of calculations that you
find unclear, it would be a good idea to go back and
review Chapter 14.
Calculations for the Dependent Variable Y

              Ya        Yb      Total
              20        19
              23        26
              30        33
              25        35
              34        30
              40        31
              27        34
              38        28
              24        23
              31        22
N             10        10        20
∑Yi          292       281       573
∑Yi²        8920      8165     17085
SS         393.6     268.9
Mean        29.2      28.1      28.7

SST(Y) = 668.5    SSwg(Y) = 662.5    SSbg(Y) = 6.0
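
These summary values can be reproduced with a short script (a sketch in Python, using the raw-score formula SS = ΣY² − (ΣY)²/N that underlies the table above):

```python
# Sums of squares for Y, reproducing SST(Y), SSwg(Y), and SSbg(Y) above.
ya = [20, 23, 30, 25, 34, 40, 27, 38, 24, 31]
yb = [19, 26, 33, 35, 30, 31, 34, 28, 23, 22]

def ss(values):
    """Sum of squared deviates via the raw-score formula: sum(Y^2) - (sum(Y))^2 / N."""
    n = len(values)
    return sum(v * v for v in values) - sum(values) ** 2 / n

ss_t  = ss(ya + yb)          # SST(Y)  = 668.55 (668.5 after rounding)
ss_wg = ss(ya) + ss(yb)      # SSwg(Y) = 393.6 + 268.9 = 662.5
ss_bg = ss_t - ss_wg         # SSbg(Y) = 6.05 (6.0 after rounding)
print(round(ss_t, 1), round(ss_wg, 1), round(ss_bg, 1))
# The same function applied to the X scores reproduces SST(X) = 908.9
# and SSwg(X) = 788.9 in the covariate table that follows.
```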
Calculations for the Covariate X

              Xa        Xb      Total
               5         7
              10        12
              12        27
               9        24
              23        18
              21        22
              14        26
              18        21
               6        14
              13         9
N             10        10        20
∑Xi          131       180       311
∑Xi²        2045      3700      5745
SS         328.9     460.0
Mean        13.1      18.0      15.6

SST(X) = 908.9    SSwg(X) = 788.9
Calculations for the Covariance of X and Y

SC = ∑(Xi − MX)(Yi − MY), where (Xi − MX) is the deviate of X, (Yi − MY) is the deviate of Y, and their product (Xi − MX)(Yi − MY) is the co-deviate of X and Y.

                 Group A      Group B
                  XaYa         XbYb
                   100          133
                   230          312
                   360          891
                   225          840
                   782          540
                   840          682
                   378          884
                   684          588
                   144          322
                   403          198
Sums              4146         5390      Total: 9536
              ∑(XaiYai)    ∑(XbiYbi)     ∑(XTiYTi)

For group A: ∑Xai = 131, ∑Yai = 292
For group B: ∑Xbi = 180, ∑Ybi = 281
For the total array: ∑XTi = 311, ∑YTi = 573

SCT = ∑(XTiYTi) − (∑XTi)(∑YTi)/NT = 9536 − (311)(573)/20 = 625.9
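
SCT and SCwg can be checked the same way (a sketch in Python, using the raw-score form SC = ΣXY − (ΣX)(ΣY)/N shown above):

```python
# Sums of co-deviates for X and Y, reproducing SCT and SCwg.
xa = [5, 10, 12, 9, 23, 21, 14, 18, 6, 13]
ya = [20, 23, 30, 25, 34, 40, 27, 38, 24, 31]
xb = [7, 12, 27, 24, 18, 22, 26, 21, 14, 9]
yb = [19, 26, 33, 35, 30, 31, 34, 28, 23, 22]

def sc(x, y):
    """Sum of co-deviates via the raw-score formula: sum(XY) - (sum(X) * sum(Y)) / N."""
    n = len(x)
    return sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n

sc_t  = sc(xa + xb, ya + yb)      # SCT  = 625.85 (625.9 after rounding)
sc_wg = sc(xa, ya) + sc(xb, yb)   # SCwg = 320.8 + 332.0 = 652.8
print(round(sc_t, 1), round(sc_wg, 1))
```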
The Final Set of Calculations

                X                  Y               Covariance
SST(X)  = 908.9      SST(Y)  = 668.5      SCT  = 625.9
SSwg(X) = 788.9      SSwg(Y) = 662.5      SCwg = 652.8
                     SSbg(Y) =   6.0

Adjusted means: [adj]MYa = 31.23  versus  [adj]MYb = 26.07
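
The adjustment step itself can be written out as a short sketch (using the summary values from the table above): the within-groups regression slope is bwg = SCwg / SSwg(X) ≈ +0.83, and each group's Y mean is shifted by bwg times the distance of its X mean from the overall X mean.

```python
# Adjusted-means sketch, using the summary values calculated above.
sc_wg   = 652.8          # SCwg
ss_wg_x = 788.9          # SSwg(X)
b_wg = sc_wg / ss_wg_x   # within-groups regression slope, about +0.83

mx_a, my_a = 13.1, 29.2  # group A means of X and Y
mx_b, my_b = 18.0, 28.1  # group B means of X and Y
mx_t = 311 / 20          # overall mean of X (= 15.55)

adj_my_a = my_a - b_wg * (mx_a - mx_t)   # about 31.23
adj_my_b = my_b - b_wg * (mx_b - mx_t)   # about 26.07
print(round(adj_my_a, 2), round(adj_my_b, 2))
```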
This, however, does not distinguish the analysis of
covariance fundamentally from ANOVA or any other
inferential statistical procedure, for they are all
wrapped up in a chain of if/then logical
constructions. It is simply that the chain for the
analysis of covariance is a few links longer.
• Assumptions of ANCOVA. The analysis of covariance has the same underlying assumptions as its parent, the analysis of variance. It also has the same robustness with respect to the non-satisfaction of these assumptions, provided that all groups have the same number of subjects. There is, however, one assumption that the analysis of covariance has in addition, by virtue of its co-descent from correlation and regression: namely, that the slopes of the regression lines for each of the groups considered separately are all approximately the same.
• The operative word here is "approximately." Because of random variability, it would
rarely happen that two or more samples of bivariate XY values would all end up
with precisely the same slope, even though the samples might be drawn from the
very same population. And so it is for the slopes of the separate regression lines
for our two present samples. They are clearly not precisely the same. The
question is, are they close enough to be regarded as reflecting the same underlying
relationship between X and Y? In the calculations of step 4d, we found the slope of
the line for the overall within-groups regression to be bwg=+.83, and that was the
value used in adjusting the means of group A and group B. The analysis of
covariance is assuming that the slopes of the separate regression lines for the two
samples do not significantly differ from +.83. We will examine this assumption more
thoroughly in Part 3, after working through our second computational example.
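
One common way to check this homogeneity-of-slopes assumption in software (a sketch assuming Python with pandas and statsmodels; this test is not part of the hand calculation above) is to add a group-by-covariate interaction term and see whether it is significant:

```python
# Homogeneity-of-regression-slopes check for the hypnotic-induction data.
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

xa = [5, 10, 12, 9, 23, 21, 14, 18, 6, 13]
ya = [20, 23, 30, 25, 34, 40, 27, 38, 24, 31]
xb = [7, 12, 27, 24, 18, 22, 26, 21, 14, 9]
yb = [19, 26, 33, 35, 30, 31, 34, 28, 23, 22]

df = pd.DataFrame({
    "X": xa + xb,
    "Y": ya + yb,
    "method": ["A"] * 10 + ["B"] * 10,
})

# The C(method):X interaction tests whether the two within-group slopes differ.
# A non-significant interaction is consistent with a common slope (about +0.83 here).
model = smf.ols("Y ~ C(method) * X", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))
```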
The obvious difference between ANOVA and ANCOVA is the letter "C", which stands for 'covariance'. Like ANOVA, "Analysis of Covariance" (ANCOVA) has a single continuous response variable. ... ANCOVA is also commonly used to describe analyses with a single response variable, continuous IVs, and no factors.
