
MULTIVARIATE ANALYSIS OF VARIANCE

RAJENDER PARSAD AND L.M. BHAR
Indian Agricultural Statistics Research Institute
Library Avenue, New Delhi - 110 012
lmb@iasri.res.in

1. Introduction
In many agricultural experiments, generally the data on more than one character is
observed. One common example is grain yield and straw yield. The other characters on
which the data is generally observed are the plant height, number of green leaves,
germination count, etc. The analysis is normally done only on the grain yield and the best
treatment is identified on the basis of this character alone. The straw yield is generally not
taken into account. If we see the system as a whole, the straw yield is also important either
for the cattle feed or for mulching or manuring, etc. Therefore, while analyzing the data,
the straw yield should also be taken into consideration. Similarly, in varietal trials also the
data is collected on several plant characteristics and quality parameters. In these
experimental situations also the data is generally analyzed separately for each of the
characters. The best treatment or genotype is identified separately for each of the
characters. In these situations, Multivariate Analysis of Variance (MANOVA) can be
helpful. Before discussing MANOVA, a brief description of Analysis of
Variance (ANOVA) is given in Section 2. A general procedure for performing MANOVA
on data generated from an RCB design is given in Section 3. The procedure is
illustrated with the help of an example in Section 4.

2. Overview of ANOVA
The ANOVA looks at the variance within classes relative to the overall variance. The
dependent variable must be metric, and the independent variables, which can be many,
must be nominal. ANOVA is used to uncover the main and interaction effects of
categorical independent variables (called "factors") on an interval dependent variable. A
main effect is the direct effect of an independent variable on the dependent variable. An
interaction effect is the joint effect of two or more independent variables on the dependent
variable. Whereas regression models cannot handle interaction unless explicit
crossproduct interaction terms are added, ANOVA uncovers interaction effects on a built-
in basis.

The key statistic in ANOVA is the F-test of difference of group means, testing if the
means of the groups formed by values of the independent variable (or combinations of
values for multiple independent variables) are different enough not to have occurred by
chance. If the group means do not differ significantly then it is inferred that the
independent variable(s) did not have an effect on the dependent variable. If the F test
shows that overall the independent variable(s) is (are) related to the dependent variable,
then multiple comparison tests of significance are used to explore just which values of the
independent(s) have the most to do with the relationship.
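
As a small illustration of the F-test described above, the following sketch computes the one-way ANOVA F-ratio as the ratio of the between-group mean square to the within-group mean square. The function name `one_way_anova_f` and the yields are hypothetical and serve only to show the arithmetic.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: group size times squared deviation of group mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: squared deviations from each group's own mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within

# Hypothetical yields for three treatments with four plots each.
groups = [[5.0, 6.0, 7.0, 6.0], [8.0, 9.0, 8.0, 9.0], [10.0, 11.0, 10.0, 11.0]]
f_ratio, df1, df2 = one_way_anova_f(groups)
print(f_ratio, df1, df2)
```

A large F-ratio relative to the F distribution with (df1, df2) degrees of freedom indicates that the group means differ by more than chance alone would explain.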

3. Multivariate Analysis of Variance
Why do MANOVA when one can get much more information by doing a series of
ANOVAs? Even if all the dependent variables are completely independent of one another,
the overall Type I error rate inflates when many separate tests are carried out. Moreover,
in many ecological or biological studies, the variables are not independent at all. Many
times they have strong actual or potential interactions, inflating the error even more. In
many cases where multiple ANOVAs were done, MANOVA was actually the more
appropriate test.
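
The inflation of error under repeated testing can be sketched numerically. The snippet below assumes that the separate tests are independent and each is carried out at the same level alpha, in which case the probability of at least one false rejection is 1 - (1 - alpha)^m for m tests. The function name is illustrative only.

```python
def familywise_error_rate(alpha, n_tests):
    """Probability of at least one false rejection among n_tests independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests

# Error rate for 1, 2, 5 and 10 separate univariate tests at the 5% level.
rates = {m: familywise_error_rate(0.05, m) for m in (1, 2, 5, 10)}
for m, rate in rates.items():
    print(m, round(rate, 4))
```

With correlated response variables the independence assumption does not hold, so these figures are only indicative of the problem, not an exact description of it.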

Consider an experiment conducted to compare v treatments using a randomized complete
block (RCB) design with r replications, where the data is collected on p variables. Let
$y_{ijk}$ denote the observed value of the $k$th response variable for the $i$th treatment
in the $j$th replication, $i = 1, 2, \ldots, v$; $j = 1, 2, \ldots, r$; $k = 1, 2, \ldots, p$.
The data is rearranged as follows:

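\[
\begin{array}{c|cccccc|c}
 & \multicolumn{6}{c|}{\leftarrow \text{Replications} \rightarrow} & \\
\text{Treatments} \downarrow & 1 & 2 & \cdots & j & \cdots & r & \text{Treatment Mean} \downarrow \\
\hline
1 & \mathbf{y}_{11} & \mathbf{y}_{12} & \cdots & \mathbf{y}_{1j} & \cdots & \mathbf{y}_{1r} & \bar{\mathbf{y}}_{1.} \\
2 & \mathbf{y}_{21} & \mathbf{y}_{22} & \cdots & \mathbf{y}_{2j} & \cdots & \mathbf{y}_{2r} & \bar{\mathbf{y}}_{2.} \\
\vdots & \vdots & \vdots & & \vdots & & \vdots & \vdots \\
i & \mathbf{y}_{i1} & \mathbf{y}_{i2} & \cdots & \mathbf{y}_{ij} & \cdots & \mathbf{y}_{ir} & \bar{\mathbf{y}}_{i.} \\
\vdots & \vdots & \vdots & & \vdots & & \vdots & \vdots \\
v & \mathbf{y}_{v1} & \mathbf{y}_{v2} & \cdots & \mathbf{y}_{vj} & \cdots & \mathbf{y}_{vr} & \bar{\mathbf{y}}_{v.} \\
\hline
\text{Replication Mean} \rightarrow & \bar{\mathbf{y}}_{.1} & \bar{\mathbf{y}}_{.2} & \cdots & \bar{\mathbf{y}}_{.j} & \cdots & \bar{\mathbf{y}}_{.r} & \bar{\mathbf{y}}_{..}
\end{array}
\]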

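Here $\mathbf{y}_{ij} = (y_{ij1} \; y_{ij2} \; \cdots \; y_{ijk} \; \cdots \; y_{ijp})'$ is a p-variate
vector of observations taken from the plot receiving the treatment i in replication j, and

\[
\bar{\mathbf{y}}_{i.} = \frac{1}{r} \sum_{j=1}^{r} \mathbf{y}_{ij}; \qquad
\bar{\mathbf{y}}_{.j} = \frac{1}{v} \sum_{i=1}^{v} \mathbf{y}_{ij} \qquad \text{and} \qquad
\bar{\mathbf{y}}_{..} = \frac{1}{vr} \sum_{i=1}^{v} \sum_{j=1}^{r} \mathbf{y}_{ij}.
\]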

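The observations can be represented by a two-way classified multivariate model $\Omega$:

\[
\Omega: \; \mathbf{y}_{ij} = \boldsymbol{\mu} + \mathbf{t}_i + \mathbf{b}_j + \mathbf{e}_{ij}, \qquad i = 1, 2, \ldots, v; \; j = 1, 2, \ldots, r, \tag{3.1}
\]

Here $\boldsymbol{\mu} = (\mu_1 \; \mu_2 \; \cdots \; \mu_k \; \cdots \; \mu_p)'$ is the $p \times 1$ vector of general means,
$\mathbf{t}_i = (t_{i1} \; t_{i2} \; \cdots \; t_{ik} \; \cdots \; t_{ip})'$ are the effects of treatment i on the p characters, and
$\mathbf{b}_j = (b_{j1} \; b_{j2} \; \cdots \; b_{jk} \; \cdots \; b_{jp})'$ are the effects of replication j on the p characters.
$\mathbf{e}_{ij} = (e_{ij1} \; e_{ij2} \; \cdots \; e_{ijk} \; \cdots \; e_{ijp})'$ is a p-variate random vector
associated with $\mathbf{y}_{ij}$ and assumed to be distributed independently as the p-variate normal
distribution $N_p(\mathbf{0}, \boldsymbol{\Sigma})$. The equality of treatment effects is to be tested, i.e.
$H_0: (t_{i1} \; t_{i2} \; \cdots \; t_{ik} \; \cdots \; t_{ip})' = (t_1 \; t_2 \; \cdots \; t_k \; \cdots \; t_p)'$ (say) $\forall \; i = 1, 2, \ldots, v$,
against the alternative $H_1$: at least two of the treatment effects are unequal. Under the
null hypothesis, the model (3.1) reduces to

\[
\Omega_0: \; \mathbf{y}_{ij} = \boldsymbol{\alpha} + \mathbf{b}_j + \mathbf{e}_{ij}, \tag{3.2}
\]

where $\boldsymbol{\alpha} = (\mu_1 + t_1, \; \mu_2 + t_2, \; \ldots, \; \mu_p + t_p)'$.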

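An outline of the MANOVA table for testing the equality of treatment effects and
replication effects is given below.

MANOVA

\[
\begin{array}{l|l|l}
\text{Source} & \text{DF} & \text{SSCPM (Sum of Squares and Cross Product Matrix)} \\
\hline
\text{Treatment} & v - 1 = h & \mathbf{H} = r \sum_{i=1}^{v} (\bar{\mathbf{y}}_{i.} - \bar{\mathbf{y}}_{..})(\bar{\mathbf{y}}_{i.} - \bar{\mathbf{y}}_{..})' \\
\text{Replication} & r - 1 = t & \mathbf{B} = v \sum_{j=1}^{r} (\bar{\mathbf{y}}_{.j} - \bar{\mathbf{y}}_{..})(\bar{\mathbf{y}}_{.j} - \bar{\mathbf{y}}_{..})' \\
\text{Residual} & (v-1)(r-1) = s & \mathbf{R} = \sum_{i=1}^{v} \sum_{j=1}^{r} (\mathbf{y}_{ij} - \bar{\mathbf{y}}_{i.} - \bar{\mathbf{y}}_{.j} + \bar{\mathbf{y}}_{..})(\mathbf{y}_{ij} - \bar{\mathbf{y}}_{i.} - \bar{\mathbf{y}}_{.j} + \bar{\mathbf{y}}_{..})' \\
\text{Total} & vr - 1 & \mathbf{T} = \sum_{i=1}^{v} \sum_{j=1}^{r} (\mathbf{y}_{ij} - \bar{\mathbf{y}}_{..})(\mathbf{y}_{ij} - \bar{\mathbf{y}}_{..})' = \mathbf{H} + \mathbf{B} + \mathbf{R}
\end{array}
\]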
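
The entries of the MANOVA table can be sketched in code as follows. This is a minimal illustration in plain Python, assuming the data are stored as a nested list `y[i][j]` holding the p-vector for treatment i in replication j. The data values and all function names are hypothetical, and the determinant helper covers only the p = 2 case.

```python
def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def sscp(deviations, weight=1.0):
    """Weighted sum of outer products d d' over a list of deviation vectors."""
    p = len(deviations[0])
    matrix = [[0.0] * p for _ in range(p)]
    for d in deviations:
        for a in range(p):
            for b in range(p):
                matrix[a][b] += weight * d[a] * d[b]
    return matrix

def manova_rcb(y):
    """Return (H, B, R, T) for RCB data; y[i][j] is the p-vector of plot (i, j)."""
    v, r = len(y), len(y[0])
    cells = [(i, j) for i in range(v) for j in range(r)]
    trt = [mean_vector(y[i]) for i in range(v)]
    rep = [mean_vector([y[i][j] for i in range(v)]) for j in range(r)]
    grand = mean_vector([y[i][j] for i, j in cells])
    p = len(grand)
    H = sscp([[trt[i][k] - grand[k] for k in range(p)] for i in range(v)], weight=r)
    B = sscp([[rep[j][k] - grand[k] for k in range(p)] for j in range(r)], weight=v)
    R = sscp([[y[i][j][k] - trt[i][k] - rep[j][k] + grand[k] for k in range(p)]
              for i, j in cells])
    T = sscp([[y[i][j][k] - grand[k] for k in range(p)] for i, j in cells])
    return H, B, R, T

def det2(m):
    """Determinant of a 2 x 2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

# Hypothetical data: v = 3 treatments, r = 2 replications, p = 2 characters.
y = [
    [[5.0, 7.0], [6.0, 8.0]],
    [[7.0, 9.0], [9.0, 10.0]],
    [[10.0, 12.0], [11.0, 14.0]],
]
H, B, R, T = manova_rcb(y)
H_plus_R = [[H[a][b] + R[a][b] for b in range(2)] for a in range(2)]
wilks_lambda = det2(R) / det2(H_plus_R)  # ratio of generalized variances
print(wilks_lambda)
```

The identity T = H + B + R gives a convenient check on the computations.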

Here $\mathbf{H}$, $\mathbf{B}$, $\mathbf{R}$ and $\mathbf{T}$ are the sum of squares and sum of cross product matrices of
treatments, replications, errors (residuals) and totals, respectively. The residual sum of
squares and cross products matrix for the reduced model $\Omega_0$ is denoted by $\mathbf{R}_0$ and is
given by $\mathbf{R}_0 = \mathbf{R} + \mathbf{H}$.

The null hypothesis of equality of treatment mean vectors is rejected if the ratio of
generalized variances (Wilks' Lambda statistic)

\[
\Lambda = \frac{|\mathbf{R}|}{|\mathbf{H} + \mathbf{R}|}
\]

is too small. Assuming the normal distribution, Rao (1973) showed that under the null
hypothesis $\Lambda$ is distributed as the product of independent beta variables. A better but
more complicated approximation of the distribution of $\Lambda$ is

\[
\frac{1 - \Lambda^{1/b}}{\Lambda^{1/b}} \cdot \frac{ab - c}{ph} \sim F(ph, \; ab - c),
\]

where

\[
a = s - \frac{p - h + 1}{2}, \qquad
b = \sqrt{\frac{p^2 h^2 - 4}{p^2 + h^2 - 5}}, \qquad
c = \frac{ph - 2}{2}.
\]

For some particular values of h and p, it reduces to an exact F-distribution. The special
cases are given below:

For h = 1 and any p, this reduces to
\[
\frac{1 - \Lambda}{\Lambda} \cdot \frac{s - p + 1}{p} \sim F(p, \; s - p + 1).
\]

For h = 2 and any p, it reduces to
\[
\frac{1 - \sqrt{\Lambda}}{\sqrt{\Lambda}} \cdot \frac{s - p + 1}{p} \sim F(2p, \; 2(s - p + 1)).
\]

For p = 2 and any h:
\[
\frac{1 - \sqrt{\Lambda}}{\sqrt{\Lambda}} \cdot \frac{s - 1}{h} \sim F(2h, \; 2(s - 1)).
\]

For p = 1, the statistic reduces to the usual variance ratio statistic.
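
The F approximation for Wilks' Lambda given above can be sketched as a small function. The function name is illustrative. One assumption is added: when $p^2 + h^2 - 5 \le 0$ (for example p = 2, h = 1), the expression for b is undefined, and b = 1 is used, which agrees with the special case stated for h = 1. The call at the end uses the values reported in the illustration of Section 4 (Wilks' Lambda 0.11997 with p = 2, h = 9, s = 27).

```python
import math

def wilks_lambda_f(lam, p, h, s):
    """F approximation for Wilks' Lambda.

    p: number of response variables, h: hypothesis degrees of freedom,
    s: error degrees of freedom. Returns (F, numerator df, denominator df).
    """
    a = s - (p - h + 1) / 2.0
    denom = p * p + h * h - 5
    # Assumption: use b = 1 where the formula for b is undefined (denom <= 0).
    b = math.sqrt((p * p * h * h - 4) / denom) if denom > 0 else 1.0
    c = (p * h - 2) / 2.0
    df1 = p * h
    df2 = a * b - c
    root = lam ** (1.0 / b)
    f_value = (1.0 - root) / root * df2 / df1
    return f_value, df1, df2

f_value, df1, df2 = wilks_lambda_f(0.11997, p=2, h=9, s=27)
print(round(f_value, 2), df1, df2)
```

For these inputs the degrees of freedom come out as 18 and 52, the same as in the MANOVA output of Section 4, and the F-ratio agrees with the reported 5.45 to two decimals.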

The hypothesis regarding the equality of replication effects can be tested by replacing $\Lambda$
by $|\mathbf{R}| / |\mathbf{B} + \mathbf{R}|$ and h by t in the above.

Several other criteria, viz. Pillai's Trace, Hotelling-Lawley Trace and Roy's Greatest Root,
are available in the literature for testing the null hypothesis in MANOVA. Wilks' Lambda is,
however, the most commonly used criterion. Here, we shall restrict ourselves to the use of
the Wilks' Lambda criterion. For further details on MANOVA, a reference may be made to
Seber (1983) and Johnson and Wichern (1988).

Remark 3.1: One complication of multivariate analysis that does not arise in the
univariate case is the ranks of the matrices. The rank of R should not be smaller than p or
in other words error degrees of freedom s should be greater than or equal to p (s ≥ p).

3.1 Multivariate Treatment Contrast Analysis
If the treatments are found to be significantly different through MANOVA, then the next
question is “which treatments are significantly different?” This question can be answered
through multivariate treatment contrast analysis. In the literature, the multivariate
treatment contrast analysis is generally carried out using the $\chi^2$-statistic. The
$\chi^2$-statistic is based on the assumption that the error variance-covariance matrix is
known. The error variance-covariance matrix is, however, generally unknown. Therefore,
the estimated value of the error variance-covariance matrix is used. The error
variance-covariance matrix is estimated by the sum of squares and cross products (SSCP)
matrix for error divided by the error degrees of freedom. As a consequence, the test based
on the $\chi^2$-statistic is an approximate solution. The procedure using the Wilks' Lambda
criterion is also described in the sequel.

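Suppose the hypothesis to be tested is $H_0: \mathbf{t}_i = \mathbf{t}_{i'}$ against $H_1: \mathbf{t}_i \neq \mathbf{t}_{i'}$. This
hypothesis can be rewritten as

\[
H_0: (\mathbf{t}_i - \mathbf{t}_{i'}) = \mathbf{0} \quad \text{against} \quad H_1: (\mathbf{t}_i - \mathbf{t}_{i'}) \neq \mathbf{0}, \tag{3.3}
\]

where $(\mathbf{t}_i - \mathbf{t}_{i'})' = (t_{i1} - t_{i'1} \;\; t_{i2} - t_{i'2} \;\; \cdots \;\; t_{ik} - t_{i'k} \;\; \cdots \;\; t_{ip} - t_{i'p})$.
Here $t_{ik}$ denotes the effect of treatment i for the dependent variable k. The best linear
unbiased estimate of $(\mathbf{t}_i - \mathbf{t}_{i'})$ is

\[
\bar{\mathbf{y}}_{i.} - \bar{\mathbf{y}}_{i'.} = (\bar{y}_{i1} - \bar{y}_{i'1} \;\; \bar{y}_{i2} - \bar{y}_{i'2} \;\; \cdots \;\; \bar{y}_{ik} - \bar{y}_{i'k} \;\; \cdots \;\; \bar{y}_{ip} - \bar{y}_{i'p})',
\]

where $\bar{y}_{ik}$ is the mean of treatment i for variable k.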


3.1.1 $\chi^2$-Test
The statistic based on $\chi^2$ requires the covariance matrix of the contrast of interest.
For an elementary treatment contrast in an RCB design, this covariance matrix is obtained
by dividing the SSCP matrix for errors obtained in MANOVA by half of the product of
the error degrees of freedom and the number of replications, i.e. $\boldsymbol{\Sigma}_c = 2\mathbf{R}/(rs)$. Under
the null hypothesis, $\mathbf{x} = \bar{\mathbf{y}}_{i.} - \bar{\mathbf{y}}_{i'.}$ follows a p-variate normal distribution with mean
vector $\mathbf{0}$ and variance-covariance matrix $\boldsymbol{\Sigma}_c$. Applying Aitken's transformation, it can
be shown that $\mathbf{z} = \boldsymbol{\Sigma}_c^{-1/2} \mathbf{x}$ follows a p-variate normal distribution with mean vector $\mathbf{0}$
and variance-covariance matrix $\mathbf{I}_p$, where $\mathbf{I}_p$ denotes the identity matrix of order p.
Then, using the results on quadratic forms, it can easily be seen that
$\mathbf{z}'\mathbf{z} = \mathbf{x}' \boldsymbol{\Sigma}_c^{-1} \mathbf{x}$ follows a $\chi^2$ distribution with p degrees of freedom.
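
A minimal sketch of the $\chi^2$ computation for p = 2 is given below. It uses the error SSCP matrix and the treatment means for T7 and T8 reported in the illustration of Section 4 (r = 4 replications, s = 27 error degrees of freedom). The function name is illustrative, and the closed-form p-value exp(-chi2/2) holds only for 2 degrees of freedom.

```python
import math

def chi_square_contrast(mean_i, mean_i2, R, r, s):
    """Chi-square statistic for an elementary contrast with p = 2 characters.

    R is the 2 x 2 error SSCP matrix, r the number of replications and
    s the error degrees of freedom.
    """
    x = [mean_i[0] - mean_i2[0], mean_i[1] - mean_i2[1]]
    scale = 2.0 / (r * s)  # SSCP divided by half of (error df x replications)
    S = [[scale * R[a][b] for b in range(2)] for a in range(2)]
    det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    S_inv = [[S[1][1] / det, -S[0][1] / det],
             [-S[1][0] / det, S[0][0] / det]]
    chi2 = sum(x[a] * S_inv[a][b] * x[b] for a in range(2) for b in range(2))
    p_value = math.exp(-chi2 / 2.0)  # survival function of chi-square with 2 df
    return chi2, p_value

R = [[39.66052, 44.60303], [44.60303, 52.7143975]]  # error SSCP (gyld, syld)
mean_t7 = [10.6000, 12.9375]  # (gyld, syld) means of T7
mean_t8 = [10.8000, 12.2625]  # (gyld, syld) means of T8
chi2, p_value = chi_square_contrast(mean_t7, mean_t8, R, r=4, s=27)
print(round(chi2, 3), p_value)
```

Because the estimated covariance matrix is used in place of the true one, the resulting probability is approximate, as noted in Section 3.1.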
3.1.2 Wilks' Lambda Criterion
For testing the null hypothesis (3.3), we obtain a sum of squares and products matrix for
the above elementary treatment contrast. Let the SSCP matrix for the above elementary
treatment contrast be $\mathbf{G}_{p \times p}$. The diagonal elements of $\mathbf{G}$ are then obtained by

\[
g_{kk} = \frac{r}{2} (\bar{y}_{ik} - \bar{y}_{i'k})^2 \quad \forall \; k = 1, 2, \ldots, p; \; i \neq i' = 1, 2, \ldots, v \tag{3.4}
\]

and the off-diagonal elements are obtained by

\[
g_{kk'} = \frac{r}{2} (\bar{y}_{ik} - \bar{y}_{i'k})(\bar{y}_{ik'} - \bar{y}_{i'k'}). \tag{3.5}
\]

The null hypothesis is rejected if the value of Wilks' Lambda

\[
\Lambda^* = \frac{|\mathbf{R}|}{|\mathbf{G} + \mathbf{R}|}
\]

is small, where $\mathbf{R}$ is the SSCP matrix due to residuals as obtained through MANOVA. The
hypothesis is then tested using the following F-test statistic based on Wilks' Lambda for
h = 1, where s is the error degrees of freedom (edf):

\[
\frac{1 - \Lambda^*}{\Lambda^*} \cdot \frac{s - p + 1}{p} \sim F(p, \; s - p + 1).
\]
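
The contrast test based on Wilks' Lambda can be sketched for p = 2 as follows, using the error SSCP matrix and the treatment means reported in the illustration of Section 4 (r = 4, s = 27). The function names are illustrative. The p-value uses the closed form $(1 + 2F/n)^{-n/2}$ for an F distribution with (2, n) degrees of freedom, so this shortcut applies only when p = 2.

```python
def det2(m):
    """Determinant of a 2 x 2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def wilks_contrast_test(mean_i, mean_i2, R, r, s):
    """Wilks' Lambda test of an elementary treatment contrast for p = 2."""
    p = 2
    d = [mean_i[k] - mean_i2[k] for k in range(p)]
    # SSCP matrix of the contrast, equations (3.4) and (3.5).
    G = [[(r / 2.0) * d[a] * d[b] for b in range(p)] for a in range(p)]
    G_plus_R = [[G[a][b] + R[a][b] for b in range(p)] for a in range(p)]
    lam = det2(R) / det2(G_plus_R)
    df2 = s - p + 1
    f_value = (1.0 - lam) / lam * df2 / p
    # Upper tail probability of F(2, df2); valid only because p = 2.
    p_value = (1.0 + 2.0 * f_value / df2) ** (-df2 / 2.0)
    return lam, f_value, p_value

R = [[39.66052, 44.60303], [44.60303, 52.7143975]]  # error SSCP (gyld, syld)
means = {  # (gyld, syld) treatment means from the LSD tables of Section 4
    1: [5.9375, 6.6700], 2: [7.3550, 8.5275],
    7: [10.6000, 12.9375], 8: [10.8000, 12.2625],
}
lam_78, f_78, p_78 = wilks_contrast_test(means[7], means[8], R, r=4, s=27)
lam_12, f_12, p_12 = wilks_contrast_test(means[1], means[2], R, r=4, s=27)
print(round(p_78, 4), round(p_12, 4))
```

For the pairs (T7, T8) and (T1, T2), the probabilities obtained this way agree to four decimals with the entries 0.0017 and 0.1525 in the table of paired comparisons in Section 4.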


4. Illustration using Multivariate Techniques
In this section, the results obtained from bivariate analysis of variance of the data
generated from the experiments conducted under PDCSR are given, where the data on
grain yield and straw yield were observed.

Illustration 4.1: An experiment entitled “Studies on the experimentation on
conservation of organic carbon in the soil to improve soil condition” was conducted at
Bhubaneshwar on rice crop. The experiment was initiated in the year 1997. The data on
grain and straw yield used for the illustration pertain to the Kharif season of 2001. Ten
treatments were tried in the experiment. The details of the treatments are given below:
T1 - Recommended N 100%
T2 - Recommended N 100% out of which 10 Kg at first ploughing
T3 - Recommended N 100% out of which 20 Kg at first ploughing
T4 - Recommended N 100% and add 10 Kg N/ha at first ploughing
T5 - Recommended N 100% and add 20 Kg N/ha at first ploughing
T6 - Recommended N + 10 Kg N/ha
T7 - Recommended N + 20 Kg N/ha
T8 - Recommended N + cellulose decomposing enzyme (FYM)
T9 - Recommended N + FYM 5 t/ha during Kharif
T10 - Recommended N + FYM 5 t/ha during Rabi

The results of multivariate analysis of variance are given in the sequel. First the results for
each of the two characters are presented separately.

ANOVA: Grain Yield (GYLD)

Source   DF   SS         MS        F-ratio   Pr > F
Model    12   104.6889   8.7241    5.94      <.0001
Error    27   39.6605    1.4689
Total    39   144.3494

R-Square   CV        Root MSE   GYLD Mean
0.7252     12.9895   1.212      9.3305

Source   DF   SS         MS        F-ratio   Pr > F
REP      3    2.7312     0.9104    0.62      0.6083
TRT      9    101.9577   11.3286   7.71      <.0001

ANOVA: Straw Yield (SYLD)

Source   DF   SS         MS           F-ratio   Pr > F
Model    12   161.1923   13.4326900   6.88      <.0001
Error    27   52.7144    1.9523851
Total    39   213.9067

R-Square   CV        Root MSE   SYLD Mean
0.7536     12.6574   1.39737    11.0393

Source   DF   SS         MS        F-ratio   Pr > F
REP      3    4.4419     1.4806    0.76      0.5273
TRT      9    156.7503   17.4167   8.92      <.0001

It can be seen that for both the characters, the replication effects are not significantly
different whereas the treatments are significantly different. Therefore, for making all
possible paired comparisons, the least significant difference procedure of multiple
comparisons was used. The results are given in the sequel:

t Tests (LSD) for GYLD

Alpha                                 0.05
Error Degrees of Freedom              27
Error Mean Square                     1.4689
Critical Value of t                   2.0518
Least Significant Difference at 5%    1.7584

t Grouping   Mean      N   TRT
A            10.8000   4   8
A            10.6000   4   7
A            10.5575   4   6
A            10.5425   4   9
A            10.5125   4   10
A            9.8325    4   5
A            9.1150    4   4
B C          8.0525    4   3
D C          7.3550    4   2
D            5.9375    4   1
*The treatments with the same letter are not significantly different.
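
The least significant difference reported above can be reproduced from the error mean square, the critical value of t and the number of replications, assuming the usual formula LSD = t × sqrt(2 × EMS / r) for comparing two treatment means with r replications each. The function name is illustrative.

```python
import math

def least_significant_difference(t_critical, error_mean_square, n_reps):
    """LSD for two treatment means, each based on n_reps observations."""
    return t_critical * math.sqrt(2.0 * error_mean_square / n_reps)

# Values taken from the GYLD output above.
lsd_gyld = least_significant_difference(2.0518, 1.4689, 4)

# Two treatments differ significantly when their mean difference exceeds the LSD.
t8_vs_t1 = abs(10.8000 - 5.9375) > lsd_gyld
t8_vs_t7 = abs(10.8000 - 10.6000) > lsd_gyld
print(round(lsd_gyld, 4), t8_vs_t1, t8_vs_t7)
```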

t Tests (LSD) for SYLD

Alpha                          0.05
Error Degrees of Freedom       27
Error Mean Square              1.952385
Critical Value of t            2.05183
Least Significant Difference   2.0273

t Grouping   Mean      N   TRT
A            12.9375   4   7
A            12.7750   4   6
B A          12.4375   4   9
B A          12.3550   4   10
B A          12.2625   4   8
B A C        11.7975   4   5
B C          10.7425   4   4
D C          9.8875    4   3
E D          8.5275    4   2
E            6.6700    4   1
*The treatments with the same letter are not significantly different.

It can be concluded that treatment T8 ranks first for GYLD and T7 ranks first for
SYLD, although the two treatments are not significantly different from each other. The
treatments T4 and T2 are not significantly different for GYLD and significantly different
for SYLD. Therefore, to rank the treatments collectively for both the characters, the
multivariate analysis of variance was carried out. The results obtained are given below:

Multivariate Analysis of Variance

E = Error SSCP Matrix
        gyld        syld
gyld    39.66052    44.60303
syld    44.60303    52.7143975

Partial Correlation Coefficients from the Error SSCP Matrix / Prob > |r|
DF = 27
        gyld        syld
gyld    1.000000    0.975485
                    <.0001
syld    0.975485    1.00000
        <.0001
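
The partial correlation coefficient shown above can be recovered directly from the error SSCP matrix, since the divisor (error degrees of freedom) cancels in the ratio. A minimal sketch, with an illustrative function name:

```python
import math

def correlation_from_sscp(E):
    """Correlation between the two characters implied by a 2 x 2 SSCP matrix."""
    return E[0][1] / math.sqrt(E[0][0] * E[1][1])

E = [[39.66052, 44.60303], [44.60303, 52.7143975]]  # error SSCP (gyld, syld)
partial_r = correlation_from_sscp(E)
print(round(partial_r, 6))
```

A correlation this close to one indicates that, after removing treatment and replication effects, grain yield and straw yield move almost together, which is why separate univariate analyses can be misleading here.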

MANOVA Test Criteria and F Approximations for the Hypothesis of No Overall TRT Effects

Statistic                Value     F-ratio   Num DF   Den DF   Pr > F
Wilks' Lambda            0.11997   5.45      18       52       <.0001
Pillai's Trace           1.2550    5.05      18       54       <.0001
Hotelling-Lawley Trace   4.2100    5.91      18       39.714   <.0001
Roy's Greatest Root      3.2475    9.74      9        27       <.0001

From the above table, it can be concluded that the treatment effects are significantly
different. The next questions are: what is the ranking of the treatments, and which
treatments are significantly different? These can be answered through multivariate contrast
analysis. However, most software packages carry out the univariate contrast analysis on the
combined average values of all the dependent variables. To account for the correlation
structure between the two variables, the principal component analysis was carried out. The
results are:
Eigenvalues of the Correlation Matrix
     Eigenvalue   Difference   Proportion   Cumulative
1    1.98428028   1.96856056   0.9921       0.9921
2    0.01571972                0.0079       1.0000

Eigenvectors
        Prin1      Prin2
gyld    0.707107   0.707107
syld    0.707107   -0.707107
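
For two variables, the eigenvalues and eigenvectors above have a closed form: a 2 × 2 correlation matrix with correlation rho has eigenvalues 1 + rho and 1 - rho, with eigenvectors (1, 1)/sqrt(2) and (1, -1)/sqrt(2). The sketch below assumes a non-negative correlation. The value rho = 0.98428028 is inferred from the first eigenvalue, since the correlation matrix itself is not printed in the output, and the function name is illustrative.

```python
import math

def pca_2x2_correlation(rho):
    """Eigen-decomposition of [[1, rho], [rho, 1]], assuming rho >= 0."""
    eigenvalues = [1.0 + rho, 1.0 - rho]
    w = 1.0 / math.sqrt(2.0)
    eigenvectors = [[w, w], [w, -w]]  # rows: Prin1, Prin2
    total = sum(eigenvalues)          # equals the number of variables, 2
    proportions = [ev / total for ev in eigenvalues]
    return eigenvalues, eigenvectors, proportions

rho = 0.98428028  # inferred as (first eigenvalue - 1)
eigenvalues, eigenvectors, proportions = pca_2x2_correlation(rho)
print(eigenvalues, [round(x, 4) for x in proportions])
```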

It can be seen that the first principal component explains 99.21% of the variance.
Therefore, the scores of the observations on the first principal component were
obtained and the univariate analysis of variance was carried out on these scores. The
results obtained are:
ANOVA: Principal Component Scores

Source            DF   SS         MS        F-ratio   Pr > F
Model             12   261.2948   21.7746   6.48      <.0001
Error             27   90.7905    3.3626
Corrected Total   39   352.0854

R-Square   CV        Root MSE   Prin1 Mean
0.7421     12.7311   1.8337     14.4036

Source   DF   SS         MS        F-ratio   Pr > F
REP      3    6.7790     2.2597    0.67      0.5767
TRT      9    254.5158   28.2795   8.41      <.0001

It can be seen that the treatments are highly significantly different. Therefore, multiple
comparisons were made using the least significant difference procedure.

t Tests (LSD) for prin1

Alpha                          0.05
Error Degrees of Freedom       27
Error Mean Square              3.3626
Critical Value of t            2.0518
Least Significant Difference   2.6605

t Grouping   Mean     N   TRT
A            16.644   4   7
A            16.499   4   6
A            16.308   4   8
A            16.249   4   9
A            16.170   4   10
B A          15.295   4   5
B A          14.041   4   4
B C          12.685   4   3
D C          11.231   4   2
D            8.915    4   1

The treatment T7 gets the first rank and is not significantly different from T8. The
treatments T4 and T2 are significantly different from each other. This procedure
answers the question to some extent, but a multivariate contrast analysis is the best
answer for this situation. The results of multivariate treatment contrast analysis for making
all possible paired comparisons of the treatments are given in the sequel.

Probabilities of Significance of All Possible Paired Treatment Comparisons using
Wilks' Lambda Criterion

Treats   1         2        3        4        5        6        7        8        9        10
1        .
2        0.1525    .
3        0.0006    0.0388   .
4        0.0010    0.0938   0.1673   .
5        0.0001    0.0055   0.1352   0.3945   .
6        0.0001    0.0004   0.0270   0.0631   0.5497   .
7        0.0001    0.0001   0.0194   0.0253   0.3271   0.8742   .
8        0.0001    0.0020   0.0006   0.0531   0.0200   0.0058   0.0017   .
9        <0.0001   0.0023   0.0181   0.2604   0.5636   0.3653   0.1657   0.1264   .
10       <0.0001   0.0030   0.0159   0.2904   0.4866   0.2667   0.1113   0.1828   0.9755   .
*Treatment pairs with a probability greater than 0.05 are not significantly different.

From the above results, it is seen that treatments T7 and T8 are significantly different,
whereas they were found to be not significantly different when analyzed for individual
characters or when the 1st principal component score was used.

Note: The MANOVA described in Sections 2 and 3 can usefully be employed for the
experimental situations where the experiment is continued for several years/ seasons with
same treatments and same randomized layout. For a detailed discussion on this one may
refer to Parsad et al. (2004).



SPSS Commands for MANOVA
1. Enter the data.
2. Click Analyze → General Linear Model → Multivariate.
3. Put the dependent variables (grain, straw) in the “Dependent Variables” box and the
independent variables (treat., rep) in the “Fixed Factors” box.
4. Click ‘Model’, choose ‘Custom’ and bring the independent variables into the model
box. Then click ‘Continue’.
5. If contrast analysis is required, click ‘contrast’ and mark the variables for which
contrast analysis is wanted; otherwise click ‘Continue’.
6. If post hoc analysis is required, click ‘post hoc…’, bring the required variables into
the ‘Post hoc test for’ box and then click ‘Continue’.
7. For other statistics, click ‘Options’; for diagnostic results, click ‘Save’.
8. Finally, click the ‘OK’ button to get the results.




References and Suggested Reading
Johnson, R.A. and Wichern, D.W. (1988). Applied Multivariate Statistical Analysis, 2nd
Edition. Prentice-Hall International, Inc., London.
Parsad, R., Gupta, V.K., Batra, P.K., Srivastava, R., Kaur, R., Kaur, A. and Arya, P.
(2004). A diagnostic study of design and analysis of field experiments. Project
Report, IASRI, New Delhi.
Rao, C.R. (1973). Linear Statistical Inference and Its Applications. Wiley Eastern Ltd., New
Delhi.
Seber, G.A.F. (1983). Multivariate Observations. Wiley Series in Probability and Statistics.