
20 Analysis of Variance

Suppose we have a field trial in which various types of treatments have been
tried on different subjects and the effects recorded as an observation $x$ for each
individual. There are $n_i$ individuals with treatment $i$ and the observations
from them are $x_{i,1}, \dots, x_{i,n_i}$. We have $k$ such sets of observations of sizes
$n_1, \dots, n_k$ respectively, for a total of $N = n_1 + \dots + n_k$ observations. The
model assumes that each $x_{i,j}$ is normally distributed with mean $\mu_i$ due to
the effect of the $i$-th treatment and that they all have a common variance $\sigma^2$.
The null hypothesis is that there is no difference between the treatments, or
equivalently $\mu_1 = \mu_2 = \dots = \mu_k$. The loglikelihood under the null hypothesis
is

$$\log L_0 = -\frac{N}{2}\log 2\pi - N\log\sigma - \frac{1}{2\sigma^2}\sum_{i=1}^{k}\sum_{j=1}^{n_i}(x_{i,j}-\mu)^2$$

Maximization with respect to σ and µ yields

$$\hat\mu = \frac{1}{N}\sum_{i=1}^{k}\sum_{j=1}^{n_i} x_{i,j}$$

$$\hat\sigma_0^2 = \frac{1}{N}\sum_{i=1}^{k}\sum_{j=1}^{n_i} (x_{i,j}-\hat\mu)^2$$

$$\log \hat L_0 = -\frac{N}{2}\log 2\pi - N\log\hat\sigma_0 - \frac{N}{2}$$
Similarly, under $H_1$,

$$\log L_1 = -\frac{N}{2}\log 2\pi - N\log\sigma - \frac{1}{2\sigma^2}\sum_{i=1}^{k}\sum_{j=1}^{n_i}(x_{i,j}-\mu_i)^2$$

$$\hat\mu_i = \frac{1}{n_i}\sum_{j=1}^{n_i} x_{i,j}$$

$$\hat\sigma_1^2 = \frac{1}{N}\sum_{i=1}^{k}\sum_{j=1}^{n_i} (x_{i,j}-\hat\mu_i)^2$$

$$\log \hat L_1 = -\frac{N}{2}\log 2\pi - N\log\hat\sigma_1 - \frac{N}{2}$$

The loglikelihood ratio criterion takes the form

$$-2\log\frac{L_0}{L_1} = N\log\frac{\hat\sigma_0^2}{\hat\sigma_1^2}$$

so that a test can be based on

$$U = \frac{\hat\sigma_0^2 - \hat\sigma_1^2}{\hat\sigma_1^2}$$

since the criterion equals $N\log(1+U)$, an increasing function of $U$.

A computation yields

$$
\begin{aligned}
N\hat\sigma_0^2 &= \sum_{i=1}^{k}\sum_{j=1}^{n_i}(x_{i,j}-\hat\mu)^2 \\
&= \sum_{i=1}^{k}\left(\sum_{j=1}^{n_i} x_{i,j}^2 - 2\hat\mu\sum_{j=1}^{n_i} x_{i,j}\right) + N\hat\mu^2 \\
&= \sum_{i=1}^{k}\sum_{j=1}^{n_i} x_{i,j}^2 - N\hat\mu^2
\end{aligned}
$$

On the other hand


$$
\begin{aligned}
N\hat\sigma_0^2 &= \sum_{i=1}^{k}\sum_{j=1}^{n_i}(x_{i,j}-\hat\mu)^2 \\
&= \sum_{i=1}^{k}\sum_{j=1}^{n_i}(x_{i,j}-\hat\mu_i)^2 + 2\sum_{i=1}^{k}(\hat\mu_i-\hat\mu)\sum_{j=1}^{n_i}(x_{i,j}-\hat\mu_i) + \sum_{i=1}^{k} n_i(\hat\mu_i-\hat\mu)^2 \\
&= N\hat\sigma_1^2 + \sum_{i=1}^{k} n_i(\hat\mu_i-\hat\mu)^2 \\
&= N\hat\sigma_1^2 + \sum_{i=1}^{k} n_i\hat\mu_i^2 - N\hat\mu^2
\end{aligned}
$$

the cross term vanishing because $\sum_{j=1}^{n_i}(x_{i,j}-\hat\mu_i) = 0$ for each $i$.
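Since this decomposition carries the whole argument, a quick numerical check may help. The following is a small Python sketch on made-up data (the group sizes and values are arbitrary) verifying the identity above:

```python
import numpy as np

# Made-up unbalanced one-way layout: k = 3 treatment groups.
groups = [np.array([4.1, 5.2, 3.8]),
          np.array([6.0, 5.5, 6.3, 5.9]),
          np.array([4.9, 5.1])]

mu_hat = sum(g.sum() for g in groups) / sum(len(g) for g in groups)

N_sigma0_sq = sum(((g - mu_hat) ** 2).sum() for g in groups)
N_sigma1_sq = sum(((g - g.mean()) ** 2).sum() for g in groups)
between = sum(len(g) * (g.mean() - mu_hat) ** 2 for g in groups)

# Verify N*sigma0_hat^2 = N*sigma1_hat^2 + sum_i n_i (mu_i_hat - mu_hat)^2
assert np.isclose(N_sigma0_sq, N_sigma1_sq + between)
```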

The following quantities are to be computed.
$$T_i = \sum_{j=1}^{n_i} x_{i,j}, \qquad T = \sum_{i=1}^{k} T_i, \qquad c = \frac{T^2}{N}$$

$$A = \sum_{i=1}^{k}\sum_{j=1}^{n_i} x_{i,j}^2, \qquad B = \sum_{i=1}^{k} \frac{T_i^2}{n_i}$$

Then

$$A = c + (B-c) + (A-B), \qquad N\hat\sigma_0^2 = A - c, \qquad N\hat\sigma_1^2 = A - B$$

so that

$$U = \frac{B-c}{A-B}$$

It can be seen that under the null hypothesis $(B-c)$ and $(A-B)$ are independent $\sigma^2\chi^2$ variables with $(k-1)$ and $(N-k)$ degrees of freedom, and

$$F = U\,\frac{N-k}{k-1} = \frac{(B-c)/(k-1)}{(A-B)/(N-k)}$$

is an $F_{k-1,N-k}$. A large value of $F$ leads to the rejection of $H_0$ that $\mu_1 = \dots = \mu_k$.
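The shortcut quantities $T_i$, $T$, $c$, $A$, $B$ translate directly into code. Here is a minimal Python sketch of the one-way test on made-up data (the values are arbitrary, and `scipy.stats.f` is used only to convert $F$ into a p-value):

```python
import numpy as np
from scipy.stats import f as f_dist

# Made-up data: k = 3 treatments with unequal group sizes.
groups = [np.array([4.1, 5.2, 3.8]),
          np.array([6.0, 5.5, 6.3, 5.9]),
          np.array([4.9, 5.1])]
k = len(groups)
N = sum(len(g) for g in groups)

T_i = np.array([g.sum() for g in groups])        # treatment totals
T = T_i.sum()                                    # grand total
c = T ** 2 / N
A = sum((g ** 2).sum() for g in groups)          # raw sum of squares
B = sum(t ** 2 / len(g) for t, g in zip(T_i, groups))

U = (B - c) / (A - B)
F = U * (N - k) / (k - 1)
p_value = f_dist.sf(F, k - 1, N - k)             # P(F_{k-1,N-k} > F)
print(F, p_value)
```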
Sometimes the population on which the treatments are tried is not uniform.
For example, each treatment $i$ could be tried on the $j$-th group once,
leading to an observation $x_{i,j}$. The number of treatments is $k$ and the number
of groups is $n$, each group consisting of $k$ observations. The total number of
observations is $N = kn$. The model is that the observation $x_{i,j}$ is normally
distributed with mean $\mu_i + a_j$ and variance $\sigma^2$. Actually there is a redundancy
of parameters here, and it is better to write the mean as $\mu + \mu_i + a_j$ with
$\sum_i \mu_i = \sum_j a_j = 0$, for a total of $n + k - 1$ parameters. We are interested
in testing the null hypothesis $\mu_1 = \dots = \mu_k = 0$, and we really do not care
about $a_1, \dots, a_n$. With similar calculations we obtain

$$\hat\mu = \frac{1}{N}\sum_{i,j} x_{i,j}$$

$$\hat\mu_i = \frac{1}{n}\sum_{j} x_{i,j} - \hat\mu$$

$$\hat a_j = \frac{1}{k}\sum_{i} x_{i,j} - \hat\mu$$

$$\sum_{i,j} x_{i,j}^2 = N\hat\sigma_1^2 + n\sum_{i}\hat\mu_i^2 + k\sum_{j}\hat a_j^2 + N\hat\mu^2$$

If we define as before

$$T_i = \sum_{j=1}^{n} x_{i,j}, \qquad S_j = \sum_{i=1}^{k} x_{i,j}, \qquad T = \sum_{i=1}^{k} T_i = \sum_{j=1}^{n} S_j, \qquad c = \frac{T^2}{N}$$

$$A = \sum_{i=1}^{k}\sum_{j=1}^{n} x_{i,j}^2, \qquad B = \frac{1}{n}\sum_{i=1}^{k} T_i^2, \qquad C = \frac{1}{k}\sum_{j=1}^{n} S_j^2$$

We see that

$$A = c + (B-c) + (C-c) + E = Q_1 + Q_2 + Q_3 + Q_4$$

where $E = N\hat\sigma_1^2$ and $N\hat\sigma_0^2 - N\hat\sigma_1^2 = B - c$. The ratio

$$F = \frac{(B-c)/(k-1)}{E/\big((n-1)(k-1)\big)}$$

is an $F_{k-1,(n-1)(k-1)}$. The proof that the various components of the sum of squares are independent $\chi^2$'s depends on two observations. First, each term is of the form

$$Q_r = \|\tilde x\|^2 - \inf_{y \in X_r} \|\tilde x - y\|^2$$

where $\tilde x$ is the $N = nk$ dimensional vector $\{x_{i,j}\}$ and $\{X_r : r = 1, 2, 3, 4\}$ are orthogonal subspaces of $\mathbb{R}^N$. If we show that $X_1, X_2, X_3$ are mutually orthogonal, then clearly $X_4$ is the orthogonal complement of $X_1 \oplus X_2 \oplus X_3$.
It is easily verified that
$$X_1 = \{\tilde x : x_{i,j} \equiv a \text{ for all } i, j\}$$

$$X_2 = \Big\{\tilde x : x_{i,j} \equiv a_i \text{ for all } i, j, \text{ with } \sum_{i=1}^{k} a_i = 0\Big\}$$

$$X_3 = \Big\{\tilde x : x_{i,j} \equiv a_j \text{ for all } i, j, \text{ with } \sum_{j=1}^{n} a_j = 0\Big\}$$

and that they are mutually orthogonal.
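As with the one-way layout, the two-way computation is mechanical. Here is a minimal Python sketch on a made-up $k \times n$ table (rows are treatments, columns are groups), which also checks the decomposition $A = Q_1 + Q_2 + Q_3 + Q_4$ numerically:

```python
import numpy as np
from scipy.stats import f as f_dist

# Made-up k x n table: x[i, j] is treatment i applied to group j.
x = np.array([[4.1, 5.2, 3.8, 4.5],
              [6.0, 5.5, 6.3, 5.9],
              [4.9, 5.1, 5.4, 5.0]])
k, n = x.shape
N = k * n

T_i = x.sum(axis=1)            # treatment totals
S_j = x.sum(axis=0)            # group totals
T = x.sum()
c = T ** 2 / N
A = (x ** 2).sum()
B = (T_i ** 2).sum() / n
C = (S_j ** 2).sum() / k
E = A - B - C + c              # residual sum of squares, N * sigma1_hat^2

assert np.isclose(A, c + (B - c) + (C - c) + E)   # A = Q1 + Q2 + Q3 + Q4

F = ((B - c) / (k - 1)) / (E / ((n - 1) * (k - 1)))
p_value = f_dist.sf(F, k - 1, (n - 1) * (k - 1))
print(F, p_value)
```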

21 General Linear Models


The general linear model is of the following form. There are unknown parameters
$\sigma^2, \theta_1, \dots, \theta_k$ and observations $x_1, x_2, \dots, x_n$. The $x_i$ are assumed to be
independent and normally distributed with mean

$$E[x_i] = \sum_{j=1}^{k} a_{i,j}\theta_j$$

and variance $\sigma^2$. The factors $\{a_{i,j}\}$ are assumed to be known constants. The
matrix $A$ is assumed to be of rank $k$ (otherwise we can reduce the number
of real parameters). The null hypothesis is that $\theta \in \Theta_0$, a linear subspace
(hyperplane) of $\mathbb{R}^k$ of dimension $r < k$, specified by $k - r$ linear relations.
The analysis depends on the two quantities

$$Q_1 = \inf_{\theta \in \mathbb{R}^k} \|x - A\theta\|^2, \qquad Q_0 = \inf_{\theta \in \Theta_0} \|x - A\theta\|^2$$

The ratio

$$F = \frac{(Q_0 - Q_1)/(k-r)}{Q_1/(n-k)}$$

is an $F_{k-r,n-k}$. The actual minimization involves inverting the matrix $A^*A$ for the computation of $Q_1$, and a similar one for the computation of $Q_0$. In the examples we discussed this is particularly easy. Rather than discuss the general theory, we will do some examples.
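To make the minimization concrete, here is a small Python sketch of the two fits, assuming a full-rank design matrix; the data, the design, and the particular null hypothesis $\theta_3 = 0$ are all made up for illustration. The unrestricted estimate solves the normal equations $A^*A\,\theta = A^*x$, and the restricted fit simply refits on the smaller design obtained by imposing the linear relation:

```python
import numpy as np
from scipy.stats import f as f_dist

rng = np.random.default_rng(0)

# Made-up design: n = 20 observations, k = 3 parameters.
n, k = 20, 3
A = rng.normal(size=(n, k))
x = A @ np.array([1.0, 2.0, 0.0]) + rng.normal(size=n)

# Unrestricted fit: solve A*A theta = A*x, then Q1 = ||x - A theta_hat||^2.
theta_hat = np.linalg.solve(A.T @ A, A.T @ x)
Q1 = ((x - A @ theta_hat) ** 2).sum()

# Null hypothesis theta_3 = 0 (so r = 2): drop the last column and refit.
A0 = A[:, :2]
theta0_hat = np.linalg.solve(A0.T @ A0, A0.T @ x)
Q0 = ((x - A0 @ theta0_hat) ** 2).sum()

r = 2
F = ((Q0 - Q1) / (k - r)) / (Q1 / (n - k))
p_value = f_dist.sf(F, k - r, n - k)
print(F, p_value)
```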
