
Loglinear Models for Two-way Tables (cont’d)

We continue with the example consisting of an investigation into the association between
service in Vietnam and sleep problems. A cross-sectional study was conducted, so that the
cell counts $n_{ij}$ follow a multinomial distribution.

We fitted the loglinear model of independence


logm ij  = μ + λ Xi + λ Yj
to the data in this study, obtaining estimates for the parameters in the model, and ultimately
for the expected frequencies m ij . The latter were used to compute X 2 and G 2 in order to
assess the fit of the model and the hypothesis that service in Vietnam and sleep problems
are independent.

It is also possible to derive an asymptotic covariance matrix for the estimators of the
parameters in the model assuming that the model holds. To do so, we extend the delta
method to vector functions of random vectors.

Recall that, for the multinomial distribution, we have shown that

$$\sqrt{n}\,(p - \pi) \xrightarrow{d} N\big(0,\ \mathrm{Diag}(\pi) - \pi\pi'\big)$$

where $p = (p_1, p_2, \ldots, p_N)'$, $\pi = (\pi_1, \pi_2, \ldots, \pi_N)'$ and $\mathrm{Diag}(\pi)$ is a diagonal matrix with the
values of $\pi$ along the diagonal.

Let
$$g(p) = \big(g_1(p), g_2(p), \ldots, g_q(p)\big)'$$
so that
$$g(\pi) = \big(g_1(\pi), g_2(\pi), \ldots, g_q(\pi)\big)'.$$

In addition, let $\frac{\partial g(\pi)}{\partial \pi}$ denote the $q \times N$ matrix for which the entry in row $i$ and column $j$ is $\frac{\partial g_i(\pi)}{\partial \pi_j}$.

According to the delta method,

$$\sqrt{n}\,\big(g(p) - g(\pi)\big) \xrightarrow{d} N\!\left(0,\ \frac{\partial g(\pi)}{\partial \pi}\big(\mathrm{Diag}(\pi) - \pi\pi'\big)\left(\frac{\partial g(\pi)}{\partial \pi}\right)'\right)$$

and thus the asymptotic covariance matrix of $g(p)$ is

$$\frac{1}{n}\,\frac{\partial g(\pi)}{\partial \pi}\big(\mathrm{Diag}(\pi) - \pi\pi'\big)\left(\frac{\partial g(\pi)}{\partial \pi}\right)'.$$
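As an illustration, a minimal Python sketch of this computation follows (the function name delta_method_cov and the use of NumPy are assumptions for illustration, not part of the notes' SAS workflow):

import numpy as np

def delta_method_cov(pi, jacobian, n):
    # (1/n) * J (Diag(pi) - pi pi') J', the asymptotic covariance of g(p)
    pi = np.asarray(pi, dtype=float)
    J = np.asarray(jacobian, dtype=float)
    cov_p = (np.diag(pi) - np.outer(pi, pi)) / n   # covariance matrix of the cell proportions p
    return J @ cov_p @ J.T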

For our example, $p = (p_{11}, p_{12}, p_{21}, p_{22})'$, $\pi = (\pi_{11}, \pi_{12}, \pi_{21}, \pi_{22})'$ and $g(p) = (\hat\lambda_1^X, \hat\lambda_1^Y)'$, where

$$\lambda_1^X = \log(\pi_{1+}) - \frac{\log(\pi_{1+}) + \log(\pi_{2+})}{2} = \frac{\log(\pi_{11} + \pi_{12}) - \log(\pi_{21} + \pi_{22})}{2}$$

$$\lambda_1^Y = \log(\pi_{+1}) - \frac{\log(\pi_{+1}) + \log(\pi_{+2})}{2} = \frac{\log(\pi_{11} + \pi_{21}) - \log(\pi_{12} + \pi_{22})}{2}$$

so that

$$\frac{\partial g(\pi)}{\partial \pi} =
\begin{pmatrix}
\frac{\partial \lambda_1^X}{\partial \pi_{11}} & \frac{\partial \lambda_1^X}{\partial \pi_{12}} & \frac{\partial \lambda_1^X}{\partial \pi_{21}} & \frac{\partial \lambda_1^X}{\partial \pi_{22}} \\[6pt]
\frac{\partial \lambda_1^Y}{\partial \pi_{11}} & \frac{\partial \lambda_1^Y}{\partial \pi_{12}} & \frac{\partial \lambda_1^Y}{\partial \pi_{21}} & \frac{\partial \lambda_1^Y}{\partial \pi_{22}}
\end{pmatrix}.$$
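Writing these entries out (they follow directly from the two expressions for $\lambda_1^X$ and $\lambda_1^Y$ above) gives

$$\frac{\partial g(\pi)}{\partial \pi} =
\begin{pmatrix}
\frac{1}{2\pi_{1+}} & \frac{1}{2\pi_{1+}} & -\frac{1}{2\pi_{2+}} & -\frac{1}{2\pi_{2+}} \\[6pt]
\frac{1}{2\pi_{+1}} & -\frac{1}{2\pi_{+2}} & \frac{1}{2\pi_{+1}} & -\frac{1}{2\pi_{+2}}
\end{pmatrix}.$$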

To obtain an asymptotic covariance matrix for $g(p) = (\hat\lambda_1^X, \hat\lambda_1^Y)'$ under the model, we first
compute

$$n^{-1}\,\frac{\partial g(\pi)}{\partial \pi}\big(\mathrm{Diag}(\pi) - \pi\pi'\big)\left(\frac{\partial g(\pi)}{\partial \pi}\right)'$$

and then simplify the result using
$$\pi_{ij} = \pi_{i+}\pi_{+j},$$

since this is the condition under which the model holds.

This yields

$$\begin{pmatrix}
\frac{1}{4n\pi_{1+}\pi_{2+}} & 0 \\[6pt]
0 & \frac{1}{4n\pi_{+1}\pi_{+2}}
\end{pmatrix}$$

as the asymptotic covariance matrix for $g(p) = (\hat\lambda_1^X, \hat\lambda_1^Y)'$.

We estimate this matrix using

$$\begin{pmatrix}
\frac{1}{4np_{1+}p_{2+}} & 0 \\[6pt]
0 & \frac{1}{4np_{+1}p_{+2}}
\end{pmatrix}.$$

For our example we obtain

$$\begin{pmatrix}
0.000571 & 0 \\
0 & 0.000923
\end{pmatrix}.$$
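As a numerical check, these entries can be reproduced from the observed counts; the sketch below (illustrative Python, assuming the delta_method_cov function sketched earlier) evaluates the Jacobian at the fitted independence probabilities $\hat\pi_{ij} = p_{i+}p_{+j}$:

import numpy as np

counts = np.array([[173, 599],
                   [160, 851]])            # rows: Vietnam Yes/No; columns: sleep problems Yes/No
n = counts.sum()                           # 1783
p_row = counts.sum(axis=1) / n             # p_1+, p_2+
p_col = counts.sum(axis=0) / n             # p_+1, p_+2

pi_hat = np.outer(p_row, p_col).ravel()    # fitted pi_ij = pi_i+ * pi_+j under independence

# Jacobian of (lambda_1^X, lambda_1^Y), with the entries written out above
J = np.array([[ 1/(2*p_row[0]),  1/(2*p_row[0]), -1/(2*p_row[1]), -1/(2*p_row[1])],
              [ 1/(2*p_col[0]), -1/(2*p_col[1]),  1/(2*p_col[0]), -1/(2*p_col[1])]])

print(delta_method_cov(pi_hat, J, n))      # diagonal approx. 0.000571 and 0.000923, off-diagonals 0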

As an alternative to the loglinear model of independence for the example above, consider a
model that only acknowledges a main effect for service in Vietnam:

$$\log(m_{ij}) = \mu + \lambda_i^X$$
where
$$\lambda_i^X = \log(\pi_{i+}) - \frac{\sum_{h=1}^{I} \log(\pi_{h+})}{I}$$
and
$$\mu = \log\!\left(\frac{n}{J}\right) + \frac{\sum_{h=1}^{I} \log(\pi_{h+})}{I}.$$

The parameters $\lambda_i^X$ satisfy $\sum_{i=1}^{I} \lambda_i^X = 0$.

Under this model, since the maximum likelihood estimate of $\pi_{i+}$ is $\hat\pi_{i+} = \frac{n_{i+}}{n} = p_{i+}$, we can
determine estimates for $\mu$ and $\lambda_i^X$ using

$$\hat\lambda_i^X = \log(p_{i+}) - \frac{\sum_{h=1}^{I} \log(p_{h+})}{I}$$
and
$$\hat\mu = \log\!\left(\frac{n}{J}\right) + \frac{\sum_{h=1}^{I} \log(p_{h+})}{I}.$$

These estimates can be used to obtain estimates for $\log(m_{ij})$ under the model.
The degrees of freedom associated with the Pearson $X^2$ and $G^2$ statistics are determined
by the number of cells in the table less the number of independent parameters in the
model. Thus for an $I$ by $J$ table,
$$df = IJ - [1 + (I - 1)] = IJ - I = I(J - 1).$$

For our example we obtain

$$\hat\lambda_1^X = -0.1349 \quad (\text{so } \hat\lambda_2^X = 0.1349)$$
and
$$\hat\mu = 6.0907.$$

These estimates are used to obtain estimates for $\log(m_{ij})$ under the model.

For example,
$$\log(\hat m_{11}) = 6.0907 - 0.1349 = 5.9558$$
$$\log(\hat m_{12}) = 5.9558$$
$$\log(\hat m_{21}) = 6.2256$$
$$\log(\hat m_{22}) = 6.2256$$

and hence
$$\hat m_{11} = 386.0, \quad \hat m_{12} = 386.0, \quad \hat m_{21} = 505.5, \quad \hat m_{22} = 505.5.$$

Thus Pearson's $X^2 = \sum_{i=1}^{I}\sum_{j=1}^{J} \frac{(n_{ij} - \hat m_{ij})^2}{\hat m_{ij}} = 707.36$ and $G^2 = 2\sum_{i=1}^{I}\sum_{j=1}^{J} n_{ij}\log\!\left(\frac{n_{ij}}{\hat m_{ij}}\right) = 767.14$, both with
$df = 2$ and associated $p$-values of approximately 0. As a result, we can say there is strong
evidence to indicate that the above model is not appropriate for these data.
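These fitted values and test statistics can be reproduced with a short computation (illustrative Python; the variable names are assumptions):

import numpy as np

counts = np.array([[173, 599],
                   [160, 851]], dtype=float)
n = counts.sum()
row_totals = counts.sum(axis=1)

# Under log(m_ij) = mu + lambda_i^X each fitted row total equals the observed row total,
# spread evenly over the J = 2 columns
m_hat = np.outer(row_totals / 2, np.ones(2))         # [[386.0, 386.0], [505.5, 505.5]]

X2 = ((counts - m_hat)**2 / m_hat).sum()             # Pearson chi-square, approx. 707.36
G2 = 2 * (counts * np.log(counts / m_hat)).sum()     # likelihood-ratio statistic, approx. 767.14
print(X2, G2)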

Using the delta method in the above form, it is possible to obtain the asymptotic variance of
$\hat\lambda_1^X$. For our example,
$$p = (p_{11}, p_{12}, p_{21}, p_{22})',$$
$$\pi = (\pi_{11}, \pi_{12}, \pi_{21}, \pi_{22})',$$
$$g(p) = \hat\lambda_1^X,$$
$$g(\pi) = \lambda_1^X$$
where

$$\lambda_1^X = \log(\pi_{1+}) - \frac{\log(\pi_{1+}) + \log(\pi_{2+})}{2} = \frac{\log(\pi_{11} + \pi_{12}) - \log(\pi_{21} + \pi_{22})}{2}$$

so that

$$\frac{\partial g(\pi)}{\partial \pi} =
\begin{pmatrix}
\frac{\partial \lambda_1^X}{\partial \pi_{11}} & \frac{\partial \lambda_1^X}{\partial \pi_{12}} & \frac{\partial \lambda_1^X}{\partial \pi_{21}} & \frac{\partial \lambda_1^X}{\partial \pi_{22}}
\end{pmatrix}.$$

To obtain the asymptotic variance for $g(p) = \hat\lambda_1^X$ under the model, we first compute

$$n^{-1}\,\frac{\partial g(\pi)}{\partial \pi}\big(\mathrm{Diag}(\pi) - \pi\pi'\big)\left(\frac{\partial g(\pi)}{\partial \pi}\right)'$$

and then simplify the result using $\pi_{ij} = \frac{\pi_{i+}}{2}$, since this is the condition under which the model
holds.

This yields $\frac{1}{4n\pi_{1+}\pi_{2+}}$ as the asymptotic variance of $g(p) = \hat\lambda_1^X$.

For our example, this can be estimated by

$$\frac{1}{4np_{1+}p_{2+}} = 0.000571.$$

SAS code for example:


data service;
input Vietnam $ Slpprob $ count;
cards;
Yes Yes 173
Yes No 599
No Yes 160
No No 851
;
proc catmod order=data;
model Vietnam*Slpprob=_response_/ covb pred=freq;
loglin Vietnam;  /* loglinear model with only a main effect for Vietnam */
weight count;
run;

Now consider a more complex loglinear model


logm ij  = μ + λ Xi + λ Yj + λ XY
ij

If $\eta_{ij} = \log(m_{ij})$, $\eta_{i\cdot} = \sum_{j=1}^{J} \frac{\eta_{ij}}{J}$, $\eta_{\cdot j} = \sum_{i=1}^{I} \frac{\eta_{ij}}{I}$ and $\mu = \eta_{\cdot\cdot} = \sum_{i=1}^{I}\sum_{j=1}^{J} \frac{\eta_{ij}}{IJ}$, then we can write
$$\lambda_i^X = \eta_{i\cdot} - \eta_{\cdot\cdot}$$
$$\lambda_j^Y = \eta_{\cdot j} - \eta_{\cdot\cdot}$$
$$\lambda_{ij}^{XY} = \eta_{ij} - \eta_{i\cdot} - \eta_{\cdot j} + \eta_{\cdot\cdot}$$

This model is called the saturated model; it is the most general model for two-way
contingency tables.

The parameters $\lambda_i^X$ and $\lambda_j^Y$ are deviations about a mean and satisfy $\sum_{i=1}^{I} \lambda_i^X = \sum_{j=1}^{J} \lambda_j^Y = 0$.
Thus there are $I - 1$ linearly independent row parameters and $J - 1$ linearly independent
column parameters. The $\lambda_{ij}^{XY}$ are association parameters that reflect deviations from
independence of $X$ and $Y$. They represent interactions between $X$ and $Y$ whereby the effect
of one variable on the expected cell count depends on the level of the other variable.

Under this model, since the maximum likelihood estimate of $m_{ij}$ is $\hat m_{ij} = n_{ij}$, we have

$$\hat\eta_{ij} = \log(\hat m_{ij}),$$
$$\hat\eta_{i\cdot} = \sum_{j=1}^{J} \frac{\hat\eta_{ij}}{J},$$
$$\hat\eta_{\cdot j} = \sum_{i=1}^{I} \frac{\hat\eta_{ij}}{I} \quad \text{and}$$
$$\hat\mu = \hat\eta_{\cdot\cdot} = \sum_{i=1}^{I}\sum_{j=1}^{J} \frac{\hat\eta_{ij}}{IJ}$$

so

$$\hat\lambda_i^X = \hat\eta_{i\cdot} - \hat\eta_{\cdot\cdot}$$
$$\hat\lambda_j^Y = \hat\eta_{\cdot j} - \hat\eta_{\cdot\cdot}$$
$$\hat\lambda_{ij}^{XY} = \hat\eta_{ij} - \hat\eta_{i\cdot} - \hat\eta_{\cdot j} + \hat\eta_{\cdot\cdot}$$

Now for our example, we obtain

$$\hat\mu = 5.8425$$
$$\hat\lambda_1^X = -0.0683 \quad (\text{and } \hat\lambda_2^X = 0.0683)$$
$$\hat\lambda_1^Y = -0.7283 \quad (\text{and } \hat\lambda_2^Y = 0.7283)$$
$$\hat\lambda_{11}^{XY} = \hat\lambda_{22}^{XY} = 0.1073 \quad \text{and} \quad \hat\lambda_{12}^{XY} = \hat\lambda_{21}^{XY} = -0.1073.$$

These estimates are used to obtain estimates for $\log(m_{ij})$ under the model.

For example,
$$\log(\hat m_{11}) = 5.8425 - 0.0683 - 0.7283 + 0.1073 = 5.1532$$
$$\log(\hat m_{12}) = 6.3952$$
$$\log(\hat m_{21}) = 5.0752$$
$$\log(\hat m_{22}) = 6.7464$$
so that
$$\hat m_{11} = 173, \quad \hat m_{12} = 599, \quad \hat m_{21} = 160, \quad \hat m_{22} = 851.$$
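The estimates above can be reproduced directly from the cell counts (illustrative Python; variable names are assumptions):

import numpy as np

counts = np.array([[173, 599],
                   [160, 851]], dtype=float)
eta = np.log(counts)                     # eta_ij = log(m_hat_ij) = log(n_ij) for the saturated model

mu_hat   = eta.mean()                    # approx. 5.8425
lambda_X = eta.mean(axis=1) - mu_hat     # approx. (-0.0683, 0.0683)
lambda_Y = eta.mean(axis=0) - mu_hat     # approx. (-0.7283, 0.7283)
lambda_XY = (eta - eta.mean(axis=1, keepdims=True)
                 - eta.mean(axis=0, keepdims=True) + mu_hat)
print(mu_hat, lambda_X, lambda_Y)
print(lambda_XY)                         # approx. [[0.1073, -0.1073], [-0.1073, 0.1073]]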

This model describes perfectly any set of expected frequencies (hence both $X^2 = 0$ and
$G^2 = 0$ under this model). The degrees of freedom, reflecting the number of cells in the
table less the number of independent parameters in the model, are
$$df = IJ - [1 + (I - 1) + (J - 1) + (I - 1)(J - 1)] = 0.$$

It is also possible to derive an asymptotic covariance matrix for the estimators of the
parameters in the model assuming that the model holds. As above with the independence
model, we use the delta method based on vector functions of random vectors. Here
$$p = (p_{11}, p_{12}, p_{21}, p_{22})',$$
$$\pi = (\pi_{11}, \pi_{12}, \pi_{21}, \pi_{22})',$$
$$g(p) = (\hat\lambda_1^X, \hat\lambda_1^Y, \hat\lambda_{11}^{XY})',$$
$$g(\pi) = (\lambda_1^X, \lambda_1^Y, \lambda_{11}^{XY})'.$$

Since $m_{ij} = n\pi_{ij}$, it is possible to write $\lambda_1^X$, $\lambda_1^Y$, and $\lambda_{11}^{XY}$ in terms of the $\pi_{ij}$, allowing us to
evaluate

$$\frac{\partial g(\pi)}{\partial \pi} =
\begin{pmatrix}
\frac{\partial \lambda_1^X}{\partial \pi_{11}} & \frac{\partial \lambda_1^X}{\partial \pi_{12}} & \frac{\partial \lambda_1^X}{\partial \pi_{21}} & \frac{\partial \lambda_1^X}{\partial \pi_{22}} \\[6pt]
\frac{\partial \lambda_1^Y}{\partial \pi_{11}} & \frac{\partial \lambda_1^Y}{\partial \pi_{12}} & \frac{\partial \lambda_1^Y}{\partial \pi_{21}} & \frac{\partial \lambda_1^Y}{\partial \pi_{22}} \\[6pt]
\frac{\partial \lambda_{11}^{XY}}{\partial \pi_{11}} & \frac{\partial \lambda_{11}^{XY}}{\partial \pi_{12}} & \frac{\partial \lambda_{11}^{XY}}{\partial \pi_{21}} & \frac{\partial \lambda_{11}^{XY}}{\partial \pi_{22}}
\end{pmatrix}.$$
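For instance (worked out here; this is not in the original notes but follows directly from the definitions, since the $\log(n)$ terms cancel in $\eta_{ij} = \log(n\pi_{ij})$),

$$\lambda_{11}^{XY} = \tfrac{1}{4}\big(\log\pi_{11} - \log\pi_{12} - \log\pi_{21} + \log\pi_{22}\big) = \tfrac{1}{4}\log\!\left(\frac{\pi_{11}\pi_{22}}{\pi_{12}\pi_{21}}\right),
\qquad
\frac{\partial \lambda_{11}^{XY}}{\partial \pi_{11}} = \frac{1}{4\pi_{11}}, \quad
\frac{\partial \lambda_{11}^{XY}}{\partial \pi_{12}} = -\frac{1}{4\pi_{12}},$$

and so on, with analogous expressions for $\lambda_1^X$ and $\lambda_1^Y$.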

To obtain an asymptotic covariance matrix for $g(p) = (\hat\lambda_1^X, \hat\lambda_1^Y, \hat\lambda_{11}^{XY})'$ under the model, we
compute

$$n^{-1}\,\frac{\partial g(\pi)}{\partial \pi}\big(\mathrm{Diag}(\pi) - \pi\pi'\big)\left(\frac{\partial g(\pi)}{\partial \pi}\right)'$$

Unlike the two previous models above, there are no special conditions under which to
simplify the result. However, the resulting matrix can be estimated by keeping in mind that
the maximum likelihood estimate of $\pi_{ij}$ under the model is
$$\hat\pi_{ij} = p_{ij} = \frac{n_{ij}}{n}.$$
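Because each of $\lambda_1^X$, $\lambda_1^Y$ and $\lambda_{11}^{XY}$ is a signed combination of the $\log(\pi_{ij})$ divided by 4 (as illustrated above for $\lambda_{11}^{XY}$), the estimated covariance matrix is straightforward to compute numerically. A minimal sketch (illustrative Python; the sign-matrix construction is worked out here rather than taken from the notes):

import numpy as np

p = np.array([173, 599, 160, 851], dtype=float)
n = p.sum()
p = p / n                                # p_11, p_12, p_21, p_22

# Signs of the log(pi) terms in lambda_1^X, lambda_1^Y, lambda_11^XY for the saturated 2x2 model
signs = np.array([[ 1,  1, -1, -1],      # lambda_1^X
                  [ 1, -1,  1, -1],      # lambda_1^Y
                  [ 1, -1, -1,  1]])     # lambda_11^XY
J = signs / (4 * p)                      # Jacobian evaluated at the MLEs pi_hat_ij = p_ij

cov_p = (np.diag(p) - np.outer(p, p)) / n
print(J @ cov_p @ J.T)                   # estimated asymptotic covariance matrix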

Direct relationships exist between the odds ratio and the association parameters in loglinear
models. The relationship is simplest for a $2 \times 2$ table. For a cross-sectional study based on
a $2 \times 2$ table, using $m_{ij} = n\pi_{ij}$,

$$\log(\theta) = \log\!\left(\frac{\pi_{11}\pi_{22}}{\pi_{12}\pi_{21}}\right) = \log\!\left(\frac{m_{11}m_{22}}{m_{12}m_{21}}\right) = \log(m_{11}) + \log(m_{22}) - \log(m_{12}) - \log(m_{21}).$$

Thus

$$\log(\theta) = (\mu + \lambda_1^X + \lambda_1^Y + \lambda_{11}^{XY}) + (\mu + \lambda_2^X + \lambda_2^Y + \lambda_{22}^{XY}) - (\mu + \lambda_1^X + \lambda_2^Y + \lambda_{12}^{XY}) - (\mu + \lambda_2^X + \lambda_1^Y + \lambda_{21}^{XY})$$

This simplifies to

$$\log(\theta) = \lambda_{11}^{XY} + \lambda_{22}^{XY} - \lambda_{12}^{XY} - \lambda_{21}^{XY} = 4\lambda_{11}^{XY}$$

since $\lambda_{11}^{XY} = \lambda_{22}^{XY} = -\lambda_{12}^{XY} = -\lambda_{21}^{XY}$.
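For the data in our example this identity can be verified directly (a check added here, using the estimates obtained earlier, with small rounding differences):

$$\log(\hat\theta) = \log\!\left(\frac{173 \times 851}{599 \times 160}\right) = \log(1.536) = 0.429 \approx 4 \times 0.1073 = 4\hat\lambda_{11}^{XY}.$$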

SAS code:
data service;
input Vietnam $ Slpprob $ count;
cards;
Yes Yes 173
Yes No 599
No Yes 160
No No 851
;
proc catmod order=data;
model Vietnam*Slpprob=_response_/ covb pred=freq;
loglin Vietnam Slpprob Vietnam*Slpprob;  /* saturated model: both main effects and their interaction */
weight count;
run;
