
Loglinear Models 1

Loglinear models describe association and interaction patterns among a set of categorical
variables. Their most common use is the modeling of cell counts in contingency tables.
The models specify how the size of a cell count depends on the levels of the categorical
variables for that cell. The nature of this specification relates to the association and
interaction structure among the variables.

Loglinear Models for Two-way Tables


Consider an I × J table that cross-classifies n subjects on two categorical response
variables. The cell counts n_ij follow a multinomial distribution. The probabilities π_ij for this
multinomial distribution form the joint distribution of the two categorical response variables.
The variables are statistically independent when π_ij = π_i+ π_+j for all i and j. The related
expression for the expected frequencies m_ij = nπ_ij is m_ij = nπ_i+ π_+j for all i and j. We shall
construct loglinear models using m_ij instead of π_ij so that they also apply for the Poisson
sampling model.
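
For a concrete illustration with hypothetical marginal probabilities (these numbers are not from the
example analyzed below), suppose n = 100, π_1+ = 0.4, and π_+1 = 0.3; then independence implies

m_{11} = n\,\pi_{1+}\,\pi_{+1} = 100 \times 0.4 \times 0.3 = 12.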

On a logarithmic scale, independence has the additive form

log m_ij = log n + log π_i+ + log π_+j.

Denoting the row variable by X and the column variable by Y, we can write this expression
as

log m_ij = μ + λ^X_i + λ^Y_j

where

λ^X_i = log π_i+ − (1/I) Σ_{h=1}^{I} log π_h+,

λ^Y_j = log π_+j − (1/J) Σ_{h=1}^{J} log π_+h,

and

μ = log n + (1/I) Σ_{h=1}^{I} log π_h+ + (1/J) Σ_{h=1}^{J} log π_+h.

The parameters λ^X_i and λ^Y_j satisfy Σ_{i=1}^{I} λ^X_i = Σ_{j=1}^{J} λ^Y_j = 0.
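
As a quick check on these definitions (added here, not part of the original notes), substituting them
into μ + λ^X_i + λ^Y_j shows that the centering sums cancel and the independence form is recovered:

\mu + \lambda^X_i + \lambda^Y_j
  = \Big[\log n + \tfrac{1}{I}\sum_{h=1}^{I}\log \pi_{h+} + \tfrac{1}{J}\sum_{h=1}^{J}\log \pi_{+h}\Big]
    + \Big[\log \pi_{i+} - \tfrac{1}{I}\sum_{h=1}^{I}\log \pi_{h+}\Big]
    + \Big[\log \pi_{+j} - \tfrac{1}{J}\sum_{h=1}^{J}\log \pi_{+h}\Big]
  = \log n + \log \pi_{i+} + \log \pi_{+j} = \log m_{ij}.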
NOTE: The two-way ANOVA design is

E(Y_ijk) = μ + α_i + β_j,

where α_i = μ_i − μ and β_j = μ_j − μ (the row and column effects). Zero-sum constraints are a
common way to make the parameters in the model identifiable; other parameter definitions are
possible.

This model is called the loglinear model of independence for two-way contingency tables.
In the model, the log expected frequency for cell (i, j) is an additive function of a row effect
λ^X_i and a column effect λ^Y_j. The parameter λ^X_i represents the effect of classification in row i
for variable X. The larger the value of λ^X_i, the larger each expected frequency is in row i of
the table. When λ^X_h = λ^X_l, each expected frequency in row h equals the corresponding
expected frequency in row l. Similarly, the parameter λ^Y_j represents the effect of
classification in column j for variable Y.

The null hypothesis of independence between two categorical variables is simply the
hypothesis that this loglinear model holds. The fitted values that satisfy the model are

m̂_ij = n_i+ n_+j / n,

the estimated expected frequencies for the test of independence. Chi-square tests of
independence using X² and G² are also goodness-of-fit tests of this loglinear model.
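
The following line spells out where this formula comes from (a brief added step): under
independence the MLE of π_ij is the product of the sample marginal proportions p_i+ = n_i+/n
and p_+j = n_+j/n, so

\hat{m}_{ij} = n\,\hat{\pi}_{ij} = n\,p_{i+}\,p_{+j} = n \cdot \frac{n_{i+}}{n} \cdot \frac{n_{+j}}{n} = \frac{n_{i+}\,n_{+j}}{n}.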

To gain an appreciation for the interpretation of parameters in the model of independence,
consider an I × 2 table. Using the joint probability distribution, for the i-th row the log odds of
being in column 1 instead of column 2 are

log(π_i1 / π_i2) = log(m_i1 / m_i2) = log m_i1 − log m_i2
                 = (μ + λ^X_i + λ^Y_1) − (μ + λ^X_i + λ^Y_2)
                 = λ^Y_1 − λ^Y_2
                 = 2λ^Y_1, since λ^Y_1 + λ^Y_2 = 0.

Thus in each row, the odds of response in column 1 instead of column 2 equal

e^{2λ^Y_1}.

This implies that the probability of classification in a particular column is the same in all
rows.

Note: For a 2 × 2 table, log θ = log[(m_11 m_22)/(m_12 m_21)] = 0, so that θ = 1 under the model of
independence.
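
A one-line verification (added here): writing each expected frequency in the model form and
cancelling terms,

\log\theta = (\mu+\lambda^X_1+\lambda^Y_1) + (\mu+\lambda^X_2+\lambda^Y_2) - (\mu+\lambda^X_1+\lambda^Y_2) - (\mu+\lambda^X_2+\lambda^Y_1) = 0.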

Example:
The following 2x2 table classifies n = 1783 U.S.A. veterans according to service in Vietnam
and sleep problems:

                              Sleep Problems
                        Yes             No               Total
Service      Yes        n_11 = 173      n_12 = 599       n_1+ = 772
in Vietnam   No         n_21 = 160      n_22 = 851       n_2+ = 1011
             Total      n_+1 = 333      n_+2 = 1450      n_++ = 1783

Are the variables "Service in Vietnam" and "Sleep problems" independent?
**************
Suppose we decide to fit the loglinear model of independence,

log m_ij = μ + λ^X_i + λ^Y_j,

to these data to answer this question. We have shown that under the null hypothesis of
independence the MLEs for π_i+ and π_+j are the sample proportions

π̂_i+ = n_i+ / n = p_i+   and   π̂_+j = n_+j / n = p_+j.

Thus we determine estimates for μ, λ^X_i, and λ^Y_j using


λ̂^X_i = log p_i+ − (1/I) Σ_{h=1}^{I} log p_h+,

λ̂^Y_j = log p_+j − (1/J) Σ_{h=1}^{J} log p_+h,

and

μ̂ = log n + (1/I) Σ_{h=1}^{I} log p_h+ + (1/J) Σ_{h=1}^{J} log p_+h,

which yields λ̂^X_1 = −0.1349, λ̂^X_2 = 0.1349, λ̂^Y_1 = −0.7356, λ̂^Y_2 = 0.7356, and μ̂ = 5.8415.
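
For a 2 × 2 table these expressions simplify, so the reported estimates can be checked directly
from the margins of the table (a numerical check using the counts above):

\hat\lambda^X_1 = \tfrac{1}{2}\log\frac{n_{1+}}{n_{2+}} = \tfrac{1}{2}\log\frac{772}{1011} \approx -0.1349, \qquad
\hat\lambda^Y_1 = \tfrac{1}{2}\log\frac{n_{+1}}{n_{+2}} = \tfrac{1}{2}\log\frac{333}{1450} \approx -0.7356.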

Using these estimates, we obtain estimates of log m_ij under the model of independence,
e.g.,

log m̂_11 = 5.8415 − 0.1349 − 0.7356 = 4.9710,
log m̂_12 = 6.4422,
log m̂_21 = 5.2408, and
log m̂_22 = 6.7120,

hence

m̂_11 = 144.18,
m̂_12 = 627.82,
m̂_21 = 188.82, and
m̂_22 = 822.18.
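
These agree with the closed-form fitted values m̂_ij = n_i+ n_+j / n given earlier; for example,

\hat{m}_{11} = \frac{n_{1+}\,n_{+1}}{n} = \frac{772 \times 333}{1783} \approx 144.18 \approx e^{4.9710}.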
This results in Pearson's X² = Σ_{i=1}^{I} Σ_{j=1}^{J} (n_ij − m̂_ij)² / m̂_ij = 12.49 and
G² = 2 Σ_{i=1}^{I} Σ_{j=1}^{J} n_ij log(n_ij / m̂_ij) = 12.39.
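
As a sketch of the arithmetic behind X² (cell-by-cell contributions computed from the observed
counts and the fitted values above):

X^2 \approx \frac{(173-144.18)^2}{144.18} + \frac{(599-627.82)^2}{627.82} + \frac{(160-188.82)^2}{188.82} + \frac{(851-822.18)^2}{822.18}
    \approx 5.76 + 1.32 + 4.40 + 1.01 \approx 12.49.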
The degrees of freedom associated with these statistics in the loglinear model of
independence are determined by the number of cells in the table minus the number of
independent parameters in the model. Thus, for an I × J table,

df = IJ − [1 + (I − 1) + (J − 1)] = (I − 1)(J − 1).

Our X² and G² each have df = 1 with an associated p-value of approximately 0. Thus there is
strong evidence to indicate that the loglinear model of independence is not appropriate for
these data.
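
Instantiating the degrees-of-freedom formula for this 2 × 2 table (a one-line check):

\mathrm{df} = (2)(2) - [1 + (2-1) + (2-1)] = 4 - 3 = 1.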

SAS code:

* Read the 2x2 table as one observation per cell;
data service;
  input Vietnam $ Slpprob $ count;
  cards;
Yes Yes 173
Yes No 599
No Yes 160
No No 851
;

* Fit the loglinear model of independence with PROC CATMOD;
proc catmod data=service order=data;
  model Vietnam*Slpprob = _response_ / covb pred=freq;
  loglin Vietnam Slpprob;
  weight count;
run;
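
Two alternative SAS approaches are sketched below (added here, not part of the original code; the
statements use standard SAS/STAT syntax but should be checked against your installation).
PROC GENMOD fits the same independence model as a Poisson loglinear regression, and PROC FREQ
reports X² (Chi-Square) and G² (Likelihood Ratio Chi-Square) along with the expected cell counts.

* Sketch: the independence loglinear model as a Poisson regression;
proc genmod data=service order=data;
  class Vietnam Slpprob;
  model count = Vietnam Slpprob / dist=poisson link=log;
run;

* Sketch: chi-square tests of independence with expected counts;
proc freq data=service order=data;
  weight count;
  tables Vietnam*Slpprob / chisq expected;
run;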
