We continue with the example consisting of an investigation into the association between
service in Vietnam and sleep problems. A cross-sectional study was conducted, so that the
cell counts $n_{ij}$ follow a multinomial distribution.
It is also possible to derive an asymptotic covariance matrix for the estimators of the
parameters in the model assuming that the model holds. To do so, we extend the delta
method to vector functions of random vectors.
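Let
$$g(p) = \big(g_1(p), g_2(p), \ldots, g_q(p)\big)'$$
so that
$$g(\pi) = \big(g_1(\pi), g_2(\pi), \ldots, g_q(\pi)\big)'.$$
In addition, let $\partial g(\pi)/\partial\pi$ denote the $q$ by $N$ matrix for which the entry in row $i$ and column $j$ is $\partial g_i(\pi)/\partial\pi_j$.
For our example, $p = (p_{11}, p_{12}, p_{21}, p_{22})'$, $\pi = (\pi_{11}, \pi_{12}, \pi_{21}, \pi_{22})'$, and $g(p) = (\hat\lambda^X_1, \hat\lambda^Y_1)'$, with $g(\pi) = (\lambda^X_1, \lambda^Y_1)'$, so that
$$\frac{\partial g(\pi)}{\partial\pi} =
\begin{pmatrix}
\dfrac{\partial\lambda^X_1}{\partial\pi_{11}} & \dfrac{\partial\lambda^X_1}{\partial\pi_{12}} & \dfrac{\partial\lambda^X_1}{\partial\pi_{21}} & \dfrac{\partial\lambda^X_1}{\partial\pi_{22}} \\[2ex]
\dfrac{\partial\lambda^Y_1}{\partial\pi_{11}} & \dfrac{\partial\lambda^Y_1}{\partial\pi_{12}} & \dfrac{\partial\lambda^Y_1}{\partial\pi_{21}} & \dfrac{\partial\lambda^Y_1}{\partial\pi_{22}}
\end{pmatrix}.$$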
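To obtain an asymptotic covariance matrix for $g(p) = (\hat\lambda^X_1, \hat\lambda^Y_1)'$ under the model, we first
compute
$$n^{-1}\,\frac{\partial g(\pi)}{\partial\pi}\,\big[\mathrm{Diag}(\pi) - \pi\pi'\big]\left(\frac{\partial g(\pi)}{\partial\pi}\right)'$$
and then simplify the result using
$$\pi_{ij} = \pi_{i+}\pi_{+j}.$$
This yields
$$\begin{pmatrix}
\dfrac{1}{4n\pi_{1+}\pi_{2+}} & 0 \\[2ex]
0 & \dfrac{1}{4n\pi_{+1}\pi_{+2}}
\end{pmatrix}.$$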
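The sandwich formula above can be checked numerically. The following is a minimal sketch, not the notes' own code: the helper name `delta_cov`, the sample size `n = 1000`, and the marginal probabilities are all hypothetical, and it assumes the usual sum-to-zero expressions for a 2x2 table, $\lambda^X_1 = \tfrac12\log(\pi_{1+}/\pi_{2+})$ and $\lambda^Y_1 = \tfrac12\log(\pi_{+1}/\pi_{+2})$.

```python
def delta_cov(jac, pi, n):
    """Return n^{-1} * J [Diag(pi) - pi pi'] J' for a Jacobian J given as a list of rows."""
    q, N = len(jac), len(pi)
    # Multinomial covariance kernel Diag(pi) - pi pi'
    sigma = [[(pi[a] if a == b else 0.0) - pi[a] * pi[b] for b in range(N)]
             for a in range(N)]
    return [[sum(jac[r][a] * sigma[a][b] * jac[s][b]
                 for a in range(N) for b in range(N)) / n
             for s in range(q)]
            for r in range(q)]

# Hypothetical values, chosen only for illustration
n = 1000
row = [0.4, 0.6]   # pi_{1+}, pi_{2+}
col = [0.3, 0.7]   # pi_{+1}, pi_{+2}

# Independence model: pi_ij = pi_{i+} * pi_{+j}, cells ordered (11, 12, 21, 22)
pi = [row[0] * col[0], row[0] * col[1], row[1] * col[0], row[1] * col[1]]

# Jacobian of (lambda^X_1, lambda^Y_1) under the assumed expressions:
# lambda^X_1 = 0.5 * log(pi_{1+} / pi_{2+}), lambda^Y_1 = 0.5 * log(pi_{+1} / pi_{+2})
jac = [
    [0.5 / row[0],  0.5 / row[0], -0.5 / row[1], -0.5 / row[1]],
    [0.5 / col[0], -0.5 / col[1],  0.5 / col[0], -0.5 / col[1]],
]

cov = delta_cov(jac, pi, n)
closed_form = [1 / (4 * n * row[0] * row[1]), 1 / (4 * n * col[0] * col[1])]
print(cov)
print(closed_form)
```

Under these assumptions the diagonal of `cov` should agree with the closed-form entries and the off-diagonal entries should vanish up to rounding.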
As an alternative to the loglinear model of independence for the example above, consider a
model that only acknowledges a main effect for service in Vietnam:
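$$\log m_{ij} = \mu + \lambda^X_i$$
where
$$\lambda^X_i = \log\pi_{i+} - \frac{1}{I}\sum_{h=1}^{I}\log\pi_{h+}$$
and
$$\mu = \log\frac{n}{J} + \frac{1}{I}\sum_{h=1}^{I}\log\pi_{h+}.$$
The parameters $\lambda^X_i$ satisfy $\sum_{i=1}^{I}\lambda^X_i = 0$.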
For the example data, the estimates are
$$\hat\lambda^X_1 = -0.1349 \quad (\text{so } \hat\lambda^X_2 = 0.1349)$$
and
$$\hat\mu = 6.0907.$$
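As a rough check on these values, the sketch below plugs the observed row proportions into the expressions for $\lambda^X_i$ and $\mu$ given for this model. The cell counts are the ones used elsewhere in these notes; the variable names are illustrative only.

```python
import math

# Observed counts: rows = service in Vietnam (yes, no), columns = sleep problems (yes, no)
counts = [[173, 599], [160, 851]]
I, J = 2, 2
n = sum(sum(r) for r in counts)

# Row proportions p_{i+} estimate pi_{i+}
row_props = [sum(r) / n for r in counts]
mean_log = sum(math.log(p) for p in row_props) / I

# lambda^X_i = log pi_{i+} - (1/I) sum_h log pi_{h+}
lam_x = [math.log(p) - mean_log for p in row_props]
# mu = log(n / J) + (1/I) sum_h log pi_{h+}
mu = math.log(n / J) + mean_log

print(round(lam_x[0], 4), round(lam_x[1], 4), round(mu, 4))
```

The printed values can be compared with the reported $-0.1349$, $0.1349$, and $6.0907$.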
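These estimates are used to obtain estimates for $\log m_{ij}$ under the model.
For example,
$$\begin{aligned}
\log\hat m_{11} &= 6.0907 - 0.1349 = 5.9558 \\
\log\hat m_{12} &= 5.9558 \\
\log\hat m_{21} &= 6.2256 \\
\log\hat m_{22} &= 6.2256
\end{aligned}$$
and hence
$$\begin{aligned}
\hat m_{11} &= 386.0 \\
\hat m_{12} &= 386.0 \\
\hat m_{21} &= 505.5 \\
\hat m_{22} &= 505.5
\end{aligned}$$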
Thus Pearson's
$$X^2 = \sum_{i=1}^{I}\sum_{j=1}^{J}\frac{(n_{ij} - \hat m_{ij})^2}{\hat m_{ij}} = 707.36$$
and
$$G^2 = 2\sum_{i=1}^{I}\sum_{j=1}^{J} n_{ij}\log\frac{n_{ij}}{\hat m_{ij}} = 767.14,$$
both with $df = 2$ and associated $p$-values of approximately 0. As a result, there is strong
evidence that the above model is not appropriate for these data.
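The fit statistics can be reproduced with a few lines of arithmetic. This is a sketch under the assumption that the fitted counts for this model are $\hat m_{ij} = n_{i+}/J$, which matches the fitted values 386.0 and 505.5 reported above. The $p$-value uses the fact that a chi-square variable with 2 degrees of freedom has upper tail probability $e^{-x/2}$.

```python
import math

counts = [[173, 599], [160, 851]]   # observed n_ij
J = 2

# Fitted counts under the row-effect-only model: each row total split evenly over columns
fitted = [[sum(r) / J] * J for r in counts]

pairs = [(o, e) for obs_row, fit_row in zip(counts, fitted)
         for o, e in zip(obs_row, fit_row)]

x2 = sum((o - e) ** 2 / e for o, e in pairs)          # Pearson X^2
g2 = 2 * sum(o * math.log(o / e) for o, e in pairs)   # likelihood-ratio G^2

df = 4 - 2                       # four cells, two free parameters (mu and lambda^X_1)
p_value_x2 = math.exp(-x2 / 2)   # exact upper tail for chi-square with df = 2

print(fitted, round(x2, 2), round(g2, 2), df, p_value_x2)
```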
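Using the delta method in the above form, it is possible to obtain the asymptotic variance of
$\hat\lambda^X_1$. For our example,
$$p = (p_{11}, p_{12}, p_{21}, p_{22})', \qquad \pi = (\pi_{11}, \pi_{12}, \pi_{21}, \pi_{22})',$$
$$g(p) = \hat\lambda^X_1, \qquad g(\pi) = \lambda^X_1,$$
so that
$$\frac{\partial g(\pi)}{\partial\pi} =
\begin{pmatrix}
\dfrac{\partial\lambda^X_1}{\partial\pi_{11}} & \dfrac{\partial\lambda^X_1}{\partial\pi_{12}} & \dfrac{\partial\lambda^X_1}{\partial\pi_{21}} & \dfrac{\partial\lambda^X_1}{\partial\pi_{22}}
\end{pmatrix}.$$
To obtain the asymptotic variance for $g(p) = \hat\lambda^X_1$ under the model, we first compute
$$n^{-1}\,\frac{\partial g(\pi)}{\partial\pi}\,\big[\mathrm{Diag}(\pi) - \pi\pi'\big]\left(\frac{\partial g(\pi)}{\partial\pi}\right)'$$
and then simplify the result using $\pi_{ij} = \pi_{i+}/2$, since this is the condition under which the model
holds.
This yields
$$\frac{1}{4n\pi_{1+}\pi_{2+}}$$
as the asymptotic variance of $g(p) = \hat\lambda^X_1$.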
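The intermediate steps can be sketched as follows. This is one possible route, assuming the $I = 2$ case of the definition of $\lambda^X_i$ given for this model, with $\pi_{1+} = \pi_{11} + \pi_{12}$ and $\pi_{2+} = \pi_{21} + \pi_{22}$; the symbol $d$ for the gradient row vector is introduced here only for brevity.

```latex
\begin{align*}
\lambda^X_1 &= \log\pi_{1+} - \tfrac12\left(\log\pi_{1+} + \log\pi_{2+}\right)
             = \tfrac12\log\frac{\pi_{1+}}{\pi_{2+}}, \\[1ex]
d = \frac{\partial\lambda^X_1}{\partial\pi}
  &= \left(\frac{1}{2\pi_{1+}},\ \frac{1}{2\pi_{1+}},\ -\frac{1}{2\pi_{2+}},\ -\frac{1}{2\pi_{2+}}\right), \\[1ex]
d\,\pi &= \frac{\pi_{11}+\pi_{12}}{2\pi_{1+}} - \frac{\pi_{21}+\pi_{22}}{2\pi_{2+}} = 0, \\[1ex]
d\,\mathrm{Diag}(\pi)\,d' &= \frac{\pi_{1+}}{4\pi_{1+}^2} + \frac{\pi_{2+}}{4\pi_{2+}^2}
  = \frac{\pi_{1+}+\pi_{2+}}{4\pi_{1+}\pi_{2+}} = \frac{1}{4\pi_{1+}\pi_{2+}}, \\[1ex]
n^{-1}\,d\,\big[\mathrm{Diag}(\pi) - \pi\pi'\big]\,d'
  &= \frac{1}{n}\left(\frac{1}{4\pi_{1+}\pi_{2+}} - 0\right) = \frac{1}{4n\pi_{1+}\pi_{2+}}.
\end{align*}
```

The last line uses $\pi_{1+} + \pi_{2+} = 1$ and agrees with the variance stated above.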
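If $\eta_{ij} = \log m_{ij}$, $\eta_{i\cdot} = \sum_{j=1}^{J}\eta_{ij}/J$, $\eta_{\cdot j} = \sum_{i=1}^{I}\eta_{ij}/I$,
and $\mu = \eta_{\cdot\cdot} = \sum_{i=1}^{I}\sum_{j=1}^{J}\eta_{ij}/(IJ)$,
then we can write
$$\log m_{ij} = \mu + \lambda^X_i + \lambda^Y_j + \lambda^{XY}_{ij}$$
where
$$\begin{aligned}
\lambda^X_i &= \eta_{i\cdot} - \eta_{\cdot\cdot} \\
\lambda^Y_j &= \eta_{\cdot j} - \eta_{\cdot\cdot} \\
\lambda^{XY}_{ij} &= \eta_{ij} - \eta_{i\cdot} - \eta_{\cdot j} + \eta_{\cdot\cdot}
\end{aligned}$$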
This model is called the saturated model; it is the most general model for two-way
contingency tables.
The parameters $\lambda^X_i$ and $\lambda^Y_j$ are deviations about a mean and satisfy
$\sum_{i=1}^{I}\lambda^X_i = \sum_{j=1}^{J}\lambda^Y_j = 0$.
Thus there are $I - 1$ linearly independent row parameters and $J - 1$ linearly independent
column parameters. The $\lambda^{XY}_{ij}$ are association parameters that reflect deviations from
independence of $X$ and $Y$. They represent interactions between $X$ and $Y$ whereby the effect
of one variable on the expected cell count depends on the level of the other variable.
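Replacing each $\eta$ by its estimate $\hat\eta$ gives
$$\begin{aligned}
\hat\lambda^X_i &= \hat\eta_{i\cdot} - \hat\eta_{\cdot\cdot} \\
\hat\lambda^Y_j &= \hat\eta_{\cdot j} - \hat\eta_{\cdot\cdot} \\
\hat\lambda^{XY}_{ij} &= \hat\eta_{ij} - \hat\eta_{i\cdot} - \hat\eta_{\cdot j} + \hat\eta_{\cdot\cdot}
\end{aligned}$$
For the example, $\hat\mu = 5.8425$, $\hat\lambda^X_1 = -0.0683$, $\hat\lambda^Y_1 = -0.7283$, and
$\hat\lambda^{XY}_{12} = -0.1073$ (and $\hat\lambda^{XY}_{11} = 0.1073$).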
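The centering operations that define the saturated-model parameters are easy to carry out directly. The sketch below assumes, as the fitted values reported in these notes indicate, that the saturated-model estimate of $m_{ij}$ is the observed count $n_{ij}$; the variable names are illustrative.

```python
import math

counts = [[173, 599], [160, 851]]   # observed n_ij, used as fitted counts
I, J = 2, 2

eta = [[math.log(c) for c in row] for row in counts]              # eta_ij = log m_ij
eta_row = [sum(row) / J for row in eta]                            # eta_{i.}
eta_col = [sum(eta[i][j] for i in range(I)) / I for j in range(J)] # eta_{.j}
mu = sum(sum(row) for row in eta) / (I * J)                        # eta_{..}

lam_x = [e - mu for e in eta_row]
lam_y = [e - mu for e in eta_col]
lam_xy = [[eta[i][j] - eta_row[i] - eta_col[j] + mu for j in range(J)]
          for i in range(I)]

print(round(mu, 4), [round(v, 4) for v in lam_x], [round(v, 4) for v in lam_y])
print([[round(v, 4) for v in row] for row in lam_xy])
```

The interaction estimates come out equal in magnitude with alternating signs, positive on the diagonal cells and negative off the diagonal.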
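These estimates are used to obtain estimates for $\log m_{ij}$ under the model.
For example,
$$\begin{aligned}
\log\hat m_{11} &= 5.8425 - 0.0683 - 0.7283 + 0.1073 = 5.1532 \\
\log\hat m_{12} &= 6.3952 \\
\log\hat m_{21} &= 5.0752 \\
\log\hat m_{22} &= 6.7464
\end{aligned}$$
so that
$$\begin{aligned}
\hat m_{11} &= 173 \\
\hat m_{12} &= 599 \\
\hat m_{21} &= 160 \\
\hat m_{22} &= 851
\end{aligned}$$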
This model describes perfectly any set of expected frequencies (hence both $X^2 = 0$ and
$G^2 = 0$ under this model). The degrees of freedom, reflecting the number of cells in the
table less the number of independent parameters in the model, are
$$df = IJ - \big[1 + (I - 1) + (J - 1) + (I - 1)(J - 1)\big] = 0.$$
It is also possible to derive an asymptotic covariance matrix for the estimators of the
parameters in the model assuming that the model holds. As above with the independence
model, we use the delta method based on vector functions of random vectors.
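Here
$$p = (p_{11}, p_{12}, p_{21}, p_{22})', \qquad \pi = (\pi_{11}, \pi_{12}, \pi_{21}, \pi_{22})',$$
$$g(p) = (\hat\lambda^X_1, \hat\lambda^Y_1, \hat\lambda^{XY}_{11})', \qquad g(\pi) = (\lambda^X_1, \lambda^Y_1, \lambda^{XY}_{11})'.$$
To obtain an asymptotic covariance matrix for $g(p) = (\hat\lambda^X_1, \hat\lambda^Y_1, \hat\lambda^{XY}_{11})'$ under the model, we
compute
$$n^{-1}\,\frac{\partial g(\pi)}{\partial\pi}\,\big[\mathrm{Diag}(\pi) - \pi\pi'\big]\left(\frac{\partial g(\pi)}{\partial\pi}\right)'$$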
Unlike the two previous models, there are no special conditions under which to
simplify the result. However, the resulting matrix can be estimated by keeping in mind that
the maximum likelihood estimate of $\pi_{ij}$ under the model is
$$\hat\pi_{ij} = p_{ij} = \frac{n_{ij}}{n}.$$
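A plug-in estimate of this covariance matrix can be sketched as below. The helper name `delta_cov` is hypothetical, and the Jacobian assumes the 2x2 forms implied by the definitions above, for example $\lambda^X_1 = \tfrac14(\eta_{11} + \eta_{12} - \eta_{21} - \eta_{22})$ with $\eta_{ij} = \log n + \log\pi_{ij}$, so that each partial derivative is $\pm 1/(4\pi_{ij})$.

```python
import math

def delta_cov(jac, pi, n):
    """Return n^{-1} * J [Diag(pi) - pi pi'] J' for a Jacobian J given as a list of rows."""
    q, N = len(jac), len(pi)
    sigma = [[(pi[a] if a == b else 0.0) - pi[a] * pi[b] for b in range(N)]
             for a in range(N)]
    return [[sum(jac[r][a] * sigma[a][b] * jac[s][b]
                 for a in range(N) for b in range(N)) / n
             for s in range(q)]
            for r in range(q)]

counts = [173, 599, 160, 851]   # cells ordered (11, 12, 21, 22)
n = sum(counts)
p = [c / n for c in counts]     # ML estimate of pi under the saturated model

# Sign patterns of eta_ij in lambda^X_1, lambda^Y_1, lambda^XY_11 (each carries a factor 1/4)
signs = [
    [1,  1, -1, -1],   # lambda^X_1
    [1, -1,  1, -1],   # lambda^Y_1
    [1, -1, -1,  1],   # lambda^XY_11
]
jac = [[s / (4 * pj) for s, pj in zip(row, p)] for row in signs]

cov = delta_cov(jac, p, n)
std_errors = [math.sqrt(cov[k][k]) for k in range(3)]
print(cov)
print(std_errors)
```

With this Jacobian each diagonal entry reduces to $\tfrac{1}{16}\sum_{i,j} 1/n_{ij}$, which gives a quick sanity check on the output.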
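Direct relationships exist between the odds ratio and association parameters in loglinear
models. The relationship is simplest for a 2x2 table. For a cross-sectional study based on
a 2x2 table, using $m_{ij} = n\pi_{ij}$,
$$\begin{aligned}
\log\theta &= \log\frac{\pi_{11}\pi_{22}}{\pi_{12}\pi_{21}} = \log\frac{m_{11}m_{22}}{m_{12}m_{21}} \\
&= \log m_{11} + \log m_{22} - \log m_{12} - \log m_{21}.
\end{aligned}$$
Thus
$$\begin{aligned}
\log\theta ={}& (\mu + \lambda^X_1 + \lambda^Y_1 + \lambda^{XY}_{11}) + (\mu + \lambda^X_2 + \lambda^Y_2 + \lambda^{XY}_{22}) \\
&- (\mu + \lambda^X_1 + \lambda^Y_2 + \lambda^{XY}_{12}) - (\mu + \lambda^X_2 + \lambda^Y_1 + \lambda^{XY}_{21}).
\end{aligned}$$
This simplifies to
$$\begin{aligned}
\log\theta &= \lambda^{XY}_{11} + \lambda^{XY}_{22} - \lambda^{XY}_{12} - \lambda^{XY}_{21} \\
&= 4\lambda^{XY}_{11}
\end{aligned}$$
since $\lambda^{XY}_{11} = \lambda^{XY}_{22} = -\lambda^{XY}_{12} = -\lambda^{XY}_{21}$.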
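The identity $\log\theta = 4\lambda^{XY}_{11}$ can be confirmed on the example counts. In this sketch the interaction parameter is computed from its centering definition rather than from the odds ratio, so the comparison is not circular; the variable names are illustrative.

```python
import math

counts = [[173, 599], [160, 851]]   # observed n_ij (saturated-model fitted counts)

# Sample log odds ratio
log_theta = math.log(counts[0][0] * counts[1][1] / (counts[0][1] * counts[1][0]))

# lambda^XY_11 = eta_11 - eta_{1.} - eta_{.1} + eta_{..}
eta = [[math.log(c) for c in row] for row in counts]
eta_row1 = (eta[0][0] + eta[0][1]) / 2
eta_col1 = (eta[0][0] + eta[1][0]) / 2
eta_all = (eta[0][0] + eta[0][1] + eta[1][0] + eta[1][1]) / 4
lam_xy_11 = eta[0][0] - eta_row1 - eta_col1 + eta_all

print(log_theta, 4 * lam_xy_11)
```

The two printed numbers should agree up to floating-point rounding.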
SAS code:
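/* 2x2 table: service in Vietnam by sleep problems, one record per cell */
data service;
  input Vietnam $ Slpprob $ count;
  cards;
Yes Yes 173
Yes No 599
No Yes 160
No No 851
;
/* Saturated loglinear model; order=data keeps the levels in input order */
proc catmod order=data;
  /* covb prints the covariance matrix of the estimates; pred=freq prints fitted counts */
  model Vietnam*Slpprob=_response_ / covb pred=freq;
  /* main effects for each factor plus their interaction */
  loglin Vietnam Slpprob Vietnam*Slpprob;
  /* count holds the cell frequencies */
  weight count;
run;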