Professional Documents
Culture Documents
Hi Log Linear
Hi Log Linear
3 8
Lijk ≡ log mijk = u + uiA + u Bj + ukC + uijAB + uikAC + u BC
jk + uijk
ABC
(1)
Φ + Nu + ∑n i ++ ui
A
+ ∑n B
+ j+u j + ∑n C
++ k uk + ∑n AB
ij + uij +
i j k i, j
∑n AC
i + k uik + ∑n BC
+ jk u jk + ∑n ABC
ijk uijk
i ,k j ,k i , j ,k
3 8
log mijk = u + uiA + u Bj + ukC + uijAB + uikAC + u BC
jk (2)
1
2 HILOGLINEAR
and ni ++ , n+ j + , n++ k , nij + , ni + k and n+ jk are the sufficient statistics for this
reduced model. These statistics can be considered as tables of sums
configurations and denoted by C with proper subscripts. For example, nij+ > C
is the configuration C12 and ;n @
++ k is the configuration C3 . Note that
;n @, >n C , and ;n++ k @ can be obtained from >n C, ;n @ and >n C .
i ++ + j+ ij + i+k + jk
We then call the last three configurations C12 , C13 and C23 minimal
configurations or minimal statistics.
preliminary estimates to fit C12 , C13 , and C23 . Fitting to C12 gives
161 1 6 nij +
0
m$ ijk = m$ ijk
m$
106
ij +
16 2 1 6 ni + k
1
m$ ijk = m$ ijk
m$
116
i+k
HILOGLINEAR 3
136 126 n+ jk
m$ ijk = m$ ijk
m$
12 6
+ jk
1 06 K
%1 if nθ > 0
m = &0.5 if nθ = 0
θ
K'0 otherwise
Intermediate Computations
After obtaining the initial cell estimates, the algorithm proceeds to fit each of these
configurations in turn. After r cycles, the relations are
1 6
m$ θ
sr +i
= m$ θ
1
sr +i −1 6 nθ i
for 1 ≤ i ≤ s; r ≥ 0
m$
1sr +i −16
θi
Convergence Criteria
The computations stop either when a complete cycle, which consists of s steps,
does not cause any cell to change by more than a preset amount ε ; that is,
1 6
sr
m$ θ − m$ θ
1 6
sr − s
<ε for all θ
4 HILOGLINEAR
or the number of cycles is larger than a preset integer max. Both ε and max can be
specified. The default for ε is
ε = max 0.25;
%& nθ wθ ()
θ ' 1000 *
and the default for max is 20.
1nθ − m$θ 6 2
χ2 = ∑ m$ θ
θ
L2 = 2 ∑ nθ ln1nθ m$θ 6
θ
where the first summation is done over the cells with nonzero estimated cell
frequencies while the second summation is done over cells with positive observed
and estimated cell frequencies. The degrees of freedom for the above two statistics
are computed as follows:
Let Tc be the total number of the cells and P the number of parameters in the
model. Also, let zc be the number of cells such that m$ θ = 0 . The adjusted degrees
of freedom is
adjusted df = Tc − P − zc
HILOGLINEAR 5
unadjusted df = Tc − P
C j1 , K, j M
1
= j1 , K , j M ; 6 1 ≤ j1 ≤ J1 , 1≤ i ≤ M
K
The coefficient β j1 , , j M is determined through the comparison of the components
K
of A and C j1 , , j M . Let s be the number of nonzero components of A that do not
K
match (equal) the corresponding components of C j1 , , j M . Also, let matching occur
1
at component i1 , K , i k . Then the coefficient for cell j1 , K , j M is 6
β j1 , K, j M = −1 1 6 4 J − 19 × K × 4 J
s
i1 ik −1 9
6 HILOGLINEAR
K K
K K
X , ,X X , ,X
The estimate u$ j s,1 , j sL of u j s,1 , j sL is then
s1 sL s1 sL
X , K, X
u$ j s,1K, j sL = ∑β j1 , K, j 4
ln n j1 , K, jM +δ 9 (3)
K, j
s1 sL M
j1 , M
∑ ln4n K 9"$#
−1
j1 , K, j M
β 2j , K, j
1 M ! j1, , jM +δ (4)
For a large sample, estimate (3) approximately follows a normal distribution with
K, X
K
X ,
mean u j s,1 , j sL and variance (4) if the sampling model follows a Poisson,
s1 sL
Raw Residual
raw residual = nθ − m$ θ
Standardized Residual
standardized residual = nθ − m$ θ 1 6 m$ θ
1 6
χ2 k −1 − χ2 k 16
Degrees of freedom are obtained by subtracting the degrees of freedom for the
corresponding models.
References
Bishop, Y. M. M., Fienberg, S. E., and Holland, P. W. 1975. Discrete multivariate
analysis: Theory and practice. Cambridge, Mass.: MIT Press.