
HILOGLINEAR

The Model Minimum Configuration


Consider an I × J × K table. Let n_{ijk} be the observed frequency and m_{ijk} the expected frequency for cell (i, j, k). A simple way to construct a linear model in the natural logarithms of the expected cell frequencies is by analogy with analysis of variance (ANOVA) models:

L_{ijk} ≡ log m_{ijk} = u + u_i^A + u_j^B + u_k^C + u_{ij}^{AB} + u_{ik}^{AC} + u_{jk}^{BC} + u_{ijk}^{ABC}    (1)

where 1 ≤ i ≤ I, 1 ≤ j ≤ J, and 1 ≤ k ≤ K. In general, each of the seven subscripted u-terms sums to zero over each lettered subscript.
It can be shown (Bishop et al. 1975, p. 65) that, under the commonly encountered sampling plans, the log-likelihood is

Φ + Nu + Σ_i n_{i++} u_i^A + Σ_j n_{+j+} u_j^B + Σ_k n_{++k} u_k^C + Σ_{i,j} n_{ij+} u_{ij}^{AB} + Σ_{i,k} n_{i+k} u_{ik}^{AC} + Σ_{j,k} n_{+jk} u_{jk}^{BC} + Σ_{i,j,k} n_{ijk} u_{ijk}^{ABC}

where Φ is independent of any parameters and N is the total number of observations.


Also, the n-terms adjacent to the unknown parameters are the sufficient statistics.
The formulation of the above log-likelihood is based on the saturated model (1).
When we consider unsaturated models, terms drop out and those that remain give
the sufficient statistics. For instance, if we assume that there is no three-factor effect, that is, u_{ijk}^{ABC} = 0 for all i, j, and k, or more briefly u_{ABC} = 0, then

log m_{ijk} = u + u_i^A + u_j^B + u_k^C + u_{ij}^{AB} + u_{ik}^{AC} + u_{jk}^{BC}    (2)


and n_{i++}, n_{+j+}, n_{++k}, n_{ij+}, n_{i+k}, and n_{+jk} are the sufficient statistics for this reduced model. These statistics can be considered as tables of sums, called configurations, and denoted by C with the proper subscripts. For example, {n_{ij+}} is the configuration C_{12} and {n_{++k}} is the configuration C_3. Note that {n_{i++}}, {n_{+j+}}, and {n_{++k}} can be obtained from {n_{ij+}}, {n_{i+k}}, and {n_{+jk}}. We then call the last three configurations C_{12}, C_{13}, and C_{23} minimal configurations or minimal statistics.
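As an illustration, each configuration is just the table summed over the subscripts replaced by "+", so the minimal configurations for model (2) can be formed directly. A minimal sketch with NumPy (the table shape and values are hypothetical):

```python
import numpy as np

# Hypothetical 2 x 3 x 2 table of observed frequencies n_ijk
n = np.arange(1, 13).reshape(2, 3, 2)

# Minimal configurations for model (2): sum over the omitted subscript
C12 = n.sum(axis=2)   # {n_ij+}, an I x J table
C13 = n.sum(axis=1)   # {n_i+k}, an I x K table
C23 = n.sum(axis=0)   # {n_+jk}, a J x K table

# Lower-order configurations are recoverable from the minimal ones,
# e.g. C1 = {n_i++} from C12 and C3 = {n_++k} from C13
C1 = C12.sum(axis=1)  # {n_i++}
C3 = C13.sum(axis=0)  # {n_++k}
```

This mirrors the statement above that C_1, C_2, and C_3 need not be listed separately once C_{12}, C_{13}, and C_{23} are fitted.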

Notation for Unlimited Number of Dimensions


To generalize results, we denote the complete set of subscripts by a single symbol θ. Thus, n_θ is the observed frequency in an elementary cell and w_θ is the cell weight. We add a subscript to θ to denote a reduced dimensionality, so that n_{θ_i} is the observed sum in a cell of C_{θ_i}. We use the second subscript, i, solely to distinguish between different configurations.

Iterative Proportional Fitting Procedure (IPFP)


We can obtain MLEs for the elementary cells under any hierarchical model by iterative fitting of the minimal sufficient configurations. To illustrate the algorithm, we consider model (2). The MLEs must fit the configurations C_{12}, C_{13}, and C_{23}. The basic IPFP chooses an initial table m̂_{ijk}^{(0)} and then sequentially adjusts the preliminary estimates to fit C_{12}, C_{13}, and C_{23}. Fitting to C_{12} gives

m̂_{ijk}^{(1)} = m̂_{ijk}^{(0)} n_{ij+} / m̂_{ij+}^{(0)}

Subsequent fitting to C_{13} gives

m̂_{ijk}^{(2)} = m̂_{ijk}^{(1)} n_{i+k} / m̂_{i+k}^{(1)}

and similarly, after fitting C_{23} we have

m̂_{ijk}^{(3)} = m̂_{ijk}^{(2)} n_{+jk} / m̂_{+jk}^{(2)}

We repeat this three-step cycle until convergence to the desired accuracy is attained. The extension of the above procedure to the general case of fitting s configurations is straightforward. Let the minimal configurations be C_{θ_i} for i = 1, 2, …, s, with cell entries n_{θ_i}, respectively. The procedure is as follows:

Initial Cell Estimates

To start the iterations, set

m̂_θ^{(0)} = 1 if n_θ > 0; 0.5 if n_θ = 0; 0 otherwise

Intermediate Computations

After obtaining the initial cell estimates, the algorithm proceeds to fit each of these
configurations in turn. After r cycles, the relations are

m̂_θ^{(sr+i)} = m̂_θ^{(sr+i−1)} n_{θ_i} / m̂_{θ_i}^{(sr+i−1)}    for 1 ≤ i ≤ s, r ≥ 0

Convergence Criteria

The computations stop either when a complete cycle, which consists of s steps, does not cause any cell to change by more than a preset amount ε; that is,

| m̂_θ^{(sr)} − m̂_θ^{(sr−s)} | < ε    for all θ

or when the number of cycles is larger than a preset integer max. Both ε and max can be specified. The default for ε is

ε = max_θ { 0.25, n_θ w_θ / 1000 }

and the default for max is 20.
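The full procedure above — initial estimates, cyclic proportional adjustment, and the ε/max stopping rule — can be sketched as follows for model (2). The function name and the example table are assumptions for illustration, not part of the original algorithm description:

```python
import numpy as np

def ipfp_no_three_factor(n, eps=0.25, max_cycles=20):
    """Fit the no-three-factor-effect model (2) to an I x J x K table n
    by iteratively fitting the configurations C12, C13, and C23."""
    # Initial estimates: 1 where n > 0, 0.5 where n == 0
    m = np.where(n > 0, 1.0, 0.5)
    for _ in range(max_cycles):
        prev = m.copy()
        # Fit C12: scale so that the m_ij+ margin matches n_ij+
        m = m * (n.sum(axis=2) / m.sum(axis=2))[:, :, None]
        # Fit C13: match the n_i+k margin
        m = m * (n.sum(axis=1) / m.sum(axis=1))[:, None, :]
        # Fit C23: match the n_+jk margin
        m = m * (n.sum(axis=0) / m.sum(axis=0))[None, :, :]
        # Stop when no cell changed by more than eps over a full cycle
        if np.abs(m - prev).max() < eps:
            break
    return m

# Hypothetical 2 x 2 x 2 table of observed frequencies
n = np.array([[[10.0, 4.0], [3.0, 8.0]],
              [[5.0, 12.0], [6.0, 2.0]]])
m_hat = ipfp_no_three_factor(n, eps=1e-8, max_cycles=200)
```

At convergence, the fitted table reproduces all three two-way margins, which is exactly the MLE condition stated above.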

Goodness of Fit Tests


The Pearson chi-square statistic is

χ² = Σ_θ ( n_θ − m̂_θ )² / m̂_θ

and the likelihood-ratio chi-square statistic is

L² = 2 Σ_θ n_θ ln( n_θ / m̂_θ )

where the first summation is done over the cells with nonzero estimated cell
frequencies while the second summation is done over cells with positive observed
and estimated cell frequencies. The degrees of freedom for the above two statistics
are computed as follows:

Adjusted Degrees of Freedom

Let T_c be the total number of cells and P the number of parameters in the model. Also, let z_c be the number of cells such that m̂_θ = 0. The adjusted degrees of freedom are

adjusted df = T_c − P − z_c

Unadjusted Degrees of Freedom

unadjusted df = T_c − P
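The two statistics, their summation rules, and the degrees-of-freedom adjustment fit together as in the following sketch (the observed and fitted values are hypothetical, and the parameter count P would come from the fitted model):

```python
import numpy as np

# Hypothetical flattened observed and fitted cell frequencies
n = np.array([12.0, 0.0, 7.0, 5.0])
m = np.array([10.0, 0.0, 8.0, 4.0])

# Pearson chi-square: sum over cells with nonzero fitted frequency
nz = m > 0
chi2 = ((n[nz] - m[nz]) ** 2 / m[nz]).sum()

# Likelihood-ratio chi-square: sum over cells with positive
# observed AND positive fitted frequency
pos = (n > 0) & (m > 0)
L2 = 2.0 * (n[pos] * np.log(n[pos] / m[pos])).sum()

Tc = n.size                 # total number of cells
zc = int((m == 0).sum())    # cells with zero fitted frequency
P = 2                       # number of parameters (assumed here)
unadjusted_df = Tc - P
adjusted_df = Tc - P - zc
```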

Parameter Estimates and Standard Errors


If a saturated model is fitted and neither n_θ + δ nor w_θ is equal to zero for all cells, then the parameter estimates and their standard errors will be computed. Each estimate of the parameters in the saturated model can be expressed as a linear combination of the logarithms of the observed cell frequencies plus the user-specified δ, where the coefficients used in the linear combination add to zero. We now discuss the rule for obtaining the coefficients. Consider, in the general case, a J_1 × J_2 × ⋯ × J_M frequency table with defining variables X_1, …, X_M. Let u_{j_{s_1}, …, j_{s_L}}^{X_{s_1}, …, X_{s_L}} denote an L-term interaction involving X_{s_1}, …, X_{s_L} at levels j_{s_1}, …, j_{s_L}, respectively. Denote by A a vector constructed so that its nonzero components correspond to the variables in the parameter to be estimated and are set to the levels of those variables. Let C_{j_1, …, j_M} be an M-dimensional vector with components equal to the cell IDs; that is,

C_{j_1, …, j_M} = ( j_1, …, j_M ),    1 ≤ j_i ≤ J_i, 1 ≤ i ≤ M

The coefficient β_{j_1, …, j_M} is determined by comparing the components of A and C_{j_1, …, j_M}. Let s be the number of nonzero components of A that do not match (equal) the corresponding components of C_{j_1, …, j_M}. Also, let matching occur at components i_1, …, i_k. Then the coefficient for cell ( j_1, …, j_M ) is

β_{j_1, …, j_M} = (−1)^s ( J_{i_1} − 1 ) × ⋯ × ( J_{i_k} − 1 )

The estimate û_{j_{s_1}, …, j_{s_L}}^{X_{s_1}, …, X_{s_L}} of u_{j_{s_1}, …, j_{s_L}}^{X_{s_1}, …, X_{s_L}} is then

û_{j_{s_1}, …, j_{s_L}}^{X_{s_1}, …, X_{s_L}} = Σ_{j_1, …, j_M} β_{j_1, …, j_M} ln( n_{j_1, …, j_M} + δ )    (3)

The large-sample variance of the estimate (3) is

Σ_{j_1, …, j_M} β²_{j_1, …, j_M} ( n_{j_1, …, j_M} + δ )^{−1}    (4)

For a large sample, estimate (3) approximately follows a normal distribution with mean u_{j_{s_1}, …, j_{s_L}}^{X_{s_1}, …, X_{s_L}} and variance (4) if the sampling model follows a Poisson, multinomial, or product-multinomial distribution. The confidence interval for the parameter can be computed based on this asymptotic normality.
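Given an estimate from (3) and its variance from (4), the asymptotic-normality confidence interval is routine. A sketch with hypothetical β coefficients, counts, and δ (the coefficient values here are illustrative and are not derived from the matching rule above):

```python
import math

# Hypothetical coefficients and cell counts; coefficients sum to zero
beta = [0.25, -0.25, -0.25, 0.25]
counts = [20, 12, 9, 15]
delta = 0.5                          # user-specified delta

# Estimate (3): linear combination of ln(n + delta)
u_hat = sum(b * math.log(c + delta) for b, c in zip(beta, counts))

# Large-sample variance (4): sum of beta^2 / (n + delta)
var = sum(b * b / (c + delta) for b, c in zip(beta, counts))

# 95% confidence interval from asymptotic normality
z = 1.959963984540054                # standard normal 0.975 quantile
ci = (u_hat - z * math.sqrt(var), u_hat + z * math.sqrt(var))
```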

Raw Residuals and Standardized Residuals


Two different residuals are computed:

Raw Residual

raw residual = n_θ − m̂_θ

Standardized Residual

standardized residual = ( n_θ − m̂_θ ) / √m̂_θ

where m̂_θ must be greater than 0.
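Both residuals follow directly from the fitted values; a minimal sketch for a single cell with hypothetical frequencies:

```python
import math

n_cell, m_cell = 14.0, 9.0   # hypothetical observed and fitted frequencies

raw = n_cell - m_cell
# Standardized residual requires a positive fitted frequency
standardized = (n_cell - m_cell) / math.sqrt(m_cell)
```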



Partial Associations and Partial Chi-squares


Partial associations of effects can be requested when a saturated model is specified.
Let χ²(k) be the chi-square for the model that contains the effects up to and including the kth-order interaction terms. The test of the significance of the kth-order interaction can be based on

χ²(k−1) − χ²(k)
Degrees of freedom are obtained by subtracting the degrees of freedom for the
corresponding models.

Model Selection Via Backward Elimination


The selection process starts with the model specified (via either the DESIGN or the MAXORDER subcommand). The partial chi-square is calculated for every term in the generating class. Any term with a zero partial chi-square is deleted; then the effect with the largest observed significance level for the change in chi-square is deleted, provided that significance level is larger than 0.05, the default. With the removal of a highest-order term, a new model with a new generating class is generated. The above process of removing a term is repeated for the new model and is continued until no remaining term in the model can be deleted.
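Under the assumption that a partial chi-square and its degrees of freedom are already available for each generating-class term, one elimination step can be sketched as follows. The term names, statistics, and the even-df tail-probability routine are illustrative, not SPSS output:

```python
import math

def chi2_sf(x, df):
    """Chi-square upper tail probability, closed form for even df
    (sufficient for this illustrative sketch)."""
    assert df % 2 == 0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= (x / 2.0) / i
        total += term
    return math.exp(-x / 2.0) * total

def backward_step(terms, alpha=0.05):
    """One backward-elimination step: drop terms with zero partial
    chi-square, then the term with the largest p-value if it exceeds alpha."""
    # terms: {name: (partial_chi_square, df)} -- hypothetical inputs
    kept = {t: v for t, v in terms.items() if v[0] > 0.0}
    pvals = {t: chi2_sf(x, df) for t, (x, df) in kept.items()}
    worst = max(pvals, key=pvals.get)
    if pvals[worst] > alpha:
        del kept[worst]
    return kept

# Hypothetical generating class with partial chi-squares and dfs
terms = {"A*B": (9.1, 2), "A*C": (0.8, 2), "B*C": (6.3, 2)}
remaining = backward_step(terms)
```

Repeating `backward_step` on the resulting generating class until no term can be removed mirrors the process described above.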

References

Bishop, Y. M. M., Fienberg, S. E., and Holland, P. W. 1975. Discrete multivariate analysis: Theory and practice. Cambridge, Mass.: MIT Press.

Fienberg, S. E. 1977. The analysis of cross-classified categorical data. Cambridge, Mass.: MIT Press.
