
GENLOG

Multinomial Loglinear and Logit Models

This chapter describes the algorithms used to calculate maximum-likelihood estimates for the multinomial loglinear model and the multinomial logit model. These algorithms are applicable only to aggregated data.

Notation

The following notation is used throughout this chapter unless otherwise stated:

A            Generic categorical independent (explanatory) variable. Its categories are indexed by an array of integers.
B            Generic categorical dependent (response) variable. Its categories are indexed by an array of integers.
r            Number of categories of B.
c            Number of categories of A.
p            Number of nonredundant (nonaliased) parameters.
i            Generic index for the categories of B.
j            Generic index for the categories of A.
k            Generic index for the parameters.
$n_{ij}$     Observed count in the ith response of B and the jth setting of A.
$N_j$        Marginal total count at the jth setting of A, equal to $\sum_{i=1}^{r} n_{ij}$.
$N$          Total observed count, equal to $\sum_{j=1}^{c}\sum_{i=1}^{r} n_{ij}$.
$m_{ij}$     Expected count.

$\pi_{ij}$   Probability of having an observation in the ith response of B and the jth setting of A; $0 \le \pi_{ij} \le 1$ and $\sum_{j=1}^{c}\sum_{i=1}^{r} \pi_{ij} = 1$.
$z_{ij}$     Cell structure value.
$\alpha_j$   jth normalizing constant.
$\beta_k$    kth nonredundant parameter.
$\boldsymbol{\beta}$   The vector $(\beta_1,\ldots,\beta_p)'$.
$x_{ijk}$    Element in the ith row and the kth column of the design matrix for the jth setting.

The same notation is used for both loglinear and logit models so that the methods
are presented in a unified way. Conceptually, one can consider a loglinear model as
a special case of a logit model where the explanatory variable has only one level
(that is, c = 1).

Components of the Model


There are two components in a loglinear model: the random component and the
systematic component.

Random Component

The random component describes the joint distribution of the counts.

• The counts $(n_{1j},\ldots,n_{rj})$ at the jth setting of A have the multinomial $(N_j; \pi_{1j},\ldots,\pi_{rj})$ distribution.

• The counts $n_{ij}$ and $n_{i'j'}$ are independent if $j \ne j'$.



• The joint probability distribution of $\{n_{ij}\}$ is the product of these c independent multinomial distributions. The probability density function is

$$
\prod_{j=1}^{c} \left( \frac{N_j!}{\prod_{i=1}^{r} n_{ij}!} \prod_{i=1}^{r} \pi_{ij}^{\,n_{ij}} \right) \qquad (1)
$$

• The expected count is $E(n_{ij}) = m_{ij} = N_j \pi_{ij}$.

• The covariance is

$$
\operatorname{cov}(n_{ij}, n_{i'j'}) =
\begin{cases}
N_j \pi_{ij} \left( \delta_{ii'} - \pi_{i'j} \right) & \text{if } j = j' \\
0 & \text{if } j \ne j'
\end{cases}
$$

where $\delta_{ab} = 1$ if $a = b$ and $\delta_{ab} = 0$ if $a \ne b$.

Systematic Component

The systematic component describes the linkage between the expected counts and the parameters. The expected counts are themselves functions of other parameters. Explicitly, for $i = 1,\ldots,r$ and $j = 1,\ldots,c$,

$$
m_{ij} =
\begin{cases}
z_{ij} \, e^{\alpha_j + v_{ij}} & \text{if } z_{ij} > 0 \\
0 & \text{if } z_{ij} \le 0
\end{cases}
$$

where

$$
v_{ij} = \sum_{k=1}^{p} x_{ijk} \beta_k
$$

SPSS does not consider $\alpha_1$ to $\alpha_c$ as parameters, but as normalizing constants.



Normalizing Constants

$$
\alpha_j = \log\!\left( \frac{N_j}{\sum_{i=1}^{r} z_{ij} \, e^{v_{ij}}} \right) \qquad j = 1,\ldots,c \qquad (2)
$$
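To make the systematic component concrete, the following sketch computes $v_{ij}$, the normalizing constants of equation (2), and the expected counts for a given parameter vector. It is a minimal illustration, assuming NumPy arrays with a design array `x` of shape (r, c, p), cell structure values `z` of shape (r, c), and marginal totals `Nj`; the function name `expected_counts` and this layout are illustrative, not part of GENLOG.

```python
import numpy as np

def expected_counts(beta, x, z, Nj):
    """Expected counts m_ij and normalizing constants alpha_j for a given beta."""
    v = np.einsum('ijk,k->ij', x, beta)            # v_ij = sum_k x_ijk * beta_k
    zev = np.where(z > 0, z * np.exp(v), 0.0)      # z_ij * exp(v_ij); structural zeros stay 0
    alpha = np.log(Nj / zev.sum(axis=0))           # equation (2)
    m = np.where(z > 0, zev * np.exp(alpha), 0.0)  # m_ij = z_ij * exp(alpha_j + v_ij)
    return m, alpha
```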

Cell Structure Values

The cell structure values play two roles in SPSS loglinear procedures, depending on their signs. If $z_{ij} > 0$, it is a usual weight for the corresponding cell, and $\log(z_{ij})$ is sometimes called the offset. If $z_{ij} \le 0$, a structural zero is imposed on the cell $(B = i, A = j)$. Contingency tables containing at least one structural zero are called incomplete tables. If $n_{ij} = 0$ but $z_{ij} > 0$, the cell $(B = i, A = j)$ contains a sampling zero. Although SPSS still considers a structural zero part of the contingency table, it is not used in fitting the model. Cellwise statistics are not computed for structural zeros.

Maximum-Likelihood Estimation

The multinomial log-likelihood is

$$
L(\boldsymbol{\beta}) = L(\beta_1,\ldots,\beta_p) = \text{constant} + \sum_{j=1}^{c} \sum_{i=1}^{r} n_{ij} \log(m_{ij}) \qquad (3)
$$

Likelihood Equations

It can be shown that

$$
\frac{\partial L}{\partial \beta_k} = \sum_{j=1}^{c} \sum_{i=1}^{r} \left( n_{ij} - m_{ij} \right) x_{ijk} \qquad \text{for } k = 1,\ldots,p
$$

Let $\mathbf{g}(\boldsymbol{\beta}) = \left( g_1(\boldsymbol{\beta}),\ldots,g_p(\boldsymbol{\beta}) \right)'$ be the $p \times 1$ gradient vector with

$$
g_k(\boldsymbol{\beta}) = \frac{\partial L}{\partial \beta_k}
$$

The maximum-likelihood estimates $\hat{\boldsymbol{\beta}} = \left( \hat{\beta}_1,\ldots,\hat{\beta}_p \right)'$ are regarded as a solution to the vector of likelihood equations:

$$
\mathbf{g}(\boldsymbol{\beta}) = \mathbf{0} \qquad (4)
$$

Hessian Matrix

The likelihood equations are nonlinear functions of β. Solving them for $\hat{\boldsymbol{\beta}}$ requires an iterative method. The Newton-Raphson method is used. It can be shown that

$$
\frac{\partial^2 L}{\partial \beta_k \, \partial \beta_l} = -\sum_{j=1}^{c} \sum_{i=1}^{r} m_{ij} \left( x_{ijk} - \theta_{jk} \right)\left( x_{ijl} - \theta_{jl} \right)
$$

where

$$
\theta_{jk} = \frac{1}{N_j} \sum_{i=1}^{r} m_{ij} x_{ijk} \qquad j = 1,\ldots,c \text{ and } k = 1,\ldots,p \qquad (5)
$$

Let $\mathbf{H}(\boldsymbol{\beta})$ be the $p \times p$ information matrix, where $-\mathbf{H}(\boldsymbol{\beta})$ is the Hessian matrix of (3). The elements of $\mathbf{H}(\boldsymbol{\beta})$ are

$$
h_{kl}(\boldsymbol{\beta}) = -\frac{\partial^2 L}{\partial \beta_k \, \partial \beta_l} \qquad k = 1,\ldots,p \text{ and } l = 1,\ldots,p \qquad (6)
$$

Note: $\mathbf{H}(\boldsymbol{\beta})$ is a symmetric positive-definite matrix. The asymptotic covariance matrix of $\hat{\boldsymbol{\beta}}$ is estimated by $\mathbf{H}^{-1}(\hat{\boldsymbol{\beta}})$.
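The gradient and information matrix above have a direct array form. The following is a minimal sketch under the same assumed NumPy layout as before (counts `n` and expected counts `m` of shape (r, c), design array `x` of shape (r, c, p), marginal totals `Nj`); the function name is illustrative.

```python
import numpy as np

def gradient_and_information(n, m, x, Nj):
    """Gradient g(beta) and information matrix H(beta) at expected counts m."""
    g = np.einsum('ij,ijk->k', n - m, x)                  # g_k = dL/dbeta_k
    theta = np.einsum('ij,ijk->jk', m, x) / Nj[:, None]   # theta_jk, equation (5)
    d = x - theta[None, :, :]                             # x_ijk - theta_jk
    H = np.einsum('ij,ijk,ijl->kl', m, d, d)              # h_kl, equation (6)
    return g, H
```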

Newton-Raphson Method

Let $\boldsymbol{\beta}^{(s)}$ denote the sth approximation for the solution to (4). By the Newton-Raphson method,

$$
\boldsymbol{\beta}^{(s+1)} = \boldsymbol{\beta}^{(s)} + \mathbf{H}^{-1}\!\left(\boldsymbol{\beta}^{(s)}\right) \mathbf{g}\!\left(\boldsymbol{\beta}^{(s)}\right)
$$

Define $\mathbf{q}(\boldsymbol{\beta}) = \mathbf{H}(\boldsymbol{\beta})\boldsymbol{\beta} + \mathbf{g}(\boldsymbol{\beta})$. Using (5), the kth element of $\mathbf{q}(\boldsymbol{\beta})$ is

$$
q_k(\boldsymbol{\beta}) = \sum_{j=1}^{c} \sum_{i=1}^{r} \eta_{ij} \left( x_{ijk} - \theta_{jk} \right) \qquad (7)
$$

where

$$
\eta_{ij} =
\begin{cases}
m_{ij} v_{ij} + \left( n_{ij} - m_{ij} \right) & \text{if } z_{ij} > 0 \text{ and } m_{ij} > 0 \\
0 & \text{otherwise}
\end{cases}
$$

Then

$$
\mathbf{H}\!\left(\boldsymbol{\beta}^{(s)}\right) \boldsymbol{\beta}^{(s+1)} = \mathbf{q}\!\left(\boldsymbol{\beta}^{(s)}\right) \qquad (8)
$$

Thus, given $\boldsymbol{\beta}^{(s)}$, the (s+1)th approximation $\boldsymbol{\beta}^{(s+1)}$ is found by solving the system of equations in (8).

Initial Values

SPSS uses $\boldsymbol{\beta}^{(0)}$, which corresponds to a saturated model, as the initial value for β. Then the initial estimates for the expected cell counts are

$$
m_{ij}^{(0)} =
\begin{cases}
n_{ij} + \Delta & \text{if } z_{ij} > 0 \\
0 & \text{if } z_{ij} \le 0
\end{cases}
\qquad (9)
$$

where $\Delta \ge 0$ is a constant.

Note: For saturated models, SPSS adds Δ to $n_{ij}$ if $z_{ij} > 0$. This is done to avoid numerical problems in case some observed counts are 0. We advise users to set Δ to 0 whenever all observed counts (other than structural zeros) are positive.

The initial values for other quantities are

$$
\theta_{jk}^{(0)} = \frac{1}{N_j} \sum_{i=1}^{r} m_{ij}^{(0)} x_{ijk} \qquad (10)
$$

and

$$
\eta_{ij}^{(0)} =
\begin{cases}
m_{ij}^{(0)} \log\!\left( m_{ij}^{(0)} / z_{ij} \right) + \left( n_{ij} - m_{ij}^{(0)} \right) & \text{if } z_{ij} > 0 \text{ and } m_{ij}^{(0)} > 0 \\
0 & \text{otherwise}
\end{cases}
\qquad (11)
$$

Stopping Criteria

SPSS checks the following conditions for convergence:

1. $\max_{i,j} \left| \left( m_{ij}^{(s+1)} - m_{ij}^{(s)} \right) / m_{ij}^{(s)} \right| < \epsilon$, provided that $m_{ij}^{(s)} > 0$

2. $\max_{i,j} \left| m_{ij}^{(s+1)} - m_{ij}^{(s)} \right| < \epsilon$

3. $\left( \sum_{k=1}^{p} g_k^2\!\left( \hat{\boldsymbol{\beta}} \right) \right) \big/ \, p < \epsilon$

The iteration is said to have converged if either conditions 1 and 3 or conditions 2 and 3 are satisfied. If $p = 0$, then condition 3 is automatically satisfied. The iteration is said to have not converged if neither pair of conditions is satisfied within the maximum number of iterations.
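These conditions translate into a simple check on successive fitted counts and the gradient. Below is a minimal sketch, assuming NumPy arrays `m_old` and `m_new` for $m^{(s)}$ and $m^{(s+1)}$ and a gradient vector `g`; the function name `converged` and the default tolerance are illustrative.

```python
import numpy as np

def converged(m_old, m_new, g, p, eps=1e-8):
    """True when (condition 1 or condition 2) and condition 3 all hold."""
    pos = m_old > 0
    cond1 = np.all(np.abs((m_new[pos] - m_old[pos]) / m_old[pos]) < eps)
    cond2 = np.abs(m_new - m_old).max() < eps
    cond3 = True if p == 0 else float(g @ g) / p < eps
    return (cond1 or cond2) and cond3
```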

Algorithm

The iteration process uses the following steps:

1. Calculate $m_{ij}^{(0)}$ using (9), $\theta_{jk}^{(0)}$ using (10), and $\eta_{ij}^{(0)}$ using (11).

2. Set $s = 0$.

3. Calculate $\mathbf{H}\!\left(\boldsymbol{\beta}^{(s)}\right)$ using (6) evaluated at $m_{ij} = m_{ij}^{(s)}$; calculate $\mathbf{q}\!\left(\boldsymbol{\beta}^{(s)}\right)$ using (7) evaluated at $\eta_{ij} = \eta_{ij}^{(s)}$.

4. Solve for $\boldsymbol{\beta}^{(s+1)}$ using (8).

5. Calculate $v_{ij}^{(s+1)} = \sum_{k=1}^{p} x_{ijk} \beta_k^{(s+1)}$ and

$$
m_{ij}^{(s+1)} =
\begin{cases}
N_j \left( z_{ij} \, e^{v_{ij}^{(s+1)}} \Big/ \sum_{t=1}^{r} z_{tj} \, e^{v_{tj}^{(s+1)}} \right) & \text{if } z_{ij} > 0 \\
0 & \text{if } z_{ij} \le 0
\end{cases}
$$

6. Check whether the stopping criteria are satisfied. If yes, stop the iteration and declare convergence. Otherwise continue.

7. Increase s by 1 and check whether the maximum number of iterations has been reached. If yes, stop the iteration and declare the process not converged. Otherwise repeat steps 3-7.
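Putting steps 1 through 7 together, the sketch below runs the full Newton-Raphson iteration. It is a minimal illustration under assumed conventions (NumPy; counts `n` and cell structure values `z` of shape (r, c); design array `x` of shape (r, c, p)); the function name `fit_genlog` and the default values of Δ, ε, and the iteration limit are illustrative rather than GENLOG's defaults.

```python
import numpy as np

def fit_genlog(n, z, x, delta=0.5, eps=1e-8, max_iter=20):
    """Newton-Raphson fit of the multinomial loglinear/logit model (steps 1-7)."""
    r, c, p = x.shape
    Nj = n.sum(axis=0)                      # marginal totals N_j
    pos = z > 0                             # cells that are not structural zeros

    # Step 1: initial values, equations (9) and (11)
    m = np.where(pos, n + delta, 0.0)
    safe_z = np.where(pos, z, 1.0)
    eta = np.where(pos & (m > 0), m * np.log(np.where(m > 0, m, 1.0) / safe_z) + (n - m), 0.0)

    beta = np.zeros(p)
    for _ in range(max_iter):               # steps 2-7
        theta = np.einsum('ij,ijk->jk', m, x) / Nj[:, None]   # equation (5)/(10)
        d = x - theta[None, :, :]                             # x_ijk - theta_jk
        H = np.einsum('ij,ijk,ijl->kl', m, d, d)              # information matrix, (6)
        q = np.einsum('ij,ijk->k', eta, d)                    # right-hand side, (7)
        beta = np.linalg.solve(H, q)                          # step 4, equation (8)

        v = np.einsum('ijk,k->ij', x, beta)                   # step 5
        zev = np.where(pos, z * np.exp(v), 0.0)
        m_new = Nj * zev / zev.sum(axis=0)

        # Step 6: stopping criteria (conditions 1/2 on m, condition 3 on the gradient)
        g = np.einsum('ij,ijk->k', n - m_new, x)
        diff = np.abs(m_new - m)
        rel_ok = np.all(diff[m > 0] / m[m > 0] < eps)
        if (rel_ok or diff.max() < eps) and (g @ g) / max(p, 1) < eps:
            m = m_new
            break

        # Step 7: prepare the next iteration
        eta = np.where(pos & (m_new > 0), m_new * v + (n - m_new), 0.0)
        m = m_new
    return beta, m
```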

Estimated Normalizing Constants

Using (2), the maximum-likelihood estimate for $\alpha_j$ is

$$
\hat{\alpha}_j = \log\!\left( \frac{N_j}{\sum_{i=1}^{r} z_{ij} \, e^{\hat{v}_{ij}}} \right) \qquad j = 1,\ldots,c
$$

where

$$
\hat{v}_{ij} = \sum_{k=1}^{p} x_{ijk} \hat{\beta}_k
$$

Estimated Cell Counts

The estimated expected count is

$$
\hat{m}_{ij} =
\begin{cases}
N_j \left( z_{ij} \, e^{\hat{v}_{ij}} \Big/ \sum_{t=1}^{r} z_{tj} \, e^{\hat{v}_{tj}} \right) & \text{if } z_{ij} > 0 \\
0 & \text{if } z_{ij} \le 0
\end{cases}
$$

Goodness-of-Fit Statistics

The Pearson chi-square statistic is

$$
X^2 = \sum_{j=1}^{c} \sum_{i=1}^{r} X_{ij}^2
$$

where

$$
X_{ij}^2 =
\begin{cases}
\left( n_{ij} - \hat{m}_{ij} \right)^2 / \hat{m}_{ij} & \text{if } z_{ij} > 0,\ n_{ij} > 0, \text{ and } \hat{m}_{ij} > 0 \\
\text{SYSMIS} & \text{if } z_{ij} > 0,\ n_{ij} > 0, \text{ and } \hat{m}_{ij} = 0 \\
0 & \text{if } z_{ij} \le 0 \text{ or } n_{ij} = \hat{m}_{ij}
\end{cases}
$$

If any $X_{ij}^2$ is system missing, then $X^2$ is also system missing.

The likelihood-ratio chi-square statistic is

$$
G^2 = 2 \sum_{j=1}^{c} \sum_{i=1}^{r} G_{ij}^2
$$

where

$$
G_{ij}^2 =
\begin{cases}
n_{ij} \log\!\left( n_{ij} / \hat{m}_{ij} \right) & \text{if } z_{ij} > 0,\ n_{ij} > 0, \text{ and } \hat{m}_{ij} > 0 \\
\text{SYSMIS} & \text{if } z_{ij} > 0,\ n_{ij} > 0, \text{ and } \hat{m}_{ij} = 0 \\
0 & \text{if } z_{ij} > 0,\ n_{ij} = 0, \text{ and } \hat{m}_{ij} \ge 0; \text{ or } z_{ij} \le 0 \text{ or } n_{ij} = \hat{m}_{ij}
\end{cases}
$$

If any $G_{ij}^2$ is system missing, then $G^2$ is also system missing.

Degrees of Freedom

The degrees of freedom for each statistic is defined as $a = c(r - 1) - p - E$, where E is the number of cells with $z_{ij} \le 0$ or $\hat{m}_{ij} = 0$.

Significance Level

The significance level (or the p value) for the Pearson chi-square statistic is $\operatorname{Prob}\!\left( \chi_a^2 > X^2 \right)$ and that for the likelihood-ratio chi-square statistic is $\operatorname{Prob}\!\left( \chi_a^2 > G^2 \right)$. In both cases, $\chi_a^2$ is the central chi-square distribution with a degrees of freedom.
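A sketch of how $X^2$, $G^2$, the degrees of freedom, and the significance levels could be computed from the fitted counts is given below. The SYSMIS handling described above is omitted, and SciPy's `chi2.sf` stands in for the central chi-square tail probability; the function name is illustrative.

```python
import numpy as np
from scipy.stats import chi2

def goodness_of_fit(n, m_hat, z, p):
    """Pearson X^2, likelihood-ratio G^2, degrees of freedom, and p values."""
    fit = (z > 0) & (m_hat > 0)
    X2 = np.sum((n[fit] - m_hat[fit]) ** 2 / m_hat[fit])
    obs = fit & (n > 0)
    G2 = 2.0 * np.sum(n[obs] * np.log(n[obs] / m_hat[obs]))
    r, c = n.shape
    E = int(np.sum((z <= 0) | (m_hat == 0)))          # excluded cells
    df = c * (r - 1) - p - E
    return X2, G2, df, chi2.sf(X2, df), chi2.sf(G2, df)
```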

Analysis of Dispersion (Logit Models Only)

SPSS provides the analysis of dispersion based on two types of dispersion: entropy and concentration. The following definitions are used:

S(A)              Dispersion due to the model
S(B|A)            Dispersion due to residuals
S(B)              Total dispersion
R = S(A)/S(B)     Measure of association

where $S(A) + S(B|A) = S(B)$. Also define

$$
\hat{\pi}_i = \frac{\sum_{j=1}^{c} \hat{m}_{ij}}{\sum_{j=1}^{c} N_j}
\qquad\qquad
\hat{\pi}_{i|j} = \frac{\hat{m}_{ij}}{N_j}
$$

The bounds are $0 \le \hat{\pi}_i \le 1$ and $0 \le \hat{\pi}_{i|j} \le 1$.

Entropy

$$
S(B) = -N \sum_{i=1}^{r} S_i(B)
$$

where

$$
S_i(B) =
\begin{cases}
\hat{\pi}_i \log(\hat{\pi}_i) & \text{if } 0 < \hat{\pi}_i \le 1 \\
0 & \text{if } \hat{\pi}_i = 0
\end{cases}
$$

and

$$
S(B|A) = -\sum_{j=1}^{c} N_j \sum_{i=1}^{r} S_{ij}(B|A)
$$

where

$$
S_{ij}(B|A) =
\begin{cases}
\hat{\pi}_{i|j} \log\!\left( \hat{\pi}_{i|j} \right) & \text{if } 0 < \hat{\pi}_{i|j} \le 1 \\
0 & \text{if } \hat{\pi}_{i|j} = 0
\end{cases}
$$

Concentration

$$
S(B) = N \left( 1 - \sum_{i=1}^{r} \hat{\pi}_i^2 \right)
$$

$$
S(B|A) = \sum_{j=1}^{c} N_j \left( 1 - \sum_{i=1}^{r} \hat{\pi}_{i|j}^2 \right)
$$

Degrees of Freedom

Source of Dispersion    Measure    Degrees of Freedom
Due to model            S(A)       f(r − 1)
Due to residuals        S(B|A)     (N − f − 1)(r − 1)
Total                   S(B)       (N − 1)(r − 1)

where f equals p minus the number of nonredundant columns (in the design matrix) associated with the main effects of the dependent factors.
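A sketch of the entropy and concentration dispersions and the association measure R = S(A)/S(B) follows, assuming fitted counts `m_hat` of shape (r, c) and marginal totals `Nj`; the function name and the `kind` argument are illustrative.

```python
import numpy as np

def dispersion(m_hat, Nj, kind="entropy"):
    """S(A), S(B|A), S(B), and R = S(A)/S(B) for entropy or concentration."""
    N = Nj.sum()
    p_i = m_hat.sum(axis=1) / N          # marginal pi_i
    p_ij = m_hat / Nj                    # conditional pi_{i|j}
    if kind == "entropy":
        SB = -N * np.sum(p_i * np.log(np.where(p_i > 0, p_i, 1.0)))
        SBA = -np.sum(Nj * np.sum(p_ij * np.log(np.where(p_ij > 0, p_ij, 1.0)), axis=0))
    else:                                # concentration
        SB = N * (1.0 - np.sum(p_i ** 2))
        SBA = np.sum(Nj * (1.0 - np.sum(p_ij ** 2, axis=0)))
    SA = SB - SBA                        # dispersion due to the model
    return SA, SBA, SB, SA / SB
```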

Residuals

Goodness-of-fit statistics provide only broad summaries of how models fit data. The pattern of lack of fit is revealed in cell-by-cell comparisons of observed and fitted cell counts.

Simple Residuals

The simple residual of the (i,j)th cell is

$$
r_{ij} =
\begin{cases}
n_{ij} - \hat{m}_{ij} & \text{if } z_{ij} > 0 \\
\text{SYSMIS} & \text{if } z_{ij} \le 0
\end{cases}
$$

Standardized Residuals

The standardized residual for the (i,j)th cell is

$$
r_{ij}^S =
\begin{cases}
\left( n_{ij} - \hat{m}_{ij} \right) \Big/ \sqrt{ \hat{m}_{ij} \left( 1 - \hat{m}_{ij}/N_j \right) } & \text{if } z_{ij} > 0 \text{ and } 0 < \hat{m}_{ij} < N_j \\
0 & \text{if } z_{ij} > 0 \text{ and } n_{ij} = \hat{m}_{ij} \\
\text{SYSMIS} & \text{otherwise}
\end{cases}
$$

The standardized residuals are also known as Pearson residuals even though $\sum_{j=1}^{c} \sum_{i=1}^{r} \left( r_{ij}^S \right)^2 \ne X^2$. Although the standardized residuals are asymptotically normal, their asymptotic variances are less than 1.

Adjusted Residuals

The adjusted residual is the simple residual divided by its estimated standard error. Its definition and applications first appeared in Haberman (1973) and reappeared on page 454 of Haberman (1979). This statistic for the (i,j)th cell is

$$
r_{ij}^A =
\begin{cases}
\left( n_{ij} - \hat{m}_{ij} \right) / s_{ij} & \text{if } z_{ij} > 0 \text{ and } \hat{m}_{ij} > 0 \\
0 & \text{if } z_{ij} > 0 \text{ and } n_{ij} = \hat{m}_{ij} \\
\text{SYSMIS} & \text{otherwise}
\end{cases}
$$

where

$$
s_{ij} = \sqrt{ \hat{m}_{ij} \left( 1 - \frac{\hat{m}_{ij}}{N_j} - \hat{m}_{ij} \sum_{k=1}^{p} \sum_{l=1}^{p} \left( x_{ijk} - \hat{\theta}_{jk} \right)\left( x_{ijl} - \hat{\theta}_{jl} \right) h^{kl} \right) }
$$

$h^{kl}$ is the (k,l)th element of $\mathbf{H}^{-1}(\hat{\boldsymbol{\beta}})$. The adjusted residuals are asymptotically standard normal.
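The simple, standardized, and adjusted residuals can be computed together. Below is a minimal sketch under the earlier array conventions, with `H_inv` denoting $\mathbf{H}^{-1}(\hat{\boldsymbol{\beta}})$; SYSMIS handling for structural zeros is omitted and the names are illustrative.

```python
import numpy as np

def cell_residuals(n, m_hat, x, Nj, H_inv):
    """Simple, standardized, and adjusted residuals for cells with z_ij > 0."""
    simple = n - m_hat
    with np.errstate(divide='ignore', invalid='ignore'):
        standardized = simple / np.sqrt(m_hat * (1.0 - m_hat / Nj))
        theta = np.einsum('ij,ijk->jk', m_hat, x) / Nj[:, None]
        d = x - theta[None, :, :]
        quad = np.einsum('ijk,kl,ijl->ij', d, H_inv, d)   # (x - theta)' H^-1 (x - theta)
        s = np.sqrt(m_hat * (1.0 - m_hat / Nj - m_hat * quad))
        adjusted = simple / s
    return simple, standardized, adjusted
```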

Deviance Residuals

Pierce and Schafer (1986) and McCullagh and Nelder (1989) define the signed square root of the individual contribution to the $G^2$ statistic as the deviance residual. This statistic for the (i,j)th cell is

$$
r_{ij}^D = \operatorname{sign}\!\left( n_{ij} - \hat{m}_{ij} \right) \sqrt{d_{ij}}
$$

where

$$
d_{ij} =
\begin{cases}
2\left[ n_{ij} \log\!\left( n_{ij} / \hat{m}_{ij} \right) - \left( n_{ij} - \hat{m}_{ij} \right) \right] & \text{if } z_{ij} > 0,\ \hat{m}_{ij} > 0, \text{ and } n_{ij} > 0 \\
2\hat{m}_{ij} & \text{if } z_{ij} > 0,\ \hat{m}_{ij} \ge 0, \text{ and } n_{ij} = 0 \\
0 & \text{if } z_{ij} > 0 \text{ and } n_{ij} = \hat{m}_{ij} \\
\text{SYSMIS} & \text{otherwise}
\end{cases}
$$

For multinomial sampling, the individual contribution to the $G^2$ statistic is only $2 n_{ij} \log\!\left( n_{ij} / \hat{m}_{ij} \right)$, but this is negative when $n_{ij} < \hat{m}_{ij}$. Thus, the extra term $2\left( n_{ij} - \hat{m}_{ij} \right)$ is subtracted from it so that $d_{ij} \ge 0$ for all i and j. However, we still have $\sum_{j=1}^{c} \sum_{i=1}^{r} \left( r_{ij}^D \right)^2 = G^2$.
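A sketch of the deviance residuals for cells with $z_{ij} > 0$ follows; the clamp against tiny negative $d_{ij}$ from rounding is an implementation detail, not part of the definition above, and SYSMIS handling is omitted.

```python
import numpy as np

def deviance_residuals(n, m_hat):
    """Signed square-root deviance residuals."""
    safe_m = np.where(m_hat > 0, m_hat, 1.0)
    safe_n = np.where(n > 0, n, 1.0)
    term = np.where((n > 0) & (m_hat > 0), n * np.log(safe_n / safe_m), 0.0)
    d = 2.0 * (term - (n - m_hat))                 # reduces to 2 * m_hat when n_ij = 0
    return np.sign(n - m_hat) * np.sqrt(np.maximum(d, 0.0))
```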

Generalized Residual

Consider a linear combination of the cell counts $\sum_{j=1}^{c} \sum_{i=1}^{r} d_{ij} n_{ij}$, where the $d_{ij}$ are real numbers.

The estimated expected value is

$$
\sum_{j=1}^{c} \sum_{i=1}^{r} d_{ij} \hat{m}_{ij}
$$

The simple residual for this linear combination is

$$
\sum_{j=1}^{c} \sum_{i=1}^{r} d_{ij} \left( n_{ij} - \hat{m}_{ij} \right)
$$

The standardized residual for this linear combination is

$$
\frac{ \sum_{j=1}^{c} \sum_{i=1}^{r} d_{ij} \left( n_{ij} - \hat{m}_{ij} \right) }
     { \sqrt{ \sum_{j=1}^{c} \left[ \sum_{i=1}^{r} d_{ij}^2 \hat{m}_{ij} - \left( \sum_{i=1}^{r} d_{ij} \hat{m}_{ij} \right)^{\!2} \Big/ N_j \right] } }
$$

The adjusted residual for this linear combination is, as given on page 420 of Haberman (1979),

$$
\frac{ \sum_{j=1}^{c} \sum_{i=1}^{r} d_{ij} \left( n_{ij} - \hat{m}_{ij} \right) }{ V }
$$

where

$$
V = \sqrt{ \sum_{j=1}^{c} \sum_{i=1}^{r} d_{ij}^2 \hat{m}_{ij} - \sum_{j=1}^{c} \frac{1}{N_j} \left( \sum_{i=1}^{r} d_{ij} \hat{m}_{ij} \right)^{\!2} - \sum_{k=1}^{p} \sum_{l=1}^{p} f_k f_l h^{kl} }
$$

$$
f_k = \sum_{j=1}^{c} \sum_{i=1}^{r} d_{ij} \hat{m}_{ij} \left( x_{ijk} - \hat{\theta}_{jk} \right)
$$
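A sketch of the standardized and adjusted residuals for a user-supplied coefficient array `d` follows, under the same assumed array conventions as the earlier sketches; SYSMIS handling is omitted and the function name is illustrative.

```python
import numpy as np

def generalized_residual(d, n, m_hat, x, Nj, H_inv):
    """Standardized and adjusted residuals for the combination sum_ij d_ij * n_ij."""
    resid = np.sum(d * (n - m_hat))
    per_j = np.sum(d ** 2 * m_hat, axis=0) - np.sum(d * m_hat, axis=0) ** 2 / Nj
    standardized = resid / np.sqrt(per_j.sum())
    theta = np.einsum('ij,ijk->jk', m_hat, x) / Nj[:, None]
    f = np.einsum('ij,ijk->k', d * m_hat, x - theta[None, :, :])   # f_k
    V = np.sqrt(per_j.sum() - f @ H_inv @ f)
    return standardized, resid / V
```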

Generalized Log-Odds Ratio

Consider a linear combination of the natural logarithms of the cell counts

$$
\sum_{j=1}^{c} \sum_{i=1}^{r} d_{ij} \log(m_{ij}) \qquad (12)
$$

where the $d_{ij}$ are real numbers with the restriction

$$
\sum_{i=1}^{r} d_{ij} = 0 \qquad j = 1,\ldots,c
$$

The quantity in (12) is estimated by

$$
\sum_{j=1}^{c} \sum_{i=1}^{r} d_{ij} \log(\hat{m}_{ij}) = \sum_{j=1}^{c} \sum_{i=1}^{r} d_{ij} \log(z_{ij}) + \sum_{j=1}^{c} \sum_{i=1}^{r} \sum_{k=1}^{p} d_{ij} x_{ijk} \hat{\beta}_k \qquad (13)
$$

The variance of (13) is

$$
\operatorname{var}\!\left( \sum_{j=1}^{c} \sum_{i=1}^{r} d_{ij} \log(\hat{m}_{ij}) \right) = \sum_{k=1}^{p} \sum_{l=1}^{p} w_k w_l h^{kl} \qquad (14)
$$

where

$$
w_k = \sum_{j=1}^{c} \sum_{i=1}^{r} d_{ij} x_{ijk} \qquad k = 1,\ldots,p
$$

Wald Statistic

The null hypothesis is

$$
H_0\!: \; \sum_{j=1}^{c} \sum_{i=1}^{r} d_{ij} \log(m_{ij}) = 0
$$

The Wald statistic is

$$
W = \frac{ \left( \sum_{j=1}^{c} \sum_{i=1}^{r} d_{ij} \log(\hat{m}_{ij}) \right)^{\!2} }{ \sum_{k=1}^{p} \sum_{l=1}^{p} w_k w_l h^{kl} }
$$

Under $H_0$, W is asymptotically distributed as a chi-square distribution with 1 degree of freedom. The significance level is $\operatorname{Prob}\!\left( \chi_1^2 \ge W \right)$. Note: W will be system missing if (14) is 0.

Asymptotic Confidence Interval

The asymptotic $(1 - \alpha) \times 100\%$ confidence interval for (12) is

$$
\sum_{j=1}^{c} \sum_{i=1}^{r} d_{ij} \log(\hat{m}_{ij}) \pm z_{\alpha/2} \sqrt{ \sum_{k=1}^{p} \sum_{l=1}^{p} w_k w_l h^{kl} }
$$

where $z_{\alpha/2}$ is the upper $\alpha/2$ point of the standard normal distribution. The default value of α is 0.05.
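A sketch combining (13), (14), the Wald statistic, and the asymptotic confidence interval for a coefficient array `d` satisfying $\sum_{i} d_{ij} = 0$ follows. SciPy's `norm.ppf` and `chi2.sf` are assumed for the normal quantile and the chi-square tail probability; the function name is illustrative.

```python
import numpy as np
from scipy.stats import chi2, norm

def generalized_log_odds_ratio(d, z, x, beta_hat, H_inv, alpha=0.05):
    """Estimate, Wald statistic, p value, and confidence interval for (12)."""
    v_hat = np.einsum('ijk,k->ij', x, beta_hat)
    log_z = np.log(np.where(z > 0, z, 1.0))
    est = np.sum(np.where(z > 0, d * (log_z + v_hat), 0.0))   # equation (13)
    w = np.einsum('ij,ijk->k', d, x)                          # w_k
    var = float(w @ H_inv @ w)                                # equation (14)
    W = est ** 2 / var                                        # system missing if var is 0
    half = norm.ppf(1.0 - alpha / 2.0) * np.sqrt(var)
    return est, W, chi2.sf(W, 1), (est - half, est + half)
```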

Aggregated Data

This section shows how data are aggregated for a multinomial distribution. The following notation is used in this section:

$v_{ij}$      Number of SPSS cases for $B = i$ ($i = 1,\ldots,r$) and $A = j$ ($j = 1,\ldots,c$)
$n_{ijs}$     sth SPSS caseweight for $B = i$ and $A = j$ ($s = 1,\ldots,v_{ij}$)
$x_{ijs}$     Covariate
$z_{ijs}$     Cell weight
$c_{ijs}$     GRESID coefficient
$e_{ijs}$     GLOR coefficient
$v_{ij}^{+}$  Number of positive $z_{ijs}$ (cell weights) for $1 \le s \le v_{ij}$

The cell count is

$$
n_{ij} =
\begin{cases}
\sum^{*}_{1 \le s \le v_{ij}} n_{ijs}^{+} & \text{if } v_{ij}^{+} > 0 \\
0 & \text{if } v_{ij} = 0 \text{ or } v_{ij}^{+} = 0
\end{cases}
$$

where

$$
n_{ijs}^{+} =
\begin{cases}
n_{ijs} & \text{if } n_{ijs} > 0 \text{ and } z_{ijs} > 0 \\
0 & \text{if } n_{ijs} \le 0 \text{ and } z_{ijs} > 0
\end{cases}
$$

and $\sum^{*}_{1 \le s \le v_{ij}}$ means summation over the range of s with the terms $z_{ijs} > 0$.
1≤ s ≤ vij
The cell weight value is

$$
z_{ij} =
\begin{cases}
\sum^{*}_{1 \le s \le v_{ij}} n_{ijs}^{+} z_{ijs} \, / \, n_{ij} & \text{if } n_{ij} > 0 \text{ and } v_{ij}^{+} > 0 \\
\sum^{*}_{1 \le s \le v_{ij}} z_{ijs} \, / \, v_{ij}^{+} & \text{if } n_{ij} = 0 \text{ and } v_{ij}^{+} > 0 \\
0 & \text{if } v_{ij}^{+} = 0 \\
1 & \text{if } v_{ij} = 0
\end{cases}
$$

If no variable is specified as the cell weight variable, then all cases have unit cell weights by default.

The cell covariate value is

$$
x_{ij} =
\begin{cases}
\sum^{*}_{1 \le s \le v_{ij}} n_{ijs}^{+} x_{ijs} \, / \, n_{ij} & \text{if } n_{ij} > 0 \text{ and } v_{ij}^{+} > 0 \\
\sum^{*}_{1 \le s \le v_{ij}} x_{ijs} \, / \, v_{ij}^{+} & \text{if } n_{ij} = 0 \text{ and } v_{ij}^{+} > 0 \\
0 & \text{if } v_{ij}^{+} = 0 \text{ or } v_{ij} = 0
\end{cases}
$$
The cell GRESID coefficient is

$$
c_{ij} =
\begin{cases}
\sum^{*}_{1 \le s \le v_{ij}} n_{ijs}^{+} c_{ijs} \, / \, n_{ij} & \text{if } n_{ij} > 0 \text{ and } v_{ij}^{+} > 0 \\
\sum^{*}_{1 \le s \le v_{ij}} c_{ijs} \, / \, v_{ij}^{+} & \text{if } n_{ij} = 0 \text{ and } v_{ij}^{+} > 0 \\
0 & \text{if } v_{ij}^{+} = 0 \text{ or } v_{ij} = 0
\end{cases}
$$

There are no defaults for the GRESID coefficients.
The cell GLOR coefficient is

$$
e_{ij} =
\begin{cases}
\sum^{*}_{1 \le s \le v_{ij}} n_{ijs}^{+} e_{ijs} \, / \, n_{ij} & \text{if } n_{ij} > 0 \text{ and } v_{ij}^{+} > 0 \\
\sum^{*}_{1 \le s \le v_{ij}} e_{ijs} \, / \, v_{ij}^{+} & \text{if } n_{ij} = 0 \text{ and } v_{ij}^{+} > 0 \\
0 & \text{if } v_{ij}^{+} = 0 \text{ or } v_{ij} = 0
\end{cases}
$$

There are no defaults for the GLOR coefficients.
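A sketch of the aggregation rules for a single cell (B = i, A = j) follows, producing the cell count and cell weight from the case-level values; the covariate, GRESID, and GLOR coefficients follow the same caseweight-weighted-average pattern. The function and argument names are illustrative.

```python
import numpy as np

def aggregate_cell(caseweights, cellweights):
    """Cell count n_ij and cell weight z_ij from the cases in one cell."""
    w = np.asarray(caseweights, dtype=float)    # n_ijs
    z = np.asarray(cellweights, dtype=float)    # z_ijs
    keep = z > 0                                # the starred sums run over these cases
    if not keep.any():                          # v_ij = 0 gives z_ij = 1; v+_ij = 0 gives 0
        return 0.0, (1.0 if w.size == 0 else 0.0)
    n_plus = np.where(w > 0, w, 0.0)[keep]      # n+_ijs
    n_ij = n_plus.sum()
    if n_ij > 0:
        z_ij = np.sum(n_plus * z[keep]) / n_ij  # caseweight-weighted average
    else:
        z_ij = z[keep].mean()                   # simple average over the kept cases
    return n_ij, z_ij
```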

References

Agresti, A. 1990. Categorical data analysis. New York: John Wiley & Sons, Inc.

Christensen, R. 1990. Log-linear models. New York: Springer-Verlag.

Haberman, S. J. 1973. The analysis of residuals in cross-classified tables. Biometrics, 29: 205–220.

Haberman, S. J. 1979. Analysis of qualitative data, Volume 2. New York: Academic Press.

McCullagh, P., and Nelder, J. A. 1989. Generalized linear models. 2nd ed. London: Chapman and Hall.

Pierce, D. A., and Schafer, D. W. 1986. Residuals in generalized linear models. Journal of the American Statistical Association, 81: 977–986.
