
GENLOG

Multinomial Loglinear and Logit Models

This chapter describes the algorithms used to calculate maximum-likelihood estimates for the multinomial loglinear model and the multinomial logit model. These algorithms are applicable only to aggregated data.

Notation

The following notation is used throughout this chapter unless otherwise stated:

A            Generic categorical independent (explanatory) variable. Its categories are indexed by an array of integers.
B            Generic categorical dependent (response) variable. Its categories are indexed by an array of integers.
r            Number of categories of B.
c            Number of categories of A.
p            Number of nonredundant (nonaliased) parameters.
i            Generic index for the categories of B.
j            Generic index for the categories of A.
k            Generic index for the parameters.
$n_{ij}$     Observed count in the ith response of B and the jth setting of A.
$N_j$        Marginal total count at the jth setting of A, equal to $\sum_{i=1}^{r} n_{ij}$.
$N$          Total observed count, equal to $\sum_{j=1}^{c}\sum_{i=1}^{r} n_{ij}$.
$m_{ij}$     Expected count.

$\pi_{ij}$   Probability of having an observation in the ith response of B and the jth setting of A; $0 \le \pi_{ij} \le 1$ and $\sum_{j=1}^{c}\sum_{i=1}^{r} \pi_{ij} = 1$.
$z_{ij}$     Cell structure value.
$\alpha_j$   jth normalizing constant.
$\beta_k$    kth nonredundant parameter.
$\boldsymbol{\beta}$   The vector $(\beta_1,\ldots,\beta_p)'$.
$x_{ijk}$    Element in the ith row and the kth column of the design matrix for the jth setting.

The same notation is used for both loglinear and logit models so that the methods
are presented in a unified way. Conceptually, one can consider a loglinear model as
a special case of a logit model where the explanatory variable has only one level
(that is, c = 1).

Components of the Model


There are two components in a loglinear model: the random component and the
systematic component.

Random Component

The random component describes the joint distribution of the counts.

• The counts $(n_{1j},\ldots,n_{rj})$ at the jth setting of A have the multinomial $(N_j; \pi_{1j},\ldots,\pi_{rj})$ distribution.

• The counts $n_{ij}$ and $n_{i'j'}$ are independent if $j \ne j'$.



• The joint probability distribution of $\{n_{ij}\}$ is the product of these c independent multinomial distributions. The probability density function is

$$
\prod_{j=1}^{c} \left( \frac{N_j!}{\prod_{i=1}^{r} n_{ij}!} \prod_{i=1}^{r} \pi_{ij}^{\,n_{ij}} \right) \qquad (1)
$$

• The expected count is $E(n_{ij}) = m_{ij} = N_j \pi_{ij}$.

• The covariance is

$$
\operatorname{cov}(n_{ij}, n_{i'j'}) =
\begin{cases}
N_j \pi_{ij} \left( \delta_{ii'} - \pi_{i'j} \right) & \text{if } j = j' \\
0 & \text{if } j \ne j'
\end{cases}
$$

where $\delta_{ab} = 1$ if $a = b$ and $\delta_{ab} = 0$ if $a \ne b$.

Systematic Component

The systematic component describes the linkage between the expected counts and the parameters. The expected counts are themselves functions of other parameters. Explicitly, for $i = 1,\ldots,r$ and $j = 1,\ldots,c$,

$$
m_{ij} =
\begin{cases}
z_{ij} \, e^{\alpha_j + v_{ij}} & \text{if } z_{ij} > 0 \\
0 & \text{if } z_{ij} \le 0
\end{cases}
$$

where

$$
v_{ij} = \sum_{k=1}^{p} x_{ijk} \beta_k
$$

SPSS does not consider $\alpha_1$ to $\alpha_c$ as parameters, but as normalizing constants.



Normalizing Constants

$$
\alpha_j = \log\!\left( \frac{N_j}{\sum_{i=1}^{r} z_{ij} \, e^{v_{ij}}} \right) \qquad j = 1,\ldots,c \qquad (2)
$$
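To make the systematic component concrete, the following sketch computes $v_{ij}$, the normalizing constants of equation (2), and the expected counts for a given parameter vector. It is a minimal illustration, assuming NumPy arrays with a design array `x` of shape (r, c, p), cell structure values `z` of shape (r, c), and marginal totals `Nj`; the function name `expected_counts` and this layout are illustrative, not part of GENLOG.

```python
import numpy as np

def expected_counts(beta, x, z, Nj):
    """Expected counts m_ij and normalizing constants alpha_j for a given beta."""
    v = np.einsum('ijk,k->ij', x, beta)            # v_ij = sum_k x_ijk * beta_k
    zev = np.where(z > 0, z * np.exp(v), 0.0)      # z_ij * exp(v_ij); structural zeros stay 0
    alpha = np.log(Nj / zev.sum(axis=0))           # equation (2)
    m = np.where(z > 0, zev * np.exp(alpha), 0.0)  # m_ij = z_ij * exp(alpha_j + v_ij)
    return m, alpha
```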

Cell Structure Values

The cell structure values play two roles in SPSS loglinear procedures, depending on their signs. If $z_{ij} > 0$, it is a usual weight for the corresponding cell, and $\log(z_{ij})$ is sometimes called the offset. If $z_{ij} \le 0$, a structural zero is imposed on the cell $(B = i, A = j)$. Contingency tables containing at least one structural zero are called incomplete tables. If $n_{ij} = 0$ but $z_{ij} > 0$, the cell $(B = i, A = j)$ contains a sampling zero. Although SPSS still considers a structural zero part of the contingency table, it is not used in fitting the model. Cellwise statistics are not computed for structural zeros.

Maximum-Likelihood Estimation

The multinomial log-likelihood is

$$
L(\boldsymbol{\beta}) = L(\beta_1,\ldots,\beta_p) = \text{constant} + \sum_{j=1}^{c} \sum_{i=1}^{r} n_{ij} \log(m_{ij}) \qquad (3)
$$

Likelihood Equations

It can be shown that

$$
\frac{\partial L}{\partial \beta_k} = \sum_{j=1}^{c} \sum_{i=1}^{r} \left( n_{ij} - m_{ij} \right) x_{ijk} \qquad \text{for } k = 1,\ldots,p
$$

Let $\mathbf{g}(\boldsymbol{\beta}) = \left( g_1(\boldsymbol{\beta}),\ldots,g_p(\boldsymbol{\beta}) \right)'$ be the $p \times 1$ gradient vector with

$$
g_k(\boldsymbol{\beta}) = \frac{\partial L}{\partial \beta_k}
$$

The maximum-likelihood estimates $\hat{\boldsymbol{\beta}} = \left( \hat{\beta}_1,\ldots,\hat{\beta}_p \right)'$ are regarded as a solution to the vector of likelihood equations:

$$
\mathbf{g}(\boldsymbol{\beta}) = \mathbf{0} \qquad (4)
$$

Hessian Matrix

The likelihood equations are nonlinear functions of β. Solving them for $\hat{\boldsymbol{\beta}}$ requires an iterative method. The Newton-Raphson method is used. It can be shown that

$$
\frac{\partial^2 L}{\partial \beta_k \, \partial \beta_l} = -\sum_{j=1}^{c} \sum_{i=1}^{r} m_{ij} \left( x_{ijk} - \theta_{jk} \right)\left( x_{ijl} - \theta_{jl} \right)
$$

where

$$
\theta_{jk} = \frac{1}{N_j} \sum_{i=1}^{r} m_{ij} x_{ijk} \qquad j = 1,\ldots,c \text{ and } k = 1,\ldots,p \qquad (5)
$$

Let $\mathbf{H}(\boldsymbol{\beta})$ be the $p \times p$ information matrix, where $-\mathbf{H}(\boldsymbol{\beta})$ is the Hessian matrix of (3). The elements of $\mathbf{H}(\boldsymbol{\beta})$ are

$$
h_{kl}(\boldsymbol{\beta}) = -\frac{\partial^2 L}{\partial \beta_k \, \partial \beta_l} \qquad k = 1,\ldots,p \text{ and } l = 1,\ldots,p \qquad (6)
$$

Note: $\mathbf{H}(\boldsymbol{\beta})$ is a symmetric positive-definite matrix. The asymptotic covariance matrix of $\hat{\boldsymbol{\beta}}$ is estimated by $\mathbf{H}^{-1}(\hat{\boldsymbol{\beta}})$.
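The gradient and information matrix above have a direct array form. The following is a minimal sketch under the same assumed NumPy layout as before (counts `n` and expected counts `m` of shape (r, c), design array `x` of shape (r, c, p), marginal totals `Nj`); the function name is illustrative.

```python
import numpy as np

def gradient_and_information(n, m, x, Nj):
    """Gradient g(beta) and information matrix H(beta) at expected counts m."""
    g = np.einsum('ij,ijk->k', n - m, x)                  # g_k = dL/dbeta_k
    theta = np.einsum('ij,ijk->jk', m, x) / Nj[:, None]   # theta_jk, equation (5)
    d = x - theta[None, :, :]                             # x_ijk - theta_jk
    H = np.einsum('ij,ijk,ijl->kl', m, d, d)              # h_kl, equation (6)
    return g, H
```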

Newton-Raphson Method

Let $\boldsymbol{\beta}^{(s)}$ denote the sth approximation for the solution to (4). By the Newton-Raphson method,

$$
\boldsymbol{\beta}^{(s+1)} = \boldsymbol{\beta}^{(s)} + \mathbf{H}^{-1}\!\left(\boldsymbol{\beta}^{(s)}\right) \mathbf{g}\!\left(\boldsymbol{\beta}^{(s)}\right)
$$

Define $\mathbf{q}(\boldsymbol{\beta}) = \mathbf{H}(\boldsymbol{\beta})\boldsymbol{\beta} + \mathbf{g}(\boldsymbol{\beta})$. Using (5), the kth element of $\mathbf{q}(\boldsymbol{\beta})$ is

$$
q_k(\boldsymbol{\beta}) = \sum_{j=1}^{c} \sum_{i=1}^{r} \eta_{ij} \left( x_{ijk} - \theta_{jk} \right) \qquad (7)
$$

where

$$
\eta_{ij} =
\begin{cases}
m_{ij} v_{ij} + \left( n_{ij} - m_{ij} \right) & \text{if } z_{ij} > 0 \text{ and } m_{ij} > 0 \\
0 & \text{otherwise}
\end{cases}
$$

Then

$$
\mathbf{H}\!\left(\boldsymbol{\beta}^{(s)}\right) \boldsymbol{\beta}^{(s+1)} = \mathbf{q}\!\left(\boldsymbol{\beta}^{(s)}\right) \qquad (8)
$$

Thus, given $\boldsymbol{\beta}^{(s)}$, the (s+1)th approximation $\boldsymbol{\beta}^{(s+1)}$ is found by solving the system of equations in (8).

Initial Values

SPSS uses $\boldsymbol{\beta}^{(0)}$, which corresponds to a saturated model, as the initial value for β. Then the initial estimates for the expected cell counts are

$$
m_{ij}^{(0)} =
\begin{cases}
n_{ij} + \Delta & \text{if } z_{ij} > 0 \\
0 & \text{if } z_{ij} \le 0
\end{cases}
\qquad (9)
$$

where $\Delta \ge 0$ is a constant.

Note: For saturated models, SPSS adds Δ to $n_{ij}$ if $z_{ij} > 0$. This is done to avoid numerical problems in case some observed counts are 0. We advise users to set Δ to 0 whenever all observed counts (other than structural zeros) are positive.

The initial values for other quantities are

$$
\theta_{jk}^{(0)} = \frac{1}{N_j} \sum_{i=1}^{r} m_{ij}^{(0)} x_{ijk} \qquad (10)
$$

and

$$
\eta_{ij}^{(0)} =
\begin{cases}
m_{ij}^{(0)} \log\!\left( m_{ij}^{(0)} / z_{ij} \right) + \left( n_{ij} - m_{ij}^{(0)} \right) & \text{if } z_{ij} > 0 \text{ and } m_{ij}^{(0)} > 0 \\
0 & \text{otherwise}
\end{cases}
\qquad (11)
$$

Stopping Criteria

SPSS checks the following conditions for convergence:

1. $\max_{i,j} \left| \left( m_{ij}^{(s+1)} - m_{ij}^{(s)} \right) / m_{ij}^{(s)} \right| < \epsilon$, provided that $m_{ij}^{(s)} > 0$

2. $\max_{i,j} \left| m_{ij}^{(s+1)} - m_{ij}^{(s)} \right| < \epsilon$

3. $\left( \sum_{k=1}^{p} g_k^2\!\left( \hat{\boldsymbol{\beta}} \right) \right) \big/ \, p < \epsilon$

The iteration is said to have converged if either conditions 1 and 3 or conditions 2 and 3 are satisfied. If $p = 0$, then condition 3 is automatically satisfied. The iteration is said to have not converged if neither pair of conditions is satisfied within the maximum number of iterations.
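These conditions translate into a simple check on successive fitted counts and the gradient. Below is a minimal sketch, assuming NumPy arrays `m_old` and `m_new` for $m^{(s)}$ and $m^{(s+1)}$ and a gradient vector `g`; the function name `converged` and the default tolerance are illustrative.

```python
import numpy as np

def converged(m_old, m_new, g, p, eps=1e-8):
    """True when (condition 1 or condition 2) and condition 3 all hold."""
    pos = m_old > 0
    cond1 = np.all(np.abs((m_new[pos] - m_old[pos]) / m_old[pos]) < eps)
    cond2 = np.abs(m_new - m_old).max() < eps
    cond3 = True if p == 0 else float(g @ g) / p < eps
    return (cond1 or cond2) and cond3
```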

Algorithm

The iteration process uses the following steps:

1. Calculate $m_{ij}^{(0)}$ using (9), $\theta_{jk}^{(0)}$ using (10), and $\eta_{ij}^{(0)}$ using (11).

2. Set $s = 0$.

3. Calculate $\mathbf{H}\!\left(\boldsymbol{\beta}^{(s)}\right)$ using (6) evaluated at $m_{ij} = m_{ij}^{(s)}$; calculate $\mathbf{q}\!\left(\boldsymbol{\beta}^{(s)}\right)$ using (7) evaluated at $\eta_{ij} = \eta_{ij}^{(s)}$.

4. Solve for $\boldsymbol{\beta}^{(s+1)}$ using (8).

5. Calculate $v_{ij}^{(s+1)} = \sum_{k=1}^{p} x_{ijk} \beta_k^{(s+1)}$ and

$$
m_{ij}^{(s+1)} =
\begin{cases}
N_j \left( z_{ij} \, e^{v_{ij}^{(s+1)}} \Big/ \sum_{t=1}^{r} z_{tj} \, e^{v_{tj}^{(s+1)}} \right) & \text{if } z_{ij} > 0 \\
0 & \text{if } z_{ij} \le 0
\end{cases}
$$

6. Check whether the stopping criteria are satisfied. If yes, stop the iteration and declare convergence. Otherwise continue.

7. Increase s by 1 and check whether the maximum number of iterations has been reached. If yes, stop the iteration and declare the process not converged. Otherwise repeat steps 3-7.
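Putting steps 1 through 7 together, the sketch below runs the full Newton-Raphson iteration. It is a minimal illustration under assumed conventions (NumPy; counts `n` and cell structure values `z` of shape (r, c); design array `x` of shape (r, c, p)); the function name `fit_genlog` and the default values of Δ, ε, and the iteration limit are illustrative rather than GENLOG's defaults.

```python
import numpy as np

def fit_genlog(n, z, x, delta=0.5, eps=1e-8, max_iter=20):
    """Newton-Raphson fit of the multinomial loglinear/logit model (steps 1-7)."""
    r, c, p = x.shape
    Nj = n.sum(axis=0)                      # marginal totals N_j
    pos = z > 0                             # cells that are not structural zeros

    # Step 1: initial values, equations (9) and (11)
    m = np.where(pos, n + delta, 0.0)
    safe_z = np.where(pos, z, 1.0)
    eta = np.where(pos & (m > 0), m * np.log(np.where(m > 0, m, 1.0) / safe_z) + (n - m), 0.0)

    beta = np.zeros(p)
    for _ in range(max_iter):               # steps 2-7
        theta = np.einsum('ij,ijk->jk', m, x) / Nj[:, None]   # equation (5)/(10)
        d = x - theta[None, :, :]                             # x_ijk - theta_jk
        H = np.einsum('ij,ijk,ijl->kl', m, d, d)              # information matrix, (6)
        q = np.einsum('ij,ijk->k', eta, d)                    # right-hand side, (7)
        beta = np.linalg.solve(H, q)                          # step 4, equation (8)

        v = np.einsum('ijk,k->ij', x, beta)                   # step 5
        zev = np.where(pos, z * np.exp(v), 0.0)
        m_new = Nj * zev / zev.sum(axis=0)

        # Step 6: stopping criteria (conditions 1/2 on m, condition 3 on the gradient)
        g = np.einsum('ij,ijk->k', n - m_new, x)
        diff = np.abs(m_new - m)
        rel_ok = np.all(diff[m > 0] / m[m > 0] < eps)
        if (rel_ok or diff.max() < eps) and (g @ g) / max(p, 1) < eps:
            m = m_new
            break

        # Step 7: prepare the next iteration
        eta = np.where(pos & (m_new > 0), m_new * v + (n - m_new), 0.0)
        m = m_new
    return beta, m
```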

Estimated Normalizing Constants

Using (2), the maximum-likelihood estimate for $\alpha_j$ is

$$
\hat{\alpha}_j = \log\!\left( \frac{N_j}{\sum_{i=1}^{r} z_{ij} \, e^{\hat{v}_{ij}}} \right) \qquad j = 1,\ldots,c
$$

where

$$
\hat{v}_{ij} = \sum_{k=1}^{p} x_{ijk} \hat{\beta}_k
$$

Estimated Cell Counts

The estimated expected count is

$$
\hat{m}_{ij} =
\begin{cases}
N_j \left( z_{ij} \, e^{\hat{v}_{ij}} \Big/ \sum_{t=1}^{r} z_{tj} \, e^{\hat{v}_{tj}} \right) & \text{if } z_{ij} > 0 \\
0 & \text{if } z_{ij} \le 0
\end{cases}
$$

Goodness-of-Fit Statistics

The Pearson chi-square statistic is

$$
X^2 = \sum_{j=1}^{c} \sum_{i=1}^{r} X_{ij}^2
$$

where

$$
X_{ij}^2 =
\begin{cases}
\left( n_{ij} - \hat{m}_{ij} \right)^2 / \hat{m}_{ij} & \text{if } z_{ij} > 0,\ n_{ij} > 0, \text{ and } \hat{m}_{ij} > 0 \\
\text{SYSMIS} & \text{if } z_{ij} > 0,\ n_{ij} > 0, \text{ and } \hat{m}_{ij} = 0 \\
0 & \text{if } z_{ij} \le 0 \text{ or } n_{ij} = \hat{m}_{ij}
\end{cases}
$$

If any $X_{ij}^2$ is system missing, then $X^2$ is also system missing.

The likelihood-ratio chi-square statistic is

$$
G^2 = 2 \sum_{j=1}^{c} \sum_{i=1}^{r} G_{ij}^2
$$

where

$$
G_{ij}^2 =
\begin{cases}
n_{ij} \log\!\left( n_{ij} / \hat{m}_{ij} \right) & \text{if } z_{ij} > 0,\ n_{ij} > 0, \text{ and } \hat{m}_{ij} > 0 \\
\text{SYSMIS} & \text{if } z_{ij} > 0,\ n_{ij} > 0, \text{ and } \hat{m}_{ij} = 0 \\
0 & \text{if } z_{ij} > 0,\ n_{ij} = 0, \text{ and } \hat{m}_{ij} \ge 0; \text{ or } z_{ij} \le 0 \text{ or } n_{ij} = \hat{m}_{ij}
\end{cases}
$$

If any $G_{ij}^2$ is system missing, then $G^2$ is also system missing.

Degrees of Freedom

The degrees of freedom for each statistic is defined as $a = c(r - 1) - p - E$, where E is the number of cells with $z_{ij} \le 0$ or $\hat{m}_{ij} = 0$.

Significance Level

The significance level (or the p value) for the Pearson chi-square statistic is $\operatorname{Prob}\!\left( \chi_a^2 > X^2 \right)$ and that for the likelihood-ratio chi-square statistic is $\operatorname{Prob}\!\left( \chi_a^2 > G^2 \right)$. In both cases, $\chi_a^2$ is the central chi-square distribution with a degrees of freedom.
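A sketch of how $X^2$, $G^2$, the degrees of freedom, and the significance levels could be computed from the fitted counts is given below. The SYSMIS handling described above is omitted, and SciPy's `chi2.sf` stands in for the central chi-square tail probability; the function name is illustrative.

```python
import numpy as np
from scipy.stats import chi2

def goodness_of_fit(n, m_hat, z, p):
    """Pearson X^2, likelihood-ratio G^2, degrees of freedom, and p values."""
    fit = (z > 0) & (m_hat > 0)
    X2 = np.sum((n[fit] - m_hat[fit]) ** 2 / m_hat[fit])
    obs = fit & (n > 0)
    G2 = 2.0 * np.sum(n[obs] * np.log(n[obs] / m_hat[obs]))
    r, c = n.shape
    E = int(np.sum((z <= 0) | (m_hat == 0)))          # excluded cells
    df = c * (r - 1) - p - E
    return X2, G2, df, chi2.sf(X2, df), chi2.sf(G2, df)
```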

Analysis of Dispersion (Logit Models Only)

SPSS provides the analysis of dispersion based on two types of dispersion: entropy and concentration. The following definitions are used:

S(A)              Dispersion due to the model
S(B|A)            Dispersion due to residuals
S(B)              Total dispersion
R = S(A)/S(B)     Measure of association

where $S(A) + S(B|A) = S(B)$. Also define

$$
\hat{\pi}_i = \frac{\sum_{j=1}^{c} \hat{m}_{ij}}{\sum_{j=1}^{c} N_j}
\qquad\qquad
\hat{\pi}_{i|j} = \frac{\hat{m}_{ij}}{N_j}
$$

The bounds are $0 \le \hat{\pi}_i \le 1$ and $0 \le \hat{\pi}_{i|j} \le 1$.

Entropy

$$
S(B) = -N \sum_{i=1}^{r} S_i(B)
$$

where

$$
S_i(B) =
\begin{cases}
\hat{\pi}_i \log(\hat{\pi}_i) & \text{if } 0 < \hat{\pi}_i \le 1 \\
0 & \text{if } \hat{\pi}_i = 0
\end{cases}
$$

and

$$
S(B|A) = -\sum_{j=1}^{c} N_j \sum_{i=1}^{r} S_{ij}(B|A)
$$

where

$$
S_{ij}(B|A) =
\begin{cases}
\hat{\pi}_{i|j} \log\!\left( \hat{\pi}_{i|j} \right) & \text{if } 0 < \hat{\pi}_{i|j} \le 1 \\
0 & \text{if } \hat{\pi}_{i|j} = 0
\end{cases}
$$

Concentration

$$
S(B) = N \left( 1 - \sum_{i=1}^{r} \hat{\pi}_i^2 \right)
$$

$$
S(B|A) = \sum_{j=1}^{c} N_j \left( 1 - \sum_{i=1}^{r} \hat{\pi}_{i|j}^2 \right)
$$

Degrees of Freedom

Source of Dispersion    Measure    Degrees of Freedom
Due to model            S(A)       f(r − 1)
Due to residuals        S(B|A)     (N − f − 1)(r − 1)
Total                   S(B)       (N − 1)(r − 1)

where f equals p minus the number of nonredundant columns (in the design matrix) associated with the main effects of the dependent factors.
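A sketch of the entropy and concentration dispersions and the association measure R = S(A)/S(B) follows, assuming fitted counts `m_hat` of shape (r, c) and marginal totals `Nj`; the function name and the `kind` argument are illustrative.

```python
import numpy as np

def dispersion(m_hat, Nj, kind="entropy"):
    """S(A), S(B|A), S(B), and R = S(A)/S(B) for entropy or concentration."""
    N = Nj.sum()
    p_i = m_hat.sum(axis=1) / N          # marginal pi_i
    p_ij = m_hat / Nj                    # conditional pi_{i|j}
    if kind == "entropy":
        SB = -N * np.sum(p_i * np.log(np.where(p_i > 0, p_i, 1.0)))
        SBA = -np.sum(Nj * np.sum(p_ij * np.log(np.where(p_ij > 0, p_ij, 1.0)), axis=0))
    else:                                # concentration
        SB = N * (1.0 - np.sum(p_i ** 2))
        SBA = np.sum(Nj * (1.0 - np.sum(p_ij ** 2, axis=0)))
    SA = SB - SBA                        # dispersion due to the model
    return SA, SBA, SB, SA / SB
```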

Residuals

Goodness-of-fit statistics provide only broad summaries of how models fit data. The pattern of lack of fit is revealed in cell-by-cell comparisons of observed and fitted cell counts.

Simple Residuals

The simple residual of the (i,j)th cell is

$$
r_{ij} =
\begin{cases}
n_{ij} - \hat{m}_{ij} & \text{if } z_{ij} > 0 \\
\text{SYSMIS} & \text{if } z_{ij} \le 0
\end{cases}
$$

Standardized Residuals

The standardized residual for the (i,j)th cell is

$$
r_{ij}^S =
\begin{cases}
\left( n_{ij} - \hat{m}_{ij} \right) \Big/ \sqrt{ \hat{m}_{ij} \left( 1 - \hat{m}_{ij}/N_j \right) } & \text{if } z_{ij} > 0 \text{ and } 0 < \hat{m}_{ij} < N_j \\
0 & \text{if } z_{ij} > 0 \text{ and } n_{ij} = \hat{m}_{ij} \\
\text{SYSMIS} & \text{otherwise}
\end{cases}
$$

The standardized residuals are also known as Pearson residuals even though $\sum_{j=1}^{c} \sum_{i=1}^{r} \left( r_{ij}^S \right)^2 \ne X^2$. Although the standardized residuals are asymptotically normal, their asymptotic variances are less than 1.

Adjusted Residuals

The adjusted residual is the simple residual divided by its estimated standard error. Its definition and applications first appeared in Haberman (1973) and reappeared on page 454 of Haberman (1979). This statistic for the (i,j)th cell is

$$
r_{ij}^A =
\begin{cases}
\left( n_{ij} - \hat{m}_{ij} \right) / s_{ij} & \text{if } z_{ij} > 0 \text{ and } \hat{m}_{ij} > 0 \\
0 & \text{if } z_{ij} > 0 \text{ and } n_{ij} = \hat{m}_{ij} \\
\text{SYSMIS} & \text{otherwise}
\end{cases}
$$

where

$$
s_{ij} = \sqrt{ \hat{m}_{ij} \left( 1 - \frac{\hat{m}_{ij}}{N_j} - \hat{m}_{ij} \sum_{k=1}^{p} \sum_{l=1}^{p} \left( x_{ijk} - \hat{\theta}_{jk} \right)\left( x_{ijl} - \hat{\theta}_{jl} \right) h^{kl} \right) }
$$

$h^{kl}$ is the (k,l)th element of $\mathbf{H}^{-1}(\hat{\boldsymbol{\beta}})$. The adjusted residuals are asymptotically standard normal.
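The simple, standardized, and adjusted residuals can be computed together. Below is a minimal sketch under the earlier array conventions, with `H_inv` denoting $\mathbf{H}^{-1}(\hat{\boldsymbol{\beta}})$; SYSMIS handling for structural zeros is omitted and the names are illustrative.

```python
import numpy as np

def cell_residuals(n, m_hat, x, Nj, H_inv):
    """Simple, standardized, and adjusted residuals for cells with z_ij > 0."""
    simple = n - m_hat
    with np.errstate(divide='ignore', invalid='ignore'):
        standardized = simple / np.sqrt(m_hat * (1.0 - m_hat / Nj))
        theta = np.einsum('ij,ijk->jk', m_hat, x) / Nj[:, None]
        d = x - theta[None, :, :]
        quad = np.einsum('ijk,kl,ijl->ij', d, H_inv, d)   # (x - theta)' H^-1 (x - theta)
        s = np.sqrt(m_hat * (1.0 - m_hat / Nj - m_hat * quad))
        adjusted = simple / s
    return simple, standardized, adjusted
```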

Deviance Residuals

Pierce and Schafer (1986) and McCullagh and Nelder (1989) define the signed square root of the individual contribution to the $G^2$ statistic as the deviance residual. This statistic for the (i,j)th cell is

$$
r_{ij}^D = \operatorname{sign}\!\left( n_{ij} - \hat{m}_{ij} \right) \sqrt{d_{ij}}
$$

where

$$
d_{ij} =
\begin{cases}
2\left[ n_{ij} \log\!\left( n_{ij} / \hat{m}_{ij} \right) - \left( n_{ij} - \hat{m}_{ij} \right) \right] & \text{if } z_{ij} > 0,\ \hat{m}_{ij} > 0, \text{ and } n_{ij} > 0 \\
2\hat{m}_{ij} & \text{if } z_{ij} > 0,\ \hat{m}_{ij} \ge 0, \text{ and } n_{ij} = 0 \\
0 & \text{if } z_{ij} > 0 \text{ and } n_{ij} = \hat{m}_{ij} \\
\text{SYSMIS} & \text{otherwise}
\end{cases}
$$

For multinomial sampling, the individual contribution to the $G^2$ statistic is only $2 n_{ij} \log\!\left( n_{ij} / \hat{m}_{ij} \right)$, but this is negative when $n_{ij} < \hat{m}_{ij}$. Thus, the extra term $2\left( n_{ij} - \hat{m}_{ij} \right)$ is subtracted from it so that $d_{ij} \ge 0$ for all i and j. However, we still have $\sum_{j=1}^{c} \sum_{i=1}^{r} \left( r_{ij}^D \right)^2 = G^2$.
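A sketch of the deviance residuals for cells with $z_{ij} > 0$ follows; the clamp against tiny negative $d_{ij}$ from rounding is an implementation detail, not part of the definition above, and SYSMIS handling is omitted.

```python
import numpy as np

def deviance_residuals(n, m_hat):
    """Signed square-root deviance residuals."""
    safe_m = np.where(m_hat > 0, m_hat, 1.0)
    safe_n = np.where(n > 0, n, 1.0)
    term = np.where((n > 0) & (m_hat > 0), n * np.log(safe_n / safe_m), 0.0)
    d = 2.0 * (term - (n - m_hat))                 # reduces to 2 * m_hat when n_ij = 0
    return np.sign(n - m_hat) * np.sqrt(np.maximum(d, 0.0))
```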

Generalized Residual

Consider a linear combination of the cell counts $\sum_{j=1}^{c} \sum_{i=1}^{r} d_{ij} n_{ij}$, where the $d_{ij}$ are real numbers.

The estimated expected value is

$$
\sum_{j=1}^{c} \sum_{i=1}^{r} d_{ij} \hat{m}_{ij}
$$

The simple residual for this linear combination is

$$
\sum_{j=1}^{c} \sum_{i=1}^{r} d_{ij} \left( n_{ij} - \hat{m}_{ij} \right)
$$

The standardized residual for this linear combination is

$$
\frac{ \sum_{j=1}^{c} \sum_{i=1}^{r} d_{ij} \left( n_{ij} - \hat{m}_{ij} \right) }
     { \sqrt{ \sum_{j=1}^{c} \left[ \sum_{i=1}^{r} d_{ij}^2 \hat{m}_{ij} - \left( \sum_{i=1}^{r} d_{ij} \hat{m}_{ij} \right)^{\!2} \Big/ N_j \right] } }
$$

The adjusted residual for this linear combination is, as given on page 420 of Haberman (1979),

$$
\frac{ \sum_{j=1}^{c} \sum_{i=1}^{r} d_{ij} \left( n_{ij} - \hat{m}_{ij} \right) }{ V }
$$

where

$$
V = \sqrt{ \sum_{j=1}^{c} \sum_{i=1}^{r} d_{ij}^2 \hat{m}_{ij} - \sum_{j=1}^{c} \frac{1}{N_j} \left( \sum_{i=1}^{r} d_{ij} \hat{m}_{ij} \right)^{\!2} - \sum_{k=1}^{p} \sum_{l=1}^{p} f_k f_l h^{kl} }
$$

$$
f_k = \sum_{j=1}^{c} \sum_{i=1}^{r} d_{ij} \hat{m}_{ij} \left( x_{ijk} - \hat{\theta}_{jk} \right)
$$
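A sketch of the standardized and adjusted residuals for a user-supplied coefficient array `d` follows, under the same assumed array conventions as the earlier sketches; SYSMIS handling is omitted and the function name is illustrative.

```python
import numpy as np

def generalized_residual(d, n, m_hat, x, Nj, H_inv):
    """Standardized and adjusted residuals for the combination sum_ij d_ij * n_ij."""
    resid = np.sum(d * (n - m_hat))
    per_j = np.sum(d ** 2 * m_hat, axis=0) - np.sum(d * m_hat, axis=0) ** 2 / Nj
    standardized = resid / np.sqrt(per_j.sum())
    theta = np.einsum('ij,ijk->jk', m_hat, x) / Nj[:, None]
    f = np.einsum('ij,ijk->k', d * m_hat, x - theta[None, :, :])   # f_k
    V = np.sqrt(per_j.sum() - f @ H_inv @ f)
    return standardized, resid / V
```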

Generalized Log-Odds Ratio

Consider a linear combination of the natural logarithms of the cell counts

$$
\sum_{j=1}^{c} \sum_{i=1}^{r} d_{ij} \log(m_{ij}) \qquad (12)
$$

where the $d_{ij}$ are real numbers with the restriction

$$
\sum_{i=1}^{r} d_{ij} = 0 \qquad j = 1,\ldots,c
$$

The quantity in (12) is estimated by

$$
\sum_{j=1}^{c} \sum_{i=1}^{r} d_{ij} \log(\hat{m}_{ij}) = \sum_{j=1}^{c} \sum_{i=1}^{r} d_{ij} \log(z_{ij}) + \sum_{j=1}^{c} \sum_{i=1}^{r} \sum_{k=1}^{p} d_{ij} x_{ijk} \hat{\beta}_k \qquad (13)
$$

The variance of (13) is

$$
\operatorname{var}\!\left( \sum_{j=1}^{c} \sum_{i=1}^{r} d_{ij} \log(\hat{m}_{ij}) \right) = \sum_{k=1}^{p} \sum_{l=1}^{p} w_k w_l h^{kl} \qquad (14)
$$

where

$$
w_k = \sum_{j=1}^{c} \sum_{i=1}^{r} d_{ij} x_{ijk} \qquad k = 1,\ldots,p
$$

Wald Statistic

The null hypothesis is

$$
H_0\!: \; \sum_{j=1}^{c} \sum_{i=1}^{r} d_{ij} \log(m_{ij}) = 0
$$

The Wald statistic is

$$
W = \frac{ \left( \sum_{j=1}^{c} \sum_{i=1}^{r} d_{ij} \log(\hat{m}_{ij}) \right)^{\!2} }{ \sum_{k=1}^{p} \sum_{l=1}^{p} w_k w_l h^{kl} }
$$

Under $H_0$, W is asymptotically distributed as a chi-square distribution with 1 degree of freedom. The significance level is $\operatorname{Prob}\!\left( \chi_1^2 \ge W \right)$. Note: W will be system missing if (14) is 0.

Asymptotic Confidence Interval

The asymptotic $(1 - \alpha) \times 100\%$ confidence interval for (12) is

$$
\sum_{j=1}^{c} \sum_{i=1}^{r} d_{ij} \log(\hat{m}_{ij}) \pm z_{\alpha/2} \sqrt{ \sum_{k=1}^{p} \sum_{l=1}^{p} w_k w_l h^{kl} }
$$

where $z_{\alpha/2}$ is the upper $\alpha/2$ point of the standard normal distribution. The default value of α is 0.05.
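A sketch combining (13), (14), the Wald statistic, and the asymptotic confidence interval for a coefficient array `d` satisfying $\sum_{i} d_{ij} = 0$ follows. SciPy's `norm.ppf` and `chi2.sf` are assumed for the normal quantile and the chi-square tail probability; the function name is illustrative.

```python
import numpy as np
from scipy.stats import chi2, norm

def generalized_log_odds_ratio(d, z, x, beta_hat, H_inv, alpha=0.05):
    """Estimate, Wald statistic, p value, and confidence interval for (12)."""
    v_hat = np.einsum('ijk,k->ij', x, beta_hat)
    log_z = np.log(np.where(z > 0, z, 1.0))
    est = np.sum(np.where(z > 0, d * (log_z + v_hat), 0.0))   # equation (13)
    w = np.einsum('ij,ijk->k', d, x)                          # w_k
    var = float(w @ H_inv @ w)                                # equation (14)
    W = est ** 2 / var                                        # system missing if var is 0
    half = norm.ppf(1.0 - alpha / 2.0) * np.sqrt(var)
    return est, W, chi2.sf(W, 1), (est - half, est + half)
```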

Aggregated Data

This section shows how data are aggregated for a multinomial distribution. The following notation is used in this section:

$v_{ij}$      Number of SPSS cases for $B = i$ ($i = 1,\ldots,r$) and $A = j$ ($j = 1,\ldots,c$)
$n_{ijs}$     sth SPSS caseweight for $B = i$ and $A = j$ ($s = 1,\ldots,v_{ij}$)
$x_{ijs}$     Covariate
$z_{ijs}$     Cell weight
$c_{ijs}$     GRESID coefficient
$e_{ijs}$     GLOR coefficient
$v_{ij}^{+}$  Number of positive $z_{ijs}$ (cell weights) for $1 \le s \le v_{ij}$

The cell count is

$$
n_{ij} =
\begin{cases}
\sum^{*}_{1 \le s \le v_{ij}} n_{ijs}^{+} & \text{if } v_{ij}^{+} > 0 \\
0 & \text{if } v_{ij} = 0 \text{ or } v_{ij}^{+} = 0
\end{cases}
$$

where

$$
n_{ijs}^{+} =
\begin{cases}
n_{ijs} & \text{if } n_{ijs} > 0 \text{ and } z_{ijs} > 0 \\
0 & \text{if } n_{ijs} \le 0 \text{ and } z_{ijs} > 0
\end{cases}
$$

and $\sum^{*}_{1 \le s \le v_{ij}}$ means summation over the range of s with the terms $z_{ijs} > 0$.
1≤ s ≤ vij
The cell weight value is

$$
z_{ij} =
\begin{cases}
\sum^{*}_{1 \le s \le v_{ij}} n_{ijs}^{+} z_{ijs} \, / \, n_{ij} & \text{if } n_{ij} > 0 \text{ and } v_{ij}^{+} > 0 \\
\sum^{*}_{1 \le s \le v_{ij}} z_{ijs} \, / \, v_{ij}^{+} & \text{if } n_{ij} = 0 \text{ and } v_{ij}^{+} > 0 \\
0 & \text{if } v_{ij}^{+} = 0 \\
1 & \text{if } v_{ij} = 0
\end{cases}
$$

If no variable is specified as the cell weight variable, then all cases have unit cell weights by default.

The cell covariate value is

$$
x_{ij} =
\begin{cases}
\sum^{*}_{1 \le s \le v_{ij}} n_{ijs}^{+} x_{ijs} \, / \, n_{ij} & \text{if } n_{ij} > 0 \text{ and } v_{ij}^{+} > 0 \\
\sum^{*}_{1 \le s \le v_{ij}} x_{ijs} \, / \, v_{ij}^{+} & \text{if } n_{ij} = 0 \text{ and } v_{ij}^{+} > 0 \\
0 & \text{if } v_{ij}^{+} = 0 \text{ or } v_{ij} = 0
\end{cases}
$$
The cell GRESID coefficient is

$$
c_{ij} =
\begin{cases}
\sum^{*}_{1 \le s \le v_{ij}} n_{ijs}^{+} c_{ijs} \, / \, n_{ij} & \text{if } n_{ij} > 0 \text{ and } v_{ij}^{+} > 0 \\
\sum^{*}_{1 \le s \le v_{ij}} c_{ijs} \, / \, v_{ij}^{+} & \text{if } n_{ij} = 0 \text{ and } v_{ij}^{+} > 0 \\
0 & \text{if } v_{ij}^{+} = 0 \text{ or } v_{ij} = 0
\end{cases}
$$

There are no defaults for the GRESID coefficients.
The cell GLOR coefficient is

$$
e_{ij} =
\begin{cases}
\sum^{*}_{1 \le s \le v_{ij}} n_{ijs}^{+} e_{ijs} \, / \, n_{ij} & \text{if } n_{ij} > 0 \text{ and } v_{ij}^{+} > 0 \\
\sum^{*}_{1 \le s \le v_{ij}} e_{ijs} \, / \, v_{ij}^{+} & \text{if } n_{ij} = 0 \text{ and } v_{ij}^{+} > 0 \\
0 & \text{if } v_{ij}^{+} = 0 \text{ or } v_{ij} = 0
\end{cases}
$$

There are no defaults for the GLOR coefficients.
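A sketch of the aggregation rules for a single cell (B = i, A = j) follows, producing the cell count and cell weight from the case-level values; the covariate, GRESID, and GLOR coefficients follow the same caseweight-weighted-average pattern. The function and argument names are illustrative.

```python
import numpy as np

def aggregate_cell(caseweights, cellweights):
    """Cell count n_ij and cell weight z_ij from the cases in one cell."""
    w = np.asarray(caseweights, dtype=float)    # n_ijs
    z = np.asarray(cellweights, dtype=float)    # z_ijs
    keep = z > 0                                # the starred sums run over these cases
    if not keep.any():                          # v_ij = 0 gives z_ij = 1; v+_ij = 0 gives 0
        return 0.0, (1.0 if w.size == 0 else 0.0)
    n_plus = np.where(w > 0, w, 0.0)[keep]      # n+_ijs
    n_ij = n_plus.sum()
    if n_ij > 0:
        z_ij = np.sum(n_plus * z[keep]) / n_ij  # caseweight-weighted average
    else:
        z_ij = z[keep].mean()                   # simple average over the kept cases
    return n_ij, z_ij
```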

References

Agresti, A. 1990. Categorical data analysis. New York: John Wiley & Sons, Inc.

Christensen, R. 1990. Log-linear models. New York: Springer-Verlag.

Haberman, S. J. 1973. The analysis of residuals in cross-classified tables. Biometrics, 29: 205–220.

Haberman, S. J. 1979. Analysis of qualitative data, Volume 2. New York: Academic Press.

McCullagh, P., and Nelder, J. A. 1989. Generalized linear models. 2nd ed. London: Chapman and Hall.

Pierce, D. A., and Schafer, D. W. 1986. Residuals in generalized linear models. Journal of the American Statistical Association, 81: 977–986.
