1988 - Modeling Strategies For Categorical Data Examples From Housing and Tenure Choice

W. A. V. Clark, M. C. Deurloo, and F. M. Dieleman

Modeling Strategies for Categorical Data: Examples from Housing and Tenure Choice

Models to investigate categorical data can be divided into preprocessing, limited
parameterization, and formal logit models. To illustrate the advantages of preprocessing
and limited parameterization models, they are applied to a data set of
tenure and type of housing choice before the data are examined with hierarchical
logit and nested logit models. The preprocessing approaches are useful in select-
ing optimal subsets of independent variables with respect to the dependent vari-
able. The ease of application and interpretation of a limited parameterization
approach extends the clarity of the results from the preprocessing approaches.
Because some variables are only relevant at specific levels of other independent
variables, nonstandard (nested) logit models are necessary to understand the
nested relationships.
1. INTRODUCTION
Techniques that use categorical data have proliferated in the past decade.
Logit, multinomial logit, and sequential logit models have been applied to mobility
(Clark, Deurloo, and Dieleman 1984; Clark and Onaka 1985), consumer
choice (Wrigley 1985), housing choice (Quigley 1976), and transportation mode
choice (Johnson and Hensher 1982) to mention only a few of the substantive
areas. However, this work has sometimes proceeded without a detailed consid-
eration of the underlying contingency tables. In this article, we deal with the
statistical analysis of contingency tables rather than with sequential choice
modeling. It is necessary to stress this point because both statistical modeling of
categorical data and discrete choice approaches often use the same terms with
different meanings. An example of this is the term nested logit model. In a later
section, the term is not used for sequential choice modeling, as is usually the case
in geographical studies (Wrigley 1985), but to describe the situation where the
effect of an independent variable (on the dependent variable) is only operative
at a specific level of another independent variable. This last approach is more
common in marketing research (Magidson, Swan, and Berk 1981).

W. A. V. Clark is professor of geography at the University of California, Los Angeles.
M. C. Deurloo is associate professor of geography at the University of Amsterdam. F. M.
Dieleman is professor of geography at the University of Utrecht, The Netherlands.
Geographical Analysis, Vol. 20, No. 3 (July 1988) © 1988 Ohio State University Press
Submitted 10/87. Revised version accepted 1/88.

When categorical independent and/or dependent variables are being em-
ployed, logit analysis is an appropriate tool to model the relationship between
the variables. But given a large data set, such as the mobility and housing choice
data for the Netherlands (about 60,000 households), or the American Housing
Survey (about 75,000 individuals), direct application of logit analysis to the data
is often problematic or impossible. One major problem is that large numbers of
variables may have significant relationships with mobility and tenure choice,
and in addition these variables may have a large number of categories. The cross
classification of, say, six variables with two to five categories each, leads to a
table with a very large number of cells. Even assuming it is possible to fit a
model, the data often become too “thin,” that is, there are not enough observa-
tions to avoid having either empty cells or small numbers of observations in
many cells. As a result, there is real doubt about the parameterization of a logit or
multinomial logit model for such a sparse table. The estimates of the parameters
for such a table are unreliable. Moreover, the parameters of a logit model of
a large cross-tabulation are not subject to meaningful interpretation. Given such
a situation, it may be useful to preprocess the data to select the most important
independent variables from the larger set and to reduce the number of categories
of each selected variable. In our experience, logit analysis of categorical data is
often improved with a thorough preprocessing of the data. The need for preprocessing
prior to logit analysis has already been noted by other researchers, including
Higgins and Koch (1977), Green (1978), and Magidson (1982), and reiterated
by Wrigley (1985). In this paper, we summarize our experiences over the
last few years in modeling under such circumstances (see Clark et al. 1984, 1986;
Deurloo et al. 1987, 1988).
The paper demonstrates the usefulness of several forms of preprocessing prior
to logit analysis. The preprocessing of the data leads to robust logit models that
fit the data well and that are easy to interpret because the number of parameters
is relatively small. The suggested approaches can be extended to other geo-
graphic choice problems beyond the specific applications to housing and tenure
choice described here.
2. MODELING STRATEGIES
The methods considered in this paper can be grouped into three general categories
(Figure 1). The first group is a selection of approaches where the objective
is to simplify both the set of explanatory variables and their number of catego-
ries. In our experience, two techniques, proportional reduction in uncertainty
(PRU) (Kim 1984), and Chi-square automatic interaction detection (CHAID)
(Kass 1980), are good choices from a wider set of possible techniques with the
same purpose. They are both applicable in situations where a qualitative de-
pendent variable with two or more categories and a large set of qualitative inde-
pendent variables with two or more categories are to be analyzed.
Given the dependent variable, the PRU technique selects independent vari-
ables in a sequential manner. The independent variable that explains most of the
distribution of the observations over the categories of the dependent variable is
chosen first and its categorization is simplified. The criterion for selection of
additional independent variables and simplification of their categories is measured
by the change in entropy in the dependent variable. Variables are added
until no relevant additional variables can be included or the data are exhausted.
The PRU method is more fully explained and illustrated in section 4.

[Figure 1. Flow diagram with three stages: Pre-Processing, Limited Parameterization, Formal Modeling. Un-nested analysis: PRU -> MNA/ANOTA -> hierarchical logit. Nested analysis: CHAID -> nested logit.]
FIG. 1. Modeling Strategy

CHAID has approximately the same purpose as the PRU method and stems
from the more well-known automatic-interaction-detection (AID) method of
Sonquist and Morgan (1964). That method was limited to a binary dependent
variable. Like PRU, CHAID is a multivariable method and considers selection
and simplification of variables one after another. The difference between the
two techniques, apart from their different theoretical underpinning, lies mainly
in the nesting of independent variables within the categories of other indepen-
dent variables in the case of CHAID. Both methods add the most effective vari-
able to the tabulation at each step. Therefore, both methods lead to a selection of
independent variables that are usually not strongly interrelated. The detail of the
categorization can often be reduced dramatically without important loss of in-
formation with both methods. CHAID will be explained more fully in section 5.
In our modeling strategy, a second stage involves the use of multivariate nominal
scale analysis (MNA), or the analysis of tables (ANOTA). A standard procedure
after the preprocessing of PRU or CHAID is to estimate a logit model.
While logit transformation is the classical solution to prevent the estimated prob-
abilities in models for categorical data from taking on values outside the 0-1
range it is important to realize that this is the only reason to perform this trans-
formation; it is certainly not true that logit models are a priori more suitable than
other models. The use of MNA/ANOTA offers an alternative to proceeding
directly to logit transformations. Logit transformation may not always be desirable,
as it entails a number of disadvantages, especially when the dependent
variable has more than two categories or when there are numerous predictors
that can eventually have more than two categories. In practice, the selection of a
qualified logit model containing interaction effects, even in cases with a binary
dependent variable, is only possible with a relatively small number of predictors.
MNA/ANOTA provides a number of simplifications as soon as the restriction is
dropped that the estimated coefficients must be in the 0-1 range. Then the coefficients
can be estimated with the ordinary-least-squares method and, consequently,
only a linear system of equations has to be solved in which only bivariate
tables are used, so that even extensive problems can be handled on small (micro)
computers. Furthermore, the interpretation of the parameters becomes straight-
forward and the number of categories of the dependent variable is of no impor-
tance. It is true that the MNA/ANOTA model does not guarantee a good fit. But
for logit models, good fit can be guaranteed only for the saturated model and the
saturated model is not an end in itself, but only a starting point for a process of
reduction to a parsimonious form. In our opinion, it is more a matter of personal
taste than statistical methodology whether modeling is done on the linear scale or on the
log-linear scale.
MNA/ANOTA is labeled as a “limited” parameterization method because assumptions
about the structure of the data are used to constrain the complexity of
the models, thus facilitating interpretation. For example, MNA/ANOTA makes
the restrictive assumption that an additive model adequately represents the data.
If the method is applied to data that have been preprocessed with PRU or
CHAID, this assumption is not unreasonable. It is possible to avoid the strongest
interactions in the data by dividing them first into relevant separate subgroups
and then selecting the predictor variables and their categories by virtue of the
preliminary CHAID analysis. In many respects CHAID can be seen as comple-
mentary to MNA/ANOTA. The strongest interaction effects are filtered out
with these procedures. This is illustrated by the fact that the estimated coeffi-
cients for a specific category of any independent variable in our models differ
little from the corresponding deviations of the average proportions of the cate-
gories of the dependent variable in the sample.
As a consequence of the constraints, the interpretation of the MNA/ANOTA
results is straightforward and comparable to multiple regression. MNA/ANOTA
is especially useful if the dependent variable has more than two categories; the
ease of computation and interpretation then compares favorably with the much
more complicated multinomial logit modeling.* However, as mentioned above,
there are disadvantages of the ANOTA method. To reiterate, the estimated pro-
portions may be out of the 0-1 range, and combined effects of independent
variables on the dependent variable are ignored. However, the method often
gives a reasonable approximation, and can be used as the final result of an analy-
sis, if the greater sophistication of a logit model serves no definite purpose or if
the formulations and interpretation of the parameters of a logit model are not
straightforward. ANOTA can also serve as an interim step towards more formal
modeling and in that case is a good starting point for finding an adequate logit
model. It is also possible to move directly from the preprocessing to the full
parameterization stage. Omitting the intermediate stage can be justified when
the preprocessing stage has yielded a sufficient reduction in categories and vari-
ables. Log-linear modeling is most successful with small tables of three to five
variables with two to three categories.
* However, it is possible to treat the cell frequencies as Poisson approximations to multinomial
random variables and use GLIM to estimate the parameters.

The final, fully parametric group of models (Figure 1) are the logit models. No
preliminary assumptions about the data are introduced. If necessary, the original
cross-tabulation of the dependent variable and the independent variables can be
reproduced exactly (the saturated model). The logit models are theoretically
more sophisticated than the MNA/ANOTA method and allow the specification
of interaction effects of predictors on the dependent variable. Hierarchical logit
models are widely known and applied for both dichotomous and polytomous
dependent variables and need no further explanation here. Nonhierarchical logit
models have not been used widely for several reasons. In the first place, the basic
literature (e.g., Bishop, Fienberg, and Holland 1975) is almost exclusively devoted
to hierarchical logit modeling. In the second place there exists a broad
variety of different nonhierarchical logit models that can be specified for any
given data set. Therefore, it is difficult to devise a way to find appropriate mod-
els, although it is sometimes logical to look for nonhierarchical logit models, such
as models which take order into account (see, for example, Deurloo et al. 1988),
or nested models in which a contrast or group of contrasts is supposed to be
operative at some levels of a variable (but not at all levels). To reiterate, the term
nested is used here in the statistical sense and not to describe sequential choice
modeling such as in Clark and Onaka (1985). In this paper, we will discuss an
example of a nested model in which size of household and type of housing
market seem to influence housing choices of lower-income households, while
they play no important role for the high-income groups. Magidson (1982) argues
that nested logit models can be useful in such situations. He also argues that a
preliminary analysis of the data with CHAID can guide the specification of rele-
vant nested logit models because of its property to nest independent variables
within the categories of previously chosen independent variables. Sometimes
CHAID is also used as a preprocessing method for hierarchical logit modeling
(e.g., Green 1978).
In our analysis, we distinguish between two approaches: a nested modeling
strategy and a non-nested modeling strategy (Figure 1). In the PRU-MNA/
ANOTA-LOGIT analyses of the data, no nesting of independent variables oc-
curs, but in the CHAID-NESTED LOGIT analyses, the emphasis is on estimat-
ing nested models. Because the statistical presentations of PRU (Kim 1984) and
CHAID (Kass 1980) are available we do not replicate those materials. The litera-
ture on MNA and its reformulation as ANOTA is somewhat less accessible and
we have provided a detailed statistical appendix on this technique (Appendix I).
Programs to run PRU, CHAID, and ANOTA are described in Appendix II.
3. THE DATA SET AND CONTEXT

To illustrate the power of this set of techniques, PRU, CHAID, ANOTA/
MNA, and logit models are applied to one data set drawn from the 1981 Dutch
National Housing Survey. The survey focuses on characteristics of the household
and the dwelling, and also records moves since the previous (1978) survey. It
comprises about 62,000 housing occupants. The data set selected for the present
analysis was the information on households who were principal occupants of a
dwelling in the public or private rental sector in 1978, and who moved between
houses or apartments between 1978 and 1981. All households with missing values
on any of the variables considered were removed from the data set. The data set
that is used for the illustrations in this article comprises 2,923 households with
complete information on nine variables (Table 1). Housing choice of these
households is the dependent variable. Tenure and type of dwelling are the dom-
inant dimensions of choice for renters as we showed in previous research (Deur-
loo et al. 1987), and define the three choice categories. Eight variables that were
established as important predictors of housing choice in the original research
comprise the independent variables. The set of independent variables includes a
typology of housing markets in the Netherlands, as choice patterns vary consid-
erably between housing markets with a different housing stock and varying short-
ages of dwellings. All categories of the variables contain a substantial number of
observations (Table 1).
TABLE 1
The Nine Variables and Their Categories

Variable                      Categories                              Number of cases

Housing choice                1. multifamily rent                                 914
                              2. single family rent                              1076
                              3. owner occupation                                 933

Household Characteristics
Income                        1. [label missing in source]                        445
                              2. [label missing in source]                        878
                              3. [label missing in source]                        938
                              4. [label missing in source]                        662
Age of head of household      1. < 34 years                                      1397
                              2. 35-44 years                                      622
                              3. 45-54 years                                      306
                              4. 55-74 years                                      598
Size of household             1. 1 person                                         375
                              2. 2 persons                                        861
                              3. 3 or 4 persons                                  1391
                              4. 5 or more persons                                296

Characteristics of Previous House
Previous tenure               1. public rental                                   1934
                              2. private rental                                   989
Number of rooms               1. 1 or 2 rooms                                     406
previous dwelling             2. 3 rooms                                          625
                              3. 4 rooms                                         1346
                              4. 5 or more rooms                                  546
Type previous dwelling        1. single family                                   1098
                              2. multifamily                                     1825
Rent previous dwelling        1. [label missing in source]                        428
                              2. [label missing in source]                       1324
                              3. [label illegible]/month                          513
                              4. [lower bound illegible]-550/month                325
                              5. > fl 550/month                                   333
Type housing market           1. Periphery                                        502
                              2. South                                            672
                              3. Middle                                           727
                              4. Randstad                                        1022

4. NON-NESTED ANALYSIS OF THE CROSS-TABULATION

The PRU Technique
A complete cross-classification of the data in Table 1 would give 61,440 cells,
most of which would be empty since the sample size is only 2,923. Parametric
modeling of such a sparse table is meaningless, but the preprocessing procedure
can be used to reduce the number of variables and the number of categories of
the reduced set of variables.
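
The cell count quoted above follows directly from the category counts in Table 1. As a small check, the sketch below multiplies the numbers of categories and compares the result with the sample size; the variable names are ours and purely illustrative.

```python
from math import prod

# Number of categories per variable, as listed in Table 1.
CATEGORIES = {
    "housing choice": 3,
    "income": 4,
    "age of head of household": 4,
    "size of household": 4,
    "previous tenure": 2,
    "number of rooms previous dwelling": 4,
    "type previous dwelling": 2,
    "rent previous dwelling": 5,
    "type housing market": 4,
}
HOUSEHOLDS = 2923  # sample size reported in section 3

n_cells = prod(CATEGORIES.values())   # cells in the full cross-classification
mean_per_cell = HOUSEHOLDS / n_cells  # average number of households per cell

print(n_cells)                  # 61440
print(round(mean_per_cell, 3))  # 0.048
```

With fewer than 0.05 households per cell on average, most cells of the full table must be empty, which is the sparsity problem the preprocessing step is meant to remove.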
The PRU technique is based on an asymmetric measure of the relationship
between a categorical dependent variable and one or more independent vari-
ables, and measures the explanatory power of the set of predictors. PRU is used
in a stepwise procedure to accomplish the reduction. At each step, the next most
effective variable is added to the tabulation of the dependent variable housing
choice and the previously selected explanatory variables. This new dimension is
examined for its effect on association between the dependent variable and the
set of independent variables in the tabulation when categories are aggregated.
This often reduces the detail of the categorization dramatically without a noticeable
effect on the level of association. In the earliest reference we know of
(McGill and Quastler 1955) the PRU measure is called the “coefficient of constraint.”
Hays (1980) calls it “the relative reduction in uncertainty” and Nie et al.
(1975) the “uncertainty coefficient.” There is a strong analogy between the PRU
for discrete variables and the coefficient of determination (the square of Pear-
son’s correlation coefficient) for continuous variables.
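
A minimal sketch of the measure may help fix ideas. It assumes the usual definition of the uncertainty coefficient, PRU = (H(Y) - H(Y|X)) / H(Y), where H is the Shannon entropy of the dependent variable Y and X is the (possibly combined) predictor. The function names and the two small tables used to exercise it are hypothetical and are not taken from the housing data.

```python
from math import log

def entropy(counts):
    """Shannon entropy (natural logarithm) of a list of non-negative counts."""
    total = sum(counts)
    return -sum(c / total * log(c / total) for c in counts if c > 0)

def pru(table):
    """Proportional reduction in uncertainty of the column variable.

    table[i][j] is the count for predictor category i (rows) and
    dependent-variable category j (columns).
    """
    n = sum(sum(row) for row in table)
    column_totals = [sum(col) for col in zip(*table)]
    h_y = entropy(column_totals)  # uncertainty before knowing the predictor
    # Remaining uncertainty: row entropies weighted by row share.
    h_y_given_x = sum(sum(row) / n * entropy(row) for row in table if sum(row) > 0)
    return (h_y - h_y_given_x) / h_y

# Hypothetical 2 x 2 tables: no association, and perfect association.
independent = [[10, 10], [20, 20]]
determined = [[10, 0], [0, 10]]
print(pru(independent), pru(determined))
```

The measure is asymmetric because only the entropy of the dependent variable appears in the denominator; swapping rows and columns generally gives a different value.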
The PRU is also generally related to the likelihood ratio test statistic (LR statistic)
G² (see Clark et al. 1986). But G² can only be used to determine whether
significant differences exist. This statistic is used in the present analysis only in a
secondary manner and only after the PRU measure, which provides better insight
into the level of association between the dependent variable and independent
variable(s) and thus of the relevancy rather than significance of the relationships.
PRU is used increasingly as a measure in the evaluation of logit models
(Kim 1984). Lammerts van Bueren (1982) discusses the measure and its usefulness
in detail. In our opinion, the PRU has some advantages over other preprocessing
approaches, like those of Higgins and Koch (1977) and Conant (1980)
(see Clark et al. 1986).
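
The relation between the two measures can be made concrete. Under the standard identity for the likelihood ratio statistic of independence, G² = 2 N [H(Y) - H(Y|X)] with entropies in natural logarithms, so that PRU = G² / (2 N H(Y)). This identity is not spelled out in the text above; the sketch below simply checks it against the marginal counts of housing choice in Table 1 and a few (G², PRU) pairs reported in Table 2. The function and constant names are illustrative.

```python
from math import log

N = 2923                         # households in the data set
CHOICE_COUNTS = [914, 1076, 933]  # housing choice marginals, Table 1

# Entropy of the dependent variable (natural logarithm).
H_CHOICE = -sum(c / N * log(c / N) for c in CHOICE_COUNTS)

def pru_from_g2(g2, n=N, h=H_CHOICE):
    """PRU implied by a likelihood ratio statistic G2 for n cases."""
    return g2 / (2 * n * h)

# (G2, reported PRU) pairs copied from Table 2.
TABLE2_ROWS = [(710.4, 0.111), (658.8, 0.103), (995.1, 0.155), (1301.3, 0.203)]
for g2, reported in TABLE2_ROWS:
    print(g2, round(pru_from_g2(g2), 3), reported)
```

Because N and H(Y) are fixed for a given data set, ranking tables by PRU and ranking them by G² give the same order; the PRU merely rescales G² to a 0-1 measure of association.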
The selection of variables and the reduction of the number of categories fol-
lows a simple forward step procedure. In the first step, the PRU is calculated for
the two-way cross-tabulation of housing choice and each of the independent
variables (Table 2). Income is by far the most important predictor of choice, and
is therefore chosen in the first step in the construction of a meaningful cross-
tabulation.
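
One forward step of this procedure can be sketched as follows. The code assumes the data are held as a list of records (dictionaries) and that the PRU is computed for the dependent variable given the joint categories of the predictors already chosen plus one candidate. The function names and the four-record example are hypothetical, not the programs of Appendix II.

```python
from collections import Counter
from math import log

def entropy(counts):
    """Shannon entropy (natural logarithm) of a collection of counts."""
    total = sum(counts)
    return -sum(c / total * log(c / total) for c in counts if c > 0)

def pru(records, target, predictors):
    """PRU of `target` given the joint categories of `predictors`."""
    n = len(records)
    h_y = entropy(list(Counter(r[target] for r in records).values()))
    groups = {}
    for r in records:
        key = tuple(r[p] for p in predictors)
        groups.setdefault(key, Counter())[r[target]] += 1
    h_cond = sum(sum(g.values()) / n * entropy(list(g.values()))
                 for g in groups.values())
    return (h_y - h_cond) / h_y

def next_variable(records, target, chosen, candidates):
    """Candidate whose addition to `chosen` yields the highest PRU."""
    return max(candidates, key=lambda v: pru(records, target, chosen + [v]))

# Hypothetical toy data: choice follows income exactly, tenure is unrelated.
records = [
    {"choice": "own", "income": "high", "tenure": "public"},
    {"choice": "own", "income": "high", "tenure": "private"},
    {"choice": "rent", "income": "low", "tenure": "public"},
    {"choice": "rent", "income": "low", "tenure": "private"},
]
print(next_variable(records, "choice", [], ["tenure", "income"]))  # income
```

Adding a variable can never lower the PRU, so in practice the step is repeated only while the table remains adequately filled, as the stopping rule discussed later in this section makes clear.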
In step 1B, the simplification of the four categories of income is examined. If
any simplification is to be effected, collapsing categories 1 and 2 would be most
appropriate. The decrease in PRU is lowest in that case, indicating the lowest
reduction in explained variation of choice. But, even then, the reduction in PRU
is fairly large. The substantial decrease in G² between the original table and the
smaller table also indicates this (710.4 - 658.8 = 51.6). With 2 fewer degrees of
freedom, one would only combine categories 1 and 2 at the 1 percent significance
level if the decrease in G² were less than 9.2. So on the basis of PRU and G²,
income should retain its original four categories.
In step 2A, the two-way table resulting from the first step is expanded with a
new dimension. The variable housing market type increases the PRU substan-
tially and by more than any other potential explanatory variable (Table 2). The
increase is also significant at the 1 percent level (the increase of G² is 284.7 with 24
df). In step 2B, categories 1 and 2 of housing market type are combined without real
loss in PRU, and category 3 also can be combined with these categories. It is clear
that housing choice in the Randstad is very different from elsewhere in the
Netherlands; whenever category 4 (Randstad) is collapsed with another category
the PRU decreases drastically. The PRU after simplification of housing
market type (0.152) is still higher than that of the second variable that had a high PRU in
step 2 (size of household, 0.150), so we can proceed to the next step.
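
The two significance checks in steps 1B and 2A can be reproduced without statistical tables. For an even number of degrees of freedom the upper-tail probability of the chi-square distribution has a closed form, which the sketch below uses; the function name is ours, and the restriction to even df is a simplification that happens to cover both cases here.

```python
from math import exp

def chi2_sf_even(x, df):
    """Upper-tail probability P(X > x) of a chi-square variable, even df only.

    Uses exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term = 1.0   # k = 0 term of the sum
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return exp(-half) * total

# Step 1B: collapsing income categories 1 and 2 lowers G2 by 51.6 on 2 df.
p_collapse = chi2_sf_even(710.4 - 658.8, 2)
# Step 2A: adding housing market type raises G2 by 284.7 on 24 df.
p_add = chi2_sf_even(995.1 - 710.4, 24)

print(p_collapse < 0.01, p_add < 0.01)  # True True
```

Both changes are significant at the 1 percent level: the loss from collapsing income is too large to accept, and the gain from adding housing market type is well beyond chance. For 2 df the 1 percent critical value is -2 ln(0.01), about 9.2, which matches the threshold quoted in step 1B.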
In step 3A, size of household is added to the three-way cross-tabulation and
increases the PRU significantly. Categories 3 and 4 of this variable can be col-
lapsed (step 3B).
TABLE 2
Steps in the Analysis with the PRU Criterion

Step 1A: selection of the first variable
  Variable                           No. of categories    PRU       G²
  Income                                     4            0.111     710.4
  Age of head of household                   4            0.064     409.5
  Size household                             4            0.052     330.8
  Rent previous dwelling                     5            0.039     248.8
  Type housing market                        4            0.035     222.5
  No. of rooms previous dwelling             4            0.024     156.2
  Type previous dwelling                     2            0.015      93.7
  Tenure previous dwelling                   2            0.007      43.0

Step 1B: income category simplification
  Categories                                              PRU       G²
  1+2, 3, 4                                               0.103     658.8
  1, 2+3, 4                                               0.088     561.0
  1, 2, 3+4                                               0.091     581.9

Step 2A: selection of the second variable
  Variable                                                PRU       G²
  Housing market                                          0.155     995.1
  Size household                                          0.150     963.5
  Age of head of household                                0.145     927.6
  No. of rooms previous dwelling                          0.141     903.6
  Type previous dwelling                                  0.135     863.0
  Rent previous dwelling                                  0.134     859.5
  Tenure previous dwelling                                0.120     767.6

Step 2B: housing market category simplification
  Categories                                              PRU       G²
  1, 2, 3, 4                                              0.155     995.1
  1+2, 3, 4                                               0.155     991.2
  1+3, 2, 4                                               0.154     984.4
  1+4, 2, 3                                               0.131     841.3
  1, 2+3, 4                                               0.153     978.3
  1, 2+4, 3                                               0.124     797.1
  1, 2, 3+4                                               0.137     877.5
  1+2+3, 4                                                0.152     973.2
  1+2+4, 3                                                0.114     730.2

Step 3A: selection of the third variable
  Variable                                                PRU       G²
  Size household                                          0.203    1301.3
  Age of head of household                                0.190    1215.1
  Rent previous dwelling                                  0.181    1159.9
  No. of rooms previous dwelling                          0.180    1152.2
  Type previous dwelling                                  0.166    1065.6
  Tenure previous dwelling                                0.161    1031.2

Step 3B: size household category simplification
  Categories                                              PRU       G²
  1, 2, 3, 4                                              0.203    1301.3
  [categories illegible]                                  0.189    1209.5
  [categories illegible]                                  0.185    1187.2
  1, 2, 3+4                                               0.199    1277.7
  [categories illegible]                                  0.185    1185.9
  1, 2+3+4                                                0.179    1148.1

Step 4A: simplification of income in the four-dimensional table
  Categories                                              PRU       G²
  1, 2, 3, 4                                              0.199    1277.7
  1+2, 3, 4                                               0.192    1230.5
  1, 2+3, 4                                               0.174    1117.0
  1, 2, 3+4                                               0.175    1121.1

Step 4B: further simplification
  Categories:
  Income         Market         Size                      PRU       G²
  1+2, 3, 4      1+2+3, 4       1, 2, 3+4                 0.192    1230.5
  1+2+3, 4       1+2+3, 4       1, 2, 3+4                 0.156    1000.6
  1+2, 3+4       1+2+3, 4       1, 2, 3+4                 0.168    1073.8
  1+2, 3, 4      1+2+3, 4       1+2, 3+4                  0.180    1154.8
  1+2, 3, 4      1+2+3, 4       1, 2+3+4                  0.172    1101.9

In the next step of the analysis, a critical point in the PRU procedure is reached.
The addition of yet another variable (rent of previous dwelling and age of head
of household are the candidates) increases the PRU significantly, but many
empty cells occur, and even marginals now have zero cases. We are thus well
beyond the limit of a meaningful addition of variables to the table if we want to
perform logit analysis. Therefore, no further variables are added after step 3. It is
sometimes useful to perform a backward simplification procedure after the final
step in the forward selection procedure of variables and categories with PRU. It
is possible that the categorization of variables in the earlier steps of the forward
selection procedure can now be simplified further because other variables have
been added to the table. In our analysis, further simplification is considered in
steps 4A and 4B. In step 4A, only the categorization of income has to be reconsidered,
because housing market is already simplified to two categories, and size of
household has just been considered in step 3B. Collapsing categories 1 and 2 of
income decreases the PRU slightly, although the loss of information is significant
at the 1 percent level. As argued above, we attribute more value to the absolute
level of the PRU for the selection of the model than to considerations of significance.
The simplification of income to three categories decreases the cross-tabulation
to 18 combinations of the independent variables (54 cells once the three choice
categories are included), with only a slight loss in PRU value. Further combination of
categories leads to a much larger decrease in PRU (as step 4B illustrates) and thus
the process of combining categories was terminated.
The PRU measure helps to select the most relevant variables from a larger set,
but the simplification of the categorization of the variables is equally important.
The original categorization of income, housing market, and size of household
would lead to a table of 192 cells, while after the combination of categories there
are only 54 cells. The PRU values for these tables are 0.219 and 0.192, respectively.
Therefore, the number of cells in the table is reduced to 28 percent of the
original (54 of 192), while the PRU retains 88 percent of its original value (0.192/0.219).
Table 3 is the result of the PRU procedure. Careful inspection of the table
gives an initial idea of how the chosen predictors influence housing choice. First,
with increasing income, choice of single-family rental dwellings and owner oc-
cupation increases. Second, in the Randstad, because multifamily rental dwel-
lings form a larger part of the housing stock, they are chosen more often. Third,
the larger the household, the greater the probability of choosing single-family
housing. The ANOTA and logit models that use Table 3 as input will bring out
these patterns more clearly.
The ANOTA Analysis
Multivariate nominal scale analysis, developed by Andrews and Messenger
(1973), or its reformulation as ANOTA by Keller, Verbeek, and Bethlehem
(1984) meets the demand for a simple alternative to logit and probit models in
multivariate analysis of qualitative data. The authors argue that existing models,
such as logit and probit, are not completely satisfactory for dependent variables
with more than two categories, that the logit and probit transformations hamper
the interpretation of the linear parameters, and that the computational require-
ments are substantial. MNA/ANOTA seeks to optimize ease of computation and
interpretation instead.
TABLE 3
Housing Choice of Movers Previously in the Rental Sector by Income,
Type of Housing Market, and Household Size; Result of PRU Analysis

                                          Housing choice
Income     Housing                  multifam.   single fam.    own
(×1000)    market        Size       rent (%)    rent (%)       (%)      No.
<fl 30     rest Neth.    1 p.          65           28           7      173
                         2 p.          47           44           9      228
                         3 or m        18           64          18      487
           Randstad      1 p.          84           10           6      126
                         2 p.          69           24           7      126
                         3 or m        60           32           9      183
30-42      rest Neth.    1 p.          24           30          46       33
                         2 p.          29           35          36      122
                         3 or m         7           48          45      456
           Randstad      1 p.          38            5          57       21
                         2 p.          54           30          16       50
                         3 or m        34           40          27      256
>fl 42     rest Neth.    1 p.           *            *           *        8
                         2 p.          11           16          73      211
                         3 or m         4           19          77      183
           Randstad      1 p.           *            *           *       14
                         2 p.          32           15          53      124
                         3 or m        23           29          48      122
Total                                  31           37          32     2923
* Values may be unreliable because of the small cell sizes and are therefore not reported.

The core of the ANOTA model is formed from the estimated coefficients
which show the “effect” of membership in the particular (nominal) category of
the independent variable on the likelihood of membership in each (nominal)
category of the dependent variable. The coefficients are corrected for possible
interactions between the explanatory variables, and therefore represent “pure”
effects, which can be interpreted as partial regression coefficients. Thus, these
coefficients can be added together (literally) across the several independent variables
to predict the household's score on the dependent variable (the expected
probability for any household is obtained by summing the base likelihood and
the coefficients that pertain to that household, and dividing by 100). For exam-
ple, in Table 4, the expected probability of a one-person household moving from
the rental sector, with an income below fl. 30,000, living in the Randstad, and
choosing a multifamily rental dwelling, is 0.313 (the base likelihood or average)
plus 0.123 (income effect) plus 0.171 (Randstad effect) plus 0.245 (size of household
effect), a total of 0.852. Thus, the expected probability of making a particular
choice, given the categories of the independent variables of a moving household,
can be determined in a straightforward manner. It is also possible to focus
on a particular column of coefficients in Table 4. The coefficients associated
with any category of any predictor sum to zero across the categories of the de-
pendent variable, and so can be interpreted as deviations from the average.
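
The additive prediction rule described above can be written out in a few lines. The sketch below copies the averages and coefficients from Table 4 (the middle income category is left out for brevity) and sums them for a given household profile; the dictionary layout, the short category labels, and the function name are our own and purely illustrative.

```python
CHOICES = ("multifamily rent", "single family rent", "owner occupation")

# Averages (base likelihoods) and coefficients from Table 4, in percentage points.
BASE = dict(zip(CHOICES, (31.3, 36.8, 31.9)))
EFFECTS = {
    ("income", "<30"): dict(zip(CHOICES, (12.3, 7.9, -20.2))),
    ("income", ">40"): dict(zip(CHOICES, (-15.8, -17.7, 33.6))),
    ("market", "rest"): dict(zip(CHOICES, (-9.2, 4.6, 4.5))),
    ("market", "Randstad"): dict(zip(CHOICES, (17.1, -8.6, -8.4))),
    ("size", "1p"): dict(zip(CHOICES, (24.5, -22.3, -2.2))),
    ("size", "2p"): dict(zip(CHOICES, (8.2, -5.5, -2.7))),
    ("size", "3+"): dict(zip(CHOICES, (-9.6, 7.7, 1.9))),
}

def expected_probabilities(income, market, size):
    """Sum the base likelihood and the three coefficients, then divide by 100."""
    profile = [("income", income), ("market", market), ("size", size)]
    return {
        choice: (BASE[choice] + sum(EFFECTS[key][choice] for key in profile)) / 100
        for choice in CHOICES
    }

low_income_single = expected_probabilities("<30", "Randstad", "1p")
high_income_family = expected_probabilities(">40", "rest", "3+")
print(round(low_income_single["multifamily rent"], 3))   # 0.852
print(round(high_income_family["multifamily rent"], 3))  # -0.033
```

The first profile reproduces the worked example (0.852). The second shows the known weakness of the additive model: for a high-income household of three or more persons outside the Randstad, the summed coefficients give a slightly negative "probability" of choosing multifamily rental housing.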
TABLE 4
Coefficients for Housing Choices from the ANOTA Analysis

                                 Coefficients for          Coefficients for       Coefficients for
                                 income                    housing market         size household
Housing choice        Average    <fl 30   30-40    >40     rest Neth.  Randstad   1 p.    2 p.   3 or more
multifamily rent        31.3      12.3    -6.2    -15.8       -9.2       17.1     24.5     8.2     -9.6
single family rent      36.8       7.9     1.4    -17.7        4.6       -8.6    -22.3    -5.5      7.7
owner occupation        31.9     -20.2     4.8     33.6        4.5       -8.4     -2.2    -2.7      1.9

The coefficients in Table 4 show the relationship of the predictors with housing
choice more clearly than the percentages in Table 3, because the coefficients
are “pure” effects as a result of the assumption of independent influences of the
independent variables. If the assumptions of the model are severely violated, the
ANOTA parameters would be misleading. But this does not seem to be the case
here, as inspection of the bivariate tables for independent variables indicates.
Income, in particular, affects the choice between own and rent. Keeping size of
household and housing market type constant in the lowest income category, the
likelihood of buying a house for households who were originally in the rental
sector is low (0.319 - 0.202 = 0.117), while it is high in the highest income category
(0.319 + 0.336 = 0.655). The housing market type mainly affects the decision
to select multifamily rental housing; in the Randstad the expected probability
of choosing a multifamily dwelling is nearly 0.50, while in the rest of the
Netherlands the probability is below 0.25, again keeping the household’s income
and size constant. Choice patterns vary widely between one- and two-or-more-
person households. For example, the expected probability of a single person
(controlling income and housing market type) moving into multifamily housing
is 0.558, against 0.145 for moving into a single-family rental house.
As noted earlier, the ease of computation and the straightforward manner of
interpretation shown must be traded off against a number of disadvantages of
ANOTA. Sometimes the estimated probabilities may be out of the 0-1 range for
rare combinations of categories of the predictors. As a rule of thumb, Andrews
and Messenger (1973) suggest at least ten times as many cases as number of
predictor variable categories, and that each category of the dependent variable
contains at least 10 percent of the cases to avoid inaccurate estimates. Another
disadvantage of ANOTA pertains to the overall explanatory power of the model
or of a single predictor on choice. The ANOTA analysis in itself does not give this
information, although a number of coefficients have been suggested for this
purpose (Deurloo et al. 1988). In many instances, the ANOTA results will be
treated as the final step in the analysis (e.g., Linde et al. 1986). If the disadvan-
tages are critical, or if the model assumptions are violated, the use of a logit
model is necessary.
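Both cautions can be checked mechanically before an ANOTA run. The sketch below is one possible reading of the Andrews and Messenger guideline as paraphrased above; the function name and the sample size of 1,000 are hypothetical. The last lines show how an additive prediction built from Table 4 coefficients can fall below zero for an unfavorable combination of categories.

```python
def meets_rule_of_thumb(n_cases, categories_per_predictor, dependent_shares):
    """Return True if both parts of the guideline hold.

    n_cases: number of cases in the sample.
    categories_per_predictor: number of categories of each predictor.
    dependent_shares: share of cases in each category of the dependent variable.
    """
    enough_cases = n_cases >= 10 * sum(categories_per_predictor)
    no_rare_outcome = min(dependent_shares) >= 0.10
    return enough_cases and no_rare_outcome


# Table 4 layout: income (3), housing market (2), size of household (3).
# The shares are the Table 4 averages; the sample size is hypothetical.
guideline_ok = meets_rule_of_thumb(1000, [3, 2, 3], [0.313, 0.368, 0.319])

# Single-family rent for a one-person, highest-income household in the Randstad
# (Table 4, percentage points): the additive sum is negative.
out_of_range_points = 36.8 - 17.7 - 8.6 - 22.3
```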
The Hierarchical Logit Model
The description of the association between the dependent variable housing
choice and the selected predictors can be explored further with the logit ap-
proach. Relevant interaction effects between the independent variables can be
estimated and irrelevant interaction effects can be dropped, thereby simplifying
the model. In this phase of the analysis the PRU can be used to evaluate the fit of
the (unsaturated) logit models (see Kim 1984; Clark et al. 1986).
In presenting the logit models, we use the notation of fitted marginals, as is
usually done for log-linear analyses (Haberman 1978). As a consequence of this
notation, all effects between the independent variables are listed. For example,
in a four-dimensional cross-tabulation with the variables housing choice (CH),
income (I), housing market (M), and size of household (S), [I,M,S], [CH]
denotes the model without association between the dependent variable (CH) and
the set of independent variables. The notation [I,M,S], [I,CH], [M,CH],
[S,CH] covers the unsaturated model in which the logit is the sum of the main
effects of each of the independent variables on choice, without any interaction
effects (this is the log-linear equivalent to the linear ANOTA model). Table 5
shows a selection of hierarchical multinomial logit models fitted to the data in
Table 3. The simple unsaturated models with zero or only one interaction effect
(models 1-4) are not very different from the saturated model. The PRU of Table
3 is 0.192, while the PRU of model 1 is 0.180. Therefore, with only the main
effects of income, housing market type, and household size on choice in the
model, the loss of information is quite small. Of the three models with one
interaction effect, model 4 fits best. We decided to take this simple model for
interpretation because the PRU of 0.186 is close to the PRU of Table 3 (although
in terms of G² the loss in explanation is significant). The more complicated
model 5, with two interaction effects and a PRU of 0.191, should be chosen when
the significance of G² is the ultimate criterion. However, some of the parameters
of model 5 are unreliable, showing very high standard errors. It is clear that the
preprocessing now bears fruit; the careful selection of the predictors and their
categorization is a prerequisite for finding such simple and robust logit models.
TABLE 5
A Selection of Hierarchical Multinomial Logit Models of Housing Choice (CH), Income (I),
Housing Market (M), and Size of Household (S)
Table 6 presents the parameters of model 4 written out for each category of
the predictors in the form of a table with “cornered effects” (see Wrigley 1985).
The parameters of category 1 of each variable are set to zero, and the parameters
for the other categories can be directly compared to these categories. For
example, the parameter for income category three (> fl. 42,000) on choice category
three (owner occupation) has a value of 3.00; this indicates that in the highest
income category the odds of buying a house rather than renting a multifamily
dwelling are twenty times (e^3.00) those of a household in the lowest income
category (holding other characteristics of the household constant); it also
indicates that (holding other characteristics of the household constant) in the
highest income category these odds are three times those of a household in
income bracket fl. 30,000-42,000 (e^3.00/e^1.92). Thus, the
logit parameters of Table 6 can be compared directly. Presented in this tabulated
form they show major and minor effects of the choice predictors.
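The two comparisons in this example are exponentiated differences of cornered-effect parameters. A minimal Python sketch, assuming only the income parameters on owner occupation quoted from Table 6 (the dictionary and function names are illustrative):

```python
import math

# Income parameters on owner occupation (Table 6, cornered effects).
# The lowest income category is the reference, so its parameter is 0.
INCOME_ON_OWNER = {"<fl30000": 0.0, "fl30000-42000": 1.92, ">fl42000": 3.00}


def odds_ratio(category, reference):
    """Ratio of the own-versus-multifamily-rent odds between two income categories."""
    return math.exp(INCOME_ON_OWNER[category] - INCOME_ON_OWNER[reference])


highest_vs_lowest = odds_ratio(">fl42000", "<fl30000")
highest_vs_middle = odds_ratio(">fl42000", "fl30000-42000")
```

Because the parameters are additive on the logit scale, any pair of categories can be compared this way without refitting the model.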
For the group of households used as an illustration in this article (movers from
or within the rental sector), income’s main effect is on the choice between rent
and own, as the high parameters for owner occupation show. Three-or-more-person
households show a very strong preference for single-family rental housing
and owner occupation (which is also mainly single-family). Size of household
seems to be an important determinant for the choice of type of dwelling. Living
in the Randstad in general means a decrease in the possibility of renting a single-
family dwelling (note the main effect of housing market and the interaction ef-
fect of housing market and size of household on choice category two). But living
in the Randstad affects two-or-more-person households buying a house in
particular; the interaction effect of size of household and housing market shows that
these households have a much lower probability of buying a house in the Rand-
stad as compared to the rest of the Netherlands (the parameters are -1.11 and
-2.15 of the [ M , S , C H ] effect on choice category three).
TABLE 6
Coefficients of Logit Model [I,M,S] [I,CH] [M,S,CH]: Cornered Effects
Income                                      <fl30000                         fl30000-42000                          >fl42000
Housing market                    rest Neth.         Randstad         rest Neth.         Randstad         rest Neth.         Randstad
Size household                     1     2   >=3     1     2   >=3     1     2   >=3     1     2   >=3     1     2   >=3     1     2   >=3
Choice              Effects*
multifamily rent    constant
                    I,CH
                    M,CH
                    S,CH
                    M,S,CH
single-family rent  constant    -.88  -.88  -.88  -.88  -.88  -.88  -.88  -.88  -.88  -.88  -.88  -.88  -.88  -.88  -.88  -.88  -.88  -.88
                    I,CH                                             .64   .64   .64   .64   .64   .64   .38   .38   .38   .38   .38   .38
                    M,CH                         -1.42 -1.42 -1.42                   -1.42 -1.42 -1.42                   -1.42 -1.42 -1.42
                    S,CH               .75  2.18         .75  2.18         .75  2.18         .75  2.18         .75  2.18         .75  2.18
                    M,S,CH                               .43  -.34                           .43  -.34                           .43  -.34
owner occupation    constant   -2.15 -2.15 -2.15 -2.15 -2.15 -2.15 -2.15 -2.15 -2.15 -2.15 -2.15 -2.15 -2.15 -2.15 -2.15 -2.15 -2.15 -2.15
                    I,CH                                            1.92  1.92  1.92  1.92  1.92  1.92  3.00  3.00  3.00  3.00  3.00  3.00
                    M,CH                          -.04  -.04  -.04                    -.04  -.04  -.04                    -.04  -.04  -.04
                    S,CH               .75  2.14         .75  2.14         .75  2.14         .75  2.14         .75  2.14         .75  2.14
                    M,S,CH                             -1.11 -2.15                         -1.11 -2.15                         -1.11 -2.15
*An empty cell has a coefficient of 0.00
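To see how the cornered effects of Table 6 combine, the logits of one cell can be summed and turned into choice probabilities. The Python sketch below assumes that multifamily rent is the reference choice (all parameters zero) and that the coefficients are read from the table as printed; the data structure and function names are illustrative, and income and household size are coded 1 to 3 in table order.

```python
import math

CHOICES = ("multifamily rent", "single-family rent", "owner occupation")

# Cornered-effect parameters transcribed from Table 6.
PARAMS = {
    "single-family rent": {
        "constant": -0.88,
        "income": {1: 0.0, 2: 0.64, 3: 0.38},
        "randstad": -1.42,
        "size": {1: 0.0, 2: 0.75, 3: 2.18},
        "randstad_size": {1: 0.0, 2: 0.43, 3: -0.34},
    },
    "owner occupation": {
        "constant": -2.15,
        "income": {1: 0.0, 2: 1.92, 3: 3.00},
        "randstad": -0.04,
        "size": {1: 0.0, 2: 0.75, 3: 2.14},
        "randstad_size": {1: 0.0, 2: -1.11, 3: -2.15},
    },
}


def logit(choice, income, randstad, size):
    """Sum of the Table 6 parameters for one choice in one cell of the table."""
    if choice == "multifamily rent":
        return 0.0  # reference choice: every cell is empty, i.e. zero
    p = PARAMS[choice]
    value = p["constant"] + p["income"][income] + p["size"][size]
    if randstad:
        value += p["randstad"] + p["randstad_size"][size]
    return value


def choice_probabilities(income, randstad, size):
    """Multinomial logit probabilities of the three choices for one cell."""
    weights = [math.exp(logit(c, income, randstad, size)) for c in CHOICES]
    total = sum(weights)
    return {c: w / total for c, w in zip(CHOICES, weights)}


# One-person household, lowest income, outside the Randstad: only constants apply.
baseline = choice_probabilities(income=1, randstad=False, size=1)
```

For this baseline cell the model gives shares of roughly 65, 27, and 8 percent for multifamily rent, single-family rent, and owner occupation.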
5. NESTED ANALYSIS OF THE CROSS-TABULATION
CHAID Analysis
The original automatic interaction detection (AID) procedure, developed by
Sonquist and Morgan (1964), has its origins in the analysis of variance. It assumes
an interval-level dependent variable and qualitative (or categorized) independent
variables and is a stepwise procedure, providing at each step an optimal
split of the data into two subsets. The between-subset sum of squares is maximized
at each bisection. The technique was extended to the case where the dependent
variable was qualitative. One of these extensions is known as Chi-square
AID (CHAID) (Kass 1980). This technique provides optimal splits, not necessarily
bisections, by maximizing the significance of the Chi-square statistic at each
step. Perreault and Barksdale (1980) and Langeheine (1984) among others have
used the method in marketing research, and Deurloo et al. (1987) in housing
research, as a means of variable reduction.
As in the PRU, independent variables are selected in a “forward” fashion.
Categories of an independent variable chosen by the method are merged if they
show a comparable pattern of choices (and, in our analysis, if they are adjacent,
except for the categories of housing which are allowed to combine in any order).
For each of the categories of the independent variables selected in previous
steps, the technique considers the most important predictor in the next step.
Therefore, the results of the analysis are “nested” and can be presented in the
form of an upside-down tree; some predictors only occur at specific levels of
other independent variables. Thus, a very detailed picture of choice patterns of
the different types of households emerges (Figure 2). In this sense, CHAID puts
more emphasis on specific groups of households, while the procedure using the
PRU criterion is geared more towards finding a set of good predictors for the
whole sample of households.
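The selection step at a single node of the tree can be sketched as follows. This is a simplified illustration and not Kass's program: it only ranks candidate predictors by the significance of the Pearson chi-square statistic and leaves out the merging of categories and the recursion into subgroups. The cross-tabulations are hypothetical counts (rows are predictor categories, columns the three housing choices). With a three-category dependent variable the degrees of freedom are always even, so a closed form for the chi-square tail probability can be used.

```python
import math


def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom of a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def chi2_upper_tail(x, df):
    """P(chi-square > x); closed form that is valid for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 0.0
    for j in range(df // 2):
        total += term              # adds (x/2)^j / j!
        term *= half / (j + 1)
    return math.exp(-half) * total


def most_significant_predictor(tables):
    """Name of the predictor with the smallest p-value, plus all p-values."""
    p_values = {name: chi2_upper_tail(*chi_square(t)) for name, t in tables.items()}
    return min(p_values, key=p_values.get), p_values


# Hypothetical counts: rows = predictor categories, columns = three housing choices.
candidate_tables = {
    "income": [[60, 30, 10], [30, 30, 40], [10, 20, 70]],
    "housing market": [[45, 35, 20], [40, 38, 22]],
}
best, p_values = most_significant_predictor(candidate_tables)
```

In a full CHAID run the same comparison would be repeated within each subgroup created by the chosen split, which is what produces the nested tree.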
Figure 2 shows results for analyses in which 500 and 250 (respectively) are the
minimum number of cases for groups of households to be further subdivided.
Like PRU, CHAID selects income as the most important predictor of housing
choice and does not simplify the categorization of this variable. As in the PRU
analysis, for the lowest income categories, housing market type and the size of
household are important predictors of choice, and choice patterns can be inter-
preted in the same way. It is interesting to note, however, that with higher in-
comes, housing choices are no longer hampered by the composition of the hous-
ing stock: type of housing market does not enter the model. Size of household
plays no role either, because many households in the higher income brackets buy
houses and owner-occupied housing is comparatively spacious. With increasing
income, the previous tenure and the age of the head of household replace hous-
ing market type and size of household as important predictors. Households orig-
inally in the public sector buy more often than households from the private rental
sector. For the last group, a move towards another rental dwelling is still an
improvement (more recent, higher-quality housing). In the Netherlands, young
families move to single-family (owner-occupied) housing, while older house-
holds move into multifamily housing, as has been documented in an earlier paper
(Hooimeijer et al. 1986). Thus, CHAID adds further detail to the results of the
previous analyses because “nesting” of the independent variables within categories
of other predictors can be detected. The results of CHAID suggest that the
search for a nonhierarchical logit model that reflects this change might be fruit-
ful.
FIG. 2. CHAID Dendrogram for Renters. Table values are the percentages moving to each destination category.
(Legend: 1. Multifamily rent; 2. Single-family rent; 3. Owner occupation; n. Total.)
The Nested Multinomial Logit Model

Nonhierarchical log-linear models, in contrast with hierarchical models, have
greater flexibility in the application of log-linear techniques and make possible a
more exact operationalization of relationships between variables. A demonstra-
tion of such nonstandard models with the division of effects into linear, quad-
ratic, etc., components, by means of introducing orthogonal polynomials for
polytomous variables was given in Deurloo et al. (1988). In the present study,
we use another form of nonstandard logit models, a so-called nested design, that
is logically linked with the preceding CHAID analysis. CHAID builds tree dia-
grams which describe the effects of nested interactions.
The CHAID analysis shows at least five predictors that could be considered in
logit modeling. Income, size of household, housing market, previous tenure, and
age are important variables to differentiate households with distinctive patterns
of housing choice. As some of the variables seem important only at specific levels
of other variables, we utilize a nested approach in the statistical sense, that is, the
nesting is within the independent variables and does not relate to sequential
choices.
If the five independent variables mentioned above are cross-tabulated with
choice, this would lead to a table of 432 cells. The values in this table would be
too small and sparse for any meaningful logit modeling. One could proceed in
two ways: first, to select only the variables of major importance from the table
and ignore variables that seem relatively less important on close inspection of the
CHAID tree and/or combine more categories than the CHAID analysis indi-
cates, or second, try to fit nested logit models to the data. We used a combination
of these approaches.
First, the cross-classification was simplified by careful inspection of the results
of CHAID. Age of head of household was ignored because it only occurs once in
the “small” tree, for income category three (Figure 2). On the basis of the choice
patterns, it seems reasonable to recategorize size of household into a dichotomy
(categories 1+2 and 3+4). These decisions are somewhat arbitrary but necessary
before any form of nested logit modeling could be tested at all; without these
simplifications the number of cases in many instances is too small.
No simple reasonably fitting nested logit model could be found for the table
just specified with four income categories, two housing market categories, two
size-of-household categories and two categories for previous tenure, mainly be-
cause the variable “type of previous house” still had many cells with low fre-
quencies. Therefore, we decided to drop this variable completely from the anal-
ysis. On the basis of symmetry in choice patterns already discovered by the PRU
preprocessing, income categories 1 and 2 were also joined. We again experi-
mented with different nested logit models for this table. No model with only
main effects-nested or otherwise-showed a good fit. However, the relatively
simple nested model with main effects for housing market and size of household,
and a nested interaction effect of these two variables, active for the first two
income categories, but constant for income category 3 (>fl. 42.000), yields a
satisfactory fit (G² = 16.75 with df = 10, PRU = 0.180). We denote this model by
[I,M,S] [M,CH] [S,CH] [S*M.I,CH]. Table 7 shows its parameters in a comparable
way to Table 6. For the main effects of size of household and housing
market type on choice, the interpretation of the results in Table 7 broadly
corresponds with the results of the hierarchical approach. Larger households more
frequently choose single-family rental housing and owner-occupation than
smaller households. The Randstad has a relatively small proportion of the hous-
ing stock in these last two categories of dwellings, and therefore households
living in the Randstad have a lower probability of making those choices. Of
course, the nested interaction effect of housing market and size of household
within income is more difficult to interpret. The general observation that higher
income leads to more owner-occupation and (to a lesser degree) choice of
single-family housing is still true, but the interaction effect in the nested logit
model indicates a (fairly small) correction on the income parameter. Small
households in the rest of the Netherlands in the lowest two income categories are
less likely to choose single-family rent and owner-occupation than a comparable
hierarchical model would indicate. For larger households, the reverse is true;
choices of single-family rent and ownership in the rest of the Netherlands in the
lowest income categories are higher than one would “predict” without the nesting
in the logit model. Therefore, as might be expected, the nested logit model brings
out more complicated and detailed interaction effects on housing choice than the
ANOTA and hierarchical logit models. But a selection of variables from the
CHAID analysis and recategorization of these variables had to be undertaken
before a nested logit model with a satisfactory fit could be found. Age and
previous tenure, which add extra detail to the choice patterns in the CHAID
analysis, could not be retained in the nested logit modeling, which appears to be
meaningful only for a strongly simplified data set.
TABLE 7
Coefficients of the Nested Logit Model [I,M,S] [M,CH] [S,CH] [S*M.I,CH]
[Table body not legible in the source: rows for the constant and for the S,CH, M,CH, and S*M.I,CH effects on single-family rent and owner occupation.]
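The fit reported for the nested model (G² = 16.75 with 10 degrees of freedom) can be translated into a tail probability. The sketch below is an illustration rather than part of the original analysis; it uses the closed form of the chi-square upper tail, which holds for even degrees of freedom, and the function name is an assumption.

```python
import math


def chi2_upper_tail(x, df):
    """P(chi-square with df degrees of freedom > x); valid for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 0.0
    for j in range(df // 2):
        total += term              # adds (x/2)^j / j!
        term *= half / (j + 1)
    return math.exp(-half) * total


# Likelihood-ratio statistic and degrees of freedom quoted in the text.
p_value = chi2_upper_tail(16.75, 10)
```

The result is a p-value of about 0.08, so the model is not rejected at the conventional 5 percent level, which agrees with the description of the fit as satisfactory.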
6. CONCLUSION
In situations with a dependent categorical variable (especially if it has more
than two categories), and a large number of categorical independent variables, a
combination of preprocessing and logit or MNA/ANOTA modeling is a pre-
ferred modeling strategy as the housing choice examples in this paper illustrate.
The preprocessing is useful in selecting an “optimal” subset of independent vari-
ables from the larger available set and simplifying the categorization wherever
possible. Without thorough preprocessing, any attempt at logit modeling for
data sets with large numbers of empty cells, or in which there are only a few
observations, renders the parameters of a logit model unreliable if not meaningless.
Also, a large cross-tabulation usually leads to so many parameters in the logit
model that it is impossible to provide a meaningful interpretation.
The PRU and CHAID methods are efficient approaches for preprocessing.
Both lead to a selection of relevant variables and important simplifications in the
categorization often without any loss of information about the choice patterns.
CHAID gives more detail about specific housing choices because predictor vari-
ables are “nested” within the categories of previously selected predictors. But
this also makes it more difficult to proceed from the results of the CHAID pre-
processing toward logit or ANOTA modeling than from the results of the pre-
processing with the PRU measure; a somewhat subjective interpretation of the
CHAID results seems to be unavoidable in formulating standard logit models.
Preprocessing in itself yields an indication of the structure of the relationships
between the dependent and most relevant independent variables, as Table 3 and
Figure 2 of our example illustrate. Although it has been common to stop at this
stage, we have shown that a subsequent ANOTA analysis and/or logit modeling
can still be profitable. Logit and ANOTA modeling lead to much clearer and
richer conclusions about the relationships between the variables than a mere
cross-tabulation reached by preprocessing. On the other hand, ANOTA and
logit modeling clearly profit from the preprocessing, and robust models are the
result of the combination of methods.
MNA/ANOTA analysis seems a reasonable alternative to logit modeling. The
ease of computation and interpretation of ANOTA compares favorably with

logit analysis. If it is not unreasonable to assume that the effects of independent
variables on the dependent variable are not strongly interrelated, and the focus
of interest is on simple interpretation and ease of computation rather than mathe-
matical sophistication, ANOTA certainly is a valuable alternative to logit model-
ing. In any event, the PRU and CHAID methods favor a selection of only weakly
interrelated effects of independent variables on the dependent variable.
However, logit modeling has its own advantages over MNA/ANOTA as our
example illustrates. Logit models require no assumptions about the structure of
the data so interaction effects can then be inspected for inclusion or exclusion;
specific interaction effects turn out to exist and to be of importance for housing
choice. With only an ANOTA analysis, this information (in the data) would have
been lost. In such instances it is important to evaluate the purpose of the statisti-
cal modeling.
If it is reasonable to assume that some independent variables are relevant only
at specific levels of other independent variables (as in our example where differ-
ent factors seem to influence housing choice of lower- and higher-income house-
holds), it is also logical to examine nonstandard logit models. CHAID was used
as a simplification procedure to identify such models. The nested approach illus-
trated an important alternative when fit is more important. The nested model is
also superior to the hierarchical results in that the number of parameters is
smaller and the nested model indicates more clearly the underlying dimensions
in the choice patterns. At the same time, nested logit modeling is complicated
because so many models are possible. Thus, without extensive theory, there is no
clear direction to the search for a suitable model. Finally, the data requirements
of this technique (nested modeling as compared to hierarchical modeling) are
such that it is quite likely that some independent variables must be dropped. This
means losses in the theoretical elegance of the nested approach. In our opinion,
nested logit modeling can be profitable, but should be attempted only if there is
a known and clear theoretical structure which has been derived from previous
analyses.
The modeling strategy advocated in this presentation emphasizes the value of
preliminary data processing before formal hierarchical or nested logit analysis.
Not only are there clear technical advantages to formulating a more limited and
circumscribed data set, but the substantive results are likely to be more easily
interpreted and validated against our theoretical constructs.
APPENDIX I: A STATISTICAL OVERVIEW OF MNA/ANOTA

The presentation of this statistical overview of ANOTA is based on Keller et al.
(1984) because these authors add an estimation of the standard errors of the
coefficients to the model.
Let $Y$ denote the $N \times I$ indicator matrix that contains the scores of the $N$ cases
in the sample on the $I$ categories of the dependent variable: entry $Y_{ni}$ equals 1 if
individual $n$ scores in category $i$ of the dependent variable, and 0 otherwise.
Likewise, let $X_m$ denote the $N \times K_m$ indicator matrix that contains the scores of
the $N$ cases on the $m$th predictor variable, having $K_m$ categories ($m = 1, \ldots, M$):
entry $X_{m,nk}$ equals 1 if case $n$ scores on category $k$ of the $m$th predictor variable
and 0 otherwise. As in ordinary multiple regression, a constant with index $m = 0$ is
introduced, which has only one category ($K_0 = 1$) on which every case scores: $X_0$,
with $X_{0,n1} = 1$ for all $n$. Let $X$ denote the matrix wherein the scores on all $M+1$
predictor variables are collected: $X$ is $N \times K$ with $K = \sum_m K_m$.
The MNA/ANOTA model is derived from the ordinary regression model:

$E(y) = Xb$, where $y$ is an $N$ vector of dependent scores, $X$ an $N \times K$ matrix of
predictor scores, $b$ a $K$ vector of regression coefficients, and $E(y)$ the expectation
of $y$. $E(Y) = XB$ is the generalization of this model to the multivariate linear
model, with $Y$ an $N \times I$ matrix and $B$ correspondingly a $K \times I$ matrix. This
generalization is the ANOTA model. As $Y$ is now a matrix wherein each row contains
exactly one 1, each row of $E(Y)$ is a vector of probabilities, adding up to 1.
To estimate $B$, the normal equations $X'Y = X'XB$ are considered. Assuming
that all categories of all $X_m$ have observations, these equations are equivalent to
$Y'XW = B'X'XW$, wherein $W$ equals a $K \times K$ diagonal matrix with the reciprocals
of the frequencies of the categories of the explanatory variables $X_m$ on the
diagonal. The last expression is used as the ANOTA estimator. Formally this is
the OLS estimator of the multivariate linear model, but the interpretation now is
very simple and interesting: the left side contains the proportion of scores on $Y$
for each category in $X$ (column percentages); $X'XW$ is a matrix with the proportions
of scores on $X_l$ for each category of $X_m$ outside the diagonal blocks and it
thus can be seen as a normalizing constant, eliminating the interactions between
the predictor variables $X_m$; and $B'$, which has exactly the same dimensions as
$Y'XW$, can be interpreted as the matrix of normalized versions of the elements of
$Y'XW$.
In order to identify $B$ uniquely, it is normalized by $RB = 0$, with $R$ an $M \times K$
matrix with $R = (R_0, \ldots, R_M)$ and the $m$th row of $R_m$ equal to the vector of
frequencies on the categories of $X_m$ (for $m = 0, \ldots, M$) and zero elsewhere. These
restrictions just identify $B$ if $\mathrm{rank}(X) = K - M$. With this normalization the
regression coefficients represent the effect of $X_m$ on the sample proportions of $Y$
as deviations from the mean proportions if the distributions of $X_l$ ($l \neq m$) given
$X_m$ are equal to the unconditional sample distribution of $X_l$, and may therefore
be interpreted as “standardized deviations.”
The practical calculation of the matrix of regression coefficients B can be
performed by matrix inversion and multiplication:
$$B = (X'X + R'R)^{-1} X'Y,$$
with dimensions $[K \times I] = [K \times K]\,[K \times N]\,[N \times I]$.
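The calculation can be illustrated on a toy data set. The sketch below is not the ANOTA program: it builds the indicator matrices $X$ and $Y$ and the restriction matrix $R$ from category codes and solves $(X'X + R'R)B = X'Y$ with a small Gauss-Jordan routine in plain Python. The function names and the ten hypothetical cases are assumptions made for the example.

```python
def transpose_times(a, b):
    """Return a'b for two matrices stored as lists of rows."""
    return [[sum(ra[i] * rb[j] for ra, rb in zip(a, b)) for j in range(len(b[0]))]
            for i in range(len(a[0]))]


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [[float(v) for v in a[i]] + [float(v) for v in b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def anota_coefficients(cases, outcomes, n_choices):
    """B = (X'X + R'R)^{-1} X'Y.

    cases[n][m] is the category index of predictor m for case n;
    outcomes[n] is the category index of the dependent variable.
    Row 0 of the result is the constant; the other rows follow predictor order.
    """
    sizes = [max(col) + 1 for col in zip(*cases)]
    offsets = [1 + sum(sizes[:m]) for m in range(len(sizes))]
    k_total = 1 + sum(sizes)
    x = []
    for case in cases:
        row = [1] + [0] * (k_total - 1)      # constant column comes first
        for m, k in enumerate(case):
            row[offsets[m] + k] = 1
        x.append(row)
    y = [[int(out == i) for i in range(n_choices)] for out in outcomes]
    freq = [sum(col) for col in zip(*x)]
    r = []                                   # one row of frequencies per predictor
    for m, size in enumerate(sizes):
        row = [0] * k_total
        for k in range(size):
            row[offsets[m] + k] = freq[offsets[m] + k]
        r.append(row)
    xtx, rtr = transpose_times(x, x), transpose_times(r, r)
    a = [[xtx[i][j] + rtr[i][j] for j in range(k_total)] for i in range(k_total)]
    return solve(a, transpose_times(x, y))


# Ten hypothetical cases with two dichotomous predictors and a dichotomous choice.
cases = [(0, 0), (0, 0), (0, 1), (0, 1), (1, 0), (1, 0), (1, 1), (1, 1), (1, 1), (0, 0)]
outcomes = [0, 0, 0, 1, 0, 1, 1, 1, 1, 0]
B = anota_coefficients(cases, outcomes, n_choices=2)
```

The first row of `B` reproduces the overall sample proportions, and within each predictor the frequency-weighted coefficients sum to zero, which is the normalization $RB = 0$.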
Estimates for the standard errors of the regression coefficients can also be
calculated. They are completely different from ordinary-least-squares standard
errors.
Let $b_i$ represent the vector formed by taking the $i$th column of $B$, so its components
are the regression coefficients corresponding to the $i$th category of the
dependent variable $Y$. Representing the $i$th column of the $N \times I$ indicator matrix
$Y$ by $Y_i$, it follows that
$$\mathrm{Var}(b_i) = \mathrm{Var}(Y_i) \cdot (X'X + R'R)^{-1} X'X \, (X'X + R'R)^{-1}.$$
Each component of $Y_i$ (0 or 1) is independent from all other components and
Bernoulli-distributed with variance
$$p^{Y X_1 \ldots X_M}_{i j_1 \ldots j_M} \left(1 - p^{Y X_1 \ldots X_M}_{i j_1 \ldots j_M}\right)$$
if this component belongs to category $j_1$ of $X_1$, $j_2$ of $X_2$, ..., $j_M$ of $X_M$, and when
$p$ denotes the true model value. Two approximations are now applied. First,
$p^{Y X_1 \ldots X_M}_{i j_1 \ldots j_M}$ is approximated by $p^{Y}_{i}$,
the overall probability in the population that a randomly selected element
belongs to category $i$ of the dependent variable $Y$. This is not very accurate, but
when $p^{Y X_1 \ldots X_M}_{i j_1 \ldots j_M}$ is not too close to 0 or 1, the error is not
large; for example, if $p^{Y X_1 \ldots X_M}_{i j_1 \ldots j_M}$
takes values between 0.15 and 0.85 the variance varies only between 0.13 and
0.25. In applying this approximation the heteroscedasticity of $Y_i$ is in fact
neglected.
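The range quoted for the variance follows directly from the Bernoulli formula $p(1 - p)$. A short Python check, in which the grid of probabilities in steps of 0.01 is an arbitrary choice for the illustration:

```python
def bernoulli_variance(p):
    """Variance of a 0/1 variable that equals 1 with probability p."""
    return p * (1.0 - p)


# Probabilities from 0.15 to 0.85 in steps of 0.01.
variances = [bernoulli_variance(step / 100) for step in range(15, 86)]
smallest, largest = min(variances), max(variances)
```

The smallest value occurs at the ends of the interval (0.15 x 0.85 = 0.1275) and the largest at p = 0.5, so replacing the cell probability by the overall fraction changes the variance by at most a factor of about two.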
In the second approximation the true fraction $p^{Y}_{i}$ is, as usual, estimated by the
sample fraction $c^{Y}_{i}$. Summing up, $\mathrm{Var}(b_i)$ is estimated by
$$\mathrm{Var}(b_i) \approx c^{Y}_{i} \left(1 - c^{Y}_{i}\right) (X'X + R'R)^{-1} X'X \, (X'X + R'R)^{-1}.$$
The authors of ANOTA also propose a more accurate estimate for Var(b) in
which the bivariate tables $Y \times X_m$ are used. They point out, however, that although
this method is much more complex than the first, the difference in accuracy is
remarkably small.
APPENDIX II: COMPUTER PROGRAMS FOR THE METHODS USED

The analysis with the PRU criterion can be done efficiently and quickly with
the CROSSTABS procedure in SPSS (Nie et al. 1975). The “uncertainty
coefficient” is one of the options of this procedure and corresponds with PRU. The
analysis with CROSSTABS does involve a fair amount of recoding, but can still
be run with SPSS-PC on a micro computer.
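For readers without access to SPSS, the statistic itself is easy to compute from a cross-tabulation. The sketch below assumes the asymmetric form of the uncertainty coefficient, with the column variable (housing choice) as the dependent variable: the entropy of the choice distribution minus its average entropy within predictor categories, divided by the former. The function names and the example counts are hypothetical.

```python
import math


def entropy(counts):
    """Shannon entropy (natural logarithm) of a vector of counts."""
    total = sum(counts)
    return -sum(c / total * math.log(c / total) for c in counts if c > 0)


def uncertainty_coefficient(table):
    """Proportional reduction in uncertainty about the column variable.

    table[r][c] is a count; rows are predictor categories and columns are
    categories of the dependent variable.
    """
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(col_totals)
    h_dependent = entropy(col_totals)
    h_conditional = sum(sum(row) / n * entropy(row) for row in table if sum(row) > 0)
    return (h_dependent - h_conditional) / h_dependent


# Hypothetical counts: rows = income categories, columns = three housing choices.
pru_income = uncertainty_coefficient([[60, 30, 10], [30, 30, 40], [10, 20, 70]])
```

A value of 0 means the predictor tells nothing about the choice, and a value of 1 means the choice is fully determined by the predictor category.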
For the CHAID analysis no program that we know of is included in any of the
usual statistical computer packages like SPSS, BMDP, etc. Kass (1980) developed
a special program for CHAID written in PL/I. A FORTRAN V translation of his
program can also be obtained by contacting M. C. Deurloo, Department of
Geography, University of Amsterdam, Jodenbreestraat 23, 1011 NH Amster-
dam, The Netherlands.
For ANOTA an elegant computer program that runs on a micro computer can
be obtained by contacting J. G. Bethlehem, Central Bureau of Statistics,
Department for Statistical Methods, P.O. Box 959, 2270 AZ Voorburg, The
Netherlands.
The standard computer packages can be used to fit the logit models used here
(see Wrigley 1985). Magidson, Swan, and Berk (1981) and Breen (1984) explain
how nonstandard logit models, including nested logit models, can be fitted with
standard computer packages.
LITERATURE CITED
Andrews, F. M., and R. C. Messenger (1973). “Multivariate Nominal Scale Analysis.” Institute for
Social Research, University of Michigan, Ann Arbor, Michigan 48109.
Bishop, Y. M. M., S. E. Fienberg, and P. W. Holland (1975). Discrete Multivariate Analysis: Theory
and Practice. Cambridge: MIT Press.
Breen, R. (1984). “Fitting Non-hierarchical and Association Log-linear Models Using GLIM.”
Sociological Methods and Research 13, 77-107.
Clark, W. A. V., M. C. Deurloo, and F. M. Dieleman (1984). “Housing Consumption and Residential
Mobility.” Annals of the Association of American Geographers 74, 29-43.
--- (1986). “Residential Mobility in Dutch Housing Markets.” Environment and Planning A 18,
763-88.
Clark, W. A. V., and J. Onaka (1985). “An Empirical Test of a Joint Model of Residential Mobility and
Housing Choice.” Environment and Planning A 17, 915-30.
Conant, R. C. (1980). “Structural Modeling Using a Simple Information Measure.” International
Journal of Systems Science 11, 721-30.
Deurloo, M. C., F. M. Dieleman, and W. A. V. Clark (1987). “Tenure Choice in the Dutch Housing
Market.” Environment and Planning A 19, 763-81.
--- (1988). “Multinomial Response Models of Housing Choice.” Environment and Planning A 19
(forthcoming).
Green, P. E. (1978). “An AID/Logit Procedure for Analyzing Large Multiway Contingency Tables.”
Journal of Marketing Research 15, 132-36.
Haberman, S. J. (1978). Analysis of Qualitative Data, Volume 1. New York: Academic Press.
Hays, W. L. (1980). Statistics for the Social Sciences, 2nd ed. Chichester, Sussex: Holt, Rinehart, and
Winston.
Higgins, J. W., and G. G. Koch (1977). “Variable Selection and Generalized Chi-square Analysis of
Categorical Data Applied to a Large Cross-sectional Occupational Health Survey.” International
Statistical Review 45, 51-62.
Hooimeijer, P., W. A. V. Clark, and F. M. Dieleman (1986). “Households in the Reduction Stage:
Implications for the Netherlands Housing Market.” Housing Studies 1, 195-209.
Johnson, L. W., and D. A. Hensher (1982). “Application of Multinomial Probit to a Two-period Panel
Data Set.” Transportation Research A 16A, 457-64.
Kass, G. V. (1980). “An Exploratory Technique for Investigating Large Quantities of Categorical
Data.” Applied Statistics 29, 119-27.
Keller, W., A. Verbeek, and J. G. Bethlehem (1984). Analysis of Tables. Intern rapport, Voorburg:
Centraal Bureau voor de Statistiek.
Kim, J. (1984). “PRU Measure of Association for Contingency Table Analysis.” Sociological Methods
and Research 13, 3-44.
Lammerts van Bueren, W. M. (1982). “Measuring Association in Nominal Data.” Dissertation,
Department of Economics, Erasmus University of Rotterdam, PB 1738, 3000 DR Rotterdam.
Langeheine, R. (1984). “Exploratieve Techniken zur Identifikation von Strukturen in Grossen
Kontingenztabellen.” Zeitschrift für Sozialpsychologie 15, 254-68.
Linde, M. A. W., F. M. Dieleman, and W. A. V. Clark (1986). “Starters in the Dutch Housing Market.”
Tijdschrift voor Economische en Sociale Geografie 77, 243-50.
Magidson, J. (1982). “Some Common Pitfalls in Causal Analysis of Categorical Data.” Journal of
Marketing Research 19, 461-71.
Magidson, J., J. H. Swan, and R. A. Berk (1981). “Estimating Non-hierarchical and Nested Log-linear
Models.” Sociological Methods and Research 10, 3-49.
McGill, W. J., and H. Quastler (1955). “Standard Nomenclature, an Attempt.” In Information Theory
in Psychology, edited by H. Quastler, pp. 83-92. New York: Free Press.
Nie, N. H., C. H. Hull, J. G. Jenkins, K. Steinbrenner, and D. H. Bent (1975). Statistical Package for the
Social Sciences, 2nd ed. New York: McGraw-Hill.
Perreault, W. D., Jr., and H. C. Barksdale, Jr. (1980). “A Model-free Approach for Analysis of
Complex Contingency Data in Survey Research.” Journal of Marketing Research 17, 503-15.
Quigley, J. (1976). “Housing Demand in the Short Run: An Analysis of Polytomous Choice.”
Exploration in Economic Research 3, 76-102.
Sonquist, J. A., and J. N. Morgan (1964). “The Detection of Interaction Effects.” Institute for Social
Research, University of Michigan, Ann Arbor, Michigan 48109.
Wrigley, N. (1985). Categorical Data Analysis for Geographers and Environmental Scientists.
London: Longman.


